From oneiros at grace.nascom.nasa.gov  Fri Apr  8 14:48:47 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 8 Apr 2011 14:48:47 -0400 (EDT)
Subject: [Rdap] Google Exacycle grant program
Message-ID: <alpine.OSX.2.00.1104081443460.23558@moat.nascom.nasa.gov>


If you (or researchers you know of) need some high-CPU, low-I/O time for
highly parallelized processing, Google's offering time on their
cluster.

It looks like it's only open to academics, and you have to write the code 
in C or C++.  Deadline for proposals is May 31st.

More details at:

 	http://research.google.com/university/exacycle_program.html

-Joe


From oneiros at grace.nascom.nasa.gov  Fri Apr  8 17:24:48 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 8 Apr 2011 17:24:48 -0400 (EDT)
Subject: [Rdap] possible 'relaunch' of data.gov?
Message-ID: <alpine.OSX.2.00.1104081715340.23558@moat.nascom.nasa.gov>


I know it was mentioned during the RDAP meeting that data.gov was getting 
de-funded, but there are tentatively some workshops scheduled next week
(assuming the furlough is over by Thursday), for comments by:

 	"Agency POCs, Open Government Working Group, Data Stewards and New
 	Media community in the Federal Government"

If you have a '.gov' email address, you can register at:

 	http://www.socrata.com/datagov/workshop/

(and then manage to get to Washington, DC on the morning of the 14th, if
the furlough's been lifted).

But, if anyone else has comments to make, I'd be willing to relay them, as 
I've already registered.

(and I should say ... as it's a '.com' address, I have no idea if this is 
from someone who's actually contracted to work on data.gov, if this is 
going to be the equivalent to someone trying to sell me a timeshare, or 
what ... I'll try to report back when I know more)

-Joe


-----
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center


From tinajayroe at gmail.com  Sat Apr  9 14:18:45 2011
From: tinajayroe at gmail.com (Tina Jayroe)
Date: Sat, 9 Apr 2011 12:18:45 -0600
Subject: [Rdap] "Save the Data" action we can take
Message-ID: <BANLkTimDDGU1dy_4=t7rAjUEz=-ef-B8hQ@mail.gmail.com>

Hello Everyone,

This topic came up at RDAP and I saw it retweeted by @TomBaker. You may want
to use the link below to take some action. Here's the tweet as well as the
gist of the site:

RT @timberners_lee <http://twitter.com/timberners_lee>: Sign at
http://sunlightfoundation.com/savethedata/ and retweet please


Save the Data!

Some of the most important technology programs that keep Washington
accountable are in danger of being eliminated. Data.gov, USASpending.gov,
the IT Dashboard and other federal data transparency and government
accountability programs are facing a massive budget cut, despite only being
a tiny fraction of the national budget. Help save the data and make sure
that Congress doesn't leave the American people in the dark.

Things you can do now!

   1.  Sign onto our Letter to Congress

   Sign on to our open letter to Congressional leaders asking them to
   protect funding for the Electronic Government Fund. You can read the full
   letter here.
   Sign Now! <http://sunlightfoundation.com/savethedata/letter/>

   2.  Call your representatives

   Making sure your Representative and Senators hear from you directly is
   one of the most powerful actions you can take. Call now and tell them to
   Save the Data by protecting the Electronic Government fund and the crucial
   transparency programs it supports.
   Call Now <http://publicequalsonline.com/call/>

   3.  Write a letter to the editor

   Writing a letter to the editor of your local paper is an easy way to
   raise awareness and a powerful way to make your voice heard. It's easy, too
   -- we'll help you each step of the way.
   Get Started on Your Letter <http://local.publicequalsonline.com/page/speakout/savethedata>

   4.  Spread the word

   On Twitter or Facebook? Please share this far and wide. Are you a
   blogger? You can help by writing a blog post -- and share the link with us
   here so we can help promote it!

From mweaver at asis.org  Fri Apr 15 14:12:51 2011
From: mweaver at asis.org (Melissa Weaver)
Date: Fri, 15 Apr 2011 11:12:51 -0700
Subject: [Rdap] Call for Work-in-Progress Papers -- DC-SAM workshop on
 Research and Best Practices in Linking Scientific Metadata
Message-ID: <BANLkTik1pTP=7u9jGSOE6oV9aYiE0pXXeg@mail.gmail.com>

*Message to RDAP discussion list from Dr. Jian Qin, Syracuse University:*

Please feel free to distribute the call and excuse the multiple postings.

*A DC-SAM Workshop: Research and Best Practices in Linking Scientific
Metadata*

September 29, 2011, Berlin, Germany

To be held as part of the Theory and Practice in Digital Libraries (TPDL)
Conference <http://www.tpdl2011.org/> (September 25-29, 2011, Berlin, Germany)

*Call for Work-in-Progress Papers*

Describing scientific research data can be challenging due to their
complexity and diversity. Standards for describing scientific datasets
include not only entities responsible for data collection, processing, and
distribution, but also information for data users to assess the relevancy to
their data needs, quality of datasets, as well as technicalities regarding
data file manipulation. Although scientific metadata schemes address a range
of needs for data identification, quality assessment, verifiability, and
dissemination, they do not fully address the challenges related to metadata
generation and islands of information exist within and across scientific
metadata records. One step towards addressing these challenges and problems
is to have information scientists and domain scientists collaborate to
evolve existing solutions in web-friendly ways. This one-day workshop will
feature invited speakers from science and information science in the morning
sessions and selected work-in-progress reports and interactive discussion in
the afternoon.

The DC-SAM (DCMI Science and Metadata Community
<http://dublincore.org/groups/sam/>) workshop will include three parts: a
morning session consisting of invited speakers from both science and
information science, a working lunch with focus group discussion, and an
afternoon session for work-in-progress reports.  This call is soliciting
submissions of work-in-progress reports for the afternoon session. We are
especially interested in, but not limited to, the following topics:

  -- Identification systems and standards for scientific metadata

  -- Scientific metadata architecture and models in Semantic Web

  -- Interoperable taxonomies and vocabularies in [biology,
     astronomy, etc.]

  -- Metadata linking mechanisms and technologies and their
     applications in scientific metadata

  -- Organizational and technical challenges in linking scientific
     metadata

The outcomes of this workshop are expected to be a collection of research
papers/reports and a research agenda in this increasingly important area,
which will be made available on the DC-SAM community website.

Researchers are invited to submit reports for their projects relevant to the
theme of this workshop, which are either work in progress or completed. The
work-in-progress papers should be no more than six (6) pages, single spaced,
(approximately 3,000 words including abstract and references). The
submission should include the following components:

  -- Title of the paper

  -- Author(s) and affiliations

  -- Email addresses of authors

  -- Project website

  -- Abstract

  -- Tags or keywords

  -- Body of paper

  -- References cited

The papers will be reviewed by the workshop program committee and selected
based on originality, coherence, clarity, and appropriateness for the
workshop. Each selected paper will be given a 15-minute slot for oral
presentation.

*Important dates:*

  -- Deadline for submitting papers: June 12, 2011, midnight (your
     local time)

  -- Notification of acceptance: June 30, 2011

  -- Final version of accepted papers: August 15, 2011

Please submit your paper to dc.sam.workshop at gmail.com. All inquiries and
questions can be sent to the same email address. A copy of this call can be
found on the workshop website http://eslib.ischool.syr.edu/SAM/. Information
about the workshop will be updated as it becomes available.


*Workshop co-organizers:*

Jian Qin <http://eslib.ischool.syr.edu/jqin/>, Syracuse University, USA

Jane Greenberg <http://www.ils.unc.edu/%7Ejaneg/>, University of North
Carolina at Chapel Hill, USA

Norman Gray <http://nxg.me.uk/>, University of Glasgow, UK


Jian Qin, Ph.D.

Associate Professor

School of Information Studies

Syracuse University

311 Hinds Hall

Syracuse, NY 13244

Phone: 315-443-5642

Fax: 315-443-5806

http://ischool.syr.edu

From oneiros at grace.nascom.nasa.gov  Fri Apr 15 14:58:56 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 15 Apr 2011 14:58:56 -0400 (EDT)
Subject: [Rdap] Next Generation Data.gov
Message-ID: <alpine.OSX.2.00.1104151325390.49883@moat.nascom.nasa.gov>


Yesterday, I attended a workshop at the GSA on the 'next generation 
data.gov'.  After a survey in the breakout session, I realized I was the 
only non-manager who attended, and it was for the session on APIs that 
they said was the 'more technical' session.  (I admit, I wasn't actually 
invited directly, I was forwarded the invite by my ATR (the civil servant 
who directs my work ... pretty much my boss))

Anyway, the basic summary is this:

   *  data.gov is being upgraded.

   *  they're using a platform from Socrata : http://www.socrata.com/
      It's basically a shared dataset hosting platform, with some built in
      tools for interaction & visualization.

   *  people will be able to interact with the data, not just download
      files, provided that as part of the submission process, you actually
      describe the columns, etc.  You can also pre-define common filters,
      sorting, views, visualizations, etc.

   *  there's also a 'social' component (one of those things that we
      mentioned in the RDAP 'Future of Digital Libraries').  They said that
      the owner for each dataset could define what level of social
      interaction was allowed.  I'm not sure all of what was allowed (I
      think they mentioned commenting and defining views for other people
      to use), but they could be moderated or disallowed.

   *  you can run a local server to expose your data using their API, and
      then just register it with them, and it'll make calls to your server
      to get the data.

   *  application developers can register to get an ID to use the API.
      it'll automatically rate throttle any that are being too abusive, but
      it also allows for dataset owners to see who's using their data, or
      for end users to see what tools have been built to use the data.
      (there was also a request for data owners to be able to send a
      message to all of the developers using their data, so they could warn
      of possible upcoming changes)


Anyway ...

They showed off a lot of cool features that'd probably be useful for most 
tabular data.  They showed an import screen that had options for 
'dataset', 'chart', 'calendar', and there might've been a fourth, but I got
up too early, and can't remember.  I told 'em I had a few million images,
and it didn't sound like they were really geared towards that (maybe to
serve the catalog of the data, but not the data itself).

I asked about putting NSF research data in there, and I got a kinda 
roundabout answer about how it'd have to be approved through the 'normal 
agency channels', and I don't know if NSF would want us mixing this type 
of research 'data' in with their other 'data'.

Due to the nature of what's being done, I don't think it'd qualify under 
TRAC, so you'd likely want a separate archival copy of the data, but I 
could be mistaken.

They offer a generic API for serving tabular data (the Socrata Open Data
API), so it's possible that other people could implement it, even if you
don't want to license their product, or you could write something to
harvest the various data sources.  I haven't looked into the spec, so I
don't know how hard it'd be to try to translate between it and something
like IVOA TAP (http://www.ivoa.net/Documents/TAP/).

...

It looks like Socrata's updating their website right now, but I wasn't 
given any sort of an NDA to sign, and I'm assuming they'd need some load 
testing, etc, so once it's back up:

Beta of the new site.  (you have to register ... it let me in immediately, 
but I used a '.gov' address, and it's down right now, so I can't test with 
one of my other addresses):

 	http://datagov.socrata.com/

Documentation (for data submission, API usage, etc)

 	http://dev.socrata.com/

-Joe


---------- Forwarded message ----------
Date: Thu, 14 Apr 2011 21:05:22 -0500
From: "hyon.kim at gsa.gov" <hyon.kim at gsa.gov>
To: "marion.royal at gsa.gov" <marion.royal at gsa.gov>
Cc: "chris.metcalf at socrata.com" <chris.metcalf at socrata.com>,
     "charles at socrata.com" <charles at socrata.com>,
     "saf.rabah at socrata.com" <saf.rabah at socrata.com>
Subject: Next Generation Data.gov Platform - Link to Workshop Materials

Thank you for your interest in the Next Generation Data.gov Platform.  We have posted the agenda, presentations and the Getting
Started Guide at the following link:

http://www.socrata.com/datagov/workshop/presentations/

We will be following up with those of you who expressed interest in participating in the new platform.

We will keep you informed of our progress as we move toward the launch of the Next Generation Data.gov Platform.

Thank you.

Hyon Kim
Deputy Program Director
Data.gov
(202) 694 8148


From ahmorgan at vpr.rutgers.edu  Thu Apr 21 16:43:20 2011
From: ahmorgan at vpr.rutgers.edu (Aletia Morgan)
Date: Thu, 21 Apr 2011 16:43:20 -0400 (EDT)
Subject: [Rdap] Deciding when a Data Management Plan is not required?
Message-ID: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>

Greetings - I'm working with PIs on reviewing DMP documents that are part
of NSF proposals, and I am wondering about whether a DMP is needed.

 
The project involves staging a meeting with researchers in the discipline,
with a goal that includes the development of plans for future research. 

 
Obviously, there's no experimental data here, and the output is
essentially a narrative document.  The PI asserts that "This proposal does
not seek to collect environmental data. A data management plan is not
required."

 
If a conference is being organized, notes are taken, documents are being
written.   Is there a need to say how these records will be developed and
preserved?  Or am I being too compulsive!

 
Thanks for any thoughts, and if there might be a better place to ask this
question.

 
Regards,

Aletia

 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research 
   and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus

ahmorgan at vpr.rutgers.edu
732-445-3344



From cragin at illinois.edu  Thu Apr 21 22:10:20 2011
From: cragin at illinois.edu (Cragin, Melissa H)
Date: Thu, 21 Apr 2011 21:10:20 -0500
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID: <EB3A3240-746C-4C84-913B-3A5232C6C063@ad.uiuc.edu>


On Apr 21, 2011, at 8:57 PM, "Aletia Morgan" <ahmorgan at vpr.rutgers.edu<mailto:ahmorgan at vpr.rutgers.edu>> wrote:

Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.

The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.

Obviously, there's no experimental data here, and the output is essentially a narrative document.  The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not required."

If a conference is being organized, notes are taken, documents are being written.   Is there a need to say how these records will be developed and preserved?  Or am I being too compulsive!

Thanks for any thoughts, and if there might be a better place to ask this question.

Regards,
Aletia

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research
   and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus
ahmorgan at vpr.rutgers.edu<mailto:ahmorgan at vpr.rutgers.edu>
732-445-3344

_______________________________________________
Rdap mailing list
Rdap at mail.asis.org<mailto:Rdap at mail.asis.org>
http://mail.asis.org/mailman/listinfo/rdap


From cragin at illinois.edu  Thu Apr 21 22:21:17 2011
From: cragin at illinois.edu (Cragin, Melissa H)
Date: Thu, 21 Apr 2011 21:21:17 -0500
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID: <7EF7518C-10D2-4CB3-B408-92E7C08716FA@ad.uiuc.edu>

Dear Aletia,

  It is not likely that you need a DMP for a proposal to develop and convene a meeting. Is the proposal a response to a formal call?  If so, there is generally a clear set of requirements available on the web page for that NSF program, and you'll be able to see what's involved.

Best,
Melissa

On Apr 21, 2011, at 8:57 PM, "Aletia Morgan" <ahmorgan at vpr.rutgers.edu<mailto:ahmorgan at vpr.rutgers.edu>> wrote:

Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.

The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.

Obviously, there's no experimental data here, and the output is essentially a narrative document.  The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not required."

If a conference is being organized, notes are taken, documents are being written.   Is there a need to say how these records will be developed and preserved?  Or am I being too compulsive!

Thanks for any thoughts, and if there might be a better place to ask this question.

Regards,
Aletia

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research
   and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus
ahmorgan at vpr.rutgers.edu<mailto:ahmorgan at vpr.rutgers.edu>
732-445-3344

_______________________________________________
Rdap mailing list
Rdap at mail.asis.org<mailto:Rdap at mail.asis.org>
http://mail.asis.org/mailman/listinfo/rdap


From jgraybeal at ucsd.edu  Fri Apr 22 00:55:51 2011
From: jgraybeal at ucsd.edu (John Graybeal)
Date: Thu, 21 Apr 2011 21:55:51 -0700
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID: <8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu>

Here's the way I think about this kind of thing:

1) You have done useful work by the time you finish (presumably).
2) To the extent the useful work was based on discovered, collected, observed, or otherwise modeled data, the conclusions are likely to depend on those inputs.
3) If someone wants to evaluate your conclusions in light of your inputs, would they be able to do that?

If the basis is entirely on other papers, then those would presumably be cited in the report.  If it is strictly a thought process of a single group, then the report is the data.  

But if the basis is on brainstorming ideas from multiple groups, or having people in the meeting each generate their own inputs, which are then collated and massaged, or it relied on results that were on-line and might be different tomorrow -- then it would be a 'best practice' to maintain the original raw materials in a repository (say the web site where your work is managed, if any) that others could inspect.  (They might validate your group's wisdom, or find great wisdom that your group missed.)  In these cases, I would say a very short Data Management Plan would be worth including.

john

On Apr 21, 2011, at 13:43, Aletia Morgan wrote:

> Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.
>  
> The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.
>  
> Obviously, there's no experimental data here, and the output is essentially a narrative document.  The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not
> required."
>  
> If a conference is being organized, notes are taken, documents are being written.   Is there a need to say how these records will be developed and preserved?  Or am I being too compulsive!
>  
> Thanks for any thoughts, and if there might be a better place to ask this question.
>  
> Regards,
> Aletia
>  
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Aletia Morgan
> Research Application Designer
> Office of the Vice President for Research 
>    and Graduate & Professional Education
> Rutgers, The State University of New Jersey
> 715 CoRE Building, Busch Campus
> ahmorgan at vpr.rutgers.edu
> 732-445-3344
> 
> _______________________________________________
> Rdap mailing list
> Rdap at mail.asis.org
> http://mail.asis.org/mailman/listinfo/rdap


----------------
John Graybeal    <mailto:jgraybeal at ucsd.edu>     phone: 858-534-2162
Product Manager
Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
Marine Metadata Interoperability Project: http://marinemetadata.org



From oneiros at grace.nascom.nasa.gov  Fri Apr 22 11:35:00 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 22 Apr 2011 11:35:00 -0400 (EDT)
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID: <alpine.OSX.2.00.1104221103080.4253@moat.nascom.nasa.gov>


On Thu, 21 Apr 2011, Aletia Morgan wrote:

> 
> Greetings - I'm working with PIs on reviewing DMP documents that are
> part of NSF proposals, and I am wondering about whether a DMP is needed.
> 
> The project involves staging a meeting with researchers in the
> discipline, with a goal that includes the development of plans for
> future research.
> 
> Obviously, there's no experimental data here, and the output is
> essentially a narrative document.  The PI asserts that "This proposal
> does not seek to collect environmental data. A data management plan is
> not required."

It's observational data, not experimental.  And if you're collecting 
information about the attitudes of people, I'd consider that to be 
environmental, personally.


> If a conference is being organized, notes are taken, documents are being
> written.   Is there a need to say how these records will be developed
> and preserved?  Or am I being too compulsive!

I think this goes back to the discussion at RDAP11, where some people 
didn't think of what they were collecting as 'data', but I'd argue that 
the notes taken, and recordings of the meetings, etc, are in fact data. 
If there were some sort of coding done on the participants' responses or
overall attitudes, or results of any polls/voting at the meeting, all of 
those would be more easily recognizable as 'data', but I'd argue that 
notes, even with inherent bias and lack of completeness are in fact a form 
of 'data'.

I'm actually participating in multiple side discussions right now on the 
nature of what 'data' is;  my point of view had been from the digital 
sensor aspect, but in this case it's the recordings by some observer (in 
this case, a person, not a sensor) that are going to be used as the 
supporting basis for any results.

I'd say that the notes on planning the meetings might not be 'data', but 
the agenda and materials given to participants for the meetings are 
important documents to archive, as they're the equivalent of the 
experimental plan -- knowing how you presented information to the 
participants to elicit the responses that you received might either be 
useful to someone trying to repeat the process in another discipline, 
repeat it with a larger audience, or repeat it after time has elapsed to 
see if attitudes have changed.

There might also be something in that plan that could hint at potential 
bias in the results.

...

It may also be that the apparent lack of some sort of hard data is a 
problem in itself.  (I don't know the specific grant, or what's normal 
for that area, so this might not be the case) ... but if you wanted to 
make sure that there was at least *some* true 'data' as a result of the 
meetings, you could distribute a short questionnaire to each of the 
participants (yes, I know, that means getting IRB involved), so there was 
at least some 'data' that came out of the meetings ... even if it's just 
on how well they thought the meetings went and comments for how to improve 
them.


> Thanks for any thoughts, and if there might be a better place to ask 
> this question.

If there are any discussion groups for the specific discipline covered by 
the grant (even if that isn't the precise discipline the researchers might 
be in), I'd also ask there.

... but I think this goes into what Cliff Lynch said were the basic 
parts of the NSF DMPs ... the first step is to figure out which products 
of the effort are valuable and worth keeping ... and he did say, 
it's possible that there aren't any.

I personally think there are, even if it's just to allow someone else to 
review how you conducted the meetings, so they can repeat them or improve 
on them, and thus save time for the next iteration.  Is it data or 
documentation?  Well, I'm not going to get into that argument, but 
whatever it is, I think it does have value.

-Joe

-----
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center


From bdwestra at gmail.com  Fri Apr 22 12:07:17 2011
From: bdwestra at gmail.com (Brian Westra)
Date: Fri, 22 Apr 2011 09:07:17 -0700
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID: <BANLkTikc95n27LYKdrEr12tS=SmUkermuA@mail.gmail.com>

Hi Aletia,

I'd echo the comments of others on this list.

By the way, the ARL data sharing support group list (
http://groups.google.com/group/arl-data-sharing-support-group/?pli=1), and
the Research Digital Management list (
https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=RESEARCH-DATAMAN) are also
good places to post this type of question.

Thanks,
Brian
--

Brian Westra
Lorry I. Lokey Science Data Services Librarian
University of Oregon

From oneiros at grace.nascom.nasa.gov  Fri Apr 22 14:11:37 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 22 Apr 2011 14:11:37 -0400 (EDT)
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
Message-ID: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>


As part of the side discussion about defining what a dataset is, an issue 
got raised:  what is it about a given object that makes it of 'archive 
quality'?


I know there's the TRAC checklist to look at what archives need to do, and 
I've seen various guidance on digitization of images, movies or audio on 
what resolution you should digitize at, and which formats you should use, 
and what metadata to attach.

... but for people who are writing the various standards used for storing 
scientific data (eg, CDF, NetCDF, HDF, FITS, VOTable), are there any 
recommendations on what sort of features / documentation / metadata are 
necessary to be of 'archival' quality?  (either as part of the standard, 
or as metadata that needs to be in each object being tracked)

-Joe

-----
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center


From cheetham at doit.wisc.edu  Fri Apr 22 12:14:18 2011
From: cheetham at doit.wisc.edu (Jan Cheetham)
Date: Fri, 22 Apr 2011 11:14:18 -0500
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID: <22DA497A-63E4-4F52-89A2-1B3DA2065EF3@doit.wisc.edu>

One of the members of the digital curation team at my campus helped a PI write a DMP for a project that sounds similar to the one you're describing, i.e. it involves establishing a working group of researchers to brainstorm research approaches to solve a particular problem, develop a model, and disseminate the model and reports.

It's not appropriate to share the DMP at this point (since it's currently under review by NSF) but I can describe the approach it took in general terms. It broke down the types of data in a table. This included things like reports from the group, the model design, etc. It described formats of these documents (PDF and spreadsheets, mostly) and what type of metadata (e.g. author names, dates, copyright, etc.) would be embedded in them. The data sharing section described how documents would be shared and accessed by members of the research group during the "discovery" period (including the URLs of relevant wikis and file sharing sites) and also how the results would be disseminated afterwards, including who the target audiences would be and URLs for sites where the results would be published, and stated what the policies for re-use of results would be. Finally, there was a section on long term archiving, which mentioned how long documents would be kept, on whose computer, etc., and plans to archive/share the results with archive.org. In total, this DMP is about 1.5 pages long.

Of course, it's too early to know how successful this approach will be with NSF reviewers but I thought it might help to share this much.

Jan


On Apr 21, 2011, at 3:43 PM, Aletia Morgan wrote:

> Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.
>  
> The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.
>  
> Obviously, there's no experimental data here, and the output is essentially a narrative document.  The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not
> required."
>  
> If a conference is being organized, notes are taken, documents are being written.   Is there a need to say how these records will be developed and preserved?  Or am I being too compulsive!
>  
> Thanks for any thoughts, and if there might be a better place to ask this question.
>  
> Regards,
> Aletia
>  
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Aletia Morgan
> Research Application Designer
> Office of the Vice President for Research 
>    and Graduate & Professional Education
> Rutgers, The State University of New Jersey
> 715 CoRE Building, Busch Campus
> ahmorgan at vpr.rutgers.edu
> 732-445-3344
> 
> _______________________________________________
> Rdap mailing list
> Rdap at mail.asis.org
> http://mail.asis.org/mailman/listinfo/rdap

Jan Cheetham, Ph.D.
Research Technologies Consultant
Division of Information Technology (DoIT)
University of Wisconsin-Madison
1401 University Ave
Madison, Wisconsin
608-262-2947
cheetham at doit.wisc.edu



From sah at virginia.edu  Fri Apr 22 16:32:35 2011
From: sah at virginia.edu (Lake, Sherry (sah))
Date: Fri, 22 Apr 2011 20:32:35 +0000
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
	<8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu>
Message-ID: <6679CD8702498947A68B23D55773733203342F@GRANT.eservices.virginia.edu>

As for the particular question asked in this e-mail thread, maybe page 4 of the Education & Human Resources (EHR) Directorate DMP guidelines (http://www.nsf.gov/bfa/dias/policy/dmpdocs/ehr.pdf) will help. This section of the guidelines lists examples for EHR proposals to follow:

1. A proposal for a workshop that will result in a workshop report.
a. The DMP could consist of a statement to the effect that a workshop report will be produced and disseminated, e.g., via a website, publication in a journal, or other means.

The Institute of Museum & Library Services (IMLS) is a little clearer on when a DMP is required. The DMP requirement is part 3 of the section "Specification for Projects that Develop Digital Products". So I assume, for IMLS, no digital products, no DMP required.

I wish the NSF would give a little more guidance as to what they want in a DMP. They could then tell us which proposals do not require one. I think the specific solicitation should have that information, like Melissa said.

--
Sherry Lake                             shlake at virginia.edu
Scientific Data Consultant              http://www.lib.virginia.edu/brown/data/
Brown Science and Engineering Library   University of Virginia
* * * * * * * * * * * * * * * * *
       "A library is not a luxury but one of the necessities
                  of life." --- Henry Ward Beecher
* * * * * * * * * * * * * * * * *


From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of John Graybeal
Sent: Friday, April 22, 2011 12:56 AM
To: Research Data, Access and Preservation
Cc: rdap at mail.asis.org
Subject: Re: [Rdap] Deciding when a Data Management Plan is not required?

Here's the way I think about this kind of thing:

1) You have done useful work by the time you finish (presumably).
2) To the extent the useful work was based on discovered, collected, observed, or otherwise modeled data, the conclusions are likely to depend on those inputs.
3) If someone wants to evaluate your conclusions in light of your inputs, would they be able to do that?

If the basis is entirely on other papers, then those would presumably be cited in the report.  If it is strictly a thought process of a single group, then the report is the data.

But if the basis is on brainstorming ideas from multiple groups, or having people in the meeting each generate their own inputs, which are then collated and massaged, or it relied on results that were on-line and might be different tomorrow -- then it would be a 'best practice' to maintain the original raw materials in a repository (say the web site where your work is managed, if any) that others could inspect.  (They might validate your group's wisdom, or find great wisdom that your group missed.)  In these cases, I would say a very short Data Management Plan would be worth including.

john

On Apr 21, 2011, at 13:43, Aletia Morgan wrote:


Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.

The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.

Obviously, there's no experimental data here, and the output is essentially a narrative document.  The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not
required."

If a conference is being organized, notes are taken, documents are being written.   Is there a need to say how these records will be developed and preserved?  Or am I being too compulsive!

Thanks for any thoughts, and if there might be a better place to ask this question.

Regards,
Aletia

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research
   and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus
ahmorgan at vpr.rutgers.edu<mailto:ahmorgan at vpr.rutgers.edu>
732-445-3344


_______________________________________________
Rdap mailing list
Rdap at mail.asis.org<mailto:Rdap at mail.asis.org>
http://mail.asis.org/mailman/listinfo/rdap


----------------
John Graybeal    <mailto:jgraybeal at ucsd.edu>     phone: 858-534-2162
Product Manager
Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
Marine Metadata Interoperability Project: http://marinemetadata.org



From oneiros at grace.nascom.nasa.gov  Mon Apr 25 14:17:41 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Mon, 25 Apr 2011 14:17:41 -0400 (EDT)
Subject: [Rdap] Presentations from RDAP 2011
Message-ID: <alpine.OSX.2.00.1104251358360.4253@moat.nascom.nasa.gov>


Sorry,

I don't think anything went to the list about this --

Melissa Weaver placed the presentations up on SlideShare:

 	http://www.slideshare.net/asist_org/

I'm also not sure what the etiquette is for pre-print stuff for the ASIS&T 
Bulletin, but I have a summary written up (which, well, had to get trimmed 
for space).  It's on a different machine, so I'll have to post it 
later ... and I'll be posting it around anyway, as part of my trip report, 
so it'll appear on a few mailing lists.

I already have my rough notes online, but I don't know how useful they are 
on their own:

 	http://vso1.nascom.nasa.gov/joe/notes/rdap/RDAP_2011_notes.txt

I tried to make sure to record the Q/A parts, but well, there's a few gaps 
where either I was participating in the discussion, or I was distracted (I 
admit, I was sending e-mail ... but some of it was because of stuff I was 
learning at the meeting)

-Joe


From Peter.Wittenburg at mpi.nl  Mon Apr 25 15:26:56 2011
From: Peter.Wittenburg at mpi.nl (Peter Wittenburg)
Date: Mon, 25 Apr 2011 21:26:56 +0200
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
Message-ID: <147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl>

Hallo Joe  - an answer from Europe.

There are, by the way, two procedures to assess quality: one is (T)RAC, as you mention, and the other is DSA (Data Seal of Approval). While the first is a heavy procedure, the second is a more lightweight process. We have chosen to go for DSA, which seems appropriate for us, as we also store patrimonial data which needs to be preserved for the future.

With respect to metadata I think that it is the responsibility of the community to come up with an agreed set of elements. In our domain (languages) we are using CMDI (Component Metadata Infrastructure) allowing everyone to define his own components and profiles, but requesting everyone to use the elements and vocabularies registered in ISOcat which is based on the ISO 12620 standard. But this is just one example. I know that the climate researchers and astronomers use different element sets (of course). 
Here are a few references which are easy to read (Short Guides):
- http://www.isocat.org (concept registry)
- http://www.clarin.eu/external/index.php?page=publications&sub=3 (metadata & concept registry short guides)
- http://www.datasealofapproval.org/ 
- http://www.mpi.nl/tla (our unit)

best
Peter



From michael at psu.edu  Tue Apr 26 12:54:13 2011
From: michael at psu.edu (Michael J. Giarlo)
Date: Tue, 26 Apr 2011 12:54:13 -0400
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
Message-ID: <4DB6F8B5.90106@psu.edu>

On 04/22/2011 02:11 PM, Joe Hourcle wrote:
>
> As part of the side discussion about defining what a dataset is, an issue
> got raised about what about a given object makes it of 'archive quality'?
>

That's a doozy, Joe!  Good question to ponder as a group.

This strikes me as something of a glib non-answer, but here goes.

The first bit I have trouble wrapping my mind around is that the notion 
of archival quality is binary, that an object can be of archival quality 
or not.

To be perfectly frank, I don't think we know enough yet about what 
archival quality really means in the digital context; *most* of us have 
been managing files for, what, 10-15 years?

Combining these thoughts, archival quality feels like a bit of guesswork 
informed by our professional expertise, more like a confidence level 
associated with an object being preserved based upon:

  * Retention period -- I have fair-to-high confidence that just about 
any digital object is preservable for 1 year, and very low confidence 
that just about any digital object is preservable for 100.

  * File formats -- I have higher confidence in open formats than 
proprietary formats, in widely used rather than narrowly used formats, 
in self-descriptive rather than opaque formats.

  * Metadata/documentation richness -- The richer and more interoperable 
and more widely understood the metadata, the higher my confidence will be.

  * Use contexts -- I'd worry a lot more about an object with few or no 
use contexts ("dark archives") than others.

  * A whole lot more that escapes me, such as issues around the archive 
itself and its organizational, technical, and financial sustainability.

Thanks for raising the question, Joe.

-Mike


From Mark.Conrad at nara.gov  Tue Apr 26 13:36:01 2011
From: Mark.Conrad at nara.gov (Mark Conrad)
Date: Tue, 26 Apr 2011 13:36:01 -0400
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
	<147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl>
Message-ID: <4DB6CA4102000097000081A1@gwiaa.nara.gov>

ISO 16363 is in draft. It replaces TRAC. I understand that in Europe there will be three levels of certification for a digital repository. DSA will be one level; ISO 16363 will be another.

 
Mark Conrad
NARA Center for Advanced Systems and Technologies
NHA 
The National Archives and Records Administration
Erma Ora Byrd Conference and Learning Center
Building 494 Second Floor
610 State Route 956
Rocket Center, WV  26726

Phone: 304-726-7820
Fax: 304-726-7802
Email: mark.conrad at nara.gov 
http://www.facebook.com/NARACAST
http://www.archives.gov/ncast/

From Peter.Wittenburg at mpi.nl  Tue Apr 26 14:03:49 2011
From: Peter.Wittenburg at mpi.nl (Peter Wittenburg)
Date: Tue, 26 Apr 2011 20:03:49 +0200
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <4DB6CA4102000097000081A1@gwiaa.nara.gov>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
	<147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl>
	<4DB6CA4102000097000081A1@gwiaa.nara.gov>
Message-ID: <147889AFDB014A43A5A9318270C826C9BA100C368A@MAILER.mpi.nl>

Yes Mark this is correct.
peter

From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of Mark Conrad
Sent: Tuesday, April 26, 2011 7:36 PM
To: Access and Preservation Research Data
Subject: Re: [Rdap] What makes an 'Archive Quality' Digital Object?

ISO 16363 is in draft. It replaces TRAC. I understand that in Europe there will be three levels of certification for a digital repository. DSA will be one level ISO 16363 will be another level.


From michael at psu.edu  Tue Apr 26 14:23:35 2011
From: michael at psu.edu (Michael J. Giarlo)
Date: Tue, 26 Apr 2011 14:23:35 -0400
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <4DB6CA4102000097000081A1@gwiaa.nara.gov>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>	<147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl>
	<4DB6CA4102000097000081A1@gwiaa.nara.gov>
Message-ID: <4DB70DA7.1080808@psu.edu>

On 04/26/2011 01:36 PM, Mark Conrad wrote:
> ISO 16363 is in draft. It replaces TRAC. I understand that in Europe
> there will be three levels of certification for a digital repository.
> DSA will be one level ISO 16363 will be another level.
>

There's a bit more information about this in the January 2011 D-Lib 
(down towards the bottom):

      http://www.dlib.org/dlib/january11/klump/01klump.html

-Mike


From oneiros at grace.nascom.nasa.gov  Tue Apr 26 15:30:05 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Tue, 26 Apr 2011 15:30:05 -0400 (EDT)
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <4DB6F8B5.90106@psu.edu>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
	<4DB6F8B5.90106@psu.edu>
Message-ID: <alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>


On Tue, 26 Apr 2011, Michael J. Giarlo wrote:

> On 04/22/2011 02:11 PM, Joe Hourcle wrote:
>>
>> As part of the side discussion about defining what a dataset is, an issue
>> got raised about what about a given object makes it of 'archive quality'?
>>
>
> That's a doozy, Joe!  Good question to ponder as a group.
>
> This strikes me as something of a glib non-answer, but here goes.
>
> The first bit I have trouble wrapping my mind around is that the notion
> of archival quality is binary, that an object can be of archival quality
> or not.

Maybe I should be asking the opposite --

What would make a digital object *not* of archival quality?

Although, even then, I can think of a few cases where even something 
stored in a proprietary file format, which needs proprietary software 
to be read, which runs on hardware that's no longer manufactured ... is 
considered 'archived' because they either stored the hardware, or managed 
to emulate the machine and OS so they could run the software.

...

I guess what I'm really looking for is something like TRAC or Data Seal of 
Approval, but one that, instead of giving guidelines for the repository, 
focuses on the qualities of a good self-documenting file or package that 
could later be ingested by an archive for them to maintain.

In some ways, this is difficult, as without the associated archive, you 
won't necessarily have a useful identifier scheme.  (as an identifier 
without a place to register it in my mind is pretty useless, unless you're 
just using it to try to show that two objects are likely not the same 
thing)

...

And then for some reason, as I was writing this e-mail, I realized I may 
already have something close to what I'm looking for, but it probably 
needs to be more explicit in some regards:

 	http://sdac.virtualsolar.org/catalogs/catalog_checklist

I wrote this checklist to give to scientists who are maintaining 'science 
catalogs' (basically, lists of ... something).  In solar physics, they 
tend to be stored & distributed in ASCII text files, HTML tables, or  MS 
Excel files.  Sometimes, they're distributed as PDF files or FITS tables.

It's not a perfect correlation into what's ideal for archiving ... but it 
does get into some of the issues of what file formats you're using, and 
what sort of documentation you should be providing, although some of it's 
catalog-specific.

...

And, to spur discussion, as I said this came up because of another 
conversation, here was my response before I threw it out to this group:

> > My intent [] is to answer a very common question which is what does 
> > "Archive Quality" mean when it comes to digital data.  It seems it's 
> > always two sides of the same coin: Preservation and Usability.
> 
> I thought that was what you were going for ... I'd say to be of 'archive 
> quality', it must be:
>
>     1. In a format that's well documented, and with the standard
>        freely available such that software to read the files could be
>        re-implemented from only the documentation.
>
>     2. Have enough metadata attached to the object such that it can be
>        identified, and the necessary scientific metadata for use can
>        be obtained.  (the science metadata might be stored in a
>        separate catalog, but then the object would need to identify
>        the catalog and the necessary foreign keys into the catalog)
>
>     3. As part of that 'scientific metadata', identify what collection
>        it's a part of, for purposes of retrieving documentation on how
>        to use the data.
>
>     4. Have some way of identifying if there has been corruption of
>        the data from what was originally stored.  (I won't get
>        into how sensitive it needs to be; if just identifying that it
>        occurred is enough, or if you have to be able to track down
>        which discrete datum is bad)
> 
> There are other characteristics which are more about how it's being 
> stored than about the object itself.  (eg, don't lose it, make sure it's 
> actually available for use)
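
To make item 4 concrete, here is a minimal sketch of the coarse case (detecting that corruption occurred, not which datum is bad).  It assumes a SHA-256 digest is recorded per file at ingest; the function names and the dict used as a manifest are hypothetical, not anything the checklist prescribes.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(paths):
    """Record one digest per file at ingest time (hypothetical manifest: a dict)."""
    return {path: sha256_of(path) for path in paths}


def find_corrupted(manifest):
    """Return the files whose current digest no longer matches the manifest."""
    return [path for path, digest in manifest.items() if sha256_of(path) != digest]


def demo():
    """Create a file, record its digest, flip one byte, and detect the change."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "observation.dat")
        with open(path, "wb") as handle:
            handle.write(b"\x00" * 1024)
        manifest = write_manifest([path])
        clean = find_corrupted(manifest)
        with open(path, "r+b") as handle:
            handle.seek(512)
            handle.write(b"\x01")
        damaged = find_corrupted(manifest)
        return clean, [os.path.basename(p) for p in damaged]
```

A whole-file digest only says that something changed.  Tracking down the bad datum would need digests over smaller units (per block or per record), which is the sensitivity question left open above.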


-Joe


From jgraybeal at ucsd.edu  Tue Apr 26 19:06:39 2011
From: jgraybeal at ucsd.edu (John Graybeal)
Date: Tue, 26 Apr 2011 16:06:39 -0700
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
	<4DB6F8B5.90106@psu.edu>
	<alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
Message-ID: <2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu>

That's a neat catalog checklist! 

I wrote something analogous a while back from the data management system perspective; though some of the things in mine are about the data system, many are about the data and metadata:
  http://marinemetadata.org/blogs/graybeal/the-good-enough-data-system

Something Michael said about file formats reminded me of the physical storage analog, tapes/CDs/DVDs/etc.

Two lessons from that progression:
 - You don't want your storage mechanism to become corrupt or inaccessible. In a data context, the rough equivalent would be a binary format that no software can parse, or content that isn't reproducible because people or systems have made changes that transform it.  In hardware, the answer is to pick a widely used format (CD, not 8-track or 9-track tape!) that won't quickly degrade (curation quality, maybe?).  In software, it means picking a format that is well known, as widely used as possible, and ideally is represented, or representable, in a human-parseable way. 
 - Rewriting physical media is a given when trying to manage long-term storage of computer bits onto physical media.  Rewriting digital objects will need to be a given when trying to manage long-term storage of them in a way that is recoverable and actionable later.  If you have an audio file today in some obscure digital format, you need to rewrite it into a well-known format for audio data.  Ideally this transformation is lossless (lossless by design, even).  If you have a scientific data artifact that you really want to save, netCDF is good, but the ASCII representation might be better. And so on for databases, movies, office documents (RTF better than Word), etc.
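
As a rough illustration of checking that a rewrite into ASCII is lossless, the sketch below round-trips floating-point values through text and compares them bit for bit.  It relies on Python's repr() of a float producing text that parses back to the same value; the function names are made up for the example, and a real data set would also have to carry its metadata through the conversion.

```python
import struct


def to_ascii(values):
    """Serialize floats one per line; repr() yields text that parses back exactly."""
    return "\n".join(repr(v) for v in values) + "\n"


def from_ascii(text):
    """Parse the one-value-per-line text back into floats."""
    return [float(line) for line in text.splitlines() if line]


def bits(values):
    """Pack as IEEE 754 doubles so the comparison is bit-for-bit."""
    return struct.pack(f"<{len(values)}d", *values)


def is_lossless(values):
    """True if the values survive the trip through ASCII unchanged."""
    return bits(from_ascii(to_ascii(values))) == bits(values)


def survives_fixed_precision(values, digits=3):
    """Contrast case: writing a fixed number of decimals usually loses bits."""
    rounded = [float(f"{v:.{digits}f}") for v in values]
    return bits(rounded) == bits(values)
```

The contrast function is the reason to test rather than assume: an ASCII dump written with a fixed number of decimals is human-readable but is not the same data.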

We may have to make a distinction between: archival for long-term computer access assuming ongoing maintenance of the archive; archival for survivability of the digital information content assuming the original archive itself may not still be around in 20 years; and archival for human posterity, when computers may not be around.  Different technological choices are appropriate in each case.

John

On Apr 26, 2011, at 12:30, Joe Hourcle wrote:

> And then for some reason, as I was writing this e-mail, I realized I may 
> already have something close to what I'm looking for, but it probably 
> needs to be more explicit in some regards:
> 
> 	http://sdac.virtualsolar.org/catalogs/catalog_checklist
> 
> I wrote this checklist to give to scientists who are maintaining 'science 
> catalogs' (basically, lists of ... something).  In solar physics, they 
> tend to be stored & distributed in ASCII text files, HTML tables, or  MS 
> Excel files.  Sometimes, they're distributed as PDF files or FITS tables.
> 
> -Joe
> _______________________________________________
> Rdap mailing list
> Rdap at mail.asis.org
> http://mail.asis.org/mailman/listinfo/rdap


----------------
John Graybeal    <mailto:jgraybeal at ucsd.edu>     phone: 858-534-2162
Product Manager
Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
Marine Metadata Interoperability Project: http://marinemetadata.org


From rduerr at nsidc.org  Tue Apr 26 23:59:15 2011
From: rduerr at nsidc.org (Ruth Duerr)
Date: Tue, 26 Apr 2011 21:59:15 -0600
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
	<4DB6F8B5.90106@psu.edu>
	<alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
	<2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu>
Message-ID: <2AD5543E-D5D3-455F-B555-152205439895@nsidc.org>

I like both Joe and John's lists - pretty darn comprehensive.  One thing I noted about the lists is that while calibration was explicitly listed in John's list, neither list explicitly discussed validation as a step beyond calibration, though I suppose some of the other items on each list might cover that concept.   Validation is particularly important with remote sensing data since knowing that a sensor is calibrated to some precision may not say anything about how accurately it is actually measuring some physical parameter - ground truth is often needed to judge that.

I especially liked John's "subtle characteristics," especially the one about data being annotatable.  Given that the quality of any particular data set varies depending on the use to which you'd like to put it and that the data originator rarely is in a position to know all of the potential uses and users of their data, capturing the annotations of users is often the only way to start capturing information about the utility of the data to audiences other than the original producer.

Ruth

On Apr 26, 2011, at 5:06 PM, John Graybeal wrote:

> That's a neat catalog checklist! 
> 
> I wrote something analogous a while back from the data management system perspective; though some of the things in mine are about the data system, many are about the data and metadata:
>  http://marinemetadata.org/blogs/graybeal/the-good-enough-data-system
> 
> Something Michael said about file formats reminded me of the physical storage analog, tapes/CDs/DVDs/etc.
> 
> Two lessons from that progression:
> - You don't want your storage mechanism to become corrupt or inaccessible. In a data context, the rough equivalent would be a binary format that no software can parse, or content that isn't reproducible because people or systems have made changes that transform it.  In hardware, the answer is to pick a widely used format (CD, not 8-track or 9-track tape!) that won't quickly degrade (curation quality, maybe?).  In software, it means picking a format that is well known, as widely used as possible, and ideally is represented, or representable, in a human-parseable way. 
> - Rewriting physical media is a given when trying to manage long-term storage of computer bits onto physical media.  Rewriting digital objects will need to be a given when trying to manage long-term storage of them in a way that is recoverable and actionable later.  If you have an audio file today in some obscure digital format, you need to rewrite it into a well-known format for audio data.  Ideally this transformation is lossless (lossless by design, even).  If you have a scientific data artifact that you really want to save, netCDF is good, but the ASCII representation might be better. And so on for databases, movies, office documents (RTF better than Word), etc.
> 
> We may have to make a distinction between: archival for long-term computer access assuming ongoing maintenance of the archive; archival for survivability of the digital information content assuming the original archive itself may not still be around in 20 years; and archival for human posterity, when computers may not be around.  Different technological choices are appropriate in each case.
> 
> John
> 
> On Apr 26, 2011, at 12:30, Joe Hourcle wrote:
> 
>> And then for some reason, as I was writing this e-mail, I realized I may 
>> already have something close to what I'm looking for, but it probably 
>> needs to be more explicit in some regards:
>> 
>> 	http://sdac.virtualsolar.org/catalogs/catalog_checklist
>> 
>> I wrote this checklist to give to scientists who are maintaining 'science 
>> catalogs' (basically, lists of ... something).  In solar physics, they 
>> tend to be stored & distributed in ASCII text files, HTML tables, or  MS 
>> Excel files.  Sometimes, they're distributed as PDF files or FITS tables.
>> 
>> -Joe
> 
> 
> ----------------
> John Graybeal    <mailto:jgraybeal at ucsd.edu>     phone: 858-534-2162
> Product Manager
> Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
> Marine Metadata Interoperability Project: http://marinemetadata.org


From mweaver at asis.org  Wed Apr 27 00:37:43 2011
From: mweaver at asis.org (Melissa Weaver)
Date: Tue, 26 Apr 2011 21:37:43 -0700
Subject: [Rdap] RDAP11 Slideshare Presentation added
Message-ID: <BANLkTi=49XSP+LrKp+YQu04t=gb1oNFB5g@mail.gmail.com>

Hello all,

We recently added another poster session from Monica Omodei and Jeremy
Frumkin to the RDAP11 slideshare:
http://www.slideshare.net/asist_org/

Please do visit and share if you haven't seen the share space already. If
you have submissions or comments, please contact me at mweaver at asis.org.

Thank you,

"Misty" Melissa Weaver
ASIS&T
mweaver at asis.org

From rduerr at nsidc.org  Wed Apr 27 01:04:22 2011
From: rduerr at nsidc.org (Ruth Duerr)
Date: Tue, 26 Apr 2011 23:04:22 -0600
Subject: [Rdap] RDAP11 Slideshare Presentation added
In-Reply-To: <BANLkTi=49XSP+LrKp+YQu04t=gb1oNFB5g@mail.gmail.com>
References: <BANLkTi=49XSP+LrKp+YQu04t=gb1oNFB5g@mail.gmail.com>
Message-ID: <E519B71B-4415-413F-9355-FC18C40C6DA4@nsidc.org>

Hi Melissa,

I was surprised to not find my presentation on the slideshare site.  I'd given a copy to Joe shortly after the conference.  Do you know when I can expect to find that presentation up on the site?

Thanks,

Ruth

On Apr 26, 2011, at 10:37 PM, Melissa Weaver wrote:

> Hello all,
> 
> We recently added another poster session from Monica Omodei and Jeremy Frumkin to the RDAP11 slideshare:
> http://www.slideshare.net/asist_org/
> 
> Please do visit and share if you haven't seen the share space already. If you have submissions or comments, please contact me at mweaver at asis.org.
> 
> Thank you,
> 
> "Misty" Melissa Weaver
> ASIS&T
> mweaver at asis.org
> 


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

"Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety."

Benjamin Franklin, Historical Review of Pennsylvania, 1759

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ruth Duerr
Data Stewardship Program Lead
National Snow and Ice Data Center
Cooperative Institute for Research in Environmental Science
University of Colorado at Boulder
Boulder, CO 80309
(303) 735-0136
rduerr at nsidc.org



From oneiros at grace.nascom.nasa.gov  Wed Apr 27 10:16:32 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Wed, 27 Apr 2011 10:16:32 -0400 (EDT)
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <2AD5543E-D5D3-455F-B555-152205439895@nsidc.org>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
	<4DB6F8B5.90106@psu.edu>
	<alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
	<2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu>
	<2AD5543E-D5D3-455F-B555-152205439895@nsidc.org>
Message-ID: <alpine.OSX.2.00.1104270907220.4253@moat.nascom.nasa.gov>


On Tue, 26 Apr 2011, Ruth Duerr wrote:

> I like both Joe and John's lists - pretty darn comprehensive.  One thing 
> I noted about the lists is that while calibration was explicitly listed 
> in John's list, neither list explicitly discussed validation as a step 
> beyond calibration, though I suppose some of the other items on each 
> list might cover that concept.  Validation is particularly important 
> with remote sensing data since knowing that a sensor is calibrated to 
> some precision may not say anything about how accurately it is actually 
> measuring some physical parameter - ground truth is often needed to 
> judge that.

I think this is one of those issues where what you're studying comes into 
play.  There's no way to get the 'ground truth' for the data I deal with. 
(even with those plans to put a man on the sun ... but they'd do it at 
night, so it'd be okay).

In fact, my talk that spurred the checklist (dealing with some of the 
problems we were having in processing catalogs) was put in a session on 
'data and instrumentation', and one of the talks before mine was from 
someone who worked on the SXI (Solar X-ray Imager) on the GOES fleet 
(constellation?  I'm not sure what you call a group of spacecraft, and 
'spacecrafts' sounds funny).

Anyway, they do an intercalibration between GOES launches, so that the 
next SXI's data is calibrated to be comparable to the previous SXI ... and 
it seems that the calibration factor they were using to get to physical 
units (W/m^2) had actually been wrong ... the later instruments were 
calibrated correctly on the ground, and it was the original ones that 
should've been adjusted.

And it was by a factor of about 20-30%.

They threw out the question to the scientists of how to deal with it -- go 
back and reclassify all flares?  (an M8 would now be an X1), change the 
definition of the flare class (X would now be > 8*10^-5 vs. 10^-4), or 
something else?   I don't know if a decision was ever made, but if groups 
are using two different scales for classifying flares, it could be messy.
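
To show the arithmetic behind "an M8 would now be an X1", here is a small sketch.  It uses the standard GOES class floors (peak flux in W/m^2, with X starting at 10^-4) and an assumed correction factor of 1.25, which is just one value inside the 20-30% range mentioned above, not the factor the SXI team actually derived.  Decimal is used so the class boundary isn't blurred by binary floating point.

```python
from decimal import Decimal, ROUND_DOWN

# Lower bound of each GOES flare class, peak flux in W/m^2, highest first.
CLASS_FLOORS = [
    ("X", Decimal("1e-4")),
    ("M", Decimal("1e-5")),
    ("C", Decimal("1e-6")),
    ("B", Decimal("1e-7")),
    ("A", Decimal("1e-8")),
]


def flare_class(peak_flux):
    """Return a class string such as 'M8.0' for a peak flux in W/m^2."""
    flux = Decimal(str(peak_flux))
    # Fluxes under the A floor are reported here as A0.x.
    letter, floor = CLASS_FLOORS[-1]
    for candidate, candidate_floor in CLASS_FLOORS:
        if flux >= candidate_floor:
            letter, floor = candidate, candidate_floor
            break
    # Truncate (not round) to one decimal so 9.99 stays M9.9 rather than M10.0.
    multiplier = (flux / floor).quantize(Decimal("0.1"), rounding=ROUND_DOWN)
    return f"{letter}{multiplier}"


def rescale(peak_flux, factor):
    """Apply a multiplicative calibration correction to a peak flux."""
    return Decimal(str(peak_flux)) * Decimal(str(factor))


ASSUMED_FACTOR = "1.25"  # hypothetical; chosen from the 20-30% range
before = flare_class("8e-5")
after = flare_class(rescale("8e-5", ASSUMED_FACTOR))
```

With that assumed factor, `before` is M8.0 and `after` is X1.0.  The alternative of keeping the old labels amounts to moving the X floor down to 8*10^-5 in the corrected units.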


> I especially liked John's "subtle characteristics," especially the one 
> about data being annotatable.  Given that the quality of any particular 
> data set varies depending on the use to which you'd like to put it 
> and that the data originator rarely is in a position to know all of the 
> potential uses and users of their data, capturing the annotations of 
> users is often the only way to start capturing information about the 
> utility of the data to audiences other than the original producer.

What?  We're supposed to actually *test* the backups?

And then you're going to tell me that it's a problem when it takes 15 hrs 
to retrieve a day's worth of data from tape, just because we need to 
re-calibrate the first 9 months of data from the mission, and someone 
accidentally flushed the raw data from disk.

(Hmm ... now I just have to figure out how to get the scientists to 
actually look at these lists before they build data systems ... catalogs 
can generally be cleaned up after the fact, but data systems not so much)

...

But for the annotation -- in a way, the various 'catalogs' that I deal 
with are annotations, but we have some really subtle issues that basically 
correlate to some of the problems in data citation:

     What am I annotating?
 	(a) The world as observed in this data (eg, X2 flare)
 	(b) The observation  (eg, partially obscured by clouds)
 	(c) The observation as it exists on disk (eg, blocks lost in
 		transfer; partial image)
 	(d) The calibrated edition of data (eg, notes on oddities in the
 		data)
 	(e) The file on disk (eg, invalid checksum; possible corruption)
 	(f) The instrument (eg, a discontinuity due to servicing, or even
 	    the lack of data for a time period)

It's possible that some of these might 'trickle down'.  (eg, if I'm 
annotating the calibrated form, I'm also indirectly annotating the 
observation and the state of the world ... I saw an X2 flare based on 
this set of calibrated images ... which means that I'm asserting that 
there's an X2 flare that might've been visible by other instruments 
observing that region at that time if they had similar observing 
characteristics)
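
One way to write down the (a)-(f) targets and the 'trickle down' idea is sketched below.  The names are hypothetical, and the only implication edges included are the two stated above (calibrated form to observation, observation to state of the world); whether any others should exist is left open.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    """The six things an annotation might be about, lettered as in the list."""
    WORLD = "a"
    OBSERVATION = "b"
    STORED_OBSERVATION = "c"
    CALIBRATED_EDITION = "d"
    FILE = "e"
    INSTRUMENT = "f"


# Assumed 'trickle down' edges: annotating the key indirectly annotates the values.
IMPLIES = {
    Target.CALIBRATED_EDITION: (Target.OBSERVATION,),
    Target.OBSERVATION: (Target.WORLD,),
}


@dataclass(frozen=True)
class Annotation:
    target: Target
    subject_id: str  # hypothetical identifier for the thing being annotated
    note: str


def implied_targets(target):
    """Return every target indirectly annotated, following IMPLIES transitively."""
    seen = []
    stack = [target]
    while stack:
        current = stack.pop()
        for implied in IMPLIES.get(current, ()):
            if implied not in seen:
                seen.append(implied)
                stack.append(implied)
    return seen


flare_note = Annotation(Target.CALIBRATED_EDITION, "example-image-set", "X2 flare")
```

Keeping the target explicit on each annotation means a note such as "invalid checksum" stays attached to the file on disk and never gets read as a statement about the sun.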

And I mention data citation for a few reasons:

 	1. Citation should be a type of annotation;  if someone is later
 	   browsing the data, we should be able to tell them what papers
 	   have been published using that data, so they can avoid
 	   duplicating work or identify collaborators for additional
 	   analysis.

 	2. Downloading should be a type of annotation.  It allows a
 	   researcher to easily identify what they had downloaded, so they
 	   can then generate a record for citation.  It can also be used
 	   to generate periods/locations of interest in general, but even
 	   anonymous, we get into some issues like Mike Brown's Haumea
 	   incident [1].

 	3. But if we know who's downloaded the data, we can inform
 	   researchers if we've identified problems with the data and/or
 	   a recalibration run, so they don't get caught unaware when it
 	   happens after they've downloaded the data, but before they've
 	   submitted their research paper.

[1] Another researcher published that they had discovered the 'dwarf'
     planet, but it was later discovered they had downloaded the observing
     logs, and determined where Brown had been looking :
     http://www.nytimes.com/2005/09/13/science/space/13plan.html
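
For point 3, a sketch of what "inform researchers" could look like if downloads are logged as records.  Everything here is hypothetical (the record fields, the user names, the file identifiers and the dates); it only shows the selection logic: anyone who fetched an affected file before the recalibration gets a notice.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Download:
    user: str
    file_id: str
    when: datetime


def users_to_notify(downloads, affected_file_ids, recalibrated_at):
    """Map each user to the affected files they fetched before the recalibration."""
    affected = set(affected_file_ids)
    hits = {}
    for record in downloads:
        if record.file_id in affected and record.when < recalibrated_at:
            hits.setdefault(record.user, set()).add(record.file_id)
    return {user: sorted(files) for user, files in hits.items()}


# Made-up sample log for illustration only.
LOG = [
    Download("alice", "img_001", datetime(2011, 4, 1)),
    Download("bob", "img_002", datetime(2011, 4, 3)),
    Download("alice", "img_002", datetime(2011, 4, 20)),
]

notices = users_to_notify(LOG, ["img_002"], datetime(2011, 4, 10))
```

Here only bob is notified: alice's copy of img_002 was fetched after the recalibration.  The same log is what raises the privacy issue in point 2, so who may read it matters as much as what it records.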


-Joe


From rduerr at nsidc.org  Wed Apr 27 11:44:00 2011
From: rduerr at nsidc.org (Ruth Duerr)
Date: Wed, 27 Apr 2011 09:44:00 -0600
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <alpine.OSX.2.00.1104270907220.4253@moat.nascom.nasa.gov>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
	<4DB6F8B5.90106@psu.edu>
	<alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
	<2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu>
	<2AD5543E-D5D3-455F-B555-152205439895@nsidc.org>
	<alpine.OSX.2.00.1104270907220.4253@moat.nascom.nasa.gov>
Message-ID: <CE74473B-C055-4D10-9E2B-1ADD11478176@nsidc.org>


On Apr 27, 2011, at 8:16 AM, Joe Hourcle wrote:

> 
> 
> On Tue, 26 Apr 2011, Ruth Duerr wrote:
> 
>> I like both Joe and John's lists - pretty darn comprehensive.  One thing I noted about the lists is that while calibration was explicitly listed in John's list, neither list explicitly discussed validation as a step beyond calibration, though I suppose some of the other items on each list might cover that concept.  Validation is particularly important with remote sensing data since knowing that a sensor is calibrated to some precision may not say anything about how accurately it is actually measuring some physical parameter - ground truth is often needed to judge that.
> 
> I think this is one of those issues where what you're studying comes into play.  There's no way to get the 'ground truth' for the data I deal with. (even with those plans to put a man on the sun ... but they'd do it at night, so it'd be okay).

Yup
> 
> In fact, my talk that spurred the checklist (dealing with some of the problems we were having in processing catalogs) was put in a session on 'data and instrumentation', and one of the talks before mine was from someone who worked on the SXI (Solar X-ray Imager) on the GOES fleet (constellation?  I'm not sure what you call a group of spacecraft, and 'spacecrafts' sounds funny).
> 
> Anyway, they do an intercalibration between GOES launches, so that the next SXI's data is calibrated to be comparable to the previous SXI ... and it seems that the calibration factor they were using to get to physical units (W/m^2) had actually been wrong ... the later instruments were calibrated correctly on the ground, and it was the original ones that should've been adjusted.
> 
> And it was by a factor of about 20-30%.
> 
> They threw out the question to the scientists of how to deal with it -- go back and reclassify all flares?  (an M8 would now be an X1), change the definition of the flare class (X would now be > 8*10^-5 vs. 10^-4), or something else?   I don't know if a decision was ever made, but if groups are using two different scales for classifying flares, it could be messy.
> 
> 
> 
>> I especially liked John's "subtle characteristics," especially the one about data being annotatable.  Given that the quality of any particular data set varies depending on the use to which you'd like to put it and that the data originator rarely is in a position to know all of the potential uses and users of their data, capturing the annotations of users is often the only way to start capturing information about the utility of the data to audiences other than the original producer.
> 
> What?  We're supposed to actually *test* the backups?
> 
> And then you're going to tell me that it's a problem when it takes 15 hrs to retrieve a day's worth of data from tape, just because we need to re-calibrate the first 9 months of data from the mission, and someone accidentally flushed the raw data from disk.
> 
> (Hmm ... now I just have to figure out how to get the scientists to actually look at these lists before they build data systems ... catalogs can generally be cleaned up after the fact, but data systems not so much)
> 
> ...
> 
> But for the annotation -- in a way, the various 'catalogs' that I deal with are annotations, but we have some really subtle issues that basically correlate to some of the problems in data citation:
> 
>    What am I annotating?
> 	(a) The world as observed in this data (eg, X2 flare)
> 	(b) The observation  (eg, partially obscured by clouds)
> 	(c) The observation as it exists on disk (eg, blocks lost in
> 		transfer; partial image)
> 	(d) The calibrated edition of data (eg, notes on oddities in the
> 		data)
> 	(e) The file on disk (eg, invalid checksum; possible corruption)
> 	(f) The instrument (eg, a discontinuity due to servicing, or even
> 	    the lack of data for a time period)

and possibly all of the above and more!  
> 
> It's possible that some of these might 'trickle down'.  (eg, if I'm annotating the calibrated form, I'm also indirectly annotating the observation and the state of the world ... I saw an X2 flare based on this set of calibrated images ... which means that I'm asserting that there's an X2 flare that might've been visible by other instruments observing that region at that time if they had similar observing characteristics)
> 
> And I mention data citation for a few reasons:
> 
> 	1. Citation should be a type of annotation;  if someone is later
> 	   browsing the data, we should be able to tell them what papers
> 	   have been published using that data, so they can avoid
> 	   duplicating work or identify collaborators for additional
> 	   analysis.

Actually the USGCRP paper includes citations as well as a host of technical documentation - while the paper is long the list inside the paper is relatively short:

"Instrument / sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, noise characteristics, etc.)
Instrument / sensor calibration data and method;
Processing algorithms and their scientific basis, including complete description of any sampling or mapping algorithm used in the creation of the product (e.g. contained in peer reviewed papers, in some cases supplemented by thematic information introducing the data set or product to scientists unfamiliar with it);
Complete information on any ancillary data or other data sets used in generation or calibration of the data set or derived product;
Processing history including versions of processing source code corresponding to versions of the data set or derived product held in the archive;
Quality assessment information;
Validation record, including identification of validation data sets;
Data structure and format, with definition of all parameters and fields;
In the case of earth-based data, station location and any changes in location, instrumentation, controlling agency, surrounding land use and other factors that could influence the long-term record;
A bibliography of pertinent Technical Notes and articles, including refereed publications reporting on research using the data set;
Information received back from users of the data set or product."
> 
> 	2. Downloading should be a type of annotation.  It allows a
> 	   researcher to easily identify what they had downloaded, so they
> 	   can then generate a record for citation.  It can also be used
> 	   to generate periods/locations of interest in general, but even
> 	   anonymous, we get into some issues like Mike Brown's Haumea
> 	   incident [1].

Yes - in ESIP we've discussed this with the idea that a repository could create a citation for a user that referenced back to a specific set of files.  The issue with that is that generally users wouldn't be citing all of the data they downloaded, but some fraction of it or more likely some fraction of a number of downloads.  Just because it doesn't work perfectly for citation generation doesn't mean that it isn't a good idea in general though...
> 
> 	3. But if we know who's downloaded the data, we can inform
> 	   researchers if we've identified problems with the data and/or
> 	   a recalibration run, so they don't get caught unaware when it
> 	   happens after they've downloaded the data, but before they've
> 	   submitted their research paper.

Agreed...
> 
> [1] Another researcher published that they had discovered the 'dwarf'
>    planet, but it was later discovered they had downloaded the observing
>    logs, and determined where Brown had been looking :
>    http://www.nytimes.com/2005/09/13/science/space/13plan.html
> 
> 
> 
> -Joe


From michael at psu.edu  Wed Apr 27 12:21:37 2011
From: michael at psu.edu (Michael J. Giarlo)
Date: Wed, 27 Apr 2011 12:21:37 -0400
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>	<4DB6F8B5.90106@psu.edu>
	<alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
Message-ID: <4DB84291.7070300@psu.edu>

On 04/26/2011 03:30 PM, Joe Hourcle wrote:
>
>>
>> The first bit I have trouble wrapping my mind around is that the notion
>> of archival quality is binary, that an object can be of archival quality
>> or not.
>
> Maybe I should be asking the opposite --
>
> What would make a digital object *not* of archival quality?

I'd respond the same. :)  I couldn't confidently say whether an object 
is archival or not archival w/o a lot more context along the dimensions 
I mentioned (use, file formats, retention period, etc.), and even then I 
would be inclined not to say it's archival or not but rather give a 
qualified judgment on *how* likely I think that object will be around in 
N years.

> I guess what I'm really looking for is something like TRAC or Data Seal of
> Approval, but instead of guidelines for the repository, focuses on the
> qualities of a good self-documenting file or package that could later be
> ingested by an archive for them to maintain.

Nothing comes to mind, but I'd be interested in hearing examples of the 
same.

I guess this might be a good place to bring up what may be a related 
technology:

"""
Data Format Description Language (DFDL, often pronounced daff-o-dil) is 
a modeling language from the Open Grid Forum for describing general text 
and binary data. A DFDL model or schema allows any text or binary data 
to be read (or "parsed") from its native format and to be presented as 
an instance of an information set. The same DFDL schema also allows data 
to be taken from an instance of an information set and written out (or 
"serialized") to its native format.
""" -- http://en.wikipedia.org/wiki/Data_Format_Description_Language

I haven't had the occasion to use DFDL -- and I tend to shy away from 
huge blobs of XML and XML Schema -- but it could have some promise in 
this area.  Has anyone else used DFDL or thought about it in the context 
Joe brings up?
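
To illustrate the general idea (this is not DFDL, and uses none of its syntax), here is a toy analog in Python: one declarative layout drives both parsing and serializing of a fixed-size binary record.  The field names and the layout itself are invented for the example.

```python
import struct

# Hypothetical record layout: (field name, struct format code), little-endian.
LAYOUT = [
    ("timestamp", "I"),  # unsigned 32-bit integer, seconds
    ("channel", "H"),    # unsigned 16-bit integer
    ("flux", "d"),       # 64-bit IEEE 754 float
]

# "<" selects little-endian with no padding, so the record is 4 + 2 + 8 bytes.
FORMAT = "<" + "".join(code for _, code in LAYOUT)
RECORD_SIZE = struct.calcsize(FORMAT)


def parse(raw):
    """Read one record from bytes into a dict keyed by field name."""
    values = struct.unpack(FORMAT, raw)
    return {name: value for (name, _), value in zip(LAYOUT, values)}


def serialize(record):
    """Write a dict back out to bytes using the same layout."""
    return struct.pack(FORMAT, *(record[name] for name, _ in LAYOUT))


example = {"timestamp": 1303862400, "channel": 3, "flux": 8e-5}
round_tripped = parse(serialize(example))
```

The archival appeal is that the layout is data rather than code: if it were stored alongside the file, a reader could be re-implemented later from the description alone, which is close to the first of Joe's four characteristics.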

Btw, Joe, I appreciate your catalog checklist and the four 
characteristics you shared.  Good starting point.

-Mike


From rduerr at nsidc.org  Wed Apr 27 12:41:37 2011
From: rduerr at nsidc.org (Ruth Duerr)
Date: Wed, 27 Apr 2011 10:41:37 -0600
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <4DB84291.7070300@psu.edu>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>	<4DB6F8B5.90106@psu.edu>
	<alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
	<4DB84291.7070300@psu.edu>
Message-ID: <6E3876EF-D04A-48BF-9DF3-753838D8E303@nsidc.org>


On Apr 27, 2011, at 10:21 AM, Michael J. Giarlo wrote:

> On 04/26/2011 03:30 PM, Joe Hourcle wrote:
>> 
>>> 
>>> The first bit I have trouble wrapping my mind around is that the notion
>>> of archival quality is binary, that an object can be of archival quality
>>> or not.
>> 
>> Maybe I should be asking the opposite --
>> 
>> What would make a digital object *not* of archival quality?
> 
> I'd respond the same. :)  I couldn't confidently say whether an object 
> is archival or not archival w/o a lot more context along the dimensions 
> I mentioned (use, file formats, retention period, etc.), and even then I 
> would be inclined not to say it's archival or not but rather give a 
> qualified judgment on *how* likely I think that object will be around in 
> N years.
> 
>> I guess what I'm really looking for is something like TRAC or Data Seal of
>> Approval, but instead of guidelines for the repository, focuses on the
>> qualities of a good self-documenting file or package that could later be
>> ingested by an archive for them to maintain.
> 
> Nothing comes to mind, but I'd be interested in hearing examples of the 
> same.
> 
> I guess this might be a good place to bring up what may be a related 
> technology:
> 
> """
> Data Format Description Language (DFDL, often pronounced daff-o-dil) is 
> a modeling language from the Open Grid Forum for describing general text 
> and binary data. A DFDL model or schema allows any text or binary data 
> to be read (or "parsed") from its native format and to be presented as 
> an instance of an information set. The same DFDL schema also allows data 
> to be taken from an instance of an information set and written out (or 
> "serialized") to its native format.
> """ -- http://en.wikipedia.org/wiki/Data_Format_Description_Language
> 
> I haven't had the occasion to use DFDL -- and I tend to shy away from 
> huge blobs of XML and XML Schema -- but it could have some promise in 
> this area.  Has anyone else used DFDL or thought about it in the context 
> Joe brings up?
> 
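
As a rough illustration of the idea in the quoted description (one declarative description of a format driving both parsing and serializing), here is a minimal Python sketch. It is not DFDL and uses no DFDL tooling; it only uses the standard-library struct module, and the record layout and field names are made up for the example.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical record layout, invented for this sketch:
# each entry is (field name, struct format code).
SCHEMA: List[Tuple[str, str]] = [
    ("station_id", "H"),   # unsigned 16-bit integer
    ("timestamp", "I"),    # unsigned 32-bit integer
    ("temperature", "f"),  # 32-bit float
]


def _packer(schema: List[Tuple[str, str]]) -> struct.Struct:
    # ">" selects big-endian byte order with no padding between fields.
    return struct.Struct(">" + "".join(code for _, code in schema))


def parse(schema: List[Tuple[str, str]], raw: bytes) -> Dict[str, object]:
    """Read bytes in the native format into a name -> value mapping."""
    values = _packer(schema).unpack(raw)
    return {name: value for (name, _), value in zip(schema, values)}


def serialize(schema: List[Tuple[str, str]], record: Dict[str, object]) -> bytes:
    """Write a name -> value mapping back out to the native format."""
    return _packer(schema).pack(*(record[name] for name, _ in schema))
```

The same SCHEMA list is the only place the layout is written down, so a description of this kind could in principle be shipped alongside the data file rather than buried in reader code.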

I think THG looked at DFDL (along with a whole host of other such schemas) when they started working on developing human/machine readable maps to the content of HDF4 files.  They ended up developing their own schema based loosely on PREMIS.  I should note that the schema and map-creating software are nearing final release and NASA EOSDIS plans on mapping each HDF4 file in the archives of their Earth Science Data Centers.

> Btw, Joe, I appreciate your catalog checklist and the four 
> characteristics you shared.  Good starting point.
> 
> -Mike
> _______________________________________________
> Rdap mailing list
> Rdap at mail.asis.org
> http://mail.asis.org/mailman/listinfo/rdap


From oneiros at grace.nascom.nasa.gov  Wed Apr 27 12:54:36 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Wed, 27 Apr 2011 12:54:36 -0400 (EDT)
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <CE74473B-C055-4D10-9E2B-1ADD11478176@nsidc.org>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
	<4DB6F8B5.90106@psu.edu>
	<alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
	<2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu>
	<2AD5543E-D5D3-455F-B555-152205439895@nsidc.org>
	<alpine.OSX.2.00.1104270907220.4253@moat.nascom.nasa.gov>
	<CE74473B-C055-4D10-9E2B-1ADD11478176@nsidc.org>
Message-ID: <alpine.OSX.2.00.1104271159510.4253@moat.nascom.nasa.gov>


On Wed, 27 Apr 2011, Ruth Duerr wrote:
> On Apr 27, 2011, at 8:16 AM, Joe Hourcle wrote:
>> On Tue, 26 Apr 2011, Ruth Duerr wrote:

[trimmed]

>> And I mention data citation for a few reasons:
>>
>> 	1. Citation should be a type of annotation;  if someone is later
>> 	   browsing the data, we should be able to tell them what papers
>> 	   have been published using that data, so they can avoid
>> 	   duplicating work or identify collaborators for additional
>> 	   analysis.


> Actually the USGCRP paper includes citations as well as a host of 
> technical documentation - while the paper is long the list inside the 
> paper is relatively short:

[trimmed]

And this is why we need the inter-discipline groups ... I hadn't even 
heard of this report.

>> 	2. Downloading should be a type of annotation.  It allows a
>> 	   researcher to easily identify what they had downloaded, so they
>> 	   can then generate a record for citation.  It can also be used
>> 	   to generate periods/locations of interest in general, but even
>> 	   anonymous, we get into some issues like Mike Brown's Haumea
>> 	   incident [1].
>
> Yes - in ESIP we've discussed this with the idea that a repository could 
> create a citation for a user that referenced back to a specific set of 
> files.  The issue with that is that generally users wouldn't be citing 
> all of the data they downloaded, but some fraction of it or more likely 
> some fraction of a number of downloads.  Just because it doesn't work 
> perfectly for citation generation doesn't mean that it isn't a good idea 
> in general though...

Exactly ... it's better than nothing, which is what we have right now.

The other thing that my group has discussed, is that if there were 
standards for identification (back to the archive & record within the 
archive), tools could be written to process the downloaded files, and
either:

 	1. Generate an appropriate citation document (listing each file
 	   individually, using ORE, metalink or similar), or guide the
 	   researcher to generate a more concise citation such as the IPY
 	   standard.

 	2. Pass the identifiers back to the archive to ask if any of them
 	   have been deprecated, and automatically start retrieving the
 	   replacement.

 	3. Check with annotation services to see if there's any new
 	   information about the data.

If the tool's built to support plugins for different file formats & 
we had standards for the archives & annotation services, it could probably 
be generic enough to support any discipline.
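
A minimal Python sketch of how such a tool might be laid out, assuming a plugin is just a function that maps a file to its archive identifier. Everything here is hypothetical: the plugin signature, the "example-archive:" identifier scheme, and the dictionary standing in for the archive's answer about deprecated records are placeholders, not an existing standard or service.

```python
from pathlib import PurePath
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical plugin signature: given a file name, return the identifier
# the archive assigned to that file, or None if none can be found.
Extractor = Callable[[str], Optional[str]]

PLUGINS: Dict[str, Extractor] = {}


def register(extension: str) -> Callable[[Extractor], Extractor]:
    """Register a plugin for one file extension (for example ".fits")."""
    def wrap(func: Extractor) -> Extractor:
        PLUGINS[extension.lower()] = func
        return func
    return wrap


@register(".fits")
def fits_identifier(filename: str) -> Optional[str]:
    # Stand-in only: a real plugin would read the identifier from the file
    # header. The "example-archive:" prefix is invented for this sketch.
    return "example-archive:" + PurePath(filename).stem


def collect_identifiers(filenames: Iterable[str]) -> Dict[str, str]:
    """Map each file that has a plugin to its identifier; skip the rest."""
    found: Dict[str, str] = {}
    for name in filenames:
        plugin = PLUGINS.get(PurePath(name).suffix.lower())
        if plugin is None:
            continue
        identifier = plugin(name)
        if identifier:
            found[name] = identifier
    return found


def citation_listing(identifiers: Dict[str, str]) -> List[str]:
    """Option 1: list every identifier individually, in a stable order."""
    return sorted(identifiers.values())


def find_replacements(identifiers: Dict[str, str],
                      deprecated: Dict[str, str]) -> Dict[str, str]:
    """Option 2: given the archive's (old id -> new id) answer, report
    which local files have a replacement available."""
    return {name: deprecated[ident]
            for name, ident in identifiers.items()
            if ident in deprecated}
```

Supporting another format would then mean registering one more extractor function; the citation and deprecation steps would not need to change.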

(and so I have free time to do it, I need to stop procrastinating by 
replying to e-mail, and finish up some stuff to support a meeting next 
week)

-Joe


From mdawnweaver at gmail.com  Wed Apr 27 00:24:36 2011
From: mdawnweaver at gmail.com (Misty Melissa Weaver)
Date: Tue, 26 Apr 2011 21:24:36 -0700
Subject: [Rdap] RDAP presentations on slideshare
Message-ID: <BANLkTimTfWuqdsWphC9z7mrKyyBs9W0B1Q@mail.gmail.com>

Hello all,

We recently added another poster session from Monica Omodei and Jeremy
Frumkin to the RDAP11 slideshare:
http://www.slideshare.net/asist_org/

Please do visit and share if you haven't seen the share space already. If
you have submissions or comments, please contact me at mweaver at asis.org.

Thank you,
Misty Melissa Weaver

From oneiros at grace.nascom.nasa.gov  Wed Apr 27 13:53:12 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Wed, 27 Apr 2011 13:53:12 -0400 (EDT)
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <4DB84291.7070300@psu.edu>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>
	<4DB6F8B5.90106@psu.edu>
	<alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>
	<4DB84291.7070300@psu.edu>
Message-ID: <alpine.OSX.2.00.1104271303040.4253@moat.nascom.nasa.gov>


On Wed, 27 Apr 2011, Michael J. Giarlo wrote:

> On 04/26/2011 03:30 PM, Joe Hourcle wrote:
>>
>>>
>>> The first bit I have trouble wrapping my mind around is that the notion
>>> of archival quality is binary, that an object can be of archival quality
>>> or not.
>>
>> Maybe I should be asking the opposite --
>>
>> What would make a digital object *not* of archival quality?
>
> I'd respond the same. :)  I couldn't confidently say whether an object
> is archival or not archival w/o a lot more context along the dimensions
> I mentioned (use, file formats, retention period, etc.), and even then I
> would be inclined not to say it's archival or not but rather give a
> qualified judgment on *how* likely I think that object will be around in
> N years.

Perhaps we need to treat it more like IT security ...

 	A password of (x) length and (y) complexity is expected
 	to be good for about (z) time ... but as new attacks are
 	found, (z) might decrease.

... and come up with recommendations for different time spans (if you want 
it to last (x) time, you should ...) or different levels of effort (if you 
only do (y), it'll likely be unreadable within (z) time.)
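
The password half of the analogy can be made concrete with a small Python sketch. It assumes the simplest possible attack model (exhaustively trying every password at a fixed guess rate); the guess rate is an input supplied by the caller, not a measured figure.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def keyspace(length: int, alphabet_size: int) -> int:
    """Number of possible passwords of (x) length over (y) symbols."""
    return alphabet_size ** length


def years_to_exhaust(length: int, alphabet_size: int,
                     guesses_per_second: float) -> float:
    """The (z) time: years needed to try every possible password."""
    return keyspace(length, alphabet_size) / guesses_per_second / SECONDS_PER_YEAR


def minimum_length(target_years: float, alphabet_size: int,
                   guesses_per_second: float) -> int:
    """Shortest length whose exhaustive search takes at least target_years."""
    length = 1
    while years_to_exhaust(length, alphabet_size, guesses_per_second) < target_years:
        length += 1
    return length
```

Raising guesses_per_second (a new attack) shrinks (z) for the same (x) and (y), which is the part that would carry over: a recommendation for archival formats would likewise need to be revisited as conditions change.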


[trimmed]

> I guess this might be a good place to bring up what may be a related
> technology:

[trimmed]

> """ -- http://en.wikipedia.org/wiki/Data_Format_Description_Language
>
> I haven't had the occasion to use DFDL -- and I tend to shy away from
> huge blobs of XML and XML Schema -- but it could have some promise in
> this area.  Has anyone else used DFDL or thought about it in the context
> Joe brings up?

I hadn't seen that one.  There's been a discussion by the SPASE group [1]
about the need for documenting ASCII files, and I had brought up DSPL:

 	http://code.google.com/apis/publicdata/

They had the problem though that it seemed tightly tied to the individual 
file to be processed, rather than having it link to a collection of files, 
or have the data file link back to the description (or more than one 
description, in different languages, but then we start getting into 
the realm of self-documenting formats)

And they seemed more geared towards ASCII, while it sounds like DFDL might 
be able to describe more complex stuff, like FITS & VOTable.

(when I was working on catalogs, I started on a parser to use ExtJS to 
display the contents of VOTable, but dropped it when they kept giving 
different responses regarding the change to ExtJS's license from BSD to 
LGPL to GPL, and my unwillingness to spend 2+ years going through NASA's 
legal counsel ... and there were some concerns about Section 508 support)

[1] Space Physics Archive Search & Extract; http://www.spase-group.org/


-Joe


(yes, yes, the not reading e-mail thing isn't working).


From michael at psu.edu  Wed Apr 27 14:10:41 2011
From: michael at psu.edu (Michael J. Giarlo)
Date: Wed, 27 Apr 2011 14:10:41 -0400
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <alpine.OSX.2.00.1104271303040.4253@moat.nascom.nasa.gov>
References: <alpine.OSX.2.00.1104221337490.4253@moat.nascom.nasa.gov>	<4DB6F8B5.90106@psu.edu>	<alpine.OSX.2.00.1104261422220.4253@moat.nascom.nasa.gov>	<4DB84291.7070300@psu.edu>
	<alpine.OSX.2.00.1104271303040.4253@moat.nascom.nasa.gov>
Message-ID: <4DB85C21.2020009@psu.edu>

On 04/27/2011 01:53 PM, Joe Hourcle wrote:
>
> Perhaps we need to treat it more like IT security ...
>
>   	A password of (x) length and (y) complexity is expected
>   	to be good for about (z) time ... but as new attacks are
>   	found, (z) might decrease.
>
> ... and come up with recommendations for different time spans (if you want
> it to last (x) time, you should ...) or different levels of effort (if you
> only do (y), it'll likely be unreadable within (z) time.)

That's the approach that fits best, to my mind.

> And they seemed more geared towards ASCII, while it sounds like DFDL might
> be able to describe more complex stuff, like FITS & VOTable.

Indeed, those seem to be the sorts of use cases DFDL was designed for -- 
again noting that I have not yet used DFDL (or Daffodil or Defuddle or 
any of the software in that space).

> (yes, yes, the not reading e-mail thing isn't working).

We hadn't noticed. ;)

-Mike


From cragin at illinois.edu  Wed Apr 27 16:12:44 2011
From: cragin at illinois.edu (Cragin, Melissa H)
Date: Wed, 27 Apr 2011 15:12:44 -0500
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <6679CD8702498947A68B23D55773733203342F@GRANT.eservices.virginia.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
	<8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu>
	<6679CD8702498947A68B23D55773733203342F@GRANT.eservices.virginia.edu>
Message-ID: <CFF353563186834E894F010D7CCA39B6061DDACDC1@DSMAILBOX2.ad.uiuc.edu>

Hi,

  In seeing this bit that Sherry posted, I re-visited other NSF materials, and find that I need to correct what I wrote a couple of days ago (My apologies for not sorting this out prior to posting my response.)
While the requirements expressed by the Directorates vary to some extent, there is an FAQ posted by NSF that seems to apply across Directorates. It states:

"2. Is a plan for Data Management required if my project is not expected to generate data or samples?

     Yes. It is acceptable to state in the Data Management Plan that the project is not anticipated to generate data or samples that require management and/or sharing. PIs should note that the statement will be subject to peer review."  (http://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp#2)

Jan Cheetham's post on sorting out the specifics of how such a meeting will be documented and what will be made public (and where) is very helpful advice.

Melissa


From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of Lake, Sherry (sah)
Sent: Friday, April 22, 2011 3:33 PM
To: Research Data, Access and Preservation
Subject: Re: [Rdap] Deciding when a Data Management Plan is not required?

As for the particular question asked in this e-mail thread, maybe page 4 of the Education & Human Resources (EHR) Directorate DMP guidelines (http://www.nsf.gov/bfa/dias/policy/dmpdocs/ehr.pdf)  will help. This section of the guidelines lists examples for EHR proposals to follow:

1. A proposal for a workshop that will result in a workshop report.
a. The DMP could consist of a statement to the effect that a workshop report will be produced and disseminated, e.g., via a website, publication in a journal, or other means.

Institute of Museum & Library Services (IMLS) is a little clearer on when a DMP is required. The DMP requirements are part 3 of the section "Specification for Projects that Develop Digital Products". So I assume, for IMLS, no digital products, no DMP required.

I wish the NSF would give a little more guidance as to what they want in a DMP. They could then tell us which proposals do not require one. I think the specific solicitation should have that information, like Melissa said.

--
Sherry Lake                                              shlake at virginia.edu
Scientific Data Consultant <http://www.lib.virginia.edu/brown/data/>
Brown Science and Engineering Library   University of Virginia
* * * * * * * * * * * * * * * * *
       "A library is not a luxury but one of the necessities
                  of life." --- Henry Ward Beecher
* * * * * * * * * * * * * * * * *


From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of John Graybeal
Sent: Friday, April 22, 2011 12:56 AM
To: Research Data, Access and Preservation
Cc: rdap at mail.asis.org
Subject: Re: [Rdap] Deciding when a Data Management Plan is not required?

Here's the way I think about this kind of thing:

1) You have done useful work by the time you finish (presumably).
2) To the extent the useful work was based on discovered, collected, observed, or otherwise modeled data, the conclusions are likely to depend on those inputs.
3) If someone wants to evaluate your conclusions in light of your inputs, would they be able to do that?

If the basis is entirely on other papers, then those would presumably be cited in the report.  If it is strictly a thought process of a single group, then the report is the data.

But if the basis is on brainstorming ideas from multiple groups, or having people in the meeting each generate their own inputs, which are then collated and massaged, or it relied on results that were on-line and might be different tomorrow -- then it would be a 'best practice' to maintain the original raw materials in a repository (say the web site where your work is managed, if any) that others could inspect.  (They might validate your group's wisdom, or find great wisdom that your group missed.)  In these cases, I would say a very short Data Management Plan would be worth including.

john

On Apr 21, 2011, at 13:43, Aletia Morgan wrote:

Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.

The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.

Obviously, there's no experimental data here, and the output is essentially a narrative document.  The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not
required."

If a conference is being organized, notes are taken, documents are being written.   Is there a need to say how these records will be developed and preserved?  Or am I being too compulsive!

Thanks for any thoughts, and if there might be a better place to ask this question.

Regards,
Aletia

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research
   and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus
ahmorgan at vpr.rutgers.edu
732-445-3344

_______________________________________________
Rdap mailing list
Rdap at mail.asis.org
http://mail.asis.org/mailman/listinfo/rdap


----------------
John Graybeal    <mailto:jgraybeal at ucsd.edu>     phone: 858-534-2162
Product Manager
Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
Marine Metadata Interoperability Project: http://marinemetadata.org



From ahmorgan at vpr.rutgers.edu  Thu Apr 28 09:03:23 2011
From: ahmorgan at vpr.rutgers.edu (Aletia Morgan)
Date: Thu, 28 Apr 2011 09:03:23 -0400 (EDT)
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <CFF353563186834E894F010D7CCA39B6061DDACDC1@DSMAILBOX2.ad.uiuc.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>	<8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu>	<6679CD8702498947A68B23D55773733203342F@GRANT.eservices.virginia.edu>
	<CFF353563186834E894F010D7CCA39B6061DDACDC1@DSMAILBOX2.ad.uiuc.edu>
Message-ID: <004001cc05a5$2c813630$8583a290$@rutgers.edu>

Good morning!

I appreciate everyone's comments - especially seeing that some of the
instructions from the different directorates are different.  And Sherry, I
am in full agreement with you that a lot of the written requirements are
just too vague at this point - I hope that NSF will post updates in the
not-too-distant future.  

 
I think this kind of clarification of intent will continue to be a topic
for some time.

 
Thanks,

Aletia

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research 
   and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus

ahmorgan at vpr.rutgers.edu
732-445-3344


From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of
Cragin, Melissa H
Sent: Wednesday, April 27, 2011 4:13 PM
To: Research Data, Access and Preservation
Subject: Re: [Rdap] Deciding when a Data Management Plan is not required?

 
Hi,

  In seeing this bit that Sherry posted, I re-visited other NSF materials,
and find that I need to correct what I wrote a couple of days ago (My
apologies for not sorting this out prior to posting my response.)

While the requirements expressed by the Directorates vary to some extent,
there is an FAQ posted by NSF that seems to apply across Directorates. It
states:

"2. Is a plan for Data Management required if my project is not expected
to generate data or samples?

     Yes. It is acceptable to state in the Data Management Plan that the
project is not anticipated to generate data or samples that require
management and/or sharing. PIs should note that the statement will be
subject to peer review."
(http://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp#2)

Jan Cheetham's post on sorting out the specifics of how such a meeting
will be documented and what will be made public (and where) is very
helpful advice.

Melissa

 
From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of
Lake, Sherry (sah)
Sent: Friday, April 22, 2011 3:33 PM
To: Research Data, Access and Preservation
Subject: Re: [Rdap] Deciding when a Data Management Plan is not required?

 
As for the particular question asked in this e-mail thread, maybe page 4
of the Education & Human Resources (EHR) Directorate DMP guidelines
(http://www.nsf.gov/bfa/dias/policy/dmpdocs/ehr.pdf)  will help. This
section of the guidelines lists examples for EHR proposals to follow:

 
1. A proposal for a workshop that will result in a workshop report.

a. The DMP could consist of a statement to the effect that a workshop
report will be produced and disseminated, e.g., via a website, publication
in a journal, or other means.

 
Institute of Museum & Library Services (IMLS) is a little clearer on when
a DMP is required. The DMP requirements are part 3 of the section
"Specification for Projects that Develop Digital Products". So I assume,
for IMLS, no digital products, no DMP required.

 
I wish the NSF would give a little more guidance as to what they want in a
DMP. They could then tell us which proposals do not require one. I think
the specific solicitation should have that information, like Melissa said.

 
--

Sherry Lake
shlake at virginia.edu

Scientific Data Consultant <http://www.lib.virginia.edu/brown/data/> 

Brown Science and Engineering Library   University of Virginia

* * * * * * * * * * * * * * * * *

       "A library is not a luxury but one of the necessities

                  of life." --- Henry Ward Beecher

* * * * * * * * * * * * * * * * *

 
From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of
John Graybeal
Sent: Friday, April 22, 2011 12:56 AM
To: Research Data, Access and Preservation
Cc: rdap at mail.asis.org
Subject: Re: [Rdap] Deciding when a Data Management Plan is not required?

 
Here's the way I think about this kind of thing:

 
1) You have done useful work by the time you finish (presumably).

2) To the extent the useful work was based on discovered, collected,
observed, or otherwise modeled data, the conclusions are likely to depend
on those inputs.

3) If someone wants to evaluate your conclusions in light of your inputs,
would they be able to do that?

 
If the basis is entirely on other papers, then those would presumably be
cited in the report.  If it is strictly a thought process of a single
group, then the report is the data.  

 
But if the basis is on brainstorming ideas from multiple groups, or having
people in the meeting each generate their own inputs, which are then
collated and massaged, or it relied on results that were on-line and might
be different tomorrow -- then it would be a 'best practice' to maintain
the original raw materials in a repository (say the web site where your
work is managed, if any) that others could inspect.  (They might validate
your group's wisdom, or find great wisdom that your group missed.)  In
these cases, I would say a very short Data Management Plan would be worth
including.

 
john

 
On Apr 21, 2011, at 13:43, Aletia Morgan wrote:

 
Greetings - I'm working with PIs on reviewing DMP documents that are part
of NSF proposals, and I am wondering about whether a DMP is needed.

 
The project involves staging a meeting with researchers in the discipline,
with a goal that includes the development of plans for future research.

 
Obviously, there's no experimental data here, and the output is
essentially a narrative document.  The PI asserts that "This proposal does
not seek to collect environmental data. A data management plan is not
required."

 
If a conference is being organized, notes are taken, documents are being
written.   Is there a need to say how these records will be developed and
preserved?  Or am I being too compulsive!

 
Thanks for any thoughts, and if there might be a better place to ask this
question.

 
Regards,

Aletia

 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research 
   and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus

ahmorgan at vpr.rutgers.edu
732-445-3344

_______________________________________________
Rdap mailing list
Rdap at mail.asis.org
http://mail.asis.org/mailman/listinfo/rdap

 
----------------

John Graybeal    <mailto:jgraybeal at ucsd.edu>     phone: 858-534-2162

Product Manager

Ocean Observatories Initiative Cyberinfrastructure Project:
http://ci.oceanobservatories.org

Marine Metadata Interoperability Project: http://marinemetadata.org

 