From oneiros at grace.nascom.nasa.gov Fri Apr 8 14:48:47 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 8 Apr 2011 14:48:47 -0400 (EDT)
Subject: [Rdap] Google Exacycle grant program
Message-ID:

If you need (or know of researchers who need) some high CPU, but low I/O, for highly parallelized processing, Google's offering time on their cluster.

It looks like it's only open to academics, and you have to write the code in C or C++.

Deadline for proposals is May 31st.

More details at:

http://research.google.com/university/exacycle_program.html

-Joe

From oneiros at grace.nascom.nasa.gov Fri Apr 8 17:24:48 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 8 Apr 2011 17:24:48 -0400 (EDT)
Subject: [Rdap] possible 'relaunch' of data.gov?
Message-ID:

I know it was mentioned during the RDAP meeting that data.gov was getting de-funded, but there are tentatively some workshops scheduled next week (assuming the furlough is over by Thursday), for comments by:

"Agency POCs, Open Government Working Group, Data Stewards and New Media community in the Federal Government"

If you have a '.gov' email address, you can register at:

http://www.socrata.com/datagov/workshop/

(and then manage to get to Washington, DC on the morning of the 14th, if the furlough's been lifted).

But, if anyone else has comments to make, I'd be willing to relay them, as I've already registered.

(and I should say ... as it's a '.com' address, I have no idea if this is from someone who's actually contracted to work on data.gov, if this is going to be the equivalent to someone trying to sell me a timeshare, or what ...
I'll try to report back when I know more)

-Joe

-----
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center

From tinajayroe at gmail.com Sat Apr 9 14:18:45 2011
From: tinajayroe at gmail.com (Tina Jayroe)
Date: Sat, 9 Apr 2011 12:18:45 -0600
Subject: [Rdap] "Save the Data" action we can take
Message-ID:

Hello Everyone,

This topic came up at RDAP and I saw it retweeted by @TomBaker. You may want to use the link below to take some action. Here's the tweet as well as the gist of the site:

RT @timberners_lee: Sign at http://sunlightfoundation.com/savethedata/ and retweet please

Save the Data!

Some of the most important technology programs that keep Washington accountable are in danger of being eliminated. Data.gov, USASpending.gov, the IT Dashboard and other federal data transparency and government accountability programs are facing a massive budget cut, despite only being a tiny fraction of the national budget. Help save the data and make sure that Congress doesn't leave the American people in the dark.

Things you can do now!

1. Sign onto our Letter to Congress
   Sign on to our open letter to Congressional leaders asking them to protect funding for the Electronic Government Fund. You can read the full letter here. Sign Now!

2. Call your representatives
   Making sure your Representative and Senators hear from you directly is one of the most powerful actions you can take. Call now and tell them to Save the Data by protecting the Electronic Government fund and the crucial transparency programs it supports. Call Now

3. Write a letter to the editor
   Writing a letter to the editor of your local paper is an easy way to raise awareness and a powerful way to make your voice heard. It's easy, too - we'll help you each step of the way. Get Started on Your Letter

4. Spread the word
   On Twitter or Facebook? Please share this far and wide. Are you a blogger? You can help by writing a blog post -
   and share the link with us here so we can help promote it!

From mweaver at asis.org Fri Apr 15 14:12:51 2011
From: mweaver at asis.org (Melissa Weaver)
Date: Fri, 15 Apr 2011 11:12:51 -0700
Subject: [Rdap] Call for Work-in-Progress Papers -- DC-SAM workshop on Research and Best Practices in Linking Scientific Metadata
Message-ID:

*Message to RDAP discussion list from Dr. Jian Qin, Syracuse University:*

Please feel free to distribute the call and excuse the multiple postings.

*A DC-SAM Workshop: Research and Best Practices in Linking Scientific Metadata*

September 29, 2011, Berlin, Germany

To be held as part of the Theory and Practice in Digital Libraries (TPDL) Conference (September 25-29, 2011, Berlin, Germany)

*Call for Work-in-Progress Papers*

Describing scientific research data can be challenging due to their complexity and diversity. Standards for describing scientific datasets include not only entities responsible for data collection, processing, and distribution, but also information for data users to assess the relevancy to their data needs, quality of datasets, as well as technicalities regarding data file manipulation. Although scientific metadata schemes address a range of needs for data identification, quality assessment, verifiability, and dissemination, they do not fully address the challenges related to metadata generation, and islands of information exist within and across scientific metadata records. One step towards addressing these challenges and problems is to have information scientists and domain scientists collaborate to evolve existing solutions in web-friendly ways.

This one-day workshop will feature invited speakers from science and information science in the morning sessions and selected work-in-progress reports and interactive discussion in the afternoon.
The DC-SAM (DCMI Science and Metadata Community) workshop will include three parts: a morning session of invited speakers from both science and information science, a working lunch with focus group discussion, and an afternoon session for work-in-progress reports. This call is soliciting submissions of work-in-progress reports for the afternoon session. We are especially interested in, but not limited to, the following topics:

- Identification systems and standards for scientific metadata
- Scientific metadata architecture and models in Semantic Web
- Interoperable taxonomies and vocabularies in [biology, astronomy, etc.]
- Metadata linking mechanisms and technologies and their applications in scientific metadata
- Organizational and technical challenges in linking scientific metadata

The outcomes of this workshop are expected to be a collection of research papers/reports and a research agenda in this increasingly important area, which will be made available on the DC-SAM community website.

Researchers are invited to submit reports for their projects relevant to the theme of this workshop, which are either work in progress or completed. The work-in-progress papers should be no more than six (6) pages, single spaced (approximately 3,000 words including abstract and references). The submission should include the following components:

- Title of the paper
- Author(s) and affiliations
- Email addresses of authors
- Project website
- Abstract
- Tags or keywords
- Body of paper
- References cited

The papers will be reviewed by the workshop program committee and selected based on originality, coherence, clarity, and appropriateness for the workshop. Each selected paper will be given a 15-minute slot for oral presentation.

*Important dates:*

- Deadline for submitting papers: June 12, 2011, midnight (your local time)
- Notification of acceptance: June 30, 2011
- Final version of accepted papers: August 15, 2011

Please submit your paper to dc.sam.workshop at gmail.com. All inquiries and questions can be sent to the same email address.

A copy of this call can be found on the workshop website http://eslib.ischool.syr.edu/SAM/. Information about the workshop will be updated as it becomes available.

*Workshop co-organizers:*

Jian Qin, Syracuse University, USA
Jane Greenberg, University of North Carolina at Chapel Hill, USA
Norman Gray, University of Glasgow, UK

Jian Qin, Ph.D.
Associate Professor
School of Information Studies
Syracuse University
311 Hinds Hall
Syracuse, NY 13244
Phone: 315-443-5642
Fax: 315-443-5806
http://ischool.syr.edu

From oneiros at grace.nascom.nasa.gov Fri Apr 15 14:58:56 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 15 Apr 2011 14:58:56 -0400 (EDT)
Subject: [Rdap] Next Generation Data.gov
Message-ID:

Yesterday, I attended a workshop at the GSA on the 'next generation data.gov'. After a survey in the breakout session, I realized I was the only non-manager who attended, and it was for the session on APIs that they said was the 'more technical' session.

(I admit, I wasn't actually invited directly, I was forwarded the invite by my ATR (the civil servant who directs my work ... pretty much my boss))

Anyway, the basic summary is this:

* data.gov is being upgraded.

* they're using a platform from Socrata : http://www.socrata.com/
  It's basically a shared dataset hosting platform, with some built in tools for interaction & visualization.

* people will be able to interact with the data, not just download files, provided that as part of the submission process, you actually describe the columns, etc. You can also pre-define common filters, sorting, views, visualizations, etc.

* there's also a 'social' component (one of those things that we mentioned in the RDAP 'Future of Digital Libraries').
  They said that the owner for each dataset could define what level of social interaction was allowed. I'm not sure all of what was allowed (I think they mentioned commenting and defining views for other people to use), but they could be moderated or disallowed.

* you can run a local server to expose your data using their API, and then just register it with them, and it'll make calls to your server to get the data.

* application developers can register to get an ID to use the API. it'll automatically rate throttle any that are being too abusive, but it also allows for dataset owners to see who's using their data, or for end users to see what tools have been built to use the data. (there was also a request for data owners to be able to send a message to all of the developers using their data, so they could warn of possible upcoming changes)

Anyway ...

They showed off a lot of cool features that'd probably be useful for most tabular data. They showed an import screen that had options for 'dataset', 'chart', 'calendar', and there might've been a fourth, but I got up too early, and can't remember. I told 'em I had a few million images, and it didn't sound like they were really geared towards that ... (maybe to serve the catalog of the data, but not the data itself)

I asked about putting NSF research data in there, and I got a kinda roundabout answer about how it'd have to be approved through the 'normal agency channels', and I don't know if NSF would want us mixing this type of research 'data' in with their other 'data'.

Due to the nature of what's being done, I don't think it'd qualify under TRAC, so you'd likely want a separate archival copy of the data, but I could be mistaken.

They offer a generic API for serving tabular data (Socrata Open Data API), so it's possible that other people could implement it, even if you don't want to license their product, or you could write something to harvest the various data sources.
I haven't looked into the spec, so I don't know how hard it'd be to try to translate between it and something like IVOA TAP (http://www.ivoa.net/Documents/TAP/).

...

It looks like Socrata's updating their website right now, but I wasn't given any sort of an NDA to sign, and I'm assuming they'd need some load testing, etc, so once it's back up:

Beta of the new site. (you have to register ... it let me in immediately, but I used a '.gov' address, and it's down right now, so I can't test with one of my other addresses):

http://datagov.socrata.com/

Documentation (for data submission, API usage, etc)

http://dev.socrata.com/

-Joe

---------- Forwarded message ----------
Date: Thu, 14 Apr 2011 21:05:22 -0500
From: "hyon.kim at gsa.gov"
To: "marion.royal at gsa.gov"
Cc: "chris.metcalf at socrata.com", "charles at socrata.com", "saf.rabah at socrata.com"
Subject: Next Generation Data.gov Platform - Link to Workshop Materials

Thank you for your interest in the Next Generation Data.gov Platform. We have posted the agenda, presentations and the Getting Started Guide at the following link:

http://www.socrata.com/datagov/workshop/presentations/

We will be following up with those of you who expressed interest in participating in the new platform. We will keep you informed of our progress as we move toward the launch of the Next Generation Data.gov Platform.

Thank you.

Hyon Kim
Deputy Program Director
Data.gov
(202) 694 8148

From ahmorgan at vpr.rutgers.edu Thu Apr 21 16:43:20 2011
From: ahmorgan at vpr.rutgers.edu (Aletia Morgan)
Date: Thu, 21 Apr 2011 16:43:20 -0400 (EDT)
Subject: [Rdap] Deciding when a Data Management Plan is not required?
Message-ID: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>

Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.

The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.
Obviously, there's no experimental data here, and the output is essentially a narrative document. The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not required."

If a conference is being organized, notes are taken, documents are being written. Is there a need to say how these records will be developed and preserved? Or am I being too compulsive!

Thanks for any thoughts, and if there might be a better place to ask this question.

Regards,
Aletia

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus
ahmorgan at vpr.rutgers.edu
732-445-3344

From cragin at illinois.edu Thu Apr 21 22:10:20 2011
From: cragin at illinois.edu (Cragin, Melissa H)
Date: Thu, 21 Apr 2011 21:10:20 -0500
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID:

On Apr 21, 2011, at 8:57 PM, "Aletia Morgan" wrote:

Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.

The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.

Obviously, there's no experimental data here, and the output is essentially a narrative document. The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not required."

If a conference is being organized, notes are taken, documents are being written. Is there a need to say how these records will be developed and preserved? Or am I being too compulsive!
Thanks for any thoughts, and if there might be a better place to ask this question.

Regards,
Aletia

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus
ahmorgan at vpr.rutgers.edu
732-445-3344

_______________________________________________
Rdap mailing list
Rdap at mail.asis.org
http://mail.asis.org/mailman/listinfo/rdap

From cragin at illinois.edu Thu Apr 21 22:21:17 2011
From: cragin at illinois.edu (Cragin, Melissa H)
Date: Thu, 21 Apr 2011 21:21:17 -0500
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID: <7EF7518C-10D2-4CB3-B408-92E7C08716FA@ad.uiuc.edu>

Dear Aletia,

It is not likely that you need a DMP for a proposal to develop and convene a meeting. Is the proposal a response to a formal call? If so, there is generally a clear set of requirements available on the web page for that NSF program, and you'll be able to see what's involved.

Best,
Melissa

On Apr 21, 2011, at 8:57 PM, "Aletia Morgan" wrote:

Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.

The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.

Obviously, there's no experimental data here, and the output is essentially a narrative document. The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not required."

If a conference is being organized, notes are taken, documents are being written. Is there a need to say how these records will be developed and preserved? Or am I being too compulsive!

Thanks for any thoughts, and if there might be a better place to ask this question.
Regards,
Aletia

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Aletia Morgan
Research Application Designer
Office of the Vice President for Research and Graduate & Professional Education
Rutgers, The State University of New Jersey
715 CoRE Building, Busch Campus
ahmorgan at vpr.rutgers.edu
732-445-3344

From jgraybeal at ucsd.edu Fri Apr 22 00:55:51 2011
From: jgraybeal at ucsd.edu (John Graybeal)
Date: Thu, 21 Apr 2011 21:55:51 -0700
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID: <8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu>

Here's the way I think about this kind of thing:

1) You have done useful work by the time you finish (presumably).

2) To the extent the useful work was based on discovered, collected, observed, or otherwise modeled data, the conclusions are likely to depend on those inputs.

3) If someone wants to evaluate your conclusions in light of your inputs, would they be able to do that?

If the basis is entirely on other papers, then those would presumably be cited in the report. If it is strictly a thought process of a single group, then the report is the data.

But if the basis is on brainstorming ideas from multiple groups, or having people in the meeting each generate their own inputs, which are then collated and massaged, or it relied on results that were on-line and might be different tomorrow -- then it would be a 'best practice' to maintain the original raw materials in a repository (say the web site where your work is managed, if any) that others could inspect. (They might validate your group's wisdom, or find great wisdom that your group missed.) In these cases, I would say a very short Data Management Plan would be worth including.
john

On Apr 21, 2011, at 13:43, Aletia Morgan wrote:

> Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.
>
> The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.
>
> Obviously, there's no experimental data here, and the output is essentially a narrative document. The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not required."
>
> If a conference is being organized, notes are taken, documents are being written. Is there a need to say how these records will be developed and preserved? Or am I being too compulsive!
>
> Thanks for any thoughts, and if there might be a better place to ask this question.
>
> Regards,
> Aletia
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Aletia Morgan
> Research Application Designer
> Office of the Vice President for Research
> and Graduate & Professional Education
> Rutgers, The State University of New Jersey
> 715 CoRE Building, Busch Campus
> ahmorgan at vpr.rutgers.edu
> 732-445-3344

----------------
John Graybeal
phone: 858-534-2162
Product Manager
Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
Marine Metadata Interoperability Project: http://marinemetadata.org

From oneiros at grace.nascom.nasa.gov Fri Apr 22 11:35:00 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 22 Apr 2011 11:35:00 -0400 (EDT)
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID:

On Thu, 21 Apr 2011, Aletia Morgan wrote:

> Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.
>
> The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.
>
> Obviously, there's no experimental data here, and the output is essentially a narrative document. The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not required."

It's observational data, not experimental.
And if you're collecting information about the attitudes of people, I'd consider that to be environmental, personally.

> If a conference is being organized, notes are taken, documents are being written. Is there a need to say how these records will be developed and preserved? Or am I being too compulsive!

I think this goes back to the discussion at RDAP11, where some people didn't think of what they were collecting as 'data', but I'd argue that the notes taken, and recordings of the meetings, etc, are in fact data.

If there were some sort of coding done on the participants' responses or overall attitudes, or results of any polls/voting at the meeting, all of those would be more easily recognizable as 'data', but I'd argue that notes, even with inherent bias and lack of completeness, are in fact a form of 'data'.

I'm actually participating in multiple side discussions right now on the nature of what 'data' is; my point of view had been from the digital sensor aspect, but in this case it's the recordings by some observer (in this case, a person, not a sensor) that are going to be used as the supporting basis for any results.

I'd say that the notes on planning the meetings might not be 'data', but the agenda and materials given to participants for the meetings are important documents to archive, as they're the equivalent of the experimental plan -- knowing how you presented information to the participants to elicit the responses that you received might be useful to someone trying to repeat the process in another discipline, repeat it with a larger audience, or repeat it after time has elapsed to see if attitudes have changed. There might also be something in that plan that could hint at potential bias in the results.

...

It may also be that the apparent lack of some sort of hard data is a problem in itself. (I don't know the specific grant, or what's normal for that area, so this might not be the case) ...
but if you wanted to make sure that there was at least *some* true 'data' as a result of the meetings, you could distribute a short questionnaire to each of the participants (yes, I know, that means getting IRB involved), so there was at least some 'data' that came out of the meetings ... even if it's just on how well they thought the meetings went and comments for how to improve them.

> Thanks for any thoughts, and if there might be a better place to ask this question.

If there are any discussion groups for the specific discipline covered by the grant (even if that isn't the precise discipline the researchers might be in), I'd also ask there.

... but I think this goes into what Cliff Lynch said were the basic parts of the NSF DMPs ... the first step is to figure out which products of the effort are valuable and worth keeping ... and he did say, it's possible that there aren't any.

I personally think there are, even if it's just to allow someone else to review how you conducted the meetings, so they can repeat them or improve on them, and thus save time for the next iteration.

Is it data or documentation? Well, I'm not going to get into that argument, but whatever it is, I think it does have value.

-Joe

-----
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center

From bdwestra at gmail.com Fri Apr 22 12:07:17 2011
From: bdwestra at gmail.com (Brian Westra)
Date: Fri, 22 Apr 2011 09:07:17 -0700
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID:

Hi Aletia,

I'd echo the comments of others on this list. By the way, the ARL data sharing support group list (http://groups.google.com/group/arl-data-sharing-support-group/?pli=1), and the Research Digital Management list (https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=RESEARCH-DATAMAN) are also good places to post this type of question.
Thanks,
Brian

--
Brian Westra
Lorry I. Lokey Science Data Services Librarian
University of Oregon

From oneiros at grace.nascom.nasa.gov Fri Apr 22 14:11:37 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Fri, 22 Apr 2011 14:11:37 -0400 (EDT)
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
Message-ID:

As part of the side discussion about defining what a dataset is, an issue got raised about what about a given object makes it of 'archive quality'?

I know there's the TRAC checklist to look at what archives need to do, and I've seen various guidance on digitization of images, movies or audio on what resolution you should digitize at, and which formats you should use, and what metadata to attach.

... but for people who are writing the various standards used for storing scientific data (eg, CDF, NetCDF, HDF, FITS, VOTable), are there any recommendations on what sort of features / documentation / metadata are necessary to be of 'archival' quality? (either as part of the standard, or as metadata that needs to be in each object being tracked)

-Joe

-----
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center

From cheetham at doit.wisc.edu Fri Apr 22 12:14:18 2011
From: cheetham at doit.wisc.edu (Jan Cheetham)
Date: Fri, 22 Apr 2011 11:14:18 -0500
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu>
Message-ID: <22DA497A-63E4-4F52-89A2-1B3DA2065EF3@doit.wisc.edu>

One of the members of the digital curation team at my campus helped a PI write a DMP for a project that sounds similar to the one you're describing, i.e. it involves establishing a working group of researchers to brainstorm research approaches to solve a particular problem, develop a model, and disseminate the model and reports.
It's not appropriate to share the DMP at this point (since it's currently under review by NSF) but I can describe the approach it took in general terms.

It broke down the types of data in a table. This included things like reports from the group, the model design, etc. It described formats of these documents (PDF and spreadsheets, mostly) and what type of metadata (eg. author names, dates, copyright, etc) would be embedded in them.

The data sharing section described how documents would be shared and accessed by members of the research group during the "discovery" period (including the URLs of relevant wikis and file sharing sites) and also how the results would be disseminated afterwards, including who the target audiences would be and the URLs for sites where the results would be published, and stated what the policies for re-use of results would be.

Finally, there was a section on long term archiving, which mentioned how long documents would be kept, on whose computer, etc., and plans to archive/share the results with archive.org.

In total, this DMP is about 1.5 pages long. Of course, it's too early to know how successful this approach will be with NSF reviewers but I thought it might help to share this much.

Jan

On Apr 21, 2011, at 3:43 PM, Aletia Morgan wrote:

> Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering about whether a DMP is needed.
>
> The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research.
>
> Obviously, there's no experimental data here, and the output is essentially a narrative document. The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not
> required."
>
> If a conference is being organized, notes are taken, documents are being written. Is there a need to say how these records will be developed and preserved? Or am I being too compulsive!
>
> Thanks for any thoughts, and if there might be a better place to ask this question.
>
> Regards,
> Aletia
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Aletia Morgan
> Research Application Designer
> Office of the Vice President for Research
> and Graduate & Professional Education
> Rutgers, The State University of New Jersey
> 715 CoRE Building, Busch Campus
> ahmorgan at vpr.rutgers.edu
> 732-445-3344
>
> _______________________________________________
> Rdap mailing list
> Rdap at mail.asis.org
> http://mail.asis.org/mailman/listinfo/rdap

Jan Cheetham, Ph.D.
Research Technologies Consultant
Division of Information Technology (DoIT)
University of Wisconsin-Madison
1401 University Ave
Madison, Wisconsin
608-262-2947
cheetham at doit.wisc.edu

From sah at virginia.edu Fri Apr 22 16:32:35 2011
From: sah at virginia.edu (Lake, Sherry (sah))
Date: Fri, 22 Apr 2011 20:32:35 +0000
Subject: [Rdap] Deciding when a Data Management Plan is not required?
In-Reply-To: <8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu>
References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu> <8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu>
Message-ID: <6679CD8702498947A68B23D55773733203342F@GRANT.eservices.virginia.edu>

As for the particular question asked in this e-mail thread, maybe page 4 of the Education & Human Resources (EHR) Directorate DMP guidelines (http://www.nsf.gov/bfa/dias/policy/dmpdocs/ehr.pdf) will help. This section of the guidelines lists examples for EHR proposals to follow:

1. A proposal for a workshop that will result in a workshop report.
   a. The DMP could consist of a statement to the effect that a workshop report will be produced and disseminated, e.g., via a website, publication in a journal, or other means.

Institute of Museum & Library Services (IMLS) is a little clearer on when a DMP is required.
The DMP requirements are part 3 of the section "Specification for Projects that Develop Digital Products". So I assume, for IMLS, no digital products, no DMP required.

I wish the NSF would give a little more guidance as to what they want in a DMP. They could then tell us which proposals do not require one. I think the specific solicitation should have that information, as Melissa said.

--
Sherry Lake
shlake at virginia.edu
Scientific Data Consultant
Brown Science and Engineering Library
University of Virginia
* * * * * * * * * * * * * * * * *
"A library is not a luxury but one of the necessities of life."
--- Henry Ward Beecher
* * * * * * * * * * * * * * * * *

From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of John Graybeal
Sent: Friday, April 22, 2011 12:56 AM
To: Research Data, Access and Preservation
Cc: rdap at mail.asis.org
Subject: Re: [Rdap] Deciding when a Data Management Plan is not required?

Here's the way I think about this kind of thing:

1) You have done useful work by the time you finish (presumably).
2) To the extent the useful work was based on discovered, collected, observed, or otherwise modeled data, the conclusions are likely to depend on those inputs.
3) If someone wants to evaluate your conclusions in light of your inputs, would they be able to do that?

If the basis is entirely on other papers, then those would presumably be cited in the report. If it is strictly a thought process of a single group, then the report is the data. But if the basis is on brainstorming ideas from multiple groups, or having people in the meeting each generate their own inputs, which are then collated and massaged, or it relied on results that were on-line and might be different tomorrow -- then it would be a 'best practice' to maintain the original raw materials in a repository (say the web site where your work is managed, if any) that others could inspect.
(They might validate your group's wisdom, or find great wisdom that your group missed.) In these cases, I would say a very short Data Management Plan would be worth including.

john

----------------
John Graybeal
phone: 858-534-2162
Product Manager
Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
Marine Metadata Interoperability Project: http://marinemetadata.org

From oneiros at grace.nascom.nasa.gov Mon Apr 25 14:17:41 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Mon, 25 Apr 2011 14:17:41 -0400 (EDT)
Subject: [Rdap] Presentations from RDAP 2011
Message-ID:

Sorry, I don't think anything went to the list about this -- Melissa Weaver placed the presentations up on SlideShare:

http://www.slideshare.net/asist_org/

I'm also not sure what the etiquette is for pre-print stuff for the ASIS&T Bulletin, but I have a summary written up (which well, had to get trimmed for space), but it's on a different machine, so I'll have to post it later ... and I'll be posting it around anyway, as part of my trip report, so it'll appear on a few mailing lists.

I already have my rough notes online, but I don't know how useful they are on their own:

http://vso1.nascom.nasa.gov/joe/notes/rdap/RDAP_2011_notes.txt

I tried to make sure to record the Q/A parts, but well, there's a few gaps where either I was participating in the discussion, or I was distracted (I admit, I was sending e-mail ... but some of it was because of stuff I was learning at the meeting)

-Joe

From Peter.Wittenburg at mpi.nl Mon Apr 25 15:26:56 2011
From: Peter.Wittenburg at mpi.nl (Peter Wittenburg)
Date: Mon, 25 Apr 2011 21:26:56 +0200
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To:
References:
Message-ID: <147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl>

Hallo Joe - an answer from Europe.

There are by the way two procedures to assess quality: one is (T)RAC as you specify and the other is DSA (Data Seal of Approval). While the first is a heavy procedure, the second is a more lightweight process. We have chosen to go for DSA which seems to be appropriate for us storing also patrimonial data which needs to be preserved for the future.

With respect to metadata I think that it is the responsibility of the community to come up with an agreed set of elements.
In our domain (languages) we are using CMDI (Component Metadata Infrastructure) allowing everyone to define his own components and profiles, but requesting everyone to use the elements and vocabularies registered in ISOcat which is based on the ISO 12620 standard. But this is just one example. I know that the climate researchers and astronomers use different element sets (of course).

Here are a few references which are easy to read (Short Guides):

- http://www.isocat.org (concept registry)
- http://www.clarin.eu/external/index.php?page=publications&sub=3 (metadata & concept registry short guides)
- http://www.datasealofapproval.org/
- http://www.mpi.nl/tla (our unit)

best
Peter

From michael at psu.edu Tue Apr 26 12:54:13 2011
From: michael at psu.edu (Michael J. Giarlo)
Date: Tue, 26 Apr 2011 12:54:13 -0400
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To:
References:
Message-ID: <4DB6F8B5.90106@psu.edu>

On 04/22/2011 02:11 PM, Joe Hourcle wrote:
>
> As part of the side discussion about defining what a dataset is, an issue
> got raised about what about a given object makes it of 'archive quality'?
>

That's a doozy, Joe! Good question to ponder as a group.

This strikes me as something of a glib non-answer, but here goes.

The first bit I have trouble wrapping my mind around is that the notion of archival quality is binary, that an object can be of archival quality or not. To be perfectly frank, I don't think we know enough yet about what archival quality really means in the digital context; *most* of us have been managing files for, what, 10-15 years?

Combining these thoughts, archival quality feels like a bit of guesswork informed by our professional expertise, more like a confidence level associated with an object being preserved based upon:

* Retention period -- I have fair-to-high confidence that just about any digital object is preservable for 1 year, and very low confidence that just about any digital object is preservable for 100.
* File formats -- I have higher confidence in open formats than proprietary formats, in widely used rather than narrowly used formats, in self-descriptive rather than opaque formats.
* Metadata/documentation richness -- The richer and more interoperable and more widely understood the metadata, the higher my confidence will be.
* Use contexts -- I'd worry a lot more about an object with few or no use contexts ("dark archives") than others.
* A whole lot more that escapes me, such as issues around the archive itself and its organizational, technical, and financial sustainability.

Thanks for raising the question, Joe.

-Mike

From Mark.Conrad at nara.gov Tue Apr 26 13:36:01 2011
From: Mark.Conrad at nara.gov (Mark Conrad)
Date: Tue, 26 Apr 2011 13:36:01 -0400
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl>
References: <147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl>
Message-ID: <4DB6CA4102000097000081A1@gwiaa.nara.gov>

ISO 16363 is in draft. It replaces TRAC. I understand that in Europe there will be three levels of certification for a digital repository. DSA will be one level. ISO 16363 will be another level.

Mark Conrad
NARA Center for Advanced Systems and Technologies
NHA
The National Archives and Records Administration
Erma Ora Byrd Conference and Learning Center
Building 494 Second Floor
610 State Route 956
Rocket Center, WV 26726

Phone: 304-726-7820
Fax: 304-726-7802
Email: mark.conrad at nara.gov

http://www.facebook.com/NARACAST
http://www.archives.gov/ncast/

From Peter.Wittenburg at mpi.nl Tue Apr 26 14:03:49 2011
From: Peter.Wittenburg at mpi.nl (Peter Wittenburg)
Date: Tue, 26 Apr 2011 20:03:49 +0200
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <4DB6CA4102000097000081A1@gwiaa.nara.gov>
References: <147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl> <4DB6CA4102000097000081A1@gwiaa.nara.gov>
Message-ID: <147889AFDB014A43A5A9318270C826C9BA100C368A@MAILER.mpi.nl>

Yes Mark this is correct.

peter

From michael at psu.edu Tue Apr 26 14:23:35 2011
From: michael at psu.edu (Michael J. Giarlo)
Date: Tue, 26 Apr 2011 14:23:35 -0400
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <4DB6CA4102000097000081A1@gwiaa.nara.gov>
References: <147889AFDB014A43A5A9318270C826C9BA100C35E1@MAILER.mpi.nl> <4DB6CA4102000097000081A1@gwiaa.nara.gov>
Message-ID: <4DB70DA7.1080808@psu.edu>

On 04/26/2011 01:36 PM, Mark Conrad wrote:
> ISO 16363 is in draft. It replaces TRAC. I understand that in Europe
> there will be three levels of certification for a digital repository.
> DSA will be one level. ISO 16363 will be another level.
>

There's a bit more information about this in the January 2011 D-Lib (down towards the bottom):

http://www.dlib.org/dlib/january11/klump/01klump.html

-Mike

From oneiros at grace.nascom.nasa.gov Tue Apr 26 15:30:05 2011
From: oneiros at grace.nascom.nasa.gov (Joe Hourcle)
Date: Tue, 26 Apr 2011 15:30:05 -0400 (EDT)
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <4DB6F8B5.90106@psu.edu>
References: <4DB6F8B5.90106@psu.edu>
Message-ID:

On Tue, 26 Apr 2011, Michael J.
Giarlo wrote:

> On 04/22/2011 02:11 PM, Joe Hourcle wrote:
>>
>> As part of the side discussion about defining what a dataset is, an issue
>> got raised about what about a given object makes it of 'archive quality'?
>>
>
> That's a doozy, Joe! Good question to ponder as a group.
>
> This strikes me as something of a glib non-answer, but here goes.
>
> The first bit I have trouble wrapping my mind around is that the notion
> of archival quality is binary, that an object can be of archival quality
> or not.

Maybe I should be asking the opposite -- What would make a digital object *not* of archival quality?

Although, even then, I can think of a few cases where even something stored in a proprietary file format, which needs proprietary software to be read, which runs on hardware that's no longer manufactured ... is considered 'archived' because they either stored the hardware, or managed to emulate the machine and OS so they could run the software.

... I guess what I'm really looking for is something like TRAC or Data Seal of Approval, but instead of guidelines for the repository, focuses on the qualities of a good self-documenting file or package that could later be ingested by an archive for them to maintain.

In some ways, this is difficult, as without the associated archive, you won't necessarily have a useful identifier scheme. (as an identifier without a place to register it in my mind is pretty useless, unless you're just using it to try to show that two objects are likely not the same thing)

... And then for some reason, as I was writing this e-mail, I realized I may already have something close to what I'm looking for, but it probably needs to be more explicit in some regards:

http://sdac.virtualsolar.org/catalogs/catalog_checklist

I wrote this checklist to give to scientists who are maintaining 'science catalogs' (basically, lists of ... something). In solar physics, they tend to be stored & distributed in ASCII text files, HTML tables, or MS Excel files.
Sometimes, they're distributed as PDF files or FITS tables.

It's not a perfect correlation into what's ideal for archiving ... but it does get into some of the issues of what file formats you're using, and what sort of documentation you should be providing, although some of it's catalog-specific.

... And, to spur discussion, as I said this came up because of another conversation, here was my response before I threw it out to this group:

> > My intent [] is to answer a very common question which is what does
> > "Archive Quality" mean when it comes to digital data. It seems its
> > always two-sides of the same coin: Preservation and Usability.
>
> I thought that was what you were going for ... I'd say to be of 'archive
> quality', it must be:
>
> 1. In a format that's well documented, and with the standard
>    freely available such that software to read the files could be
>    re-implemented from only the documentation.
>
> 2. Have enough metadata attached to the object such that it can be
>    identified, and the necessary scientific metadata for use can
>    be obtained. (the science metadata might be stored in a
>    separate catalog, but then the object would need to identify
>    the catalog and the necessary foreign keys into the catalog)
>
> 3. As part of that 'scientific metadata', identify what collection
>    it's a part of, for purposes of retrieving documentation on how
>    to use the data.
>
> 4. Have some way of identifying if there has been corruption of
>    the data from what was originally stored. (I won't get
>    into how sensitive it needs to be; if just identifying it
>    occurred is enough, or if you have to be able to track down
>    which discrete datum is bad)
>
> There are other characteristics which are more about how it's being stored than about the object itself.
> (eg, don't lose it, make sure it's actually available for use)

-Joe

From jgraybeal at ucsd.edu Tue Apr 26 19:06:39 2011
From: jgraybeal at ucsd.edu (John Graybeal)
Date: Tue, 26 Apr 2011 16:06:39 -0700
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To:
References: <4DB6F8B5.90106@psu.edu>
Message-ID: <2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu>

That's a neat catalog checklist!

I wrote something analogous a while back from the data management system perspective, though some of the things in mine are about the data system, many are about the data and metadata:

http://marinemetadata.org/blogs/graybeal/the-good-enough-data-system

Something Michael said about file formats reminded me of the physical storage analog, tapes/CDs/DVDs/etc.

Two lessons from that progression:

- You don't want your storage mechanism to become corrupt or inaccessible. In data context, the rough equivalent would be a binary format that no software can parse, or content that isn't reproducible because people or systems have made changes that transform it. In hardware, the answer is to pick a widely used format (CD, not 8-track or 9-track tape!) that won't quickly degrade (curation quality, maybe?). In software, it means picking a format that is well known, as widely used as possible, and ideally is represented, or representable, in a human-parseable way.
- Rewriting physical media is a given when trying to manage long-term storage of computer bits onto physical media. Rewriting digital objects will need to be a given when trying to manage long-term storage of them in a way that is recoverable and actionable later. If you have an audio file today in some obscure digital format, you need to rewrite it into a well-known format for audio data. Ideally this transformation is lossless (lossless by design, even). If you have a scientific data artifact that you really want to save, netCDF is good, but the ASCII representation might be better.
And so on for databases, movies, office documents (RTF better than Word), etc.

We may have to make a distinction between: archival for long-term computer access assuming ongoing maintenance of the archive; archival for survivability of the digital information content assuming the original archive itself may not still be around in 20 years; and archival for human posterity, when computers may not be around. Different technological choices are appropriate in each case.

John

On Apr 26, 2011, at 12:30, Joe Hourcle wrote:

> And then for some reason, as I was writing this e-mail, I realized I may
> already have something close to what I'm looking for, but it probably
> needs to be more explicit in some regards:
>
> http://sdac.virtualsolar.org/catalogs/catalog_checklist
>
> I wrote this checklist to give to scientists who are maintaining 'science
> catalogs' (basically, lists of ... something). In solar physics, they
> tend to be stored & distributed in ASCII text files, HTML tables, or MS
> Excel files. Sometimes, they're distributed as PDF files or FITS tables.
>
> -Joe
> _______________________________________________
> Rdap mailing list
> Rdap at mail.asis.org
> http://mail.asis.org/mailman/listinfo/rdap

----------------
John Graybeal
phone: 858-534-2162
Product Manager
Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
Marine Metadata Interoperability Project: http://marinemetadata.org

From rduerr at nsidc.org Tue Apr 26 23:59:15 2011
From: rduerr at nsidc.org (Ruth Duerr)
Date: Tue, 26 Apr 2011 21:59:15 -0600
Subject: [Rdap] What makes an 'Archive Quality' Digital Object?
In-Reply-To: <2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu>
References: <4DB6F8B5.90106@psu.edu> <2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu>
Message-ID: <2AD5543E-D5D3-455F-B555-152205439895@nsidc.org>

I like both Joe and John's lists - pretty darn comprehensive.
One thing I noted about the lists is that while calibration was explicitly listed in John's list, neither list explicitly discussed validation as a step beyond calibration, though I suppose some of the other items on each list might cover that concept. Validation is particularly important with remote sensing data since knowing that a sensor is calibrated to some precision may not say anything about how accurately it is actually measuring some physical parameter - ground truth is often needed to judge that.

I especially liked John's "subtle characteristics," especially the one about data being annotatable. Given that the quality of any particular data set varies depending on the use to which you'd like to put it and that the data originator rarely is in a position to know all of the potential uses and users of their data, capturing the annotations of users is often the only way to start capturing information about the utility of the data to audiences other than the original producer.

Ruth

From mweaver at asis.org Wed Apr 27 00:37:43 2011
From: mweaver at asis.org (Melissa Weaver)
Date: Tue, 26 Apr 2011 21:37:43 -0700
Subject: [Rdap] RDAP11 Slideshare Presentation added
Message-ID:

Hello all,

We recently added another poster session from Monica Omodei and Jeremy Frumkin to the RDAP11 slideshare:
http://www.slideshare.net/asist_org/

Please do visit and share if you haven't seen the share space already. If you have submissions or comments, please contact me at mweaver at asis.org.

Thank you,

"Misty" Melissa Weaver
ASIS&T
mweaver at asis.org

From rduerr at nsidc.org Wed Apr 27 01:04:22 2011
From: rduerr at nsidc.org (Ruth Duerr)
Date: Tue, 26 Apr 2011 23:04:22 -0600
Subject: [Rdap] RDAP11 Slideshare Presentation added
In-Reply-To:
References:
Message-ID:

Hi Melissa,

I was surprised to not find my presentation on the slideshare site. I'd given a copy to Joe shortly after the conference. Do you know when I can expect to find that presentation up on the site?

Thanks,
Ruth

On Apr 26, 2011, at 10:37 PM, Melissa Weaver wrote:

> Hello all,
>
> We recently added another poster session from Monica Omodei and Jeremy Frumkin to the RDAP11 slideshare:
> http://www.slideshare.net/asist_org/
>
> Please do visit and share if you haven't seen the share space already. If you have submissions or comments, please contact me at mweaver at asis.org.
> > Thank you, > > "Misty" Melissa Weaver > ASIS&T > mweaver at asis.org > > _______________________________________________ > Rdap mailing list > Rdap at mail.asis.org > http://mail.asis.org/mailman/listinfo/rdap ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety." Benjamin Franklin, Historical Review of Pennsylvania, 1759 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Ruth Duerr Data Stewardship Program Lead National Snow and Ice Data Center Cooperative Institute for Research in Environmental Science University of Colorado at Boulder Boulder, CO 80309 (303) 735-0136 rduerr at nsidc.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From oneiros at grace.nascom.nasa.gov Wed Apr 27 10:16:32 2011 From: oneiros at grace.nascom.nasa.gov (Joe Hourcle) Date: Wed, 27 Apr 2011 10:16:32 -0400 (EDT) Subject: [Rdap] What makes an 'Archive Quality' Digital Object? In-Reply-To: <2AD5543E-D5D3-455F-B555-152205439895@nsidc.org> References: <4DB6F8B5.90106@psu.edu> <2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu> <2AD5543E-D5D3-455F-B555-152205439895@nsidc.org> Message-ID: On Tue, 26 Apr 2011, Ruth Duerr wrote: > I like both Joe and John's lists - pretty darn comprehensive. One thing > I noted about the lists is that while calibration was explicitly listed > in John's list, neither list explicitly discussed validation as a step > beyond calibration, though I suppose some of the other items on each > list might cover that concept. Validation is particularly important > with remote sensing data since knowing that a sensor is calibrated to > some precision may not say anything about how accurately it is actually > measuring some physical parameter - ground truth is often needed to > judge that. I think this is one of those issues where what you're studying comes into play. 
There's no way to get the 'ground truth' for the data I deal with. (even with those plans to put a man on the sun ... but they'd do it at night, so it'd be okay). In fact, my talk that spurred the checklist (dealing with some of the problems we were having in processing catalogs) was put in a session on 'data and instrumentation', and one of the talks before mine was from someone who worked on the SXI (Solar X-ray Imager) on the GOES fleet (constellation? I'm not sure what you call a group of spacecraft, and 'spacecrafts' sounds funny). Anyway, they do an intercalibration between GOES launches, so that the next SXI's data is calibrated to be comparable to the previous SXI ... and it seems that the calibration factor they were using to get to physical units (W/m^2) had actually been wrong ... the later instruments were calibrated correctly on the ground, and it was original ones that should've been adjusted. And it was by a factor of about 20-30%. They threw out the question to the scientists of how to deal with it -- go back and reclassify all flares? (an M8 would now be an X1), change the definition of the flare class (X would now be > 8*10^-3 vs. 10^-4), or something else? I don't know if a decision was ever made, but if groups are using two different scales for classifying flares, it could be messy. > I especially liked John's "subtle characteristics," especially the one > about data being annotatable. Given that the quality of any particular > data set varies depending on the use to which you'd like to put it to > and that the data originator rarely is in a position to know all of the > potential uses and users of their data, capturing the annotations of > users is often the only way to start capturing information about the > utility of the data to audiences other than the original producer. What? We're supposed to actually *test* the backups? 
And then you're going to tell me that it's a problem when it takes 15 hrs to retrieve a day's worth of data from tape, just because we need to re-calibrate the first 9 months of data from the mission, and someone accidentally flushed the raw data from disk.

(Hmm ... now I just have to figure out how to get the scientists to actually look at these lists before they build data systems ... catalogs can generally be cleaned up after the fact, but data systems not so much)

...

But for the annotation -- in a way, the various 'catalogs' that I deal with are annotations, but we have some really subtle issues that basically correlate to some of the problems in data citation:

What am I annotating?
(a) The world as observed in this data (eg, X2 flare)
(b) The observation (eg, partially obscured by clouds)
(c) The observation as it exists on disk (eg, blocks lost in transfer; partial image)
(d) The calibrated edition of data (eg, notes on oddities in the data)
(e) The file on disk (eg, invalid checksum; possible corruption)
(f) The instrument (eg, a discontinuity due to servicing, or even the lack of data for a time period)

It's possible that some of these might 'trickle down'. (eg, if I'm annotating the calibrated form, I'm also indirectly annotating the observation and the state of the world ... I saw an X2 flare based on this set of calibrated images ... which means that I'm asserting that there's an X2 flare that might've been visible to other instruments observing that region at that time if they had similar observing characteristics)

And I mention data citation for a few reasons:

1. Citation should be a type of annotation; if someone is later browsing the data, we should be able to tell them what papers have been published using that data, so they can avoid duplicating work or identify collaborators for additional analysis.

2. Downloading should be a type of annotation.
It allows a researcher to easily identify what they had downloaded, so they can then generate a record for citation. It can also be used to generate periods/locations of interest in general, but even anonymous, we get into some issues like Mike Brown's Haumea incident [1]. 3. But if we know who's downloaded the data, we can inform researchers if we've identified problems with the data and/or a recalibration run, so they don't get caught unaware when it happens after they've downloaded the data, but before they've submitted their research paper. [1] Another researcher published that they had discovered the 'dwarf' planet, but it was later discovered they had downloaded the observing logs, and determined where Brown had been looking : http://www.nytimes.com/2005/09/13/science/space/13plan.html -Joe From rduerr at nsidc.org Wed Apr 27 11:44:00 2011 From: rduerr at nsidc.org (Ruth Duerr) Date: Wed, 27 Apr 2011 09:44:00 -0600 Subject: [Rdap] What makes an 'Archive Quality' Digital Object? In-Reply-To: References: <4DB6F8B5.90106@psu.edu> <2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu> <2AD5543E-D5D3-455F-B555-152205439895@nsidc.org> Message-ID: On Apr 27, 2011, at 8:16 AM, Joe Hourcle wrote: > > > On Tue, 26 Apr 2011, Ruth Duerr wrote: > >> I like both Joe and John's lists - pretty darn comprehensive. One thing I noted about the lists is that while calibration was explicitly listed in John's list, neither list explicitly discussed validation as a step beyond calibration, though I suppose some of the other items on each list might cover that concept. Validation is particularly important with remote sensing data since knowing that a sensor is calibrated to some precision may not say anything about how accurately it is actually measuring some physical parameter - ground truth is often needed to judge that. > > I think this is one of those issues where what you're studying comes into play. There's no way to get the 'ground truth' for the data I deal with. 
(even with those plans to put a man on the sun ... but they'd do it at night, so it'd be okay). Yup > > In fact, my talk that spurred the checklist (dealing with some of the problems we were having in processing catalogs) was put in a session on 'data and instrumentation', and one of the talks before mine was from someone who worked on the SXI (Solar X-ray Imager) on the GOES fleet (constellation? I'm not sure what you call a group of spacecraft, and 'spacecrafts' sounds funny). > > Anyway, they do an intercalibration between GOES launches, so that the next SXI's data is calibrated to be comparable to the previous SXI ... and it seems that the calibration factor they were using to get to physical units (W/m^2) had actually been wrong ... the later instruments were calibrated correctly on the ground, and it was original ones that should've been adjusted. > > And it was by a factor of about 20-30%. > > They threw out the question to the scientists of how to deal with it -- go back and reclassify all flares? (an M8 would now be an X1), change the definition of the flare class (X would now be > 8*10^-3 vs. 10^-4), or something else? I don't know if a decision was ever made, but if groups are using two different scales for classifying flares, it could be messy. > > > >> I especially liked John's "subtle characteristics," especially the one about data being annotatable. Given that the quality of any particular data set varies depending on the use to which you'd like to put it to and that the data originator rarely is in a position to know all of the potential uses and users of their data, capturing the annotations of users is often the only way to start capturing information about the utility of the data to audiences other than the original producer. > > What? We're supposed to actually *test* the backups? 
> > And then you're going to tell me that it's a problem when it takes 15 hrs to retrieve a day's worth of data from tape, just because we need to re-calibrate the first 9 months of data from the mission, and someone accidentially flushed the raw data from disk. > > (Hmm ... now I just have to figure out how to get the scientists to actually look at these lists before they build data systems ... catalogs can generally be cleaned up after the fact, but data systems not so much) > > ... > > But for the annotation -- in a way, the various 'catalogs' that I deal with are annotations, but we have some really subtle issues that basically correlate to some of the problems in data citation: > > What am I annotating? > (a) The world as observed in this data (eg, X2 flare) > (b) The observation (eg, partially obscurred by clouds) > (c) The observation as it exists on disk (eg, blocks lost in > transfer; partial image) > (d) The calibrated edition of data (eg, notes on oddities in the > data) > (e) The file on disk (eg, invalid checksum; possible corruption) > (f) The instrument (eg, a discontinuity due to servicing, or even > the lack of data for a time period) and possibly all of the above and more! > > It's possible that some of these might 'trickle down'. (eg, if I'm annotating the calibrated form, I'm also indirectly annotating the observation and the state of the world ... I saw an X2 flare based on this set of calibrated images ... which means that I'm asserting that there's an X2 flare that might've been visible by other instruments observing that region at that time if they had similar observing characteristics) > > And I mention data citation for a few reasons: > > 1. Citation should be a type of annoation; if someone is later > browsing the data, we should be able to tell them what papers > have been published using that data, so they can avoid > duplicating work or identify collaborators for additional > analysis. 
Actually the USGCRP paper includes citations as well as a host of technical documentation - while the paper is long the list inside the paper is relatively short: "Instrument / sensor characteristics including pre-flight or pre-operational performance measurements (e.g., spectral response, noise characteristics, etc.) Instrument / sensor calibration data and method; Processing algorithms and their scientific basis, including complete description of any sampling or mapping algorithm used in the creation of the product (e.g. contained in peer reviewed papers, in some cases supplemented by thematic information introducing the data set or product to scientists unfamiliar with it); Complete information on any ancillary data or other data sets used in generation or calibration of the data set or derived product; Processing history including versions of processing source code corresponding to versions of the data set or derived product held in the archive; Quality assessment information; Validation record, including identification of validation data sets; Data structure and format, with definition of all parameters and fields; In the case of earth-based data, station location and any changes in location, instrumentation, controlling agency, surrounding land use and other factors that could influence the long-term record; A bibliography of pertinent Technical Notes and articles, including refereed publications reporting on research using the data set; Information received back from users of the data set or product." > > 2. Downloading should be a type of annotation. It allows a > researcher to easily identify what they had downloaded, so they > can then generate a record for citation. It can also be used > to generate periods/locations of interest in general, but even > anonymous, we get into some issues like Mike Brown's Haumea > incident [1]. 
Yes - in ESIP we've discussed this with the idea that a repository could create a citation for a user that referenced back to a specific set of files. The issue with that is that generally users wouldn't be citing all of the data they downloaded, but some fraction of it or more likely some fraction of a number of downloads. Just because it doesn't work perfectly for citation generation doesn't mean that it isn't a good idea in general though... > > 3. But if we know who's downloaded the data, we can inform > researchers if we've identified problems with the data and/or > a recalibration run, so they don't get caught unaware when it > happens after they've downloaded the data, but before they've > submitted their research paper. Agreed... > > [1] Another researcher published that they had discovered the 'dwarf' > planet, but it was later discovered they had downloaded the observing > logs, and determined where Brown had been looking : > http://www.nytimes.com/2005/09/13/science/space/13plan.html > > > > -Joe -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael at psu.edu Wed Apr 27 12:21:37 2011 From: michael at psu.edu (Michael J. Giarlo) Date: Wed, 27 Apr 2011 12:21:37 -0400 Subject: [Rdap] What makes an 'Archive Quality' Digital Object? In-Reply-To: References: <4DB6F8B5.90106@psu.edu> Message-ID: <4DB84291.7070300@psu.edu> On 04/26/2011 03:30 PM, Joe Hourcle wrote: > >> >> The first bit I have trouble wrapping my mind around is that the notion >> of archival quality is binary, that an object can be of archival quality >> or not. > > Maybe I should be asking the opposite -- > > What would make a digital object *not* of archival quality? I'd respond the same. 
:) I couldn't confidently say whether an object is archival or not archival w/o a lot more context along the dimensions I mentioned (use, file formats, retention period, etc.), and even then I would be inclined not to say it's archival or not but rather give a qualified judgment on *how* likely I think that object will be around in N years. > I guess what I'm really looking for is something like TRAC or Data Seal of > Approval, but instead of guidelines for the repository, focuses on the > qualities of a good self-documenting file or package that could later be > ingested by an archive for them to maintain. Nothing comes to mind, but I'd be interested in hearing examples of the same. I guess this might be a good place to bring up what may be a related technology: """ Data Format Description Language (DFDL, often pronounced daff-o-dil) is a modeling language from the Open Grid Forum for describing general text and binary data. A DFDL model or schema allows any text or binary data to be read (or "parsed") from its native format and to be presented as an instance of an information set. The same DFDL schema also allows data to be taken from an instance of an information set and written out (or "serialized") to its native format. """ -- http://en.wikipedia.org/wiki/Data_Format_Description_Language I haven't had the occasion to use DFDL -- and I tend to shy away from huge blobs of XML and XML Schema -- but it could have some promise in this area. Has anyone else used DFDL or thought about it in the context Joe brings up? Btw, Joe, I appreciate your catalog checklist and the four characteristics you shared. Good starting point. -Mike From rduerr at nsidc.org Wed Apr 27 12:41:37 2011 From: rduerr at nsidc.org (Ruth Duerr) Date: Wed, 27 Apr 2011 10:41:37 -0600 Subject: [Rdap] What makes an 'Archive Quality' Digital Object? 
In-Reply-To: <4DB84291.7070300@psu.edu> References: <4DB6F8B5.90106@psu.edu> <4DB84291.7070300@psu.edu> Message-ID: <6E3876EF-D04A-48BF-9DF3-753838D8E303@nsidc.org> On Apr 27, 2011, at 10:21 AM, Michael J. Giarlo wrote: > On 04/26/2011 03:30 PM, Joe Hourcle wrote: >> >>> >>> The first bit I have trouble wrapping my mind around is that the notion >>> of archival quality is binary, that an object can be of archival quality >>> or not. >> >> Maybe I should be asking the opposite -- >> >> What would make a digital object *not* of archival quality? > > I'd respond the same. :) I couldn't confidently say whether an object > is archival or not archival w/o a lot more context along the dimensions > I mentioned (use, file formats, retention period, etc.), and even then I > would be inclined not to say it's archival or not but rather give a > qualified judgment on *how* likely I think that object will be around in > N years. > >> I guess what I'm really looking for is something like TRAC or Data Seal of >> Approval, but instead of guidelines for the repository, focuses on the >> qualities of a good self-documenting file or package that could later be >> ingested by an archive for them to maintain. > > Nothing comes to mind, but I'd be interested in hearing examples of the > same. > > I guess this might be a good place to bring up what may be a related > technology: > > """ > Data Format Description Language (DFDL, often pronounced daff-o-dil) is > a modeling language from the Open Grid Forum for describing general text > and binary data. A DFDL model or schema allows any text or binary data > to be read (or "parsed") from its native format and to be presented as > an instance of an information set. The same DFDL schema also allows data > to be taken from an instance of an information set and written out (or > "serialized") to its native format. 
> """ -- http://en.wikipedia.org/wiki/Data_Format_Description_Language > > I haven't had the occasion to use DFDL -- and I tend to shy away from > huge blobs of XML and XML Schema -- but it could have some promise in > this area. Has anyone else used DFDL or thought about it in the context > Joe brings up? > I think THG looked at DFDL (along with a whole host of other such schemas) when they started working on developing human/machine readable maps to the content of HDF4 files. They ended up developing their own schema based loosely on PREMIS. I should note that the schema and map creating software is nearing final release and NASA EOSDIS plans on mapping each HDF4 file in the archives of their Earth Science Data Centers. > Btw, Joe, I appreciate your catalog checklist and the four > characteristics you shared. Good starting point. > > -Mike > _______________________________________________ > Rdap mailing list > Rdap at mail.asis.org > http://mail.asis.org/mailman/listinfo/rdap From oneiros at grace.nascom.nasa.gov Wed Apr 27 12:54:36 2011 From: oneiros at grace.nascom.nasa.gov (Joe Hourcle) Date: Wed, 27 Apr 2011 12:54:36 -0400 (EDT) Subject: [Rdap] What makes an 'Archive Quality' Digital Object? In-Reply-To: References: <4DB6F8B5.90106@psu.edu> <2F49D923-8FA7-4B98-9674-BE09EC933A43@ucsd.edu> <2AD5543E-D5D3-455F-B555-152205439895@nsidc.org> Message-ID: On Wed, 27 Apr 2011, Ruth Duerr wrote: > On Apr 27, 2011, at 8:16 AM, Joe Hourcle wrote: >> On Tue, 26 Apr 2011, Ruth Duerr wrote: [trimmed] >> And I mention data citation for a few reasons: >> >> 1. Citation should be a type of annoation; if someone is later >> browsing the data, we should be able to tell them what papers >> have been published using that data, so they can avoid >> duplicating work or identify collaborators for additional >> analysis. 
> Actually the USGCRP paper includes citations as well as a host of technical documentation - while the paper is long the list inside the paper is relatively short:

[trimmed]

And this is why we need the inter-discipline groups ... I hadn't even heard of this report.

>> 2. Downloading should be a type of annotation. It allows a researcher to easily identify what they had downloaded, so they can then generate a record for citation. It can also be used to generate periods/locations of interest in general, but even anonymous, we get into some issues like Mike Brown's Haumea incident [1].
>
> Yes - in ESIP we've discussed this with the idea that a repository could create a citation for a user that referenced back to a specific set of files. The issue with that is that generally users wouldn't be citing all of the data they downloaded, but some fraction of it or more likely some fraction of a number of downloads. Just because it doesn't work perfectly for citation generation doesn't mean that it isn't a good idea in general though...

Exactly ... it's better than nothing, which is what we have right now.

The other thing that my group has discussed is that if there were standards for identification (back to the archive & record within the archive), tools could be written to process the downloaded files, and either:

1. Generate an appropriate citation document (listing each file individually, using ORE, metalink or similar), or guide the researcher to generate a more concise citation such as the IPY standard.

2. Pass the identifiers back to the archive to ask if any of them have been deprecated, and automatically start retrieving the replacement.

3. Check with annotation services to see if there's any new information about the data.

If the tool's built to support plugins for different file formats & we had standards for the archives & annotation services, it could probably be generic enough to support any discipline.
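
A rough sketch of what items 1 and 2 might look like, assuming every downloaded file carries an identifier that points back to a record in the archive, and that the archive can say whether a record has been deprecated. Nothing here is an existing tool or standard: RecordStatus, find_replacements, citation_listing, the identifiers, and the shape of the archive's answer are all hypothetical, and a dictionary stands in for what would really be a query to the archive.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RecordStatus:
    """What a (hypothetical) archive reports about one identifier."""
    deprecated: bool
    replacement: Optional[str] = None  # identifier of the newer edition, if any


def find_replacements(downloaded: List[str],
                      archive_status: Dict[str, RecordStatus]) -> Dict[str, str]:
    """Map each deprecated identifier to its replacement, when one is known.

    Identifiers the archive does not recognise are skipped rather than guessed at.
    """
    replacements = {}
    for identifier in downloaded:
        status = archive_status.get(identifier)
        if status is not None and status.deprecated and status.replacement:
            replacements[identifier] = status.replacement
    return replacements


def citation_listing(archive_name: str, downloaded: List[str]) -> str:
    """Build a per-file citation listing: one identifier per line, de-duplicated."""
    lines = [f"Data retrieved from {archive_name}:"]
    lines.extend(f"  - {identifier}" for identifier in sorted(set(downloaded)))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up identifiers and archive answers, for illustration only.
    status = {
        "example:rec/001": RecordStatus(deprecated=True, replacement="example:rec/001.v2"),
        "example:rec/002": RecordStatus(deprecated=False),
    }
    files = ["example:rec/002", "example:rec/001"]
    print(citation_listing("Example Archive", files))
    print(find_replacements(files, status))
```

Listing every file individually is the simplest case; a more concise citation, or a check against annotation services (item 3), would need the standards mentioned above before it could be written generically.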
(and so I have free time to do it, I need to stop procrastinating by replying to e-mail, and finish up some stuff to support a meeting next week) -Joe From mdawnweaver at gmail.com Wed Apr 27 00:24:36 2011 From: mdawnweaver at gmail.com (Misty Melissa Weaver) Date: Tue, 26 Apr 2011 21:24:36 -0700 Subject: [Rdap] RDAP presentations on slideshare Message-ID: Hello all, We recently added another poster session from Monica Omodei and Jeremy Frumkin to the RDAP11 slideshare: http://www.slideshare.net/asist_org/ Please do visit and share if you haven't seen the share space already. If you have submissions or comments, please contact me at mweaver at asis.org. Thank you, Misty Melissa Weaver -------------- next part -------------- An HTML attachment was scrubbed... URL: From oneiros at grace.nascom.nasa.gov Wed Apr 27 13:53:12 2011 From: oneiros at grace.nascom.nasa.gov (Joe Hourcle) Date: Wed, 27 Apr 2011 13:53:12 -0400 (EDT) Subject: [Rdap] What makes an 'Archive Quality' Digital Object? In-Reply-To: <4DB84291.7070300@psu.edu> References: <4DB6F8B5.90106@psu.edu> <4DB84291.7070300@psu.edu> Message-ID: On Wed, 27 Apr 2011, Michael J. Giarlo wrote: > On 04/26/2011 03:30 PM, Joe Hourcle wrote: >> >>> >>> The first bit I have trouble wrapping my mind around is that the notion >>> of archival quality is binary, that an object can be of archival quality >>> or not. >> >> Maybe I should be asking the opposite -- >> >> What would make a digital object *not* of archival quality? > > I'd respond the same. :) I couldn't confidently say whether an object > is archival or not archival w/o a lot more context along the dimensions > I mentioned (use, file formats, retention period, etc.), and even then I > would be inclined not to say it's archival or not but rather give a > qualified judgment on *how* likely I think that object will be around in > N years. Perhaps we need to treat it more like IT security ... 
A password of (x) length and (y) complexity is expected to be good for about (z) time ... but as new attacks are found, (z) might decrease.

... and come up with recommendations for different time spans (if you want it to last (x) time, you should ...) or different levels of effort (if you only do (y), it'll likely be unreadable within (z) time.)

[trimmed]

> I guess this might be a good place to bring up what may be a related technology:

[trimmed]

> """ -- http://en.wikipedia.org/wiki/Data_Format_Description_Language
>
> I haven't had the occasion to use DFDL -- and I tend to shy away from huge blobs of XML and XML Schema -- but it could have some promise in this area. Has anyone else used DFDL or thought about it in the context Joe brings up?

I hadn't seen that one. There's been a discussion by the SPASE group [1] about the need for documenting ASCII files, and I had brought up DSPL:

http://code.google.com/apis/publicdata/

They had the problem though that it seemed tightly tied to the individual file to be processed, rather than having it link to a collection of files, or have the data file link back to the description (or more than one description, in different languages, but then we start getting into the realm of self-documenting formats)

And they seemed more geared towards ASCII, while it sounds like DFDL might be able to describe more complex stuff, like FITS & VOTable.

(when I was working on catalogs, I started on a parser to use ExtJS to display the contents of VOTable, but dropped it when they kept giving different responses regarding the change to ExtJS's license from BSD to LGPL to GPL, and my unwillingness to spend 2+ years going through NASA's legal counsel ... and there were some concerns about section 508 support)

[1] Space Physics Archive Search & Extract; http://www.spase-group.org/

-Joe

(yes, yes, the not reading e-mail thing isn't working).

From michael at psu.edu Wed Apr 27 14:10:41 2011 From: michael at psu.edu (Michael J.
Giarlo) Date: Wed, 27 Apr 2011 14:10:41 -0400 Subject: [Rdap] What makes an 'Archive Quality' Digital Object? In-Reply-To: References: <4DB6F8B5.90106@psu.edu> <4DB84291.7070300@psu.edu> Message-ID: <4DB85C21.2020009@psu.edu> On 04/27/2011 01:53 PM, Joe Hourcle wrote: > > Perhaps we need to treat it more like IT security ... > > A password of (x) length and (y) complexity is expected > to be good for about (z) time ... but as new attacks are > found, (z) might decrease. > > ... and come up with recommendations for different time spans (if you want > it to last (x) time, you should ...) or different levels of effort (if you > only do (y), it'll likely be unreadable within (z) time.) That's the approach that fits best, to my mind. > And they seemed more geared towards ASCII, while it sounds like DFDL might > be able to describe more complex stuff, like FITS& VOTable. Indeed, those seem to be the sorts of use cases DFDL was designed for -- again noting that I have not yet used DFDL (or Daffodil or Defuddle or any of the software in that space). > (yes, yes, the not reading e-mail thing isn't working). We hadn't noticed. ;) -Mike From cragin at illinois.edu Wed Apr 27 16:12:44 2011 From: cragin at illinois.edu (Cragin, Melissa H) Date: Wed, 27 Apr 2011 15:12:44 -0500 Subject: [Rdap] Deciding when a Data Management Plan is not required? In-Reply-To: <6679CD8702498947A68B23D55773733203342F@GRANT.eservices.virginia.edu> References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu> <8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu> <6679CD8702498947A68B23D55773733203342F@GRANT.eservices.virginia.edu> Message-ID: Hi, In seeing this bit that Sherry posted, I re-visited other NSF materials, and find that I need to correct what I wrote a couple of days ago (My apologies for not sorting this out prior to posting my response.) While the requirements expressed by the Directorates vary to some extent, there is an FAQ posted by NSF that seems to apply across Directorates. It states: "2. 
Is a plan for Data Management required if my project is not expected to generate data or samples? Yes. It is acceptable to state in the Data Management Plan that the project is not anticipated to generate data or samples that require management and/or sharing. PIs should note that the statement will be subject to peer review." (http://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp#2) Jan Cheetham's post on sorting out the specifics of how such a meeting will be documented and what will be made public (and where) is very helpful advice. Melissa From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of Lake, Sherry (sah) Sent: Friday, April 22, 2011 3:33 PM To: Research Data, Access and Preservation Subject: Re: [Rdap] Deciding when a Data Management Plan is not required? As for the particular question asked in this e-mail thread, maybe page 4 of the Education & Human Resources (EHR) Directorate DMP guidelines (http://www.nsf.gov/bfa/dias/policy/dmpdocs/ehr.pdf) will help. This section of the guidelines lists examples for EHR proposals to follow: 1. A proposal for a workshop that will result in a workshop report. a. The DMP could consist of a statement to the effect that a workshop report will be produced and disseminated, e.g., via a website, publication in a journal, or other means. Institute of Museum & Library Services (IMLS) is a little clearer on when a DMP is required. The DMP requirements are part 3 of the section "Specification for Projects that Develop Digital Products". So I assume, for IMLS, no digital products, no DMP required. I wish the NSF would give a little more guidance as to what they want in a DMP. They could then tell us which proposals do not require one. I think the specific solicitation should have that information, like Melissa said.
-- Sherry Lake shlake at virginia.edu Scientific Data Consultant Brown Science and Engineering Library University of Virginia * * * * * * * * * * * * * * * * * "A library is not a luxury but one of the necessities of life." --- Henry Ward Beecher * * * * * * * * * * * * * * * * * From: rdap-bounces at asis.org [mailto:rdap-bounces at asis.org] On Behalf Of John Graybeal Sent: Friday, April 22, 2011 12:56 AM To: Research Data, Access and Preservation Cc: rdap at mail.asis.org Subject: Re: [Rdap] Deciding when a Data Management Plan is not required? Here's the way I think about this kind of thing: 1) You have done useful work by the time you finish (presumably). 2) To the extent the useful work was based on discovered, collected, observed, or otherwise modeled data, the conclusions are likely to depend on those inputs. 3) If someone wants to evaluate your conclusions in light of your inputs, would they be able to do that? If the basis is entirely on other papers, then those would presumably be cited in the report. If it is strictly a thought process of a single group, then the report is the data. But if the basis is on brainstorming ideas from multiple groups, or having people in the meeting each generate their own inputs, which are then collated and massaged, or it relied on results that were on-line and might be different tomorrow -- then it would be a 'best practice' to maintain the original raw materials in a repository (say the web site where your work is managed, if any) that others could inspect. (They might validate your group's wisdom, or find great wisdom that your group missed.) In these cases, I would say a very short Data Management Plan would be worth including. john On Apr 21, 2011, at 13:43, Aletia Morgan wrote: Greetings - I'm working with PIs on reviewing DMP documents that are part of NSF proposals, and I am wondering whether a DMP is needed.
The project involves staging a meeting with researchers in the discipline, with a goal that includes the development of plans for future research. Obviously, there's no experimental data here, and the output is essentially a narrative document. The PI asserts that "This proposal does not seek to collect environmental data. A data management plan is not required." If a conference is being organized, notes are taken, documents are being written. Is there a need to say how these records will be developed and preserved? Or am I being too compulsive? Thanks for any thoughts, and if there might be a better place to ask this question. Regards, Aletia ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Aletia Morgan Research Application Designer Office of the Vice President for Research and Graduate & Professional Education Rutgers, The State University of New Jersey 715 CoRE Building, Busch Campus ahmorgan at vpr.rutgers.edu 732-445-3344 _______________________________________________ Rdap mailing list Rdap at mail.asis.org http://mail.asis.org/mailman/listinfo/rdap ---------------- John Graybeal phone: 858-534-2162 Product Manager Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org Marine Metadata Interoperability Project: http://marinemetadata.org From ahmorgan at vpr.rutgers.edu Thu Apr 28 09:03:23 2011 From: ahmorgan at vpr.rutgers.edu (Aletia Morgan) Date: Thu, 28 Apr 2011 09:03:23 -0400 (EDT) Subject: [Rdap] Deciding when a Data Management Plan is not required? In-Reply-To: References: <005101cc0065$42cd9d10$c868d730$@rutgers.edu> <8012F251-4B81-4537-91E6-E2F5584439CA@ucsd.edu> <6679CD8702498947A68B23D55773733203342F@GRANT.eservices.virginia.edu> Message-ID: <004001cc05a5$2c813630$8583a290$@rutgers.edu> Good morning! I appreciate everyone's comments - especially seeing that some of the instructions from the different directorates are different.
And Sherry, I am in full agreement with you that a lot of the written requirements are just too vague at this point - I hope that NSF will post updates in the not-too-distant future. I think this kind of clarification of intent will continue to be a topic for some time. Thanks, Aletia ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Aletia Morgan Research Application Designer Office of the Vice President for Research and Graduate & Professional Education Rutgers, The State University of New Jersey 715 CoRE Building, Busch Campus ahmorgan at vpr.rutgers.edu 732-445-3344