[Rdap] What makes an 'Archive Quality' Digital Object?
Ruth Duerr
rduerr at nsidc.org
Tue Apr 26 23:59:15 EDT 2011
I like both Joe's and John's lists - pretty darn comprehensive. One thing I noticed is that while calibration was explicitly listed in John's list, neither list explicitly discussed validation as a step beyond calibration, though I suppose some of the other items on each list might cover that concept. Validation is particularly important with remote sensing data, since knowing that a sensor is calibrated to some precision may say nothing about how accurately it is actually measuring a given physical parameter - ground truth is often needed to judge that.
I especially liked John's "subtle characteristics," particularly the one about data being annotatable. The quality of any particular data set varies with the use to which you'd like to put it, and the data originator is rarely in a position to know all of the potential uses and users of their data. Capturing users' annotations is therefore often the only way to start recording how useful the data are to audiences other than the original producer.
Ruth
On Apr 26, 2011, at 5:06 PM, John Graybeal wrote:
> That's a neat catalog checklist!
>
> I wrote something analogous a while back from the data management system perspective; while some of the items in mine are about the data system, many are about the data and metadata:
> http://marinemetadata.org/blogs/graybeal/the-good-enough-data-system
>
> Something Michael said about file formats reminded me of the physical storage analog, tapes/CDs/DVDs/etc.
>
> Two lessons from that progression:
> - You don't want your storage mechanism to become corrupt or inaccessible. In a data context, the rough equivalent would be a binary format that no software can parse, or content that isn't reproducible because people or systems have made changes that transform it. In hardware, the answer is to pick a widely used format (CD, not 8-track or 9-track tape!) that won't quickly degrade (curation quality, maybe?). In software, it means picking a format that is well known, as widely used as possible, and ideally represented, or representable, in a human-parseable way.
> - Rewriting physical media is a given when trying to manage long-term storage of computer bits onto physical media. Rewriting digital objects will need to be a given when trying to manage long-term storage of them in a way that is recoverable and actionable later. If you have an audio file today in some obscure digital format, you need to rewrite it into a well-known format for audio data. Ideally this transformation is lossless (lossless by design, even). If you have a scientific data artifact that you really want to save, netCDF is good, but the ASCII representation might be better. And so on for databases, movies, office documents (RTF better than Word), etc.
>
> We may have to distinguish between: archival for long-term computer access, assuming ongoing maintenance of the archive; archival for survivability of the digital information content, assuming the original archive itself may not still be around in 20 years; and archival for human posterity, when computers may not be around. Different technological choices are appropriate in each case.
>
> John
>
> On Apr 26, 2011, at 12:30, Joe Hourcle wrote:
>
>> And then for some reason, as I was writing this e-mail, I realized I may
>> already have something close to what I'm looking for, but it probably
>> needs to be more explicit in some regards:
>>
>> http://sdac.virtualsolar.org/catalogs/catalog_checklist
>>
>> I wrote this checklist to give to scientists who are maintaining 'science
>> catalogs' (basically, lists of ... something). In solar physics, they
>> tend to be stored & distributed in ASCII text files, HTML tables, or MS
>> Excel files. Sometimes, they're distributed as PDF files or FITS tables.
>>
>> -Joe
>> _______________________________________________
>> Rdap mailing list
>> Rdap at mail.asis.org
>> http://mail.asis.org/mailman/listinfo/rdap
>
>
> ----------------
> John Graybeal <mailto:jgraybeal at ucsd.edu> phone: 858-534-2162
> Product Manager
> Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
> Marine Metadata Interoperability Project: http://marinemetadata.org
>