[Rdap] What makes an 'Archive Quality' Digital Object?

John Graybeal jgraybeal at ucsd.edu
Tue Apr 26 19:06:39 EDT 2011


That's a neat catalog checklist! 

I wrote something analogous a while back from the data management system perspective. Some of the items in mine are about the data system, but many are about the data and metadata:
  http://marinemetadata.org/blogs/graybeal/the-good-enough-data-system

Something Michael said about file formats reminded me of the physical storage analog, tapes/CDs/DVDs/etc.

Two lessons from that progression:
 - You don't want your storage mechanism to become corrupt or inaccessible. In a data context, the rough equivalent would be a binary format that no software can parse, or content that isn't reproducible because people or systems have made changes that transform it.  In hardware, the answer is to pick a widely used format (CD, not 8-track or 9-track tape!) that won't quickly degrade (curation quality, maybe?).  In software, it means picking a format that is well known, as widely used as possible, and ideally is represented, or representable, in a human-parseable way.
 - Rewriting physical media is a given when trying to manage long-term storage of computer bits on physical media.  Rewriting digital objects will need to be a given when trying to manage long-term storage of them in a way that is recoverable and actionable later.  If you have an audio file today in some obscure digital format, you need to rewrite it into a well-known format for audio data.  Ideally this transformation is lossless (lossless by design, even).  If you have a scientific data artifact that you really want to save, netCDF is good, but the ASCII representation might be better. And so on for databases, movies, office documents (RTF better than Word), etc.
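
To make the "lossless by design" point concrete, here is a minimal sketch in Python. It assumes the data is nothing more than a flat block of little-endian 64-bit floats, which is a made-up stand-in for a real format such as netCDF, and the function names are hypothetical. It rewrites the binary block as human-readable ASCII and then checks that the rewrite can be reversed bit for bit.

```python
import struct


def binary_to_ascii(blob: bytes) -> str:
    """Rewrite a block of little-endian float64 values as one number per line."""
    if len(blob) % 8 != 0:
        raise ValueError("expected a whole number of 8-byte floats")
    values = struct.unpack(f"<{len(blob) // 8}d", blob)
    # repr() gives the shortest decimal string that parses back to the
    # same finite float, so no digits are lost in the text form.
    return "\n".join(repr(v) for v in values) + "\n"


def ascii_to_binary(text: str) -> bytes:
    """Parse the one-number-per-line text back into the binary layout."""
    values = [float(line) for line in text.splitlines() if line.strip()]
    return struct.pack(f"<{len(values)}d", *values)


def is_lossless(blob: bytes) -> bool:
    """True if the binary -> ASCII -> binary round trip returns identical bytes."""
    return ascii_to_binary(binary_to_ascii(blob)) == blob


# Hypothetical sample data, assumed for illustration only.
original = struct.pack("<4d", 0.1, -273.15, 6.02214076e23, 1.0 / 3.0)
print(binary_to_ascii(original), end="")
print("lossless:", is_lossless(original))
```

The round-trip check is the useful part: if the ASCII were written with a fixed number of decimal places instead, the comparison would fail for most values, which is exactly the kind of silent loss to catch before the original is discarded.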

We may have to make a distinction between: archival for long-term computer access, assuming ongoing maintenance of the archive; archival for survivability of the digital information content, assuming the original archive itself may not still be around in 20 years; and archival for human posterity, when computers may not be around.  Different technological choices are appropriate in each case.

John

On Apr 26, 2011, at 12:30, Joe Hourcle wrote:

> And then for some reason, as I was writing this e-mail, I realized I may 
> already have something close to what I'm looking for, but it probably 
> needs to be more explicit in some regards:
> 
> 	http://sdac.virtualsolar.org/catalogs/catalog_checklist
> 
> I wrote this checklist to give to scientists who are maintaining 'science 
> catalogs' (basically, lists of ... something).  In solar physics, they 
> tend to be stored & distributed in ASCII text files, HTML tables, or  MS 
> Excel files.  Sometimes, they're distributed as PDF files or FITS tables.
> 
> -Joe
> _______________________________________________
> Rdap mailing list
> Rdap at mail.asis.org
> http://mail.asis.org/mailman/listinfo/rdap


----------------
John Graybeal    <mailto:jgraybeal at ucsd.edu>     phone: 858-534-2162
Product Manager
Ocean Observatories Initiative Cyberinfrastructure Project: http://ci.oceanobservatories.org
Marine Metadata Interoperability Project: http://marinemetadata.org










