[Rdap] What makes an 'Archive Quality' Digital Object?

Joe Hourcle oneiros at grace.nascom.nasa.gov
Tue Apr 26 15:30:05 EDT 2011



On Tue, 26 Apr 2011, Michael J. Giarlo wrote:

> On 04/22/2011 02:11 PM, Joe Hourcle wrote:
>>
>> As part of the side discussion about defining what a dataset is, an issue
>> got raised about what about a given object makes it of 'archive quality'?
>>
>
> That's a doozy, Joe!  Good question to ponder as a group.
>
> This strikes me as something of a glib non-answer, but here goes.
>
> The first bit I have trouble wrapping my mind around is that the notion
> of archival quality is binary, that an object can be of archival quality
> or not.

Maybe I should be asking the opposite --

What would make a digital object *not* of archival quality?

Although, even then, I can think of a few cases where something stored in
a proprietary file format, which needs proprietary software to be read,
which runs on hardware that's no longer manufactured ... is still
considered 'archived', because the archive either stored the hardware or
managed to emulate the machine and OS so they could run the software.

...

I guess what I'm really looking for is something like TRAC or the Data Seal
of Approval, but one that, instead of giving guidelines for the repository,
focuses on the qualities of a good self-documenting file or package that an
archive could later ingest and maintain.

In some ways, this is difficult, as without the associated archive, you
won't necessarily have a useful identifier scheme.  (To my mind, an
identifier without a place to register it is pretty useless, unless you're
just using it to try to show that two objects are likely not the same
thing.)

...

And then for some reason, as I was writing this e-mail, I realized I may 
already have something close to what I'm looking for, but it probably 
needs to be more explicit in some regards:

 	http://sdac.virtualsolar.org/catalogs/catalog_checklist

I wrote this checklist to give to scientists who are maintaining 'science 
catalogs' (basically, lists of ... something).  In solar physics, they 
tend to be stored & distributed in ASCII text files, HTML tables, or MS
Excel files.  Sometimes, they're distributed as PDF files or FITS tables.

It doesn't map perfectly onto what's ideal for archiving ... but it
does get into some of the issues of what file formats you're using, and
what sort of documentation you should be providing, although some of it is
catalog-specific.

...

And, to spur discussion: as I said, this came up because of another
conversation.  Here was my response there, before I threw the question out
to this group:

> > My intent [] is to answer a very common question, which is what does
> > "Archive Quality" mean when it comes to digital data.  It seems it's
> > always two sides of the same coin: Preservation and Usability.
> 
> I thought that was what you were going for ... I'd say that to be of
> 'archive quality', an object must:
>
>     1. Be in a format that's well documented, with the standard
>        freely available, such that software to read the files could be
>        re-implemented from only the documentation.
>
>     2. Have enough metadata attached to the object such that it can be
>        identified, and the necessary scientific metadata for use can
>        be obtained.  (the science metadata might be stored in a
>        separate catalog, but then the object would need to identify
>        the catalog and the necessary foreign keys into the catalog)
>
>     3. As part of that 'scientific metadata', identify what collection
>        it's a part of, for purposes of retrieving documentation on how
>        to use the data.
>
>     4. Have some way of identifying whether the data has been corrupted
>        since it was originally stored.  (I won't get into how
>        sensitive it needs to be: whether just detecting that corruption
>        occurred is enough, or whether you have to be able to track down
>        which discrete datum is bad.)
> 
> There are other characteristics which are more about how it's being 
> stored than about the object itself.  (e.g., don't lose it, make sure it's
> actually available for use)
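
Points 2 through 4 above can be sketched as a small 'sidecar' record that
travels with the object.  This is only an illustration, not an existing
standard or anyone's actual practice: the field names, the example
identifier, the catalog URL, the catalog key and the tiny 4-byte block size
are all hypothetical, and SHA-256 is just one reasonable choice of digest.
Per-block digests are one way to narrow down *where* corruption occurred
rather than only detecting that it happened.

```python
import hashlib
import json

BLOCK_SIZE = 4  # hypothetical; kept tiny so the demo is easy to follow


def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def make_sidecar(data, identifier, collection, catalog, catalog_keys,
                 block_size=BLOCK_SIZE):
    """Build a metadata record to store alongside the object."""
    return {
        "identifier": identifier,      # point 2: lets the object be identified
        "collection": collection,      # point 3: where to find documentation
        "catalog": catalog,            # point 2: external science metadata
        "catalog_keys": catalog_keys,  # point 2: foreign keys into the catalog
        "length": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # point 4: whole object
        "block_size": block_size,
        "block_sha256": block_digests(data, block_size),  # point 4: per block
    }


def damaged_blocks(data, sidecar):
    """Return indexes of blocks that no longer match the sidecar."""
    current = block_digests(data, sidecar["block_size"])
    stored = sidecar["block_sha256"]
    longest = max(len(current), len(stored))
    # A block counts as damaged if it is missing on either side or differs.
    return [
        i for i in range(longest)
        if i >= len(current) or i >= len(stored) or current[i] != stored[i]
    ]


# Demo with made-up values.
original = b"2011-04-26 flare"  # 16 bytes, so four 4-byte blocks
sidecar = make_sidecar(
    original,
    identifier="example-0001",
    collection="example-flare-list",
    catalog="https://example.org/catalog",
    catalog_keys={"event_id": 42},
)

# The record round-trips through JSON, a documented plain-text format.
restored = json.loads(json.dumps(sidecar, indent=2))

corrupted = original[:9] + b"X" + original[10:]  # damage byte 9 (block 2)
print(damaged_blocks(original, restored))   # []
print(damaged_blocks(corrupted, restored))  # [2]
```

Smaller blocks localize damage more precisely at the cost of a larger
record; how fine-grained that needs to be is the same open question as in
point 4.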



-Joe


