[Rdap] What makes an 'Archive Quality' Digital Object?

Wed Apr 27 12:54:36 EDT 2011

On Wed, 27 Apr 2011, Ruth Duerr wrote:
> On Apr 27, 2011, at 8:16 AM, Joe Hourcle wrote:
>> On Tue, 26 Apr 2011, Ruth Duerr wrote:

[trimmed]

>> And I mention data citation for a few reasons:
>>
>> 	1. Citation should be a type of annotation;  if someone is later
>> 	   browsing the data, we should be able to tell them what papers
>> 	   have been published using that data, so they can avoid
>> 	   duplicating work or identify collaborators for additional
>> 	   analysis.

> Actually the USGCRP paper includes citations as well as a host of 
> technical documentation - while the paper is long the list inside the 
> paper is relatively short:

[trimmed]

And this is why we need the inter-discipline groups ... I hadn't even 
heard of this report.

>> 	2. Downloading should be a type of annotation.  It allows a
>> 	   researcher to easily identify what they had downloaded, so they
>> 	   can then generate a record for citation.  It can also be used
>> 	   to generate periods/locations of interest in general, but even
>> 	   anonymous, we get into some issues like Mike Brown's Haumea
>> 	   incident [1].
>
> Yes - in ESIP we've discussed this with the idea that a repository could 
> create a citation for a user that referenced back to a specific set of 
> files.  The issue with that is that generally users wouldn't be citing 
> all of the data they downloaded, but some fraction of it or more likely 
> some fraction of a number of downloads.  Just because it doesn't work 
> perfectly for citation generation doesn't mean that it isn't a good idea 
> in general though...

Exactly ... it's better than nothing, which is what we have right now.

The other thing my group has discussed is that, if there were 
standards for identification (back to the archive & record within the 
archive), tools could be written to process the downloaded files, and
either:

 	1. Generate an appropriate citation document (listing each file
 	   individually, using ORE, metalink or similar), or guide the
 	   researcher to generate a more concise citation such as the IPY
 	   standard.

 	2. Pass the identifiers back to the archive to ask if any of them
 	   have been deprecated, and automatically start retrieving the
 	   replacement.

 	3. Check with annotation services to see if there's any new
 	   information about the data.

If the tool were built to support plugins for different file formats, and 
we had standards for the archives & annotation services, it could probably 
be generic enough to support any discipline.
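
A rough sketch of how such a tool might be laid out, in Python.  Everything
here is an assumption for illustration only: no identification standard
exists yet, so the DataID pair (archive, record), the ".txt" toy format, the
plugin registry and the lookup callable standing in for an archive's
deprecation service are all hypothetical names, not anyone's real API.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List, NamedTuple, Optional


class DataID(NamedTuple):
    """Hypothetical identifier: which archive, and which record within it."""
    archive: str
    record: str


# Plugin registry: file suffix -> function that extracts a DataID from a file.
Extractor = Callable[[Path], Optional[DataID]]
PLUGINS: Dict[str, Extractor] = {}


def register(suffix: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers a format plugin for one file suffix."""
    def wrap(func: Extractor) -> Extractor:
        PLUGINS[suffix.lower()] = func
        return func
    return wrap


@register(".txt")
def toy_text_plugin(path: Path) -> Optional[DataID]:
    # Toy format (assumed): the first line reads "archive-id/record-id".
    lines = path.read_text().splitlines()
    first = lines[0].strip() if lines else ""
    archive, sep, record = first.partition("/")
    if sep and archive and record:
        return DataID(archive, record)
    return None


def identify(paths: Iterable[Path]) -> List[DataID]:
    """Run the matching plugin on each file; skip files nobody understands."""
    found = []
    for path in paths:
        plugin = PLUGINS.get(path.suffix.lower())
        ident = plugin(path) if plugin else None
        if ident is not None:
            found.append(ident)
    return found


def citation_listing(ids: Iterable[DataID]) -> str:
    """Option 1: list every record, grouped by archive, one archive per line."""
    by_archive = defaultdict(list)
    for ident in ids:
        by_archive[ident.archive].append(ident.record)
    return "\n".join(
        f"{archive}: {', '.join(sorted(by_archive[archive]))}"
        for archive in sorted(by_archive)
    )


def deprecated(
    ids: Iterable[DataID],
    lookup: Callable[[DataID], Optional[DataID]],
) -> Dict[DataID, DataID]:
    """Option 2: ask the archive (via lookup) which records were replaced."""
    replaced = {}
    for ident in ids:
        replacement = lookup(ident)
        if replacement is not None:
            replaced[ident] = replacement
    return replaced
```

Supporting another file format would then just mean registering another
plugin for its suffix; the citation and deprecation steps only ever see the
identifiers, which is what would let the same tool work across disciplines.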

(and so I have free time to do it, I need to stop procrastinating by 
replying to e-mail, and finish up some stuff to support a meeting next 
week)

-Joe