[Rdap] saving server space in institutional repositories

Joe Hourcle oneiros at grace.nascom.nasa.gov
Tue Sep 5 21:25:35 EDT 2017



On Tue, 5 Sep 2017, Daureen Nesdill wrote:

> Hi Joe
>
> Thanks for your comments.
>
> Below in green

Um ... I'm using a plain text client (no risk of it opening attachments 
or executing javascript on me).  But I know what was mine.

> -----Original Message-----
> From: Joe Hourcle [mailto:oneiros at grace.nascom.nasa.gov]
>
>
>> On Tue, 5 Sep 2017, Daureen Nesdill wrote:
>>
>>> Hi
>>>
>>> In 2014 the Data Act was passed 
>>> https://www.usaspending.gov/Pages/data-act.aspx to increase 
>>> transparency and accountability in government. As a result states and 
>>> cities have been developing portals to their open data. Utah is one of 
>>> those states : https://utah.gov/digital/ https://opendata.utah.gov/. 
>>> They are looking for any data related to the state. Guess what? There 
>>> is a lot of research at the U of Utah related to the state - health, 
>>> environment, disaster relief and recovery, fire, land use, water 
>>> quality, etc.
>>>
>>> If all the data related to research about the state is hosted on state 
>>> servers at state expense, then the library does not have to host it 
>>> and save server space and save $$$$.
>>>
>>> Anyone else working with their state IT?

>> Before you shift everything to them, you should check to see who is 
>> considered responsible for the data if it was generated as part of a 
>> grant.  If it's the university, you'd probably want to keep a dark 
>> copy, just in case the state archives loses it.

> It is not the state archive but the IT department in the governor's 
> office. We actually have nothing in our repository - it is still in 
> beta.
>
> And yes we are looking into :
>
> The UU owns the data so will the UU allow researchers to give it away to 
> the state? (faculty senate?)

The way that you worded it, it sounds like you're just giving a copy to 
the state.  What you're proposing to do is ceding responsibility for its 
preservation to the state.  And I would *not* trust an IT department to do 
that.  State Archives would understand the implications, IT would not.

> Data generated from research performed on this campus must stay on 
> campus. So we give a second copy to the State and do not save $$$

Not necessarily true.  You could move the local copy to a lower class of 
storage (e.g. JBOD, to be restored from the state's copy should something 
go wrong).

> Do we need to get legal involved and draw up an agreement with the 
> state?

I would.  At the very least you need a Memorandum of Understanding, 
spelling out what each group is responsible for.  You may also want 
something like a Service Level Agreement, but those are usually for IT 
services where money's changing hands.  (The service provider specifies 
what sort of uptime guarantee & minimum level of service (bandwidth, etc.) 
will be available, or you don't have to pay for some period (month, week, 
etc.).)


> Contractual agreements with funders may indicate an entity other than 
> the UU owns the data.

And in those cases, you may not be able to transfer the data to another 
group (at all, much less give them responsibility for it), but you also 
may have restrictions on distribution (which, after having worked in IT for 
20+ years, I wouldn't trust to a run-of-the-mill IT department).

You can run into problems where the group generating the data obtained 
restricted data from some other group, and may only use it for a very 
narrow purpose.  This seems to come up more typically with non-US groups 
(where data *can* be copyrighted) as part of a data sharing agreement.

There can also be legal restrictions -- HIPAA for human research (use has 
to go through IRB approval), ITAR for certain types of physics and 
engineering data, etc.

Animal & ecology data can also be sensitive -- research on endangered 
species could be used by poachers.

Our group has the luxury of *only* distributing unrestricted data (and we 
won't accept restricted data), but part of that's because dealing with 
authentication & authorization at NASA requires a ton of extra hoops to 
jump through ... and if you're dealing with foreign nationals, it's even 
worse.


>> And I admit that it's been a while since I talked to anyone from the 
>> National Archive, but when they had the second release of 'data.gov', I 
>> was chatting with someone from there, and I remember that the amount of 
>> digital data that they were dealing with was orders of magnitude less 
>> than what our group did.  (and that's not even all of NASA).  I 
>> wouldn't  be surprised if the same was true for state archives.
>
> 23,000 datasets, but that's not the point. The state wants to grow their 
> open data repository.


Um ... that seems like they're looking for a reason to justify their 
existence.  What they really should be doing is two things :

1. A registry of data available that meets their inclusion criteria
2. A repository for organizations that don't have a suitable system to
    serve the data to the public and/or preserve it for the long term (or
    at least the time frame required by law)

One of the big problems is that government IT departments can be at the 
whim of politicians -- suddenly replaced by contractors ... or everything 
has to move to some new system (from a company owned by one of their 
campaign donors), etc.

And let's not forget the websites and databases that are now dark because 
of a change in administration and the attempts to scrub anything related 
to 'climate change'.

I would much rather have an archival group in charge of the authoritative 
record, rather than an IT department.


-Joe

ps. And after that incident w/ HTTPS-Only ... insert standard disclaimer
     about this being my own opinion and not that of my place of work ...
     although I've worked in government IT for 15+ years (state & fed
     levels), was an elected official for 6 years, and worked in university
     IT for ~7 years, so I do have some experience that influenced my
     response.


-----
Joe Hourcle
Programmer/Analyst
Solar Data Analysis Center
Goddard Space Flight Center


