Bob has led many large projects, including the study of packrat ‘middens’ (up to 25,000 years old!), which contain plant parts and other debris used to establish a chronology of climate change. Bob is not planning to retire soon, so the discussion was more ‘general’, with comments about data, projects, etc.
Comments and Questions –
1) General comment – ‘When I first saw this document, I thought there was no way most scientists in this building would want to fill this out… especially if they are on their way out the door.’ The document is too long and redundant. Discussion continued about how an ‘interview’ would be more appropriate when a person is leaving the USGS.
2) Define ‘Data’ – Discussion about the amount of data (8 terabytes) that a student working on this project in Oregon is responsible for! New versions are created quite often, so at what point are the ‘data’ worth documenting with metadata? Often a ‘subset’ of a database is used to run an analysis.
3) Metadata – This project team does not routinely create metadata according to FGDC standards, but the printed or digital reports contain appropriate metadata, such as the sources of the information in the datasets used for analysis.
4) Many datasets are ‘moving targets’: climate change data and flora descriptions are updated quite often, and analyses need to be redone.
5) Data management plans – General discussion about how, when you start a project, you have a plan in place for where the data exist, where they will be stored, links to other sources of the data, etc. I don’t think many of the longer-serving scientists have ever taken, or been offered, a data management (DM) planning class per se.
6) Servers – Most data are located on local PCs and external drives. The project team tries to do its own backups. We won’t mention the word ‘Dropbox’ here! General discussion about the lack of automated backup systems in this building. In some cases, the data are linked through other groups (e.g., NOAA). DVDs are shared among team members. Budget constraints don’t always allow the purchase of large-capacity servers for the life of the project.
7) Software/licenses – Nonstandard applications may be used to create a dataset or product, but the team tries to convert the results to formats readable by standard USGS software. (For example, a proprietary package called PSColor, written in Fortran around 1992, is still used to create the diagram layout for the Atlas publication! The output is then converted to PDF for final dissemination.)
8) Electronic data – Where are the data located? Answer: ‘yes, all of the above’! Data are located on computers, websites, external drives, and personal computers.
9) Data labeled? – There are always issues with filenames: a name may be intuitive to one person but not to another team member. Or, a year later, you, the creator of the data, have no idea what the filename meant!
10) Websites – yes, both internal to USGS and public websites have been created.
11) With 4 funded people on the project, if Bob were to leave tomorrow, someone would know all about the data!
More comments are in the document itself (see attachment Science_data_exit_form_DRAFT_053114_TB.docx).
Here is my edit on the Intro
Overall, management is very excited about this form, and adopting a formal process to help identify data is very valuable for our science center. Below are a series of comments from the discussion.
We often don't get adequate notice of departing employees, sometimes as little as 2 weeks. This survey requires real effort, and departing employees may not have adequate time for it.
Encourage employees to prioritize documentation of the most valuable data assets.
Our team has no policy on how to document data upon departure.
A test implementation with recently departed or soon-to-depart employees would be very valuable. I am trying to conduct this test but am having trouble finding time (both the scientist and myself).
Our local data management project will include language and funds in BASIS for FY15 to facilitate this type of interview.
Recommend that we prioritize the form, focusing primarily on the ‘who/what/where/when/why/how’ of data, and de-emphasize some of the less important items (such as software/hardware).
Implementation of the form may want to clearly highlight:
- Must-have information
- Less important information
- Least important information
We may want to document different scenarios depending on the expertise available to help the scientist with this form: science centers rich in data management expertise versus those lacking it.
This could be another important role of a science-center level data manager.
Another question to add: Are your data backed up regularly, and where are they stored?
Add the supervisor's name to the form.
How should emeriti or other continuing projects be documented?
In the project contacts and relationships section, what point of contact should be provided: supervisor, manager, project chief, or data steward?
Scientists will likely not know anything about DOIs.
Expand the details on field records and disposition schedules: what they are and how to use them.
Recommend moving the publication section higher in the order of questions.
Modify the IPDS question to ‘Provide IPDS number’ for easy lookup.
Another way to organize the form is to separate active/current data that may be in review from legacy data.
The system/hardware and software questions may not be relevant to many users.
Include questions such as: if the departing scientist maintains a server or website, has a new point of contact (POC) been identified and trained to take over that responsibility?
Make the ‘Data’ section the first section.
Physical samples could also include rock cuttings, oil or water samples, etc.