The CDI August Monthly Meeting featured several presentations related to Data Access, as well as Announcements in preparation for FY17 events.
Access the slides and recording at the August Monthly Meeting Page: CDI Monthly Meeting 20160810
Leah Morgan from the Argon Geochemistry Laboratory described the types of data collected at the Southwest Isotope Research Laboratory, along with data formats, software, and community guidelines, and asked about the best way to get started with Data Release.
Comments on the forum post include the Data Release Instructions page and examples of Data Dictionaries.
Viv Hutchison, Science Data Management Branch Chief, presented the history and progress of the USGS Public Access/Open Data Plan.
This timeline from her talk nicely summarizes where we’ve come from in making scientific data public and open.
If you want to read more about the Public Access plan and implementation updates, see https://www2.usgs.gov/quality_integrity/open_access/
Carly Moore’s talk about Data-Driven Discovery showed how some academic institutions are making changes to support data-driven research and tools.
One interesting concept discussed was the rise of preprint distribution. A preprint is an early version of a manuscript. It may be a draft, an incomplete version, or a final version of an article.
Preprints have been popular in certain disciplines for a long time. They are a great example of sharing research results openly, quickly, and efficiently. arXiv.org and ASAPBio.org are examples of preprint servers. Subsequent discussion raised the point that many Federal policies would not allow such sharing before interpretive results are approved, which is worth considering as the scientific data and research landscape continues to evolve.
The CDI coordinators are working to get a draft agenda and an entry in the USGS Conference Database in the next month!
Help us by contributing and voting on ideas! https://usgs1.uservoice.com/forums/398538-annual-meeting-fy17-ideas
We will work with this data during the September Monthly Meeting.
The FY17 Community for Data Integration RFP will be announced at the September Monthly Meeting, which is right around the corner.
Prepare for the CDI FY17 RFP - View past funded projects: https://www2.usgs.gov/cdi/products-publications.html
Or use the collaboration forum to look for collaborators: FY17 RFP Collaboration Forum
The USGS held a Lidar Science Innovation Workshop (internal link) from 8/2 to 8/4 in Fort Collins, CO.
What is Lidar
Light Detection And Ranging: a way to measure distance with laser light, used to make high-resolution elevation maps. Resolution varies, but is commonly about 1 meter (an elevation point every square meter) with +/-10 centimeter vertical resolution, which is much higher than what people were used to previously.
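To get a feel for what "an elevation point every square meter" means in data volume, here is a minimal back-of-the-envelope sketch. The function name `points_for_area` is hypothetical, and the 10 meter and 30 meter spacings are assumed comparison values for coarser elevation grids, not figures from the workshop.

```python
def points_for_area(area_km2, spacing_m=1.0):
    """Approximate number of elevation points on a regular grid.

    Assumes one point per (spacing_m x spacing_m) cell, which is a
    simplification of how real Lidar point clouds are distributed.
    """
    area_m2 = area_km2 * 1_000_000  # 1 square kilometer = 1,000,000 square meters
    return round(area_m2 / spacing_m ** 2)


# One square kilometer at roughly 1 meter resolution
lidar_points = points_for_area(1, spacing_m=1.0)

# The same area at coarser, assumed comparison spacings
coarse_10m = points_for_area(1, spacing_m=10.0)
coarse_30m = points_for_area(1, spacing_m=30.0)

print(lidar_points, coarse_10m, coarse_30m)
```

Under these assumptions, going from a 10 meter grid to a 1 meter grid multiplies the number of elevation points by 100 for the same area.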
A bare-earth DEM describes details of the Great Sand Dunes National Park and Preserve in Colorado. (Credit: USGS) (Source: Stoker, 2016)
Highlights
The workshop was a great opportunity for newcomers to get up to date with USGS elevation projects, use of Lidar, and new Lidar technologies. Meanwhile, Lidar experts could network, present their research, and find out how their colleagues in USGS are using Lidar.
The 3D Elevation Program has an ambitious task ahead of it to map the nation in high-resolution Lidar over the next 8 years.
Some participants wanted more training opportunities for working with Lidar data, while others “hoped to retire without ever downloading a point cloud” (meaning that they prefer the derived products).
Participants expressed a desire to form a Lidar Community of Practice.
A nice app, Poll Everywhere (pollev.com), was used very effectively to display audience opinion and ideas during the Innovation Panel and the Closing Plenary.
Miscellaneous things learned:
Ele-hydro: hydrography from elevation
3DEP data dictionary (don't use spatial metadata without it)
A map shows the FY16 status of available 3DEP data. (Credit: USGS) (Source: Stoker, 2016)
What were the next steps
(from the Closing Plenary, View Download, internal links)
Know your USGS National Map liaison, talk to them
Use the Seasketch site to submit your “requirements” for Lidar collection
Spread the word about Lidar and 3DEP
Build applications using Lidar
Attend the August 16 Mapping Innovation Kickoff and series (internal links)
Another nice summary of the 3DEP program: The 3D Elevation Program (3DEP): Learn the Details and Goals of this Ambitious USGS Project, Earth Imaging Journal, Stoker, Feb 2, 2016.
More in-depth
The USGS CDI had nice representation at the ESIP (Federation for Earth Science Information Partners) Summer Meeting in Durham, NC, from July 19-22, 2016. ESIP is known to some (me) as “the place where those cool Earth Science geeks hang out.”
Highlights:
Leslie Hsu and Viv Hutchison were part of the coordinating team for the Connecting Communities working session. Thirty-three participants identified challenges and solutions to connecting our many overlapping communities of practice and becoming more efficient in our working groups. The session outcomes are summarized in this document.
Madison Langseth convened an extremely well-attended session on the Use of Persistent Identifiers to link data, publications, people, and institutions. Notes and presentations are linked on the ESIP Commons site.
The FY16 CDI-funded project Data Management Training Clearinghouse presented its progress at the session: Data Management Training (DMT) Working Group Update. (Notes and presentation also available at the ESIP Commons site.)
ESIP used some new technologies very effectively to improve the meeting participants’ experience, including:
Sched.org: event organization app
imhere.esipfed.org: conference participant sign-in app
Here’s ESIP’s own Summer Meeting Summary, complete with a storify from the #esipfed tweets.
I’m looking forward to the next meeting to get the latest scoop on the Earth Science information landscape and to inspire ideas and connections for CDI.
From the ESIP Interoperability and Technology/Tech Dive Webinar Series page:
Summary: The OneStop Project is designed to improve NOAA's data discovery and access framework. Focusing on all layers of the framework and not just the user interface, OneStop is addressing data format and metadata best practices, ensuring more data are available through modern web services, working to improve the relevance of dataset searches, and improving both collection-level metadata management and granule level metadata systems to accommodate the wide variety and vast scale of NOAA's data.
Speaker: Ken Casey is the Deputy Director of the Data Stewardship Division in the NOAA National Centers for Environmental Information (NCEI). He leads the OneStop project and is active within NOAA's Big Earth Data Initiative and Big Data Project. Ken serves on a variety of national and international science and data management panels including the US Group on Earth Observations Data Management Working Group and the Group for High Resolution Sea Surface Temperature (GHRSST) Science Team. He co-chairs the Committee on Earth Observing Satellites SST Virtual Constellation and represents NCEI in the Federation of Earth Science Information Partners (ESIP). He holds a PhD in Physical Oceanography from the University of Rhode Island.
GoToMeeting Recording: https://youtu.be/wp7trIRFDOs
Slides: https://speakerdeck.com/esipfed/noaa-one-stop-ken-casey-ncei
The next Tech Stack webinar is scheduled for August 11, 2016: "UV-CDAT": Charles Doutriaux, LLNL. Details will be posted at http://wiki.esipfed.org/index.php/Interoperability_and_Technology/Tech_Dive_Webinar_Series#Tech_Dive_Webinars
The July 11th Data Management Working Group call featured information on the USGS Data Release Workbench, the USGS Science Data Catalog, ScienceBase, and USGS Data Sharing Agreements. Visit the meeting wiki page to access the recording and Q&A. (The .arf recording can be played with software available on the USGS WebEx Recording and Playback page.)
Viv Hutchison gave a tour of the USGS Data Release Workbench, developed to illustrate the general steps needed for releasing data in USGS.
Components of the Data Release Workbench.
Differences between USGS Science Data Catalog and ScienceBase are nicely illustrated in this table:
From the meeting wiki page:
JC Nelson has been working on drafting Data Sharing Agreements at USGS to guide centers/programs when they are working with collaborators/funders. These agreements cover who receives the data after the end of the project, which agency is responsible for releasing/preserving the data, etc. Contact JC to view a copy of the draft Data Sharing Agreement.
Visit the CDI Data Management Working Group wiki page and meetings page for more information.
Did you miss the Controlled Vocabularies Presentation and Demo? Here's a summary! (Caution: CDI Coordinator and author of this post is not a metadata/semantic web expert.)
The CDI Semantic Web Working Group has been working to improve the recall and precision of searches in USGS data catalogs by making controlled vocabularies conveniently useful. Work began with a CDI project in 2014 and 2015 that developed use cases and prototypes. On July 13 the team demonstrated vocabulary services and several metadata tools that use vocabulary services. They also talked about the vision, objectives, and next steps of the Working Group's Controlled Vocabulary Manifesto. (Fran Lightsom, Peter Schweitzer, and Alan Allwardt)
The Controlled Vocabulary Manifesto expresses a vision for the future where
People using USGS data catalogs will be confident that their search results are both comprehensive and focused, with good recall (nothing relevant missed) and good precision (nothing irrelevant included).
USGS will efficiently use controlled vocabularies available through services to achieve this result.
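Recall and precision, as used in the vision above, can be made concrete with a small calculation. This is a minimal sketch with made-up dataset identifiers; the function name `precision_recall` and the example sets are illustrative assumptions, not part of any USGS catalog.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision: fraction of returned items that are relevant
               (1.0 means nothing irrelevant included)
    recall:    fraction of relevant items that were returned
               (1.0 means nothing relevant missed)
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four datasets are truly relevant, three were returned
relevant = {"ds1", "ds2", "ds3", "ds4"}
retrieved = {"ds1", "ds2", "ds5"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up search, two of the three returned datasets are relevant and two of the four relevant datasets were found, so neither measure is perfect. Consistent keywords are meant to push both toward 1.0.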
The manifesto strategy achieves these objectives:
Include controlled vocabulary terms in all USGS metadata.
Update USGS data catalog interfaces to take advantage of controlled vocabularies.
Educate and motivate the USGS workforce.
A controlled vocabulary is a set of terms that a community agrees to use, with clear definitions and consistent spellings.
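As an illustration of that definition, the sketch below maps free-text keywords onto the preferred terms of a tiny vocabulary and flags anything it does not recognize. The vocabulary contents and the names `VOCAB`, `build_lookup`, and `normalize_keywords` are invented for this example; they are not the USGS vocabularies or the working group's services.

```python
# Hypothetical vocabulary: preferred term -> accepted spelling variants
VOCAB = {
    "lidar": {"LiDAR", "light detection and ranging"},
    "landslides": {"landslide", "land slide"},
}


def build_lookup(vocab):
    """Map every variant (case-insensitively) to its preferred term."""
    lookup = {}
    for preferred, variants in vocab.items():
        for variant in variants | {preferred}:
            lookup[variant.casefold()] = preferred
    return lookup


def normalize_keywords(keywords, lookup):
    """Split keywords into preferred terms and unrecognized entries."""
    accepted, rejected = [], []
    for keyword in keywords:
        # Collapse repeated whitespace and ignore case before matching
        key = " ".join(keyword.split()).casefold()
        term = lookup.get(key)
        if term is None:
            rejected.append(keyword)
        elif term not in accepted:
            accepted.append(term)
    return accepted, rejected


lookup = build_lookup(VOCAB)
accepted, rejected = normalize_keywords(["LIDAR", "Land  slide", "volcanos"], lookup)
print(accepted, rejected)
```

Because every variant resolves to a single agreed spelling, two records tagged "LIDAR" and "LiDAR" end up with the same keyword, which is what makes a later catalog search find both.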
Different vocabularies have been developed with different audiences and foci in mind; therefore, we (users and web applications) may need to use multiple vocabularies to adequately annotate a resource. Browse and search the vocabularies available through USGS at the USGS Science Topics page: https://www2.usgs.gov/science/tab-term.html
The Semantic Web Working Group has developed new web services for these vocabularies. (Web services “support interoperable machine-to-machine interaction over a network” according to the W3C and Wikipedia.) Documentation of the services is at https://www2.usgs.gov/science/services.html.
Applications that USGS users employ, such as Metadata Wizard, use the web services to produce consistent keywords for search. This brings us closer to the vision of the Controlled Vocabulary Manifesto, where USGS data catalog searches are both comprehensive and focused, with good recall (nothing relevant missed) and good precision (nothing irrelevant included).
In general, a normal user will interact with the USGS Controlled Vocabularies through the interface of an application (e.g., one for creating metadata), which will use the web services to provide a way to choose terms and related terms.
See the slides and recording here.
Read further Q&A coming from this session.
Join the Semantic Web Working Group if you are interested in helping the cause.
Register for the Reviewing Metadata Virtual Training.
We had over 90 WebEx attendees at our July Monthly Meeting, which is great attendance for a summer vacation month.
Head over to the CDI July Monthly Meeting page for the abstracts and working group reports. Logged-in members can also access the presentations and recording.
--
The meeting started out with our latest Scientist’s Challenge, A Seismogenic Landslide Database: seeking the best way to make a diverse database accessible to others. This challenge highlights the need to integrate and serve very diverse data types, including seismic data, seismic network logs, landslide measurements, references, photos, GIS files, imagery, and even emails and blog posts. Thanks to Kate Allstadt and Brennah McVey for presenting this challenge. We'll update CDI on possible solutions at a future monthly meeting.
Next, Kimberly Scott and Vickie Backus gave an update on progress and activities of the USGS Cloud Hosting Solutions (CHS). Their presentations outlined the services offered and the near-term plan for moving USGS applications to the Cloud. For the latest CHS news and monthly updates, see the USGS internal site http://internal.usgs.gov/oei/cloud-hosting-solutions. Contact cloudservices@usgs.gov for general CHS questions, including access to the CHS Sandbox. Q&A about security, geospatial tech stacks, and the GitHub Enterprise license is archived on the CDI forum.
The final presentation, Implementing Controlled Vocabulary Services in USGS, combined a report from an FY14-15 CDI-funded project and a demo. This may be the first CDI-funded project that produced a manifesto. A Controlled Vocabulary Manifesto, that is! Although using controlled vocabularies may seem elementary at first (to some), the maintenance, implementation, and integration of different vocabularies for different users and applications is quite a complex matter. The result will be improved integrity and quality of the research results produced by the USGS for the Nation. Follow-up Q&A is on the CDI forum. Look for a summary of the Controlled Vocabulary demo (part of the CDI Virtual Training Series) in an upcoming post. Thanks to Fran Lightsom, Peter Schweitzer, and Alan Allwardt for keeping the CDI Community and the USGS itself up to date on semantic web issues.