Blog from August, 2016

The CDI August Monthly Meeting featured several presentations related to data access, as well as announcements in preparation for FY17 events.

Access the slides and recording at the August Monthly Meeting Page: CDI Monthly Meeting 20160810

Scientist’s Challenge: Data Release for a Diverse Group of Geochemistry Labs

Leah Morgan from the Argon Geochemistry Laboratory told us about the types of data collected at the Southwest Isotope Research Laboratory, along with their data formats, software, and community guidelines, and asked what the best way is to get started with Data Release.

Comments on the forum post point to the Data Release Instructions page and to examples of Data Dictionaries.


Public Access/Open Access at USGS: A story of a Science Data and Publishing Evolution

Viv Hutchison, Science Data Management Branch Chief, presented the history and progress of the USGS Public Access/Open Data Plan.

This timeline from her talk nicely summarizes where we've come from in making scientific data public and open.



If you want to read more about the Public Access plan and implementation updates, see https://www2.usgs.gov/quality_integrity/open_access/


The Data-Driven Discovery Initiative at Moore

Carly Moore’s talk about Data-Driven Discovery showed how some academic institutions are making changes to support data-driven research and tools.

One interesting topic was the growing distribution of preprints. A preprint is an early version of a manuscript. It may be a draft, an incomplete version, or a final version of an article.

Preprints have been popular in certain disciplines for a long time. They are a great example of sharing research results openly, quickly, and efficiently. arXiv.org is a well-known preprint server, and ASAPbio.org promotes the use of preprints in the life sciences. The discussion that followed raised the point that many Federal policies would not allow such sharing before interpretive results are approved, an interesting point to consider as the scientific data and research landscape continues to evolve.


CDI Annual Workshop Planning

The CDI coordinators are working to complete a draft agenda and an entry in the USGS Conference Database within the next month!

Help us by contributing and voting on ideas! https://usgs1.uservoice.com/forums/398538-annual-meeting-fy17-ideas

We will work with these ideas during the September Monthly Meeting.


CDI FY17 Request For Proposals and Collaboration Forum

The FY17 Community for Data Integration RFP will be announced at the September Monthly Meeting, which is right around the corner.

Prepare for the CDI FY17 RFP by viewing past funded projects: https://www2.usgs.gov/cdi/products-publications.html

Or use the collaboration forum to look for collaborators: FY17 RFP Collaboration Forum


--
More CDI Blog posts  


The USGS held a Lidar Science Innovation Workshop (internal link) from 8/2 to 8/4 in Fort Collins, CO.


What is Lidar

Lidar (Light Detection And Ranging) is a way to measure distance with laser light, used to make high-resolution elevation maps. Resolution varies, but it is commonly about 1 meter (one elevation point every square meter) with about +/-10 centimeters vertically, much higher resolution than people were previously used to.
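For readers new to Lidar: a pulsed Lidar instrument measures distance by timing how long a laser pulse takes to reach a surface and return, so distance = (speed of light x round-trip time) / 2. The sketch below only illustrates that arithmetic, assuming the vacuum speed of light and ignoring atmospheric effects; the function names are hypothetical and not part of any USGS tool. It also shows why centimeter-level elevation detail demands sub-nanosecond timing.

```python
# Minimal sketch of pulse time-of-flight ranging, the basic principle behind
# pulsed lidar distance measurement. Assumes the pulse travels at the vacuum
# speed of light (atmospheric refraction is ignored). Names are hypothetical.

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_s: float) -> float:
    """Return the one-way distance in meters for a round-trip pulse time.

    The pulse travels to the target and back, so the distance is half
    of the total path length c * t.
    """
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_s / 2.0


def time_difference_for(distance_m: float) -> float:
    """Return the change in round-trip time (seconds) for a distance change."""
    return 2.0 * distance_m / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    # A return arriving about 6.67 microseconds after emission is about 1 km away.
    print(f"{range_from_round_trip(6.671e-6):.1f} m")
    # Resolving 10 cm of elevation means timing differences of about 0.67 ns.
    print(f"{time_difference_for(0.10) * 1e9:.2f} ns")
```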

A bare-earth DEM describes details of the Great Sand Dunes National Park and Preserve in Colorado. (Credit: USGS) (Source: Stoker, 2016)

 

Highlights

  • The workshop was a great opportunity for newcomers to get up to date with USGS elevation projects, use of Lidar, and new Lidar technologies.  Meanwhile, Lidar experts could network, present their research, and find out how their colleagues in USGS are using Lidar.

  • The 3D Elevation Program (3DEP) has an ambitious task ahead: mapping the Nation with high-resolution Lidar over the next 8 years.

  • Some participants wanted more training opportunities for working with Lidar data, while others “hoped to retire without ever downloading a point cloud” (meaning that they prefer the derived products).

  • Participants expressed a desire to form a Lidar Community of Practice.

  • A nice app, Poll Everywhere (pollev.com), was used very effectively to display audience opinions and ideas during the Innovation Panel and the Closing Plenary.

  • Miscellaneous things learned:


A map shows the FY16 status of available 3DEP data. (Credit: USGS) (Source: Stoker, 2016)


What were the next steps
(from the Closing Plenary, internal links)


Another nice summary of the 3DEP program: The 3D Elevation Program (3DEP): Learn the Details and Goals of this Ambitious USGS Project, Earth Imaging Journal, Stoker, Feb 2, 2016.


More in-depth

  • You can check out the data.gov entry for the Lidar Point Cloud from the USGS National Map, in case you didn't believe this is big data (809,445 datasets in the collection...).
  • The National Enhanced Elevation Assessment (NEEA): approximately 800 pages including appendices, documenting business uses for elevation data across 34 Federal agencies, agencies from all 50 States, selected local government and Tribal offices, and private and not-for-profit organizations.

--


The USGS CDI had nice representation at the ESIP (Federation of Earth Science Information Partners) Summer Meeting in Durham, NC, from July 19-22, 2016. ESIP is known to some (me) as “the place where those cool Earth Science geeks hang out.”

Highlights:


ESIP used some new technologies very effectively to improve the meeting participants’ experience, including:


Here’s ESIP’s own Summer Meeting Summary, complete with a Storify of the #esipfed tweets.


I’m looking forward to the next meeting to get the latest scoop on the Earth Science information landscape and to inspire ideas and connections for CDI.


--

From the ESIP Interoperability and Technology/Tech Dive Webinar Series page:

Summary: The OneStop Project is designed to improve NOAA's data discovery and access framework. Focusing on all layers of the framework and not just the user interface, OneStop is addressing data format and metadata best practices, ensuring more data are available through modern web services, working to improve the relevance of dataset searches, and improving both collection-level metadata management and granule level metadata systems to accommodate the wide variety and vast scale of NOAA's data.

Speaker: Ken Casey is the Deputy Director of the Data Stewardship Division in the NOAA National Centers for Environmental Information (NCEI). He leads the OneStop project and is active within NOAA's Big Earth Data Initiative and Big Data Project. Ken serves on a variety of national and international science and data management panels, including the US Group on Earth Observations Data Management Working Group and the Group for High Resolution Sea Surface Temperature (GHRSST) Science Team. He co-chairs the Committee on Earth Observing Satellites SST Virtual Constellation and represents NCEI in the Federation of Earth Science Information Partners (ESIP). He holds a PhD in Physical Oceanography from the University of Rhode Island.

GoToMeeting Recording: https://youtu.be/wp7trIRFDOs

Slides: https://speakerdeck.com/esipfed/noaa-one-stop-ken-casey-ncei

 

The next Tech Dive webinar, "UV-CDAT" by Charles Doutriaux (LLNL), is scheduled for August 11, 2016. Details will be posted at http://wiki.esipfed.org/index.php/Interoperability_and_Technology/Tech_Dive_Webinar_Series#Tech_Dive_Webinars

--

 

The July 11th Data Management Working Group call featured information on the USGS Data Release Workbench, the USGS Science Data Catalog, ScienceBase, and USGS Data Sharing Agreements. Visit the meeting wiki page to access the recording and Q&A. (The .arf recording can be played with software available on the USGS WebEx Recording and Playback page.)


Viv Hutchison gave a tour of the USGS Data Release Workbench, developed to illustrate the general steps needed to release data at USGS.

Components of the Data Release Workbench.


Differences between USGS Science Data Catalog and ScienceBase are nicely illustrated in this table:


 

From the meeting wiki page:

 JC Nelson has been working on drafting Data Sharing Agreements at USGS to guide centers/programs when they are working with collaborators/funders. These agreements cover who receives the data after the end of the project, which agency is responsible for releasing/preserving the data, etc. Contact JC to view a copy of the draft Data Sharing Agreement.


Visit the CDI Data Management Working Group wiki page and meetings page for more information. 

--

Did you miss the Controlled Vocabularies Presentation and Demo? Here's a summary! (Caution: the CDI Coordinator and author of this post is not a metadata/semantic web expert.)

Summary

The CDI Semantic Web Working Group has been working to improve the recall and precision of searches in USGS data catalogs by making controlled vocabularies convenient to use. Work began with a CDI project in 2014 and 2015 that developed use cases and prototypes. On July 13 the team demonstrated vocabulary services and several metadata tools that use them. They also talked about the vision, objectives, and next steps of the Working Group's Controlled Vocabulary Manifesto. (Fran Lightsom, Peter Schweitzer, and Alan Allwardt)

The Controlled Vocabulary Manifesto

The Controlled Vocabulary Manifesto expresses a vision for a future in which:

People using USGS data catalogs will be confident that their search results are both comprehensive and focused, with good recall (nothing relevant missed) and good precision (nothing irrelevant included).

USGS will efficiently use controlled vocabularies available through services to achieve this result.
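Recall and precision have simple definitions: recall is the fraction of all relevant datasets that a search returns, and precision is the fraction of returned datasets that are relevant. The sketch below is only an illustration of those two ratios; the dataset IDs and the "landslides" search are hypothetical and do not come from any USGS catalog.

```python
# Minimal sketch of the two search-quality measures named in the manifesto
# vision. The dataset IDs used below are hypothetical.
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search.

    precision = relevant items retrieved / all items retrieved
                (1.0 means nothing irrelevant was included)
    recall    = relevant items retrieved / all relevant items
                (1.0 means nothing relevant was missed)
    Empty inputs yield 0.0 for the corresponding measure.
    """
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical catalog search for "landslides": the catalog holds four
    # relevant datasets, and the search returns five results.
    relevant = {"ds1", "ds2", "ds3", "ds4"}
    retrieved = {"ds1", "ds2", "ds3", "ds7", "ds9"}
    p, r = precision_recall(retrieved, relevant)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```

Here the search missed one relevant dataset (recall below 1.0) and included two irrelevant ones (precision below 1.0); consistent controlled-vocabulary keywords aim to push both toward 1.0.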

The manifesto lays out a strategy to achieve these objectives:

  1. Select controlled vocabularies and make them available through services. 
  2. Include controlled vocabulary terms in all USGS metadata. 

  3. Update USGS data catalog interfaces to take advantage of controlled vocabularies. 

  4. Educate and motivate the USGS workforce. 

Five points to take away

  1. A controlled vocabulary is a set of terms that a community agrees to use, with clear definitions and consistent spellings.

  2. Different vocabularies have been developed with different audiences and foci in mind, so we (users and web applications) may need to use multiple vocabularies to adequately annotate a resource. Browse and search the vocabularies available through USGS at the USGS Science Topics page: https://www2.usgs.gov/science/tab-term.html

  3. The Semantic Web Working Group has developed new web services for these vocabularies. (According to the W3C and Wikipedia, web services “support interoperable machine-to-machine interaction over a network.”) Documentation of the services is at https://www2.usgs.gov/science/services.html.

  4. Applications used at USGS, such as the Metadata Wizard, call these web services to produce consistent keywords for search. This brings us closer to the Controlled Vocabulary Manifesto's vision of USGS data catalog searches that are comprehensive and focused, with good recall (nothing relevant missed) and good precision (nothing irrelevant included).

  5. In general, a typical user will interact with the USGS Controlled Vocabularies through the interface of an application (e.g., one for creating metadata), which uses the web services to offer terms and related terms to choose from.
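To make points 1 and 4 concrete, here is a toy illustration of why agreed-upon terms help: an application can map the many ways people spell a concept onto one preferred term, so every record about that concept carries the same keyword. This is a minimal sketch under loud assumptions: the tiny vocabulary is hypothetical, it is hard-coded rather than fetched from the USGS vocabulary services, and the functions are not the API of the Metadata Wizard or any USGS service.

```python
# Minimal sketch: mapping free-text keywords onto controlled vocabulary terms.
# ASSUMPTION: the vocabulary below is a tiny, hypothetical stand-in; a real
# application would obtain terms from a vocabulary web service instead.
from __future__ import annotations

# Hypothetical vocabulary: preferred term -> alternate spellings/synonyms.
VOCABULARY: dict[str, list[str]] = {
    "landslides": ["landslide", "land slides", "mass wasting"],
    "earthquakes": ["earthquake", "seismic events", "quakes"],
    "lidar": ["LiDAR", "LIDAR", "light detection and ranging"],
}


def normalize_label(label: str) -> str:
    """Lowercase a label and collapse runs of whitespace to single spaces."""
    return " ".join(label.lower().split())


def build_lookup(vocab: dict[str, list[str]]) -> dict[str, str]:
    """Index every preferred term and alternate label by its normalized form."""
    lookup: dict[str, str] = {}
    for preferred, alternates in vocab.items():
        for label in [preferred, *alternates]:
            lookup[normalize_label(label)] = preferred
    return lookup


def normalize_keywords(
    keywords: list[str], lookup: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Split keywords into (controlled terms, unmatched keywords).

    Matching is case- and whitespace-insensitive; each controlled term is
    kept once, in first-seen order.
    """
    controlled: list[str] = []
    unmatched: list[str] = []
    for kw in keywords:
        term = lookup.get(normalize_label(kw))
        if term is None:
            unmatched.append(kw)
        elif term not in controlled:
            controlled.append(term)
    return controlled, unmatched


if __name__ == "__main__":
    lookup = build_lookup(VOCABULARY)
    terms, leftover = normalize_keywords(
        ["Landslide", "mass  wasting", "LiDAR", "Colorado"], lookup
    )
    print(terms)     # ['landslides', 'lidar']
    print(leftover)  # ['Colorado']
```

Unmatched keywords such as "Colorado" are the cue, in this sketch, to look in another vocabulary (for example, a place-name vocabulary), echoing point 2 about needing multiple vocabularies.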

More things you can do


We had over 90 WebEx attendees at our July Monthly Meeting, great attendance for a summer vacation month.

Head over to the CDI July Monthly Meeting page for the abstracts and working group reports; logged-in members can also access the presentations and recording.

--

The challenge of integrating diverse data types

The meeting started out with our latest Scientist’s Challenge, A Seismogenic Landslide Database: seeking the best way to make a diverse database accessible to others. This challenge highlights the need to integrate and serve very diverse data types, including seismic data, seismic network logs, landslide measurements, references, photos, GIS files, imagery, and even emails and blog posts. Thanks to Kate Allstadt and Brennah McVey for presenting this challenge; we'll update CDI on possible solutions at a future monthly meeting.

USGS Cloud Hosting Solutions (CHS)

Next, Kimberly Scott and Vickie Backus gave an update on progress and activities of the USGS Cloud Hosting Solutions (CHS). Their presentations outlined the services offered and the near-term plan for moving USGS applications to the Cloud. For the latest CHS news and monthly updates, see the USGS internal site http://internal.usgs.gov/oei/cloud-hosting-solutions. Contact cloudservices@usgs.gov for general CHS questions, including access to the CHS Sandbox. Q&A about security, geospatial tech stacks, and the GitHub Enterprise license is archived on the CDI forum.

USGS Controlled Vocabulary Services

The final presentation, Implementing Controlled Vocabulary Services in USGS, combined a report from a FY14-15 CDI funded project and a demo. This may be the first CDI funded project that produced a manifesto. A Controlled Vocabulary Manifesto, that is! Although using controlled vocabularies may seem elementary at first (to some), the maintenance, implementation, and integration of different vocabularies for different users and applications is quite a complex matter. The expected result is improved integrity and quality of the research results produced by the USGS for the Nation. Follow-up Q&A is on the CDI forum. Look for a summary of the Controlled Vocabulary demo (part of the CDI Virtual Training Series) in an upcoming post. Thanks to Fran Lightsom, Peter Schweitzer, and Alan Allwardt for keeping the CDI Community and the USGS itself up to date on semantic web issues.
