Here’s a roundup of recent CDI collaboration area topics from the month of May!
VeeAnn Cross and Peter Schweitzer reviewed the use of keywords in the USGS Science Data Catalog. Choosing good keywords is an important part of creating a USGS data release, and there is an opportunity to work together to better align the terms being used. One tip is to use terms from the USGS Thesaurus and ISO vocabularies, and not to make up keywords that are not part of one of the suggested vocabularies.
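As a rough illustration of that tip, the sketch below checks a list of candidate keywords against controlled vocabularies and flags any that match none of them. The term lists are tiny, illustrative stand-ins (an assumption for this example), not the full USGS Thesaurus or ISO topic categories, and `check_keywords` is a hypothetical helper name.

```python
def check_keywords(keywords, vocabularies):
    """Map each keyword to the names of the vocabularies that contain it.

    Matching is case-insensitive and ignores surrounding whitespace.
    """
    normalized = {
        name: {term.lower() for term in terms}
        for name, terms in vocabularies.items()
    }
    report = {}
    for keyword in keywords:
        key = keyword.strip().lower()
        report[keyword] = [name for name, terms in normalized.items() if key in terms]
    return report


# Illustrative, heavily abbreviated term lists (assumed for this example).
vocabularies = {
    "USGS Thesaurus": {"groundwater", "earthquakes", "land use change"},
    "ISO 19115 Topic Category": {"inlandWaters", "geoscientificInformation", "biota"},
}

keywords = ["Groundwater", "inlandWaters", "my cool project"]
report = check_keywords(keywords, vocabularies)

# Keywords that appear in no suggested vocabulary are candidates for replacement.
unmatched = [keyword for keyword, hits in report.items() if not hits]
print(unmatched)
```

In practice the term lists would be loaded from the vocabularies themselves rather than typed in by hand.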
Here’s more guidance on keywords and suggested vocabularies from our trusty Data Management website:
Recent post on the Metadata Reviewers forum: Data Dictionaries as a standalone product?
Announcement: If you are using code.usgs.gov, the official source code archive for USGS, there is a collaborator request form and a public repository request process. Contact firstname.lastname@example.org for these resources.
Kyle Goodwin from GitLab presented on “GitLab as a web-based DevOps lifecycle tool.” GitLab is not GitHub, although both are built on git, a distributed version-control system for tracking changes in source code during software development. GitLab is a platform for a DevOps-driven software development lifecycle. The official source code archive for USGS (https://code.usgs.gov/public/) runs on GitLab, which provides a toolchain that includes project management, software repositories, CI/CD (continuous integration/continuous deployment), metrics, and monitoring. Some groups use GitHub for repository features but GitLab for CI/CD. https://about.gitlab.com/devops-tools
Did you know that members of the Semantic Web Working Group created the buzzword bingo sheets for the CDI workshop? Thank you to Peter Schweitzer and Fran Lightsom for coordinating. Do you hear the following during CDI presentations? Leverage, game changer, compliance, smart data, portal, authoritative, internet of things, analytics, best practices, takeaway, innovative, crowdsourcing, buy-in, stakeholder, framework, workflow, carrot and stick, pitch deck, quick win, modernization, cultural change...
A couple of example bingo sheets:
Chris Holmes presented on the SpatioTemporal Asset Catalog (STAC) specification, an emerging standard to make it easier to find geospatial information. It aims to enable a cloud-native geospatial future by providing a common layer of metadata for search and discovery, while playing well with the web and existing geospatial standards.
Find out more about STAC at http://stacspec.org/
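To give a feel for what a "common layer of metadata" looks like, here is a minimal sketch of a STAC Item built as a Python dictionary, plus a simple bounding-box filter of the kind a search service might apply. The field names assume the STAC 1.0.0 Item layout; the item id, asset URL, and helper functions are hypothetical and exist only for this example.

```python
import json

# Top-level fields a STAC Item is expected to carry (assuming STAC 1.0.0).
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)

# A hypothetical item: a GeoJSON Feature with a footprint, a timestamp,
# and a link to the actual data file.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-105.0, 40.0], [-104.0, 40.0], [-104.0, 41.0],
            [-105.0, 41.0], [-105.0, 40.0],
        ]],
    },
    "bbox": [-105.0, 40.0, -104.0, 41.0],  # west, south, east, north
    "properties": {"datetime": "2019-05-01T00:00:00Z"},
    "links": [],
    "assets": {
        "data": {
            "href": "https://example.com/example-scene-001.tif",  # placeholder URL
            "type": "image/tiff; application=geotiff",
        }
    },
}


def missing_fields(candidate):
    """Return the required top-level fields that the candidate item lacks."""
    return [field for field in REQUIRED_ITEM_FIELDS if field not in candidate]


def bbox_intersects(a, b):
    """True if two [west, south, east, north] boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


search_area = [-104.5, 40.5, -103.0, 42.0]
matches = [i for i in [item] if bbox_intersects(i["bbox"], search_area)]
print(json.dumps([m["id"] for m in matches]))
```

Because an Item is plain JSON, it can sit next to the data file in cloud storage and be crawled or indexed like any other web resource.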
Chris Holmes is also involved with the Radiant Earth Foundation, which you can read more about here: Creating a Machine Learning Commons for Global Development
The Tech Stack working group meets jointly with the ESIP Information Technology and Interoperability Committee, and is led by Dave Blodgett and Rich Signell.
Tamar Norkin and Ricardo McClees-Funinan presented on “Behind the Scenes at ScienceBase: How Data Release happens in your USGS Trusted Digital Repository (TDR).”
They highlighted the ScienceBase Data Release (SBDR) Tool, a very handy way to start your USGS data release. The SBDR Tool can be found here: https://www.sciencebase.gov/datarelease.
For data release questions and requests, you can get in touch with the data release team at email@example.com.
After filling out the ScienceBase Data Release Tool form, you get a new landing page in ScienceBase, a reserved Digital Object Identifier, and an email with instructions!
Chris Barber from USGS EROS presented on XGBoost in Continuous Change Detection and Classification (CCDC). Chris explained how XGBoost (eXtreme Gradient Boosting, an open-source software library that provides a gradient boosting framework) improved the efficiency and accuracy of segment classification and land cover extraction for LCMAP (Land Change Monitoring, Assessment, and Projection). Chris gave a good introduction to the concepts of decision trees, decision tree ensembles, and boosted trees. However, he urged us to remember that there is no substitute for appropriate training data. Email Chris for his extensive reference list: firstname.lastname@example.org.
Peter Esselman of the USGS Great Lakes Science Center also presented on Deep learning to quantify benthic habitat.
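For readers new to boosted trees, the following is a minimal sketch of the core idea, written in plain Python: each round fits a one-split decision tree (a "stump") to the errors left by the rounds before it. This is generic gradient boosting for squared-error regression on made-up numbers, not XGBoost's actual algorithm (which adds second-order gradients, regularization, and much more) and not the LCMAP classification workflow.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict(base, stumps, learning_rate, x):
    """Base prediction plus the scaled contribution of every stump."""
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - predict(base, stumps, learning_rate, x)
                     for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps


def mean_squared_error(xs, ys, base, stumps, learning_rate):
    return sum((y - predict(base, stumps, learning_rate, x)) ** 2
               for x, y in zip(xs, ys)) / len(ys)


# Made-up training data with three plateaus.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]

learning_rate = 0.3
base, stumps = fit_boosted(xs, ys, n_rounds=50, learning_rate=learning_rate)

baseline_mse = mean_squared_error(xs, ys, base, [], learning_rate)
boosted_mse = mean_squared_error(xs, ys, base, stumps, learning_rate)
print(baseline_mse, boosted_mse)
```

The ensemble can only learn patterns that are present in `xs` and `ys`, which is one way to read the reminder that nothing substitutes for appropriate training data.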
From Peter Esselman's talk - image showing various tools that are the future of Great Lakes science.
The Citizen-Centered Innovation group discussed the final draft of the OSTP Report on Prizes and Citizen Science Projects, the USGS Open Innovation Strategy, and the DOI Generic Information Collection Request. They also highlighted relevant upcoming seminars and events in the broader Federal sphere.
Sara McBride of the Earthquake Science Center presented on Social Science 101: a Primer. Some conclusions: Social science is a big field with a lot of disciplines, each examining the human experience through its own unique lens. Doing it well requires years of study, so DIY social science is not recommended. There are a number of social scientists within the USGS: reach out and ask us questions!
Some related resources:
Wilkins EJ, Miller HM, Tilak E, Schuster RM (2018) Communicating information on nature-related topics: Preferred information channels and trust in sources. PLoS ONE 13(12): e0209013. https://doi.org/10.1371/journal.pone.0209013
https://my.usgs.gov/hd/: HDgov is a multi-agency website for all things human dimensions of natural resources. Here you can access a variety of resources to assist you in your work.
Dale Cox presented on SAFRR (Science Application for Risk Reduction) Projects and Scenarios for Risk Reduction. Dale has been involved in many scenario projects and is in the process of looking back and evaluating some of the scenarios. What is a scenario? Principles of a scenario: it describes a single, large, but plausible event that we need to be ready for; integrates across many disciplines; uses the best hazard science; reflects consensus among leading experts; is created with community partners; and presents results in products that fit the user, not the scientist.
Some related resources:
USGS Earthquake Scenario Map: https://earthquake.usgs.gov/scenarios/related.php
Slides explaining the components of a scenario and the Science Application for Risk Reduction (SAFRR) scenarios.
The Risk group also announced its inaugural RFP Awards!
Chris Merkes from UMESC presented on Choosing the right eDNA assay: Developing standards for Limit of Detection and Limit of Quantification. This work is expected to be published soon in a new environmental DNA journal.
A resource discussed: https://github.com/cmerkes/qPCR_LOD_Calc
Merkes CM, Klymus KE, Allison MJ, Goldberg C, Helbing CC, Hunter ME, Jackson CA, Lance RF, Mangan AM, Monroe EM, Piaggio AJ, Stokdyk JP, Wilson CC, Richter C. (2019) Generic qPCR Limit of Detection (LOD) / Limit of Quantification (LOQ) calculator. R Script. Available at: https://github.com/cmerkes/qPCR_LOD_Calc. DOI: https://doi.org/10.5066/P9GT00GB.
Slide from Chris Merkes' talk illustrating Limit of Detection and Limit of Quantification.
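As a very rough sketch of the two concepts, the example below estimates both limits from replicate standards using simple discrete thresholds. It assumes one common working definition: LOD as the lowest standard concentration detected in at least 95% of replicates, and LOQ as the lowest concentration whose measurements have a coefficient of variation of 35% or less. Those thresholds, the replicate data, and the function names are all assumptions for illustration; this is not the method implemented in the calculator cited above.

```python
from statistics import mean, stdev

# Hypothetical replicate data: standard concentration (copies per reaction)
# mapped to the measured copies in each replicate; None means no amplification.
replicates = {
    100: [98, 105, 101, 95, 103, 99, 102, 97],
    10: [9, 12, 10, 8, 11, 10, 13, 9],
    5: [3, 8, 5, 2, 9, 4, 7, 6],
    1: [None, 2, None, 1, None, 1, None, None],
}


def detection_rate(values):
    """Fraction of replicates that amplified."""
    return sum(v is not None for v in values) / len(values)


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, over detected replicates only."""
    detected = [v for v in values if v is not None]
    if len(detected) < 2:
        return float("inf")
    return stdev(detected) / mean(detected)


def lowest_passing(data, passes):
    """Lowest concentration at which it and every higher standard pass."""
    result = None
    for concentration in sorted(data, reverse=True):
        if not passes(data[concentration]):
            break
        result = concentration
    return result


# Assumed thresholds: 95% detection for LOD, CV <= 35% for LOQ.
lod = lowest_passing(replicates, lambda v: detection_rate(v) >= 0.95)
loq = lowest_passing(
    replicates,
    lambda v: detection_rate(v) >= 0.95 and coefficient_of_variation(v) <= 0.35,
)
print(lod, loq)
```

With these made-up numbers, the 5-copy standard is detected in every replicate but is too variable to quantify, so the LOQ lands one standard higher than the LOD.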
The Software Development Cluster discussed Docker basics for code development.
More notes and links are in their Meeting Notes (accessible to DOI users).
All CDI Collaboration Areas may be browsed on the CDI wiki.