October’s CDI meeting explored current practices on some technical aspects of delivering scientific data and results.
David Hughes and Rob Djurasaj, from the CDI DevOps Community of Practice, presented on DevOps' role in data integration and delivery. In addition to an introduction to DevOps (the bringing together of software Development and Operations), they explained how certain new technologies are allowing the USGS NGTOC (National Geospatial Technical Operations Center) to improve their efficiency and save on costs. For example, efficiencies are gained by employing IaC, Infrastructure as Code, the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.
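To make the Infrastructure as Code idea concrete, here is a minimal sketch of the underlying pattern: the desired infrastructure lives in a machine-readable definition, and a tool compares it with what is actually running to decide what to change. The server names, fields, and the `plan` function are hypothetical illustrations, not anything NGTOC presented.

```python
import json

# Hypothetical machine-readable definition: the desired state of two servers.
DEFINITION = """
{
  "web-1": {"size": "small", "image": "linux-2019"},
  "db-1":  {"size": "large", "image": "linux-2019"}
}
"""


def plan(desired, actual):
    """Return the actions needed to move `actual` to `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))
    return actions


desired = json.loads(DEFINITION)

# Pretend inventory of what is currently running (made up for illustration).
actual = {
    "web-1": {"size": "small", "image": "linux-2018"},
    "old-1": {"size": "small", "image": "linux-2018"},
}

actions = plan(desired, actual)
print(actions)
```

Because the definition is a plain file, it can be reviewed and versioned like any other code, and running the plan against an environment that already matches it produces no actions.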
Keith Maull and Matt Mayernik from the National Center for Atmospheric Research (NCAR) Library joined us to present on Packaging data and software. NCAR has been publishing reports since the 1960s, and increasingly both text and a code repository are part of the publication. How can they make the science in the publications reproducible for decades into the future? Some of their code can only run on supercomputers, and even tools like Jupyter notebooks sometimes do not fully run in new environments. They initiated discussion on optimal approaches for identifying, validating, characterizing, and preserving scientific information and tools.
Reproducibility considerations related to user needs and computational environments.
See the recording and slides at the meeting page.
The presentation topic was the USGS Bird Banding Lab ReportBand application.
The Bird Banding Laboratory (BBL) is an integrated scientific program established in 1920 supporting the collection, archiving, management and dissemination of information from banded and marked birds in North America. Their ReportBand application facilitates reporting of bird bands. It is a serverless application that utilizes an S3 Bucket configured to deliver a single page application running off Amazon Web Services Lambda functions using CloudFront and Route 53. The presentation covered the infrastructure-as-code CloudFormation templates stored in GitLab that configure and provision the application and backend in the Cloud Hosting Solutions environment. You can see the app at www.reportband.gov.
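For readers who have not seen a CloudFormation template, the sketch below builds a very small one in Python and prints it as JSON. It only declares an S3 bucket configured for static website hosting; it is not the ReportBand template, and the logical ID `SiteBucket` and the description text are assumptions made for illustration. A real deployment like the one described would also declare Lambda, CloudFront, and Route 53 resources.

```python
import json


def build_template(bucket_logical_id="SiteBucket"):
    """Return a minimal CloudFormation template as a Python dict."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative static-site bucket (not the ReportBand template)",
        "Resources": {
            bucket_logical_id: {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # A single page application usually routes errors back
                    # to index.html so client-side routing can take over.
                    "WebsiteConfiguration": {
                        "IndexDocument": "index.html",
                        "ErrorDocument": "index.html",
                    }
                },
            }
        },
        "Outputs": {
            "WebsiteURL": {
                "Value": {"Fn::GetAtt": [bucket_logical_id, "WebsiteURL"]}
            }
        },
    }


template = build_template()
print(json.dumps(template, indent=2))
```

Storing a file like this in a Git repository is what lets the same environment be recreated, reviewed, and rolled back without clicking through a console.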
Image of Bird Bands from https://www.usgs.gov/centers/pwrc/science/about-federal-bird-bands
See more at the DevOps wiki page.
Karl Benedict (University of New Mexico) presented on the Data Management Training Clearinghouse (DMTC, https://dmtclearinghouse.esipfed.org/). The DMTC was initiated with CDI funds in FY2016, but has been supported by other funds since then, including DataONE, ESIP, and IMLS (Institute of Museum and Library Services). Current goals are to diversify the contents in discipline and target audience.
Madison Langseth presented on USGS-specific training materials and resources. These included trainings and resources on the USGS Data Management website. She also reviewed challenges from the CDI Workshop Speed Data-ing session, which included publishing large data sets that are user friendly, downloadable, and interactive; publishing data dictionaries; automating parts of data review tasks; developing data management planning strategies relevant to scientists and data managers; implementing records management with data management; and convincing scientists of the value of data management.
See the recording and slides on the meeting page.
Daryl Van Dyke, a spatial analyst for the Fish and Wildlife Service, presented on Utilizing Deep Neural Networks for Landscape Conservation: An Application of Google’s Tensorflow for a Cannabis Production Inventory in Northern California.
From the abstract: This presentation shows how an off-the-shelf Deep Neural Network (DNN) algorithm – Inception v2 – was retrained into a production classifier and applied to the problem of locating and sizing cannabis production on private lands in Trinity County.
Daryl included a vision where anyone doing a GIS project in the Department of the Interior would be required to submit bounding rectangles on a 10-meter grid for every single feature they worked with. This illustrates how defining standardized products at the Enterprise level would be powerful for things like creating a training dataset for landscape level classification.
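As a rough illustration of what "bounding rectangles on a 10-meter grid" could mean in practice, the sketch below expands a feature's bounding box outward to the nearest 10 m grid lines. It assumes coordinates are already in a projected system measured in meters; the function names and the sample outline are made up, and this is not the workflow from the presentation.

```python
import math

GRID = 10.0  # grid spacing in meters (projected coordinates assumed)


def snap_bbox(xmin, ymin, xmax, ymax, grid=GRID):
    """Expand a bounding box outward to the nearest grid lines."""
    return (
        math.floor(xmin / grid) * grid,
        math.floor(ymin / grid) * grid,
        math.ceil(xmax / grid) * grid,
        math.ceil(ymax / grid) * grid,
    )


def feature_bbox(points):
    """Grid-aligned bounding rectangle for a list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return snap_bbox(min(xs), min(ys), max(xs), max(ys))


# Hypothetical feature outline, in meters.
outline = [(503.2, 1207.9), (518.7, 1211.4), (511.0, 1226.3)]
print(feature_bbox(outline))  # (500.0, 1200.0, 520.0, 1230.0)
```

Snapping every rectangle to the same grid is what would make boxes from unrelated projects directly comparable and stackable into a shared training dataset.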
A slide about the training dataset used to locate and size cannabis production on private lands in California.
See more information and the recording at their meeting page.
Michael Rilee, of Rilee Systems Technologies LLC, presented on STARE (SpatioTemporal Adaptive Resolution Encoding for Scalable Integrative Analysis).
From the abstract: Aligning and integrating different kinds of Earth Science data is a laborious process, leading most researchers to focus on more generic, high level data products that are more easily compared. Dealing with the great volume and variety of Earth Science data is the goal of the NASA/ACCESS-17 STARE project. STARE is a unifying indexing scheme addressing variety and is well suited for applying distributed storage and computing resources to address volume.
See the recording at the ESIP Tech Dive page.
From the Risk CoP meetings page:
We had two thought-provoking presentations that included insights from the economics and cultural anthropology fields.
The WiRē Team: a long-term research-practice collaboration for supporting wildfire adaptedness in Wildland-Urban Interface (WUI) communities. James Meldrum, Research Economist, Fort Collins Science Center, presented on the Wildfire Research (WiRē) Team and an innovative approach to uniting research and practice, collecting and using community-specific survey and risk assessment data to support local solutions to wildfire risk while also advancing the academic literature on the topic. See their website for more info: https://wildfireresearchcenter.org/.
Emily Brooks, USGS American Association for the Advancement of Science (AAAS) Science & Technology Policy Fellow, presented on Community-Centered Climate Planning for People and Parks. She described a new toolkit she developed with the National Park Service for community-centered climate change planning. Her talk included some hard but important lessons about the future of our cultural resources, including parks and historical sites, in the face of slow-onset disruptions related to climate change, as well as some useful tips for working with different stakeholders and communities.
The WiRē approach - see more at https://wildfireresearchcenter.org/approach.
The topic for September’s Software Development Cluster meeting was Databases, and Beyond…. Topics covered included the AWS Database Freedom Project, database scenarios, and an Overview of AWS databases. Recommended Video Resource on the AWS YouTube channel: How to Choose the Right Database for the Job
Image from Amazon Aurora intro video. Amazon Aurora is being tested for some USGS databases that experience high traffic spikes (for example, during natural disasters).
At the September 11, 2019 CDI Monthly Meeting, we heard about some ongoing projects in the Water Resources Mission Area.
Katie Skalak presented on the USGS Water Prediction Work Program (2WP). The Water Prediction Work Program (2WP) will take advantage of the USGS observational network and the wide body of process-based research to guide prediction of Earth surface processes that govern water resources and water quality. 2WP is very much aligned with our broader CDI priority of integrated predictive science capacity. Hundreds of people are involved, and the project aims for open science by design and building the integrated science culture. These concepts will help the team of teams to achieve simultaneous action and awareness.
Title slide from the 2WP presentation.
Roland Viger told us more about the National Hydrologic Geospatial Fabric: a framework for the integration of water information. NHGF is hydrologically informed information architecture for integrated science. Roland presented the concept as categorized by DNA, skeleton, and meat, which are related to hydroinformatics, information architecture, and high value data themes, respectively. The types of data and information involved include river corridor data, dynamic landscape characteristics, data models, data gathering, harmonization, and data integration.
River corridor information planned for the National Hydrologic Geospatial Fabric.
Being from a different USGS mission area and not having heard much about these before, I found the presentations incredibly informative. Both of these projects are going to be integrating huge amounts of data and information, and I look forward to keeping up with their progress! If you have suggestions for other USGS initiatives that you’d like to learn more about at CDI Monthly meetings, let us know at email@example.com.
See the recording and slides at the September Monthly Meeting page.
Madison Langseth led a discussion about the proposed page on the Data Management Website about reviewing metadata. Details from the discussion can be seen on the group’s meeting notes page, and the Metadata Review webpage is now available and packed with useful information!
Creating a metadata record doesn’t end when the content is generated. Metadata review is an essential part of the process of creating a metadata record, and can involve finding a reviewer, checking technical and content aspects of the metadata, and communicating with the author. - from the USGS Data Management website
Screenshot from the USGS Metadata Review webpage.
Jim McAndrew from the NPS presented on National Park Service Vector Tiling activities. When web maps were first introduced, they relied on tiled raster data. But now vector data is used in the basemap itself instead of raster data. This allows web developers to access vector information in the basemap, and allows for custom on-the-fly styling. Jim discussed how the National Park Service made the switch to vector tiles, how they dealt with legacy applications, how they are making use of the newly available vector data, use in mobile apps, and future plans for combining information from other agencies in real time and displaying it in our web maps.
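Vector tiles are commonly addressed with the same zoom/column/row scheme that tiled raster web maps use, so a short sketch of that addressing may help. The formula below is the widely used Web Mercator "slippy map" tile calculation; the URL template is a placeholder, and nothing here is taken from the NPS implementation.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile containing a longitude/latitude at a zoom level."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(template, lon, lat, zoom):
    """Fill a {z}/{x}/{y} URL template for the tile covering a point."""
    x, y = lonlat_to_tile(lon, lat, zoom)
    return template.format(z=zoom, x=x, y=y)


# Placeholder endpoint; ".pbf" is a common extension for vector tiles.
TEMPLATE = "https://example.org/tiles/{z}/{x}/{y}.pbf"
print(tile_url(TEMPLATE, -110.5, 44.6, 8))
```

The difference from raster tiling is in what comes back: the client receives geometry and attributes for that tile and decides how to style them, which is what enables the on-the-fly styling mentioned above.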
Check out the Web Mapping Tools for the NPS site.
Matthew Purss from Geoscience Australia presented on The Challenge of Location and How Discrete Global Grid Systems can enable Spatial Data Integration.
Existing approaches and disconnected infrastructures coupled with the myriad of ways to describe and store location information limit our ability to discover, access and integrate spatial data across organisation and jurisdiction boundaries to produce reliable and actionable information. The Location Index project (LOC-I) aims to introduce a consistent way to access, analyse and use location data to support the effective integration of socio-economic, statistical and environmental data from multiple data providers to support the spatially enabled delivery of Government policies and initiatives.
Bridging the vector/raster divide to enable data integration.
CDI Tech Stack meetings are held jointly with the ESIP IT&I Tech Dive series. See recording here.
The Data Management working group had two presentations at their August meeting:
USGS National Hydrography Data Management. Karen Adkins, Jerry Ornelas, and Lisa Kok presented an overview of USGS National Hydrography and related database management systems operations, including lessons in data management best practices.
Publishing bi-transect-extractor - A first experience publishing processing code with its derivative data. Emily Sturdivant presented on the challenges and opportunities when publishing a suite of related resources for a project: a methods open file report, a data release, software, and a journal publication, and getting them all linked together. See the officially released code at https://code.usgs.gov/cmgp/bi-transect-extractor
Linking software and data release (and journal articles and open file reports!)
The meeting recording is available at the DMWG August meeting page.
Daniel Buscombe from Northern Arizona University presented on Continuous streamflow and nearshore wave monitoring from time-lapse cameras using deep neural networks. He described a proof-of-concept study into designing and implementing a single deep learning framework that can be used for both stream gaging and wave gauging from appropriate time-series of imagery.
From sensor to decision maker for hydrodynamic monitoring.
More details and the recording are available on the AI/ML meetings page.
The August Risk Community of Practice meeting featured a summary of the July workshop and two presentations.
Our speakers for today's meeting had a water theme, with Curt Storlazzi sharing some exciting work on analyzing the cost benefit analysis of coral reefs for reducing coastal hazards and Athena Clark sharing insights and actionable solutions on improving the display of USGS water data. Watch the recording to learn more! - Kris Ludwig
Rigorously Quantifying the Coastal Hazard Risk Reduction Provided by US Coral Reefs - Curt Storlazzi, Research Geologist, Pacific Coastal and Marine Science Center
I have an idea! Alternative ways of displaying our data - Athena Clark, Science Advisor, USGS Southeast Region and USGS Storm Team Lead
Title slide from Curt Storlazzi's presentation.
The recording and notes are available to CDI members at the Risk Meetings page.
The Software Development Cluster met in August to brainstorm about the CDI FY20 Request For Proposals! The group also gave an excellent presentation about the role of software in USGS data integration at the CDI Monthly Meeting on August 14.
Find out more about the group and their activities at their wiki page.
Our August 14, 2019 Monthly Meeting featured overviews of NASA EarthData and our own CDI Software Development Cluster.
Cynthia Hall from NASA gave an overview of some of the data types in NASA EarthData that overlap with our USGS Mission Areas. In addition, she presented some of the resources she developed to help data users navigate the system. There are toolkits (entry points to access NASA Earth science data) and pathfinders (designed to guide you through the process of selecting data products and learning how to use them). Both were developed with user input.
An image from Cynthia Hall's EarthData presentation.
CDI Software Development Cluster leads Jeremy Newson, Cassandra Ladino, and Michelle Guy were joined by Laura DeCicco and Emily Sturdivant to give an excellent overview that covered What is software? Who creates it? Why is it integral to data integration and delivery? What are some USGS Resources to improve software? What are some examples of USGS software for data integration and delivery?
A few relevant links that they presented:
What is software? Code? Applications? Answer: All of the above.
Software is important in many steps of data collection, analysis, and delivery.
See the archived slides and recording on the meeting page when you are logged in.
Peter Burkholder, a senior innovation specialist from 18F, was the guest presenter. 18F builds effective, user-centric digital services focused on the interaction between government and the people and businesses it serves. Peter is a DevOps engineer who has worked to develop cloud.gov and implement devops practices at 18F. He is also a geophysicist who previously worked at IRIS PASSCAL. His presentation covered best practices and technical implementation of automated infrastructure, resilient cloud operations, and continuous delivery pipelines.
Peter’s favorite 18F tools include
Viv Hutchison and Madison Langseth led a discussion that included a brief overview of the CDI DMWG session at the in-person meeting in June, data manager position descriptions for USGS, and contributed slides from working group members about data management staffing at their USGS science centers. In response to the poll question “If you consider yourself to be a data manager for your center, what is your current position description title?,” there were 19 different responses!
See slides and recording at the DMWG July meeting wiki page.
Josh Bradley and Dennis Walworth presented on the Open Source Metadata Toolkit, which was supported by the CDI from 2014-2015 (see project page on ScienceBase) and is still going strong!
The CDI Tech Stack group meets jointly with the ESIP IT&I group - access the slides and recording at the ESIP Tech Dive page.
The July meeting of the Fire Science Community of Practice (one of CDI’s newest collaboration areas) covered several topics. Mark Miller provided a short community update presentation. Josh Picotte gave a science talk describing the LANDFIRE remap effort that is currently underway. LF Remap is designed to produce vegetation and fuels data that inform wildland fire and ecological decision support systems. Sheila Murphy gave a second talk called "Arsenic and old mines - Wildfire remobilizes historical mining waste." Other relevant files from July are included on the meeting page, such as a Menlo Park lecture on USGS Fire Science that was given by Paul Steblein earlier in the month.
This month’s Fire Science CoP summary was provided by Mark Miller. See slides and other materials at the July 16 Fire Science Community of Practice Meeting page.
The Risk Community of Practice July meeting was "live" from the first Risk CoP meeting in Golden, CO. On July 18, 2019, after some brief announcements, the group heard short presentations from the PIs of the FY19 Risk CoP funded projects:
Title slide from Jaiswal, Nassar, et al. Risk project - Assessing the risk of global copper supply disruption from earthquakes.
Stakeholder engagement is an important piece of the USGS Risk Plan. But what does it mean to engage with stakeholders? What does co-production mean? What tools are used for engaging stakeholders and over what timelines during the course of a project? What types of challenges arise during stakeholder engagement? What are some of the surprising considerations to keep in mind while working with stakeholders?
This special session was live from the Risk CoP meeting in Golden, CO and featured a panel discussion on stakeholder engagement. Panelists answered the following questions: 1) What does stakeholder engagement mean to you? What does co-production mean to you? 2) When, during the course of a project, do you engage your stakeholders? 3) Describe three tools you use for engaging stakeholders? 4) Can you give an example of a challenge you have faced in doing stakeholder engagement and how you overcame these challenges? (Paperwork Reduction Act, protected information, confidentiality issues) 5) What are some surprising considerations to keep in mind when doing stakeholder engagement? (e.g., inclusivity, ethics, manner of approach).
This month’s Risk CoP summary was provided by Kris Ludwig.
Recordings are available on the Risk CoP meetings page (log in required).
Anyone following this blog may notice that I am making an effort to get up to speed to the present day, but am still a little bit behind. I still have great optimism about catching up, and these posts may help you reminisce about the summer.
At the July 10, 2019 CDI Monthly Meeting, we heard a proposal for ways to increase reusability of USGS datasets, and presentations from two map-based visualization and analysis tools. In addition, Kevin Gallagher reported on demographics, presentation materials, and take-aways from the CDI Workshop “From Big Data to Smart Data” that was held in June 2019 in Boulder, CO.
Responses to the CDI post-workshop survey showing the varied job descriptions in our community.
Richie Erickson presented a Scientist’s Challenge in exploring the use of Jupyter Notebooks to increase reusability of USGS datasets. He is focusing on smaller, project-level datasets that require explanation of disciplinary expertise and statistical analyses. To learn more, you can get in contact with Richie Erickson at firstname.lastname@example.org. See his slides here.
Image of the CDI-funded Online Landslide Inventory.
Ben Mirus’s presentation on a new national landslide inventory highlighted important considerations when integrating incomplete and disparate data. State boundaries often showed mismatches in data quantity or quality. Other topics of CDI interest included defining confidence metrics for the landslides, deciding on dataset update frequency, putting data releases through internal review, best practices for viewing heterogeneous data, identifying areas that need better data collection, and links from our science to governmental policy. Read more at Landslide Risks Highlighted in New Online Tool. This project is an FY18 CDI Funded Project, with more information at its ScienceBase page.
Example of US Topo map with National park boundary and water data.
Elizabeth McCartney and Greg Matthews’ presentation on the National Digital Trails Network showed a system that takes existing trails and then uses an algorithm to identify and evaluate potential connections between trail systems using data like land type (owner), slope, and hydrography/river crossings. If you are interested in learning more you can contact the team at any of the following addresses: email@example.com, firstname.lastname@example.org, email@example.com.
The recording of the meeting is available at the monthly meeting page if you are signed in as a CDI member.
In addition to the meetings described below, several collaboration areas met during the face-to-face CDI Workshop in Boulder, CO, June 3-7!
Chris Gorgolewski presented on “Google Dataset Search: Facilitating data discovery in an open ecosystem."
Talk description: There are thousands of data repositories on the Web, providing access to millions of datasets. In this talk, I will discuss recently launched Google Dataset Search, which provides search capabilities over potentially all dataset repositories on the Web. I will talk about the open ecosystem for describing and citing datasets that we hope to encourage and the technical details on how we went about building Dataset Search. Finally, I will highlight research challenges in building a vibrant, heterogeneous, and open ecosystem where data becomes a first-class citizen.
Related links: https://toolbox.google.com/datasetsearch (Accessible when not signed in with a Dept of Interior Google account), https://www.blog.google/products/search/making-it-easier-discover-datasets/
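The "open ecosystem for describing datasets" in the talk description rests on structured markup embedded in dataset landing pages, with schema.org's Dataset type as the usual vocabulary. The sketch below generates such a JSON-LD block. The dataset name, description, URL, and keywords are invented for illustration, and a production record would typically carry more properties (license, creator, spatial and temporal coverage).

```python
import json


def dataset_jsonld(name, description, url, keywords):
    """Build a minimal schema.org Dataset description as a dict."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
    }


# Hypothetical dataset used only to show the shape of the markup.
record = dataset_jsonld(
    name="Example stream temperature observations",
    description="Hourly stream temperature at a hypothetical monitoring site.",
    url="https://example.org/datasets/stream-temperature",
    keywords=["stream temperature", "hydrology"],
)

# A landing page would embed the record in a script tag like this.
print('<script type="application/ld+json">')
print(json.dumps(record, indent=2))
print("</script>")
```

Because the markup sits on the repository's own landing page, the repository stays the authoritative source while search tools index the description.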
Slide from Chris Gorgolewski's talk on Google Dataset Search.
The recording can be found on the ESIP Tech Dive meetings page. Dave Blodgett and Rich Signell are the group leads.
Pete Doucette provided a review of recent AI/ML-related Strategic Science Planning at USGS. This included thoughts captured from the recent USGS 21st Century Science Workshop (May 2019) at the National Conservation Training Center, and the CDI Workshop in Boulder, CO (June 2019).
The recording can be found on the AI/ML Meetings page. Pete Doucette and JC Nelson are the group leads.
The Semantic Web Working Group's June discussion centered on persistent identifiers for metadata records and vocabularies that are consistent with the FAIR principles. The group identified next steps on persistent identifiers for metadata records (could DataCite DOIs be used?) and next steps for achieving FAIR vocabularies (persistent identifiers for keywords, which is related to encouraging or requiring keywords that are from online vocabularies, and will be a step toward interoperability of vocabularies through use of ontologies.)
Text contributed by Fran Lightsom, SWWG lead! See more at the SWWG meeting notes page.
The group heard an engineer's perspective on risk from Nico Luco, who discussed the Earthquake Hazards Program's “Engineering and Risk” project, which contributes to delivering information for building codes and risk assessments. Next, Nate Wood provided an overview of the "Strategic Hazard Identification and Risk Assessment (SHIRA) on DOI Resources" Project, including an introduction to the DOI Risk Map, related data resources, and a relative threat matrix currently in development. This month’s summary is contributed by Risk CoP co-lead Kris Ludwig!
See more at the Risk Community of Practice Meetings page.
Related publication: Wood, N., Pennaz, A., Ludwig, K., Jones, J., Henry, K., Sherba, J., Ng, P., Marineau, J., and Juskie, J., 2019, Assessing hazards and risks at the Department of the Interior—A workshop report: U.S. Geological Survey Circular 1453, 42 p., https://doi.org/10.3133/cir1453.
The Software Development Cluster welcomed new cluster co-lead Jeremy Newson, and reminded participants that the USGS Software Management Website is up and running at https://www.usgs.gov/products/software/software-management/.
At the June meeting, the cluster reviewed the many related sessions at the CDI workshop, including Software Release Q&A, the Software Development Cluster Breakout Session, a Software Release Practicum, and a Software Birds-of-a-Feather Lunch. Discussions in those sessions included considerations and ideas for cross-USGS collaboration, institutional support, and software developer career paths at the USGS.
Some ideas and take-aways from the discussions include:
Full notes can be found at the workshop Slides, Recordings, and Notes page (if you log in as a CDI member). Cassandra Ladino, Michelle Guy, and Jeremy Newson are the cluster leads.
The May 8, 2019 CDI Monthly Meeting featured two CDI project teams and a presentation about NSF-funded lidar data management capabilities.
Hans Vraga presented on the motivation and technical details of an Ice Jam Hazard website and reporting system. The cloud-first system demonstrated use of the latest cloud technologies in a USGS mobile-friendly application. Hans is part of the Web Informatics and Mapping (WIM) team, which develops web-based tools that support USGS science and other federal science initiatives. You can see some of their other projects here: https://wim.usgs.gov/i/projects/
Jess Walker presented on her experience in developing a workflow for lidar processing and analysis in the cloud for USGS datasets. Working with the USGS Cloud Hosting Solutions team, she searched for solutions for processing and analyzing smaller-size (long-tail) lidar datasets using software like Entwine (https://entwine.io/) and Potree (http://potree.org/).
Chris Crosby from UNAVCO showed how OpenTopography (https://opentopography.org/) facilitates community access to high-resolution, Earth science-oriented, topography data, and related tools and resources. He also described upload and archiving for small to moderate sized topographic datasets in the Community Dataspace.
The OpenTopography Tool Registry provides a community populated clearinghouse of software, utilities, and tools oriented towards high-resolution topography data (e.g. collected with lidar technology) handling, processing, and analysis.
Here’s a roundup of recent CDI collaboration area topics from the month of May!
VeeAnn Cross and Peter Schweitzer reviewed use of keywords in the USGS Science Data Catalog. Choosing good keywords is an important part of creating a USGS data release, and there is an opportunity to work together to better align the terms being used. One tip is to make sure there are USGS Thesaurus and ISO terms being used, and not to make up keywords that are not part of one of the suggested vocabularies.
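The keyword tip lends itself to a simple automated check. The sketch below flags keywords that do not appear in any listed vocabulary and reports whether each required vocabulary is represented. The vocabulary contents are tiny illustrative stand-ins, not the real term lists, and the function names are hypothetical.

```python
# Tiny illustrative stand-ins; the real vocabularies are far larger.
VOCABULARIES = {
    "USGS Thesaurus": {"hydrology", "water quality", "landslides"},
    "ISO 19115 Topic Category": {
        "inlandWaters",
        "geoscientificInformation",
        "environment",
    },
}


def check_keywords(keywords, vocabularies=VOCABULARIES):
    """Split keywords into those found in a vocabulary and those that are not."""
    matched, unmatched = {}, []
    for kw in keywords:
        sources = [name for name, terms in vocabularies.items() if kw in terms]
        if sources:
            matched[kw] = sources
        else:
            unmatched.append(kw)
    return matched, unmatched


def has_required_vocabularies(matched, required=tuple(VOCABULARIES)):
    """True when every required vocabulary supplies at least one keyword."""
    used = {source for sources in matched.values() for source in sources}
    return all(name in used for name in required)


matched, unmatched = check_keywords(["hydrology", "inlandWaters", "cool rivers"])
print(unmatched)                           # ['cool rivers']
print(has_required_vocabularies(matched))  # True
```

A check like this could run during metadata review so that made-up keywords are caught before a record reaches the catalog.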
Here’s more guidance on keywords and suggested vocabularies from our trusty Data Management website:
Recent post on the Metadata Reviewers forum: Data Dictionaries as a standalone product?
Announcement: If you are using code.usgs.gov, the official source code archive for USGS, there is a collaborator request form and a public repository request process. Contact firstname.lastname@example.org for these resources.
Kyle Goodwin from GitLab presented on “GitLab as a web-based DevOps lifecycle tool.” GitLab is not GitHub, although they both use git, a distributed version-control system for tracking changes in source code during software development. GitLab is a platform for a DevOps-driven software development lifecycle. GitLab is the official source code archive for USGS (https://code.usgs.gov/public/) and provides a toolchain that includes project management, software repo, CI/CD (continuous integration/continuous deployment), metrics, and monitoring. Some groups use GitHub for repository features but GitLab for CI/CD. https://about.gitlab.com/devops-tools
Did you know that members of the Semantic Web Working Group created the buzzword bingo sheets for the CDI workshop? Thank you to Peter Schweitzer and Fran Lightsom for coordinating. Do you hear the following during CDI presentations? Leverage, game changer, compliance, smart data, portal, authoritative, internet of things, analytics, best practices, takeaway, innovative, crowdsourcing, buy-in, stakeholder, framework, workflow, carrot and stick, pitch deck, quick win, modernization, cultural change...
A couple example bingo sheets:
Chris Holmes presented on the SpatioTemporal Asset Catalog (STAC) specification, an emerging standard to make it easier to find geospatial information. It aims to enable a cloud-native geospatial future by providing a common layer of metadata for search and discovery, while playing well with the web and existing geospatial standards.
Find out more about STAC at http://stacspec.org/
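To give a feel for the "common layer of metadata" that STAC provides, the sketch below builds two minimal STAC-style Items (GeoJSON Features with a few extra fields) and runs a bounding-box search over them. The item IDs, URLs, and dates are invented, and the `stac_version` shown is the later 1.0.0 release rather than the draft versions current in 2019; check the specification for the full list of required fields.

```python
def bbox_to_polygon(bbox):
    """Convert [west, south, east, north] to a closed GeoJSON polygon."""
    west, south, east, north = bbox
    ring = [
        [west, south], [east, south], [east, north], [west, north],
        [west, south],
    ]
    return {"type": "Polygon", "coordinates": [ring]}


def make_item(item_id, bbox, datetime_utc, asset_href):
    """Build a minimal STAC-style Item as a plain dict."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "geometry": bbox_to_polygon(bbox),
        "bbox": list(bbox),
        "properties": {"datetime": datetime_utc},
        "links": [],
        "assets": {
            "data": {
                "href": asset_href,
                "type": "image/tiff; application=geotiff",
            }
        },
    }


def bbox_intersects(a, b):
    """True when two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox):
    """Return the IDs of items whose bbox overlaps the query bbox."""
    return [item["id"] for item in items if bbox_intersects(item["bbox"], bbox)]


# Hypothetical catalog entries.
items = [
    make_item("scene-a", [-106.0, 39.0, -105.0, 40.0],
              "2019-05-01T00:00:00Z", "https://example.org/scene-a.tif"),
    make_item("scene-b", [-90.0, 30.0, -89.0, 31.0],
              "2019-05-02T00:00:00Z", "https://example.org/scene-b.tif"),
]

print(search(items, [-105.5, 39.5, -105.2, 39.8]))  # ['scene-a']
```

Because every provider describes assets with the same few fields, one search routine can work across catalogs from different sources.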
Chris Holmes is also involved with the Radiant Earth Foundation, which you can read more about here: Creating a Machine Learning Commons for Global Development
The Tech Stack working group meets jointly with the ESIP Information Technology and Interoperability Committee, and is led by Dave Blodgett and Rich Signell.
Tamar Norkin and Ricardo McClees-Funinan presented on “Behind the Scenes at ScienceBase: How Data Release happens in your USGS Trusted Digital Repository (TDR).”
They highlighted the ScienceBase Data Release (SBDR) Tool, a very handy way to start your USGS data release. The SBDR Tool can be found here: https://www.sciencebase.gov/datarelease.
For data release questions and requests, you can get in touch with the data release team at email@example.com.
After filling out the ScienceBase Data Release Tool form, you get a new landing page in ScienceBase, a reserved Digital Object Identifier, and an email with instructions!
Chris Barber from USGS EROS presented on XGBoost in Continuous Change Detection and Classification (CCDC). Chris explained how XGBoost (Extreme Gradient Boost, an open-source software library which provides a gradient boosting framework) improved the efficiency and accuracy of segment classification and land cover extraction for LCMAP (Land Change Monitoring, Assessment, Projection). Chris gave a good introduction to the concepts of decision trees, decision tree ensembles, and boosted trees. However, he urged us to remember that there is no substitute for appropriate training data. Email Chris for his extensive reference list - firstname.lastname@example.org.
Peter Esselman, USGS Great Lakes Science Center, also presented on Deep learning to quantify benthic habitat.
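The idea behind boosted trees can be shown in a few lines without any library. The sketch below is plain gradient boosting for regression with one-split trees ("stumps") on a single feature: each round fits a stump to the current residuals and adds a scaled copy to the ensemble. It is a teaching sketch on made-up data, not XGBoost itself (which adds regularization, second-order gradients, and much more) and not the LCMAP workflow.

```python
def fit_stump(xs, residuals):
    """Find the single split that best fits the residuals (least squares)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted(xs, ys, rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit stumps to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for x, p in zip(xs, preds)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up training data with a step between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

one_round = fit_boosted(xs, ys, rounds=1)
model = fit_boosted(xs, ys, rounds=50)
print(mse(one_round, xs, ys), mse(model, xs, ys))
```

The training error falls as rounds are added, which is also why the caution about appropriate training data matters: the ensemble will fit whatever it is given, representative or not.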
From Peter Esselman's talk - image showing various tools that are the future of Great Lakes science.
The Citizen-Centered Innovation group discussed the final draft of the OSTP Report on Prizes and Citizen Science Projects, the USGS Open Innovation Strategy, and the DOI Generic Information Collection Request. They also highlighted relevant upcoming seminars and events in the broader Federal sphere.
Sara McBride of the Earthquake Science Center presented on Social Science 101: a Primer. Some conclusions: Social science is a big field with a lot of disciplines, each examining the human experience with its own unique lens. Doing it well requires years of study, therefore DIY social science is not recommended. There are a number of social scientists within the USGS: reach out and ask us questions!
Some related resources
Wilkins EJ, Miller HM, Tilak E, Schuster RM (2018) Communicating information on nature-related topics: Preferred information channels and trust in sources. PLoS ONE 13(12): e0209013. https://doi.org/10.1371/journal.pone.0209013
https://my.usgs.gov/hd/: HDgov is a multi-agency website for all things human dimensions of natural resources. Here you can access a variety of resources to assist you in your work.
Dale Cox presented on SAFRR (Science Application for Risk Reduction) Projects and Scenarios for Risk Reduction. Dale has been involved in many scenario projects and is in the process of looking back and evaluating some of the scenarios. What is a scenario? The principles of a scenario: it describes a single, large, but plausible event that we need to be ready for; integrates across many disciplines; uses the best hazard science; reflects consensus among leading experts; is created with community partners; and presents results in products that fit the user, not the scientist.
Some related resources:
USGS Earthquake Scenario Map: https://earthquake.usgs.gov/scenarios/related.php
Slides explaining the components of a scenario and the Science Application for Risk Reduction (SAFRR) scenarios.
The Risk group also announced its inaugural RFP Awards!
Chris Merkes from UMESC presented on Choosing the right eDNA assay: Developing standards for Limit of Detection and Limit of Quantification. This work is slated for release soon in a new environmental DNA journal.
A resource discussed: https://github.com/cmerkes/qPCR_LOD_Calc
Merkes CM, Klymus KE, Allison MJ, Goldberg C, Helbing CC, Hunter ME, Jackson CA, Lance RF, Mangan AM, Monroe EM, Piaggio AJ, Stokdyk JP, Wilson CC, Richter C. (2019) Generic qPCR Limit of Detection (LOD) / Limit of Quantification (LOQ) calculator. R Script. Available at: https://github.com/cmerkes/qPCR_LOD_Calc. DOI: https://doi.org/10.5066/P9GT00GB.
Slide from Chris Merkes' talk illustrating Limit of Detection and Limit of Quantification.
The Software Development Cluster discussed Docker basics for code development.
More notes and links are in their Meeting Notes (accessible to DOI users).
All CDI Collaboration Areas may be browsed on the CDI wiki.
Summary extracted from notes of Fran Lightsom, lead of the Metadata Reviewers group:
Sheryn Olson demonstrated the metadata collecting system used by MonitoringResources.org to encourage discussion of how it might be simpler and easier to use, as well as good ideas that the rest of us can copy. MonitoringResources.org is part of the Pacific Northwest Aquatic Monitoring Partnership (PNAMP) and uses the metadata to provide an index of monitoring activities, especially the ecology of streams of the U.S. Pacific Northwest, and the procedures, protocols, and monitoring designs that are in use.
View more notes and the presentation slides on the Metadata Reviewers Meetings page.
Summary provided by Derek Masaki, co-lead of the DevOps group:
Presenters: Kevin Portanova, Director of IT for Public and Indian Housing, and Mel Hurley, DevOps Manager. The presentation provided an overview of the shift that HUD is making away from traditional on-premises IT operations toward cloud-focused DevOps. Kevin and Mel took us through their process of reorganizing a contractor-based IT environment, refactoring their development process, and creating a Federal-employee-centric staff oriented toward Agile and a DevOps workflow in the Microsoft Azure environment.
See the slides on the DevOps Meetings page.
The DMWG heard two presentations, first from John Faundeen and Natalie Latysh about “Becoming a USGS Trusted Digital Repository,” and second from Viv Hutchison and John Faundeen on “Progress on a USGS Data Manager Position Description Series.”
The slides and recording are posted on the meeting page.
John Karabaic presented on Pachyderm, a data science platform that lets you deploy and manage multi-stage, language-agnostic data pipelines while maintaining complete reproducibility and provenance. Read the docs here: http://docs.pachyderm.io/en/latest/index.html
Tech Stack calls are joint with the ESIP Interoperability and Technology Tech Dive Webinars. You can review the recording here.
Kevin Lafferty, senior ecologist at Western Ecological Research Center, presented on White Shark eDNA. In recent work he has been refining methods to get better data from white shark eDNA. Kevin is based in Santa Barbara, CA, and surely made many people jealous while describing data collection with instruments on paddle boards.
View the recording on the Bioinformatics Meetings page.
Kevin is looking for new collaborations within USGS and you can email him at firstname.lastname@example.org if interested. (Remember: data collection with instruments on paddle boards.)
Sophia Liu led a discussion covering many topics, including the OSTP Draft Report to Congress for the Crowdsourcing and Citizen Science Act; a Dept. of the Interior Generic Information Collection Request; the USGS Open Innovation Strategy; the CitizenScience.gov website, including USGS CCS projects; and past and upcoming events such as the Citizen Science Association (CSA) Conference (March 13-17, 2019), the Federal Crowdsourcing Webinar Episode 1: Citizen Science, and upcoming Federal Crowdsourcing Webinars, which can currently be found on this page: https://digital.gov/events/. Sophia’s use of Mentimeter added a great element of interactivity to the meeting. See more on the group wiki page.
Kris Ludwig and Dave Ramsey lead the Risk CoP and hosted a call with presentations about the benefits of communities of practice (Leslie Hsu, CDI Coordinator) and user engagement in the development of ShakeCast (Dave Wald, Seismologist).
With respect to user engagement, Dave shared several titles that present “logical approaches for bringing products to users,” including The Power of Habit, Contagious, To Sell is Human, Nudge, Made to Stick, Diffusion of Innovations, and The Undoing Project. Book club, anyone?
View the presentations and recording on the Risk Meetings page.
Reads related to user engagement recommended by David Wald.
Cassandra Ladino led a discussion on building connections, inspired by this Better Scientific Software post: Building Connections and Community within an Institution.
The group had recently fielded a question about desktop installers, and the challenges of code signing. An internal site on application and script signing was shared. Some group members were also of the opinion that providing a method to install your application using Anaconda (on all OSs) was adequate.
A huge thanks to the three CDI Project teams who presented at our April Monthly Meeting.
Caitlin Andrews, a landscape ecologist in the Southwest Biological Science Center, explained how she used R Shiny and Amazon Web Services to create an interactive, online front end for a proven model of ecosystem water balance, SOILWAT2. This tool helps to predict and understand site-specific risk of future drought. Lots of lessons here for people who want to make user-friendly online tools out of more traditional scientific models within the USGS IT ecosystem. Code repository at https://github.com/DrylandEcology
Matt Neilson, a fishery biologist and co-lead for the Nonindigenous Aquatic Species Database program, delivered the line of the day: We are living in a machine-readable world. His project uses natural language processing and the xDD (eXtract Dark Data, formerly GeoDeepDive) literature database to improve, modernize, and greatly increase the efficiency of literature review. For people who used to walk to the library and photocopy stuff (and record radio songs on cassettes and dial with rotary phones), this is strange, but I will attempt to evolve with the times. See more information, like code repositories, in the Related External Resources links on the project's ScienceBase page.
Jon Warrick, research geologist in the Coastal/Marine Hazards and Resources Program, described the software tools, resources, and training workshops developed to allow USGS scientists to apply deep learning to remotely sensed imagery and better understand natural hazards and habitats. The two in-person workshops on these tools held in 2018 were able to accommodate only a fraction of the interested applicants. The CDI hopes to be able to provide more trainings like this to help build deep learning expertise and capacity in the USGS. See more at https://github.com/dbuscombe-usgs/cdi_dl_workshop and https://github.com/dbuscombe-usgs/dl_tools.
Log in to see the meeting recording and slides at the meeting page.
Cassandra Ladino led a brainstorming session for topics that could be discussed within the Software Development cluster, using sli.do to collect ideas and Trello to organize them. Some ideas included: code.usgs.gov - what is it, who should use it and when; using the U.S. Web Design System in USGS web sites; Docker training for distributing scientific software; Python APIs using Swagger and/or Flask; how to grow grassroots development efforts to enterprise systems; creating a community of practice for unit testing code so that it can be easily reviewed by anyone in the software dev community; and whether there should be separation between scientific software and web development software discussions (pros and cons). Lots of exciting topics!
Risk Community of Practice leads Kris Ludwig and Dave Ramsey introduced the new Risk Community of Practice, reviewed the USGS Risk Plan and implementation plans for FY19, and announced the FY19 Risk RFP. The purpose of the group is to
build connections across centers, programs, mission areas
create a central point of contact for USGS risk research and applications
identify needs and opportunities to benefit the community
generate project ideas
share resources, expertise
Besides the Risk Plan, another recent publication mentioned was Assessing Hazards and Risks at the Department of the Interior—A Workshop Report, by Nate Wood, Alice Pennaz, Kristin Ludwig, Jeanne Jones, Kevin Henry, Jason Sherba, Peter Ng, and others.
Mattia Almansi from Johns Hopkins University presented on Integrating SciServer and OceanSpy. OceanSpy is an open-source and user-friendly Python package that enables scientists and interested amateurs to use ocean model data sets with out-of-the-box analysis tools. OceanSpy builds on software packages developed by the Pangeo community (in particular xarray, dask, and xgcm). OceanSpy accelerates and facilitates exploration (including visualization) of terascale data. (Adapted from the presentation abstract.)
See more, including a link to the recorded session, on the group presentation website, hosted by ESIP - the Earth Science Information Partners. TSWG contacts are Dave Blodgett and Rich Signell.
The Semantic Web Working Group held a discussion about Semantic Web elements at the upcoming CDI Workshop. Ken Bagstad mentioned the breakout session he is co-leading at the workshop, which will include semantics in the context of predictive modelling, intersecting with artificial intelligence and machine learning. Other topics included FAIR (findability, accessibility, interoperability, and reusability) in machine- and human-readable contexts and the importance of standard data dictionaries.
What I learned at the AI/ML group call:
USGS is setting up a new machine for AI, named Tallgrass after the NPS Tallgrass Prairie National Preserve in Kansas
Projected timeline for the setup: mid-April - Tallgrass installation; early May - friendly testing; early June - general availability.
Reminder of what GPUs are vs. CPUs
AI for Ecosystem Services: What if our data and models could talk to one another, and decision makers could use scientific information to more quickly and reliably answer questions about today’s most urgent problems? Find out more at http://www.integratedmodelling.org
JC pointed out some activity on the AI/ML forum and encouraged members to post
Group leads reminded members to contribute to a spreadsheet for collecting USGS AI/ML project descriptions to communicate to USGS leadership.
You should think of this image whenever we mention the Tallgrass infrastructure. (From the NPS Tallgrass Prairie website.)
Cassandra Ladino led the working group in a discussion of topics to cover at the CDI Workshop or at future DMWG meetings. Some ideas for further discussion included:
Data Management Plans - streamlining process from DMP to publishing; enforcing; hosting
QMS (Quality Management System for USGS labs) integration with data management and records management
Metadata for the National Digital Catalog
More information and guidance on USGS Software Release
UAS (Unmanned Aircraft Systems, also known as drones) data
Data sharing agreements
Martin Folkoff, lead DevOps engineer at Booz Allen Hamilton, provided a technical overview of the DevOps environment he has designed and the CI/CD (continuous integration/continuous deployment) pipeline employed by his teams at BAH. He provided a look at the tools he uses to orchestrate his production environments.
The Metadata Reviewers Community of Practice will be hosting a breakout session at the CDI Workshop to provide guidance for data and metadata review, and tips and tricks for data and metadata authors. Virtual participation is planned.
The ISO Content Specs project will be hosting workshop sessions on Thursday and Friday at the CDI Workshop. The sessions will focus on collecting requirements for metadata specification modules, most likely modules for experimental data, computational data, and observational data. To learn more, contact Dennis Walworth, Fran Lightsom, or Lisa Zolly.
At the March 13, 2019 monthly meeting, CDI’s executive sponsor Kevin Gallagher talked about the theme of this year’s CDI workshop, From Big Data to Smart Data, which concerns turning our huge volumes of diverse data into usable, actionable, integratable, or “smart” data. Registration for the workshop (June 4-7, 2019 in Boulder, CO) is open and can be found on the workshop wiki page.
We heard presentations from three FY18 CDI Funded Projects:
Wesley Daniel presented on the Nonindigenous Aquatic Species Alert Risk Mapper and reported that the team will be posting a write-up of their challenges transitioning to ArcGIS Pro as part of their outcomes. See more accomplishments on their ScienceBase page.
Dennis Walworth and Fran Lightsom presented on the Transition to ISO metadata project and reported that the project team will host several activities at the CDI workshop; they are looking for users to test their interface. They are using the previously funded mdEditor application (ScienceBase page) in their work.
Nate Wood and Jeanne Jones presented on the Department of the Interior Risk and CDI Risk Map. They reported many links that are available for Department of the Interior users to test out, including data description, codebase, the risk map, GeoServer, and the API. CDI members, go to the meeting page and log in to view their slides - links are on the last slide.
The DOI Risk Workshop Report is out! Wood, N., Pennaz, A., Ludwig, K., Jones, J., Henry, K, Sherba, J., Ng, P., Marineau, J., and Juskie, J., 2019, Assessing hazards and risks at the Department of the Interior—A workshop report: U.S. Geological Survey Circular 1453, 42 p., https://doi.org/10.3133/cir1453.
Hans Vraga from the Web Informatics and Mapping Program (WIM, wim.usgs.gov) gave an overview of the group, of which he is the Project Manager. WIM is a web development shop that has cooperators from both within and outside of the USGS. Some of their products include a SPARROW model output visualizer, StreamStats, and a WHISPers wildlife event reporting system (coming soon).
As you can imagine, their expertise is in high demand. Things they look for in cooperators include scientific/subject matter expertise that complements their group’s technical expertise, a cooperator who acts as an active product owner, a focus on development with minimal time spent on operations, and projects with fast turnaround times. Check out their website or contact Hans Vraga for more information.
In February, the group had two major questions come up for discussion - these were passed along to the appropriate committees and officials for guidance and answers were produced quickly!
First: Is there updated guidance on the volume of data necessary to trigger a separate data release? (As opposed to a table in a publication.) Short answer: Having the data in the paper is ok - however, if data is big enough to be moved into a supplemental section of the paper, it has to be a USGS data release.
Second: How should authors reference data that is not publicly available when writing a manuscript? Short answer: there is updated guidance on the FSP “Guide to Data Releases” page for data that are not available at the time of publication, or that have limited availability owing to restrictions, in the section Data Associated with a Publication.
John Stock of the USGS Innovation Center joined to talk about some opportunities available for postdoctoral research, future workshops, and future discussions related to AI/ML in the USGS. The joint USGS-NASA postdoctoral fellowships are now posted: https://geography.wr.usgs.gov/InnovationCenter/fellowship.html
Pete Doucette presented a talk, “Ruminations on AI and Land Imaging.” He included a great intro on the difference between the AI and machine learning of decades ago versus the capabilities now (e.g. neural networks versus DEEP neural networks). Several land imaging projects and datasets at the USGS are becoming more “analysis-ready” for data science, predictive analytics, and informing decisions. For example, see “Continuous change detection and classification of land cover using all available Landsat data” (Zhu and Woodcock, 2014).
A major theme was the need for the combination of disciplinary expertise and AI/ML expertise, essentially team science, in order to reach the full potential of AI/ML. (See the NAS report Enhancing the Effectiveness of Team Science.)
A White House Fact Sheet on “Accelerating America’s Leadership in Artificial Intelligence” was shared with the group by Mona Khalil and Leah Colasuonno.
A few slides from Pete Doucette's talk on AI and Land Imaging.
Cassandra Ladino stepped in to lead the February Semantic Web Working Group discussion, which focused on the theme of FAIR (Findable, Accessible, Interoperable, Reusable) in USGS. The group discussed ideas for a proposed FAIR Workshop, including the topic of new approaches and technologies to further enhance FAIRness at USGS. See the meeting notes for more resources and references.
The joint ESIP Tech Dive - CDI Tech Stack presentation was on “Cloud Native Geoprocessing of Earth Observation Satellite Data with Pangeo,” by Scott Henderson, University of Washington. “The integration of new technologies with several high-level Python packages are enabling Cloud-native workflows and circumvent the bottleneck of downloading large amounts of data.”
Aptly summarized: “If that doesn’t get people excited I don’t know what will,” said Rich Signell, co-chair of the Tech Stack Group.
Screenshot from a demo linked to the post "Cloud Native Geoprocessing of Earth Observation Satellite Data with Pangeo."
The latest of the monthly eDNA webinars organized by Scott Cornman was on CALeDNA (California Environmental DNA), by Rachel Meyer of UCLA. CALeDNA capitalizes on the enthusiasm of citizen scientists - they provide kits for collection of data in the field. Data collectors also take iNaturalist observations for benchmarking. The data are provided online for the public to identify patterns, and are also used for academic research on topics like phylogenetic diversity and functional diversity.
CALeDNA used the Kobo toolbox to build their data collection form; they found it to be the most robust platform for cell phone data collection. https://www.kobotoolbox.org/
rANACAPA - an R package developed so that non-specialists without community ecology background can generate the relevant plots. “Ranacapa: An R package and Shiny web app to explore environmental DNA data with exploratory statistics and interactive visualizations” https://f1000research.com/articles/7-1734/v1
Check out one of their case studies and the data visualizations available! https://data.ucedna.com/research_projects/pillar-point
A few slides from Rachel Meyer's talk on the California eDNA program.
The Software Development Cluster hosted a discussion on Cloud and Big Data in the Cloud. Cassandra Ladino started off the discussion with a presentation on Cloud and Big Data, including a summary of resources she has been using to learn more. There is information in the notes on how to sign up for a USGS Cloud Hosting Solutions Sandbox.