The CDI Collaboration Areas are keeping me busy. You can get to all of these groups and sign up for mailing lists on the CDI Collaboration Area wiki page.
From upper left corner, clockwise: DevOps: image from Tidelift website; SoftwareDev: logo for uvicorn; Risk: Impact360 worksheet; AI/ML: image from AI/ML DELTA presentation; Semantic Web: image from Garijo and Poveda-Villalón; Open Innovation: image from OI wiki page; Tech Stack: image from Unidata gateway webpage; Usability: image from Sayer's Paperwork Reduction Act presentation
In April the Metadata Reviewers group dove into a question about including the date of a revision or release in the title of the data release. Doing so would help to distinguish between different versions of a data release. After much discussion the group concluded that two metadata records should not have the same title in their citation elements.
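As a toy illustration of the group's conclusion (the title and helper name here are my own, not from the group's notes), appending the release or revision date to the citation title keeps two versions of a data release from sharing a title:

```python
def versioned_title(base_title, version_date, version_label=None):
    """Append a release/revision date (and optional version label) so two
    versions of a data release never share a citation title."""
    if version_label:
        return f"{base_title} (ver. {version_label}, {version_date})"
    return f"{base_title} ({version_date})"

# Two versions of the same data release now have distinct titles:
v1 = versioned_title("Streamflow Observations for the Upper Basin", "2019-06-01", "1.0")
v2 = versioned_title("Streamflow Observations for the Upper Basin", "2020-04-15", "2.0")
```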
See more notes on the discussion at their Meetings wiki page.
The DevOps group heard a presentation from Tidelift. Tidelift partners with open source maintainers to support application development teams, saving time and reducing risk when using open source packages to build applications.
See the recording and slides on the DevOps Meeting page. If you are interested in using Tidelift for a USGS application, get in touch with Derek Masaki at dmasaki@usgs.gov. If you'd like a presentation from Tidelift, contact Melanie Gonglach at melanie@tidelift.com.
The group discussed "Best Practices for Implementing FAIR Vocabularies and Ontologies on the Web" by Daniel Garijo and María Poveda-Villalón. The discussion focused on sections 2 and 3 of the paper, URIs (uniform resource identifiers) and Documentation. The group recognized that implementation of the best practices in the paper (for example, stable, permanent identifiers) would depend not only on semantic specialists, but also on those who set policy for the USGS network. This point was communicated to the group that is working on enabling FAIR practices in the USGS.
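The idea behind stable, permanent identifiers can be sketched in a few lines (a toy illustration with made-up URIs, not the paper's implementation): the permanent URI that people cite never changes, and a resolver maps it to the current hosting location, which is free to move.

```python
# Permanent URI -> current location (hypothetical example namespace).
PERMANENT_TO_CURRENT = {
    "https://w3id.org/example/vocab/term": "https://vocab-host-a.example.org/term",
}

def resolve(permanent_uri):
    """Return the current location for a permanent URI, or None if unknown."""
    return PERMANENT_TO_CURRENT.get(permanent_uri)

# If the vocabulary moves hosts, only the resolver's mapping changes;
# published documents keep citing the same permanent URI.
PERMANENT_TO_CURRENT["https://w3id.org/example/vocab/term"] = (
    "https://vocab-host-b.example.org/term"
)
```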
See more at the Semantic Web meetings page.
Julien Chastang presented on the Unidata Science Gateway (https://science-gateway.unidata.ucar.edu/). Unidata is exploring cloud computing technologies in the context of accessing, analyzing, and visualizing geoscience data. From the abstract: "With the aid of open-source cloud computing projects such as OpenStack, Docker, and JupyterHub, we deploy a variety of scientific computing resources on Jetstream for our scientific community. These systems can be leveraged with data-proximate Jupyter notebooks, and remote visualization clients such as the Unidata Integrated Data Viewer (IDV) and AWIPS CAVE."
Slides and recording from the joint CDI Tech Stack and ESIP IT&I webinar are on the ESIP page.
Lisa Zolly presented on changes coming with the USGS Science Data Catalog version 3. Today, the Science Data Catalog (https://data.usgs.gov/) has more than 21,000 metadata records. To serve its human and machine stakeholders, a number of changes are planned to address the changing landscape of federal data policy, the substantial growth of the catalog, workflow and usability improvements, and more robust reporting and metrics.
Slides and recording are posted at the meeting wiki page.
Jack Eggleston (USGS), John Stock (USGS), and Michael Furlong (NASA) presented on "Fine scale mapping of water features at the national scale using machine learning analysis of high-resolution satellite images: Application of the new AI-ML natural resource software - DELTA." The availability of high-resolution satellite imagery, combined with machine learning analysis to rapidly process the satellite imagery, provides the USGS with a new capability to map natural resources at the national scale.
The recording is posted at the meeting wiki page.
James Sayer presented on the Paperwork Reduction Act (PRA) and Usability Testing. The PRA is designed to protect the public from inappropriate data collection. All agencies have their own PRA procedures, so implementation in other agencies won't necessarily translate to USGS implementation. James reviewed Fast Track procedures and exclusions. His advice: start thinking about the PRA early in your usability work, and talk to your ICCO (Information Collection Clearance Officer) if you have any questions.
The slides, notes, and recording are posted on the meeting wiki page. Do you have more questions? Contact James at jsayer@usgs.gov.
The Risk Community of Practice April meeting was part 3 of a series of training webinars provided by Impact360 Alliance on human-centered design thinking and inclusive problem solving. Emphasis was given to the tools for product evaluation/testing ("[Re]Solve") and integrating solutions into strategy ("[Re]Integrate"). Worksheets were provided to "Create and Test a Solution in Three Acts." A follow-up session on April 23 discussed examples of the worksheets.
Access the slides, recording, and handouts at the Risk Meetings page (you must log in as a CDI member; join here if you're not a member yet).
April was Citizen Science Month! At the Open Innovation meeting, Sophia B. Liu (USGS Open Innovation Lead) provided an overview of the various open innovation efforts inside and outside of government that have emerged in response to COVID-19. She also discussed The Opportunity Project Earth Sprint and proposed Problem Statements.
See more information and a list of COVID-19 sites at the meeting wiki page.
James Meldrum and Ned Molder of the USGS Fort Collins Science Center presented on "Analysis of stakeholder input on USGS fire science communication and outreach, science priorities, and critical science needs." The group also heard updates on the USGS Fire Science strategy and recent fire activity, and held a discussion on "How is COVID-19 affecting your fire science?"
Contact Paul Steblein (psteblein@usgs.gov) or Rachel Loehman (rloehman@usgs.gov) for more information.
The Software Dev cluster had Brandon Serna and Jeremy Fee present on their work using FastAPI, with some comparisons to Flask. I am not a developer, so I will summarize by pasting some links, tag lines, and interesting things I heard.
Recommended resources.
I'm going to take a little bit of space to list some of the things I Googled while listening to this call, because to me these descriptions (and some of the logos) are fascinating. It would be fun to do a tagline-logo-name matching game.
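One thing I heard on the call is that FastAPI reads a function's Python type hints to parse and validate incoming request parameters. Here is a toy, dependency-free sketch of that idea (FastAPI itself does this with Pydantic and much more; this is not its API, and the function names are mine):

```python
import inspect

def coerce_args(func, raw):
    """Use func's type hints to convert raw string parameters before calling it,
    mimicking (very loosely) how FastAPI turns path/query strings into typed values."""
    sig = inspect.signature(func)
    kwargs = {}
    for name, param in sig.parameters.items():
        value = raw[name]
        if param.annotation is not inspect.Parameter.empty:
            value = param.annotation(value)  # e.g. int("42") -> 42
        kwargs[name] = value
    return func(**kwargs)

def read_item(item_id: int, q: str):
    return {"item_id": item_id, "q": q}

# "42" arrives as a string (as it would from a URL) but is coerced to int:
result = coerce_args(read_item, {"item_id": "42", "q": "hello"})
```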
See more at the Software Dev wiki meetings page.
We continued our exploration of 2019's CDI funded projects in April's monthly meeting with presentations on the Climate Scenarios Toolbox, developing cloud computing capability for camera image velocity gaging, and integrating environmental DNA (eDNA) data into the USGS Nonindigenous Aquatic Species database.
For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki.
Aparna Bamzai-Dodson, USGS, presented on the Climate Scenarios Toolbox (now renamed to the Climate Futures Toolbox!), an open-source tool that helps users formulate future climate scenarios for adaptation planning. Scenario planning is a way to consider the range of possible outcomes by using projections based on climate data to develop usually 3-5 plausible divergent future scenarios (e.g., hot and dry; moderately hot with no precipitation change; and warm and wet). Resource managers and scientists can use these scenarios to help predict the effects of climate change and attempt to select appropriate adaptation strategies. However, climate projection data can be difficult to discover, access, and use, involving multiple global climate model repositories, downscaling techniques, and file formats. The Climate Futures Toolbox aims to take the pain out of working with climate data.
The creators of the Toolbox wanted a way to make working with climate data easier by lowering the barrier to entry, automating common tasks, and reducing the potential for errors. The Climate Futures Toolbox uses a seamless R code workflow to ingest historic and projected climate data and generate summary statistics and customizable graphics. Users are able to contribute open code to the Toolbox as well, building on its existing capabilities and empowering a larger user community. The Climate Futures Toolbox was created in collaboration with University of Colorado-Boulder's Earth Lab, the U.S. Fish and Wildlife Service, and the National Park Service.
CDI members are encouraged to engage with the Toolbox by installing and using it, providing feedback on issues, and contributing code to the package. Since April's monthly meeting, the project has continued to develop and has been renamed, so this is a rapidly evolving endeavor.
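To make the scenario-planning idea concrete, here is a hypothetical sketch (in Python, with invented model names and numbers; the Toolbox itself is an R package and works differently) of picking divergent "corner" scenarios from an ensemble of projected changes in temperature and precipitation:

```python
# Each model's projected change in temperature (deg C) and precipitation (%).
projections = {
    "model_a": {"dtemp": 4.5, "dprecip": -12.0},  # hot and dry
    "model_b": {"dtemp": 2.1, "dprecip": 0.5},    # moderate, little change
    "model_c": {"dtemp": 3.0, "dprecip": 15.0},   # warm and wet
    "model_d": {"dtemp": 4.8, "dprecip": 10.0},
}

def pick_scenario(projections, temp_sign, precip_sign):
    """Pick the model most extreme in the chosen direction, scoring each
    model by temp_sign * dtemp + precip_sign * dprecip."""
    return max(
        projections,
        key=lambda m: temp_sign * projections[m]["dtemp"]
                    + precip_sign * projections[m]["dprecip"],
    )

hot_dry = pick_scenario(projections, temp_sign=+1, precip_sign=-1)
warm_wet = pick_scenario(projections, temp_sign=-1, precip_sign=+1)
```

Real scenario selection weighs many more variables and stakeholder priorities, but the "divergent corners" intuition is the same.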
Frank Engel at the USGS Texas Water Science Center presented next on a CDI project involving non-contact stream gaging within a cloud computing framework.
Measuring stream flow is an important aspect of USGS' work in the Water Mission Area, and stream gaging, a way to measure water quantity, is a technique with which many scientists are familiar. However, traditional stream gaging measurements are sometimes difficult to obtain, such as during floods or when measurement points are unsafe or unreachable. Additionally, post-flood measurement methods are often expensive and less accurate.
To get around these issues, scientists have developed non-contact methods for measuring water quantity. For example, cameras are used to view a flooding river, and the footage can produce a velocity measurement after processing and other analysis steps. This method is complicated, requiring many steps and extensive training. Thus, the goal of this project is to make the process work automatically using cloud computing and IoT (Internet of Things).
The first step required building a cloud infrastructure, with the help of Cloud Hosting Solutions (CHS). This involves connecting the edge computing (camera and Raspberry Pi footage of a stream) to an Amazon Web Services (AWS) IoT system and depositing camera footage and derivative products into an S3 bucket. The code for this portion of the project is in a preliminary GitLab repository that is projected to be published as part of the long-term project. The team is also still working toward building the infrastructure through to data serving and dissemination.
Other successes accomplished with this project so far include auto-provisioning (transmitting location and metadata) of edge computing systems to the cloud; establishing global actions (data is transmitted to the cloud framework and can roll into automated processing, like extracting video into frames); and building automated time-lapse computation.
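As a hypothetical illustration of the time-lapse computation step (the function and parameters are my own invention, not the project's code), automated time-lapse generation amounts to keeping one frame per interval from a steadily capturing camera:

```python
def timelapse_indices(n_frames, capture_s, lapse_s):
    """Return indices of frames to keep for a time-lapse, given a camera
    capturing one frame every capture_s seconds and a desired time-lapse
    interval of lapse_s seconds between kept frames."""
    step = max(1, round(lapse_s / capture_s))
    return list(range(0, n_frames, step))

# One hour of footage at 1 frame per 5 seconds (720 frames),
# keeping one frame per minute for the time-lapse:
indices = timelapse_indices(n_frames=720, capture_s=5, lapse_s=60)
```

In the cloud framework described above, a step like this could run automatically whenever new footage lands in the S3 bucket.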
Engel and the project team have taken away a couple of lessons from their experience with this project: first, cloud computing knowledge takes a lot of work and time to acquire, and second, in the short term, it can be difficult to establish a scope that encompasses the needs and wants of all stakeholders.
Jason Ferrante with the Wetland and Aquatic Research Center discussed his team's project on establishing standards for eDNA data in the USGS Nonindigenous Aquatic Species database (NAS).
eDNA is genetic material released by an organism into its environment, such as skin cells, blood, saliva, or feces. By collecting water, soil, and air samples, scientists can detect the presence of a species with eDNA. Ferrante's project aims to combine the traditional specimen sightings already available in the NAS with eDNA detections for a more complete distribution record and improved response time to new invasions.
There is currently a need for an open, centralized eDNA database: eDNA data are scattered among manuscripts and reports, and thus not easily retrievable via web searches. Additionally, there are no databases dedicated to Aquatic Invasive Species (AIS), the species of interest for this project. A centralized, national AIS viewer will allow vetting and integration of data from federal, academic, and other sources, increase data accessibility, and improve coordination of research and management activities.
In order to successfully create a centralized AIS viewer, community standards need to be established so that data can be checked for quality and validity, especially within the FAIR data framework (Findable, Accessible, Interoperable, and Reusable). To establish community standards and successfully integrate eDNA into NAS, the project team accomplished several objectives:
1) Experimental Standards
2) Stakeholder Backing
3) Integration into NAS
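Community standards of this kind often boil down to a minimum-metadata check at submission time. Here is a hypothetical sketch (the field names are my illustration, not the NAS schema or the team's pre-submission form):

```python
# Minimum metadata a submitted eDNA detection record must carry
# (illustrative field names, not an actual community standard).
REQUIRED_FIELDS = {"species", "latitude", "longitude", "sample_date",
                   "assay", "negative_controls_passed"}

def validate_record(record):
    """Return the set of required fields missing from a record
    (empty set means the record passes the check)."""
    return REQUIRED_FIELDS - set(record)

record = {"species": "Dreissena polymorpha", "latitude": 42.3,
          "longitude": -83.0, "sample_date": "2020-04-01",
          "assay": "qPCR"}
missing = validate_record(record)  # flags the absent quality-control field
```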
Some challenges faced during the project included gaining consensus on the questions for the pre-submission form; staying organized and in communication; and meeting the needs of managers and researchers. Ferrante and the project team would love to follow up with CDI for help developing new tools that use eDNA data across databases to inform management, and for feedback on an upcoming manuscript about the project's process.