
Blog from May, 2020

The CDI Collaboration Areas are keeping me busy. You can get to all of these groups and sign up for mailing lists on the CDI Collaboration Area wiki page.

From upper left corner, clockwise: DevOps: image from Tidelift website; SoftwareDev: logo for uvicorn; Risk: Impact360 worksheet; AI/ML: image from AI/ML DELTA presentation; Semantic Web: image from Garijo and Poveda-Villalón; Open Innovation: image from OI wiki page; Tech Stack: image from Unidata gateway webpage; Usability: image from Sayer's Paperwork Reduction Act presentation


4/6 Metadata Reviewers - revision or release information in titles

In April the Metadata Reviewers group dove into a question about including the date of a revision or release in the title of the data release. Doing so would help to distinguish between different versions of a data release. After much discussion the group concluded that two metadata records should not have the same title in their citation elements.
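The group's conclusion lends itself to a simple automated check. Here is a minimal sketch in Python, assuming each metadata record is available as a dictionary with `id` and `title` keys (a hypothetical structure for illustration, not the group's actual tooling). It flags any citation title shared by more than one record:

```python
from collections import defaultdict


def find_duplicate_titles(records):
    """Return {normalized title: [record ids]} for titles used more than once.

    `records` is a hypothetical list of dicts with "id" and "title" keys.
    """
    by_title = defaultdict(list)
    for record in records:
        # Normalize case and whitespace so trivial differences do not hide duplicates.
        key = " ".join(record["title"].lower().split())
        by_title[key].append(record["id"])
    return {title: ids for title, ids in by_title.items() if len(ids) > 1}


# Illustrative records only; the ids and titles are made up.
records = [
    {"id": "rec-1", "title": "Streamflow data for Example Basin"},
    {"id": "rec-2", "title": "Streamflow data for Example Basin"},
    {"id": "rec-3", "title": "Streamflow data for Example Basin (ver. 2.0, April 2020)"},
]

duplicates = find_duplicate_titles(records)
```

In this made-up example the first two records collide, while the third is distinguished by the version and date in its title.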

See more notes on the discussion at their Meetings wiki page.

4/7 DevOps - managed open source with Tidelift

The DevOps group heard a presentation from Tidelift. Tidelift partners with open source maintainers in order to support application development teams. This saves time and reduces risk when using open source packages to build applications.

See the recording and slides on the DevOps Meeting page. If you are interested in using Tidelift for a USGS application, get in touch with Derek Masaki at dmasaki@usgs.gov. If you'd like a presentation from Tidelift, contact Melanie Gonglach at melanie@tidelift.com.

4/9 Semantic Web - implementing FAIR vocabularies and ontologies

The group discussed "Best Practices for Implementing FAIR Vocabularies and Ontologies on the Web" by Daniel Garijo and María Poveda-Villalón. The discussion focused on sections 2 and 3 of the paper, URIs (uniform resource identifiers) and Documentation. The group recognized that implementing the paper's best practices (for example, stable, permanent identifiers) would depend not only on semantic specialists, but also on those who set policy for the USGS network. This point was communicated to the group that is working on enabling FAIR practices in the USGS.

See more at the Semantic Web meetings page.

4/9 Tech Stack - Unidata Science Gateway

Julien Chastang presented on the Unidata Science Gateway (https://science-gateway.unidata.ucar.edu/). Unidata is exploring cloud computing technologies in the context of accessing, analyzing, and visualizing geoscience data. From the abstract: "With the aid of open-source cloud computing projects such as OpenStack, Docker, and JupyterHub, we deploy a variety of scientific computing resources on Jetstream for our scientific community. These systems can be leveraged with data-proximate Jupyter notebooks, and remote visualization clients such as the Unidata Integrated Data Viewer (IDV) and AWIPS CAVE."

Slides and recordings from the joint CDI Tech Stack and ESIP IT&I webinars are available on the ESIP page.

4/13 CDI Data Management - changes to the USGS Science Data Catalog

Lisa Zolly presented on changes coming with the USGS Science Data Catalog version 3. Today, the Science Data Catalog (https://data.usgs.gov/) has more than 21,000 metadata records. To serve its human and machine stakeholders, a number of changes are planned to address the changing landscape of federal data policy, substantial growth of the catalog, improvement of workflows, improvement of usability, and more robust reporting and metrics.

Slides and recording are posted at the meeting wiki page.

4/14 Artificial Intelligence / Machine Learning - fine scale mapping of water features at the national scale

Jack Eggleston (USGS), John Stock (USGS), and Michael Furlong (NASA) presented on "Fine scale mapping of water features at the national scale using machine learning analysis of high-resolution satellite images: Application of the new AI-ML natural resource software - DELTA." The availability of high-resolution satellite imagery, combined with machine learning analysis to rapidly process the satellite imagery, provides the USGS with a new capability to map natural resources at the national scale.

The recording is posted at the meeting wiki page.

4/15 Usability - how the Paperwork Reduction Act affects usability studies

James Sayer presented on the Paperwork Reduction Act (PRA) and Usability Testing. The PRA is designed to protect the public from inappropriate data collection. All agencies have their own PRA procedures, so implementation in other agencies won't necessarily translate to USGS implementation. James reviewed Fast Track procedures and exclusions. His advice was to start thinking about the PRA early in your usability work, and to talk to your ICCO (Information Collection Clearance Officer) if you have any questions.

The slides, notes, and recording are posted on the meeting wiki page. Do you have more questions? Contact James at jsayer@usgs.gov.

4/16 Risk - Product evaluation/testing and integrating solutions into strategy

The Risk Community of Practice April meeting was part 3 of a series of training webinars provided by Impact360 Alliance on human-centered design thinking and inclusive problem solving. Emphasis was given to the tools for product evaluation/testing ("[Re]Solve") and integrating solutions into strategy ("[Re]Integrate"). Worksheets were provided to "Create and Test a Solution in Three Acts." A follow-up session on April 23 discussed examples of the worksheets.

Access the slides, recording, and handouts at the Risk Meetings page (must log in as a CDI member, join here if you're not a member yet).

4/17 Ignite Open Innovation - Open Innovation and COVID-19

April was Citizen Science Month! At the Open Innovation meeting, Sophia B Liu (USGS Open Innovation Lead) provided an overview of the various open innovation efforts inside and outside of government that have emerged in response to COVID-19. She also discussed The Opportunity Project Earth Sprint and proposed Problem Statements.

See more information and list of COVID-19 sites at the meeting wiki page.

4/21 Fire Science - stakeholder input on USGS Fire Science

James Meldrum and Ned Molder of the USGS Fort Collins Science Center presented an analysis of stakeholder input on USGS fire science communication and outreach, science priorities, and critical science needs. The group also heard updates on the USGS Fire Science strategy and recent fire activity, and held a discussion on "How is COVID-19 affecting your fire science?"

Contact Paul Steblein (psteblein@usgs.gov) or Rachel Loehman (rloehman@usgs.gov) for more information.

4/23 Software Dev - FastAPI

The Software Dev cluster had Brandon Serna and Jeremy Fee present about their work using FastAPI with some comparisons to Flask. I am not a developer so I will summarize by pasting some links, tag lines, and interesting things I heard.

Recommended resources.

I'm going to take a little bit of space to list some of the things I Googled while listening to this call, because to me these descriptions (and some of the logos) are fascinating. It would be fun to do a tagline-logo-name matching game.

  1. FastAPI: https://fastapi.tiangolo.com/: FastAPI framework, high performance, easy to learn, fast to code, ready for production
  2. Flask: https://flask.palletsprojects.com/en/1.1.x/: web development, one drop at a time
  3. Hot reloading <- this sounds very exciting; according to the internet, "The idea behind hot reloading is to keep the app running and to inject new versions of the files that you edited at runtime. This way, you don't lose any of your state which is especially useful if you are tweaking the UI"
  4. Uvicorn: https://www.uvicorn.org/: The lightning-fast ASGI server
  5. Cookiecutter: https://cookiecutter.readthedocs.io/en/1.7.2/: Better Project Templates
  6. Gunicorn: https://gunicorn.org/: Gunicorn 'Green Unicorn' is a Python WSGI HTTP Server for UNIX. It's a pre-fork worker model. The Gunicorn server is broadly compatible with various web frameworks, simply implemented, light on server resources, and fairly speedy
  7. Pyenv: https://github.com/pyenv/pyenv: pyenv lets you easily switch between multiple versions of Python. It's simple, unobtrusive, and follows the UNIX tradition of single-purpose tools that do one thing well
  8. Pipenv: https://pipenv-fork.readthedocs.io/en/latest/: Pipenv is a tool that aims to bring the best of all packaging worlds (bundler, composer, npm, cargo, yarn, etc.) to the Python world. Windows is a first-class citizen, in our world
  9. Hypercorn: https://pgjones.gitlab.io/hypercorn/: Hypercorn is an ASGI web server based on the sans-io hyper, h11, h2, and wsproto libraries and inspired by Gunicorn
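Several of the taglines above mention WSGI and ASGI, the two Python interfaces between a web server and an application. Flask applications speak WSGI (what Gunicorn serves), while FastAPI applications speak ASGI (what Uvicorn and Hypercorn serve). As a rough illustration, and not something shown in the presentation, here is a minimal sketch of both callables using only the standard library. The `call_wsgi` and `call_asgi` helpers are hypothetical stand-ins for a real server, which would also handle sockets, HTTP parsing, and much more:

```python
import asyncio


def wsgi_app(environ, start_response):
    """WSGI: a synchronous callable, the interface Gunicorn serves."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from WSGI"]


async def asgi_app(scope, receive, send):
    """ASGI: an async callable, the interface Uvicorn and Hypercorn serve."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


def call_wsgi(app):
    """Hypothetical stand-in for a WSGI server: call the app once."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/"}
    body = b"".join(app(environ, start_response))
    return captured["status"], body


async def call_asgi(app):
    """Hypothetical stand-in for an ASGI server: call the app once."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    await app({"type": "http", "method": "GET", "path": "/"}, receive, send)
    return messages


wsgi_status, wsgi_body = call_wsgi(wsgi_app)
asgi_messages = asyncio.run(call_asgi(asgi_app))
```

The WSGI app returns its whole response from one blocking call, whereas the ASGI app sends messages asynchronously, which is what lets ASGI servers juggle many connections at once.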

See more at the Software Dev wiki meetings page.


--

We continued our exploration of 2019's CDI funded projects in April's monthly meeting with presentations on the Climate Scenarios Toolbox, developing cloud computing capability for camera image velocity gaging, and integrating environmental DNA (eDNA) data into the USGS Nonindigenous Aquatic Species database. 

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

Open-source and open-workflow Climate Scenarios Toolbox for adaptation planning 

Aparna Bamzai-Dodson, USGS, presented on the Climate Scenarios Toolbox (now renamed to the Climate Futures Toolbox!), an open-source tool that helps users formulate future climate scenarios for adaptation planning. Scenario planning is a way to consider the range of possible outcomes by using projections based on climate data to develop, usually, 3-5 plausible divergent future scenarios (for example: hot and dry; moderately hot with no precipitation change; and warm and wet). Resource managers and scientists can use these scenarios to help predict the effects of climate change and attempt to select appropriate adaptation strategies. However, climate projection data can be difficult to discover, access, and use, because it is spread across multiple global climate model repositories, downscaling techniques, and file formats. The Climate Futures Toolbox aims to take the pain out of working with climate data.

Collection of photos of people collaborating around climate scenarios and adaptation planning graphs.

The creators of the Toolbox wanted a way to make working with climate data easier by lowering the barrier to entry, automating common tasks, and reducing the potential for errors. The Climate Futures Toolbox uses a seamless R code workflow to ingest historic and projected climate data and generate summary statistics and customizable graphics. Users are able to contribute open code to the Toolbox as well, building on its existing capabilities and empowering a larger user community. The Climate Futures Toolbox was created in collaboration with University of Colorado-Boulder's Earth Lab, the U.S. Fish and Wildlife Service, and the National Park Service. 

CDI members are encouraged to get involved with the Toolbox by installing and using it, providing feedback on issues, and contributing code to the package. Since April's monthly meeting, the project has continued to develop and has been renamed, so this is a rapidly evolving endeavor. 

Develop Cloud Computing Capability at Streamgages using Amazon Web Services GreenGrass IoT Framework for Camera Image Velocity Gaging 


Frank Engel at the USGS Texas Water Science Center presented next on a CDI project involving non-contact stream gaging within a cloud computing framework. 

Measuring stream flow is an important aspect of USGS' work in the Water Mission Area, and stream gaging, a way to measure water quantity, is a technique with which many scientists are familiar. However, it is sometimes difficult to obtain measurements with traditional stream gaging, such as during flooding, or when measurement points are unsafe or unreachable. Additionally, post-flood measurement methods can often be expensive and less accurate. 

To get around these issues, scientists have developed non-contact methods with which to measure water quantity. For example, cameras are utilized to view a flooding river, which can produce a velocity measurement after processing and other analysis steps. This is a complicated method and requires many steps and extensive training. Thus, the goal of this project is to make this process work automatically utilizing cloud computing and IoT. 

The first step required building a cloud infrastructure, with the help of Cloud Hosting Solutions (CHS). This involves connecting the edge computing (camera and Raspberry Pi footage of a stream) to an Amazon Web Services (AWS) IoT system and depositing camera footage and derivative products into an S3 bucket. The code for this portion of the product is in a preliminary GitLab repository that is projected to be published as a part of the long-term project. The team is also still working toward building the infrastructure through to data serving and dissemination. 

Workflow for getting streamflow data into a cloud computing system.

Other successes accomplished with this project so far include auto-provisioning (transmitting location and metadata) of edge computing systems to the cloud; establishing global actions (data is transmitted to the cloud framework and can roll into automated processing, like extracting video into frames); and building automated time-lapse computation. 
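The post does not describe how the team computes its time-lapses, but the basic idea of thinning extracted video frames can be sketched briefly. The Python below is a minimal illustration under stated assumptions: the function name and its parameters (`total_frames`, `fps`, `interval_seconds`) are hypothetical, and it simply picks the indices of frames spaced a fixed number of seconds apart:

```python
def timelapse_frame_indices(total_frames, fps, interval_seconds):
    """Return indices of frames spaced `interval_seconds` apart.

    Hypothetical helper: assumes a constant frame rate and frames
    numbered from 0 in the order they were extracted from the video.
    """
    if total_frames < 0 or fps <= 0 or interval_seconds <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval_seconds must be > 0")
    # Number of source frames between consecutive time-lapse frames (at least 1).
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


# A 10-second clip at 30 frames per second, sampled every 2 seconds.
indices = timelapse_frame_indices(total_frames=300, fps=30, interval_seconds=2)
```

Here the 300 extracted frames reduce to five, one for every two seconds of footage.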

Engel and the project team have taken away a couple of lessons from their experience with this project: first, cloud computing knowledge takes a lot of work and time to acquire, and second, in the short term, it can be difficult to establish a scope that encompasses the needs and wants of all stakeholders. 

Establishing standards and integrating environmental DNA (eDNA) data into the USGS Nonindigenous Aquatic Species database 

Jason Ferrante with the Wetland and Aquatic Research Center discussed his team's project on establishing standards for eDNA data in the USGS Nonindigenous Aquatic Species database (NAS). 

eDNA is genetic material released by an organism into its environment, such as skin, blood, saliva, and feces. By collecting water, soil, and air samples, scientists can detect the presence of a species with eDNA. Ferrante's project aims to combine the traditional specimen sightings already available in the NAS with eDNA detections for a more complete distribution record and improved response time to new invasions. 
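To make the benefit of combining the two record types concrete, here is a minimal sketch in Python. It is an illustration only: the record layout (`species`, `location`, `date` keys) and the sample values are hypothetical and are not the NAS schema. For each species and location, it keeps the earliest detection from either source:

```python
from datetime import date


def earliest_detections(sightings, edna_detections):
    """Map (species, location) to the earliest (date, source) across both inputs.

    Both arguments are hypothetical lists of dicts with "species",
    "location", and "date" keys; this is not the NAS data model.
    """
    tagged = [(r, "specimen") for r in sightings]
    tagged += [(r, "eDNA") for r in edna_detections]

    earliest = {}
    for record, source in tagged:
        key = (record["species"], record["location"])
        # Keep whichever record has the earlier date for this species and place.
        if key not in earliest or record["date"] < earliest[key][0]:
            earliest[key] = (record["date"], source)
    return earliest


# Made-up example records.
sightings = [
    {"species": "Species A", "location": "Lake 1", "date": date(2019, 6, 1)},
]
edna_detections = [
    {"species": "Species A", "location": "Lake 1", "date": date(2019, 3, 15)},
    {"species": "Species A", "location": "Lake 2", "date": date(2019, 8, 1)},
]

combined = earliest_detections(sightings, edna_detections)
```

In this made-up case the eDNA record moves the first detection at Lake 1 earlier, and adds Lake 2, which has no specimen sighting at all.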

There is currently a need for an open, centralized eDNA database. eDNA data is currently scattered among manuscripts and reports, and thus not easily retrievable via web searches. Additionally, there are no databases dedicated to Aquatic Invasive Species (AIS), which are the species of interest for this project. A centralized, national AIS viewer will allow vetting and integration of data from federal, academic, and other sources, increase data accessibility, and improve coordination of research and management activities. 

In order to successfully create a centralized AIS viewer, community standards need to be established so that data can be checked for quality and validity, especially within the FAIR data framework (Findable, Accessible, Interoperable, and Reusable). To establish community standards and successfully integrate eDNA into NAS, the project team accomplished several objectives: 

List of steps taken in integrating eDNA data into the Nonindigenous Aquatic Species Database

1) Experimental Standards 

  • Collated best standards and practices for sampling design and collection, laboratory processing, and data analysis in an eDNA literature review. 

2) Stakeholder Backing 

  • Gathered a group of five other prominent/active eDNA researchers within DOI to discuss standards and vetting process 
  • Teleconferences to gain consensus 
  • Plan to produce a white paper 

3) Integration into NAS 

  • Pre-submission form about eDNA scientists' design and methodology in order to vet data 
  • Prototype web viewer (see meeting recording for more; must be logged into CDI wiki) 

Some challenges faced during the project included gaining consensus on the questions for the pre-submission form, staying organized and in communication, and meeting the needs of managers and researchers. Ferrante and the project team would love to follow up with CDI for help developing new tools that use eDNA data across databases to inform management, and for feedback on an upcoming manuscript about the project's process.