March 10, 2021: CDI FY20 Projects - Waterbody Rapid Assessment Tool (WaterRAT), FAIR practices for eDNA data in the USGS Nonindigenous Aquatic Species database, and a real-time coastal salinity index

The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.


Meeting Recording and Slides

Recordings and slides are available to CDI Members approximately 24 hours after the completion of the meeting.

These are the publicly available materials. If you would like to become a member of CDI, join at https://listserv.usgs.gov/mailman/listinfo/cdi-all.


Download the meeting recording (when ready) [.mp4]

20210310-cdi-monthly-pt1.mp4

20210310-cdi-monthly-pt2.mp4

20210310-cdi-monthly-pt3.mp4


During the call, you can ask and up-vote questions at slido.com, event code #CDIMAR.

Agenda (in Eastern time)

11:00 am Welcome and Opening Announcements

11:15 am Working Group Announcements

11:25 am Waterbody Rapid Assessment Tool (WaterRAT): 3-dimensional Visualization of High-Resolution Spatial Data, Andrea Medenblik, USGS

11:45 am Implementing FAIR practices: Storing and displaying eDNA data in the USGS Nonindigenous Aquatic Species database, Jason Ferrante, USGS

12:05 pm Real-time Coastal Salinity Index for monitoring coastal drought and ecological response to changing salinity values, Matt Petkewich, USGS

12:30 pm Adjourn

Abstracts

The presentations this month are all from FY2020 CDI-supported projects. 

Waterbody Rapid Assessment Tool (WaterRAT): 3-dimensional Visualization of High-Resolution Spatial Data

Autonomous Underwater Vehicles (AUVs) are instruments that collect water-quality, depth, and other data in waterbodies. They produce complex and massive datasets. There is currently no standard method to store, organize, process, quality-check, analyze, or visualize these data. The Waterbody Rapid Assessment Tool (WaterRAT) is a Python application that processes and displays water-quality data with interactive two-dimensional and three-dimensional figures, but it runs offline with few capabilities and for just one study site. This project will transition WaterRAT to an online application that the public can easily use to view all AUV data. A database of all AUV datasets will be developed to improve accessibility, organization, consistency of publication procedures, and publication time. Stakeholders and the public will be able to quickly access and visualize water-quality data, which is especially critical for hazardous incident response. This project will enhance our knowledge of complex water-quality issues and enhance management of natural resources.

Implementing FAIR practices: Storing and displaying eDNA data in the USGS Nonindigenous Aquatic Species database

We are working to incorporate environmental DNA (eDNA) data into the Nonindigenous Aquatic Species (NAS) database, which houses over 570,000 records of nonindigenous species nationally and is already used regularly by a broad user base of managers and researchers for invasive species monitoring. eDNA studies have allowed for the identification and biosurveillance of numerous invasive and threatened species in managed ecosystems. Managers need such information for their decision-making efforts, and therefore require that such data be produced and reported in a standardized fashion to improve confidence in the results. As we work to gain community consensus on such standards, we are finalizing the process for submitting such data to the NAS database. We are seeking support to expand the NAS database to store and present this newly Findable source of eDNA data so that it is Accessible to the public, Interoperable for use with new tools, and ultimately Reusable without limitation.

Real-time Coastal Salinity Index for monitoring coastal drought and ecological response to changing salinity values

Many coastal areas are experiencing departures from normal conditions due to changing land use and climate patterns, including increased frequency, severity, or duration of floods and droughts, and in some cases combinations of the two. To address these issues, the U.S. Geological Survey developed the Coastal Salinity Index (CSI) to identify and communicate fluctuating salinity conditions due to such disturbance events through quantitative analyses of long-term salinity records. This project aims to make the CSI broadly useful as a monitoring, forecasting, and decision-making tool, extending the platform to enable real-time reporting of disturbance events as they unfold and covering a larger user base than what existing resources allow. The framework that supports this work addresses the Community for Data Integration theme of producing Findable, Accessible, Interoperable, and Reusable data by acquiring existing real-time salinity data, integrating it into an accessible database, computing gage CSI statistics, and creating and displaying web-based visualization products.

Highlights

  1. Reminder that the CDI workshop will be held virtually from May 25-28.
  2. Look out for WaterRAT Online, coming soon. Metadata and data templates for AUV data are available from Andrea Medenblik or Bradley Huffman.
  3. eDNA team has worked to integrate data into the Nonindigenous Aquatic Species (NAS) database through an Aquatic Invasive Species (AIS) viewer.
  4. A real-time Coastal Salinity Index is currently being reviewed for release to the public; in the meantime, there is a code repository available.

Notes

Welcome and Opening Announcements

  1. For more frequent announcements, join our Microsoft Teams or wiki forum.
  2. Kevin Gallagher remarks
    1. Reminder that the CDI workshop will be held virtually from May 25-28.
    2. The program team is currently going through session proposals from the CDI community.

Working Group Announcements

  1. For more on all collaboration areas, see slides or wiki.
  2. Geomorphology
    1. Next event 4/27: specific gage analysis
  3. Metadata Reviewers
    1. Next event: 4/5
  4. Semantic Web
    1. Next event: 3/11, continuing discussion on semantic web 101 session at CDI Workshop and potential ESIP lab project
  5. Data Management Working Group
    1. Next event: 4/12, Update on QMS implementation and the role of data management
  6. Usability
    1. Next events: 4/7 and 4/21, user research overview and demo of a user research technique
  7. eDNA
    1. New publication from members: https://onlinelibrary.wiley.com/doi/10.1111/mec.15811
    2. Logo contest to represent the eDNA community of practice
  8. Model Catalog
    1. Next event: 3/24, update on the Model Catalog - ongoing development and resources to support scientific model best practices
  9. Imagery
    1. New collaboration area; planning session at CDI workshop
  10. TechStack
    1. Next event: 3/11, earth data extraction, exploration and visualization using the AppEARS Platform
  11. DevOps
    1. Next event: 4/6, USGS NGTOC - dynamic mapping

Waterbody Rapid Assessment Tool (WaterRAT): 3-dimensional Visualization of High-Resolution Spatial Data, Andrea Medenblik, USGS

  1. Background
    1. WaterRAT was designed for data from AUVs (autonomous underwater vehicles)
    2. AUVs are underwater vehicles that swim around collecting data like conductivity, pH, dissolved oxygen, etc.
    3. They can also estimate unknown water quality
  2. Problem
    1. AUVs collect high-resolution 3D data which can be difficult to process and organize
    2. Usually required proprietary software like MATLAB
    3. AUV data are also not very accessible
      1. Spread across ScienceBase, difficult to compare/combine datasets
      2. No standard method to organize these large datasets
  3. Solution
    1. Created an AUV database with all data releases from across USGS
      1. Used Python to import datasets into database
    2. Created a metadata and data template to standardize AUV data organization
  4. WaterRAT Online
    1. Can select different parameters to view them within the AUV path/location
    2. Can use color bar sliders to edit how visualizations appear
    3. Prototype was previously available as offline Python software; this new iteration is interactive, easily accessible online without Python knowledge, and offers many options for data visualization
  5. Connecting WaterRAT with database
    1. Challenges
      1. Data integration
        1. Compiling 8 sites and 727,000 data points
          1. Looking at previous versions of AUVs and newer ones
          2. Split up metadata to conform to best design practices of database management
      2. Data processing
        1. Defining AUV mission center lines
          1. In the past, have manually created the center line
          2. This time, wanted to define center line programmatically
          3. Developed scripts which worked well for some center lines, but not all
          4. Created a custom interface where users can draw the center line
      3. Getting WaterRAT online
        1. Needed a web server to securely expose WaterRAT to public internet traffic
          1. Used Apache and mod_wsgi to work with WaterRAT on NatWeb
          2. Caching data retrieved from the MySQL database and output from computationally expensive tasks
  6. How can AUV operators use WaterRAT?
    1. Code is under review, not publicly available yet, but look out for it soon
      1. Incorporate WaterRAT into your work plan
      2. Use the metadata template and data template (contact Andrea Medenblik or Bradley Huffman)
  7. How can scientists and the public use WaterRAT?
    1. Explore available datasets
    2. Interact with data - understand variations in water quality in water bodies
    3. Share with cooperators - online interface is intuitive and accessible
  8. Follow-up questions and future
    1. How to integrate with EarthMAP?
    2. Integrate other 3D data (natural hazard response, tree canopy structure, fish abundances and habitat structure)
    3. Add built-in quality assurance prior to database entry to ward against errors and duplications
    4. Automate database - so AUV operators can publish directly to database (depends on quality assurance)
    5. Expand analytical capabilities
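
The notes above mention scripts that define AUV mission center lines programmatically and that worked for some missions but not all. The following is a minimal sketch of one possible approach, not the WaterRAT implementation: it finds the dominant direction of the track, bins points along that direction, and averages each bin. The function names, the bin count, and the sample track are all hypothetical.

```python
import math
from collections import defaultdict


def principal_axis(points):
    """Return (centroid, unit direction) of the dominant axis of 2D points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Orientation of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))


def estimate_centerline(points, n_bins=5):
    """Average the track within equal-width bins along its dominant axis."""
    (cx, cy), (ux, uy) = principal_axis(points)
    # Along-track coordinate of every point (projection onto the axis).
    along = [(x - cx) * ux + (y - cy) * uy for x, y in points]
    lo, hi = min(along), max(along)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-length track
    bins = defaultdict(list)
    for s, point in zip(along, points):
        idx = min(int((s - lo) / width), n_bins - 1)
        bins[idx].append(point)
    centerline = []
    for idx in sorted(bins):
        xs, ys = zip(*bins[idx])
        centerline.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centerline


# Hypothetical zigzag track along a roughly straight channel (made-up values).
track = [(float(i), 1.0 if i % 2 == 0 else -1.0) for i in range(20)]
for vertex in estimate_centerline(track, n_bins=5):
    print(vertex)
```

A single dominant axis only suits fairly straight waterbodies. A meandering channel would fold back on that axis, which is consistent with the team falling back to an interface where users draw the center line by hand.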

Implementing FAIR practices: Storing and displaying eDNA data in the USGS Nonindigenous Aquatic Species database, Jason Ferrante, USGS

  1. eDNA is environmental DNA that is shed by an organism into its environment
    1. eDNA allows scientists to detect species, particularly cryptic species, from water and soil samples
    2. Allows for biosurveillance of numerous invasive and threatened species
  2. The Nonindigenous Aquatic Species (NAS) database
    1. Houses over 570,000 visual sightings nationally - generally pictures or publications with field sampling
    2. Regularly used by a broad user-base of managers and researchers working with invasive species
  3. Problem
    1. Managers need data and metadata produced and reported in an accurate and standardized fashion to inform decision making efforts and ensure confidence
  4. Developed robust standards for reporting accurate data to the NAS database that are FAIR (findable, accessible, interoperable, reusable)
    1. Common question: why not more open source, why use NAS?
      1. NAS is target species focused
      2. NAS gave a structure to build into
  5. A FAIR, open, centralized database
    1. eDNA data was scattered among manuscripts and reports, not easily retrievable, none dedicated to aquatic invasive species
    2. AIS viewer allows data accessibility, vetting, and integration of data from multiple sources, and improves coordination of research and management activities
  6. Process for consensus derived standards to submit and simplify eDNA data in NAS
    1. Started internally with an advisory panel, put together proposed documentation, and did outreach to the community for peer review and feedback
  7. How it's FAIR
    1. Findable: worked within an existing database; putting data in front of people looking at similar data
    2. Accessible: existing mechanism for obtaining data from the database
    3. Interoperable: data submission template standardizes data submissions
    4. Reusable: Interest in secondary analysis (modelers, etc.)
  8. NAS distribution map mock up
    1. See recording or slides
    2. Map will have educational component about eDNA caveats
  9. Challenges
    1. Community feedback
      1. Planned for face to face meetings, but pivoted to webinars, virtual meetings, shared online documents, etc.
        1. More work to synthesize the input, but broader reach
    2. Takeaways
      1. eDNA data added to NAS creates more complete distribution records of target species
      2. Can help with response time to new invasions
      3. Improves estimation of cryptic species occurrence rates, etc.
      4. Putting together manuscript on community outreach for this project
    3. Follow-up
      1. Looking for help developing new tools which use eDNA data to inform management
      2. Re-started eDNA community of practice

Real-time Coastal Salinity Index for monitoring coastal drought and ecological response to changing salinity values, Matt Petkewich, USGS

  1. Motivation
    1. Drought early-warning system for North and South Carolina is missing components like coastal salinity
      1. Higher than normal salinity levels affect fisheries, municipal water intakes
      2. Societal, economic, and ecological effects have not been conveyed clearly before
  2. CSI is an approach similar to the Standardized Precipitation Index
    1. CD stands for coastal drought
    2. CW stands for coastal wet
  3. Example graphs
    1. See slides for average salinity over time graphs
  4. Existing Products and Resources
    1. CSI R-package located in GitHub
    2. Historic CSIs along the Gulf and SE Atlantic Coasts publication, plus three others
  5. Used CDI Funding to:
    1. Develop CSI R-scripts for ecological analyses
      1. Allows comparison of anomalies in climate, hydrology, and salinity
      2. See slides for graphs
    2. Identify and integrate new salinity datasets
      1. USGS (71 gages)
      2. National Park Service (35 gages)
      3. National Estuarine Research Reserve System (23 gages)
      4. Gages need over 16 years of records, and the records need to be electronically available
    3. Enhance existing website and user interface to accommodate new CSIs
      1. Conducted a survey and held a usability webinar to get feedback from potential users
      2. Surveyed federal, state and academia users
      3. Usability webinar walked through the website and obtained feedback.
    4. CSI webpage
      1. Draft, hopefully published in the next few weeks
      2. See recording/slides for screenshots and tour of website
      3. Includes interactive map, time interval options, CSI classification table, and station legend
      4. Station pop-ups allow users to get a link to the originating agency data, can download input salinity data and access CSI drafts
      5. CSI About page gives background and team information
      6. Additional Data tab allows users to select data pertaining to salinity, water temperature, etc., and time interval options
      7. Graphs are also available for download from Additional Data tab
      8. Resources page contains CSI tools and resources, publications, and more
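
The notes above describe the CSI as an approach similar to the Standardized Precipitation Index. The general idea behind such standardized indices is to average a series over a time interval, convert each averaged value to a non-exceedance probability, and map that probability to a standard-normal quantile. The following minimal sketch uses an empirical rank-based probability instead of a fitted distribution, which is a simplification and may differ from the published CSI R package. The function names, the sign convention (higher salinity gives a higher index), and the sample values are assumptions.

```python
from statistics import NormalDist, mean


def rolling_mean(values, window):
    """Average each run of `window` consecutive values."""
    return [
        mean(values[i - window + 1 : i + 1])
        for i in range(window - 1, len(values))
    ]


def standardized_index(values):
    """Map each value to the standard-normal quantile of its empirical rank."""
    n = len(values)
    std_normal = NormalDist()
    index = []
    for v in values:
        below = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)
        rank = below + (equal + 1) / 2  # average rank when values tie
        probability = rank / (n + 1)  # plotting position, stays inside (0, 1)
        index.append(std_normal.inv_cdf(probability))
    return index


# Made-up monthly salinity values for illustration only.
monthly_salinity = [12.0, 14.5, 13.0, 18.0, 21.5, 20.0, 16.0, 11.0, 9.5, 10.0]
three_month = rolling_mean(monthly_salinity, window=3)
for value, z in zip(three_month, standardized_index(three_month)):
    print(f"{value:6.2f} -> {z:+.2f}")
```

A production index would typically compare each period against the same calendar period in the long-term record, which is one reason a long record length matters.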

Questions

  1. AUVs
    1. Did you follow any specific standards while creating the AUV metadata? Are they ISO or CSDGM compliant?
      1. Consulted CDI folks on how best to make a metadata template. Followed practices recommended.
      2. Jordan Wilson: The metadata are CSDGM compliant. The AUV metadata template was created using the Metadata Wizard as a launching point.
    2. Can you share a link to the metadata and data standard that you created?
    3. What viz technology was used in the online tool?
      1. Plotly, a Python package for graphics. The maps themselves are Mapbox and another Python library.
      2. Dash & Plotly information: https://plotly.com/dash/
    4. Is there a URL for WaterRAT?
      1. Not yet; it is under review.
    5. Does anyone know if a similar effort is being conducted for UAV data?
      1. I wish we had more information. This is of interest, but we didn't have time to compile that data.
    6. How can we join the AUV working group?
      1. Contact waterrat@usgs.gov
      2. Please email Lee Bodkin at ljbodkin@usgs.gov for information on the AUV Working Group and interfacing AUV/UAV data.
    7. Have you explored ways to utilize and/or visualize data from the AUV's ADCP?
      1. Bradley Huffman: ...in the past we've explored visualizing the ADCP data but there were two hurdles. One, I've been told that the data isn't of a high quality from another AUV user which kills the motivation to attempt processing it myself. The second, is time 🙂 I've got the documentation from the manufacturer on what the parameters mean in the log files produced by the ADCP but I'd need to have the time and funding to look further at correcting/processing it. We as a center don't publish the ADCP data to ScienceBase but that doesn't mean that can't change.
    8. Maybe publish the AUV working group within the CDI if there is enough interest?
  2. eDNA
    1. Will this be ported up to GBIF?
      1. Yes, that is in the plan
    2. If data are submitted to NAS, will they have a DOI for citation in a publication? 
      1. I don't believe so. The data are submitted freely; some are from publications which already have a DOI. We ask in the submission form for linkages to already published works.
      2. Matt Neilson: We do not mint a DOI, but we do provide a unique suggested citation.
    3. Do you guys have a data template for eDNA data that others can use?
      1. We do have a specific template for submitting to the database. Not what you might like to use for other projects. We will provide in documentation the data fields that are required as part of the submission process, and the format data are required to be in to be interoperable. Additional part through the eDNA working group.
  3. CSI
    1. Were there challenges in integrating the three different sources of data?
      1. All have challenges. Gages we could use were limited because we only accepted ones with over 16 years of continuous data. Other agency metadata can be difficult to sort through.
    2. Any recommendations for expansion to other source gages? How could they be more easily aligned/integrated?
      1. The short time frame limited the depth; potentially would like to work with individuals maintaining the data at the different locations to see how it's easiest to filter metadata.
    3. Can you share the link to the R package?
      1. https://github.com/USGS-R/CSI
    4. Could you talk about what kind of modeling users can do with this data?
      1. Hoping that modelers can pull in CSI data and correlate CSI to ecological results/information they're interested in.
    5. Is salinity data collected along the west coast, AK, HI?
      1. Yes. In early times, we evaluated the data throughout all the U.S. (Washington, California, Oregon); timing and funding forced us to focus on East Coast and the Gulf; hope to expand out west.


Date
2021-03-10
Presentation Title(s)

Waterbody Rapid Assessment Tool (WaterRAT): 3-dimensional Visualization of High-Resolution Spatial Data

Implementing FAIR practices: Storing and displaying eDNA data in the USGS Nonindigenous Aquatic Species database

Real-time Coastal Salinity Index for monitoring coastal drought and ecological response to changing salinity values

Speaker(s)

Andrea Medenblik, USGS

Jason Ferrante, USGS

Matt Petkewich, USGS