
March 11, 2020: CDI Projects - Subsidence, Biosurveillance, Invasive Species 

The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.

Meeting Recording and Slides

Recordings and slides are available to CDI Members approximately 24 hours after the completion of the meeting.

These are the public slides. Log in as a CDI member to view ALL of the meeting resources, including recording.
If you would like to become a member of CDI, join at https://listserv.usgs.gov/mailman/listinfo/cdi-all.

Agenda (in Eastern time)

11:00 am Welcome and Opening Announcements

11:15 am Working Group Announcements

11:30 am Subsidence Susceptibility Map for the Conterminous U.S. - Jeanne Jones, USGS

11:45 am High-Resolution, Interagency Biosurveillance of Threatened Surface Waters in the United States - Sara Eldridge, USGS (presented by Elliott Barnhart, USGS)

12:00 pm National Public Screening Tool for Invasive and Non-native Aquatic Species Data - Wesley Daniel, USGS

12:30 pm Adjourn


Highlights and Links

  1. Sign up for Group Learning with the CDI for the Spring
  2. 2020 CDI Funded Projects Announced
  3. New EarthMAP information links (for USGS employees): recent blog post, Intranet page, MS Team
  4. Send feedback about elements of the EarthMAP conceptual model to help the EarthMAP project team
  5. What's happening around the CDI - Join a Collaboration Area here
  6. Dave Blodgett (dblodgett@usgs.gov) wants your suggestions for presentations to the Tech Stack group on the theme "Putting Data to Work"
  7. Jeanne Jones presented on the subsidence susceptibility map: the team built a national-scale map of sinkhole subsidence susceptibility. This dataset is being incorporated with other hazard and risk layers to inform Department of the Interior agencies. The output is also a dataset ready for machine learning.
    1. Jeanne's question to the CDI: How do different methods for flow accumulation processing with DEMs (for example, ArcPy, TauDEM, RichDEM) compare in terms of speed, consistency of results, and maximum raster size for high-performance computing? Contact Jeanne at jmjones@usgs.gov
  8. Elliott Barnhart presented a project to deploy Environmental Sample Processors (ESPs) at USGS stream gauging stations to provide near real-time eDNA surveillance of invasive species or pathogens. The project developed a data science pipeline and highlighted the benefits of combining information and methods from MBARI, KBase, and NGWOS (USGS Next-generation Water Observing System).
  9. Wes Daniel presented SEINeD, a tool for Screening and Evaluating Invasive and Non-native Data. The tool will launch in April on the Nonindigenous Aquatic Species (NAS) site and will help bring in non-native species occurrence data from groups not focused on invasive species.
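For context on Jeanne's question: ArcPy, TauDEM, and RichDEM all compute some form of flow accumulation. The sketch below shows the classic single-direction D8 method in plain Python, assuming a small, already hydrologically conditioned DEM held in memory as a list of rows, square cells, and no nodata values. The function names are hypothetical, and this is not how any of those tools implement it; production tools add nodata handling, flat resolution, and in some cases tiled or parallel processing, which is likely where their speed and maximum raster size differ.

```python
import math

# The eight D8 neighbor offsets as (row, col) steps.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """Map each cell to its steepest strictly-downslope neighbor, or None."""
    rows, cols = len(dem), len(dem[0])
    receiver = {}
    for r in range(rows):
        for c in range(cols):
            best_slope, best = 0.0, None
            for dr, dc in OFFSETS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbors are sqrt(2) cell widths away.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best_slope, best = slope, (nr, nc)
            receiver[(r, c)] = best  # None for pits, flats, and outlets
    return receiver


def d8_accumulation(dem):
    """Count the cells draining through each cell (each cell counts itself)."""
    rows, cols = len(dem), len(dem[0])
    receiver = d8_receivers(dem)
    acc = [[1] * cols for _ in range(rows)]
    # Visit cells from highest to lowest so every donor is complete
    # before its total is passed on to its (strictly lower) receiver.
    cells = sorted(receiver, key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for r, c in cells:
        target = receiver[(r, c)]
        if target is not None:
            tr, tc = target
            acc[tr][tc] += acc[r][c]
    return acc
```

Because water only moves to strictly lower cells here, sorting by elevation replaces the recursive or queue-based traversal that larger implementations use; that sort is also one reason a naive approach scales poorly to national-extent rasters.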

Notes

  1. Opening
    1. Group learning opportunities
      1. Sign up for topics on usability, intro to netCDF, unit testing, and Microsoft Power Automate and Power Apps here
  2. Tim Quinn chat
    1. Announcement of the 2020 CDI Funded Project Teams
      1. Fourteen projects funded - see full list here
      2. Many projects address broader USGS goals of expanding predictive capabilities and actionable intelligence
      3. We thank all applicants and appreciate their attention and hard work!
    2. CDI is a discussion venue
      1. CDI is a diverse group with different interests and areas of expertise, one of its strengths
      2. Any CDI member should feel comfortable asking questions, offering points of view to help others, and building connections
      3. All feedback is used in planning future CDI activities
      4. Opportunity to provide feedback today for the EarthMAP project
  3. EarthMAP update
    1. New communications venues
      1. Recent blog post
        1. Covered the why, how and what of EarthMAP (see slides or blog post)
      2. Intranet page
      3. Microsoft Team
    2. EarthMAP conceptual model
      1. Use the form here to indicate which portion of the model interests you
      2. Venn diagram: data & information integration, integrated predictive science, and actionable intelligence, with EarthMAP in the middle!
        1. Data & information integration
          1. Improved framework for science data and information (collecting, assessing, analyzing, integrating)
            1. Readily available and accessible
            2. Embrace relevant data
            3. Recognize changing definition of data
        2. Integrated predictive science
          1. a system of integrated, scalable models that simulate and predict changes in connected human and natural systems
            1. Advanced modeling
            2. Integrated across boundaries, disciplines, geographies, sectors
            3. Developed in collaborative partnership with stakeholders
        3. Actionable intelligence
          1. Observations and predictions developed with partners to provide information at the speed and scales needed to inform their decision-making
            1. Decision support tools and processes
            2. Operational capacity
            3. Iterative improvements
        4. How do your projects fall into EarthMAP's model? One or two areas? In the middle?
  4. Collaboration Area Announcements
    1. Interagency Collaborative for Environmental Modeling and Monitoring
      1. Next event: Annual ICEMM meeting scheduled for March 17-18, 2020 at USGS in Reston, VA: "Integrated Modeling, Monitoring, and Working with Nature"
      2. Contact: pglynn@usgs.gov 
    2. Semantic Web Working Group
      1. Thursday, March 12th, 12pm MT, discussion with Sky Bristol: a practical example of semantic technology in action
      2. Contact: Fran Lightsom, flightsom@usgs.gov
    3. Metadata Reviewers Community of Practice
      1. Next meeting: April 6th, 12pm MT: What about metadata for software and code?
      2. Contact: Fran Lightsom, flightsom@usgs.gov
    4. Tech Stack Working Group
      1. Next meeting: March 12th, 3-4pm ET: "Discrete Global Grid Systems in action: Provision of rapid response during Australian bushfires and other applications" by Shane Crossman and Irina Bastrakova
      2. Looking for ideas for future tech dive talks on this year's ESIP theme, "Putting Data to Work." Please email dblodgett@usgs.gov with ideas
      3. Contacts: Dave Blodgett, dblodgett@usgs.gov; Rich Signell, rsignell@usgs.gov
    5. Data Management Working Group
      1. Next event: April 13th: "Upcoming changes to the Science Data Catalog", Lisa Zolly
      2. In past meetings, created draft value propositions
      3. Contact: Madison Langseth, mlangseth@usgs.gov 
    6. Risk Community of Practice
      1. Next meeting: March 19th, 1PM ET: "Human Centered Design and Inclusive Problem Solving Training with Impact 360, part 2" Register here
      2. Funding 7 of the 29 proposals received for FY20 Risk funding
        1. See 3/6/20 Leader's Blog / recent NTK for full list of awards
      3. Contact: riskyworld@usgs.gov 
    7. Open Innovation Community
      1. Next meeting: March 12, 10:30am PT: "Innovation Center Talk: Automatic satellite-based flood mapping for disaster response" (More details here)
      2. Next meeting: March 16th, 3PM ET / 9AM HT: Using Volcanic Hazards in Hawai'i as a STEM platform for problem-based learning with Raspberry Shakes
      3. Received Risk funding for Open Innovation Playbook for Risk
      4. Working on open innovation community newsletter
      5. Contact: Sophia Liu, sophialiu@usgs.gov, openinnovation@usgs.gov
    8. Software development
      1. Next event: March 26th, 3:30pm ET: "Cloud Efforts - Automated deployment for scientific processing with AWS cloud formation" by Kirstie Haynie
      2. Contacts: mguy@usgs.gov, jknewson@usgs.gov, ccladino@usgs.gov
    9. Usability
      1. Next event: March 18th, resource review: "Using analytics to inform how our web pages/tools are being used"
      2. Next town hall meeting: April 15th, 3pm ET/1pm MT: "How to select test users and how many to test?"
      3. Contact: Sophie Hou, chungyihou@usgs.gov 
  5. Subsidence Susceptibility Map for the Conterminous U.S. - Jeanne Jones, USGS
    1. Subsidence susceptibility - sinkholes and areas susceptible to developing sinkholes
      1. Focused on karst regions
    2. Why is this important?
      1. Sinkholes are hazardous and can channel contaminated or polluted surface water into groundwater
      2. Sinkholes can destabilize the foundations of buildings, roads, etc.
    3. The U.S. lacks a consistent national map
    4. Working to incorporate this dataset into the SHIRA (CDI) Risk map for use by DOI emergency agencies
    5. Used the National Map, karst research, and the Yeti supercomputer
    6. Five-step process
      1. Hydrological conditioning of DEM
      2. Identification of closed depressions
      3. Screening and morphometric statistics
      4. Validation against state maps
      5. Creation of heat map
    7. See slides for diagram on processing steps for conditioning DEMs and finding closed depressions
    8. See slides for map of sinkhole hotspots
    9. Challenges
      1. Data collection and screening
        1. Screening data visually
          1. Patching in other data to close gaps
        2. Screening data in Python
      2. Processing issues, edge effects
        1. The DEM was too large to process well with ArcGIS
        2. Had to process each DEM individually
      3. Open source software
        1. Existing data used different software that defined some terms and statistics differently
      4. Closed depression screening
        1. Screened out wetlands, open water, and urban areas; soils with a flood signature; quarries and strip-mining sites; and depressions that were too shallow, too small, the wrong shape, or close to roads (drainage ditches), etc.
      5. Post-processing
        1. Used geologic information and expert knowledge to remove depressions that may have formed through non-karst processes
    10. Project data, tools, products to share
      1. Closed depression polygons
      2. Sink density and hot spot raster datasets
      3. 10-meter DEMs, NHD streams and roads on Yeti
      4. Technique: publications by Dan Doctor and others
      5. Code on code.usgs.gov
    11. Follow-up collaboration
      1. The closed depression data make a great training dataset for machine learning
    12. Follow-up question for CDI
      1. Flow accumulation processing with DEMs
        1. ArcPy, TauDEM, RichDEM: how do these compare?
  6. High-Resolution, Interagency Biosurveillance of Threatened Surface Waters in the United States - Sara Eldridge, USGS (presented by Elliott Barnhart)
    1. Project to deploy ESPs at USGS stream gauging stations to provide near real-time eDNA surveillance of invasive species or pathogens
    2. With high-frequency data collection, rapid analysis is needed
      1. Need to give resource managers time to respond, so time is of the essence
    3. ESPs installed in stream gauges along the Yellowstone River
      1. Tested for non-native species
    4. Needed a way to combine stream gauge data and weather data
    5. Data Science Pipeline
      1. Created a cloud-hosted PostgreSQL database on DigitalOcean that combines all the collected data
      2. Can easily incorporate eDNA and other data streams into models that can indicate presence/absence of organisms
    6. See slides for a figure on the processing steps for creating the DigitalOcean PostgreSQL database
    7. Challenges and lessons learned
      1. Quality control filters from multiple data sources
      2. Linking the benefits and capabilities of: 
        1. MBARI ESP in situ sample collection and analysis at stream gauges
        2. Department of Energy Systems Biology Knowledgebase (KBase), an open environment for computational systems biology
        3. USGS Next-generation Water Observing System (NGWOS)
  7. National Public Screening Tool for Invasive and Non-native Aquatic Species Data - Wesley Daniel, USGS
    1. Central repository for spatially referenced accounts of introduced aquatic species
    2. Tracks over 1,290 aquatic species, with over 600k observations
    3. Data are national, dating back to the 1800s
    4. Constantly updating data sources and adding new information
    5. Data aggregated from museum collections, researchers, state and federal agencies, scientific literature, and public sighting reports
    6. The problem
      1. How does the NAS database get non-native occurrence data from groups not focused on invasive species?
    7. Biosurveillance tool
      1. SEINeD tool allows stakeholders to upload any biological dataset; these datasets will be screened for invasive occurrences
    8. See slides for diagram of SEINeD tool process
      1. Automated process
        1. Checks the spatial accuracy
        2. Checks for taxonomic errors (misspelled names, old or non-specific nomenclature)
        3. Native status filter
          1. Flags non-native species that are exotic (from other countries/continents) AND non-native species from within the U.S. (e.g., rainbow trout, native to the West Coast, occurring on the East Coast)
        4. SEINeD Tool does not store any of the information provided
          1. User receives two CSVs back: the original, and one that only contains the data that SEINeD would like to use.
        5. Benefits
          1. Automated
          2. Easy way to link sampling data to multiple spatial GIS layers
          3. Early detection screening tool for potential new invasions
          4. Incentivize stakeholders who use the tool to contribute their data to the NAS program
          5. Increase the visibility of the NAS program
        6. SEINeD goes live May 4th: https://nas.er.usgs.gov 
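The automated screening steps above (spatial check, taxonomic check, native status filter, two returned files, nothing stored) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not SEINeD's code: the column names, the lookup tables, and the native ranges are hypothetical placeholders (real screening would draw on ITIS and NAS native-range data), and records that fail a check are simply skipped here rather than reported back to the user.

```python
import csv
import io

# Hypothetical lookup tables (illustrative only, not NAS data).
# Old or misspelled names map to an accepted name; real screening would use ITIS.
SYNONYMS = {"salmo gairdneri": "Oncorhynchus mykiss"}
ACCEPTED = {
    "oncorhynchus mykiss": "Oncorhynchus mykiss",
    "dreissena polymorpha": "Dreissena polymorpha",
}
# Illustrative, incomplete native ranges by state; an empty set marks a species exotic to the U.S.
NATIVE_STATES = {
    "Oncorhynchus mykiss": {"CA", "OR", "WA"},
    "Dreissena polymorpha": set(),
}


def valid_coordinates(row):
    """Spatial check: coordinates must parse and fall within valid ranges."""
    try:
        lat, lon = float(row["latitude"]), float(row["longitude"])
    except (KeyError, ValueError):
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def resolve_name(raw):
    """Taxonomic check: normalize whitespace/case, then look up accepted names and synonyms."""
    key = " ".join(raw.split()).lower()
    return ACCEPTED.get(key) or SYNONYMS.get(key)


def screen(rows):
    """Return (original rows, non-native occurrences); nothing is stored."""
    flagged = []
    for row in rows:
        if not valid_coordinates(row):
            continue
        name = resolve_name(row["scientific_name"])
        if name is None:
            continue  # unresolved name: would need manual review
        # Native status filter: flag records outside the species' native range.
        if row["state"] not in NATIVE_STATES.get(name, set()):
            flagged.append({**row, "accepted_name": name})
    return rows, flagged


def to_csv(rows):
    """Serialize a list of dicts to CSV text, using the first row's keys as the header."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Returning the untouched input alongside the flagged subset mirrors the "two CSVs back" behavior described above; the rainbow trout synonym (Salmo gairdneri) shows how an older name still resolves before the native status check runs.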

Questions

  1. Jeanne: when you found "strange" data (weird min/max, no data), did you have a path to communicate that to someone? Were there contacts for these datasets?
    1. Talked to National Map people about the gaps in road data
    2. Put a call out on the GIS talk listserv and asked about missing points
    3. Some incorrect grids may have been due to proximity to water
  2. PRISM delivers historical data (lagged by a year, right?), but one could 'get' real-time or forecasted met data from NCEP... What is the latency on the eDNA data? If there was an emergency application, what could you reduce it to (at least pragmatically for now)?
    1. Historically, it has taken a long time to analyze DNA data, but results can now be provided much faster. With MBARI's second-generation robot, analysis takes only about half an hour. qPCR and other methods are still a bit of a challenge. Half an hour is the current turnaround; it could become faster with new technology, or slower with more data.
  3. Thank you for the great presentation Jeanne! Suggestion: clarify distinction between sinkhole subsidence vs GW-pumping subsidence (i.e. CA, Houston)
  4. Jake - can you expand NCEP?
    1. National Centers for Environmental Prediction...essentially the source of National Weather Service data
  5. How did you find the right contacts to work on this project?
    1. For this project, the right contacts were already in place. Going to conferences is a great place to meet potential collaborators. First ran into MBARI robots at a conference.
  6. Wes, can you give an example of users you are targeting that have this info but are not usually concerned with non-native species? (what orgs or professions?)
    1. Conservation-focused NGOs and universities, and state employees who don't have the time to screen data themselves.
  7. How do you validate the scientific names?...assuming you compare them to an index?
    1. ITIS index for scientific names. Also using an internal index for more recent changes.
  8. As a non-specialist, can you suggest some news sources to get current news on invasive and non-native species?
    1. NAS database has an alert email and Twitter account that notifies users of new non-native species. Depending on the region, you can contact your regional representative (email Wesley, wdaniel@usgs.gov for info on this)
  9. What type of outreach will you do to let people who don't know about NAS know about this new tool?
    1. Canvas as many resources as possible: professional society newsletters on fish/water, tapping botanists to look at other professional societies, social media, internal news link, advertising through all state contacts
  10. SEINeD won't automatically harvest the third-party dataset after tagging, but what are the incentives to get users to come back and share their dataset with NAS? Could you build a checkbox so that their analyzed set could be shared with NAS "automatically"?
    1. Initial thought is that the second CSV comes with an email talking about the importance of sharing the data. Many people are not getting back to NAS with the second CSV.
  11. iNaturalist might be great early collaborator
    1. Work closely with iNaturalist, will notify them of this new tool.




1 Comment

  1. To answer Question 3: Groundwater pumping is known to induce sinkholes in many karst regions. However, if the question was in reference to regional-scale subsidence due to aquifer depletion in non-karst areas, then our methods would not include those zones, since we screened our data to include karst-prone regions only. If the region was indeed prone to karst development, then the size of the depression (which in the case of regional aquifer depletion can be many square kilometers in overall extent) would be the discriminating factor. Our work identified a closed depression wherever a closed contour line appeared on a 10-m DEM enclosing an area greater than 600 square meters with a maximum depth of more than 2 m. A large area of regional aquifer depletion may contain tens or even hundreds of such smaller closed depressions.

    -Dan Doctor
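The thresholds in Dan's comment (a closed depression on a 10-m DEM larger than 600 square meters and deeper than 2 m) can be illustrated with a common fill-and-difference approach: fill every depression with the Priority-Flood algorithm, treat the filled DEM minus the original as depression depth, group connected depression cells, and keep the groups that pass both thresholds. This is a swapped-in technique for illustration, not necessarily the project's contour-based method; it ignores nested depressions and omits the other screens described in the talk (wetlands, open water, roads, quarries, shape). Function names and parameter defaults are hypothetical.

```python
import heapq


def priority_flood_fill(dem):
    """Raise every cell to its spill elevation so all cells drain to the grid edge."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    seen = [[False] * cols for _ in range(rows)]
    heap = []
    # Seed the priority queue with every edge cell.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                seen[r][c] = True
    # Always expand from the lowest known cell; unseen neighbors are raised to at least its level.
    while heap:
        z, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]:
                    seen[nr][nc] = True
                    filled[nr][nc] = max(filled[nr][nc], z)
                    heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled


def closed_depressions(dem, cell_size=10.0, min_area=600.0, min_depth=2.0):
    """Return depressions whose area exceeds min_area (m^2) and max depth exceeds min_depth (m)."""
    rows, cols = len(dem), len(dem[0])
    filled = priority_flood_fill(dem)
    depth = [[filled[r][c] - dem[r][c] for c in range(cols)] for r in range(rows)]
    labeled = [[False] * cols for _ in range(rows)]
    results = []
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] <= 0 or labeled[r][c]:
                continue
            # Gather one 8-connected group of cells with positive fill depth.
            stack, cells = [(r, c)], []
            labeled[r][c] = True
            while stack:
                cr, cc = stack.pop()
                cells.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not labeled[nr][nc] and depth[nr][nc] > 0):
                            labeled[nr][nc] = True
                            stack.append((nr, nc))
            area = len(cells) * cell_size ** 2
            max_depth = max(depth[cr][cc] for cr, cc in cells)
            if area > min_area and max_depth > min_depth:
                results.append({"cells": len(cells), "area_m2": area, "max_depth_m": max_depth})
    return results
```

With 10-m cells each cell covers 100 square meters, so a depression needs at least seven cells to exceed 600 square meters; a single deep pit of one cell is filled but then screened out as too small.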