
For August's CDI Monthly Meeting, we heard a presentation on integrating short-term climate forecast into a restoration management support tool, and had our first session of the CDI Pop-Up Lab. 

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

News & Announcements 

Look out for Machine Learning Mondays: a weekly course on image analysis using machine learning. Dan Buscombe will offer this course covering image recognition, object recognition, image segmentation, and semi-supervised image classification. The course is targeted at USGS employees and contractors in the fields of satellite and aerial imaging, image analysis, geospatial analysis, and machine learning software development. The course is available only to those with a USGS email, at no charge. Experience with Python and the command line interface is recommended as a prerequisite. 

Integrating short-term climate forecast into a restoration management support tool - Caitlin Andrews, USGS 

 

Alternate text: slide summarizing the goals of a short-term soil moisture forecaster and resulting example heat maps of the U.S. 

The goal of this FY19 project is to create a link between data and how it can be used in a management context. Climate forecasts typically provide spatially or temporally coarse data, while managers need temporally fine, site-specific data. For example, the success of seeding and planting relies on the short-term monthly and seasonal climate that occurs immediately after seeding and planting, and there is a 90% failure rate for seeding and planting in the western U.S. 

The project facilitates the link between climate data/climate knowledge and management needs by creating a short-term moisture forecaster application. In the western U.S., water is a limiting factor, and drought is a natural part of the ecosystem that is expected to be exacerbated further in the coming years. For managers, seeding/planting and drought are connected, and managers need more information on the climate forecast for the period after they seed or plant. Climate knowledge for this use case generates probabilities on whether conditions will be hotter or colder and drier or wetter. This is coarse information that needs translation before managers can use it. 

The SOILWAT2 model is essentially a translation tool: the user provides the model with information about a specific site (climate, vegetation, soil), and the model outputs probabilities of where water moves on a daily basis and measurements of soil moisture at different depths. The National Weather Service provides one prediction for each of 102 regions for a given time period, but this multi-month forecast data is very coarse. 

The application team is currently developing code to synthesize short-term climate predictions to a finer temporal and spatial scale in order to derive a better soil moisture model. 

Spatially and temporally refining this data was a challenge. A Jupyter Notebook that details the steps the project team took is available to USGS employees: https://code.chs.usgs.gov/candrews/shorttermdroughtforecaster 

A quick summary of the process (a code sketch follows the list): 

  1. Gather a historical record of site-specific data from GridMET (1980-yesterday) 
  2. Generate samples of what the future may look like (30 future realizations) 
  3. Apply each future realization to the years in the historical record; this is how future anomalies are integrated with historical patterns 
  4. The result is 900 climate futures 
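
The combinatorial step is the key idea: every sampled future realization is overlaid on every year of the historical record, so N historical years crossed with M realizations yields N x M candidate climate futures (900 in this project). Below is a toy Python sketch of that logic, with hypothetical record structures and an apply_anomaly helper invented for illustration; the project's actual code is in the Jupyter Notebook linked above.

    import itertools

    def apply_anomaly(year_record, anomaly):
        """Hypothetical helper: shift one historical year by one realization's anomaly."""
        return {"temp": year_record["temp"] + anomaly["dtemp"],
                "precip": year_record["precip"] * anomaly["dprecip"]}

    # Toy inputs: 2 historical years and 2 sampled futures -> 4 climate futures.
    historical = [{"temp": 10.0, "precip": 300.0}, {"temp": 12.0, "precip": 250.0}]
    realizations = [{"dtemp": 0.5, "dprecip": 1.1}, {"dtemp": 1.0, "dprecip": 0.9}]

    futures = [apply_anomaly(y, a)
               for y, a in itertools.product(historical, realizations)]
    print(len(futures))  # len(historical) * len(realizations)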

This process produces an example output that is explained in detail in the meeting recording (log in to Confluence to access). The application will be integrated into the Land Treatment Exploration Tool (LTET), a Bureau of Land Management and USGS collaboration intended for managers planning restoration projects. 

CDI Pop-up Lab: Q&A with the CDI community 

Alternate text: slide showing the information on cloud-optimized GeoTIFFs summarized below, as well as a map and code snippet. 

Cloud optimized files and new transfer options  - Theo Barnhart and Drew Ignizio 

The CDI project Theo Barnhart is working on this year involves generating a set of continuous basin characteristics for the entire contiguous U.S., resulting in many very large GeoTIFFs. The need arose for a solution with the following characteristics: a geospatial format, easy to generate, good compression, stand-alone, and no server to maintain for data access. Cogeo.org and Rasterio were identified through a trial-and-error process of working through examples in Jupyter Notebooks. 

Drew Ignizio is working on an approach for handling large files from the ScienceBase side. What is a Cloud-Optimized GeoTIFF (COG) and why is it useful? In the previous approach, a user would download a 240 GB file from where it is stored in an S3 bucket and then work with the data locally. With COG, users can avoid downloading data, instead accessing the file in place. COG enables users to publish to a public S3 bucket and connect to the COG through a Jupyter Notebook. COGs can also be read directly from a viewer. 
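
To make the access pattern concrete, here is a minimal sketch using the Rasterio package mentioned above, with a hypothetical file URL; because a COG is internally tiled, reading a small window issues HTTP range requests for just the needed bytes instead of downloading the whole file.

    import rasterio
    from rasterio.windows import Window

    # Hypothetical public COG; any HTTP(S)-accessible COG works the same way.
    url = "https://example-bucket.s3.amazonaws.com/basin_characteristics.tif"

    with rasterio.open(url) as src:
        # Read only a 512 x 512 window; the rest of the file is never fetched.
        block = src.read(1, window=Window(0, 0, 512, 512))
        print(src.crs, block.shape)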

Irregular meshes for data operations - quadtrees  - Thomas Rapstine 

While mapping ground failure for a project in Alaska, an issue was identified with the diversity of data inputs. The inputs to models can differ in many ways. They can be: 

  • Grids, points, polygons, lines and more 
  • Categorical, physical, or temporal 
  • With their own notion of uncertainty, or not 
  • Pulled from a global or local raster 

How can we structure diverse datasets in a way that enables robust, calculable integration and evaluation? Rapstine proposed using a multi-scale, hierarchical data structure to represent data at varying scales: a representation that allows grids of multiple resolutions to be put together (a quadtree). A quadtree recursively divides a region into squares, and quadtree meshes (built with the Python package discretize) provide finer cells where finer representation is needed. 
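
For readers new to the data structure, here is a minimal pure-Python sketch of the quadtree idea; it is illustrative only and is not the discretize API, which provides a full TreeMesh implementation.

    class QuadTree:
        """Recursively split a square cell until it is small enough."""

        def __init__(self, x, y, size, min_size, needs_refinement):
            self.x, self.y, self.size = x, y, size
            self.children = []
            if size > min_size and needs_refinement(x, y, size):
                half = size / 2
                for dx in (0, half):
                    for dy in (0, half):
                        self.children.append(QuadTree(x + dx, y + dy, half,
                                                      min_size, needs_refinement))

        def leaves(self):
            """Return the (x, y, size) of every leaf cell in the mesh."""
            if not self.children:
                return [(self.x, self.y, self.size)]
            return [leaf for c in self.children for leaf in c.leaves()]

    # Refine more finely near the origin, e.g. around a feature of interest.
    mesh = QuadTree(0, 0, 1.0, 1 / 64,
                    lambda x, y, s: (x ** 2 + y ** 2) ** 0.5 < 0.5)
    print(len(mesh.leaves()), "leaf cells at varying resolution")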

Questions for the CDI community: 

  1. How are others solving these data integration issues? 
  2. Any other solution recommendations other than quadtrees? 
  3. Thoughts on using quadtrees for solving these challenges? 
  4. Are you using quadtrees? What packages would you recommend? 

See the slides for more info on the wiki and reach out to trapstine@usgs.gov if you have an answer to these questions or would like further discussion. 

Streamstats - Kitty Kolb 

StreamStats is a USGS website that allows users to delineate watersheds for an area of interest, built to be used by civil engineers designing highway bridges and culverts. Kolb wanted to know the answers to these questions: What's the biggest flood I can expect in a given year? How do we get information on ungaged areas? Answering these questions requires a GIS system that can calculate things quickly and efficiently. 

StreamStats is built on ArcHydro, SSHydro, and Leaflet. StreamStats provides an image of your watershed and a report, with an option to download the watershed outline and table. StreamStats training docs and webinars, as well as classes on ArcHydro, are useful for learning how to harness this tool. 

Speaking Git 

"At today's Metadata Reviewers meeting, I had the feeling that many of us were discovering that we need to know what these Git terms mean: main branch, fork, issue, tag." 

Some places to start: 

18F: How do I speak Git(hub) 

Git(hub) Glossary 

GS-Software Microsoft Team 

USGS Software Management Website 

--  
All CDI Blog Posts  

CDI collaboration areas bring us focused information and tools to help us work with our data. See all collaboration areas and how to join. 


Slides from Dan Beckman's presentation to the Software Development Cluster, where he discussed the creation of synthetic data for training artificial intelligence algorithms.

Data Management, 8/10 - Department of Interior Records Management Repository and Data Exit Story Time on "the data they left behind"

Lynda Speck and Jim Nagode from the U.S. Bureau of Reclamation presented on their records and document management cloud solution, eERDMS. Tara Bell, Robin Tillitt, and Sue Kemp shared experiences on "Departing Scientists and the Data They Left Behind." Recording and other resources at the wiki meeting page.

DevOps, 8/4 - EPA Data Management and Analytics Platform DevOps

Dave Smith from the Environmental Protection Agency presented on "EPA Data Management and Analytics Platform DevOps." The discussion included how to get to DevSecOps, that is, how to add security to development and operations: "Security as usual breaks DevOps automation." Recording and slides are available on the DevOps meeting page.

Fire Science, 8/18 - Department of Interior Wildland Fire Information & Technology Strategy

Roshelle Pederson from the Dept of Interior Office of Wildland Fire presented on the Wildland Fire Information & Technology Strategy. The discussion included the role of USGS research and successful paths to integrate research information, data, and tools in fire management information systems. Join the Fire Science mailing list here.

Metadata Reviewers CoP, 8/3 - Metadata for public release of legacy data

Tara Bell, Matt Arsenault, and Sofia Dabrowski led a discussion on metadata for public release of legacy data for which full documentation is not available.

Risk, 8/11-8/13 - Annual Risk Meeting

The Risk Community of Practice held their Annual Risk Meeting virtually, from August 11-13. The meeting agenda included a keynote on "An evaluation of the risk of SARS-CoV2 transmission from humans to bats" by Mike Runge, a session with the EarthMAP project management team, presentations from FY19 Risk Proposal Awardees, a risk analysis panel discussion, virtual networking, and sessions on engaging diverse stakeholders and tools for virtual stakeholder meetings. To join the Risk Research and Applications Community of Practice, visit https://listserv.usgs.gov/mailman/listinfo/cdi-risk.

Usability, 8/19 - Human-Centered Approach and Usability

Jamie Albrecht from Impact360 Alliance presented on Inclusive Problem-Solving to Reduce Natural Hazard Impacts & Disaster Risk. Inclusive problem-solving is Impact360’s process to bring together natural hazard researchers and practitioners to solve wicked problems. Several contributing foundational frameworks on the topics of mutual gains, joint fact finding, systems thinking, design thinking, social innovation, and equity-centered community design were introduced for consideration. Notes, slides, and recording are accessible on the meeting page.

Software Dev, 8/27 - Synthetic data and build process for AI imagery and deep learning methods

Dan Beckman presented on "Synthetic data and build process for AI imagery and deep learning methods." He described a solution for the challenge of not having enough training data: using synthetic stand-in data to generate the volume of data needed. Dan referenced some code he used from Adam Kelly, and here is a related Medium post. Read the post to follow up on the statement "I’ve found, from both researching and experimenting, that one of the biggest challenges facing AI researchers today is the lack of correctly annotated data to train their algorithms." Software Development Cluster wiki page.

-- 
All CDI Blog Posts 

CDI collaboration areas bring us focused information and tools to help us work with our data. See all collaboration areas and how to join. 


Screenshots from the USGS COVID-19 Case Finder and Viz Palette - two resources discussed at the July Data Viz call.

Artificial Intelligence/Machine Learning,  7/14 - Gage Cam - computer vision for water surface elevation

Daniel Beckman presented on Gage Cam, a low-cost, custom-built wireless web camera paired with a custom deep learning algorithm that allows for a computer vision method to measure water surface elevation (stage). Daniel's slides also cover additional topics including U-Nets, synthetic data, algorithms for text, suggested books on deep learning, and more!

Slides and recording at the AI/ML Meeting Notes page.

Data Management, 7/13 - Collections management Instructional Memo and Center-level collection management plans

Lindsay Powers presented on a new Collections Management Instructional Memo (IM CSS 2019-01) and associated website, released last August, providing policy and guidance for the management of scientific working collections.

Brian Buczkowski, from the Woods Hole Coastal and Marine Science Center, presented on Center-level collection management plans, which can help ensure that these samples and specimens continue to have value as assets to the public and scientific community.

Slides and recording can be found on the meeting notes page.

Data Visualization, 7/2 - Kickoff meeting COVID-19 Case Finder

Chuck Hansen from the California Water Science Center presented on the COVID-19 Case Finder, built on Tableau. The app allows a USGS employee planning a trip to get COVID information on their destination, with preloaded USGS facilities and gage sites. A conversation on color maps ensued, sharing tools like Viz Palette (https://projects.susielu.com/viz-palette), which enables you to import your own color schemes and see what they look like under different types of color deficiencies.

The Data Visualization group plans to hold quarterly calls. See more at their wiki page.

Fire Science, 7/21 - Climate-fire science synthesis

As fire activity continued to increase in July, Paul Steblein and Rachel Loehman led the Fire Science Community of Practice call. After a fire update from Paul, Madeleine Rubenstein from the Climate Adaptation Science Centers presented on a workplan to conduct a synthesis of climate-fire science.

Join the Fire Science mailing list here.

Metadata Reviewers, 7/6 - Metadata for software and code

Eric Martinez joined the Metadata Reviewers group to chat about different types of code releases, different options for code repositories at USGS, code.json documentation, and more. He shared some links including the USGS Software Management website and the code.json schema, where controlled vocabularies can be found (search for 'enum' for enumerated lists).

See more notes on the Metadata Reviewers meeting notes page.

Model Catalog Working Group - Scientific model categorization and finding information about USGS models

A working group that is advising on the development of a new USGS Model Catalog was briefed (by email) on the sources used for populating the initial model catalog and asked about categorization of models by type and action. Project updates can be seen on this wiki page. Anyone interested in contributing to the direction of the model catalog can find out more on the working group home page, subscribe to the mailing list, and get in touch with the point of contact, which would be me, Leslie Hsu, lhsu@usgs.gov.

Risk CoP, 7/16 - Project presentations from the FY19 Risk RFP awardees (Round 2)

Four speakers gave final Risk project presentations on the topics of the global copper supply disruption from earthquakes (Kishore Jaiswal), how scientific research affects policy and earthquake preparedness (Sara McBride), the Hazard Exposure Analyst Tool (HEAT) (Jason Sherba), and ecological forecasts for risk management (Jake Weltzin and Alyssa Rosemartin).

See more at the Risk CoP meeting notes page (sign in as a CDI member to view).

Semantic Web, 7/9 - the Semantic Zoo

A group from the Semantic Web WG discussed the article "The Semantic Zoo - Smart Data Hubs, Knowledge Graphs, and Data Catalogs." This led to a discussion on the basic question of "How do we get data cleaned up so that many different places can use it?"

Usability Resource Review, 7/15 - Mobile UX Design Principles and Best Practices

Sophie Hou posted a resource review on Mobile UX Design Principles and Best Practices. The resource addresses topics like creating a seamless experience across devices, allowing for personalization, good onboarding practices, using established gestures, mobile layout design, focusing on speed, minimizing data input, and more.

See the full review and summary on the resource review wiki page.

-- 
All CDI Blog Posts 

For July's CDI Monthly Meeting, we heard two presentations: one on science data management within USGS, and another on the NGA's new mobile and web applications for field data collection! 

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

It's July 2020: Do you know what your data are doing? SAS Science Data Management: Contributing to USGS progress in management of scientific data - Viv Hutchison, USGS

Overview of CSS-SAS-SDM 

Most of us have probably heard of data management, but why should we take care to manage our scientific data well? Good data management increases reproducibility and integrity for Earth science data. As such, it's important that data is FAIR (findable, accessible, interoperable, and reusable) and well-maintained. 

The Science Data Management (SDM) branch within Science Analytics and Synthesis (SAS) leverages tools and expertise for good data management, and encourages community engagement around this topic. SDM has made strides towards better data management and measuring impact of data. 

ScienceBase (SB), an online digital repository for science data, became a trusted digital repository (TDR) in 2017, meeting rigorous standards to attain this status. Many journals require that data accompanying an article be made publicly available, and ScienceBase is an easy way to meet this requirement. The ScienceBase Data Release (SBDR) Tool, which allows scientists to easily complete a data release, connects seamlessly to other USGS tools such as the DOI Tool, IPDS, and the Science Data Catalog (SDC). The SBDR Tool can also be customized to reflect a science center's specific workflow. Currently, 92 USGS science centers use SB for data release. The TDR has seen a steady increase in usage over time and is now accommodating approximately 1,000 data releases per year. The upcoming SBDR Summary Dashboard will share data release metrics by science center, program, region, and mission area. For help with ScienceBase data release, find the instructions page here, or contact sciencebase_datarelease@usgs.gov with other questions. 

SDM has strengthened the connection between USGS publications and supporting data. The team worked with the USGS Pubs Warehouse to collect information on known primary publications related to a data release, then added those related publications to the ScienceBase landing page and the DOI Tool. This link has proven useful for letting data authors know how others are using their data, and for understanding some impacts of the data. In a similar vein, SDM uses xdd (previously GeoDeepDive) to track references to USGS data DOIs, with plans to display these data citations on ScienceBase landing pages in the future.  

Citation of data is an emerging practice, but many data releases in ScienceBase have seen multiple instances of reuse in subsequent scientific research. For example, data for a Geospatial Fabric data release has been cited or reused by seventeen publications. Another data release on the global distribution of minerals was cited in U.S. public policy on critical minerals. 

Other projects in the works are aimed at analyzing USGS data for reuse. A recent "state of the data" project, which analyzed a random sample of 165 data releases against several established data maturity matrices, aims to determine how mature and FAIR the USGS data contained in ScienceBase is, and to document an assessment methodology that can scale to other bureau data repositories and platforms. 

SDM has undertaken initiatives in the past few years to make data easier to work with, access, and publish. USGS Data Tools, a Python wrapper around a set of system APIs, is one such tool. USGS Data Tools creates a bridge between various systems (DOI Tool, Pubs Warehouse, BASIS+, Metadata Parser, SBDR), making data management easier and more intuitive. Other systems have also recently gained connections; for example, the SBDR Tool now contains an option to auto-fill information from IPDS. 

The USGS Model Catalog is another recent project spearheaded by SDM. The goals for the Catalog are to increase discovery and awareness of scientific models and link models to their related literature, code, data and other resources. The Model Catalog effort is informing practices that will allow the latest information for models to be dynamically updated. CDI is currently assisting by gathering input from modelers across the Bureau - contact Leslie Hsu at lhsu@usgs.gov for more information. 

So, in sum, what are your data doing? 

  1. Your data are contributing to USGS successes in open and FAIR data 
  2. Citations to your data are being tracked 
  3. Data are connected between ScienceBase and your publication in Pubs Warehouse 
  4. Your data are accessible to the scientific community! 

For more information, see the USGS Data Management Website. 

Field Data Collection using NGA's free and open source Mobile Awareness GEOINT Environment (MAGE) and MapCache Mobile Apps - Ben Foster and Justin Connelly, National Geospatial-Intelligence Agency 

Overview of the MAGE mobile application. 

The MAGE and MapCache mobile applications are open source field data collection apps designed to reach a wide audience, even those without GIS experience. The GitHub repository for these applications can be found here: http://ngageoint.github.io/MAGE/ 

Please see the meeting recording for a live demonstration. Some highlights from the demonstration: 

  1. The mobile application allows users to upload geo-located video and photo observations. 
  2. The mobile app also allows creation of lines and polygons. 
  3. Information added will be visible to other team members on the app. 
  4. The web application has a similar interface, but with more robust features. 

To join the CDI Event on MAGE (for government employees): (1) Request an account from NGA's Protected Internet Exchange (PiX) at https://www.pixtoday.net, using your government email address for registration. (2) Once you have an account on PiX, send an email to help@pixtoday.net and request to be added to the "MAGE USGS CDI" event.

-- 
All CDI Blog Posts 


Artificial Intelligence/Machine Learning, 6/9 - Tallgrass Supercomputer for AI/ML

Natalya Rapstine presented "USGS Tallgrass Supercomputer 101 for AI/ML," an overview of the new USGS Tallgrass supercomputer designed to support machine learning and deep learning workflows at scale, along with the deep learning software and tools available for data science workflows. Natalya's slides covered the software stack that supports deep learning, including PyTorch, Keras, and TensorFlow. She then illustrated the capabilities with the "Hello World!" example of deep learning: the MNIST Database of Handwritten Digits.
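
For readers who have not seen it, the MNIST "Hello World!" fits in a few lines of TensorFlow/Keras; this is a generic sketch of that standard example, not the exact code from Natalya's slides.

    import tensorflow as tf

    # MNIST: 28x28 grayscale images of handwritten digits, 10 classes.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5)
    print(model.evaluate(x_test, y_test))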

See many more links to resources in the slides and recording available at the AI/ML Meeting Notes page.

ASCII art for the Tallgrass supercomputer

Data Management, 6/8 - Data Curation Network - extending the research data toolkit

Guests from the Data Curation Network, Lisa Johnston, Wendy Kozlowski, Hannah Hadley, and Liza Coburn presented on their recent work.

CURATED stands for: Check files/code; Understand the data; Request missing info or changes; Augment metadata; Transform file formats; Evaluate for FAIRness; Document curation activities.

Checklists and primers related to these topics for specific file formats are available at: https://datacurationnetwork.org/resources/

Also of interest is an Excel Archival Tool, which programmatically converts Microsoft Excel files into open-source formats suitable for long-term archival, including .csv, .png, .txt, and .html: https://github.com/mcgrory/ExcelArchivalTool
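
The core idea, writing each worksheet out to an open format, can be sketched in a few lines of Python; this is an illustration of the concept using pandas with a hypothetical file name, not the Excel Archival Tool itself.

    from pathlib import Path
    import pandas as pd  # reading .xlsx also requires the openpyxl package

    def excel_to_csv(xlsx_path, out_dir):
        """Write every worksheet of an Excel workbook to its own CSV file."""
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        # sheet_name=None loads all sheets as a {name: DataFrame} dict.
        for name, df in pd.read_excel(xlsx_path, sheet_name=None).items():
            df.to_csv(out / f"{name}.csv", index=False)

    excel_to_csv("legacy_workbook.xlsx", "archive/")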


Data Curation Network infographic at https://datacurationnetwork.org/resources/

DevOps, 6/2 - Elevation data processing at scale

Josh Trahern, Project Manager of the NGTOC Elevation Systems Team, led a discussion titled "Elevation Data Processing At Scale - Deploying Open Source GeoTools Using Docker, Kubernetes and Jenkins CI/CD Pipelines."

The presentation highlighted the Lev8 (pronounced "elevate" and doing petabyte-scale processing of DEMs) and QCR web applications, produced by the Elevation team. These tools are used by Production Ops to generate the National Elevation Dataset (NED). NED is a compilation of data from a variety of existing high-precision sources, such as lidar data, contour maps, the USGS DEM collection, and SRTM, combined into a seamless dataset designed to cover all United States territory continuously.

The team is moving away from proprietary software and taking ownership of the code base, to avoid trying to fit a square peg into a round hole. They are working toward 100% automation and 100% documentation and moving to a Linux environment, all while keeping the system operational.

See the recording on the DevOps Meetings page.

Fire Science, 6/16 - NLCD Rangeland Fractional Component Time-Series: Development and Applications

Paul Steblein gave a fire update, and Matthew Rigge (EROS) presented on "NLCD Rangeland Fractional Component Time-Series: Development and Applications."

The Fire Science coordinators and CDI staff are working on syncing content between the internal OneDrive and the CDI wiki; contact Paul at psteblein@usgs.gov if you have any questions about the group.

Metadata Reviewers, 6/1 - What is important to metadata reviewers

The Metadata Reviewers group had a discussion about what matters to them when reviewing metadata. Some themes were making USGS data as findable and reusable as possible, avoiding unnecessary complexity, and making metadata easier to write.

See more notes on the discussion at their Meetings wiki page.

Open Innovation, 6/18 and 6/19 - Paperwork reduction and Community-based water quality monitoring

On June 18, the topic was "Tackling the Paperwork Reduction Act (PRA) in the Age of Social Media and Web-based Interactive Technology." Three Information Collection Clearance Officers from DOI (Jeff Parrillo), USGS (James Sayer), and FWS (Madonna Baucum) explained the basics of the Paperwork Reduction Act (PRA), discussed how the PRA applies to crowdsourcing, citizen science, and prize competition activities, and participated in a Q&A discussion with the audience. More information on the Open Innovation wiki.

On June 19, Ryan Toohey and Nicole Herman-Mercer presented on "Indigenous Observation Network (ION): Community-Based Water Quality Monitoring Project." ION, a community-based project, was initiated by the Yukon River Inter-Tribal Watershed Council (YRITWC) and USGS. Capitalizing on existing USGS monitoring and research infrastructure and supplementing USGS collected data, ION investigates changes in surface water geochemistry and active layer dynamics throughout the Yukon River Basin. More information on the Open Innovation wiki.

Risk, 6/18 - Funded Project Reports

This was "round 1" of final project presentations from the FY19 Risk RFP awardees. Please see the list below for presenters - each one is about 10-12 minutes in length. PIs from each project provided a project overview, a description of their team, accomplishments, deliverables, and lessons learned.

  • Quantifying Rock Fall Hazard and Risk to Roadways in National Parks: Yosemite National Park Pilot Project, Brian Collins, Geology, Minerals, Energy, and Geophysics Science Center
  • The State of Our Coasts: Coastal Change Hazards Stakeholder Engagement & User Need Assessment, Juliette Finzi-Hart, Pacific Coastal and Marine Science Center
  • Re-visiting Bsal risk: how 3 years of pathogen surveillance, research, and regulatory action change our understanding of invasion risk of the exotic amphibian pathogen Batrochochytrium salamandrivorans, Dan Grear, National Wildlife Health Center
  • Communications of risk - uranium in groundwater in northeastern Washington state, Sue Kahle, Washington Water Science Center

See more at the Risk community of practice wiki page.

Tech Stack, 6/11 - ESIP Collaboration Infrastructure 2.0

In June the joint Tech Stack and ESIP IT&I meeting hosted three presentations:

Ike Hecht, WikiWorks, on the ESIP wiki: upgrading the ESIP wiki from MediaWiki v1.19 to v1.34.

Lucas Cioffi, lead developer of QiQoChat, on the technical side of QiQoChat: using QiQoChat to bring together asynchronous workspaces with virtual conferences and meetings.

Sheila Rabun, ORCID US Community Specialist, on the ORCID API: becoming an ORCID member to gain access to ORCID API keys and integrate ORCID authentication into the wiki.

See more at the IT&I meetings page

Software Dev, 6/25 - Serverless!

Carl Schroedl presented on "Using Serverless and GitLab CI/CD to Continuously Deliver AWS Step Functions." See: https://aws.amazon.com/lambda

Notes and more links:

-- 
All CDI Blog Posts 

Continuing our exploration of 2019's CDI funded projects, June's monthly meeting included updates on projects involving extending ScienceBase's current capabilities to aid disaster risk reduction, coupling hydrologic models with data services, and standardizing and making available 40 years of biosurveillance data. 

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

Screenshot of a beta version of ScienceBase where an option to publish all files to ScienceBase appears.

Extending ScienceBase for Disaster Risk Reduction - Joe Bard, USGS 

The Kilauea volcano eruption in 2018 revealed a need for near real-time data updates for emergency response efforts. During the eruption, Bard and his team created lava flow update maps to inform decision-making, using email to share data updates. This method proved to be flawed, causing issues with versioning of data files and limitations on sharing with all team members at the same time. 

ScienceBase has emerged as an alternative way to share data for use by emergency response workers. When GIS data is uploaded to ScienceBase, web services are automatically created. Web services are a type of software that facilitates computer-to-computer interaction over a network. Users don't need to download data to access it; instead, it can be accessed programmatically. Additionally, data updates can be propagated automatically through web services, avoiding versioning issues. However, use of ScienceBase during the Kilauea volcano crisis met unforeseen reliability issues related to hosting on the USGS server and an overload of simultaneous connections. 

This project explores a cloud-based instance of GeoServer on AWS, to which users can publish geospatial services. This method is more resilient to simultaneous connections and takes advantage of load balancing and auto-scaling. It also opens the possibility of dedicated GeoServer instances tailored to a team's needs. ScienceBase is currently working on a function to publish data directly to S3. 
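
The upload step itself is a routine object write; here is a hedged sketch with boto3, using hypothetical bucket and file names (the project's actual publishing workflow goes through ScienceBase and GeoServer as described above):

    import boto3

    s3 = boto3.client("s3")
    # Put a GeoTIFF where a cloud GeoServer instance (or COG readers) can serve it.
    s3.upload_file(
        Filename="lava_flow_update.tif",
        Bucket="example-hazard-data",  # hypothetical bucket name
        Key="kilauea/lava_flow_update.tif",
        ExtraArgs={"ContentType": "image/tiff"},
    )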

A related Python tool for downloading data from the internet and posting it to ScienceBase, using ASH3D as an example, is available on GitLab for USGS users.

Next steps for this project include finalizing cloud hosting service deployment and configuration settings, checking load balancing and quantifying performance, exploring set-up of multiple Geoserver instances in the cloud, evaluating load balancing technologies (e.g., Cloudfront), and ensuring all workflows are possible using a SB Python library. 

Presentation slide explaining the concept of a modeling sandbox.

Coupling Hydrologic Models with Data Services in an Interoperable Modeling Framework - Rich McDonald, USGS  

Integrated modeling is an important component of USGS priority plans. The goal of this project is to use an existing and mature modeling framework to test a Modeling and Prediction Collaborative Environment "sandbox" that can be used to couple hydrology and other environmental simulation models with data and analyses. 

Modeling frameworks are founded on the idea of component models. Model components encapsulate a set of related functions into a usable form. For example, going through a Basic Model Interface (BMI) means that no matter what the underlying language is, the model component can be made available as a Python component. 

To test the CSDMS modeling framework, the team took the PRMS (Precipitation-Runoff Modeling System) model, broke it down into its 4 reservoirs (surface, soil, groundwater, and streamflow), wrapped each in a BMI, and then re-coupled them. The expectation is that the user could then couple PRMS with other models. 
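
What makes this re-coupling possible is that every BMI component exposes the same small set of methods (initialize, update, get_value, set_value, finalize). The sketch below shows the coupling pattern schematically, using the bmipy base class with a hypothetical variable name and grid id; the project's real code is in the GitHub repository mentioned below.

    import numpy as np
    from bmipy import Bmi  # the standard Basic Model Interface base class

    def couple(surface: Bmi, soil: Bmi, end_time: float):
        """Step two BMI-wrapped components together, passing one flux between them."""
        buf = np.empty(surface.get_grid_size(0))  # assumes the flux lives on grid 0
        while surface.get_current_time() < end_time:
            surface.update()
            # "infiltration_flux" is hypothetical; real names come from
            # each component's get_output_var_names().
            surface.get_value("infiltration_flux", buf)
            soil.set_value("infiltration_flux", buf)
            soil.update()
        for component in (surface, soil):
            component.finalize()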

See the meeting recording for a demonstration of the tool. You may note the model run-time interaction during the demo. You'll also see that PRMS is written in Fortran but is being run from Python. Code for this project is available on GitHub. 

Presentation slide of the interface of the Wildlife Health Information Sharing Partnership event reporting system, abbreviated WHISPers 

Transforming Biosurveillance by Standardizing and Serving 40 Years of Wildlife Disease Data - Neil Baertlein, USGS 

Did you know that over 70% of emerging infectious diseases originate in wildlife? The National Wildlife Health Center (NWHC) has been dedicated to wildlife health since 1975. Biosurveillance the NWHC has been involved in includes lead poisoning, West Nile virus, avian influenza, white-nose syndrome, and SARS-CoV-2. 

NWHC has become a major data repository for wildlife health data. To manage this data, WHISPers (Wildlife Health Information Sharing Partnership event reporting system) and LIMS (laboratory information management system) are utilized. WHISPers is a portal for biosurveillance data in which events are lab verified and the portal allows collaboration with various state and federal partners, as well as some international partners, such as Canada. 

There is a need to leverage NWHC data to inform public, scientists, and decision makers, but substantial barriers stand in the way of this goal: 

  1. Data is not FAIR (findable, accessible, interoperable, and reusable) 
  2. There are nearly 200 datasets in use 
  3. Data is not easy to find 
  4. Data exists in various file formats 
  5. There is limited to no documentation for data 

As a result, this project has formulated a five step process for making NWHC data FAIR: 

  1. Definition: creating a definition.  
    1. NWHC created a template in which they capture information such as users responsible for data, the file type of the data, and where the data is stored. A data dictionary was also created. 
  2. Classification: provide meaning and context for data.  
    1. In this step, NWHC classifies relationships with other datasets and other databases, and identifies inconsistencies in data. 
  3. Prioritization: identify high-priority datasets.  
    1. High-priority datasets are ones that NWHC needs to continue to use down the road or are currently high-impact. Non-priority datasets can be archived. 
  4. Cleansing: Next step for high-priority datasets.  
    1. Includes fixing data errors and standardizing data. 
  5. Migrating: map and migrate the cleansed data. 

To put this five step process into effect, NWHC hired two dedicated student service contractors to work on the project. Interviews with lab technicians, scientists, and principal investigators were conducted to gather input and identify high-priority datasets. Dedicated staff also documented datasets, organized said documentation, and began cleansing high-priority datasets by fixing errors and standardizing data. At the time of this presentation, 130 datasets are ready for archiving and cleansing. 

There have been some challenges faced during this process so far. Training of the staff responsible for making NWHC data FAIR and easier to work with has been a substantial time investment. The work is labor and time-intensive, and some datasets do not have any documentation readily available. The current databases in use were built with limited knowledge of database design. Finally, there are variations in laboratory methodology, field methodology, and between individuals or different teams. 

The project team shared several takeaways. Moving forward, data collectors need to think through data collection methods and documentation more thoroughly. Some questions a data collector may ask about their process are: Is it FAIR? Are my methods standardized? How is the data collected now, and how will it be collected in the future? Documenting the process and management of data collection and compilation is also important. 

--   
All CDI Blog Posts   

CDI's May monthly meeting included updates on CDI projects focusing on FAIR data, a grassland productivity forecast, and animal movement visualization. 

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

Building a Roadmap for Making Data FAIR in the U.S. Geological Survey, Fran Lightsom, USGS 

Fran Lightsom presented on the process of building a roadmap for making USGS data FAIR. FAIR stands for Findable, Accessible, Interoperable and Reusable and has become a popular way for organizations to improve the value and usefulness of data products. 

To begin building a roadmap for FAIR data, the project team conducted a survey of data producers, collected use cases of projects that integrate data, hosted a workshop on September 9th-11th, 2019, and drafted a report & list of recommendations. The workshop produced about 100 discrete recommendations, with 14 being deemed essential, 38 important, and 44 useful. 

Some broad thoughts that came out of the workshop included the assertion that open science requires extension of FAIR beyond data to samples, methods, software, and tools; a less-explored application of FAIR. Implementing recommendations would be the responsibility of many groups, and would require input from representatives of these groups. There may be a place for CDI to step in and coordinate in the future, as this effort continues. 

Further objectives coming out of this effort include increasing use of globally unique persistent identifiers (especially for physical samples and software), developing policy, researching best practices, creating support tools, enabling creation of digital products that are interoperable and usable by making use of existing standards, and improving interoperability through coordinated creation of shared vocabularies and ontologies. 

An opportunity for CDI to view and provide feedback on the FAIR roadmap is upcoming. 

Implementing a Grassland Productivity Forecast Tool for the U.S. Southwest, Sasha Reed, USGS 

Grass-Cast is a CDI-funded project focused on producing near-term forecasts of grassland productivity for the U.S. Southwest. The goal of the project is to bring together different kinds of data to provide forecasts for the upcoming growing season, updated every 2 weeks. This work started in the Great Plains to provide seasonal outlooks to ranchers. 

So, why are grasslands important? Grasslands provide critical ecosystem services. They are one of the largest single providers of agro-ecological services in the U.S., and they supply important habitat and food for wildlife. Grassland productivity helps determine fire regimes and how much carbon moves from the atmosphere into grass and soil. Dust reduction and air quality problems can also be considered from a grassland productivity perspective. 

Near-term productivity forecasts for grasslands can provide information to stakeholders on cattle stocking rates, where and how to allocate resources towards fire management, and rates of carbon sequestration. Grasslands are notably responsive to subtle changes in the environment and climate, and thus, they vary from year to year, making productivity predictions difficult. 

 

The diagram above outlines the process that informs Grass-Cast for the Great Plains, but the project team wants to expand it to include the Southwest region. The Southwest differs from the Great Plains in that it lacks the same homogeneous coverage of grasses, meaning bare ground is often exposed, which complicates the interpretation of remotely sensed data. The Southwest also has a more varied mix of vegetation types, including cacti and shrubs, which need to be differentiated from grass cover. 

The Grass-Cast team aimed to keep the same overarching process used in the Great Plains Grass-Cast, but adjust the methods to work effectively in the Southwest. First, the team looked at different satellite indices for estimating grassland productivity, in the hopes they might better address the challenges of the Southwest. They found that the previously utilized NDVI (normalized difference vegetation index) greenness index did work well in many places in the Southwest, but not as well in others. These results supported the idea of trying newer remote sensing platforms that don't rely on a greenness index, such as SIF (solar-induced fluorescence). SIF is a different way of looking at plant activity that uses plant physiology to monitor how electrons move through the photosynthetic chain. The Southwest differs from the Great Plains in that the dry environment means plants can be green but not very active, making the relationship between greenness and productivity more challenging. Additionally, many Southwestern grasslands have two growing seasons, spring and summer, which presents a temporal challenge. Other remote sensing methods examined were NIRv (near-infrared reflectance of vegetation), a greenness index that focuses specifically on the green parts of remotely sensed pixels, and SATVI (Soil-Adjusted Total Vegetation Index), which takes soil brightness into account. 

The team compared results from these different indices using eddy covariance data, and found that neither SIF nor NDVI provided good results. However, NIRv and SATVI did a good job of predicting grassland productivity for the Southwest, and there is some promise in SIF as a proxy for capturing the timing of the growing season. 
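
For reference, the greenness-based indices mentioned here are simple band arithmetic. The sketch below computes NDVI and NIRv (NIRv is NDVI multiplied by NIR reflectance) from synthetic red and near-infrared values; SIF and SATVI involve additional measurements and bands and are omitted.

    import numpy as np

    red = np.array([0.08, 0.12, 0.30])  # synthetic red reflectance
    nir = np.array([0.45, 0.40, 0.35])  # synthetic near-infrared reflectance

    ndvi = (nir - red) / (nir + red)  # normalized difference vegetation index
    nirv = ndvi * nir                 # near-infrared reflectance of vegetation
    print(ndvi, nirv)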

Grass-Cast now plans to incorporate data for the Southwest (Arizona and New Mexico) into the current tool. Ultimately,  the team wants to integrate across these different methods and go beyond Arizona and New Mexico. There is a lot of room for collaboration; stay tuned for upcoming workshops and seminars. 

Grass-Cast is available here. 

A generic web application to visualize and understand movements of tagged animals, Ben Letcher, USGS 

Tracking and tagging data on individual animals provides key information about movements, habitat use, interactions, and population dynamics, and a lot of this type of data is currently available; for example, the Movebank database currently has 2 billion observations. Tracking data is expensive and requires time and effort to collect; TAME (Tagged Animal Movement Explorer) aims to maximize the value of this data and make these complex data easier to interact with. 

TAME is a data exploration tool in the form of a web application, based on open source libraries. The TAME team's goal is to make TAME as easy to use as possible, and to allow for interaction and exploration of tagging data. Currently, TAME features include: 

  1. Four introduction videos 
  2. A user account system where users can upload their own data, with an option to publish and/or share 
  3. Ability to map observations to color, size, or outline 
  4. Ability to select individuals or select by area, with multiple area selections available 
  5. Ability to cross-filter, where users can filter on any one variable or on multiple variables, and output a movie/time series of the data 

 Screenshot of a slide showing features of the web application TAME

See the monthly meeting recording on the wiki page for a live demonstration of TAME, or explore for yourself on the TAME website. 

Ben Letcher (bletcher@usgs.gov) is excited to explore a podcast or video series centered on animal movement stories – please reach out to him if you have experience in this area! 

--  
All CDI Blog Posts  



Highlight images from the May 2020 Collaboration Area topics, from left to right: the User Experience Honeycomb (source) (Usability), interfacing with hydrologic data with HydroShare (Tech Stack), and machine learning Train and Tune steps covered by SageMaker (AIML). 

CDI collaboration areas bring us focused information and tools to help us work with our data. Do you have an idea for a topic that you want to learn about or present to a group? Get in touch with us to coordinate! - Leslie, cdi@usgs.gov


Artificial Intelligence / Machine Learning, 5/12 - SageMaker for machine learning models 

Amazon Web Services personnel and USGS scientists presented on SageMaker and an example of its use at the USGS Volcano Science Center. SageMaker provides the ability to build, train, and deploy machine learning models quickly. Phil Dawson of USGS showed an application to the continuous seismic data collected at all USGS volcano observatories, and how to apply the models even though "every volcano speaks a different dialect" (the seismic energy looks different). 

The recording is posted at the meeting wiki page 

Data Management, 5/11 - records management for electronic records 

Chris Bartlett presented on how records management is moving more aggressively to electronic records management, a shift that brings a ripple of changes. She discussed what this means for our records (including data), our processes, and expectations. 

Slides and recording are posted at the meeting wiki page. 

Fire Science, 5/19 - scaling up tree-ring fire history 

The Fire Science Community of Practice heard the monthly fire update, a discussion about fire science communications, and a science presentation from Ellis Margolis on "Scaling up tree-ring fire history: from trees to the continent and seasons to centuries." 

Contact Paul Steblein or Rachel Loehman for more information. Future meeting dates are listed on the Fire Science wiki page

Metadata Reviewers, 5/4 - data publication versus research publication 

The group discussed the question "What type of information (in the metadata) is necessary for a data publication vs. a research publication?" In addition, links were shared about an ongoing discussion on metadata for software and code.  

See more notes on the discussion at their Meetings wiki page. 

Risk, 5/12 - communicating hazard and risk science 

The Risk community of practice hosted a panel discussion on communicating hazard and risk science. The speakers were Sara McBride (USGS), Kerry Milch (Temple University), and Nanciann Regalado (Dept of Interior, US Fish and Wildlife Service). Each speaker shared news on some of their recent projects and lessons learned on the job. Projects discussed included ShakeAlert and aftershock forecasts, the USGS circular "Communicating Hazards – A Social Science Review to Meet U.S. Geological Survey Needs", and the Deepwater Horizon Oil Spill Natural Resource Damage Assessment Trustee Council.  

See more at the Risk community of practice wiki page 

Semantic Web, 5/14 - concept maps for modeling traceable adaptation, mitigation, and response plans 

Brian Wee presented on an experiment to use concept maps for documenting science-informed, data-driven workflows for climate-related adaptation, mitigation, and response planning. The ESIP wiki page on the concept map repository describes how concept maps can be used to describe your own data-to-decisions narrative, as a just-in-time (i.e. as needed) educational resource, to provide context awareness about where you fit in the big picture, and to experiment with ideas for context-aware knowledge discovery. 

See a link to the slides and recording at the Semantic Web meetings page. 

Software Dev, 5/28 - data warehousing and ETL pipelines 

May's topic was data warehousing and ETL (Extract, Transform, Load) pipelines. Cassandra Ladino presented on the use of Amazon Web Services (AWS) Redshift Data Warehouse as applied to the USGS Configuration Management Committee. Jeremy Newson presented on ETL pipelines using AWS Glue. 

See more at the Software Dev wiki meetings page. 

Tech Stack, 5/14 - HydroShare for sharing hydrologic resources 

The joint CDI Tech Stack and ESIP IT&I Tech Dive hosted a presentation on CUAHSI HydroShare by Jerad Bales, Anthony Castronova, and Jeff Horsburgh. HydroShare is a platform for sharing hydrologic resources (data, models, model instances, geographic coverages, etc.), enabling the scientific community to more easily and freely share products, including the data, models, and workflow scripts used to create scientific publications. 

Slides and recording on the joint CDI Tech Stack and ESIP IT&I webinars on the ESIP page. 

Usability, 5/20 - usability and trust 

A resource review was posted on the topic of how usability and interface influence user experience, including credibility and use. "The resource highlights that user interface and credibility influence user experience because design elements can impact whether users trust and believe what is being presented or delivered to them." 

See more of the group's activity and resources on the Usability wiki page  

-- 
More CDI Blog Posts 

The CDI Collaboration Areas are keeping me busy. You can get to all of these groups and sign up for mailing lists on the CDI Collaboration Area wiki page.

From upper left corner, clockwise: DevOps: image from Tidelift website; SoftwareDev: logo for uvicorn; Risk: Impact360 worksheet; AI/ML: image from AI/ML DELTA presentation; Semantic Web: image from Garijo and Poveda-Villalón; Open Innovation: image from OI wiki page; Tech Stack: image from Unidata gateway webpage; Usability: image from Sayer's Paperwork Reduction Act presentation


4/6 Metadata Reviewers - revision or release information in titles

In April the Metadata Reviewers group dove into a question about including the date of a revision or release in the title of the data release. Doing so would help to distinguish between different versions of a data release. After much discussion the group concluded that two metadata records should not have the same title in their citation elements.

See more notes on the discussion at their Meetings wiki page.

4/7 DevOps - managed open source with Tidelift

The DevOps group heard a presentation from Tidelift. Tidelift partners with open source maintainers in order to support application development teams. This saves time and reduces risk when using open source packages to build applications.

See the recording and slides on the DevOps Meeting page. If you are interested in using Tidelift for a USGS application, get in touch with Derek Masaki at dmasaki@usgs.gov. If you'd like a presentation from Tidelift, contact Melanie Gonglach at melanie@tidelift.com.

4/9 Semantic Web - implementing FAIR vocabularies and ontologies

The group discussed "Best Practices for Implementing FAIR Vocabularies and Ontologies on the Web" by Daniel Garijo and María Poveda-Villalón. The discussion focused on sections 2 and 3 of the paper: URIs (uniform resource identifiers) and documentation. The group recognized that implementing the paper's best practices (for example, stable, permanent identifiers) would depend not only on semantic specialists, but also on those who set policy for the USGS network. This point was communicated to the group that is working on enabling FAIR practices in the USGS.

See more at the Semantic Web meetings page.

4/9 Tech Stack - Unidata Science Gateway

Julien Chastang presented on the Unidata Science Gateway (https://science-gateway.unidata.ucar.edu/). Unidata is exploring cloud computing technologies in the context of accessing, analyzing, and visualizing geoscience data. From the abstract: "With the aid of open-source cloud computing projects such as OpenStack, Docker, and JupyterHub, we deploy a variety of scientific computing resources on Jetstream for our scientific community. These systems can be leveraged with data-proximate Jupyter notebooks, and remote visualization clients such as the Unidata Integrated Data Viewer (IDV) and AWIPS CAVE."

Slides and recording on the joint CDI Tech Stack and ESIP IT&I webinars on the ESIP page.

4/13 CDI Data Management - changes to the USGS Science Data Catalog

Lisa Zolly presented on changes coming with version 3 of the USGS Science Data Catalog. Today, the Science Data Catalog (https://data.usgs.gov/) has more than 21,000 metadata records. To serve its human and machine stakeholders, a number of changes are planned to address the changing landscape of federal data policy, the substantial growth of the catalog, improvement of workflows, improvement of usability, and more robust reporting and metrics.

Slides and recording are posted at the meeting wiki page.

4/14 Artificial Intelligence / Machine Learning - fine scale mapping of water features at the national scale

Jack Eggleston (USGS), John Stock (USGS), and Michael Furlong (NASA) presented on "Fine scale mapping of water features at the national scale using machine learning analysis of high-resolution satellite images: Application of the new AI-ML natural resource software - DELTA." The availability of high-resolution satellite imagery, combined with machine learning analysis to rapidly process the satellite imagery, provides the USGS with a new capability to map natural resources at the national scale.

The recording is posted at the meeting wiki page.

4/15 Usability - how the Paperwork Reduction Act affects usability studies

James Sayer presented on the Paperwork Reduction Act (PRA) and usability testing. The PRA is designed to protect the public from inappropriate data collection. All agencies have their own PRA procedures, so implementation in other agencies won't necessarily translate to USGS implementation. James reviewed Fast Track procedures and exclusions. His advice included starting early in thinking about the PRA in your usability work, and talking to your ICCO (Information Collection Clearance Officer) if you have any questions.

The slides, notes, and recording are posted on the meeting wiki page. Do you have more questions? Contact James at jsayer@usgs.gov.

4/16 Risk - Product evaluation/testing and integrating solutions into strategy

The Risk Community of Practice April meeting was part 3 of a series of training webinars provided by Impact360 Alliance on human-centered design thinking and inclusive problem solving. Emphasis was given to the tools for product evaluation/testing ("[Re]Solve") and integrating solutions into strategy ("[Re]Integrate"). Worksheets were provided to "Create and Test a Solution in Three Acts." A follow-up session on April 23 discussed examples of the worksheets.

Access the slides, recording, and handouts at the Risk Meetings page (you must log in as a CDI member; join here if you're not a member yet).

4/17 Ignite Open Innovation - Open Innovation and COVID-19

April was Citizen Science Month! At the Open Innovation meeting, Sophia B Liu (USGS Open Innovation Lead) provided an overview of the various open innovation efforts inside and outside of government that have emerged in response to COVID-19. She also discussed The Opportunity Project Earth Sprint and proposed Problem Statements.

See more information and list of COVID-19 sites at the meeting wiki page.

4/21 Fire Science - stakeholder input on USGS Fire Science

James Meldrum and Ned Molder of the USGS Fort Collins Science Center presented on an analysis of stakeholder input on USGS fire science communication and outreach, science priorities, and critical science needs. The group also heard updates on the USGS Fire Science strategy and recent fire activity, and held a discussion on "How is COVID-19 affecting your fire science?"

Contact Paul Steblein (psteblein@usgs.gov) or Rachel Loehman (rloehman@usgs.gov) for more information.

4/23 Software Dev - FastAPI

The Software Dev cluster had Brandon Serna and Jeremy Fee present on their work using FastAPI, with some comparisons to Flask. I am not a developer, so I will summarize by listing some links, taglines, and interesting things I heard.

Here are the recommended resources, along with some of the things I Googled while listening to this call, because to me these descriptions (and some of the logos) are fascinating. It would be fun to do a tagline-logo-name matching game.

  1. FastAPI (https://fastapi.tiangolo.com/): "FastAPI framework, high performance, easy to learn, fast to code, ready for production" (see the sketch after this list)
  2. Flask (https://flask.palletsprojects.com/en/1.1.x/): "web development, one drop at a time"
  3. Hot reloading: this sounds very exciting, and according to the internet it is "the idea behind hot reloading is to keep the app running and to inject new versions of the files that you edited at runtime. This way, you don't lose any of your state, which is especially useful if you are tweaking the UI"
  4. Uvicorn (https://www.uvicorn.org/): "The lightning-fast ASGI server"
  5. Cookiecutter (https://cookiecutter.readthedocs.io/en/1.7.2/): "Better Project Templates"
  6. Gunicorn (https://gunicorn.org/): "Gunicorn 'Green Unicorn' is a Python WSGI HTTP Server for UNIX. It's a pre-fork worker model. The Gunicorn server is broadly compatible with various web frameworks, simply implemented, light on server resources, and fairly speedy"
  7. Pyenv (https://github.com/pyenv/pyenv): "pyenv lets you easily switch between multiple versions of Python. It's simple, unobtrusive, and follows the UNIX tradition of single-purpose tools that do one thing well"
  8. Pipenv (https://pipenv-fork.readthedocs.io/en/latest/): "Pipenv is a tool that aims to bring the best of all packaging worlds (bundler, composer, npm, cargo, yarn, etc.) to the Python world. Windows is a first-class citizen, in our world"
  9. Hypercorn (https://pgjones.gitlab.io/hypercorn/): "Hypercorn is an ASGI web server based on the sans-io hyper, h11, h2, and wsproto libraries and inspired by Gunicorn"
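
To make the FastAPI-versus-Flask comparison concrete, here is a minimal sketch of a FastAPI app. This is not code from the presentation; the route and model names are invented for illustration.

    # A minimal FastAPI app (illustrative only; not from the presentation).
    # Save as main.py and run with: uvicorn main:app --reload
    # (the --reload flag gives you the hot reloading mentioned in item 3)
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Gage(BaseModel):  # hypothetical request model
        site_id: str
        discharge_cfs: float

    @app.get("/")
    def read_root():
        return {"message": "hello from FastAPI"}

    @app.post("/gages")
    def create_gage(gage: Gage):
        # FastAPI validates the request body against the Gage model and
        # auto-generates interactive OpenAPI docs at /docs.
        return gage

The equivalent Flask route would use @app.route with manual request parsing and validation; FastAPI's type hints get you validation and documentation for free. For production, a common pattern is to run Uvicorn workers under Gunicorn: gunicorn -k uvicorn.workers.UvicornWorker main:app.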

See more at the Software Dev wiki meetings page.


--
More CDI Blog Posts

We continued our exploration of 2019's CDI funded projects in April's monthly meeting with presentations on the Climate Scenarios Toolbox, developing cloud computing capability for camera image velocity gaging, and integrating environmental DNA (eDNA) data into the USGS Nonindigenous Aquatic Species database. 

For more information, questions and answers from the presentations, and a recording of the meeting, please visit the CDI wiki. 

Open-source and open-workflow Climate Scenarios Toolbox for adaptation planning 

Aparna Bamzai-Dodson, USGS, presented on the Climate Scenarios Toolbox (now renamed the Climate Futures Toolbox!), an open-source tool that helps users formulate future climate scenarios for adaptation planning. Scenario planning is a way to consider the range of possible outcomes by using projections based on climate data to develop a set of (usually 3-5) plausible, divergent future scenarios (e.g., hot and dry; moderately hot with no precipitation change; and warm and wet). Resource managers and scientists can use these scenarios to anticipate the effects of climate change and select appropriate adaptation strategies. However, climate projection data can be difficult to discover, access, and use, involving multiple global climate model repositories, downscaling techniques, and file formats. The Climate Futures Toolbox aims to take the pain out of working with climate data.

Alternate text: collection of photos of people collaborating around climate scenarios and adaptation planning graphs.

The creators of the Toolbox wanted a way to make working with climate data easier by lowering the barrier to entry, automating common tasks, and reducing the potential for errors. The Climate Futures Toolbox uses a seamless R code workflow to ingest historic and projected climate data and generate summary statistics and customizable graphics. Users are able to contribute open code to the Toolbox as well, building on its existing capabilities and empowering a larger user community. The Climate Futures Toolbox was created in collaboration with University of Colorado-Boulder's Earth Lab, the U.S. Fish and Wildlife Service, and the National Park Service. 
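
The Toolbox itself is an R package, and the sketch below is not its API. As a language-neutral illustration of the scenario-building idea described above, here is a hypothetical Python snippet that bins climate model projections into divergent futures; all model names and thresholds are invented.

    # Hypothetical sketch of the scenario-planning idea: bin climate
    # projections into divergent futures by their projected changes in
    # temperature and precipitation. (The actual Climate Futures Toolbox
    # is an R package; this is not its API.)

    projections = [
        # (model name, temperature change in deg C, precipitation change in %)
        ("model-a", 3.1, -12.0),
        ("model-b", 1.8, 9.0),
        ("model-c", 2.6, 0.5),
    ]

    def scenario(d_temp, d_precip):
        heat = "hot" if d_temp >= 2.5 else "warm"
        if d_precip <= -5:
            moisture = "dry"
        elif d_precip >= 5:
            moisture = "wet"
        else:
            moisture = "little precipitation change"
        return f"{heat} and {moisture}"

    for name, d_temp, d_precip in projections:
        print(f"{name}: {scenario(d_temp, d_precip)}")
    # model-a: hot and dry
    # model-b: warm and wet
    # model-c: hot and little precipitation change

The Toolbox automates the hard parts that this sketch skips entirely: discovering and downloading the projection data, harmonizing formats, and producing summary statistics and graphics.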

CDI members are encouraged to engage with the Toolbox by installing and using it, providing feedback on issues, and contributing code to the package. Since April's monthly meeting, the project has continued to develop and has been renamed, so this is a rapidly evolving endeavor. 

Develop Cloud Computing Capability at Streamgages using Amazon Web Services Greengrass IoT Framework for Camera Image Velocity Gaging 


Frank Engel at the USGS Texas Water Science Center presented next on a CDI project involving non-contact stream gaging within a cloud computing framework. 

Measuring streamflow is an important aspect of the USGS's work in the Water Mission Area, and stream gaging, a way to measure water quantity, is a technique with which many scientists are familiar. However, it is sometimes difficult to obtain measurements with traditional stream gaging, such as during floods or when measurement points are unsafe or unreachable. Additionally, post-flood measurement methods can be expensive and less accurate. 

To get around these issues, scientists have developed non-contact methods for measuring water quantity. For example, cameras can record a flooding river, and the footage can yield a velocity measurement after processing and other analysis steps. This method is complicated, requiring many steps and extensive training, so the goal of this project is to make the process run automatically using cloud computing and the Internet of Things (IoT). 

The first step required building a cloud infrastructure with the help of Cloud Hosting Solutions (CHS). This involves connecting the edge computing devices (a camera and Raspberry Pi capturing footage of a stream) to an Amazon Web Services (AWS) IoT system and depositing camera footage and derivative products into an S3 bucket. The code for this portion of the project is in a preliminary GitLab repository that is projected to be published as part of the long-term project. The team is also still working toward building the infrastructure through to data serving and dissemination. 
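
As a tiny illustration of the S3 deposit step, pushing a derivative product into a bucket with boto3 might look like the snippet below. The bucket name, key, and file are placeholders, AWS credentials and permissions are assumed to already be configured, and this is not the project's actual code.

    # Hypothetical sketch: deposit a derivative product into an S3 bucket.
    # Bucket and key names are placeholders; AWS credentials are assumed
    # to be configured in the environment. Not the project's actual code.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        Filename="frames/frame_00001.png",        # local derivative product
        Bucket="example-velocity-gaging-bucket",  # placeholder bucket name
        Key="example-site/frames/frame_00001.png",
    )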

Alternate text: workflow for getting streamflow data into a cloud computing system.

Other successes so far include auto-provisioning of edge computing systems to the cloud (transmitting location and metadata); establishing global actions (data transmitted to the cloud framework can roll into automated processing, like extracting video into frames); and building automated time-lapse computation. A minimal sketch of the frame-extraction step follows. 
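
Extracting video into frames is conceptually simple, and a minimal sketch with OpenCV might look like this. The file paths and one-frame-per-second sampling rate are made up, and this is not the project's GitLab code.

    # Minimal frame-extraction sketch using OpenCV (illustrative only;
    # paths and the sampling rate are made up).
    import os
    import cv2

    os.makedirs("frames", exist_ok=True)
    cap = cv2.VideoCapture("stream_footage.mp4")  # hypothetical camera footage
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if metadata is missing

    frame_idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % int(fps) == 0:  # keep roughly one frame per second
            cv2.imwrite(f"frames/frame_{saved:05d}.png", frame)
            saved += 1
        frame_idx += 1

    cap.release()
    print(f"extracted {saved} frames")

From there, each saved frame could be pushed to the cloud with an upload call like the boto3 sketch above.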

Engel and the project team have taken away a couple of lessons from their experience with this project: first, cloud computing knowledge takes a lot of work and time to acquire; second, it can be difficult in the short term to establish a scope that encompasses the needs and wants of all stakeholders. 

Establishing standards and integrating environmental DNA (eDNA) data into the USGS Nonindigenous Aquatic Species database 

Jason Ferrante with the Wetland and Aquatic Research Center discussed his team's project on establishing standards for eDNA data in the USGS Nonindigenous Aquatic Species database (NAS). 

eDNA is genetic material released by an organism into its environment through sources such as skin, blood, saliva, or feces. By collecting water, soil, and air samples, scientists can detect the presence of a species with eDNA. Ferrante's project aims to combine the traditional specimen sightings already available in the NAS with eDNA detections for a more complete distribution record and improved response time to new invasions. 

There is a need for an open, centralized eDNA database. eDNA data is currently scattered among manuscripts and reports, and thus is not easily retrievable via web searches. Additionally, there are no databases dedicated to Aquatic Invasive Species (AIS), which are the species of interest for this project. A centralized, national AIS viewer would allow vetting and integration of data from federal, academic, and other sources, increase data accessibility, and improve coordination of research and management activities. 

To create a centralized AIS viewer, community standards need to be established so that data can be checked for quality and validity, especially within the FAIR data framework (Findable, Accessible, Interoperable, and Reusable). To establish community standards and successfully integrate eDNA into NAS, the project team accomplished several objectives (a hypothetical sketch of what a standardized record might look like follows the list below): 

Alternate text: list of steps taken in integrating eDNA data into the Nonindigenous Aquatic Species database.

1) Experimental Standards 

  • Collated best standards and practices for sampling design and collection, laboratory processing, and data analysis in an eDNA literature review 

2) Stakeholder Backing 

  • Gathered a group of five other prominent and active eDNA researchers within DOI to discuss standards and the vetting process 
  • Held teleconferences to gain consensus 
  • Plan to produce a white paper 

3) Integration into NAS 

  • Created a pre-submission form about eDNA scientists' design and methodology in order to vet data 
  • Built a prototype web viewer (see the meeting recording for more; you must be logged into the CDI wiki) 
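
To make the idea of a standardized, vettable record concrete, here is a hypothetical sketch of an eDNA detection record with a basic quality check. Every field name is invented for illustration and is not the NAS schema.

    # Hypothetical eDNA detection record with a basic validity check.
    # Field names are invented for illustration; this is NOT the NAS schema.
    from dataclasses import dataclass

    @dataclass
    class EDNARecord:
        species: str              # scientific name of the target species
        latitude: float
        longitude: float
        sample_date: str          # ISO 8601 date, e.g. "2020-04-30"
        assay_name: str           # which published assay/primer set was used
        positive_replicates: int  # PCR replicates that amplified
        total_replicates: int

        def is_plausible(self) -> bool:
            """Minimal check in the spirit of a pre-submission vetting form."""
            return (
                -90 <= self.latitude <= 90
                and -180 <= self.longitude <= 180
                and 0 <= self.positive_replicates <= self.total_replicates
            )

    record = EDNARecord("Dreissena polymorpha", 42.3, -83.0,
                        "2020-04-30", "hypothetical-assay-1", 3, 8)
    print(record.is_plausible())  # True

Community standards would pin down which fields are required and what values are acceptable, so that records from federal, academic, and other sources can be vetted consistently before integration.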

Some challenges faced during the project included gaining consensus on the questions for the pre-submission form, staying organized and in communication, and meeting the needs of managers and researchers. Ferrante and the project team would love to follow up with CDI for help developing new tools that use eDNA data across databases to inform management, and for feedback on an upcoming manuscript about the project's process. 


 
