
Continuing our exploration of 2019's CDI-funded projects, June's monthly meeting included updates on projects involving extending ScienceBase's current capabilities to aid disaster risk reduction, coupling hydrologic models with data services, and standardizing and making available 40 years of biosurveillance data.

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

Screenshot of a beta version of ScienceBase where an option to publish all files to ScienceBase appears.

Extending ScienceBase for Disaster Risk Reduction - Joe Bard, USGS 

The Kilauea volcano eruption in 2018 revealed a need for near real-time data updates for emergency response efforts. During the eruption, Bard and his team created lava flow update maps to inform decision-making, using email to share data updates. This method proved flawed, causing versioning issues with data files and limiting the ability to share updates with all team members at once.

ScienceBase has emerged as an alternative way to share data for use by emergency response workers. When GIS data is uploaded to ScienceBase, web services are automatically created. Web services are a type of software that facilitates computer-to-computer interaction over a network. Users don't need to download data to access it; instead, it can be accessed programmatically. Additionally, data updates can be automatically propagated through web services, avoiding versioning issues. However, use of ScienceBase during the Kilauea volcano crisis ran into unforeseen reliability issues related to hosting on USGS servers and an overload of simultaneous connections.
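As an illustration of that computer-to-computer access, a client can request a rendered map directly from an automatically created web service rather than downloading the underlying files. The sketch below builds a standard OGC WMS GetMap request with only the Python standard library; the endpoint and layer name are hypothetical placeholders, not real ScienceBase services.

```python
from urllib.parse import urlencode

def wms_getmap_url(base_url, layer, bbox, width=800, height=600):
    """Build an OGC WMS 1.3.0 GetMap request URL for a published layer.

    `base_url` and `layer` are placeholders; a real ScienceBase or
    GeoServer deployment supplies its own endpoint and layer names.
    """
    params = {
        "service": "WMS",
        "version": "1.3.0",
        "request": "GetMap",
        "layers": layer,
        "crs": "EPSG:4326",
        # WMS 1.3.0 with EPSG:4326 uses lat/lon axis order: minlat,minlon,maxlat,maxlon
        "bbox": ",".join(str(c) for c in bbox),
        "width": width,
        "height": height,
        "format": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"

# Example: a hypothetical lava-flow layer over the Kilauea area
url = wms_getmap_url(
    "https://example.usgs.gov/geoserver/wms",  # hypothetical endpoint
    "hvo:lava_flow_extent",                    # hypothetical layer name
    (19.2, -155.4, 19.6, -154.9),
)
print(url)
```

Because the service renders the map on request, every client sees the latest data automatically, which is the versioning benefit described above.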

This project explores a cloud-based instance of GeoServer on AWS, backed by S3 storage, to which users can publish geospatial services. This approach is more resilient to simultaneous connections and takes advantage of load balancing and auto-scaling. It also opens the possibility of dedicated GeoServer instances tailored to a team's needs. ScienceBase is currently working on a function to publish data directly to S3.

A related Python tool for downloading data from the internet and posting it to ScienceBase, using ASH3D as an example, is available on GitLab for USGS users.

Next steps for this project include finalizing cloud hosting service deployment and configuration settings, checking load balancing and quantifying performance, exploring set-up of multiple GeoServer instances in the cloud, evaluating load balancing technologies (e.g., CloudFront), and ensuring all workflows are possible using a ScienceBase Python library.

Presentation slide explaining the concept of a modeling sandbox.

Coupling Hydrologic Models with Data Services in an Interoperable Modeling Framework - Rich McDonald, USGS  

Integrated modeling is an important component of USGS priority plans. The goal of this project is to use an existing and mature modeling framework to test a Modeling and Prediction Collaborative Environment "sandbox" that can be used to couple hydrology and other environmental simulation models with data and analyses. 

Modeling frameworks are founded on the idea of component models. Model components encapsulate a set of related functions into a usable form. For example, wrapping a model in a Basic Model Interface (BMI) means that, regardless of the underlying language, the model component can be made available as a Python component.
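A minimal sketch of what a BMI-style wrapper can look like in Python is shown below. The `LinearReservoirBMI` model and its variable names are invented for illustration; the real BMI specified by CSDMS defines a larger set of methods (grid queries, time metadata, and so on).

```python
class LinearReservoirBMI:
    """Toy BMI-style wrapper around a linear-reservoir model.

    The underlying "model" here is pure Python, but the same interface
    could front a Fortran or C model via language bindings; callers only
    see initialize / update / get_value / set_value / finalize.
    """

    def initialize(self, config):
        self.storage = config.get("initial_storage", 0.0)
        self.k = config.get("recession_coefficient", 0.1)
        self.dt = config.get("dt", 1.0)
        self.outflow = 0.0

    def update(self):
        # Outflow is proportional to storage; step forward one dt.
        self.outflow = self.k * self.storage
        self.storage -= self.outflow * self.dt

    def get_value(self, name):
        return {"storage": self.storage, "outflow": self.outflow}[name]

    def set_value(self, name, value):
        if name == "inflow":
            self.storage += value * self.dt

    def finalize(self):
        pass

model = LinearReservoirBMI()
model.initialize({"initial_storage": 100.0, "recession_coefficient": 0.2})
model.update()
print(model.get_value("outflow"))  # 0.2 * 100.0 = 20.0
```

Because a framework only ever calls these generic methods, any model exposing them can be swapped in or coupled to another, which is the point of the BMI approach described above.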

To test the CSDMS modeling framework, the team took PRMS (the Precipitation-Runoff Modeling System), broke it down into its four reservoirs (surface, soil, groundwater, and streamflow), wrapped each in a BMI, and then re-coupled them. The expectation is that users could then couple PRMS with other models.
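The re-coupling idea can be sketched as two BMI-style components exchanging values each time step, with a small coupler loop passing the output of one into the input of the other. The components and variable names below are invented stand-ins for the PRMS surface and soil reservoirs, not the project's actual code.

```python
class SurfaceComponent:
    """Stand-in for a PRMS surface reservoir wrapped in a BMI."""
    def initialize(self):
        self.runoff = 0.0
    def update(self, precip):
        # Half of precipitation becomes surface runoff (illustrative only).
        self.runoff = 0.5 * precip
    def get_value(self, name):
        return {"runoff": self.runoff}[name]

class SoilComponent:
    """Stand-in for a PRMS soil-zone reservoir wrapped in a BMI."""
    def initialize(self):
        self.moisture = 10.0
    def set_value(self, name, value):
        if name == "infiltration":
            self.moisture += value
    def update(self):
        self.moisture *= 0.9  # simple drainage each step
    def get_value(self, name):
        return {"moisture": self.moisture}[name]

# Coupler: pass the surface output into the soil component each step.
surface, soil = SurfaceComponent(), SoilComponent()
surface.initialize()
soil.initialize()
for precip in [4.0, 2.0, 0.0]:
    surface.update(precip)
    soil.set_value("infiltration", surface.get_value("runoff"))
    soil.update()
print(round(soil.get_value("moisture"), 3))
```

The coupler never looks inside either component; swapping the toy surface model for a Fortran PRMS reservoir behind the same interface would leave the loop unchanged.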

See the meeting recording for a demonstration of the tool. Note the model run-time interaction during the demo; PRMS is written in Fortran but is being run from Python. Code for this project is available on GitHub.

Presentation slide of the interface of the Wildlife Health Information Sharing Partnership event reporting system (WHISPers)

Transforming Biosurveillance by Standardizing and Serving 40 Years of Wildlife Disease Data - Neil Baertlein, USGS 

Did you know that over 70% of emerging infectious diseases originate in wildlife? The National Wildlife Health Center (NWHC) has been dedicated to wildlife health since 1975. The NWHC's biosurveillance work includes lead poisoning, West Nile virus, avian influenza, white-nose syndrome, and SARS-CoV-2.

NWHC has become a major data repository for wildlife health data. To manage this data, WHISPers (Wildlife Health Information Sharing Partnership event reporting system) and LIMS (laboratory information management system) are utilized. WHISPers is a portal for lab-verified biosurveillance data that enables collaboration with various state and federal partners, as well as some international partners, such as Canada.

There is a need to leverage NWHC data to inform the public, scientists, and decision makers, but substantial barriers stand in the way of this goal:

  1. Data is not FAIR (findable, accessible, interoperable, and reusable) 
  2. There are nearly 200 datasets in use 
  3. Data is not easy to find 
  4. Data exists in various file formats 
  5. There is limited to no documentation for data 

As a result, this project has formulated a five-step process for making NWHC data FAIR:

  1. Definition: define each dataset.  
    1. NWHC created a template capturing information such as the users responsible for the data, the file type, and where the data is stored. A data dictionary was also created. 
  2. Classification: provide meaning and context for the data.  
    1. In this step, NWHC classifies relationships with other datasets and databases and identifies inconsistencies in the data. 
  3. Prioritization: identify high-priority datasets.  
    1. High-priority datasets are ones that NWHC needs to continue using down the road or that are currently high-impact. Non-priority datasets can be archived. 
  4. Cleansing: clean the high-priority datasets.  
    1. This includes fixing data errors and standardizing values. 
  5. Migrating: map and migrate the cleansed data. 
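As a sketch, the definition and prioritization steps above amount to building a dataset inventory that can be triaged. The record fields below are assumptions based on the template described in the definition step, not NWHC's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """Inventory entry for one dataset (illustrative fields only)."""
    name: str
    steward: str                # user responsible for the data
    file_format: str            # e.g. "csv", "xlsx", "mdb"
    location: str               # where the data is stored
    related_datasets: list = field(default_factory=list)
    high_priority: bool = False

def triage(records):
    """Split an inventory into datasets to cleanse vs. archive."""
    cleanse = [r for r in records if r.high_priority]
    archive = [r for r in records if not r.high_priority]
    return cleanse, archive

# Hypothetical inventory entries, for illustration only.
inventory = [
    DatasetRecord("avian_influenza_2006", "jdoe", "csv", "/data/ai",
                  high_priority=True),
    DatasetRecord("legacy_necropsy_1981", "asmith", "mdb", "/data/legacy"),
]
to_cleanse, to_archive = triage(inventory)
print(len(to_cleanse), len(to_archive))  # 1 1
```

Even a simple structure like this makes the "nearly 200 datasets" problem tractable: each record documents who owns the data, where it lives, and whether it moves on to the cleansing step.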

To put this five-step process into effect, NWHC hired two dedicated student service contractors to work on the project. Interviews with lab technicians, scientists, and principal investigators were conducted to gather input and identify high-priority datasets. Dedicated staff also documented datasets, organized that documentation, and began cleansing high-priority datasets by fixing errors and standardizing data. At the time of the presentation, 130 datasets were ready for archiving and cleansing.

There have been some challenges along the way. Training the staff responsible for making NWHC data FAIR has been a substantial time investment. The work is labor- and time-intensive, and some datasets have no documentation readily available. The current databases were built with limited knowledge of database design. Finally, there are variations in laboratory and field methodology, both between individuals and between teams.

The project team shared several takeaways. Moving forward, data collectors need to think through data collection methods and documentation more thoroughly. Questions a data collector might ask about their process include: Is it FAIR? Are my methods standardized? How is the data collected now, and how will it be collected in the future? Documenting the process and management of data collection and compilation is also important.

All CDI Blog Posts