For August's CDI Monthly Meeting, we heard a presentation on integrating short-term climate forecasts into a restoration management support tool, and held our first session of the CDI Pop-Up Lab.
For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki.
Look out for Machine Learning Mondays: a weekly course on image analysis using machine learning. Dan Buscombe will be offering this course, covering image recognition, object recognition, image segmentation, and semi-supervised image classification. The course is targeted at USGS employees and contractors working in satellite and aerial imaging, image analysis, geospatial analysis, and machine learning software development. The course is available only to those with a USGS email, at no charge. Experience with Python and the command-line interface is recommended as a prerequisite.
Alternate text: slide summarizing the goals of a short-term soil moisture forecaster and resulting example heat maps of the U.S.
The goal of this FY19 project is to create a link between data and how it can be used in a management context. Climate forecasts are typically spatially or temporally coarse, while managers need finer-grained, site-specific data. For example, the success of seeding and planting relies on the short-term (monthly and seasonal) climate that occurs immediately afterward, and there is a 90% failure rate for seeding and planting in the western U.S.
The project facilitates the link between climate data/climate knowledge and management needs by creating a short-term soil moisture forecaster application. In the western U.S., water is a limiting factor and drought is a natural part of the ecosystem, expected to be exacerbated further in the coming years. For managers, seeding/planting and drought are connected, and managers need more information on the climate forecast for the period after they seed or plant. Climate knowledge for this use case generates probabilities of whether conditions will be hotter or colder and drier or wetter. This is coarse information that needs translation before managers can use it.
The SOILWAT2 model is essentially a translation tool: the user provides the model with information on a specific site (climate, vegetation, soil), and the model outputs probabilities of where water moves on a daily basis and estimates of soil moisture at different depths. The National Weather Service provides one prediction for each of 102 regions for a time period, but this multi-month forecast data is very coarse.
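To illustrate the kind of translation described above, here is a minimal daily "bucket" water-balance sketch. This is not SOILWAT2 (which models multiple soil depths and many more processes); it is a hypothetical simplification showing how daily precipitation and evapotranspiration inputs become a soil-moisture time series.

```python
# Simplified daily bucket model (illustrative only, NOT SOILWAT2):
# soil water storage rises with precipitation, falls with
# evapotranspiration, drains when above capacity, and never goes negative.

def daily_soil_moisture(precip_mm, et_mm, capacity_mm=150.0, start_mm=75.0):
    """Return a daily soil-water-storage series (mm) for paired inputs."""
    storage = start_mm
    series = []
    for p, et in zip(precip_mm, et_mm):
        storage = min(max(storage + p - et, 0.0), capacity_mm)
        series.append(storage)
    return series

# One week of hypothetical inputs (mm/day):
moisture = daily_soil_moisture([0, 12, 0, 0, 5, 0, 0],
                               [3, 3, 4, 4, 3, 4, 4])
# moisture -> [72.0, 81.0, 77.0, 73.0, 75.0, 71.0, 67.0]
```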
The application team is currently developing code to downscale short-term climate predictions to a finer temporal and spatial scale in order to derive a better soil moisture model.
Spatially and temporally refining this data was a challenge. A Jupyter Notebook that details the steps the project team took is available to USGS employees: https://code.chs.usgs.gov/candrews/shorttermdroughtforecaster.
A quick summary of the process:
This process produces an example output that is explained in detail in the meeting recording (log in to Confluence to access). The application will be integrated into the Land Treatment Exploration Tool (LTET), a Bureau of Land Management and USGS collaboration intended for managers planning restoration projects.
Alternate text: slide showing the information on cloud-optimized GeoTIFFs summarized below, as well as a map and code snippet.
The CDI project Theo Barnhart is working on this year involves generating a set of continuous basin characteristics for the entire contiguous U.S., resulting in many very large GeoTIFFs. The need arose for a solution with the following characteristics: a geospatial format, easy to generate, good compression, stand-alone, and avoiding the need to maintain a server for data access. Cogeo.org and Rasterio were identified through a trial-and-error process of working through examples in Jupyter Notebooks.
Drew Ignizio is working on an approach for handling large files from the ScienceBase side. What is a Cloud-Optimized GeoTIFF (COG) and why is it useful? In the previous approach, a user would download a 240 GB file from the S3 bucket where it is stored, then work with the data locally. With COG, users can avoid downloading the data, instead accessing the file in place. COG enables users to publish to a public S3 bucket and connect to the COG through a Jupyter Notebook. COGs can also be read directly from a viewer.
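The "access in place" idea can be sketched in plain Python. In practice a COG reader (e.g., Rasterio/GDAL pointed at an HTTPS or S3 URL) issues HTTP range requests for just the tiles it needs; in this toy stand-in, a local byte buffer and `seek()` play the role of those range requests, and the fixed-size "tiles" are an invented format for illustration only.

```python
# Illustrative sketch of the COG access pattern: read only the byte
# ranges (tiles) you need, rather than downloading the whole file.
# A local buffer and seek() stand in for HTTP range requests to S3.
import io

TILE_SIZE = 256  # bytes per tile in this toy format (assumption)

def write_tiled(buf, tiles):
    """Write fixed-size tiles back to back, zero-padded."""
    for t in tiles:
        buf.write(t.ljust(TILE_SIZE, b"\x00"))

def read_tile(buf, index):
    """Fetch one tile 'in place' -- analogous to one HTTP range request."""
    buf.seek(index * TILE_SIZE)
    return buf.read(TILE_SIZE)

store = io.BytesIO()
write_tiled(store, [b"tile-0", b"tile-1", b"tile-2"])
tile = read_tile(store, 1)           # only this tile is "transferred"
# tile.rstrip(b"\x00") -> b"tile-1"
```

With a real COG, the equivalent step is opening the remote URL directly in Rasterio and reading a window, so only that window's tiles cross the network.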
While mapping out ground failure for a project in Alaska, an issue was identified with the diversity of data inputs. The inputs to models can differ in many ways. They can be:
How can we structure diverse datasets in a way that enables robust, calculable integration and evaluation? Rapstine proposed using a multi-scale, hierarchical data structure, a quadtree, to represent data at varying scales; this representation allows grids of multiple resolutions to be combined. A quadtree recursively divides a region into squares, so refining the mesh (using the Python package discretize) yields a finer representation exactly where it is needed.
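The recursive subdivision can be sketched in a few lines. This is a minimal, self-contained illustration of the quadtree idea, not the discretize package's API; the refinement rule here is a hypothetical placeholder.

```python
# Minimal quadtree sketch: each square splits into four children when a
# refinement predicate says the region needs finer detail, producing a
# mesh with mixed resolutions (fine where needed, coarse elsewhere).

def build_quadtree(x, y, size, needs_refining, depth, max_depth):
    """Return the leaf squares (x, y, size) covering the region."""
    if depth == max_depth or not needs_refining(x, y, size):
        return [(x, y, size)]
    half = size / 2
    leaves = []
    for dx in (0, half):
        for dy in (0, half):
            leaves += build_quadtree(x + dx, y + dy, half,
                                     needs_refining, depth + 1, max_depth)
    return leaves

# Hypothetical rule: refine only squares touching the origin corner.
refine = lambda x, y, s: x == 0 and y == 0
leaves = build_quadtree(0.0, 0.0, 1.0, refine, 0, max_depth=2)
# 7 leaves: three 0.5-sized squares plus four 0.25-sized squares
```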
Questions for the CDI community:
See the slides for more info on the wiki and reach out to trapstine@usgs.gov if you have an answer to these questions or would like further discussion.
StreamStats is a USGS website that allows users to delineate watersheds for an area of interest, built to be used by civil engineers designing highway bridges and culverts. Kolb wanted to answer these questions: What's the biggest flood I can expect in a given year? How do we get information for ungaged areas? Answering them requires a GIS system that can calculate things quickly and efficiently.
StreamStats is built on ArcHydro, SSHydro, and Leaflet. It provides an image of your watershed and a report, with an option to download the watershed outline and table. The StreamStats training docs and webinars, as well as classes on ArcHydro, are useful for learning how to harness this tool.
"At today's Metadata Reviewers meeting, I had the feeling that many of us were discovering that we need to know what these Git terms mean: main branch, fork, issue, tag."
Some places to start: