
For August's CDI Monthly Meeting, we heard a presentation on integrating short-term climate forecast into a restoration management support tool, and had our first session of the CDI Pop-Up Lab. 

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

News & Announcements 

Look out for Machine Learning Mondays: a weekly course on image analysis using machine learning. Dan Buscombe will be offering this course, which covers image recognition, object recognition, image segmentation, and semi-supervised image classification. The course is targeted at USGS employees and contractors in the fields of satellite and aerial imaging, image analysis, geospatial analysis, and machine learning software development. The course is available at no charge, but only to those with a USGS email. Experience with Python and the command line interface is recommended as a prerequisite.

Integrating short-term climate forecast into a restoration management support tool - Caitlin Andrews, USGS 


Alternate text: slide summarizing the goals of a short-term soil moisture forecaster and resulting example heat maps of the U.S. 

The goal of this FY19 project is to create a link between data and how it can be used in a management context. Climate forecasts are typically spatially or temporally coarse, while managers need temporally finer, site-specific data. For example, the success of seeding and planting relies on the short-term monthly and seasonal climate that occurs immediately after seeding and planting. There is a 90% failure rate for seeding and planting in the western U.S.

The project facilitates the link between climate data and climate knowledge on one side and management needs on the other by creating a short-term soil moisture forecaster application. In the western U.S., water is a limiting factor and drought is a natural part of the ecosystem, one that is expected to be exacerbated in the coming years. For managers, seeding/planting and drought are connected, and managers need more information on the climate forecast for the period after they seed or plant. Climate knowledge for this use case generates probabilities on whether conditions will be hotter or colder and drier or wetter. This is coarse information that needs translation before managers can use it.

The SOILWAT2 model is essentially a translation tool: the user provides the model with information on a specific site (climate, vegetation, soil), and the model output provides probabilities on where water moves on a daily basis and measurements of soil moisture at different depths. The National Weather Service provides one prediction for each of 102 regions for a time period, but this multi-month forecast data is very coarse.

The application team is currently developing code to downscale short-term climate predictions to a finer temporal and spatial scale in order to derive a better soil moisture model.

Spatially and temporally refining this data was a challenge. A Jupyter Notebook that details the steps the project team took is available to USGS employees: 

A quick summary of the process: 

  1. Gather a historical record of site-specific data from GridMET (1980-yesterday) 
  2. Generate samples of what the future will look like (30 future realizations) 
  3. Apply future realization to the years in the historical record. This is how future anomalies are integrated with historical patterns. 
  4. Produce 900 climate futures from the resulting combinations 
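
The combination step in the summary above can be sketched in a few lines of Python. This is a minimal illustration and not the project team's code. It assumes that 30 historical years are used (so that 30 realizations × 30 years gives the 900 futures mentioned), that anomalies are simply added to monthly values, and that random placeholder numbers stand in for GridMET data and forecast samples. The names `historical`, `anomalies`, and `combine` are hypothetical.

```python
import itertools
import random

random.seed(0)

N_YEARS = 30         # assumed number of historical years used
N_REALIZATIONS = 30  # future realizations (step 2)
N_MONTHS = 12

# Placeholder historical record: one list of monthly values per year.
# Real values would come from a site-specific GridMET extract.
historical = [
    [random.gauss(10.0, 5.0) for _ in range(N_MONTHS)]
    for _ in range(N_YEARS)
]

# Placeholder forecast samples: one list of monthly anomalies per realization.
anomalies = [
    [random.gauss(0.5, 1.0) for _ in range(N_MONTHS)]
    for _ in range(N_REALIZATIONS)
]


def combine(year_values, anomaly_values):
    """Apply one anomaly series to one historical year (additive, by assumption)."""
    return [h + a for h, a in zip(year_values, anomaly_values)]


# Pair every historical year with every future realization (step 3).
futures = [
    combine(year, anomaly)
    for year, anomaly in itertools.product(historical, anomalies)
]

print(len(futures))
```

Each entry of `futures` keeps the within-year pattern of one historical year while shifting it by one sampled anomaly series, which is one simple way to read "future anomalies are integrated with historical patterns."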

This process produces an example output that is explained in detail in the meeting recording (log in to Confluence to access). The application will be integrated into the Land Treatment Exploration Tool (LTET), a Bureau of Land Management and USGS collaboration intended for managers planning restoration projects. 

CDI Pop-up Lab: Q&A with the CDI community 

Alternate text: slide showing the information on cloud-optimized GeoTIFFs summarized below, as well as a map and code snippet. 

Cloud optimized files and new transfer options  - Theo Barnhart and Drew Ignizio 

The CDI project Theo Barnhart is working on this year involves generating a set of continuous basin characteristics for all of the contiguous U.S., resulting in many very large GeoTIFFs. The need arose for a solution with the following characteristics: a geospatial format, easy to generate, good compression, stand-alone, and no server to maintain in order to access the data. Suitable tools, Rasterio among them, were identified through a trial-and-error process of working through examples in Jupyter Notebooks. 

Drew Ignizio is working on an approach for handling large files from the ScienceBase side. What is a Cloud-Optimized GeoTIFF (COG) and why is it useful? In the previous approach, a user would download a 240 GB file from the S3 bucket where it is stored and then work with the data locally. With a COG, users can avoid downloading the data and instead access the file in place. Users can publish a COG to a public S3 bucket and connect to it from a Jupyter Notebook. COGs can also be read directly from a viewer. 
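
Reading a COG in place is cheap because the file is internally tiled, so a client only has to fetch the tiles that overlap the area it asks for. The sketch below illustrates that idea with plain Python; it is not Rasterio's API and not the presenters' code. The 512-pixel tile size, the raster dimensions, and the function name `tiles_for_window` are assumptions chosen for illustration.

```python
import math


def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_row, tile_col) indices of the tiles overlapping a pixel window."""
    if width <= 0 or height <= 0:
        return []
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# Hypothetical raster dimensions, in pixels.
raster_width, raster_height = 100_000, 60_000
tile_size = 512

# Total number of internal tiles in the whole raster.
total_tiles = math.ceil(raster_width / tile_size) * math.ceil(raster_height / tile_size)

# Tiles needed for a 1,000 x 1,000 pixel area of interest.
needed = tiles_for_window(col_off=20_000, row_off=10_000, width=1_000, height=1_000)

print(f"{len(needed)} of {total_tiles} tiles are needed for this window")
```

In practice a library such as Rasterio handles this bookkeeping when a windowed read is requested, so the user only specifies the area of interest.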

Irregular meshes for data operations - quadtrees  - Thomas Rapstine 

While mapping out ground failure for a project in Alaska, an issue was identified with the diversity and variety of data inputs. The inputs to models can differ in many ways. They can be: 

  • Grids, points, polygons, lines and more 
  • Categorical, physical, or temporal 
  • With their own notion of uncertainty, or not 
  • Pulled from a global or local raster 

How can we structure diverse datasets in a way that enables robust, calculable integration and evaluation? Rapstine proposed using a multi-scale, hierarchical data structure to represent data on varying scales: a quadtree, which allows grids of multiple resolutions to be put together. A quadtree recursively divides a region into squares. A quadtree mesh (built here with the Python package discretize) gives a finer representation in the areas where the mesh is refined. 
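
To make the quadtree idea concrete, here is a minimal pure-Python sketch of a point quadtree. It is an illustration only: it does not use the discretize package or reproduce Rapstine's implementation, and the class name `QuadNode` and the parameters `max_points` and `min_size` are hypothetical. A square cell is split into four equal squares whenever it holds more than `max_points` points, so cells become small where data are dense and stay large elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class QuadNode:
    x: float      # lower-left corner, x
    y: float      # lower-left corner, y
    size: float   # side length of the square cell
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def insert(self, px, py, max_points=1, min_size=1.0):
        """Add a point, splitting the cell when it becomes too crowded."""
        if self.children:
            self._child_for(px, py).insert(px, py, max_points, min_size)
            return
        self.points.append((px, py))
        if len(self.points) > max_points and self.size > min_size:
            self._split()
            pending, self.points = self.points, []
            for qx, qy in pending:
                self._child_for(qx, qy).insert(qx, qy, max_points, min_size)

    def _split(self):
        """Create four equal child squares (index = row * 2 + col)."""
        half = self.size / 2
        self.children = [
            QuadNode(self.x + dx * half, self.y + dy * half, half)
            for dy in (0, 1)
            for dx in (0, 1)
        ]

    def _child_for(self, px, py):
        """Pick the child square that contains the point."""
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[row * 2 + col]

    def leaves(self):
        """Return all cells that have not been split."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# A cluster of points near the origin and one isolated point far away.
root = QuadNode(0.0, 0.0, 64.0)
for point in [(1, 1), (2, 3), (3, 2), (5, 6), (6, 5), (50, 50)]:
    root.insert(*point)

sizes = sorted({leaf.size for leaf in root.leaves()})
print("leaf cell sizes:", sizes)
```

The isolated point ends up in a large cell while the clustered points force repeated subdivision, which is the multi-resolution behavior described above.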

Questions for the CDI community: 

  1. How are others solving these data integration issues? 
  2. Are there solutions you would recommend other than quadtrees? 
  3. Thoughts on using quadtrees for solving these challenges? 
  4. Are you using quadtrees? What packages would you recommend? 

See the slides on the wiki for more info, and reach out to Thomas Rapstine if you have an answer to these questions or would like further discussion. 

StreamStats - Kitty Kolb 

StreamStats is a USGS website that allows users to delineate watersheds for an area of interest. It was built to be used by civil engineers designing highway bridges and culverts. Kolb wanted to know the answers to these questions: What's the biggest flood I can expect in a given year? How do we get information on ungaged areas? Answering these questions requires a GIS that can calculate things quickly and efficiently. 

StreamStats is built on ArcHydro, SSHydro, and Leaflet. StreamStats provides an image of your watershed and a report, with an option to download the watershed outline and table. StreamStats training docs and webinars, as well as classes on ArcHydro, are useful for learning how to harness this tool. 

Speaking Git 

"At today's Metadata Reviewers meeting, I had the feeling that many of us were discovering that we need to know what these Git terms mean: main branch, fork, issue, tag." 

Some places to start: 

18F: How do I speak Git(hub) 

Git(hub) Glossary 

GS-Software Microsoft Team 

USGS Software Management Website 
