Blog from September, 2020

For August's CDI Monthly Meeting, we heard a presentation on integrating short-term climate forecasts into a restoration management support tool and held our first session of the CDI Pop-Up Lab.

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

News & Announcements 

Look out for Machine Learning Mondays: a weekly course on image analysis using machine learning. Dan Buscombe will be offering this course covering image recognition, object recognition, image segmentation, and semi-supervised image classification. The course is targeted at USGS employees and contractors in the fields of satellite and aerial imaging, image analysis, geospatial analysis, and machine learning software development. The course is available at no charge, but only to those with a USGS email. Experience with Python and the command line interface is recommended as a prerequisite.

Integrating short-term climate forecast into a restoration management support tool - Caitlin Andrews, USGS 

 

Alternate text: slide summarizing the goals of a short-term soil moisture forecaster and resulting example heat maps of the U.S. 

The goal of this FY19 project is to link data to how it can be used in a management context. Climate forecasts are typically spatially or temporally coarse, while managers need temporally fine, site-specific data. For example, the success of seeding and planting relies on the short-term monthly and seasonal climate that occurs immediately after seeding and planting. There is a 90% failure rate for seeding and planting in the western U.S.

The project links climate data and climate knowledge to management needs by creating a short-term soil moisture forecaster application. In the western U.S., water is a limiting factor and drought is a natural part of the ecosystem, one that is expected to be exacerbated in the coming years. For managers, seeding/planting and drought are connected, and managers need more information on the climate forecast for the period after they seed or plant. Climate knowledge for this use case generates probabilities on whether conditions will be hotter or colder and drier or wetter. This is coarse information that needs translation before managers can use it.

The SOILWAT2 model is essentially a translation tool: the user provides the model with information on a specific site (climate, vegetation, soil), and the model output provides probabilities on where water moves on a daily basis and measurements of soil moisture at different depths. The National Weather Service provides one prediction for each of 102 regions for a time period, but this multi-month forecast data is very coarse.

The application team is currently developing code to refine short-term climate predictions to a finer temporal and spatial scale in order to derive a better soil moisture model.

Spatially and temporally refining this data was a challenge. A Jupyter Notebook that details the steps the project team took is available to USGS employees: https://code.chs.usgs.gov/candrews/shorttermdroughtforecaster 

A quick summary of the process: 

  1. Gather a historical record of site-specific data from GridMET (1980-yesterday) 
  2. Generate samples of what the future will look like (30 future realizations) 
  3. Apply future realization to the years in the historical record. This is how future anomalies are integrated with historical patterns. 
  4. Produce 900 climate futures 
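
The combination step in the list above can be sketched roughly as follows. This is only an illustration, not the project team's code (their Jupyter Notebook is linked above). It assumes 30 historical years, which is one way 30 realizations could yield 900 futures, and it assumes each realization reduces to a single additive temperature anomaly. The names `build_climate_futures`, `historical`, and `anomalies` are hypothetical, and the input values are randomly generated stand-ins rather than GridMET data.

```python
import random
from itertools import product


def build_climate_futures(historical_years, anomalies):
    """Pair every forecast anomaly with every historical year.

    historical_years: dict mapping year -> list of 12 monthly temperatures
    anomalies: list of temperature offsets, one per future realization
    """
    futures = []
    for (year, temps), delta in product(historical_years.items(), anomalies):
        futures.append(
            {
                "base_year": year,
                "anomaly": delta,
                # Assumption: the anomaly is simply added to each monthly value.
                "temps": [t + delta for t in temps],
            }
        )
    return futures


rng = random.Random(0)

# Hypothetical stand-in for the historical record: 30 years x 12 months (deg C).
historical = {
    1990 + i: [10 + rng.gauss(0, 2) for _ in range(12)] for i in range(30)
}

# Hypothetical stand-in for 30 sampled future realizations.
anomalies = [rng.gauss(0.5, 1.0) for _ in range(30)]

futures = build_climate_futures(historical, anomalies)
print(len(futures))  # 30 years x 30 realizations = 900
```

Pairing every realization with every historical year keeps the day-to-day patterns of real years while shifting them by the forecast signal, which is the idea described in step 3.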

This process produces an example output that is explained in detail in the meeting recording (log in to Confluence to access). The application will be integrated into the Land Treatment Exploration Tool (LTET), a Bureau of Land Management and USGS collaboration intended for managers planning restoration projects. 

CDI Pop-up Lab: Q&A with the CDI community 

Alternate text: slide showing the information on cloud-optimized GeoTIFFs summarized below, as well as a map and code snippet. 

Cloud optimized files and new transfer options  - Theo Barnhart and Drew Ignizio 

The CDI project Theo Barnhart is working on this year involves generating a set of continuous basin characteristics for all of the contiguous U.S., resulting in many very large GeoTIFFs. The need arose for a solution with the following characteristics: a geospatial format, easy to generate, good compression, stand-alone, and no server to maintain for data access. Cogeo.org and Rasterio were identified through a trial-and-error process of working through examples in Jupyter Notebooks.

Drew Ignizio is working on an approach for handling large files from the ScienceBase side. What is a Cloud-Optimized GeoTIFF (COG) and why is it useful? In the previous approach, a user would download a 240 GB file from the S3 bucket where it is stored and then work with the data locally. With a COG, users can avoid downloading the data and instead access the file in place. Users can publish a COG to a public S3 bucket and connect to it through a Jupyter Notebook. COGs can also be read directly from a viewer.
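
A rough sketch of why reading in place can be so much cheaper than downloading: a COG is tiled internally, so a client only needs to request the tiles that overlap its area of interest. The raster size, tile size, and window below are hypothetical numbers chosen for illustration, and `tiles_for_window` is a made-up helper. In practice a library such as Rasterio handles this bookkeeping.

```python
import math


def tiles_for_window(col_off, row_off, width, height, tile_size=512):
    """Return (tile_col, tile_row) indices of the tiles overlapping a pixel window."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (c, r)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


# Hypothetical raster of 100,000 x 100,000 pixels stored in 512 x 512 tiles.
image_size = 100_000
tile_size = 512
total_tiles = math.ceil(image_size / tile_size) ** 2

# Hypothetical 1,000 x 1,000 pixel area of interest.
needed = tiles_for_window(
    col_off=40_000, row_off=25_000, width=1_000, height=1_000, tile_size=tile_size
)

print(f"{len(needed)} of {total_tiles} tiles needed")
```

With these numbers only a handful of tiles out of tens of thousands would have to be fetched, which is the difference between accessing the file in place and downloading all of it first.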

Irregular meshes for data operations - quadtrees  - Thomas Rapstine 

While mapping out ground failure for a project in Alaska, an issue was identified with the diversity and variety of data inputs. The inputs to models can differ in many ways. They can be: 

  • Grids, points, polygons, lines and more 
  • Categorical, physical, or temporal 
  • With their own notion of uncertainty, or not 
  • Pulled from a global or local raster 

How can we structure diverse datasets in a way that enables robust, calculable integration and evaluation? Rapstine proposed using a multi-scale, hierarchical data structure, a quadtree, which allows grids of multiple resolutions to be combined into one representation. A quadtree recursively divides a region into squares. Quadtree meshes (built here with the Python package discretize) give a finer representation in the areas where the mesh is refined.
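
For readers new to the idea, here is a minimal pure-Python sketch of a quadtree that refines only around points of interest. It does not use discretize and is not Rapstine's implementation. The `Cell` class, the sample points, and the domain size are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    x: float  # lower-left corner
    y: float
    size: float  # side length of the square
    depth: int = 0
    children: list = field(default_factory=list)

    def contains(self, px, py):
        return self.x <= px < self.x + self.size and self.y <= py < self.y + self.size

    def refine(self, points, max_depth):
        """Split this cell into four quadrants wherever it holds a point."""
        inside = [p for p in points if self.contains(*p)]
        if not inside or self.depth >= max_depth:
            return
        half = self.size / 2
        self.children = [
            Cell(self.x + dx * half, self.y + dy * half, half, self.depth + 1)
            for dy in (0, 1)
            for dx in (0, 1)
        ]
        for child in self.children:
            child.refine(inside, max_depth)

    def leaves(self):
        """Return the cells that were not split further."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Hypothetical 16 x 16 domain with two nearby observation points.
root = Cell(0.0, 0.0, 16.0)
root.refine([(1.0, 1.0), (1.5, 2.5)], max_depth=3)
leaves = root.leaves()

for leaf in leaves:
    print(leaf.x, leaf.y, leaf.size)
```

Cells far from the points stay large while cells near them shrink, so the leaves still tile the whole domain but at mixed resolutions.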

Questions for the CDI community: 

  1. How are others solving these data integration issues? 
  2. Any other solution recommendations other than quadtrees? 
  3. Thoughts on using quadtrees for solving these challenges? 
  4. Are you using quadtrees? What packages would you recommend? 

See the slides for more info on the wiki and reach out to trapstine@usgs.gov if you have an answer to these questions or would like further discussion. 

StreamStats - Kitty Kolb 

StreamStats is a USGS website that allows users to delineate watersheds for an area of interest. It was built for civil engineers designing highway bridges and culverts. Kolb wanted to know the answers to these questions: What's the biggest flood I can expect in a given year? How do we get information on ungaged areas? Answering these questions requires a GIS that can calculate things quickly and efficiently.

StreamStats is built on ArcHydro, SSHydro, and Leaflet. StreamStats provides an image of your watershed and a report, with an option to download the watershed outline and table. StreamStats training documents and webinars, as well as classes on ArcHydro, are useful for learning how to harness this tool.

Speaking Git 

"At today's Metadata Reviewers meeting, I had the feeling that many of us were discovering that we need to know what these Git terms mean: main branch, fork, issue, tag." 

Some places to start: 

18F: How do I speak Git(hub) 

Git(hub) Glossary 

GS-Software Microsoft Team 

USGS Software Management Website 

--  
All CDI Blog Posts  

CDI collaboration areas bring us focused information and tools to help us work with our data. See all collaboration areas and how to join. 


Slides from Dan Beckman's presentation to the Software Development Cluster, where he discussed the creation of synthetic data for training artificial intelligence algorithms.

Data Management, 8/10 - Department of Interior Records Management Repository and Data Exit Story Time on "the data they left behind"

Lynda Speck and Jim Nagode from the U.S. Bureau of Reclamation presented on their records and document management cloud solution, eERDMS. Tara Bell, Robin Tillitt, and Sue Kemp shared experiences on "Departing Scientists and the Data They Left Behind." Recording and other resources at the wiki meeting page.

DevOps, 8/4 - EPA Data Management and Analytics Platform DevOps

Dave Smith from the Environmental Protection Agency presented on "EPA Data Management and Analytics Platform DevOps." The discussion covered how to get to DevSecOps, that is, how to add security to development and operations, and included the observation that "Security as usual breaks DevOps automation." Recording and slides are available on the DevOps meeting page.

Fire Science, 8/18 - Department of Interior Wildland Fire Information & Technology Strategy

Roshelle Pederson from the Department of the Interior Office of Wildland Fire presented on the Wildland Fire Information & Technology Strategy. The discussion included the role of USGS research and successful paths to integrate research information, data, and tools into fire management information systems. Join the Fire Science mailing list here.

Metadata Reviewers CoP, 8/3 - Metadata for public release of legacy data

Tara Bell, Matt Arsenault, and Sofia Dabrowski led a discussion on metadata for public release of legacy data for which full documentation is not available.

Risk, 8/11-8/13 - Annual Risk Meeting

The Risk Community of Practice held their Annual Risk Meeting virtually, from August 11-13. The meeting agenda included a keynote on "An evaluation of the risk of SARS-CoV2 transmission from humans to bats" by Mike Runge, a session with the EarthMAP project management team, presentations from FY19 Risk Proposal Awardees, a risk analysis panel discussion, virtual networking, and sessions on engaging diverse stakeholders and tools for virtual stakeholder meetings. To join the Risk Research and Applications Community of Practice, visit https://listserv.usgs.gov/mailman/listinfo/cdi-risk.

Usability, 8/19 - Human-Centered Approach and Usability

Jamie Albrecht from Impact360 Alliance presented on Inclusive Problem-Solving to Reduce Natural Hazard Impacts & Disaster Risk. Inclusive problem-solving is Impact360’s process to bring together natural hazard researchers and practitioners to solve wicked problems. Several contributing foundational frameworks on the topics of mutual gains, joint fact finding, systems thinking, design thinking, social innovation, and equity-centered community design were introduced for consideration. Notes, slides, and recording are accessible on the meeting page.

Software Dev, 8/27 - Synthetic data and build process for AI imagery and deep learning methods

Dan Beckman presented on "Synthetic data and build process for AI imagery and deep learning methods." He described a solution to the challenge of not having enough training data: using synthetic stand-in data to generate the volume of data needed. Dan referenced some code he used from Adam Kelly, and here is a related Medium post. Read the post to follow up on the statement "I’ve found, from both researching and experimenting, that one of the biggest challenges facing AI researchers today is the lack of correctly annotated data to train their algorithms." See the Software Development Cluster wiki page.
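
As a toy illustration of the general idea (not Dan's or Adam Kelly's code), the sketch below generates small synthetic grayscale images, each containing one bright rectangle, together with a pixel mask marking where the rectangle is. Because the generator places the object, the annotation comes for free. The function name `make_sample`, the image size, and the brightness values are all hypothetical choices.

```python
import random


def make_sample(size=16, rng=random):
    """Return (image, mask) as nested lists.

    image: noisy grayscale values with one bright rectangle
    mask:  1 where the rectangle is, 0 elsewhere (the label)
    """
    w = rng.randint(3, 8)
    h = rng.randint(3, 8)
    x0 = rng.randint(0, size - w)
    y0 = rng.randint(0, size - h)

    image, mask = [], []
    for y in range(size):
        image_row, mask_row = [], []
        for x in range(size):
            inside = x0 <= x < x0 + w and y0 <= y < y0 + h
            base = 200 if inside else 50  # bright object on a dark background
            noisy = base + rng.randint(-20, 20)
            image_row.append(min(255, max(0, noisy)))
            mask_row.append(1 if inside else 0)
        image.append(image_row)
        mask.append(mask_row)
    return image, mask


# Build a small labeled dataset with a fixed seed for repeatability.
rng = random.Random(42)
dataset = [make_sample(rng=rng) for _ in range(100)]
print(len(dataset), "labeled samples")
```

A real pipeline would render far more realistic scenes, but the principle is the same: the generator knows the ground truth, so no manual annotation is needed.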

CDI collaboration areas bring us focused information and tools to help us work with our data. See all collaboration areas and how to join. 


Screenshots from the USGS COVID-19 Case Finder and Viz Palette - two resources discussed at the July Data Viz call.

Artificial Intelligence/Machine Learning,  7/14 - Gage Cam - computer vision for water surface elevation

Daniel Beckman presented on Gage Cam, a low-cost, custom-built wireless web camera paired with a custom deep learning algorithm that provides a computer vision method for measuring water surface elevation (stage). Daniel's slides also cover additional topics including U-Nets, synthetic data, algorithms for text, suggested books on deep learning, and more!

Slides and recording at the AI/ML Meeting Notes page.

Data Management, 7/13 - Collections Management Instructional Memo and Center-level collection management plans

Lindsay Powers presented on a new Collections Management Instructional Memo (IM CSS 2019-01) and associated website, released last August, providing policy and guidance for the management of scientific working collections.

Brian Buczkowski, from the Woods Hole Coastal and Marine Science Center, presented on Center-level collection management plans, which can help ensure that these samples and specimens continue to have value as assets to the public and scientific community.

Slides and recording can be found on the meeting notes page.

Data Visualization, 7/2 - Kickoff meeting COVID-19 Case Finder

Chuck Hansen from the California Water Science Center presented on the COVID-19 Case Finder, built on Tableau. The app allows a USGS employee planning a trip to get COVID information on their destination, with preloaded USGS facilities and gage sites. A conversation on color maps ensued, with participants sharing tools like this one - https://projects.susielu.com/viz-palette - which lets you import your own color schemes and see how they look under different types of color vision deficiency.

The Data Visualization group plans to hold quarterly calls. See more at their wiki page.

Fire Science, 7/21 - Climate-fire science synthesis

As fire activity continued to increase in July, Paul Steblein and Rachel Loehman led the Fire Science Community of Practice call. After a fire update from Paul, Madeleine Rubenstein from the Climate Adaptation Science Centers presented on a workplan to conduct a synthesis of Climate-Fire Science.

Join the Fire Science mailing list here.

Metadata Reviewers, 7/6 - Metadata for software and code

Eric Martinez joined the Metadata Reviewers group to chat about different types of code releases, different options for code repositories at USGS, code.json documentation, and more. He shared some links including the USGS Software Management website and the code.json schema, where controlled vocabularies can be found (search for 'enum' for enumerated lists).

See more notes on the Metadata Reviewers meeting notes page.

Model Catalog Working Group - Scientific model categorization and finding information about USGS models

A working group that is advising on the development of a new USGS Model Catalog was briefed (by email) on the sources used for populating the initial model catalog and asked about categorization of models by type and action. Project updates can be seen on this wiki page. Anyone interested in contributing to the direction of the model catalog can find out more on the working group home page, subscribe to the mailing list, and get in touch with the point of contact, which would be me, Leslie Hsu, lhsu@usgs.gov.

Risk CoP, 7/16 - Project presentations from the FY19 Risk RFP awardees (Round 2)

Four speakers gave final Risk project presentations on the topics of the global copper supply disruption from earthquakes (Kishore Jaiswal), how scientific research affects policy and earthquake preparedness (Sara McBride), the Hazard Exposure Analyst Tool (HEAT) (Jason Sherba), and ecological forecasts for risk management (Jake Weltzin and Alyssa Rosemartin).

See more at the Risk CoP meeting notes page (sign in as a CDI member to view).

Semantic Web, 7/9 - the Semantic Zoo

A group from the Semantic Web WG discussed the article "The Semantic Zoo - Smart Data Hubs, Knowledge Graphs, and Data Catalogs." This led to a discussion on the basic question of "How do we get data cleaned up so that many different places can use it?"

Usability Resource Review, 7/15 - Mobile UX Design Principles and Best Practices

Sophie Hou posted a resource review on Mobile UX Design Principles and Best Practices. The resource addresses topics like creating a seamless experience across devices, allowing for personalization, good onboarding practices, using established gestures, mobile layout design, focusing on speed, minimizing data input, and more.

See the full review and summary on the resource review wiki page.
