Meeting summaries and links in reverse chronological order.
Natalya Rapstine will give an overview of the new USGS Tallgrass supercomputer, designed to support machine learning and deep learning workflows at scale, and of deep learning software and tools for data science workflows.
Natalya Rapstine is a computational scientist in the Science Analytics and Synthesis (SAS), Advanced Research Computing (ARC) group of the Core Science Systems. She has a bachelor's degree in Earth Science from Rice University and an MS in Statistics from Colorado School of Mines, and she has been with the USGS since 2016. Her expertise is in high performance computing and in machine and deep learning applications for the advancement of science at the U.S. Geological Survey.
Inseok Heo is a data scientist in Envision Engineering at AWS.
Inseok received a PhD from the Department of Electrical Engineering at the University of Wisconsin-Madison in 2015. He specializes in speech and audio signal processing and machine learning. In his career, he has developed and worked on single- and multi-channel noise reduction, beamforming, and Alexa wakeword recognition/detection for Amazon Echo devices.
Amogh Gaikwad is a Solutions Architect, specializing in AI/ML, for AWS Federal customers and is part of the specialist team for Analytics. Prior to his role at AWS, Amogh worked as a software developer, developing enterprise applications. Through his role at AWS he has created ML solutions to help federal customers migrate their AI/ML workloads to AWS.
Amogh received his master's degree in Computer Science, specializing in Big Data Analytics and Machine Learning, from George Mason University.
Phillip Dawson is a geophysicist with the U.S. Geological Survey’s Volcano Science Center, focusing on theoretical and experimental investigations of active volcanism and volcanic processes. He currently works on the Seismology of Magmatic Injection project at the California Volcano Observatory, Menlo Park, California. This project is dedicated to understanding the underlying physics driving volcanic seismicity and processes through the use of detailed field experiments and the application, modification, and extension of existing seismic methods and theories.
Michael Furlong, NASA-Ames Intelligent Robotics Group
Jack Eggleston, USGS WMA Hydrologic Remote Sensing Branch
John Stock, USGS Innovation Center
Abstract: The availability of high-resolution satellite imagery, combined with machine learning analysis to rapidly process the satellite imagery, provides the USGS with a new capability to map natural resources at the national scale. The new capability is made possible by technology progress in these areas:
1 - Daily national imagery at <1 to 5 m pixel size from commercial providers
2 - High-performance computing (USGS high-performance computing or Cloud)
3 - Artificial intelligence and machine learning (AI/ML) tools to automatically process the imagery
USGS is working to build enterprise capability in each of these three areas and has a growing focus on development of AI/ML tools. In this presentation, two USGS projects that rely on collaborations with external partners to develop AI/ML tools to map water extent will be discussed. In one of these projects, USGS is collaborating with the NASA-Ames Intelligent Robotics Group to use its Deep Earth Learning, Tools, and Analysis (DELTA) software. The DELTA software will be presented, including a description of its early implementation on the USGS Tallgrass supercomputing system.
Abstract: This presentation provides an overview of how we use a recurrent autoencoder neural network to encode sequential California golden eagle telemetry data. The encoding is followed by an unsupervised clustering technique, Deep Embedded Clustering (DEC), to iteratively cluster the data into a chosen number of behavior classes. We apply the method to simulated movement data sets and to telemetry data for a golden eagle. DEC achieves better unsupervised clustering accuracy scores on the simulated data sets than the baseline K-means clustering result.
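The comparison above rests on an unsupervised clustering accuracy score. The abstract does not define the score, so the following is only a minimal sketch of one common definition: because cluster ids are arbitrary, each cluster is mapped to a distinct reference class in the way that maximizes agreement, and accuracy is computed under that best mapping. The function name and the behavior labels ("soar", "perch", "glide") are hypothetical and are not taken from the presentation.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping of clusters to classes.

    Assumption: this best-mapping definition is a common convention; the
    abstract does not say which score the authors used.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        raise ValueError("inputs must be non-empty")

    # Count how often each (cluster, class) pair occurs together.
    counts = Counter(zip(cluster_ids, true_labels))
    clusters = sorted(set(cluster_ids))
    classes = sorted(set(true_labels))

    # If there are more clusters than classes, pad with placeholders that
    # match nothing, so every cluster still gets its own distinct target.
    padding = [None] * max(0, len(clusters) - len(classes))

    best = 0
    # Brute force over mappings: fine for a handful of behavior classes.
    for assignment in permutations(classes + padding, len(clusters)):
        matched = sum(counts[(c, k)] for c, k in zip(clusters, assignment))
        best = max(best, matched)
    return best / len(true_labels)


# Hypothetical example: six telemetry segments, three behavior classes.
truth = ["soar", "soar", "perch", "perch", "glide", "glide"]
found = [2, 2, 0, 0, 1, 0]  # arbitrary cluster ids from some clustering run
print(round(clustering_accuracy(truth, found), 3))  # 0.833
```

The brute-force search grows factorially with the number of clusters. For larger problems, the same best mapping can be found with an assignment solver such as `scipy.optimize.linear_sum_assignment`.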
Speaker Bio: Natalya Rapstine is a Computer Scientist at the Advanced Research Computing group, specializing in computational data science, statistics, and machine learning applications for advancement of science at the U.S. Geological Survey. She received an M.S. in Statistics from Colorado School of Mines.
Abstract: Satellite observations provide invaluable data across different spatio-temporal scales. These data enable us to build models for applications such as land cover classification, agricultural monitoring, surface water mapping, biodiversity monitoring, among others. Meanwhile, machine learning (ML) techniques can be utilized to advance these applications, and develop faster, more efficient and scalable models. These techniques learn from training datasets that are generated from image annotation or ground reference observations. However, to develop accurate ML-based models, and be able to validate their accuracy, we need to use benchmark training datasets that are representative of the diversity of the target variable, and openly accessible to all researchers and developers.
To address this requirement, Radiant Earth Foundation has established Radiant MLHub to foster sharing of geospatial training data for different thematic applications. Radiant MLHub is hosted on the cloud and users will be able to search for different training datasets, and quickly ingest them into their pipelines using an API. To increase interoperability of training datasets generated by different institutions, Radiant MLHub has adopted the SpatioTemporal Asset Catalog (STAC) as the standard for data cataloging.
In this presentation, I will review the architecture of Radiant MLHub, its API access, and the STAC definition for training data. Next, I will present two applications of ML models: land cover (LC) classification from multi-spectral data and surface water detection from Synthetic Aperture Radar (SAR) data.
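To make the STAC cataloging idea concrete, here is a minimal sketch of what a core STAC Item describing one tile of training labels could look like, written as a plain Python dictionary. It is illustrative only: the item id, bounding box, date, and URL are hypothetical, and the exact schema that Radiant MLHub uses (including any label-specific extension fields) is not given in the abstract and is not reproduced here.

```python
# Top-level fields that the core STAC Item specification requires.
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)


def make_label_item(item_id, bbox, datetime_utc, label_href):
    """Build a minimal STAC Item (a GeoJSON Feature) for one label tile."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint polygon; the ring is closed by repeating the first point.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_utc},
        "links": [],
        # One asset pointing at the label file (hypothetical location).
        "assets": {
            "labels": {
                "href": label_href,
                "type": "application/geo+json",
                "roles": ["data"],
            },
        },
    }


def missing_item_fields(item):
    """Return the names of required core fields absent from an item."""
    missing = [name for name in REQUIRED_ITEM_FIELDS if name not in item]
    if "datetime" not in item.get("properties", {}):
        missing.append("properties.datetime")
    return missing


item = make_label_item(
    "example-tile-001",                      # hypothetical id
    (-122.5, 40.5, -122.4, 40.6),            # hypothetical bounding box
    "2020-06-01T00:00:00Z",
    "https://example.com/labels/example-tile-001.geojson",  # placeholder URL
)
print(missing_item_fields(item))  # []
```

Because every item carries the same core fields, training data produced by different institutions can be searched by location and time with the same client code, which is the interoperability benefit the abstract describes.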
Speaker bio: Hamed Alemohammad is the Chief Data Scientist at Radiant Earth Foundation, leading the development of Radiant MLHub as an open-source, cloud-native commons for Machine Learning applications using Earth Observations. He has extensive expertise in machine learning, remote sensing, and imagery techniques, particularly in developing new algorithms for multi-spectral satellite and airborne observations. He also serves as an elected member of the American Geophysical Union’s technical committee on remote sensing. Prior to joining Radiant Earth, he was a Research Scientist at Columbia University. Hamed received his PhD in Civil and Environmental Engineering from MIT in 2014.
Abstract: Landscape classification is the task of using imagery to map defined features on the landscape. As computer technology and data science methodology advance, new techniques for this problem emerge. Modern machine learning (ML) utilizing neural networks (NN) is becoming an industry-standard data science approach for a variety of applications. In particular, these methods are adept (and well understood) at the task of computer vision (CV).
However, current landscape classification necessarily exposes trade-offs between accuracy, spatial granularity, and resources required. CV offers a unique combination of speed and accuracy, while still producing feature mappings rather than simple pixel classifications. Compared to other explicit feature extractors, such as object-based image analysis (OBIA), CV can be a cost-effective and powerful methodology for obtaining features from difficult-to-classify image domains.
This application shows how an off-the-shelf Deep Neural Network (DNN) algorithm – Inception v2 – was retrained into a production classifier and applied to the problem of locating and sizing cannabis production on private lands in Trinity County. This application demonstrates the strengths and limitations for applying this method at the landscape scale.
The presentation concludes with ‘next steps’ and identifies developing technologies and architectures that mitigate some of the limitations in the current application.
Speaker bio: Daryl Van Dyke serves as the spatial analyst for the USFWS, in Science Applications and Strategic Habitat Conservation. I have an interdisciplinary background, with a focus on community and environment, as well as a second BS and an MS in Environmental Engineering. My thesis focused on using two-dimensional hydrodynamics for fish passage culvert retrofit design. As a federal servant and a programmer, I've looked at the developing technologies of LiDAR, Structure-from-Motion, and ML as pivotal to the task of landscape analysis and conservation design. Non-technical interests in federal service include integrating analytic workflows, encouraging cross-program collaboration, and building accountability and reproducible science in resource management.
Abstract: The expense and logistics of monitoring streamflow (e.g. stage and discharge) and nearshore waves (e.g. height and period) using in situ instrumentation such as current meters, bubblers, pressure transducers, etc., limit the extent to which such important basic information can be acquired. Machine learning might offer a solution, if such information can be obtained remotely from time-lapse imagery using inexpensive consumable camera installations. To that end, I describe a proof-of-concept study into designing and implementing a single deep learning framework that can be used for both stream gaging and wave gauging from appropriate time-series of imagery. I show that it is possible to train the framework to estimate 1) stage and/or discharge from oblique imagery of streams at USGS gaging stations, using existing time-lapse camera infrastructure; and 2) nearshore wave height and period from oblique and rectified imagery from USGS Argus systems. This proof-of-concept technique is based on deep convolutional neural networks (CNNs), which are deep learning models for regression tasks based on automated image feature extraction. The stream/wave gauge model framework consists of an existing generic CNN model to extract features from imagery - called a ‘base model’ - with additional layers to distill the feature information into lower dimensional spaces and prevent overfitting, and a final layer of dense neurons to predict continuously varying quantities. Given enough training data, the model can generalize well to a site despite variation in, for example, lighting, weather, snow cover, vegetation, and any transient objects in the scene. This development might offer the potential to train models for imagery at sites based on short deployments of in situ instrumentation, which is especially useful for sites where instrumentation is difficult or expensive to maintain for long periods.
This entirely data-driven technique, at least for now, must be trained separately for each site and quantity, so would be suitable for very long-term, site-specific estimation of wave or hydraulic parameters from stationary camera installations, subsequent to a training period. Further development might promote low-cost (or even hobbyist) hydrodynamic and hydraulic monitoring anywhere.
Jeff Falgout presented on Tallgrass, a new machine for AI at the USGS
Ken Bagstad presented on AI for integrated environmental modeling & forecasting (+ overview of AI for Ecosystem Services)
JC pointed out some activity on the AI/ML forum and encouraged members to post
Group leads reminded members to contribute to a spreadsheet for collecting USGS AI/ML project descriptions to communicate to USGS leadership
John Stock talked about opportunities at the USGS Innovation Center related to AI/ML, including postdoctoral positions
Pete Doucette gave a presentation “Ruminations on AI and Land Imaging,” covering some background to artificial intelligence and machine learning, relevant Landsat and Analysis Ready Data activities at the USGS, and the importance of team science
See February CDI collaboration area blog post that summarizes the call
Group leads asked members to contribute to a spreadsheet for collecting USGS AI/ML project descriptions to communicate to USGS leadership
Introduction to the group by Tim Quinn, Chief, Office of Enterprise Information
Introduction to the wiki space and AI/ML forum, JC Nelson
Comments from attendees, including mention of dl_tools toolbox for deep learning, an output from a recent CDI funded project