The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.
Meeting recordings are available to CDI Members approximately 24 hours after the completion of the meeting. Please log in to view the recording. If you would like to become a member of CDI, email email@example.com.
During the call, you can ask and up-vote questions at slido.com, event code #K263.
11:00a Welcome and Announcements - Leslie Hsu, CDI Coordinator, and Kevin Gallagher, Associate Director for Core Science Systems, CDI_20181010_OpeningSlides.pdf
11:15a Collaboration Area Announcements
11:20a Building a SpatioTemporal Feature Registry (preview), Sky Bristol, USGS, Scientist's Challenge Post
11:25a Assembling a National Scale Map of Landslide Inventories from Incomplete and Disparate Spatial Data, Benjamin Mirus, USGS
11:55a A Cost Effective Approach to Scientific Data Storage and Management: BlackPearl and Globus, Matt Davis, Jeff Falgout, Drew Ignizio, USGS
Assembling a National Scale Map of Landslide Inventories from Incomplete and Disparate Spatial Data
Understanding spatial patterns is fundamental to Earth sciences and risk assessments, but spatial data are often collected at local scales, in disparate formats, and within specific jurisdictional boundaries. We encountered this issue when compiling a national-scale inventory of landslide occurrence across the U.S. into a searchable, web-based map for use by the public, researchers, and emergency planners. One critical challenge was determining which attribute fields were significant enough to include at a national scale. Another was establishing a fair and balanced schema for evaluating landslide confidence and inventory completeness across very different landslide inventories. A further challenge was finding a sustainable approach for ingesting data and maintaining the repository, so that current landslide occurrence information remains accessible over the long term. In this presentation I will outline our process and present some of the successful solutions, as well as some of the more ad hoc fixes.
Bio: Ben Mirus started his USGS career in 2005 as an intern in Menlo Park, CA, while he was a graduate student at Stanford. After earning his PhD in hydrogeology, Ben continued on as a postdoc with the USGS in the Unsaturated Zone Flow Project. He then served as an assistant professor in the Department of Geological Sciences at UNC Chapel Hill, but was excited to return to the USGS in 2015 as a research geologist for the Landslide Hazards Program in Golden, CO. Ben's research focuses primarily on hillslope hydrology and rainfall-triggered landslides using field monitoring and numerical modeling.
A Cost Effective Approach to Scientific Data Storage and Management: BlackPearl and Globus
Effective data management has become an increasingly difficult problem to solve. The amount and diversity of science data within the USGS keeps growing, while at the same time we look to cloud storage, improved storage infrastructure, and data center consolidation to help reduce storage and operational costs. BlackPearl and Globus can be integrated into existing data management workflows and storage resources to further reduce costs and to help fulfill the USGS data storage strategy.
Bio: Matt Davis is an HPC Systems Engineer for the U.S. Geological Survey's Advanced Research Computing group. He helps run the Yeti supercomputer, a research computing resource available to all USGS scientists.
Presentation: Slides are available to CDI Members. Please log in to download the slides. If you would like to become a member of CDI, email firstname.lastname@example.org.
Don’t forget that we are here to help you find solutions - email us questions, suggested topics, and desired trainings at email@example.com.
We’re starting to highlight monthly readings related to CDI topics - you should contribute!
Kevin Gallagher previewed the themes of the FY19 Request for Proposals:
Biosurveillance of emerging invasive species and health threats
Building National Datasets
Reusing outputs from previously CDI-funded projects
FAIR (Findable, Accessible, Interoperable, Reusable) Data
Sky Bristol was “interviewed” about the SpatioTemporal Feature Registry and shared a video link and notebook explaining the concept further.
A poll run during the call indicated that 43% of respondents “work with place data that they need to analyze in a repeatable workflow or generate reports on”, 34% do not, and 23% answered “maybe”.
Ben Mirus presented lessons learned from his FY18 CDI-funded project, “Assembling a National Scale Map of Landslide Inventories from Incomplete and Disparate Spatial Data.” Topics to explore further with CDI include identifying which other disciplines work with similarly incomplete and disparate data (for example, species occurrence), and what theory exists for quantitatively analyzing such data.
Matt Davis presented on the USGS exploration of “A Cost Effective Approach to Scientific Data Storage and Management: BlackPearl and Globus,” and confirmed that YES, these options for storing and managing large data are available to USGS researchers now (in beta). To get started, contact firstname.lastname@example.org and tell the Advanced Research Computing team about your data needs.
A Participant Report is available to CDI Members. Please log in to download the report. If you would like to become a member of CDI, email email@example.com.