
CDI Monthly Meeting - 20181010

The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.


Meeting Recording

Meeting recordings are available to CDI Members approximately 24 hours after the completion of the meeting. Please log in to view the recording. If you would like to become a member of CDI, email cdi@usgs.gov.

During the call, you can ask and up-vote questions at slido.com, event code #K263.

Agenda (in Eastern time)

11:00a Welcome and Announcements - Leslie Hsu, CDI Coordinator, and Kevin Gallagher, Associate Director for Core Science Systems, CDI_20181010_OpeningSlides.pdf

11:15a Collaboration Area Announcements

11:20a Building a SpatioTemporal Feature Registry (preview), Sky Bristol, USGS, Scientist's Challenge Post

11:25a Assembling a National Scale Map of Landslide Inventories from Incomplete and Disparate Spatial Data, Benjamin Mirus, USGS

11:55a A Cost Effective Approach to Scientific Data Storage and Management: BlackPearl and Globus, Matt Davis, Jeff Falgout, Drew Ignizio, USGS

12:30p Adjourn

Abstracts

Assembling a National Scale Map of Landslide Inventories from Incomplete and Disparate Spatial Data 

Understanding spatial patterns is fundamental to Earth sciences and risk assessments, but spatial data are often collected at local scales, in disparate formats, and within specific jurisdictional boundaries. We encountered this issue when compiling a national-scale inventory of landslide occurrence across the U.S. into a searchable, web-based map for use by the public, researchers, and emergency planners. One critical challenge was determining which attribute fields were significant enough to include at a national scale, and how to establish a fair and balanced schema for evaluating landslide confidence and inventory completeness across very different landslide inventories. Another challenge was determining a sustainable approach for ingesting data and maintaining the repository for long-term access to current landslide occurrence information. In this presentation I will outline our process and present some of the successful solutions, as well as some of the more ad hoc fixes.

Bio: Ben Mirus started his USGS career in 2005 as an intern in Menlo Park, CA, while he was a graduate student at Stanford. After earning his PhD in hydrogeology, Ben continued on as a postdoc with the USGS in the Unsaturated Zone Flow Project. He then served as an assistant professor in the Department of Geological Sciences at UNC Chapel Hill, but was excited to return to the USGS in 2015 as a research geologist for the Landslide Hazards Program in Golden, CO. Ben's research focuses primarily on hillslope hydrology and rainfall-triggered landslides using field monitoring and numerical modeling. 


A Cost Effective Approach to Scientific Data Storage and Management: BlackPearl and Globus

Effective management of data has become an increasingly difficult problem to solve. The amount and diversity of science data within the USGS is ever increasing, while at the same time we look towards cloud storage, improved storage infrastructure, and data center consolidation to help reduce storage and operational costs. BlackPearl and Globus can be integrated into existing data management workflows and storage resources in order to further reduce costs, and to help fulfill the data storage strategy of the USGS.

Bio: Matt Davis is an HPC Systems Engineer for the U.S. Geological Survey's Advanced Research Computing group. He helps run the Yeti supercomputer, a research computing resource available to all USGS scientists.


Presentations

Presentation: Slides are available to CDI Members. Please log in to download the slides. If you would like to become a member of CDI, email cdi@usgs.gov.

Highlights

  1. Don’t forget that we are here to help you find solutions - email us questions, suggested topics, and desired trainings at cdi@usgs.gov.

  2. We’re starting to highlight monthly readings related to CDI topics - you should contribute!

    1. September 2018 Reading

    2. October 2018 Reading

  3. Kevin Gallagher previewed the themes of the FY19 Request for Proposals:

    1. Biosurveillance of emerging invasive species and health threats

    2. Building National Datasets

    3. Reusing previous CDI-funded outputs

    4. FAIR (Findable, Accessible, Interoperable, Reusable) Data

  4. Sky Bristol was “interviewed” about the SpatioTemporal Feature Registry and shared a video link and notebook explaining the concept further.

  5. A poll run during the call indicated that 43% of respondents “work with place data that they need to analyze in a repeatable workflow or generate reports on”, 34% do not, and 23% answered “maybe.”

  6. Ben Mirus presented some lessons learned from his FY18 CDI-funded project in “Assembling a National Scale Map of Landslide Inventories from Incomplete and Disparate Spatial Data.” Topics to explore further with CDI include identifying which other types of disciplinary data are similarly incomplete and disparate (for example, species occurrence), and what theory exists for quantitatively analyzing such data.

  7. Matt Davis presented “A Cost Effective Approach to Scientific Data Storage and Management: BlackPearl and Globus,” describing how USGS is exploring these tools, and let us know that YES, these options for storing and managing large data are available to USGS researchers now (in beta). To get started, contact hpc@usgs.gov and tell the Advanced Research Computing team about your data needs.

Attendees

A Participant Report is available to CDI Members. Please log in to download the report. If you would like to become a member of CDI, email cdi@usgs.gov.