September 9, 2020: Structured data on the web for earth science and CDI Pop-Up Lab
The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.
The Microsoft Teams call information is emailed to the CDI mailing list. If you would like to become a member of CDI, join at https://listserv.usgs.gov/mailman/listinfo/cdi-all.
Agenda (in Eastern time)
11:00 am Welcome and Opening Announcements, CDI Request for Proposals - Leslie Hsu and Tim Quinn
11:10 am Collaboration Area Announcements
11:20 am Structured data on the web for earth science: emerging best practices and remaining challenges - Dave Blodgett, USGS
11:50 am CDI Pop-Up Lab
Meeting the CDI Community - Mining Expertise Keywords - Sky Bristol, USGS
Highlights from the South East Region Science Meeting - Greg Steyer, USGS
Machine learning, satellite imagery, and tile drains - Tanja Williamson, USGS
12:30 pm Adjourn
Structured data on the web for earth science: emerging best practices and remaining challenges
TL;DR: This month, Dave Blodgett will provide an update on progress from two OGC Interoperability Experiments, topics of interest from the CDI/ESIP IT&I webinar series, and the Water Mission Area National Hydrologic Geospatial Fabric project.
The landscape of spatiotemporal information we observe and predict exists as a spectrum from continuous fields to discrete objects. With this as context, the evolving technology upon which we base our information systems presents tremendous opportunities to understand and provide insights about our data. Simultaneously, the evolving technical basis presents challenges to the long-term stability of our IT systems and the reproducibility of our science. This presentation will explore recent developments in concepts, architectures, and technologies that help characterize the real world, define a stable and cohesive architecture, and seek to increase the impact of earth science data in society. It will describe specific activities on these themes, focusing on hydrologic science and concepts as an integrator of environmental systems and data, with an eye toward EarthMap and an even broader Integrated Hydro-terrestrial Modeling community.
Dave Blodgett is an EDGE Civil Engineer in the Water Resources Mission Area Geospatial Intelligence Branch. He takes part in numerous community groups including the OGC Architecture Board, the ESIP Program Committee, and the Unidata Strategic Advisory Committee. His active research and development includes developing interoperable, modular workflow tools for hydrology, applying consistent conceptual models to diverse technical challenges, and advancing adoption of structured data for environmental feature and observation data.
- CDI's Request for Proposals is now open. Visit the FY21 RFP page for more.
- Projects build USGS capabilities in data integration and management. CDI projects are short-term projects that leverage existing resources, demonstrate scalable solutions, improve access to data and tools, develop best practices, share lessons, and more.
- Encourages interdisciplinary collaboration and wide communication.
- Fire science and coastal resilience are the themes for the FY21 RFPs. Proposals from all areas are accepted.
- Collaboration Areas
- Data Management working group
- Next event: Monday, September 14
- Tech Stack
- Next event: Thursday, September 10
- Software Development Cluster
- Next event: Thursday, September 24
- Next event: Wednesday, September 16 (resource review)
- Next event: Thursday, September 17
- October 15 - FY21 RFP Information Session
- Open Innovation
- Next event: September webinar TBD
- Next event: Tuesday, October 13
- Structured data on the web for earth science: emerging best practices and remaining challenges - Dave Blodgett, USGS
- Observation and feature
- Observation - act of measuring or otherwise determining the value of a property
- Feature - an abstraction of a real-world phenomenon
- Second Environment Linked Features Interoperability Experiment
- Use case: as a web user, I want to find all the information available for an environmental feature, so I can find what I'm looking for and retrieve it.
- Four functions to satisfy linked environmental feature data
- Landing page/content
- Structured data
- Resource model vs content model
- Resource is identified by URL
- Content model is more like a data model or information schema. Information about the kind of information you have in a document.
- See graph for more.
- Tech Stack themes
- Science Data gateways
- Pangeo, HydroShare, and more
- Formalization of bringing processing and content management together. Provides pathway for scientists to work in these environments as a community.
- Linked data and knowledge networks
- Linked data - messy, drawn from a myriad of sources; relies on data authors/content creators for curation
- Knowledge networks - more centralized, can query them rapidly, curated or less messy
- Discrete global grids & dataset indexing
- ex: integrating datasets, allowing better probability calculations, etc
- Google Dataset Search
- National Hydrologic Geospatial Fabric Project
- Elements include hydroinformatics, reference fabric, enterprise systems, tools development, river corridors, landscape characteristics, and hydrogeology
- Research to operations and data to knowledge
- Future work
- Linked data, knowledge networks, and the Google index "space"
- Leveraging the Google index using linked data and knowledge networks
- Persistent identification of places with multiple representations
- Negotiation of multiple content profiles from the same resource
- A minimal set of representative domain ontologies for knowledge management
- Please review EDR!
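As a rough illustration of the "resource model vs content model" and "negotiation of multiple content profiles" points above, the Python sketch below serves one feature, identified by a single URL, in two forms: an HTML landing page with embedded structured data, and a JSON-LD document. It is a minimal sketch under stated assumptions, not code from the interoperability experiment or any USGS system. The URL, the feature name and coordinates, and the function names are all hypothetical; only the schema.org terms (`Place`, `GeoCoordinates`, `Dataset`, `subjectOf`) are real vocabulary.

```python
import json
from typing import Tuple

# Hypothetical identifier for an environmental feature; not a real USGS URL.
FEATURE_URL = "https://example.org/id/stream-gage/12345"


def feature_jsonld(url: str, name: str, lat: float, lon: float) -> dict:
    """Build a schema.org description of a feature as a JSON-LD document."""
    return {
        "@context": "https://schema.org/",
        "@id": url,
        "@type": "Place",
        "name": name,
        "geo": {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon},
        # Link to other information about the same feature (illustrative only).
        "subjectOf": [{"@type": "Dataset", "url": url + "/observations"}],
    }


def landing_page(doc: dict) -> str:
    """Embed the structured data in an HTML landing page for web crawlers."""
    body = json.dumps(doc, indent=2)
    return (
        "<html><head>"
        f'<script type="application/ld+json">{body}</script>'
        f"</head><body><h1>{doc['name']}</h1></body></html>"
    )


def negotiate(accept: str, doc: dict) -> Tuple[str, str]:
    """Return (content type, body) for the same resource based on Accept."""
    if "application/ld+json" in accept:
        return "application/ld+json", json.dumps(doc)
    # Default to the human-readable landing page.
    return "text/html", landing_page(doc)


if __name__ == "__main__":
    doc = feature_jsonld(FEATURE_URL, "Example stream gage", 43.0, -89.4)
    content_type, body = negotiate("application/ld+json", doc)
    print(content_type)
    print(body)
```

The resource (the URL) stays the same in both cases; only the content returned for it changes. A production service would parse the `Accept` header properly, including quality values, rather than using a substring check.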
- CDI Pop-Up Lab
- Meeting the CDI Community - Mining Expertise Keywords - Sky Bristol, USGS
- Created a Python package, pylincmd, so that corporate master data can be linked
- Wrote code to scrape staff profile inventories into a data structure for examining individual pages
- Written as a web scraper
- Work so far examines what is out there
- Also looking at additional vocabulary sources
- Notebook records comments and future directions
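The keyword-mining approach described above can be sketched with the Python standard library. This is only an illustration and does not reflect the actual pylincmd code or the real layout of USGS staff profile pages: the `<ul class="keywords">` markup, the class and function names, and the sample pages are all assumptions made for the example.

```python
from collections import Counter
from html.parser import HTMLParser


class KeywordParser(HTMLParser):
    """Collect the text of <li> items inside a <ul class="keywords"> block."""

    def __init__(self):
        super().__init__()
        self.in_block = False  # currently inside the keywords list
        self.in_item = False   # currently inside one <li> of that list
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and "keywords" in classes:
            self.in_block = True
        elif tag == "li" and self.in_block:
            self.in_item = True
            self.keywords.append("")

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_block = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        # Text inside nested tags (for example a link) is kept as well.
        if self.in_item:
            self.keywords[-1] += data


def extract_keywords(html):
    """Return normalized keywords found in one profile page."""
    parser = KeywordParser()
    parser.feed(html)
    return [k.strip().lower() for k in parser.keywords if k.strip()]


def keyword_counts(pages):
    """Count how many profiles mention each keyword (pages: name -> HTML)."""
    counts = Counter()
    for html in pages.values():
        counts.update(set(extract_keywords(html)))
    return counts


if __name__ == "__main__":
    sample = {
        "person-a": '<ul class="keywords"><li>Hydrology</li><li>GIS</li></ul>',
        "person-b": '<ul class="keywords"><li>hydrology</li></ul>',
    }
    print(keyword_counts(sample).most_common())
```

Restricting the parser to the keywords block, rather than the whole page, mirrors the answer given in the Q&A that expertise terms were taken from the dedicated keywords section of each profile.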
- Highlights from the South East Region Science Meeting - Greg Steyer, USGS
- Questionnaire on capacities that exist within USGS
- Lightning talks pointed to various examples of how we can integrate across system components for large datasets.
- Example of technology examined: cloud-based frameworks for camera-based streamflow measurements
- Additional slides available on this page soon.
- SER Science Workshop: https://doimspp.sharepoint.com/sites/gs-ser-fy20scienceworkshop
- Machine learning, satellite imagery, and tile drains - Tanja Williamson, USGS
- Tile drains are present in every state in the U.S. They move water out of fields so that people can farm larger areas. This water then moves into the stream network, and some tile drains carry sediments and pesticides into the stream system.
- Data gap inferred in flooding and water quality models.
- The need is to identify and catalog tile drains visible in satellite images. This can be done now, but the goal is to scale up.
- Focusing on linear features and drilling down to be more specific. The subsurface features can be seen in the satellite images.
- Still getting a little background noise (usually farming structures). Want to focus on the tile drains instead of the background.
Questions & Answers
- Dave Blodgett
- What is a good next step for the CDI community to look at future work problems or learn about these things in more detail?
- Engage as a group in review and evaluation activities, rallying around these concepts and representing them with more force than we could individually.
- For Dave: one thing we've discussed is data being findable. Is there an effort to go through ScienceBase to address this?
- I've not engaged with the ScienceBase team on the structured data front specifically. Just glancing at the Google "rich results" for a ScienceBase item, they are pretty good.
- Sky Bristol
- How did you search expertise terms? In the entire page, or in a specific part of the page?
- Under the section specifically for keywords, on the left of the page.
- I'd recommend joining the Configuration Management Committee for activities related to our business information systems.
- Greg Steyer
- What is the follow-up for this workshop?
- Outcomes from presentations and discussions are synthesized on the page, in the post-workshop summary folder. Trying to gather the needs from the community for applying solutions.