
Peter Esselman from the Great Lakes Science Center presents on Frontiers in collection and delivery of Lakes ecosystem data.

You can reach Peter at


  1. Suggest you check out Rolling Deck to Repository (R2R). This is the way UNOLS handles shipboard data acquisition, processing, QA/QC, and dissemination.

    Also, QARTOD has done a lot of work on real-time QA/QC of oceanographic-type data. It has been wrapped into NOAA/IOOS, and I believe the PORTS program uses a lot of the concepts.
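To make the real-time QA/QC idea concrete, here is a minimal sketch of one QARTOD-style check, the gross range test, which flags each incoming value against a sensor span and an optional, narrower operator-defined span. The flag values follow the QARTOD flag scheme (1 pass, 2 not evaluated, 3 suspect, 4 fail, 9 missing). The function name, parameter names, and the temperature thresholds in the usage note are assumptions made for illustration; they are not taken from any QARTOD manual or from an existing library.

```python
import math
from typing import Optional, Tuple

# QARTOD flag scheme
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9

Span = Tuple[float, float]


def gross_range_flag(
    value: Optional[float],
    sensor_span: Span,
    user_span: Optional[Span] = None,
) -> int:
    """Return a QARTOD-style flag for one observation.

    Hypothetical helper for illustration:
    - MISSING if the value is None or NaN
    - FAIL if the value lies outside what the sensor can physically report
    - SUSPECT if it lies outside the operator's expected range
    - PASS otherwise
    """
    if value is None or math.isnan(value):
        return MISSING
    sensor_min, sensor_max = sensor_span
    if value < sensor_min or value > sensor_max:
        return FAIL
    if user_span is not None:
        user_min, user_max = user_span
        if value < user_min or value > user_max:
            return SUSPECT
    return PASS
```

For example, with an assumed sensor span of (-5, 45) and an assumed operator span of (0, 30) for water temperature in degrees Celsius, the readings 4.2, 31.0, -7.5, and a missing value would be flagged 1, 3, 4, and 9 respectively. A real deployment would combine several such tests (for example spike and rate-of-change tests) and carry the flags alongside the data rather than dropping values.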

    Hope some of this is helpful. These are not really within USGS, but they are becoming "industry standard" ways of handling real-time data. Feel free to contact me if you have questions.

    1. Hi all, 

      Just a note that the March 2017 DataONE webinar happens to be about R2R. DataONE also posts webinar recordings if you can't make the date.

      From Rolling Deck to Repository (R2R): Lessons Learned in Managing Data for the US Research Fleet

      Tuesday, March 14, 2017
      Time: 9 am Pacific / 10 am Mountain / 11 am Central / 12 noon Eastern

      R2R is the NSF-supported repository for environmental sensor data routinely acquired by U.S. oceanographic research vessels. The research fleet supports hundreds of expeditions around the world each year, ranging from oceans to coasts and estuaries to the Great Lakes. R2R works with an extensive network of partner repositories to link original field data from both sensors and samples, post-field products, global syntheses, and journal articles.

      Bob Arko is a research engineer at Lamont-Doherty Earth Observatory in Palisades, New York. He is Co-PI and Technical Director of the R2R program.

  2. Hi Peter, unfortunately I missed most of your presentation. Thanks for posting your slides. Are you familiar with the recent paper by Soranno and others about the LAGOS database? It describes an agency and academic collaboration to build a cohesive inventory of lake data. Really neat work presented in their paper.



  3. Thanks, Shad!

    I went to look up the paper, and here's a link for others to check out:

    Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse

    Soranno, P.A. et al., 2015, GigaScience

    ...Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km2). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.


  4. I wasn't able to attend the last CDI meeting to hear this challenge presented, but I offer the following toward your second question on the role of Core Science Systems. The document that most thoroughly describes the CSS role as a Mission Area in USGS is the strategy document published a few years ago. In that paper, we described a number of things that should support the type of system that integrates in situ and remote sensing data toward "improved landscape science." Part of how we do that is through development and implementation of standards and standard approaches.

    One example that helps get at question 3 here can be found in the Common Framework for Earth Observation Data (attached for convenience).


    This is something a number of us worked on as part of the US Group on Earth Observations, Data Management Working Group. It lays out a set of acceptable standards and approaches to the management and distribution of earth observation data of all kinds from satellite data to the results of surveys and ground-based monitoring networks. When it comes to how individual data streams are packaged and served, standards from the Open Geospatial Consortium and others can help make things more interoperable. These approaches have been employed by many different groups across USGS and the other earth system science organizations who contributed to the framework.

    At a more detailed level, and since the slides seem to indicate an interest in biological data, you might also check out this recent paper describing a new approach to integrating biological and environmental variables using the Darwin Core standard. It describes an approach that focuses on the sampling/observation event (potentially hierarchical) and then the collection of observations and measurements made within that event. A number of approaches along these lines derive from the ISO Observations & Measurements standard, which has one of its more robust implementations from the OGC.
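As a rough illustration of the event-centered pattern described above, the sketch below models sampling events that can nest (for example cruise, station, net haul) and attaches measurements to each event. The field names loosely mirror Darwin Core terms (eventID, parentEventID, measurementType, measurementValue, measurementUnit, occurrenceID), but the classes, the helper function, and the sample values are hypothetical and do not reproduce the schema from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Measurement:
    measurement_type: str                # cf. Darwin Core measurementType
    measurement_value: float             # cf. measurementValue
    measurement_unit: str                # cf. measurementUnit
    occurrence_id: Optional[str] = None  # set when tied to an organism record


@dataclass
class Event:
    event_id: str                          # cf. eventID
    parent_event_id: Optional[str] = None  # cf. parentEventID
    measurements: List[Measurement] = field(default_factory=list)


def measurements_in_hierarchy(events: Dict[str, Event], root_id: str) -> List[Measurement]:
    """Collect measurements from an event and all of its descendant events."""
    # Index children by parent so the hierarchy can be walked top-down.
    children: Dict[Optional[str], List[str]] = {}
    for ev in events.values():
        children.setdefault(ev.parent_event_id, []).append(ev.event_id)

    collected: List[Measurement] = []
    stack = [root_id]
    while stack:
        event_id = stack.pop()
        collected.extend(events[event_id].measurements)
        stack.extend(children.get(event_id, []))
    return collected


# Hypothetical example: one cruise, one station, one net haul.
events = {
    "cruise-1": Event("cruise-1"),
    "station-A": Event("station-A", "cruise-1",
                       [Measurement("water temperature", 11.3, "degC")]),
    "haul-A1": Event("haul-A1", "station-A",
                     [Measurement("total length", 142.0, "mm", occurrence_id="fish-001"),
                      Measurement("tow depth", 35.0, "m")]),
}

all_for_cruise = measurements_in_hierarchy(events, "cruise-1")
```

Keeping environmental measurements (water temperature) and biological ones (fish length) in the same structure, keyed to the event where they were taken, is what lets the two kinds of data be queried together later.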

    As with anything in data technologies, there are any number of different ways to get at a particular solution. There is no single answer or simple plug-and-play solution to the problem this presentation poses. A lot depends on the capability, capacity, and propensities of the implementers. However, there are a number of basic principles and design patterns that can help those pursuits be successful in a shorter rather than longer period of time. The Common Framework document tried to point out some of those at a high level and put things in a venue designed to help guide investment priorities at Federal agencies.

    Depending on what connections Peter or others have to university groups interested in working on this kind of thing, there is a current funding opportunity open from the NOAA IOOS Program that does cover the Great Lakes and could provide a way to pursue the vision depicted in the slides. USGS can't receive money directly on that FFO, but if you have university collaborators that would be interested, we might be able to get some interesting work done. Proposals can be up to $800K/year for up to three years but are due March 20, so you'd have to hurry.

  5. Here are a few more comments, compiled from several people, on your question "What roles does CSS (Core Science Systems Mission Area) have in facilitating delivery of such data?"

    • CDI is facilitated under CSS and is trying to offer a forum and other opportunities for solving the data dilemma posed here (e.g., working groups and the annual RFP for work on issues such as these).
    • CSS has tools (Online Metadata Editor, Metadata Wizard, DOI Tool, etc.) and a team (ScienceBase Data Release) available to help organize the data, with the assistance of the researchers involved, and at least a couple of tools (ScienceBase and Science Data Catalog) that could help distribute it. So CSS has at least some of the technology needed to serve as one possible access point, potentially among many.
    • CSS is one group in USGS, along with OEI, OSQI, and BAOs, that may have input in facilitating data delivery here.

    Let me know if you have further follow-up questions on anything in this response!