
CDI Monthly Meeting - 20170208

The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.

WebEx:
https://usgs.webex.com/
Under the Meeting Center tab, search for the meeting name: "Community for Data Integration."

Audio:
USGS/DOI Dial In Number: (703) 648-4848 (for USGS and DOI offices)
Toll Free Dial In Number: (855) 547-8255 (for other offices and telecommute locations)
Conference Code: 47919# (same for both numbers)

Webex Recording

Webex recordings are available to CDI Members approximately 24 hours after the completion of the meeting. Please log in to view the recording. If you would like to become a member of CDI, please email cdi@usgs.gov.

 

Agenda (in Eastern time)

11:00a Scientist's Challenge, Peter Esselman, USGS Great Lakes Science Center

11:10a Welcome - Kevin Gallagher - Associate Director for Core Science Systems and Tim Quinn - Office of Enterprise Information Chief [PDF]

11:15a Working Group Announcements [PDF]

11:25a Interagency Collaborative on Environmental Modeling and Monitoring (ICEMM) - Brenda Rashleigh, EPA

Abstract: The Interagency Collaborative on Environmental Modeling and Monitoring (ICEMM) is a group of six federal agencies under a Memorandum of Understanding to continue and strengthen a framework for facilitating cooperation and coordination on environmental modeling and monitoring. ICEMM focuses on exchanging information related to multimedia environmental modeling tools and supporting scientific information for environmental risk assessments, joint efforts to improve the scientific basis for implementing multimedia environmental models, protocols for establishing linkages between disparate databases and models, and development and use of a common model-data framework.

Bio: Brenda Rashleigh is the Assistant Laboratory Director for Water in the Environmental Protection Agency’s National Health and Environmental Effects Laboratory, and serves as the Chair of ICEMM and co-Chair of the ICEMM Working Group on Ecosystem Functions and Services. Her research interests include understanding effects of multiple stressors on aquatic systems and simulating dynamics of fish metacommunities in riverine networks.


Presentation: Slides are available to CDI Members. Please log in to download the slides. If you would like to become a member of CDI, please email cdi@usgs.gov.

 

11:40a Facilitating Reproducibility of Scientific Findings through Access to Data, Code, and Research Objects - Victoria Stodden, University of Illinois at Urbana-Champaign

Abstract: This discussion will parse notions of reproducibility and distinguish actionable steps to enable greater transparency and verifiability in research and the scholarly record. I will present three types of reproducibility (empirical, computational, and statistical; see http://bulletin.imstat.org/2013/11/resolving-irreproducibility-in-empirical-and-computational-research/) and then focus on enabling the routine verification of computational findings.

Bio: Victoria is an associate professor in the School of Information Sciences at the University of Illinois at Urbana-Champaign, with affiliate appointments in the School of Law, the Department of Computer Science, the Department of Statistics, the Coordinated Science Laboratory, and the National Center for Supercomputing Applications.

 

Sky Bristol's Introduction: mp3

Victoria's Presentation [PDF]

Supplemental Slides [PDF]

12:30p Adjourn

Meeting Notes and Q/A

Scientist’s Challenge – Peter Esselman (Great Lakes SC)

  • Traditionally, sampling along transect lines, single points
  • Trying to transition to more advanced technologies
  • Aquatic remote sensing, near-real-time delivery of information products
  • Integrating data sources in a way that can lead to rapid delivery of science products
  • Will increase capability to collect data: spatial coverage and data density
  • Searched for models for how remote sensing data could be collected and delivered
  • Looking for guidance, examples, partnerships to move forward with vision
  • Something similar to the Landsat model

Tom Kalvelage from EROS mentioned that Peter should feel free to contact him to talk about work at EROS

 

Welcome - Tim Quinn – chief of OEI

Sylvia Burns – strategic initiatives for 2017, discussing managing data, CDI was mentioned a number of times

CDI 2017 Workshop – draft agenda available

Virtual participation can be accommodated – send a note to CDI if interested in participating virtually

 

Brenda Rashleigh – Presentation

Website wiki: all presentations from the Dec. meeting are posted there

Questions

Viv Hutchison: the group is sharing different models created in the agencies, are there stories where the models were used by agencies that didn’t create them?

Brenda: we do see a lot of crossover, e.g. Army Corps of Engineers models, USGS, etc.

 

Victoria Stodden

Questions

Madison: standards for documenting code – can you describe some of the work that’s being done in that area?

Victoria: this discussion is just starting to happen

Thinking about this can be extremely daunting; it is only really over the last 6 months that people have been ready to think about documentation, particularly for software

Discussion is only just starting, give it 12 months, then you’ll see some writing about this and documentation

Viv: The discussion about transparency involves storing large datasets. Are there discussions about storage and about solving this problem?

Victoria: This is a topic in the cyberinfrastructure community. There are intermediary datasets that are just as large. We haven’t thought about whether that’s something that’s valuable, or how to store these objects. It may be the case that the original dataset is enough, together with the scripts used. Is there a version/snapshot of the dataset needed to verify the results in a publication? How do we modify datasets and store info in a way that we’re not just making new copies? Think about things like versioning and compression. This came up in discussion. Agencies should instigate new programs and pilot studies, in this case for storage/compression needs. No silver bullets, but there are a lot of discussions in the community.

Michelle: Has there been support from, or participation by, people from institutions and academia in providing that perspective?

Victoria: Probably the majority of people in these workshops are from academia. There are people coming from agencies and publishing groups. A lot of this is driven by researchers in academia. Take a snapshot of the tools that are being made available, e.g. workflow capture tools: the vast majority were created by academics, and this work isn’t funded in a deliberate way. There’s a lot of engagement from many academic fields. Coordination is important between people working on similar problems.

Madison: You mentioned a slide with different tools – can it be made available to the CDI community? (see Supplemental Slides link above)

Victoria: Will include slide

Ellen Montgomery from Woods Hole: thinking about versioning of data and code is very important, since the code could have changed. It would be cool if there was an independent agency for reproducibility checking

Victoria: there is a bonus slide. Grant set-asides. Money earmarked to 3rd party to do reproducibility check. E.g. this code and this data generated these results. Cement this industry – 3rd party groups, e.g., within publishing house, would start developing reproducibility tools. Would develop together with research community, would help develop standards for reproducibility. Something like 3rd party groups taking snapshots of code and data, do reproducibility check. Just an idea at this point.
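
The idea of a third party confirming that "this code and this data generated these results" can be illustrated with a minimal sketch. It assumes the simplest possible approach: fingerprinting the code, data, and results with SHA-256 checksums and comparing them after a re-run. The function names (sha256_hex, make_snapshot, check_reproduction) are hypothetical and were not part of the presentation; this is not a tool described by the speaker.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def make_snapshot(code: bytes, data: bytes, results: bytes) -> dict:
    """Record fingerprints of the code, data, and results behind a publication."""
    return {
        "code": sha256_hex(code),
        "data": sha256_hex(data),
        "results": sha256_hex(results),
    }


def check_reproduction(snapshot: dict, code: bytes, data: bytes, results: bytes) -> dict:
    """Compare a re-run against a stored snapshot, item by item."""
    current = make_snapshot(code, data, results)
    return {name: current[name] == snapshot[name] for name in snapshot}


if __name__ == "__main__":
    # Hypothetical inputs standing in for real script, dataset, and output files.
    published = make_snapshot(b"print(sum(values))", b"1,2,3", b"6")
    print(check_reproduction(published, b"print(sum(values))", b"1,2,3", b"6"))
    print(check_reproduction(published, b"print(sum(values))", b"1,2,4", b"7"))
```

Byte-for-byte comparison of results is a simplification: numerical output can legitimately differ in the last digits across platforms, so a real check would likely need a tolerance-based comparison for the results while keeping exact checksums for the code and data.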

Michelle: How can we at CDI get more involved – actionable next steps?

Victoria: I'm not the right person to answer this because I don’t have previous experience with CDI. Working groups can be very effective. Taking some of these recommendations that came out of the Science publication, they’re very high level, looking at it and thinking how it can apply in an actionable way. That might be one way to think about moving forward.

Madison: Will be very valuable for us as we’re working on data release and software release policies.

 

Next meeting: will be announcing funded projects for FY2017

Attendees

A WebEx Participant Report is available to CDI Members. Please log in to download the report. If you would like to become a member of CDI, please email cdi@usgs.gov.