The Community for Data Integration (CDI) meetings are held on the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.
https://usgs.webex.com/ - Under the Meeting Center tabs, search for meeting name: "Community for Data Integration."
USGS/DOI Dial In Number: (703) 648-4848 (for USGS and DOI offices)
Toll Free Dial In Number: (855) 547-8255 (for other offices and telecommute locations)
Conference Code: 47919# (same for both numbers)
Webex recordings are available to CDI Members approximately 24 hours after the completion of the meeting. Please log in to view the recording. If you would like to become a member of CDI, please email email@example.com.
11:10a Welcome - Kevin Gallagher, Associate Director for Core Science Systems, and Tim Quinn, Chief of the Office of Enterprise Information [PDF]
11:15a Working Group Announcements [PDF]
11:25a Interagency Collaborative on Environmental Modeling and Monitoring (ICEMM) - Brenda Rashleigh, EPA
Abstract: The Interagency Collaborative on Environmental Modeling and Monitoring (ICEMM) is a group of six federal agencies under a Memorandum of Understanding to continue and strengthen a framework for facilitating cooperation and coordination on environmental modeling and monitoring. ICEMM focuses on exchanging information related to multimedia environmental modeling tools and supporting scientific information for environmental risk assessments, joint efforts to improve the scientific basis for implementing multimedia environmental models, protocols for establishing linkages between disparate databases and models, and development and use of a common model-data framework.
Bio: Brenda Rashleigh is the Assistant Laboratory Director for Water in the Environmental Protection Agency’s National Health and Environmental Effects Laboratory, and serves as the Chair of ICEMM and co-Chair of the ICEMM Working Group on Ecosystem Functions and Services. Her research interests include understanding effects of multiple stressors on aquatic systems and simulating dynamics of fish metacommunities in riverine networks.
Presentation: Slides are available to CDI Members. Please log in to download the slides. If you would like to become a member of CDI, please email firstname.lastname@example.org.
11:40a Facilitating Reproducibility of Scientific Findings through Access to Data, Code, and Research Objects - Victoria Stodden, University of Illinois at Urbana-Champaign
Abstract: This discussion will parse notions of reproducibility and distinguish actionable steps to enable greater transparency and verifiability in research and the scholarly record. I will present three types of reproducibility (empirical, computational, and statistical; see http://bulletin.imstat.org/2013/11/resolving-irreproducibility-in-empirical-and-computational-research/ ) and then focus on enabling the routine verification of computational findings.
Bio: Victoria is an associate professor in the School of Information Sciences at the University of Illinois at Urbana-Champaign, with affiliate appointments in the School of Law, the Department of Computer Science, the Department of Statistics, the Coordinated Science Laboratory, and the National Center for Supercomputing Applications.
Sky Bristol's Introduction: mp3
Victoria's Presentation [PDF]
Supplemental Slides [PDF]
Scientist’s Challenge – Peter Esselman (Great Lakes SC)
Tom Kalvelage from EROS mentioned that Peter should feel free to contact him to talk about work at EROS
Welcome - Tim Quinn – chief of OEI
Sylvia Burns – strategic initiatives for 2017, discussing managing data; CDI was mentioned a number of times
CDI 2017 Workshop – draft agenda available
Virtual participation can be accommodated – send a note to CDI if interested in participating virtually
Brenda Rashleigh – Presentation
Viv Hutchison: the group is sharing different models created in the agencies; are there stories where the models were used by agencies that didn’t create them?
Brenda: we do see a lot of crossover, e.g. Army Corps of Engineers models, USGS, etc.
Victoria Stodden – Presentation
Madison: standards for documenting code – can you describe some of the work that’s being done in that area?
Victoria: this discussion is just starting to happen
Thinking about this can be extremely daunting; it is only really over the last 6 months that people have been ready to think about documentation, particularly of software
Discussion is only just starting; give it 12 months and you’ll see some writing about this and about documentation
Viv: the discussion about transparency involves storing large datasets. Are there discussions about storage, and about solving this problem?
Victoria: This comes up in the cyberinfrastructure community. There are intermediary datasets that are just as large as the originals. We haven’t thought about whether those are valuable, or how to store these objects. It may be that the original dataset is enough, together with the scripts used. Is there a version/snapshot of the dataset needed to verify the results in a publication? How do we modify datasets and store information in a way that we’re not just making new copies? Think about things like versioning and compression; these came up in discussion. Agencies should instigate new programs and pilot studies, in this case for storage/compression needs. There are no silver bullets, but there are a lot of discussions in the community.
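Note: one way to picture the "snapshot without just making new copies" point is content-addressed storage, where each dataset version is stored compressed under a hash of its contents, so an unchanged version is never written twice. This was not presented in the meeting; the sketch below is only an illustration, and the names `store_snapshot` and `load_snapshot` and the flat directory layout are assumptions.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def store_snapshot(data: bytes, store_dir: Path) -> str:
    """Save a compressed snapshot named by its SHA-256 digest (hypothetical helper).

    Identical content always maps to the same file name, so storing an
    unchanged dataset again does not create a second copy.
    """
    digest = hashlib.sha256(data).hexdigest()
    target = store_dir / digest
    if not target.exists():
        target.write_bytes(zlib.compress(data))
    return digest


def load_snapshot(digest: str, store_dir: Path) -> bytes:
    """Load a snapshot and confirm its content still matches its identifier."""
    data = zlib.decompress((store_dir / digest).read_bytes())
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("snapshot content does not match its identifier")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        first = store_snapshot(b"station,flow\nA,1.5\n", store)
        again = store_snapshot(b"station,flow\nA,1.5\n", store)
        # Same content gives the same identifier and a single stored file.
        print(first == again, len(list(store.iterdir())))
```

A publication could then cite the digest of the exact snapshot it used. Storing only the differences between large versions is a separate problem that this sketch does not address.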
Michelle: Has there been support, or people from institutions or academia, providing that perspective?
Victoria: Probably the majority of people in these workshops are from academia. There are people coming from agencies and publishing groups. A lot of this is driven by researchers in academia. Take a snapshot of the tools that are being made available, e.g. workflow capture tools: the vast majority were created by academics, and this work isn’t funded in a deliberate way. There’s a lot of engagement from many academic fields. Coordination is important between people working on similar problems.
Madison: You mentioned a slide with different tools – can it be made available to the CDI community? (See the Supplemental Slides link above.)
Victoria: Will include the slide
Ellen Montgomery from Woods Hole: thinking about versioning of data and code is very important, since the code could have changed. It would be cool if there was an independent agency for reproducibility checking
Victoria: There is a bonus slide on grant set-asides: money earmarked for a 3rd party to do a reproducibility check, e.g. confirming that this code and this data generated these results. This could cement an industry: 3rd-party groups, e.g. within a publishing house, would start developing reproducibility tools. They would develop them together with the research community, which would help develop standards for reproducibility. Something like 3rd-party groups taking snapshots of code and data and doing a reproducibility check. Just an idea at this point.
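Note: the "this code and this data generated these results" check can be pictured as comparing fingerprints. The sketch below is illustrative only and was not part of the presentation; the function names and manifest keys are assumptions. It records SHA-256 digests of the code, data, and published results, and a checker later compares them against what was received and what a rerun produced. Requiring byte-identical results is a deliberately strict assumption; numerical results may need a tolerance-based comparison instead.

```python
import hashlib
from typing import Dict, List


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(content).hexdigest()


def make_manifest(code: bytes, data: bytes, results: bytes) -> Dict[str, str]:
    """Record fingerprints of the code, data, and published results (hypothetical format)."""
    return {
        "code": sha256_hex(code),
        "data": sha256_hex(data),
        "results": sha256_hex(results),
    }


def check_reproduction(
    manifest: Dict[str, str], code: bytes, data: bytes, rerun_results: bytes
) -> List[str]:
    """Return the names of items that differ from the manifest; an empty list means a match."""
    current = make_manifest(code, data, rerun_results)
    return [name for name in ("code", "data", "results") if current[name] != manifest[name]]


if __name__ == "__main__":
    published = make_manifest(b"print(1 + 1)", b"1,1", b"2")
    # A third party reruns the same code on the same data and compares.
    print(check_reproduction(published, b"print(1 + 1)", b"1,1", b"2"))
    # If the code changed after publication, the mismatch is reported by name.
    print(check_reproduction(published, b"print(1 + 2)", b"1,1", b"3"))
```

A non-empty list tells the checker which piece changed, which speaks to the versioning concern raised in the question: a result that fails to reproduce can be traced to altered code, altered data, or both.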
Michelle: How can we at CDI get more involved – actionable next steps?
Victoria: I'm not the right person to answer this because I don’t have previous experience with CDI. Working groups can be very effective. The recommendations that came out of the Science publication are very high level; one approach is to look at them and think about how they can be applied in an actionable way. That might be one way to think about moving forward.
Madison: Will be very valuable for us as we’re working on data release and software release policies.
Next meeting: will be announcing funded projects for FY2017
A WebEx Participant Report is available to CDI Members. Please log in to download the report. If you would like to become a member of CDI, please email email@example.com.