The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.
Meeting Recording and Slides
These are the publicly available resources. All recordings and slides are available to CDI Members approximately 24 hours after the completion of the meeting.
Log in to view the meeting resources. If you would like to become a member of CDI, join at https://listserv.usgs.gov/mailman/listinfo/cdi-all.
During the call, you can ask and up-vote questions at slido.com, event code #CDIJUL
Agenda (in Eastern time)
11:00 am Welcome and Opening Announcements
11:15 am Collaboration Area Announcements
11:25 am Development of a flexible multi-channel spatiotemporal geophysical HDF5 data format supporting FAIR - Karl Kappler, DataCloud
11:40 am So you want to build a decision support tool? Assessing successes, pitfalls, and lessons learned for tool design and development - Amanda Stoltz
11:55 am CDI Workshop Session Report outs:
The Cloud in Action – How Centers are using Cloud Hosting Solutions for Data-intensive Workflows & Running Scientific Models - Kirstie Haynie
A fun, fast hands-on introduction to the User-centered Design Process! - Sophie Hou
USGS Shared Software Development Resources - Carl Schroedl
Integrated Modeling at the USGS – What do we need? - Leslie Hsu
USGS Cloud Hosting Solutions - Advancing 21st Century Science - Dionne Zoanni
12:30 pm Adjourn
Development of a Flexible Multi-Channel Spatiotemporal Geophysical HDF5 Data Format Supporting FAIR
USGS has a unique opportunity to collaborate with IRIS-PASSCAL (the national seismic instrument facility) to develop a geophysical data archive format that follows FAIR principles. IRIS-PASSCAL is extending its facility to include magnetotelluric (MT) instruments, which requires it to archive collected MT data by extending its existing protocol. Concurrently, Congress has mandated that the USGS collect nationwide MT data (5000 stations), all of which will need to be archived under FAIR principles. In collaboration with IRIS-PASSCAL, we propose to develop a generalized HDF5 format for archiving MT data that can easily be extended to other geophysical data in the future. This project will benefit not only USGS and IRIS-PASSCAL but also international facilities that need a comprehensive format to store geophysical data.
So you want to build a decision support tool? Assessing successes, pitfalls, and lessons learned for tool design and development
The purpose of this study is to understand how the USGS is using decision support, learning from successes and pitfalls in order to help streamline the design and development process across all levels of USGS scientific tool creation and outreach. What should researchers consider before diving into tool design and development? Our goal is to provide a synthesis of lessons learned and best practices across the spectrum of USGS decision support efforts to a) provide guidance to future efforts and b) identify knowledge gaps and opportunities for knowledge transfer and integration.
- Karl Kappler's presentation on a flexible geophysical HDF5 data format and metadata standard ended with a call for users to get involved through the Git interface. The project is scheduled for release in September.
- The CDI project covering all things decision support tool development wants the CDI community's opinion. Are there questions you have about designing decision support tools that you would like to see covered in the final report?
- Report outs on five CDI Workshop sessions included session goals and summaries, takeaways, and ways to follow up.
Welcome and Opening Announcements
- Questions from the community
- How can we get more help for transitioning science to CHS and HPC?
- Tim Quinn: Getting help means getting access to expertise. The easy answer is to hire more people, but that's not always the answer. For CHS, we always encourage you to contact the team; we can talk about concerns with cost and planning budgets, and you have to determine whether your work is a good fit for the cloud. We have a quarterly Cloud checker training so that people can understand and manage their costs in the cloud. There are also technologies available that you don't have to pay for. If you're willing to wait a little, there are options with spot pricing, which can greatly reduce expense. We've also tried to open doors for people trying to defer the cost of migrating and/or recoding their work for the cloud. Through the USGS budget office, we can use the working capital fund to pay for this; however, this fund is not available for recurring cloud costs. There's an active cloud user group that has good discussions on how to manage costs, and we have IT liaisons available for help.
- There seems to be a large disparity in access/support of tools for enabling science across centers (e.g., computing tools, Pangeo, etc.) How can this be reduced?
- Tim Quinn: With Tableau, we have a very active cloud Tableau user group. Pangeo stands out as having potential for doing the same thing, whether we build a user group as part of the cloud effort, or build a CDI Pangeo group. We're always working hard in the budget process to enhance our ability to aid cloud efforts.
Collaboration Area Announcements
- For more information on any of the collaboration areas, see https://my.usgs.gov/confluence/x/yhv1I
- Tech Stack
- Next event: participating in the ESIP Summer Meeting, July 19-23
- Next event: August 3
- Next events: August 4 and 18, User Centered Design Process - prototyping presentation and demo
- Inland/Coastal Bathymetry
- Next event: August 4, Monthly Bathymetry Research Coordination Meeting
- Semantic Web
- Next event: August 5, Continue learning how to do semantic annotation of models by planning an interview with subject matter experts
- Metadata Reviewers
- Next event: August 2
- Working on a "How-to video" for using our site (the use of Forums) and exploring the new functionality of Microsoft apps in Teams
- Imagery Data
- Next event: date TBD, session on imagery data management workflows
- Check Teams channel for more info
- Next event: July 15, monthly meeting, review of August annual meeting agenda and presentations from FY20 Risk RFP awardees, part 1
- August 17-19, Annual Meeting
- Data Management
- Next events: August 9, USGS Data Release - releasing data through domain-specific repositories
Development of a flexible multi-channel spatiotemporal geophysical HDF5 data format supporting FAIR - Karl Kappler, DataCloud
- Magnetotellurics (MT) is a passive electromagnetic method that measures the Earth's electrical response to natural variations in the magnetic field
- Useful for imaging where fluids are in the crust
- MT data types
- Transfer functions
- 10s of MB of data
- Time series
- 100s of GB of data
- Helping to get transfer functions without having to go through the time series data method
- Data are gathered and archived by several groups; no standards
- Create a metadata standard that can be integrated
- Project goals
- Develop metadata standards for MT time series data
- Develop a standard HDF5 data format
- Develop a way for people to process the time series
- mth5 - pushes out to an archive
- aurora is the time series processing aspect that generates transfer functions
- each of these repos can talk to each other
- Python ecosystem
- Git version control (issues, git actions, pull requests, etc.)
- MT metadata is basically a metadata container with validation built in; not tied to a specific format
- Two XML methods, two JSON methods; if someone wants to add YAML methods, they should be able to
- built-in help commands
- information can be input in one way but is ingested in a standardized way
- MT metadata: to-do
- Publish the standards (create schema files that can be validated against)
- Host on USGS GitLab
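As a rough illustration of the metadata-container idea in the notes above (validation built in, input accepted loosely but stored in a standardized way, not tied to one file format), here is a minimal Python sketch. The class `StationMetadata`, its fields, and its methods are hypothetical and are not the mt_metadata API; only the standard library is used.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class StationMetadata:
    """Hypothetical metadata container (illustration only, not mt_metadata)."""

    station_id: str
    latitude: float
    longitude: float
    start: str  # ISO 8601 timestamp without a timezone suffix

    def __post_init__(self):
        # Standardize loosely typed input first, then validate it.
        self.station_id = str(self.station_id).strip().upper()
        self.latitude = float(self.latitude)
        self.longitude = float(self.longitude)
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")
        # fromisoformat raises ValueError if the timestamp is malformed.
        self.start = datetime.fromisoformat(str(self.start)).isoformat()

    def to_json(self) -> str:
        """Serialize to JSON; other formats could be added the same way."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "StationMetadata":
        """Rebuild the container from JSON, re-running validation."""
        return cls(**json.loads(text))


# Input arrives as untidy strings but is stored in a standardized form.
record = StationMetadata(" mt001 ", "40.5", -105.2, "2021-07-14T11:00:00")
restored = StationMetadata.from_json(record.to_json())
print(record.station_id, restored == record)
```

Because validation runs in the constructor, every reader (JSON here, XML or YAML in principle) passes through the same checks.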
- Goal: develop a standardized container to store time series data and provide tools
- MTH5 is based on HDF5, a hierarchical data format
- data container is based on xarray, which is like NumPy arrays (a cross between a data frame and NumPy arrays)
- lazy access
- has a container for metadata
- indexed by time
- easily searchable, indexed, etc.
- To do:
- Host on USGS GitLab
- Upload to PyPI and Conda
- Extend data readers
- Extend parallel access
- Make more efficient
- Figure out an efficient way to transfer an MTH5 file over a network
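To make the hierarchical-container idea above more concrete, the following is a minimal sketch using the h5py and NumPy packages (both must be installed). The group layout `stations/<station>/<channel>`, the attribute names, and the functions `write_station` and `read_window` are assumptions for illustration and do not represent the actual MTH5 layout or API.

```python
import os
import tempfile

import h5py
import numpy as np


def write_station(path, station, channels, sample_rate, start):
    """Store each channel as a dataset under one station group."""
    with h5py.File(path, "w") as f:
        grp = f.create_group(f"stations/{station}")
        # Metadata travels with the data as HDF5 attributes.
        grp.attrs["start"] = start
        grp.attrs["sample_rate"] = sample_rate
        for name, data in channels.items():
            grp.create_dataset(name, data=data, compression="gzip")


def read_window(path, station, channel, t0, t1):
    """Read only the samples between t0 and t1 seconds after the start."""
    with h5py.File(path, "r") as f:
        grp = f[f"stations/{station}"]
        rate = float(grp.attrs["sample_rate"])
        i0, i1 = int(t0 * rate), int(t1 * rate)
        # Slicing the dataset reads just this range from disk (lazy access).
        return grp[channel][i0:i1]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.h5")
    ex = np.arange(1000, dtype="float64")  # placeholder time series
    write_station(path, "MT001", {"ex": ex}, sample_rate=8.0,
                  start="2021-07-14T11:00:00")
    print(read_window(path, "MT001", "ex", 2.0, 3.0))
```

Indexing by time here is just arithmetic on the stored sample rate; a container built on xarray would carry an explicit time coordinate instead.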
- Hope to get users involved through the Git interface
- How to make packages community driven?
- How to get the greater community (young and old) to use these packages
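So you want to build a decision support tool? Assessing successes, pitfalls, and lessons learned for tool design and development - Amanda Stoltz
- Decision support is key to the USGS mission - to give useful information to specific people for specific purposes. Development of decision support tools varies across the bureau, however.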
- Research objectives and methods
- Completed 25 interviews with USGS colleagues (scientists and developers)
- Distributed survey in USGS Need to Know newsletter (57 responses and 84 named tools)
- Working on an Open-File Report on results
- My Decision Support is not your Decision Support
- Different definitions for 'decision support tool'
- 16% said interactive software applications that structure a decision-making process
- 14% said any software application that supports user decision-making (a little more broad)
- 64% said any activity (software or other) that provides data or other types of information products to support user decision making (broadest)
- 5% said other
- Five principles of Creating Effective Decision Support
- Give priority to process over products
- A need to start by understanding user needs, to work iteratively, and to have specific metrics for success
- Requires leadership and funding to have buy-in for the process
- Build connections across disciplines and organizations
- Diverse, multi-disciplinary teams
- The coder/developer was brought in early as part of the design team
- Roles mentioned by participants: fundraiser, spokesperson, social scientist, usability specialist, designer, client services manager
- User engagement
- Define the audience for the tool and engage with them
- Having a good sense of the audience, the need the tool is fulfilling, and the reason for that need
- Need to manage user expectations - timing is important (reaching out to contacts and not following up for a year)
- "A tool is successful if people use it"
- Need an idea of what the reach should/will be
- The way the user sees the tool, and what they use it for, may not be what the tool was developed for
- Decision support tools are living things - maintenance, funding, and staff time
- We sometimes build tools that are too complex to keep running
- What resources do interviewees say they need?
- Funding models
- Staffing models
- Champions with an understanding of what developing these tools requires
- Help with client services
- Questions for CDI
- Are there questions you have about designing decision support tools that you would like to see covered in the final report?
- How can we make the final product most useful to you?
Answers from audience
"decision support", "decision support tools", and "decision support systems" have important and sometimes subtle distinctions. Could discuss briefly in report?
It would be interesting to have information about the amount of time it can take, or cost associated, with developing and maintaining different DSS types.
To answer your questions: I like to see a best practices check list that would help developers as they begin development.
Would also be great to see, for effective USGS tool, what is the structure of the team working on that tool?
For some of us who will never have the funds to build such a successful tool, what would be the alternative to building a tool that we know will just fade away?
as for making this entire product useful perhaps creating a resource on the CDI website as a reference in addition to the Openfile.
Did you try to split examples into different types (eg web portals, optimization, etc)?
CDI Workshop Session Report Outs:
- The Cloud in Action – How Centers are using Cloud Hosting Solutions for Data-intensive Workflows & Running Scientific Models - Kirstie Haynie
- Session goal: share ways people are using Cloud and challenges
- The Cloud is the way to do science!
- Cloud skills are not the same as our day in day out science skills and can dramatically affect processes. Barriers exist but often are knowable and can be planned for.
- There is a lot of diversity in the subject matter of these projects. It is encouraging that similar resources can be applied to a wide variety of scientific applications. Sharing our knowledge is necessary to succeed in any project!
- A fun, fast hands-on introduction to the User-centered Design Process! - Sophie Hou
- Session goal: find a way to share key content from the Usability workshop with CDI
- Talked about what a usability advocate is, and what they do
- Had a quick activity to try usability techniques for each stage of the user-centered design process
- Being a usability advocate is something we can all do, no matter our role/expertise
- Important to iterate and grow - create opportunities to learn more about our users
- USGS Shared Software Development Resources - Carl Schroedl
- Session goal: identify common problems and solutions
- There are gaps in shared software development resources. Gaps produce duplicate labor, hosting, and licensing costs
- Community has defined and prioritized new enterprise software development resources
- If you're interested in starting and sustaining enterprise software development resources, sign up to follow up on Shared Software Development Resources (USGS only): https://doimspp.sharepoint.com/:x:/s/usgs-GS-CDI-Workshop/EQPnYEUrR39NifXq0T-HRc0BRjs3Qqx89MYaHn6EuIDoFw?e=DVJkI0 or contact Carl
- Integrated Modeling at the USGS – What do we need? - Leslie Hsu
- Session goal: help USGS model catalog team understand examples of integrated modeling
- Wide diversity of scientific modeling (and modelers) - need to develop a shared language so that modelers can communicate
- More discussion of coupling methods and shared challenges
- Organized efforts of integrated science teams will be needed for integrated modeling
- To join the Model Catalog effort, sign up at https://listserv.usgs.gov/mailman/listinfo/cdi-models or contact project coordinator, Leslie Hsu, email@example.com.
- USGS Cloud Hosting Solutions - Advancing 21st Century Science - Dionne Zoanni
- Session goal: providing users the opportunity to learn more about CHS and services
- CHS is developing an ecosystem of services to support 21st Century Science in the Cloud
- The goal of CHS's new and upcoming services is to abstract the intricacies of Cloud away from users, so they can spend more time focusing on their research and less time on developing new technologies
- Whether you're new to Cloud or more experienced, CHS is here to help and can work with you to figure out what service(s) fit your needs
- Encourage you to get involved with the CHS user communities
When a standard is created, how is it communicated to users?
- Karl Kappler: We're hoping to get a workshop going next year in order to reach out to people. It is a major challenge. We have a lot of users on different continents using different instruments. Not sure how we will get the community to embrace the standard. Creating examples, doing outreach, etc. seems the best start. If one or two of the big players, or USGS, institutionalized the standard, the user base would grow.
Where should people go for more info on the project and MTH5? Is there a link we can share?
- Karl Kappler: Scheduled for release in September.
Are the time series processing methods applicable to other types of time series data, or specific to MT data?
- Karl Kappler: We went to great pains to make the time series processing not specific to magnetotelluric data. It uses standard statistical regression, which is generic. Spectral processing needs some tweaking but is mostly agnostic to the specific data case. As the project evolves, some things will become MT specific, but the base can be pushed to other analyses.
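As a generic illustration of how regression can turn two time series into a transfer function, here is a minimal NumPy sketch of a per-frequency least-squares estimate. It is a textbook approach, not the method used by the aurora package; the function name `estimate_transfer_function`, the window length, and the synthetic data are assumptions.

```python
import numpy as np


def estimate_transfer_function(x, y, window):
    """Least-squares estimate of Z(f) in Y(f) = Z(f) X(f), one value per frequency."""
    n_windows = len(x) // window
    taper = np.hanning(window)
    # Split each series into non-overlapping windows, taper, and transform.
    xw = np.asarray(x)[: n_windows * window].reshape(n_windows, window)
    yw = np.asarray(y)[: n_windows * window].reshape(n_windows, window)
    X = np.fft.rfft(xw * taper, axis=1)
    Y = np.fft.rfft(yw * taper, axis=1)
    # Ordinary least squares across windows at each frequency:
    # Z = sum(conj(X) * Y) / sum(|X|^2)
    return (np.conj(X) * Y).sum(axis=0) / (np.abs(X) ** 2).sum(axis=0)


# Synthetic example: the output is twice the input plus a little noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = 2.0 * x + 0.01 * rng.standard_normal(4096)
z = estimate_transfer_function(x, y, window=256)
print(np.abs(z).mean())
```

Nothing in the estimator depends on what the input and output channels represent, which is the sense in which such processing can stay agnostic to the data type.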
Assuming that the 5 in HDF5 is version 5, how frequently is there an update of the format, is that a concern?
Wow - are you planning to have some sort of checklist, with resources, for these lessons?
- Nicole Herman-Mercer: We could if it is of interest. Planning on outlining best practices/principles in the report.
- Amanda Cravens: Think a checklist might be possible, but would be questions to ask. There's a lot of variability across the bureau as to what kinds of tools we're developing, etc.
Where does web analytics fit into the usability element?
- Amanda Cravens: Web analytics relates, but one thing I would flag is the difference between what we are able to do with web analytics and what someone is able to do with targeted, sophisticated web analytics. Government is different from industry.
Did you get examples of "great" (well-used) decision support tools, that could be shown as models?
- Amanda Stoltz: In this report, we're going to showcase a couple of the tools that have taken these principles to heart. For each principle, we will have an example of a USGS tool that achieved the principle.
It would be interesting to have information about the amount of time it can take, or cost associated, with developing and maintaining different DSS types.
- Amanda Cravens: We can try for more specific information in the report. Some of the tools that made the biggest impact (multiple years, multiple millions spent) - good to set expectations of what it took to build these success stories.
- Research to Operations: How early in the basic research development process should the design of decision support tools be considered? Ideas about the evolution of decision support tools from initial prototypes used in early phase research towards more developed tools to be used for decision making would be helpful in your report.
- Great suggestion.
Additional sli.do comments for decision support tool topic:
- as for making this entire product useful perhaps creating a resource on the CDI website as a reference in addition to the Openfile.
- Thanks for this suggestion. We will think about how to bring the findings and principles to some kind of "living resource" on the CDI website as well as the OFR.
- "decision support", "decision support tools", and "decision support systems" have important and sometimes subtle distinctions. Could discuss briefly in report?
- Absolutely! All these terms (and more) are used throughout USGS. In the report we discuss how these definitions vary across USGS employees and mission areas.
- Did you try to split examples into different types (eg web portals, optimization, etc)?
- The USGS decision support tools we learned about during this research included web portals, interactive models, infographics and more. The report covers a broad array of decision support tools, and while the report will include several examples of successful tools, it does not split the tools by type.
- For some of us who will never have the funds to build such a successful tool, what would be the alternative to building a tool that we know will just fade away?
- Yes, the resources are a real consideration. One thing our data suggests is doing informal (e.g., through conversations) or formal stakeholder engagement (e.g., with social science data collection and analysis methods) upfront to make sure the tool addresses a need and that there aren't existing tools that already meet the need. It might be that with compelling evidence of a need or gap, you could find or apply for additional resources. A second suggestion is to take a phased approach. Maybe you don't have funds to successfully build the full "bells and whistles" version but you could build something that meets a significant portion of user needs. A third suggestion might be to identify partners (possibly agency, NGO, or even private) who want to partner to build the tool part while USGS provides the data or even to leave the tool creation to others. This is the approach that has been used in some Landsat data cases (Landsat User Case Studies – Dive into Details (usgs.gov)) and by the State of Colorado developing decision support for water (Colorado’s Decision Support Systems – Open Water Foundation)
- To answer your questions: I like to see a best practices check list that would help developers as they begin development.
- Thank you for this suggestion. We will be including questions to consider before building decision support tools, but it will not be developer focused and is designed to help the entire decision support tool design team. The need for more resources for developers has been identified and will be included in a section of the report detailing needed resources.
- Would also be great to see, for effective USGS tool, what is the structure of the team working on that tool?
- Thank you for this idea! We will be including a section on the importance of interdisciplinary teams in the report but have found that the structure of these teams varied a lot from tool to tool.