I attended the "Advancing FAIR and Go FAIR in the U.S." workshop in February; the workshop covered how to establish and promote FAIR culture and capabilities within a community. Many of the discussions were synergistic with CDI activities, so I wanted to share some key points from the workshop with the CDI community. - Sophie Hou
(Logo from the Go FAIR Initiative)
Title: Advancing FAIR and Go FAIR in the U.S.
Date: February 24th to 27th, 2020
Location: Atlanta, Georgia
CDI's February meeting featured a discussion on the value of CDI to you, and a deep dive into Pangeo.
Rich Signell, a Research Oceanographer at the Coastal and Marine Science Center in Woods Hole and member of the Pangeo Steering Council, presented an overview of Pangeo and examples of Pangeo in use across several different types of USGS workflows. The Pangeo framework is deployed by Cloud Hosting Solutions (CHS) and funded by EarthMAP as a new form of cloud-based model data analysis. Community-driven, flexible, and collaborative, Pangeo is steadily building out a set of tools with a common philosophy. In one example, Rich used a Pangeo Jupyter Notebook to process a dataset in one minute that had previously taken two weeks. Cloud costs, skills, cloud-optimized data, and Pangeo development are issues that are currently being addressed.
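The kind of speedup Rich described comes from splitting a large dataset into chunks and reducing them in parallel, which is what the xarray and dask tools in the Pangeo stack do over cloud object storage. The sketch below illustrates only the chunking idea in plain Python; it is not Pangeo code, and the chunk size and function names are invented for illustration.

```python
# Illustrative sketch of chunked, parallel reduction (the idea behind
# Pangeo's xarray + dask speedups) -- not actual Pangeo code.
from concurrent.futures import ThreadPoolExecutor

def chunk_mean(chunk):
    """Reduce one chunk independently; chunks could live on different workers."""
    return sum(chunk) / len(chunk)

def chunked_mean(data, chunk_size=1000):
    """Split data into chunks, reduce each in parallel, then combine."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(chunk_mean, chunks))
    # Combine partial results, weighting each chunk mean by its length.
    sizes = [len(c) for c in chunks]
    return sum(m * n for m, n in zip(partial, sizes)) / sum(sizes)

print(chunked_mean(list(range(100)), chunk_size=10))  # -> 49.5, same as a serial mean
```

The per-chunk reductions are independent, so a scheduler like dask can spread them across many cloud workers; that is where the two-weeks-to-one-minute improvement comes from.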
Renee Pieschke, a Technical Specialist for the Technical Services Support Contract at the Earth Resources Observation and Science Center in Sioux Falls, SD, continued our Pangeo focus with some information on Landsat in the cloud. Renee and her team are anticipating a spring release of Collection 2 data, which will greatly increase the amount of data available. Collection 2 data will require Level-2 processing, which removes disturbances such as clouds to get as close as possible to what the ground actually looks like.
The Landsat Look upgrade uses a cloud-native infrastructure and the cloud-optimized GeoTIFF format, and it uses the new SpatioTemporal Asset Catalog metadata to programmatically access the data. The new Landsat Look can filter pixels with a QA band so that clouds, shadows, snow, ice, and water are removed to produce the best possible image.
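A QA band typically encodes per-pixel conditions as flag bits, so screening pixels amounts to a bitmask test. The sketch below shows that pattern in plain Python; the bit positions and names here are hypothetical for illustration and do not reflect the actual Landsat QA band layout.

```python
# Sketch of QA-band pixel screening. Bit positions below are hypothetical;
# consult the Landsat QA band documentation for the real bit layout.
CLOUD, SHADOW, SNOW, WATER = 1 << 0, 1 << 1, 1 << 2, 1 << 3

def is_clear(qa_value, mask=CLOUD | SHADOW | SNOW | WATER):
    """A pixel is usable only if none of the masked flag bits are set."""
    return qa_value & mask == 0

def screen(pixels, qa_band):
    """Keep only pixel values whose QA flags are all clear."""
    return [p for p, qa in zip(pixels, qa_band) if is_clear(qa)]

pixels = [0.12, 0.55, 0.09, 0.31]
qa     = [0,    CLOUD, 0,    SNOW | WATER]
print(screen(pixels, qa))  # -> [0.12, 0.09]
```

Because each flag occupies its own bit, a single bitwise AND tests all unwanted conditions at once, which is why QA-band filtering scales to full scenes.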
The SpatioTemporal Asset Catalog was developed to standardize metadata across the entire geospatial data provider community using a simple JSON structure. It normalizes common names, simplifies the development of third-party applications, and helps enable querying in Pangeo. Another in-progress goal is connecting with Landsat data in the cloud; getting Landsat data into the cloud involves converting it to the cloud-optimized GeoTIFF format, and this kind of data is already fueling the backend of Landsat Look.
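Because STAC describes each scene as a small JSON "item" with normalized field names, any client can filter a catalog the same way. The sketch below uses simplified items whose field names follow the STAC item structure (`id`, `properties`, `assets`), but the scene IDs, values, and hrefs are invented for illustration.

```python
import json

# Simplified STAC-style items; field names follow the STAC item structure,
# but the IDs, dates, cloud-cover values, and hrefs are invented.
catalog = json.loads("""
[
  {"id": "scene-001",
   "properties": {"datetime": "2020-02-01T00:00:00Z", "eo:cloud_cover": 5},
   "assets": {"data": {"href": "https://example.com/scene-001.tif"}}},
  {"id": "scene-002",
   "properties": {"datetime": "2020-02-15T00:00:00Z", "eo:cloud_cover": 80},
   "assets": {"data": {"href": "https://example.com/scene-002.tif"}}}
]
""")

def search(items, max_cloud_cover):
    """Normalized names mean every client can filter a catalog the same way."""
    return [item["assets"]["data"]["href"]
            for item in items
            if item["properties"]["eo:cloud_cover"] <= max_cloud_cover]

print(search(catalog, max_cloud_cover=10))  # -> ['https://example.com/scene-001.tif']
```

This is the property that "simplifies the development of third-party applications": a tool written against one STAC catalog works against any other.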
A poll was administered on sli.do to participants to see what the value of CDI is to them. Some responses are below.
"I like to hear about (and share) the cool work folks are doing throughout the USGS! The Communities are valuable because they allow folks to share innovative research and discuss ways we can do so while following Department, Bureau, Mission Area policy."
"CDI provides relevant, useful, and timely data management related issues, projects, and tools."
"I learn about new technology applications and learn of colleagues I might collaborate with."
"The CDI helps me to get my work done in my daily job! I find the people who are part of the CDI are amazing to interact with - they are engaged, enthusiastic, and interested in making things better at USGS. CDI has made me feel like I am more in touch with the USGS - there is so much going on in this Bureau, and CDI keeps me informed and makes me feel like I am part of something bigger than just my daily job."
"Demonstrate that best practices in data sci/software/etc. is important to colleagues."
"Diverse community, wide range of experience and expertise."
More information on the meeting, including notes, links, slides, and video recordings, is available here.
January's monthly meeting covered how to evaluate web applications and better understand how they are working for users, and explored well-established strategies for USGS crowdsourcing, citizen science, and prize competition projects.
Nicole Herman-Mercer, a social scientist in the Decision Support Branch of the Water Resources Mission Area's Integrated Information Dissemination Division, presented on how to evaluate web applications based on use, value, impact, and reach, as defined below.
Definition of use: Take, hold, view, and/or deploy the data/application as a means of accomplishing or achieving something.
Herman-Mercer used Google Analytics to answer some of these questions. Google Analytics provided information such as total daily visits, visits through time, what pages users are visiting and how they're getting there (links from another website, search, or direct visits), how often they're visiting, how many repeat visits occur, and how long users spend on individual pages.
Definition of value: The importance, worth, and/or usefulness of the application to the user(s).
To estimate the value of selected applications to users, an electronic survey was sent to internal water enterprise staff, which asked respondents to indicate which applications they used for work, and then to answer a series of questions about those applications. Questions attempted to pinpoint how important applications were to users, and how affected their work would be should the application be decommissioned.
Definition of impact: The effect the application has on science, policy, or emergency management.
Publish or Perish software for text mining was used to get at some of these data points. Publish or Perish searches a variety of sources (Google Scholar, Scopus, Web of Science, etc.) and returns any citations that applications are getting. Attempts to search for policy document citations have proven more difficult, and policy citations were not factored into this evaluation as a result.
Definition of reach: How broadly the application reaches across the country and into society.
Google Analytics was again used to gather visits by state, which was then compared with the state population to get an idea of use. These analytics could also identify which networks users are on, i.e., .usgs, .gov, or .edu. Finally, an expert survey was deployed, surveying users who developed the application or currently manage it to get a sense of who the experts think the intended and actual audience is.
Contact Nicole at firstname.lastname@example.org for a detailed report on the full evaluation.
Herman-Mercer's team was inspired by Landsat Imagery Use Case studies.
Sophia Liu, an Innovation Specialist at the USGS Science and Decisions Center in Reston, VA, as well as the USGS Crowdsourcing and Citizen Science Coordinator and Co-Chair of the Federal Community of Practice for Crowdsourcing and Citizen Science, presented an overview of well-established USGS crowdsourcing, citizen science, and prize competition projects.
Citizen science, crowdsourcing, and competitions are all considered by Liu to be types of open innovation. Definitions of these terms are as follows:
A popular example of citizen science/crowdsourcing is citizen seismology or public reports of earthquakes, like Did You Feel It?
Liu has documented about 44 USGS crowdsourcing and citizen science projects, and 19 USGS prize competitions. Some examples of open innovation projects and information sources are listed here:
Participants during the presentation were asked to use the following Mentimeter poll to answer short questions and provide feedback on the talk.
Sophia is looking for representatives from across all USGS mission areas, regions, and science support offices interested in giving feedback on the guidance, catalog, toolkit, and policies she is developing for the USGS Open Innovation Strategy. Feedback can be provided by joining the USGS Open Innovation Strategy Teams Site or emailing her at email@example.com.
See the recording and slides at the meeting page.
You'll probably want to join a new collaboration area after reading all of this exciting news. You can do that by following the instructions on this wiki page. You can also get to all CDI Collaboration Area wiki pages here. We recently added quick-link buttons for meeting content to most collaboration area wiki pages.
The group had a conversation about the need for persistent unique identifiers in USGS metadata records that could be used across different government systems, including usgs.gov and data.gov. Lisa Zolly presented some slides to frame the conversation, and takeaways are on the Metadata Reviewers Meetings Page.
February's DevOps meeting was like a Valentine's Day-themed love letter to the partnership between Dev and Ops. (Yes, that is a subjective opinion.) Andy Stauffer (Dev) and Robert Djurasaj (DevOps) combined to present on "Automating the Deployment of a National Geospatial Map Production Platform Using DevOps Workflows." A dedicated DevOps team was critical for scaling up workflow and infrastructure for the Map Production On Demand (POD) system. See the recording at the DevOps Meeting page.
Slide from the DevOps February meeting presentation.
The Data Management working group meeting used an interactive virtual format for small-group discussion about developing data management value propositions. Science Gateways Community Institute superstars Claire Stirm and Juliana Casavan presented the essence of value propositions: a clear understanding of the unique value your project delivers to your users or stakeholders. Virtual breakout groups held discussions and came up with many answers to "Why is CDI important for data managers?" Claire and Juliana's tips on value propositions included being succinct and developing different value propositions for different audiences. See the slides and the value propositions at the meeting notes page!
A general formula for developing a value proposition statement.
Mike Johnson from UC Santa Barbara presented on the Urban Flooding Open Knowledge Network. This is an exciting stakeholder-driven knowledge network project with emphasis on prototyping interfaces and web resources. See other joint CDI Tech Stack and ESIP IT&I webinars on the ESIP page.
Sophie Hou presented on "Choosing Usability Techniques." The process starts with establishing the context: what are you trying to learn and why? What are the gaps? What information do we need to learn and why? Next you should decide on the types of data needed: attitudinal vs. behavioral, and qualitative vs. quantitative. Thank you to Sophie for continuing to build our knowledge about how to improve usability in our projects! See the slides and the notes on the Usability meeting page.
Slide from the Usability group's February presentation showing the differences between different types of usability data.
The Risk CoP hosted guest Scott Miles from Impact360 to kick off a special training series. Scott's presentation was loaded with information!
From the Risk Meetings page: This was the kickoff meeting for a series of training webinars provided by Impact360 Alliance on human-centered design thinking and inclusive problem solving. We met the Impact360 team and were introduced to the foundations and terminology of Impact360's toolkit, Toolkit360. Toolkit360 is a rigorous, intentional process to collectively amplify researcher and practitioner superpowers to integrate knowledge and action, unlike doing it “the way we’ve always done it” or working in silos. Toolkit360 fuses processes and methods from human-centered design, frame innovation, and systems theory (Panarchy). The Toolkit360 process uses 12 tools to bridge the problem and solution spaces with situation assessment, stakeholder alignment, problem framing, and prototyping. Future training webinars during the Risk CoP's March and April monthly meetings will take deeper dives into these tools.
Slide from the Risk Community of Practice's February meeting.
Sheree Watson presented on Using Open Innovation to Engage Youth and Underserved Communities in the Pacific Islands. Sophia B. Liu followed with a discussion of why Open Innovation matters and how to participate in the USGS Open Innovation Strategy. See more ways to get involved in USGS Open Innovation at the Open Innovation wiki site!
Peter Claggett and Labeeb Ahmed presented on Mapping Channels & Floodplains: Hyper-Res Hydrography and FACET. FACET stands for Floodplain and Channel Evaluation Tool and more can be found at https://code.usgs.gov/water/facet.
Matt Baker (University of Maryland, Baltimore County) presented on 'Hyper'-resolution Geomorphic Hydrography: Methods, advantages, and shifting paradigm.
I always try to make myself look like I've been around for a long time with references to "do you remember when?" and I will do that now. I remember when the first LIDAR images started showing up at conferences and presenters would first show a standard (at the time) 30-meter DEM and then flip to the next slide that was LIDAR 1-meter resolution, and the whole audience would gasp in astonishment. Then this was followed by people saying "Well, none of the old tools work on LIDAR data, we have to build a whole bunch of new tools to analyze this higher resolution data." And that is what I thought of when watching these presentations. We've come a long way!
Access the recording at the Geomorphology Tools meeting page.
Slide from the Hyper-Resolution Geomorphic Hydrography presentation.
Jeremy Fee presented on Swagger and Micronaut. Jim Kreft showed the example of the Water Quality Data Portal at https://www.waterqualitydata.us/.
Learn more at:
See the recording at the Software Dev meeting page!
Jim Kreft demonstrated some of the details of the National Water Quality Data Portal.
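The Water Quality Data Portal that Jim demonstrated is queried through REST-style web services. The sketch below only assembles a query URL without sending a request; the endpoint path and parameter names reflect my understanding of the portal's web-services documentation, so verify them against https://www.waterqualitydata.us/ before relying on them.

```python
from urllib.parse import urlencode

# Sketch of building a Water Quality Portal query URL (no request is sent).
# Endpoint and parameter names are assumptions based on the WQP web-services
# documentation; verify against https://www.waterqualitydata.us/.
BASE = "https://www.waterqualitydata.us/data/Station/search"

def wqp_url(**params):
    """Assemble a query URL; the portal would return matching stations."""
    return BASE + "?" + urlencode(sorted(params.items()))

url = wqp_url(statecode="US:55", characteristicName="pH", mimeType="csv")
print(url)
```

A client would then fetch that URL with any HTTP library and parse the returned CSV, which is what makes the portal easy to script against.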
Thanks to our collaboration area group leads, who organized topics and speakers! Lightsom, Frances L. , Masaki, Derek , Hughes, David R. , Langseth, Madison Lee , Hutchison, Vivian B. , Blodgett, David L. , Signell, Richard P. , Unknown User (firstname.lastname@example.org) , Ludwig, Kristin A. , Emily Brooks , Ramsey, David W. , Liu, Sophia , Ladino, Cassandra C. , Guy, Michelle , Newson, Jeremy K.