CDI's May monthly meeting included updates on CDI projects focusing on FAIR data, a grassland productivity forecast, and animal movement visualization.
For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki.
Fran Lightsom presented on the process of building a roadmap for making USGS data FAIR. FAIR stands for Findable, Accessible, Interoperable and Reusable and has become a popular way for organizations to improve the value and usefulness of data products.
To begin building a roadmap for FAIR data, the project team conducted a survey of data producers, collected use cases of projects that integrate data, hosted a workshop on September 9th-11th, 2019, and drafted a report and a list of recommendations. The workshop produced about 100 discrete recommendations, with 14 being deemed essential, 38 important, and 44 useful.
Some broad thoughts that came out of the workshop included the assertion that open science requires extending FAIR beyond data to samples, methods, software, and tools, a less-explored application of FAIR. Implementing the recommendations would be the responsibility of many groups, and would require input from representatives of these groups. There may be a place for CDI to step in and coordinate in the future, as this effort continues.
Further objectives coming out of this effort include increasing use of globally unique persistent identifiers (especially with physical samples and software), developing policy, researching best practices, creating support tools, enabling creation of digital products that are interoperable and usable by making use of existing standards, and improving interoperability through coordinated creation of shared vocabularies and ontologies.
An opportunity for CDI to view and provide feedback for the FAIR roadmap is upcoming.
Grass-Cast is a CDI-funded project focused on producing near-term forecasts of grassland productivity for the U.S. Southwest. The goal of the project is to bring together different kinds of data in order to provide upcoming growing season forecasts, updated every 2 weeks. This work started in the Great Plains to provide information about seasonal outlooks to ranchers.
So, why are grasslands important? Grasslands provide critical ecosystem services. They are one of the largest single providers of agro-ecological services in the U.S., and they supply important habitat and food for wildlife. Grassland productivity helps to determine fire regimes and how much carbon moves from the atmosphere into the grass and soil. Dust reduction and associated air quality problems can also be considered from a grassland productivity perspective.
Near-term productivity forecasts for grasslands can provide information to stakeholders on cattle stocking rates, where and how to allocate resources towards fire management, and rates of carbon sequestration. Grasslands are notably responsive to subtle changes in the environment and climate, and thus, they vary from year to year, making productivity predictions difficult.
The diagram above outlines the process that informs Grass-Cast for the Great Plains, but the project team wants to expand to include the Southwest region. The Southwest region differs from the Great Plains in that it does not have the same homogeneous coverage of grasses, meaning that bare ground is often exposed, complicating the interpretation of remotely sensed data. The Southwest also has a more varied mix of vegetation types, including cacti and shrubs, which needs to be differentiated from grass cover.
The Grass-Cast team aimed to keep the overarching process used for the Great Plains but adjust the methods so that Grass-Cast works effectively in the Southwest. First, the team looked at different satellite indices for estimating grassland productivity, in the hope that they might better address the challenges of the Southwest. They found that the previously used NDVI (normalized difference vegetation index) greenness index worked well in many places in the Southwest, but not as well in others. These results supported trying newer remote sensing approaches that don't rely on a greenness index, such as SIF (solar-induced fluorescence). SIF is a different way of looking at plant activity that uses plant physiology to monitor how electrons are moving through the photosynthetic chain. The Southwest differs from the Great Plains in that, in its dry environment, plants can be green but not very active, which makes the relationship between greenness and productivity harder to pin down. Additionally, many Southwestern grasslands have two growing seasons, spring and summer, which poses a temporal challenge. Other remote sensing methods examined here were NIRv (near-infrared reflectance of vegetation), a greenness index that homes in specifically on the green parts of remotely sensed pixels, and SATVI (Soil-Adjusted Total Vegetation Index), which takes into account soil brightness.
The team compared results from these different indices using eddy covariance data, and found that neither SIF nor NDVI provided good results. However, NIRv and SATVI did a good job of predicting grassland productivity for the Southwest, and there is some promise in SIF as a proxy for capturing the timing of the growing season.
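To make the indices named above more concrete, here is a minimal Python sketch of how NDVI, NIRv, and SATVI are commonly computed from surface reflectance. The formulas are the commonly published definitions; whether the Grass-Cast team used exactly these variants, band choices, or the soil constant `L = 0.5` was not stated in the presentation and is an assumption here. The function names and the sample reflectance values are hypothetical and purely illustrative.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    if denom == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / denom


def nirv(nir: float, red: float) -> float:
    """Near-infrared reflectance of vegetation: NDVI scaled by NIR reflectance.

    Some published variants subtract a small offset from NDVI first;
    that offset is omitted in this sketch.
    """
    return ndvi(nir, red) * nir


def satvi(swir1: float, swir2: float, red: float, soil_l: float = 0.5) -> float:
    """Soil-Adjusted Total Vegetation Index.

    Uses two shortwave-infrared bands and a soil brightness constant
    (soil_l, assumed to be 0.5 here).
    """
    denom = swir1 + red + soil_l
    if denom == 0:
        raise ValueError("SWIR1 + Red + L must be non-zero")
    return (swir1 - red) / denom * (1 + soil_l) - swir2 / 2


if __name__ == "__main__":
    # Hypothetical reflectance values for a single sparsely vegetated pixel.
    red, nir, swir1, swir2 = 0.10, 0.40, 0.30, 0.20
    print("NDVI :", round(ndvi(nir, red), 3))
    print("NIRv :", round(nirv(nir, red), 3))
    print("SATVI:", round(satvi(swir1, swir2, red), 3))
```

Because NIRv multiplies NDVI by near-infrared reflectance, pixels dominated by bright bare soil contribute less to the signal, and SATVI's soil constant plays a similar role. This is one way to see why these two indices may suit the patchy cover of the Southwest better than NDVI alone.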
Grass-Cast now plans to incorporate data for the Southwest (Arizona and New Mexico) into the current tool. Ultimately, the team wants to integrate across these different methods and go beyond Arizona and New Mexico. There is a lot of room for collaboration; stay tuned for upcoming workshops and seminars.
Grass-Cast is available here.
Tracking and tagging data on individual animals provide key information about movements, habitat use, interactions, and population dynamics, and a lot of this type of data is currently available. For example, the Movebank database currently has 2 billion observations. Tracking data are expensive and require time and effort to collect; TAME (Tagged Animal Movement Explorer) aims to help maximize the value of these complex data and make them easier to interact with.
TAME is a data exploration tool in the form of a web application, based on open source libraries. The TAME team's goal is to make TAME as easy to use as possible, and to allow for interaction and exploration of tagging data. Currently, TAME features include:
Ben Letcher (email@example.com) is excited to explore a podcast or video series centered on animal movement stories – please reach out to him if you have experience in this area!
Highlight images from the May 2020 Collaboration Area topics, from left to right: The User Experience Honeycomb (source) (Usability), interfacing with hydrologic data with Hydroshare (Tech Stack), machine learning Train and Tune steps covered by SageMaker (AIML).
CDI collaboration areas bring us focused information and tools to help us work with our data. Do you have an idea for a topic that you want to learn about or present to a group? Get in touch with us to coordinate! - Leslie, firstname.lastname@example.org
Amazon Web Services personnel and USGS scientists presented on SageMaker and an example of its use at the USGS Volcano Science Center. SageMaker provides the ability to build, train, and deploy machine learning models quickly. Phil Dawson of USGS showed an application to the continuous seismic data that is collected at all USGS volcanic observatories, and how to apply the models even though "every volcano speaks a different dialect" (the seismic energy looks different).
The recording is posted at the meeting wiki page.
Chris Bartlett presented on how records management is moving more aggressively toward electronic records management, a shift that brings a ripple of changes. She discussed what this means for our records (including data), our processes, and expectations.
Slides and recording are posted at the meeting wiki page.
The Fire Science Community of Practice heard the monthly fire update, a discussion about fire science communications, and a science presentation from Ellis Margolis on "Scaling up tree-ring fire history: from trees to the continent and seasons to centuries."
Contact Paul Steblein or Rachel Loehman for more information. Future meeting dates are listed on the Fire Science wiki page.
The group discussed the question "What type of information (in the metadata) is necessary for a data publication vs. a research publication?" In addition, links were shared about an ongoing discussion on metadata for software and code.
See more notes on the discussion at their Meetings wiki page.
The Risk community of practice hosted a panel discussion on communicating hazard and risk science. The speakers were Sara McBride (USGS), Kerry Milch (Temple University), and Nanciann Regalado (Dept of Interior, US Fish and Wildlife Service). Each speaker shared news on some of their recent projects and lessons learned on the job. Projects discussed included ShakeAlert and aftershock forecasts, the USGS circular "Communicating Hazards – A Social Science Review to Meet U.S. Geological Survey Needs", and the Deepwater Horizon Oil Spill Natural Resource Damage Assessment Trustee Council.
See more at the Risk community of practice wiki page.
Brian Wee presented on an experiment to use concept maps for documenting science-informed, data-driven workflows for climate-related adaptation, mitigation, and response planning. The ESIP wiki page on the concept map repository describes how concept maps can be used to describe your own data-to-decisions narrative, as a just-in-time (i.e. as needed) educational resource, to provide context awareness about where you fit in the big picture, and to experiment with ideas for context-aware knowledge discovery.
See a link to the slides and recording at the Semantic Web meetings page.
May's topic was data warehousing and ETL (Extract, Transform, Load) pipelines. Cassandra Ladino presented on the use of Amazon Web Services (AWS) Redshift Data Warehouse as applied to the USGS Configuration Management Committee. Jeremy Newson presented on ETL pipelines using AWS Glue.
See more at the Software Dev wiki meetings page.
The joint CDI Tech Stack and ESIP IT&I Tech Dive hosted a presentation on CUAHSI HydroShare by Jerad Bales, Anthony Castronova, and Jeff Horsburgh. HydroShare is a platform for sharing hydrologic resources (data, models, model instances, geographic coverages, etc.), enabling the scientific community to more easily and freely share products, including the data, models, and workflow scripts used to create scientific publications.
Slides and recordings from the joint CDI Tech Stack and ESIP IT&I webinars are posted on the ESIP page.
A resource review was posted on the topic of how usability and interface influence user experience, including credibility and use. "The resource highlights that user interface and credibility influence user experience because design elements can impact whether users trust and believe what is being presented or delivered to them."
See more of the group's activity and resources on the Usability wiki page.