
CDI's May monthly meeting included updates on CDI projects focusing on FAIR data, a grassland productivity forecast, and animal movement visualization. 

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

Building a Roadmap for Making Data FAIR in the U.S. Geological Survey, Fran Lightsom, USGS 

Fran Lightsom presented on the process of building a roadmap for making USGS data FAIR. FAIR stands for Findable, Accessible, Interoperable and Reusable and has become a popular way for organizations to improve the value and usefulness of data products. 

To begin building a roadmap for FAIR data, the project team conducted a survey of data producers, collected use cases of projects that integrate data, hosted a workshop on September 9th-11th, 2019, and drafted a report & list of recommendations. The workshop produced about 100 discrete recommendations, with 14 being deemed essential, 38 important, and 44 useful. 

Some broad thoughts that came out of the workshop included the assertion that open science requires extending FAIR beyond data to samples, methods, software, and tools, a less-explored application of FAIR. Implementing the recommendations would be the responsibility of many groups, and would require input from representatives of these groups. There may be a place for CDI to step in and coordinate in the future, as this effort continues. 

Further objectives coming out of this effort include increasing use of globally unique persistent identifiers (especially with physical samples and software), developing policy, researching best practices, creating support tools, enabling creation of digital products that are interoperable and usable by making use of existing standards, and improving interoperability through coordinated creation of shared vocab and ontology. 

An opportunity for CDI to view and provide feedback for the FAIR roadmap is upcoming. 

Implementing a Grassland Productivity Forecast Tool for the U.S. Southwest, Sasha Reed, USGS 

Grass-Cast is a CDI-funded project that is focused on producing near-term forecasts of grassland productivity for the U.S. Southwest. The goal of the project is to bring together different kinds of data in order to provide upcoming growing season forecasts, updated every 2 weeks. This work started in the Great Plains to provide information about seasonal outlooks to ranchers. 

So, why are grasslands important? Grasslands provide a critical amount of ecosystem services. They are one of the largest single providers of agro-ecological services in the U.S., and they supply important habitat and food provision for wildlife. Productivity of grasslands helps to determine fire routines and how much carbon is coming from the atmosphere into the grass and soil. Dust reduction and problems associated with air quality can also be thought about from a grassland productivity perspective. 

Near-term productivity forecasts for grasslands can provide information to stakeholders on cattle stocking rates, where and how to allocate resources towards fire management, and rates of carbon sequestration. Grasslands are notably responsive to subtle changes in the environment and climate, and thus, they vary from year to year, making productivity predictions difficult. 

 

The diagram above outlines the process that informs Grass-Cast for the Great Plains, but the project team wants to expand to include the Southwest region. The Southwest region differs from the Great Plains in that it does not have the same homogeneous coverage of grasses, meaning that bare ground is often exposed, complicating the interpretation of remotely sensed data. The Southwest also has a more varied mix of vegetation types, including cacti and shrubs, which needs to be differentiated from grass cover. 

The Grass-Cast team aimed to take the same overarching process used in the Great Plains Grass-Cast, but adjust the methods to effectively use Grass-Cast in the Southwest. First, the team looked at different satellite indices for estimating grassland productivity in the hopes they might better address the challenges of the Southwest. They found that the previously utilized NDVI (normalized difference vegetation index) greenness index did work well in a lot of places in the Southwest, but not as well in others. These results supported the idea to try newer remote sensing platforms that don't rely on a greenness index, such as SIF (solar induced fluorescence). SIF is a different way of looking at plant activity that uses plant physiology to monitor how electrons are moving through the photosynthetic chain. The Southwest is different from the Great Plains in that the dry environment means that you can have plants that are green but not very active, making the relationship between greenness and productivity more challenging. Additionally, many Southwestern grasslands have two growing seasons, spring and summer, which presents a temporal challenge. Other remote sensing methods examined here were NIRv (near-infrared reflectance of vegetation), a greenness index that homes in specifically on the green parts of remotely sensed pixels in images, and SATVI (Soil-Adjusted Total Vegetation Index), which takes into account soil brightness. 
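As a rough illustration of two of the indices mentioned above, here is a minimal Python sketch. It assumes the commonly used definitions NDVI = (NIR - Red) / (NIR + Red) and NIRv = NDVI x NIR; the reflectance values are made up for illustration, and this is not the Grass-Cast team's code. SIF and SATVI are not covered here.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on empty pixels
    return (nir - red) / total


def nirv(nir: float, red: float) -> float:
    """Near-infrared reflectance of vegetation: NDVI scaled by NIR reflectance."""
    return ndvi(nir, red) * nir


# Hypothetical reflectance values (0-1) for two pixels, for illustration only.
pixels = {
    "dense grass": {"nir": 0.5, "red": 0.1},
    "sparse grass over bare soil": {"nir": 0.3, "red": 0.2},
}

for name, px in pixels.items():
    print(f"{name}: NDVI={ndvi(**px):.3f}, NIRv={nirv(**px):.3f}")
```

Because NIRv multiplies NDVI by the NIR reflectance, a pixel with a modest NDVI and low NIR signal is down-weighted further, which is one way to think about why it might behave differently from NDVI where bare ground is exposed.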

The team compared results from these different indices using eddy covariance data, and found that neither SIF nor NDVI provided good results. However, NIRv and SATVI did a good job of predicting grassland productivity for the Southwest, and there is some promise in SIF as a proxy for capturing the timing of the growing season. 

Grass-Cast now plans to incorporate data for the Southwest (Arizona and New Mexico) into the current tool. Ultimately,  the team wants to integrate across these different methods and go beyond Arizona and New Mexico. There is a lot of room for collaboration; stay tuned for upcoming workshops and seminars. 

Grass-Cast is available here. 

A generic web application to visualize and understand movements of tagged animals, Ben Letcher, USGS 

Tracking and tagging data on individual animals provides key information about movements, habitat use, interactions and population dynamics, and there is a lot of this type of data currently available. For example, the Movebank database currently has 2 billion observations. Tracking data is expensive and requires time and effort to collect; TAME (Tagged Animal Movement Explorer) aims to help maximize the value of these complex data and make them easier to interact with. 

TAME is a data exploration tool in the form of a web application, based on open source libraries. The TAME team's goal is to make TAME as easy to use as possible, and to allow for interaction and exploration of tagging data. Currently, TAME features include: 

  1. Four introduction videos 
  2. A user account system where users can upload their own data, with an option to publish and/or share 
  3. Ability to map observations to color, size, or outline 
  4. Ability to select individuals or select by area, with multiple area selections available 
  5. Ability to cross filter where users can filter any one variable, or multiple variables, and output a movie/time series of the data. 
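
The cross-filtering idea in item 5 can be sketched in a few lines of Python. This is only an illustration of the concept, not TAME's implementation: the `Observation` fields, the `cross_filter` helper, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    animal_id: str
    species: str
    day: int  # day of year
    lat: float
    lon: float


def cross_filter(observations, **criteria):
    """Keep observations that pass every predicate given in `criteria`."""
    return [
        obs
        for obs in observations
        if all(test(getattr(obs, field)) for field, test in criteria.items())
    ]


# Hypothetical tagging records, for illustration only.
observations = [
    Observation("A1", "species_a", 200, 42.45, -72.58),
    Observation("A1", "species_a", 120, 42.40, -72.60),
    Observation("B7", "species_b", 130, 42.41, -72.61),
]

# Filter on two variables at once, then order by time to form a "time series".
selected = cross_filter(
    observations,
    species=lambda s: s == "species_a",
    lat=lambda v: v < 42.5,
)
frames = sorted(selected, key=lambda obs: obs.day)
print([(obs.animal_id, obs.day) for obs in frames])
```

Each keyword names one variable and a test for it, so filtering on one variable or several uses the same call.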

 Screenshot of a slide showing features of the web application TAME

See the monthly meeting recording on the wiki page for a live demonstration of TAME, or explore for yourself on the TAME website. 

Ben Letcher (bletcher@usgs.gov) is excited to explore a podcast or video series centered on animal movement stories – please reach out to him if you have experience in this area! 

--  
All CDI Blog Posts  



Highlight images from the May 2020 Collaboration Area topics, from left to right: The User Experience Honeycomb (source) (Usability), interfacing with hydrologic data with HydroShare (Tech Stack), machine learning Train and Tune steps covered by SageMaker (AIML). 

CDI collaboration areas bring us focused information and tools to help us work with our data. Do you have an idea for a topic that you want to learn about or present to a group? Get in touch with us to coordinate! - Leslie, cdi@usgs.gov


Artificial Intelligence / Machine Learning, 5/12 - SageMaker for machine learning models 

Amazon Web Services personnel and USGS scientists presented on SageMaker and an example of its use at the USGS Volcano Science Center. SageMaker provides the ability to build, train, and deploy machine learning models quickly. Phil Dawson of USGS showed an application to the continuous seismic data that is collected at all USGS volcanic observatories, and how to apply the models even though "every volcano speaks a different dialect" (the seismic energy looks different).  

The recording is posted at the meeting wiki page. 

Data Management, 5/11 - records management for electronic records 

Chris Bartlett presented on how records management is moving more aggressively toward electronic records management, and on the ripple of changes this is causing. She discussed what this means for our records (including data), our processes, and expectations. 

Slides and recording are posted at the meeting wiki page. 

Fire Science, 5/19 - scaling up tree-ring fire history 

The Fire Science Community of Practice heard the monthly fire update, discussion about fire science communications, and a science presentation from Ellis Margolis on Scaling up tree-ring fire history: from trees to the continent and seasons to centuries. 

Contact Paul Steblein or Rachel Loehman for more information. Future meeting dates are listed on the Fire Science wiki page.

Metadata Reviewers, 5/4 - data publication versus research publication 

The group discussed the question "What type information (in the metadata) is necessary for a data publication vs research publication?" In addition, links were shared about an ongoing discussion on metadata for software and code.  

See more notes on the discussion at their Meetings wiki page. 

Risk, 5/12 - communicating hazard and risk science 

The Risk community of practice hosted a panel discussion on communicating hazard and risk science. The speakers were Sara McBride (USGS), Kerry Milch (Temple University), and Nanciann Regalado (Dept of Interior, US Fish and Wildlife Service). Each speaker shared news on some of their recent projects and lessons learned on the job. Projects discussed included ShakeAlert and aftershock forecasts, the USGS circular "Communicating Hazards – A Social Science Review to Meet U.S. Geological Survey Needs", and the Deepwater Horizon Oil Spill Natural Resource Damage Assessment Trustee Council.  

See more at the Risk community of practice wiki page 

Semantic Web, 5/14 - concept maps for modeling traceable adaptation, mitigation, and response plans 

Brian Wee presented on an experiment to use concept maps for documenting science-informed, data-driven workflows for climate-related adaptation, mitigation, and response planning. The ESIP wiki page on the concept map repository describes how concept maps can be used to describe your own data-to-decisions narrative, as a just-in-time (i.e. as needed) educational resource, to provide context awareness about where you fit in the big picture, and to experiment with ideas for context-aware knowledge discovery. 

See a link to the slides and recording at the Semantic Web meetings page. 

Software Dev, 5/28 - data warehousing and ETL pipelines 

May's topic was data warehousing and ETL (Extract, Transform, Load) pipelines. Cassandra Ladino presented on the use of Amazon Web Services (AWS) Redshift Data Warehouse as applied to the USGS Configuration Management Committee. Jeremy Newson presented on ETL pipelines using AWS Glue. 
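
For readers new to the term, here is a minimal sketch of the three ETL stages in plain Python. It is only an analogy: an in-memory SQLite database stands in for a data warehouse, and the CSV content, table name, and column names are hypothetical. It does not use or represent AWS Redshift or AWS Glue.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the second row has a missing value.
RAW_CSV = """site,date,discharge_cfs
A,2020-05-01,12.5
A,2020-05-02,
B,2020-05-01,7.0
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing values and convert types."""
    cleaned = []
    for row in rows:
        if not row["discharge_cfs"]:
            continue
        cleaned.append((row["site"], row["date"], float(row["discharge_cfs"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS discharge (site TEXT, date TEXT, cfs REAL)"
    )
    conn.executemany("INSERT INTO discharge VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
count = conn.execute("SELECT COUNT(*) FROM discharge").fetchone()[0]
print(f"{count} rows loaded")
```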

See more at the Software Dev wiki meetings page. 

Tech Stack, 5/14 - HydroShare for sharing hydrologic resources 

The joint CDI Tech Stack and ESIP IT&I Tech Dive hosted a presentation on CUAHSI HydroShare by Jerad Bales, Anthony Castronova, and Jeff Horsburgh. HydroShare is a platform for sharing hydrologic resources (data, models, model instances, geographic coverages, etc.), enabling the scientific community to more easily and freely share products, including the data, models, and workflow scripts used to create scientific publications. 

Slides and recording are available with the joint CDI Tech Stack and ESIP IT&I webinars on the ESIP page. 

Usability, 5/20 - usability and trust 

A resource review was posted on the topic of how usability and interface influence user experience, including credibility and use. "The resource highlights that user interface and credibility influence user experience because design elements can impact whether users trust and believe what is being presented or delivered to them." 

See more of the group's activity and resources on the Usability wiki page  

-- 
More CDI Blog Posts 

The CDI Collaboration Areas are keeping me busy. You can get to all of these groups and sign up for mailing lists on the CDI Collaboration Area wiki page.

From upper left corner, clockwise: DevOps: image from Tidelift website; SoftwareDev: logo for uvicorn; Risk: Impact360 worksheet; AI/ML: image from AI/ML DELTA presentation; Semantic Web: image from Garillo and Poveda-Villalon; Open Innovation: image from OI wiki page; Tech Stack: image from Unidata gateway webpage; Usability: image from Sayer's Paperwork Reduction Act presentation


4/6 Metadata Reviewers - revision or release information in titles

In April the Metadata Reviewers group dove into a question about including the date of a revision or release in the title of the data release. Doing so would help to distinguish between different versions of a data release. After much discussion the group concluded that two metadata records should not have the same title in their citation elements.

See more notes on the discussion at their Meetings wiki page.

4/7 DevOps - managed open source with Tidelift

The DevOps group heard a presentation from Tidelift. Tidelift partners with open source maintainers in order to support application development teams. This saves time and reduces risk when using open source packages to build applications.

See the recording and slides on the DevOps Meeting page. If you are interested in using Tidelift for a USGS application, get in touch with Derek Masaki at dmasaki@usgs.gov. If you'd like a presentation from Tidelift, contact Melanie Gonglach at melanie@tidelift.com.

4/9 Semantic Web - implementing FAIR vocabularies and ontologies

The group discussed "Best Practices for Implementing FAIR Vocabularies and Ontologies on the Web" by Daniel Garijo and María Poveda-Villalón. The discussion focused on sections 2 and 3 of the paper, URIs (uniform resource identifiers) and Documentation. The group recognized that implementation of the best practices in the paper (for example, stable, permanent identifiers) would depend not only on semantic specialists, but also those who set policy for the USGS network. This point was communicated to the group that is working on enabling FAIR practices in the USGS.

See more at the Semantic Web meetings page.

4/9 Tech Stack - Unidata Science Gateway

Julien Chastang presented on the Unidata Science Gateway (https://science-gateway.unidata.ucar.edu/). Unidata is exploring cloud computing technologies in the context of accessing, analyzing, and visualizing geoscience data. From the abstract: "With the aid of open-source cloud computing projects such as OpenStack, Docker, and JupyterHub, we deploy a variety of scientific computing resources on Jetstream for our scientific community. These systems can be leveraged with data-proximate Jupyter notebooks, and remote visualization clients such as the Unidata Integrated Data Viewer (IDV) and AWIPS CAVE."

Slides and recording are available with the joint CDI Tech Stack and ESIP IT&I webinars on the ESIP page.

4/13 CDI Data Management - changes to the USGS Science Data Catalog

Lisa Zolly presented on changes coming with the USGS Science Data Catalog version 3. Today, the Science Data Catalog (https://data.usgs.gov/) has more than 21,000 metadata records. In order to serve its human and machine stakeholders, a number of changes are planned in order to address the changing landscape of federal data policy, substantial growth of the catalog, improvement of workflows, improvement of usability, and more robust reporting and metrics.

Slides and recording are posted at the meeting wiki page.

4/14 Artificial Intelligence / Machine Learning - fine scale mapping of water features at the national scale

Jack Eggleston (USGS), John Stock (USGS), and Michael Furlong (NASA) presented on "Fine scale mapping of water features at the national scale using machine learning analysis of high-resolution satellite images: Application of the new AI-ML natural resource software - DELTA." The availability of high-resolution satellite imagery, combined with machine learning analysis to rapidly process the satellite imagery, provides the USGS with a new capability to map natural resources at the national scale.

The recording is posted at the meeting wiki page.

4/15 Usability - how the Paperwork Reduction Act affects usability studies

James Sayer presented on the Paperwork Reduction Act (PRA) and Usability Testing. The PRA is designed to protect the public from inappropriate data collection. All agencies have their own PRA procedures, so implementation in other agencies won't necessarily translate to USGS implementation. James reviewed Fast Track procedures and exclusions. His advice was to start thinking about the PRA early in your usability work, and to talk to your ICCO (Information Collection Clearance Officer) if you have any questions.

The slides, notes, and recording are posted on the meeting wiki page. Do you have more questions? Contact James at jsayer@usgs.gov.

4/16 Risk - Product evaluation/testing and integrating solutions into strategy

The Risk Community of Practice April meeting was part 3 of a series of training webinars provided by Impact360 Alliance on human-centered design thinking and inclusive problem solving. Emphasis was given to the tools for product evaluation/testing ("[Re]Solve") and integrating solutions into strategy ("[Re]Integrate"). Worksheets were provided to "Create and Test a Solution in Three Acts." A follow-up session on April 23 discussed examples of the worksheets.

Access the slides, recording, and handouts at the Risk Meetings page (must log in as a CDI member, join here if you're not a member yet).

4/17 Ignite Open Innovation - Open Innovation and COVID-19

April was Citizen Science Month! At the Open Innovation meeting, Sophia B Liu (USGS Open Innovation Lead) provided an overview of the various open innovation efforts inside and outside of government that have emerged in response to COVID-19. She also discussed The Opportunity Project Earth Sprint and proposed Problem Statements.

See more information and a list of COVID-19 sites at the meeting wiki page.

4/21 Fire Science - stakeholder input on USGS Fire Science

James Meldrum and Ned Molder of the USGS Fort Collins Science Center presented on Analysis of stakeholder input on USGS fire science communication and outreach, science priorities, and critical science needs. The group also heard updates on the USGS Fire Science strategy and recent fire activity, and held a discussion on "How is COVID-19 affecting your fire science?"

Contact Paul Steblein (psteblein@usgs.gov) or Rachel Loehman (rloehman@usgs.gov) for more information.

4/23 Software Dev - FastAPI

The Software Dev cluster had Brandon Serna and Jeremy Fee present about their work using FastAPI with some comparisons to Flask. I am not a developer so I will summarize by pasting some links, tag lines, and interesting things I heard.

Recommended resources.

I'm going to take a little bit of space to list some of the things I Googled while listening to this call, because to me these descriptions (and some of the logos) are fascinating. It would be fun to do a tagline-logo-name matching game.

  1. FastAPI, https://fastapi.tiangolo.com/: FastAPI framework, high performance, easy to learn, fast to code, ready for production
  2. Flask: https://flask.palletsprojects.com/en/1.1.x/: web development, one drop at a time
  3. Hot reloading <- this sounds very exciting, and according to the internet it is "The idea behind hot reloading is to keep the app running and to inject new versions of the files that you edited at runtime. This way, you don't lose any of your state which is especially useful if you are tweaking the UI"
  4. Uvicorn: https://www.uvicorn.org/: The lightning-fast ASGI server
  5. Cookiecutter: https://cookiecutter.readthedocs.io/en/1.7.2/: Better Project Templates
  6. Gunicorn: https://gunicorn.org/: Gunicorn 'Green Unicorn' is a Python WSGI HTTP Server for UNIX. It's a pre-fork worker model. The Gunicorn server is broadly compatible with various web frameworks, simply implemented, light on server resources, and fairly speedy
  7. Pyenv: https://github.com/pyenv/pyenv: pyenv lets you easily switch between multiple versions of Python. It's simple, unobtrusive, and follows the UNIX tradition of single-purpose tools that do one thing well
  8. Pipenv: https://pipenv-fork.readthedocs.io/en/latest/: Pipenv is a tool that aims to bring the best of all packaging worlds (bundler, composer, npm, cargo, yarn, etc.) to the Python world. Windows is a first-class citizen, in our world
  9. Hypercorn: https://pgjones.gitlab.io/hypercorn/: Hypercorn is an ASGI web server based on the sans-io hyper, h11, h2, and wsproto libraries and inspired by Gunicorn
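
Several of the taglines above mention WSGI (Gunicorn) and ASGI (Uvicorn, Hypercorn). As a non-developer's aid, here is a minimal standard-library sketch of what those two interfaces look like: a WSGI application is a plain synchronous callable, and an ASGI application is an async callable. The `call_wsgi` and `call_asgi` helpers are hypothetical stand-ins for a real server, written only to show the call shapes; this is not the code presented at the meeting.

```python
import asyncio


def wsgi_app(environ, start_response):
    """Minimal WSGI application: a synchronous callable."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello from WSGI"]


async def asgi_app(scope, receive, send):
    """Minimal ASGI application: an async callable for one HTTP request."""
    assert scope["type"] == "http"
    await send(
        {
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"text/plain")],
        }
    )
    await send({"type": "http.response.body", "body": b"hello from ASGI"})


def call_wsgi(app):
    """Hypothetical helper: call a WSGI app by hand, as a server would."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"REQUEST_METHOD": "GET", "PATH_INFO": "/"}, start_response))
    return captured["status"], body


def call_asgi(app):
    """Hypothetical helper: call an ASGI app by hand and collect its messages."""
    messages = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        messages.append(message)

    asyncio.run(app({"type": "http", "method": "GET", "path": "/"}, receive, send))
    return messages


wsgi_result = call_wsgi(wsgi_app)
asgi_messages = call_asgi(asgi_app)
print(wsgi_result)
print(asgi_messages[1]["body"])
```

In practice a server such as Gunicorn or Uvicorn plays the role of these helpers, and a framework such as Flask or FastAPI builds the application callable for you.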

See more at the Software Dev wiki meetings page.


--
More CDI Blog Posts

We continued our exploration of 2019's CDI funded projects in April's monthly meeting with presentations on the Climate Scenarios Toolbox, developing cloud computing capability for camera image velocity gaging, and integrating environmental DNA (eDNA) data into the USGS Nonindigenous Aquatic Species database. 

For more information, questions and answers from the presentation, and a recording of the meeting, please visit the CDI wiki. 

Open-source and open-workflow Climate Scenarios Toolbox for adaptation planning 

Aparna Bamzai-Dodson, USGS, presented on the Climate Scenarios Toolbox (now renamed to the Climate Futures Toolbox!), an open-source tool that helps users formulate future climate scenarios for adaptation planning. Scenario planning is a way to consider the range of possible outcomes by using projections based on climate data to develop usually 3-5 plausible divergent future scenarios (ex: hot and dry; moderately hot with no precipitation change; and warm and wet). Resource managers and scientists can use these scenarios to help predict the effects of climate change and attempt to select appropriate adaptation strategies. However, climate projection data can be difficult to work with in areas of discovery, access, and usage, involving multiple global climate model repositories, downscaling techniques, and file formats. The Climate Futures Toolbox aims to take the pain out of working with climate data.

Collection of photos of people collaborating around climate scenarios and adaptation planning graphs.

The creators of the Toolbox wanted a way to make working with climate data easier by lowering the barrier to entry, automating common tasks, and reducing the potential for errors. The Climate Futures Toolbox uses a seamless R code workflow to ingest historic and projected climate data and generate summary statistics and customizable graphics. Users are able to contribute open code to the Toolbox as well, building on its existing capabilities and empowering a larger user community. The Climate Futures Toolbox was created in collaboration with University of Colorado-Boulder's Earth Lab, the U.S. Fish and Wildlife Service, and the National Park Service. 

CDI members are encouraged to become engaged in the Toolbox by installing and using it, providing feedback on issues, and contributing code to the package. Since April's monthly meeting, the project has developed and undergone renaming, so this is a rapidly evolving endeavor. 

Develop Cloud Computing Capability at Streamgages using Amazon Web Services GreenGrass IoT Framework for Camera Image Velocity Gaging 


Frank Engel at the USGS Texas Water Science Center presented next on a CDI project involving non-contact stream gaging within a cloud computing framework. 

Measuring stream flow is an important aspect of USGS' work in the Water Mission Area, and stream gaging, a way to measure water quantity, is a technique with which many scientists are familiar. However, it is sometimes difficult to obtain measurements with traditional stream gaging, like at times of flooding, or when measurement points are unsafe or unreachable. Additionally, post flood measurement methods can often be expensive and not as accurate. 

To get around these issues, scientists have developed non-contact methods with which to measure water quantity. For example, cameras are utilized to view a flooding river, which can produce a velocity measurement after processing and other analysis steps. This is a complicated method and requires many steps and extensive training. Thus, the goal of this project is to make this process work automatically utilizing cloud computing and IoT. 

The first step required building a cloud infrastructure, with the help of Cloud Hosting Solutions (CHS). This involves connecting the edge computing (camera and Raspberry Pi footage of a stream) to an Amazon Web Services (AWS) IoT system and depositing camera footage and derivative products into an S3 bucket. The code for this portion of the product is in a preliminary GitLab repository that is projected to be published as a part of the long-term project. The team is also still working toward building the infrastructure through to data serving and dissemination. 

Workflow for getting streamflow data into a cloud computing system.

Other successes accomplished with this project so far include auto-provisioning (transmitting location and metadata) of edge computing systems to the cloud; establishing global actions (data is transmitted to the cloud framework and can roll into automated processing, like extracting video into frames); and building automated time-lapse computation. 

Engel and the project team have taken away a couple of lessons from their experience with this project: first, cloud computing knowledge takes a lot of work and time to acquire, and second, in the short term, it can be difficult to establish a scope that encompasses the needs and wants of all stakeholders. 

Establishing standards and integrating environmental DNA (eDNA) data into the USGS Nonindigenous Aquatic Species database 

Jason Ferrante with the Wetland and Aquatic Research Center discussed his team's project on establishing standards for eDNA data in the USGS Nonindigenous Aquatic Species database (NAS). 

eDNA is genetic material released by an organism into its environment, such as skin, blood, saliva, or feces. By collecting water, soil, and air samples, scientists can detect the presence of a species with eDNA. Ferrante's project aims to combine the traditional specimen sightings already available in the NAS with eDNA detections for a more complete distribution record and improved response time to new invasions. 

There is currently a need for an open, centralized eDNA database. eDNA data is currently scattered among manuscripts and reports, and thus not easily retrievable via web searches. Additionally, there are no databases dedicated to Aquatic Invasive Species (AIS), which are the species of interest for this project. A centralized, national AIS viewer will allow vetting and integration of data from federal, academic, and other sources, increase data accessibility, and improve coordination of research and management activities. 

In order to successfully create a centralized AIS viewer, community standards need to be established so that data can be checked for quality and validity, especially within the FAIR data framework (Findable, Accessible, Interoperable, and Reusable). To establish community standards and successfully integrate eDNA into NAS, the project team accomplished several objectives: 

List of steps taken in integrating eDNA data into the Nonindigenous Aquatic Species Database

1) Experimental Standards 

  • Collating best standards and practices for sampling design and collection, laboratory processing, and data analysis, in an eDNA literature review. 

2) Stakeholder Backing 

  • Gathered a group of five other prominent/active eDNA researchers within DOI to discuss standards and vetting process 
  • Teleconferences to gain consensus 
  • Plan to produce a white paper 

3) Integration into NAS 

  • Pre-submission form about eDNA scientists' design and methodology in order to vet data 
  • Prototype web viewer (see meeting recording for more; must be logged into CDI wiki) 

Some challenges faced during the project included gaining consensus on the questions for the pre-submission form; staying organized and in communication; and meeting the needs of managers and researchers. Ferrante and the project team would love to follow up with CDI for help developing new tools which use eDNA data across databases to inform management; and providing feedback on an upcoming manuscript about the project's process. 


The CDI Monthly Meeting in March focused on three 2019 CDI-funded projects: a national-scale map of sinkhole subsidence susceptibility; a project that will allow collection of near real-time eDNA surveillance of invasive species or pathogens; and an overview of SEINeD, a tool for Screening and Evaluating Invasive and Non-native Data.

Subsidence Susceptibility Map for the Conterminous U.S. Jeanne Jones, USGS 

Jeanne Jones at the Western Geographic Science Center shared progress on the creation of a Subsidence Susceptibility Map for the conterminous U.S. that will identify hotspots for sinkholes and areas susceptible to developing sinkholes. Sinkholes can pose major issues by focusing contaminated and/or polluted surface water into groundwater and creating instability in the foundations of buildings and roads. As such, a consistent map for the identification of sinkhole hotspots is vital in order to anticipate and manage risks. 

The goals of this project include creating the first nationwide digital dataset of sinkhole hotspots, incorporating this dataset into the SHIRA (CDI) Risk map for use by DOI emergency agencies, and providing access to the dataset for external use by emergency managers, land use planners, and public works agencies. To meet these goals, the project team used The National Map, the National Hydrography Dataset (NHD), the Yeti supercomputer, and other data.


Jeanne shared challenges and solutions involved with several steps of the process. For instance, data had to be screened visually and manually in order to identify gaps in spatial coverage, and to screen out wetlands, open water, urban areas, and other non-karst landscape features. 

Jeanne also posed a question to the CDI: How does flow accumulation processing with DEMs compare across ArcPy, TauDEM, and RichDEM in terms of speed, consistency of results, and maximum raster size for high performance computing? You can respond to her at jmjones@usgs.gov.

High-Resolution, Interagency Biosurveillance of Threatened Surface Waters in the United States Sara Eldridge and Elliott Barnhart, USGS

The project presented by Elliott Barnhart tackles the problem of rapid detection and prediction of biological hazards. USGS collects a massive amount of near real time data with stream gauges, but the analysis of this data can take much longer. To solve this problem, the project team created a cloud-hosted digital database that combines all the collected data, and can easily incorporate eDNA and other data streams into models that indicate the presence or absence of organisms. 

Challenges faced during the course of this project included creating effective quality control filters for funneling in data from multiple sources, and linking the benefits and capabilities of several different systems (like the MBARI Environmental Sample Processor, Department of Energy Systems Biology Knowledgebase, and more). 

National Public Screening Tool for Invasive and Non-native Aquatic Species Data, Wesley Daniel, USGS

The Nonindigenous Aquatic Species (NAS) database is the central repository for spatially referenced accounts of introduced aquatic species. NAS tracks over 1,290 aquatic species and stores over 600,000 observations from across the U.S., spanning from the 1800s to the present. The SEINeD tool was developed to solve the problem: How does the NAS database get non-native occurrence data from groups not focused on invasive species? The SEINeD tool allows stakeholders to upload a biological dataset (fish, inverts, plants, etc.) collected anywhere in the conterminous U.S., Alaska, Hawaii, or a U.S. territory that can then be screened for invasive or non-native aquatic species occurrences.

The SEINeD tool helps to filter out inaccuracies due to incorrect taxa and spatial identifications before checking the indigenous status of the species against the sighting location. The tool flags non-native species that are exotic (from other countries/continents) and non-native species from within the U.S. (for example, rainbow trout, native to the west coast, found on the east coast). The data is then enhanced with the addition of spatial information like hydrological unit codes (HUCs) and returned to the user. The user can then submit the enhanced/corrected CSV to the NAS program.
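The screening steps described above can be sketched roughly as follows. This is a hypothetical illustration, not the SEINeD implementation: the lookup table, the state-level ranges, the status labels, and the function name are all assumptions, and the real tool works with NAS reference data and adds HUCs.

```python
# Hypothetical reference table: species -> U.S. states where it is native.
# An empty set marks a species treated here as exotic to the U.S.
# These ranges are illustrative placeholders, not NAS data.
NATIVE_STATES = {
    "Oncorhynchus mykiss": {"CA", "OR", "WA"},  # rainbow trout (west coast)
    "Cyprinus carpio": set(),                   # common carp (treated as exotic)
}

def screen_record(record):
    """Return a status label for one uploaded observation (a dict)."""
    species = record["species"].strip()

    # Step 1: reject taxa that cannot be matched to the reference table.
    if species not in NATIVE_STATES:
        return "unrecognized taxon"

    # Step 2: reject coordinates that cannot be a real location.
    if not (-90 <= record["lat"] <= 90 and -180 <= record["lon"] <= 180):
        return "invalid location"

    # Step 3: compare the sighting location with the native range.
    native_states = NATIVE_STATES[species]
    if not native_states:
        return "exotic"
    if record["state"] in native_states:
        return "native"
    return "non-native (native elsewhere in the U.S.)"

# Made-up observations for demonstration.
uploaded = [
    {"species": "Oncorhynchus mykiss", "state": "VA", "lat": 37.5, "lon": -79.0},
    {"species": "Oncorhynchus mykiss", "state": "OR", "lat": 44.0, "lon": -121.0},
    {"species": "Cyprinus carpio", "state": "IL", "lat": 40.0, "lon": -89.0},
    {"species": "Unknownus fishus", "state": "TX", "lat": 31.0, "lon": -99.0},
]
statuses = [screen_record(r) for r in uploaded]
```

Validating the taxon and location first mirrors the order described above, so that the native-status check only runs on records that can be trusted.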

The SEINeD tool launches May 4th. Watch the NAS website for updates.

See the recording and slides at the March Monthly Meeting page.

--
More CDI Blog Posts

You can get to all of these groups and sign up for mailing lists on the CDI Collaboration Area wiki page.

3/2/20 Metadata Reviewers - sharing metadata creation practices for data and software

The Metadata Reviewers group discussed (1) At your office, how much do you create a single metadata record for? Individual data files, items in a database, collections of data, whole data releases, or what? (2) What about metadata for software or code? How can we prepare to think together about that, maybe on our next phone call? Should we invite a speaker? Bring in reference materials? Bring in good examples?

Link highlight: Data Management Training resources on the USGS Data Management Website

See other notes, links, and take-aways on the Metadata Reviewers Meetings Page! Contact: Lightsom, Frances L.


Training relevant to metadata creation is on the USGS Data Management Website!

3/3/20 Fire Science Update

At the March Fire Science Community of Practice meeting, Paul Steblein gave a fire science update. An FY2021 Request for Proposals will be issued this summer from the Joint Fire Science Program. Anna Stull presented on fire deployment requirements. Geoff Plumlee presented on the Department of Defense (DoD) Joint Artificial Intelligence Center (JAIC), Humanitarian Assistance and Disaster Relief (HADR) National Mission Initiative (NMI). During and post fire what can USGS be doing to link topography, mineralogy, debris flow, revegetation, invasive species? Rachel Loehman presented on the status and next steps of the USGS Fire Science Strategic Plan, and she is a new co-lead with Paul for the Fire Science CoP!

Past activity can be viewed at the Fire Science wiki page. Contact: Steblein, Paul Francis 


Recent wildland fire activity is shared at the Fire Science Community of Practice Meetings. 

3/9/20 Data Management - How do we convince them?

The Data Management Working Group again welcomed Science Gateways Community Institute trainers Claire Stirm and Juliana Casavan to host a working session on Communicating Data Value Propositions to Scientists. I found the concepts to be useful for communicating anything in general when trying to get "buy-in." Although some of us may be uncomfortable with the term "marketing," I think we may relate to the lessons of verbiage, graphics, actions, and strategies for trying to get buy-in for whatever we are working on.

Slides and recording are posted at the wiki meeting page. Contacts: Langseth, Madison Lee and Hutchison, Vivian B.


Slide from the "Communicating Data Value Propositions to Scientists" presentation.

3/12/20 Semantic Web - biodiversity in the world's oceans

Summary provided by Lightsom, Frances L.

Topic:  A practical example of semantic technology in action: assessing the status of biodiversity in the world’s oceans

Sky Bristol (USGS Core Science Systems, https://orcid.org/0000-0003-1682-4031) presented a big use case, in which semantic standards are needed to enable integration of multiple large datasets of ocean biological and ecological observations to understand effects of human activities on ocean ecosystems, as well as the sustainability of human uses of ocean resources. Several groups are working on ontologies that provide standard terms for the biota, ecosystem components, and relationships. A next step will be normalizing all the data with ontologies. This is an opportunity to assist with real world semantic web work.

See more at the Semantic Web meetings page.


A slide from Sky Bristol's presentation on a practical example of semantic technology - biodiversity in the world's oceans.

3/12/20 Tech Stack - provision of rapid response during Australian bushfires

Following the ESIP Theme "Putting Data to Work," the Tech Stack group had a presentation on the Discrete Global Grid System's use during the Australian bushfires.

From the abstract: The devastation caused by the Australian Bushfires highlighted the need for a new approach for rapid data integration. The total burnt area during Autumn-Summer 2019-2020 is 72,000 square miles, which is equivalent to half of Montana, or to the areas of North Dakota and Delaware combined. Rapid response in the provision of information on areas affected by the bushfires was required to support evaluation of the impact, as well as planning of the recovery process and support for families, businesses, and the environment. This presentation will discuss application of the Discrete Global Grid System (DGGS) in bringing together diverse complex information from multiple sources to support the response process.

Slides and recordings from the joint CDI Tech Stack and ESIP IT&I webinars are on the ESIP page. Contact: Blodgett, David L.


Slide from the DGGS and Australian bushfires presentation.

3/16/20 Ignite Open Innovation Forum - volcanic hazards and problem-based learning

Jefferson Chang presented on Using Volcanic Hazards in Hawai‘i as a STEM Platform in Problem-Based Learning.

From the abstract: We use emerging technology to empower youth in a problem-based learning approach during a summer-long course. With guidance from HVO scientists, students essentially adopt the hazards mission of the USGS. Students not only aid in the volcano monitoring efforts on Hawai‘i Island, but also (1) take ownership of their own learning, (2) increase their capacity in STEM, and (3) engage the local community and address its needs.

See more at the Open Innovation wiki site. Contact: Liu, Sophia

 
Slide from Jefferson Chang's presentation at the Ignite Open Innovation Forum.

3/17 - 3/18/20 ICEMM - Interagency Collaborative for Environmental Modeling and Monitoring public meeting - Integrated Modeling, Monitoring, and Working with Nature.

The ICEMM group held its 2020 Public Meeting (March 17-18, 2020) at USGS Headquarters, Reston, VA.  The theme was Integrated Modeling, Monitoring, and Working with Nature.

Selected presentation titles: Engineering with nature for sustainable systems; Building smarter water systems through improved sensors, autonomy, and data processing; Black swans, disappearing lakes, and the societal value of integrated modeling and monitoring; Integrated water prediction at the USGS; Next generation integrated modeling of water availability in the Delaware River Basin and beyond.

Links: Agenda.  Abstracts and Biographical Information. Description of the meeting.  

Please contact Glynn, Pierre D. by email (pglynn@usgs.gov) for further information or questions.


Slide from Branko Kerkez's presentation on Building smarter water systems.

3/18/20 Usability Resource Review Posting - using web analytics to inform how our web pages and tools are being used

From Sophie Hou:

For March, I have prepared a resource review to address questions relating to the “How can we use Google Analytics DataStudio to inform how our online tools are used?” topic posted to the forum.

Using analytics to inform how our web pages/tools are being used

Please note that although Google Analytics is referenced in the original question and in my resource review, there are other options. If you have used tools other than Google Analytics, please could you share the information/experience through the usability listserv?


Behavior flow snapshot from the resource: How to use the behavior flow report to improve your webpage user experience.

3/19/20 Risk Community of Practice - situation assessment, stakeholder alignment, prototyping, and strategic planning

From the Risk CoP wiki: This was part 2 of a series of training webinars provided by Impact360 Alliance on human-centered design thinking and inclusive problem solving. During this webinar, participants took a deeper dive into six of the tools from Impact360's Toolkit360. Toolkit360 includes six tools for collaboratively understanding wicked problems and six tools for collaboratively generating strategies to solve wicked problems. The twelve tools bridge the "problem space" and "solution space" using situation assessment, stakeholder alignment, prototyping, and strategic planning.

Access the slides, recording, and handouts at the Risk Meetings page (must log in as a CDI member, join here if you're not a member yet). Contact: Ludwig, Kristin A.


I appreciated the clear steps for the different tools in the toolkit, for example for [Re]Assess.

3/26/20 Software Development - using the cloud to support geophysical research

Kirstie Haynie, a Mendenhall post-doc at the USGS, is exploring how to use the cloud to support geophysical research. She presented from a geophysicist's point of view, running the Slab2 model in the cloud. With the goal of operationalization of Slab2, she demonstrated the process of using CloudFormation templates for reproducibility, automation, and long term success.

Get to more resources on the Software Dev Cluster wiki page. Contacts: Ladino, Cassandra C.; Guy, Michelle; Newson, Jeremy K.


Subduction zones via Slab2 at the software dev cluster meeting!


--

I attended the "Advancing FAIR and Go FAIR in the U.S." workshop in February; the workshop covered topics on how to establish and promote FAIR culture and capabilities within a community. Many of the discussions were synergistic with the CDI activities, so I wanted to share some key points from the workshop with the CDI community. - Sophie Hou


(Logo from the Go FAIR Initiative)

Workshop Info 

Title: Advancing FAIR and Go FAIR in the U.S.  

Date: February 24th to 27th, 2020 

Location: Atlanta, Georgia 

Goals: 

  • Facilitate development of a community of practice for FAIR awareness and capacity-building in the US 
  • Improve understanding of FAIR technologies, and how to teach this to others 
  • Preparation for teaching or supporting FAIR data management and policies for researchers, local institutions, professional organizations, and others 

Link: https://www.sdsc.edu/services/data_science/research_data_services.html  

 

Overall Summary: 

  • The workshop highlighted that advancing FAIR requires communal effort. 
  • In order to "FAIRify," it is important for a community to first determine its scope, goals, and objectives. 

 

Key Notes: 

  • FAIR is an acronym for Findable, Accessible, Interoperable, and Reusable.
  • Typical challenges that a community could face when working on FAIR include:
    • Knowledge gap
    • Institutional inertia
    • Community relationship building
    • Expanding FAIR capacity
    • Best way to adapt and adopt available FAIR resources
  • The ultimate goal of enabling FAIR is to allow both humans and machines (especially machines) to use digital resources, so that analytics and re-use can be optimized.
    • According to the Go FAIR Initiative (https://www.go-fair.org), FAIR can also be understood as Fully AI Ready. In other words, machines are able to know what the digital resources mean. Additionally, the digital resources are as distributed/open as possible, but can also be as central/closed as needed.
  • Implementation of FAIR can be challenging because many concepts in the principles are multifaceted (including social, resource, and technical considerations).
  • In order to advance FAIR, it is important to first establish a good (common) understanding of the FAIR principles.
  • FAIR requires technical and disciplinary resources, but it also requires community support.
    • When implementing FAIR, we need to review choices and accept challenges; e.g., decide who our "community" is and determine what is specific to it.
    • FAIR is not a “standard”. The local community context is important and necessary.
  • The Go FAIR Initiative offers a 7-step "FAIRification" process: https://www.go-fair.org/fair-principles/fairification-process/ 
  • Options for conducting a FAIR event/activity with one's community include:
    • Multiple day, experts convening, tutorial/webinar, conference, unconference, hackathon, symposium, sprint, posters, etc.
  • Participants in a FAIR event/activity might have the following expectations:
    • Share best practices/resources/learn new skills
    • Tackle a problem
    • Learn new concepts/skills
    • Use FAIR as a theme to track for other topics
    • Collaborate to create a resource to be shared
    • And more!
  • Once a community has established its version of FAIR, it is important to connect with other communities. Convergence with different communities is key to growing FAIR.


CDI's February meeting featured a discussion on the value of CDI to you, and a deep dive into Pangeo.

Pangeo: A flexible open-source framework for scalable, data-proximate analysis and visualization

Rich Signell, a Research Oceanographer at the Coastal and Marine Science Center in Woods Hole and member of the Pangeo Steering Council, presented an overview of Pangeo and examples of uses for Pangeo for several different types of USGS workflows. The Pangeo framework is deployed by Cloud Hosting Solutions (CHS) and funded by EarthMAP as a new form of cloud-based model data analysis. Community-driven, flexible, and collaborative, Pangeo is slowly building out a set of tools with a common philosophy. In one example, Rich used a Pangeo Jupyter Notebook to process a dataset in one minute that had previously taken two weeks. Cloud costs, skills, cloud-optimized data, and Pangeo development are issues that are currently being addressed.

For more:

https://medium.com/pangeo

https://discourse.pangeo.io/

https://gitter.im/pangeo-data

Pangeo and Landsat in the Cloud

Renee Pieschke, a Technical Specialist for the Technical Services Support Contract at the Earth Resources Observation and Science Center in Sioux Falls, SD, continued our Pangeo focus with some information on Landsat in the cloud. Renee and her team are looking to a spring release of Collection 2 data, which will exponentially increase the amount of data available. Level 2 processing will be required for the Collection 2 data; it aims to get close to what you would see if you were looking at the ground by taking out disturbances, clouds, etc.

The Landsat Look upgrade uses a cloud-native infrastructure and a cloud-optimized GeoTIFF format. It uses new SpatioTemporal Asset Catalog metadata to programmatically access the data. The new Landsat Look can filter pixels with a QA Band so that any clouds, shadows, snow, ice, or water are removed to produce the best possible image.

The SpatioTemporal Asset Catalog was developed to help standardize metadata across the entire geospatial data provider community, using a simple JSON structure. It normalizes common names, simplifies the development of third-party applications, and helps enable querying in Pangeo. Another in-progress goal is connecting with Landsat data in the cloud. Getting this Landsat data into the cloud involves converting the data to a cloud-optimized GeoTIFF format and this kind of data is already fueling the backend of Landsat Look.
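To give a sense of the "simple JSON structure," below is a minimal sketch of a STAC-style item expressed as a Python dictionary, plus a small filter over such items. The scene IDs, coordinates, asset URLs, and the helper name are hypothetical. The field names follow the general STAC item layout, with `eo:cloud_cover` coming from the STAC electro-optical extension. This is not the Landsat Look catalog itself.

```python
import json
from datetime import datetime

def make_item(item_id, when, cloud_cover):
    """Build a minimal STAC-style item (hypothetical values throughout)."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [-97.5, 43.0, -95.0, 45.0],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[-97.5, 43.0], [-95.0, 43.0], [-95.0, 45.0],
                             [-97.5, 45.0], [-97.5, 43.0]]],
        },
        "properties": {"datetime": when, "eo:cloud_cover": cloud_cover},
        "assets": {
            "red": {
                # Placeholder URL, not a real Landsat asset.
                "href": f"https://example.com/{item_id}/red.tif",
                "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            }
        },
        "links": [],
    }

def select_items(items, start, end, max_cloud_cover):
    """Keep items inside a date range and below a cloud-cover threshold."""
    selected = []
    for item in items:
        props = item["properties"]
        acquired = datetime.strptime(props["datetime"], "%Y-%m-%dT%H:%M:%SZ")
        if start <= acquired <= end and props["eo:cloud_cover"] <= max_cloud_cover:
            selected.append(item)
    return selected

catalog = [
    make_item("scene-a", "2020-02-12T17:00:00Z", 12.5),
    make_item("scene-b", "2020-02-28T17:00:00Z", 80.0),
    make_item("scene-c", "2020-03-15T17:00:00Z", 5.0),
]
february_clear = select_items(
    catalog, datetime(2020, 2, 1), datetime(2020, 2, 29, 23, 59, 59), 20.0
)
print(json.dumps(february_clear[0]["properties"], indent=2))
```

Because every item carries the same property names, a third-party application can run a query like this without knowing anything provider-specific, which is the normalization benefit described above.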

USGS users can access Pangeo and some test notebooks through http://support.chs.usgs.gov/ and code.usgs. More information is available on the meeting slides.

Why is CDI valuable to you? Why do you participate?

A poll was administered on sli.do to participants to see what the value of CDI is to them. Some responses are below.

"I like to hear about (and share) the cool work folks are doing throughout the USGS! The Communities are valuable because they allow folks to share innovative research and discuss ways we can do so while following Department, Bureau, Mission Area policy."
"CDI provides relevant, useful, and timely data management related issues, projects, and tools."
"I learn about new technology applications and learn of colleagues I might collaborate with."
"The CDI helps me to get my work done in my daily job! I find the people who are part of the CDI are amazing to interact with - they are engaged, enthusiastic, and interested in making things better at USGS. CDI has made me feel like I am more in touch with the USGS - there is so much going on in this Bureau, and CDI keeps me informed and makes me feel like I am part of something bigger than just my daily job."
"Demonstrate that best practices in data sci/software/etc. is important to colleagues."
"Diverse community, wide range of experience and expertise."

More information on the meeting, including notes, links, slides, and video recordings, is available here.


January's monthly meeting covered how to evaluate web applications and better understand how they are working for users, and explored well-established strategies for USGS crowdsourcing, citizen science, and prize competition projects. 

Application Evaluation: How to get to a Portfolio of Mission Effective Applications 

Nicole Herman-Mercer, a social scientist in the Decision Support Branch of the Water Resources Mission Area's Integrated Information Dissemination Division, presented on how to evaluate web applications based on use, value, impact, and reach, as defined below. 

Use  

Definition: take, hold, view, and/or deploy the data/application as a means of accomplishing or achieving something. 

  • How many people use this application? 
  • How many are new users? 
  • How many are returning users? 
  • Are users finding what they need through this site/application? 

Herman-Mercer used Google Analytics to answer some of these questions. Google Analytics provided information such as total daily visits, visits through time, what pages users are visiting and how they're getting there (links from another website, search, or direct visits), how often they're visiting, how many repeat visits occur, and how long users spend on individual pages. 

Value 

Definition: The importance, worth, and/or usefulness of the application to the user(s) 

  • How willing are users to pay for the application? 
  • How important is this application to the user's work and/or life? 
  • What/how large would the impact of the loss of this application be to the user? 

To estimate the value of selected applications to users, an electronic survey was sent to internal water enterprise staff, which asked respondents to indicate which applications they used for work, and then to answer a series of questions about those applications. Questions attempted to pinpoint how important applications were to users, and how affected their work would be should the application be decommissioned. 

Impact 

Definition: The effect the application has on science, policy, or emergency management 

  • How many scientific journal articles use this application? 
  • Is this application relevant for policy decisions? 
  • Do emergency managers use this application? 

Publish or Perish software for text mining was used to get at some of these data points. Publish or Perish searches a variety of sources (Google Scholar, Scopus, Web of Science, etc.) and returns any citations that applications are getting. Attempts to search for policy document citations have proven more difficult, and were not factored into this evaluation as a result.

Reach 

Definition: How broadly the application reaches across the country and into society 

  • Where are users? (Geographically) 
  • Who are users? (Scientists? Academia? Government?) 

Google Analytics was again used to gather visits by state, which was then compared with the state population to get an idea of use. These analytics could also identify which networks users are on, i.e., .usgs, .gov, or .edu. Finally, an expert survey was deployed, surveying users who developed the application or currently manage it to get a sense of who the experts think the intended and actual audience is. 
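The visits-versus-population comparison can be sketched as a simple per-capita rate. This is only an illustration of the arithmetic: the state labels, visit counts, and populations below are made up, and the function name is an assumption rather than part of the team's evaluation.

```python
def visits_per_100k(visits_by_state, population_by_state):
    """Normalize visit counts by population (visits per 100,000 residents)."""
    rates = {}
    for state, visits in visits_by_state.items():
        population = population_by_state.get(state)
        if not population:
            # Skip states with missing or zero population figures.
            continue
        rates[state] = visits * 100_000 / population
    return rates

# Hypothetical numbers for two fictional states.
visits = {"State A": 500, "State B": 500}
populations = {"State A": 1_000_000, "State B": 100_000}

rates = visits_per_100k(visits, populations)
ranked = sorted(rates, key=rates.get, reverse=True)
```

In this made-up case both states have the same raw visit count, but the smaller state ranks first once population is taken into account.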

Contact Nicole at nhmercer@usgs.gov for a detailed report on the full evaluation. 

Herman-Mercer's team was inspired by Landsat Imagery Use Case studies. 

USGS Open Innovation Strategy for Crowdsourcing, Citizen Science, and Competitions 

Sophia Liu, an Innovation Specialist at the USGS Science and Decisions Center in Reston, VA, as well as the USGS Crowdsourcing and Citizen Science Coordinator and Co-Chair of the Federal Community of Practice for Crowdsourcing and Citizen Science, presented an overview of well-established USGS crowdsourcing, citizen science, and prize competition projects. 

Citizen science, crowdsourcing, and competitions are all considered by Liu to be types of open innovation. Definitions of these terms are as follows: 

  • Citizen science: public participation or collaboration with professional scientists requesting voluntary contributions to any part of the scientific research process to enhance science. 
  • Crowdsourcing: a way to quickly obtain services, ideas, or content from a large group of people, often through simple and repeatable micro tasks. 
  • Competitions: challenges that use prize incentives to spur a broad range of innovative ideas or solutions to a well-defined problem. 

A popular example of citizen science/crowdsourcing is citizen seismology or public reports of earthquakes, like Did You Feel It? 

Liu has documented about 44 USGS crowdsourcing and citizen science projects, and 19 USGS prize competitions. Some examples of open innovation projects and information sources are listed here: 

Participants during the presentation were asked to use the following Mentimeter poll to answer short questions and provide feedback on the talk. 

Sophia is looking for representatives from across all USGS mission areas, regions, and science support offices interested in giving feedback on the guidance, catalog, toolkit, and policies she is developing for the USGS Open Innovation Strategy. Feedback can be provided by joining the USGS Open Innovation Strategy Teams Site or emailing her at sophialiu@usgs.gov. 

See the recording and slides at the meeting page. 

You'll probably want to join a new collaboration area after reading all of this exciting news. You can do that by following the instructions on this wiki page. You can also get to all CDI Collaboration Area wiki pages here. We recently added quick link buttons to meeting content to most collaboration area wiki pages.

2/3/20 Metadata Reviewers - Persistent unique identifiers for USGS metadata records

The group had a conversation about the need for persistent unique identifiers in USGS metadata records that could be used across different government systems, including usgs.gov and data.gov. Lisa Zolly presented some slides to frame the conversation, and take-aways are on the Metadata Reviewers Meetings Page.


Slide from the Metadata Reviewers Community of Practice February meeting.

2/4/20 DevOps - USGS Map Production on Demand (POD)

February's DevOps meeting was like a Valentine's Day-themed love letter to the partnership between Dev and Ops. (Yes, that is a subjective opinion.)  Andy Stauffer (Dev) and Robert Djurasaj (DevOps) combined to present on "Automating the Deployment of a National Geospatial Map Production Platform Using DevOps Workflows." A dedicated DevOps team was critical for scaling up workflow and infrastructure for the Map Production On Demand (POD) system. See the recording at the DevOps Meeting page.


Slide from the DevOps February meeting presentation.

2/10/20 Data Management - What's the value of your project?

The Data Management working group meeting used an interactive virtual format to have small group discussions about developing data management value propositions. Science Gateways Community Institute superstars Claire Stirm and Juliana Casavan presented the essence of a value proposition: a clear understanding of the unique value your project delivers to your users or stakeholders. Virtual breakout groups held discussions and came up with many answers to "Why is CDI important for data managers?" Claire and Juliana's tips on value propositions included being succinct and developing different value propositions for different audiences. See the slides and the value propositions at the meeting notes page!


A general formula for developing a value proposition statement.

2/13/20 Tech Stack - Urban Flooding Open Knowledge Network

Mike Johnson from UC Santa Barbara presented on the Urban Flooding Open Knowledge Network. This is an exciting stakeholder-driven knowledge network project with emphasis on prototyping interfaces and web resources. See other joint CDI Tech Stack and ESIP IT&I webinars on the ESIP page.

2/19/20 Usability - Choosing Usability Techniques

Sophie Hou presented on "Choosing Usability Techniques." The process starts with establishing the context: what are you trying to learn and why? What are the gaps? What information do we need to learn and why? Next you should decide on the types of data needed: Attitudinal vs. Behavioral vs. Qualitative vs. Quantitative. Thank you to Sophie for continuing to build our knowledge about how to improve usability in our projects! See the slides and the notes on the Usability meeting page.


Slide from the Usability group's February presentation showing the differences between different types of usability data.

2/20/20 Risk CoP - Human-centered design, frame innovation, and systems theory (Panarchy)

The Risk CoP hosted guest Scott Miles from Impact360 to kick off a special training series. Scott's presentation was loaded with information!

From the Risk Meetings page: This was the kickoff meeting for a series of training webinars provided by Impact360 Alliance on human-centered design thinking and inclusive problem solving. We met the Impact360 team and were introduced to the foundations and terminology of Impact360's toolkit, Toolkit360. Toolkit360 is a rigorous, intentional process to collectively amplify researcher and practitioner superpowers to integrate knowledge and action, unlike doing it “the way we’ve always done it” or working in silos. Toolkit360 fuses processes and methods from human-centered design, frame innovation, and systems theory (Panarchy). The Toolkit360 process uses 12 tools to bridge the problem and solution spaces with situation assessment, stakeholder alignment, problem framing, and prototyping. Future training webinars during the Risk CoP's March and April monthly meetings will take deeper dives into these tools.

Access the slides and recording at the Risk Meetings page (must log in as a CDI member, join here if you're not a member yet).


Slide from the Risk Community of Practice's February meeting.

2/26/20 Open Innovation

Sheree Watson presented on Using Open Innovation to Engage Youth and Underserved Communities in the Pacific Islands. Sophia B. Liu followed with a discussion of why Open Innovation matters and how to participate in the USGS Open Innovation Strategy. See more ways to get involved in USGS Open Innovation at the Open Innovation wiki site!

2/27/20 Geomorphology Tools - Floodplain and Channel Evaluation Tool (FACET) and Hyper-Res Hydrography

Peter Claggett and Labeeb Ahmed presented on Mapping Channels & Floodplains: Hyper-Res Hydrography and FACET. FACET stands for Floodplain and Channel Evaluation Tool and more can be found at https://code.usgs.gov/water/facet.

Matt Baker (University of Maryland, Baltimore County) presented on 'Hyper'-resolution Geomorphic Hydrography: Methods, advantages, and shifting paradigm.

I always try to make myself look like I've been around for a long time with references to "do you remember when?" and I will do that now. I remember when the first LIDAR images started showing up at conferences and presenters would first show a standard (at the time) 30-meter DEM and then flip to the next slide that was LIDAR 1-meter resolution, and the whole audience would gasp in astonishment. Then this was followed by people saying "Well, none of the old tools work on LIDAR data, we have to build a whole bunch of new tools to analyze this higher resolution data." And that is what I thought of when watching these presentations. We've come a long way!

Access the recording at the Geomorphology Tools meeting page.


Slide from the Hyper-Resolution Geomorphic Hydrography presentation.

2/27/20 Software Development - API development for everyone

Jeremy Fee presented on Swagger and Micronaut. Jim Kreft showed the example of the Water Quality Data Portal at https://www.waterqualitydata.us/.

Learn more at:

  • https://swagger.io/ - "API Development for Everyone"
  • https://micronaut.io/ - "A modern, JVM-based, full-stack framework for building modular, easily testable microservice and serverless applications."
  • http://www.ogcapi.org/: The OGC API family of standards are being developed to make it easy for anyone to provide geospatial data to the web.

See the recording at the Software Dev meeting page!


Jim Kreft demonstrated some of the details of the National Water Quality Data Portal.


Thanks to our collaboration area group leads, who organized topics and speakers! Lightsom, Frances L.; Masaki, Derek; Hughes, David R.; Langseth, Madison Lee; Hutchison, Vivian B.; Blodgett, David L.; Signell, Richard P.; Hou, Chung Yi (Contractor); Ludwig, Kristin A.; Emily Brooks; Ramsey, David W.; Liu, Sophia; Ladino, Cassandra C.; Guy, Michelle; Newson, Jeremy K.


--


You can add yourself to a CDI collaboration area by following the instructions on this wiki page.

Usability Resource Reviews Posted - Quick reads on usability

Want to know more about when to do usability testing, how to include usability effectively, and how to foster a user-centered approach in a project? The January Usability collaboration area resource reviews address these topics. Sophie Hou will be posting resource reviews every other month in order to help us build up our usability knowledge. Check out the usability resource reviews on our wiki here.

Sharing ownership of UX, from https://www.uxmatters.com/mt/archives/2007/05/sharing-ownership-of-ux.php

1/6 Metadata Reviewers CoP - USGS Digital Object Identifier Tool

Lisa Zolly presented on "What's new with the DOI Tool?"

The USGS DOI Tool allows USGS personnel to assign globally unique, persistent, and resolvable identifiers, registered with DataCite, to USGS-funded data products. Recent improvements to the tool address usability issues and compatibility with external systems such as CrossRef DOIs for related publications, DataCite, and ORCID researcher IDs. DOI Tool FAQs link.

Find the slides, which clearly lay out the changes and the reasons for the changes, on the Metadata Reviewers Community of Practice meeting page.


The USGS DOI Tool allows for different levels of versioning of data releases.

1/7 DevOps - Serverless Platforms for improved data quality and more efficient data intake and validation - a PAD-US Pilot

Mike Giddens from Xentity presented "A Serverless Platform for Protected Areas of the U.S." He described a pilot project to explore a possible next generation of the PAD-US (Protected Areas of the U.S.) application in a "completely serverless architecture." The revised framework aims to allow for more efficient intake and validation of data, improved data quality, and capacity to publish new data products for biodiversity assessments and recreational applications. The presentation recording is available on the DevOps meetings page.

More info: PAD-US Viewer. PAD-US Data Overview.


The current interface of the Protected Areas Database of the United States.

1/9 Semantic Web WG: a semantically powered modular architecture and resources about the semantic web

The group looked at the paper "SemantEco: a semantically powered modular architecture for integrating distributed environmental and ecological data." In addition, resources on the basics of the semantic web were shared on the topics of graph databases and ontologies and reasoning. See more at the Semantic Web meetings page.

Figure detailing a subset of the wildlife ontology from the SemantEco paper.

1/13 Data Management WG - two USGS data inventory systems

Heather Schreppel and VeeAnn Cross presented on COMPASS, the data inventory system for the USGS Coastal and Marine Hazards and Resources Program (CMHRP). COMPASS provides a centralized location to collect and preserve field activity information, track data, search for data within the system, facilitate field activity management, aid data managers in the preservation of scientific data, and maintain a comprehensive data catalog.

Colin Talbert presented on DataDash: Fort Collins Science Center’s Proposal, Project, Deliverable, and Archive Tracking Dashboard. DataDash links the Center's project records with several other USGS systems to improve project tracking.

The presentation slides and recording are available on the meeting wiki page.


Slide about the COMPASS data inventory system and its purpose.

1/16 Risk CoP - Usability and engaging stakeholders

Risk CoP summary provided by Kris Ludwig:

Applying Usability to Web-Tool Design and Development – Using HERA as a Use Case. Sophie Hou provided an overview of HERA (the USGS Hazard Exposure and Analytics Tool, https://www.usgs.gov/apps/hera/), the motivations for HERA to enhance its user interface and user experience, the usability strategy and techniques the HERA team has been using for the redesign, and recent results.

Update on the Strategic Hazard Identification and Risk Assessment (SHIRA) Project: Engaging Stakeholders in the Pacific Northwest. Emily Brooks and Alice Pennaz provided an update on the SHIRA project and shared themes from stakeholder interviews with emergency managers in National Parks and National Wildlife Refuges in Washington state.

The recording and slides are available on the Risk meetings page.

In the coming months during their regular meeting time, the Risk CoP will host a series of trainings from the Impact360 alliance "Elevating the interconnected nature of research and practice to reduce impacts of natural hazards and disasters." Learn more on the Risk wiki page.


Slide from the SHIRA - engaging stakeholders in the Pacific Northwest presentation. 

1/23 Software Dev - Software policies you should know, and O365

Cassandra Ladino presented on Software Policies You Should Know, followed by a discussion of O365 Tips and Tricks. If you have ever wondered about FITARA, FedRAMP, GitLab and Federal Source Code Policy, Systems and Records Notices, the USGS Software Management Website, or your new O365 tools, this is the meeting for you. The recording is available on the SoftwareDev meetings page.


And that's it for January!

--
More CDI blog posts




12/2 Geomorphology Tools - GIS tools to locate potential sediment sources and Automated hydro-enforcement

The Geomorphology Tools group has been sharing methods that have been developed at different science centers in the USGS.

In December, GIS Tools to locate potential sediment sources in stream-channel networks was presented by Jen Cartwright of the Lower Mississippi-Gulf Water Science Center. More information about the tools is in the USGS publication: Automated identification of stream-channel geomorphic features from high‑resolution digital elevation models in West Tennessee watersheds.

Luke Sturtevant of the New England Water Science Center presented on Automated Hydro-enforcement, describing some tools that were developed to help with flood insurance studies. The tools use python and the Spatial Analyst toolbox in ArcGIS Pro to derive high resolution hydrologic models on a HUC8 scale from the most current lidar terrain at full resolution.


Image from Luke Sturtevant's presentation.

The Geomorphology Tools group is coordinated by Pete McCarthy, and recordings and more can be found on their wiki meeting notes page.

12/2 Metadata Reviewers - Questions on the metadata reviewers forum

The group reviewed questions from their forum, which collects and documents Q&A within the metadata reviewers community. The forum currently has 29 topics, including specifics about data dictionaries, metadata editors, metadata training, and more!

The Metadata Reviewers group is led by Fran Lightsom, and more can be found on their wiki meeting notes page.

12/9 Data Management - Data flow and connections between USGS data, publication, and web systems

The Data Management working group had two presentations:

How data flow from IPDS to CMS (the USGS Drupal web content management system), presented by Lance Everette, Web Reengineering Team (WRET). Lance described the data pipeline for data releases and also some details about special data such as provisional data.

Connecting USGS systems, presented by Drew Ignizio, Science Analytics and Synthesis. Drew described ongoing work that will inform how to structure information when it is collected, how different tools connect to one another, and how data creation and formats can be homogenized. This work will allow a more efficient understanding of relationships between publications, data, authors, and USGS organizations.



An image from Drew Ignizio's presentation.

The Data Management Working Group is led by Viv Hutchison and Madison Langseth, and December's recording can be found on their wiki meeting notes page.

12/10 Fire Science - USGS Fire Science Strategic Plan and Fire in Alaskan Boreal Forests

The Fire Science community of practice heard an update on the draft USGS Fire Science Strategic Plan and Stakeholder Input, led by Paul Steblein and Mark Miller.

There was also a science talk: Aquatic Ecosystem Vulnerability to Fire and Climate Change in Alaskan Boreal Forests, given by Jeffrey Falke, Alaska Fish and Wildlife Coop Unit Leader.


Slide from Paul Steblein's presentation

The Fire Science community of practice is led by Paul Steblein and Mark Miller. Meeting materials are posted on the Fire Science Forum.

12/18 Usability - Overview of Usability Concepts

The first meeting of the CDI Usability collaboration area was led by Sophie Hou. The collaboration area will alternate between two formats: live Town Hall meetings and posting of Resource Reviews. Sophie provided an overview of usability concepts including Definitions: Utility, Usability, Usefulness, User Interface (UI), User Experience (UX), Assessing Needs & Establishing Scope, Structuring Information, Creating Content and Features, and Testing & Validating User Feedback. Her presentation materials have many links to references and further information.

The group established a page to describe their usability interests and their willingness to participate in usability studies.

The key take-away: … usability is a necessary condition for survival..., it will help not only to gain users, but to develop partnerships.


Image from Sophie's presentation, retrieved from https://nimitmangal.wordpress.com/2013/09/19/what-is-usability/

The Usability group is led by Sophie Hou, who is looking for interested co-leaders. Notes, slides, and the recording from December can be found on the meeting wiki page.

12/19 Risk - Risk Analysis Meeting and Ecological risk assessment

What We Learned at the 2019 Society for Risk Analysis meeting - Kris Ludwig shared highlights from the recent Society for Risk Analysis conference.

Ecological risk assessment: demystifying our science to inform regulatory decisions - Jeff Steevens, Research Toxicologist, USGS Columbia Environmental Research Center (CERC), provided a brief overview of the science being developed by USGS CERC for aquatic ecological risk assessment. He shared a few real-world examples to show how a conceptual evaluation and a weight-of-evidence (WOE) approach can be used to integrate new techniques and data into the regulatory decision-making process. Jeff posed a couple of questions to the group, which we should revisit at a future meeting! 1) What methods/tools can we use to consolidate complex data so that it is not confusing to decision-makers? 2) How do we transition new tools/data into the decision-making process?

This month's summary is provided by Kris Ludwig.


Image from Kris Ludwig's presentation.

The Risk Community of Practice is led by Kris Ludwig, Dave Ramsey, and Emily Brooks, and the meeting materials and recording can be found on their meeting wiki page (CDI member log-in required).

--
More CDI Blog posts

11/4 Metadata Reviewers - "Metadata" means different things to different people

November's topic was "identifying different types of metadata." From the Metadata Reviewers Meeting page: …"metadata" is a word that refers to a variety of things. It would be good for USGS to develop accepted terms for different kinds of metadata. If the Metadata Reviewers Community agreed on those terms and their meanings, we could lead USGS just by the way we talk and write. Let's see what we can agree on!

11/5 DevOps - USGS Software Policy

Cassandra Ladino gave an overview of USGS Software Policy, including topics such as policy that affects purchasing and management of computer technology, authorized cloud solutions, and authorization of software installations.

See more at the DevOps wiki page.

11/12 Artificial Intelligence/Machine Learning - Classifying eagle telemetry data

Natalya Rapstine presented on AI/ML in the USGS enabled by Tallgrass: classifying golden eagle behavior using telemetry. She used a recurrent autoencoder neural network to encode sequential California golden eagle telemetry data, followed by an unsupervised clustering technique, Deep Embedded Clustering (DEC).
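To make the clustering step more concrete, here is a minimal sketch of the two quantities at the core of Deep Embedded Clustering as described in the DEC literature: a Student's t soft assignment of each embedded point to each cluster centroid, and the sharpened target distribution the model is trained toward. It is not the presenter's code. The 2-D embeddings and centroids are made-up stand-ins for the output of a trained encoder, and the function names are hypothetical.

```python
import math


def soft_assignments(embeddings, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of embedded point i to centroid j."""
    q = []
    for z in embeddings:
        weights = []
        for mu in centroids:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            weights.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(weights)
        q.append([w / total for w in weights])
    return q


def target_distribution(q):
    """Sharpened targets: square q, divide by soft cluster frequency, renormalize per point."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def kl_divergence(p, q):
    """KL(P || Q), the clustering loss that DEC minimizes."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )


# Made-up 2-D embeddings standing in for encoded telemetry sequences.
embeddings = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assignments(embeddings, centroids)
p = target_distribution(q)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
print(labels, kl_divergence(p, q))
```

In a full implementation the encoder weights and centroids would be updated by gradient descent on this loss; the sketch only evaluates it once for fixed inputs.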


Two slides from Natalya Rapstine's November AI/ML presentation.

See more at the AI/ML meetings page. The AI/ML group is going to take a two-month break and the next meeting will be in February.

11/14 Tech Stack - the Location Index Project

The Tech Stack group is hosting a series of linked data and spatial data integration talks. The discussion will be continued at the Earth Science Information Partners Winter Meeting coming up in January 2020. In November, Shane Crossman and Matt Purss presented on the Australian Location Index (LOC-I) http://locationindex.org/.


Frame from a video describing the Location Index project, which can be found at http://locationindex.org/.

CDI Tech Stack meetings are held jointly with the ESIP IT&I Tech Dive. See recording at the ESIP Tech Dive page.

11/18 Data Management - Bulk updating metadata

The Data Management working group's "Speed data-ing" session at the CDI workshop revealed a challenge about bulk updating metadata, and also corresponding expertise from other members of the group. Greg Gunther presented the Data Management Challenge - Updating Thousands of Metadata Records. Then VeeAnn Cross & Peter Schweitzer presented on Bulk Updating Metadata Records in Practice in the Coastal and Marine Hazards group.

The recording is available at the Data Management WG meeting page.

11/21 Risk CoP - Usability and Post Wildfire Response

Understanding the Value of Usability Evaluation. Mike Frame, Rachel Volentine, and Sophie Hou reviewed the general concepts of usability and user analysis, benefits seen by USGS, and current collaborations with the University of Tennessee User eXperience Laboratory. They also invited people to join the new Community for Data Integration Usability collaboration area.

After the Flames: The Science of Post Wildfire Response. Steve Sobieszczyk highlighted some of the requirements, tools, and activities tied into post-wildfire response and what it is like to deploy to an active fire incident as a burned area emergency response (BAER) team member.


Title slide from Steve Sobieszczyk's presentation.

See a more detailed meeting recap at the Risk CoP Forum.

--

More CDI Blog posts


November's monthly meeting had a visualization theme, building off a discussion that was first started at the CDI workshop this past summer. The presenters showed some examples of visualization tools in use at the USGS.

A poll showed some tools in use by the meeting attendees:

Visualization - start with a problem, not a tool.

Lindsay Platt reminded us that there are many different formats and tools for data visualization, and we should think about the user needs before committing to a specific tool.

Visualization with R

Laura DeCicco showed us some tools in R, such as ggplot2, for developing visualizations directly from the data, thus cutting down on mistakes in creating plots. She also shared many blog posts about other visualization techniques.

HoloViz

Rich Signell demoed HoloViz, "a coordinated effort to make browser-based data visualization in Python easier to use, easier to learn, and more powerful." The demo showed how compact code can create something almost like a dashboard with powerful interactive capabilities. The demo can be found at https://github.com/reproducible-notebooks/HoloViz-CDI-Demo.

Interactive Mapping Tools for decision makers.

Ben Letcher demoed dynamic visualization interfaces at ice.ecosheds.org. Multivariate filtering is one of the neat capabilities of the tool, which is built on Vue/Vuex, D3, Leaflet, and Crossfilter. He would like to have more discussions about the role of data visualizations in decision making.

Tableau

Dionne Zoanni gave an overview of the recent USGS Tableau workshop. She also shared several links to further resources for Tableau users in the USGS.


Finally, we unveiled the Community Voting Results from Phase 1 of the CDI Request for Proposals, showing widespread support for all of our 24 statements of interest.

See many more links and resources at the monthly meeting page!

--
More CDI Blog posts

10/1 DevOps - DevOps in the Water Mission Area

In October, the DevOps group heard updates about the Federal DevOps Community of Practice (Brian Fox, General Services Administration), the DevOps Federal Interagency Council (Annette Mitchell, IRS), and DevOps in the Water Mission Area (Ivan Suftin, USGS).

See more at the DevOps wiki page.

10/1 Geomorphology Tools - Riparian areas, flood inundation, and channel incision

A group discussing geomorphology tools and methods is now organized under the CDI wiki space. On October 1, Sinan Abood (USDA) presented on the USDA Forest Service National Riparian areas project (check out a storymap here). Greg Petrochenkov (USGS) and Fernando Aristizabal (NOAA) presented on Developing quick estimates of flood inundation for CONUS using an optimized GIS Flood Tool.


Image from National Riparian areas project storymap.

In a second meeting on October 31, Marina Metes presented on "Remotely detecting channel incision in headwater streams using lidar and topographic openness." She described a method to map reach-scale incision from lidar-derived digital elevation models using topographic openness, a landscape metric measuring the enclosure of an area (i.e. channel bottoms) relative to the surrounding landscape (i.e. stream banks). The method was validated with field surveys and local photogrammetric models of stream banks.

See notes and more information at the Geomorphology Tools Meetings Page.

10/7 Metadata Reviewers - Revisions to USGS Data Releases

The Metadata Reviewers group talked about the new Fundamental Science Practices guidance, a revision of Guidance on Documenting Revisions to USGS Scientific Digital Data Releases. There was a request for examples of revised data releases in ScienceBase and here are links to a few examples: 

  • Massachusetts Shoreline Change Project, 2018 Update: A GIS Compilation of Shoreline Change Rates Calculated Using Digital Shoreline Analysis System Version 5.0, With Supplementary Intersects and Baselines for Massachusetts, https://doi.org/10.5066/P9RRBEYK
  • Data for calculating population, collision and displacement vulnerability among marine birds of the California Current System associated with offshore wind energy infrastructure (ver. 2.0, June 2017), https://doi.org/10.5066/F79C6VJ0
  • Environmental DNA surveillance data for USGS streamgage sampling in the Columbia River Basin, 2018, https://doi.org/10.5066/P9Q8GCLM 

See more at the Metadata Reviewers Meeting page.

10/8 Artificial Intelligence/Machine Learning - Radiant MLHub: A Repository for Machine Learning Ready Geospatial Training Data

Hamed Alemohammad, Chief Data Scientist at Radiant Earth Foundation, presented on Radiant MLHub: A Repository for Machine Learning Ready Geospatial Training Data. Radiant Earth Foundation has established Radiant MLHub to foster sharing of geospatial training data for different thematic applications. Radiant MLHub is hosted on the cloud, and users will be able to search for different training datasets and quickly ingest them into their pipelines using an API. This presentation reviewed the architecture of Radiant MLHub, its API access, and example applications for landcover classification from multi-spectral data and surface water detection from Synthetic Aperture Radar (SAR) data.


Image from the Radiant Earth Foundation webpage.

See more at the AI/ML meetings page.

10/10 Tech Stack - Interoperability of environmental linked features

Dave Blodgett (USGS) provided an update on activities of the OGC Environmental Linked Features Interoperability Experiment (ELFIE) and a preview of activities of the Second ELFIE (SELFIE). The meeting was an interactive demonstration-based session focused on practical application of JSON-LD to link features and observations.
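As a rough illustration of what linking features and observations with JSON-LD can look like, the sketch below builds two small documents that point at each other by "@id". It is not taken from the ELFIE or SELFIE materials: every example.org URI and the result value are made up, and the choice of the W3C SOSA and schema.org vocabularies is an assumption that may differ from what the experiment actually uses.

```python
import json

# Assumed vocabularies; the ELFIE/SELFIE contexts may differ.
CONTEXT = {
    "schema": "https://schema.org/",
    "sosa": "http://www.w3.org/ns/sosa/",
}

# Hypothetical feature: every example.org URI here is made up for illustration.
feature = {
    "@context": CONTEXT,
    "@id": "https://example.org/feature/stream-gage-001",
    "@type": "sosa:FeatureOfInterest",
    "schema:name": "Example stream gage",
    # Link from the feature to an observation made about it.
    "sosa:isFeatureOfInterestOf": [
        {"@id": "https://example.org/observation/streamflow-001"}
    ],
}

observation = {
    "@context": CONTEXT,
    "@id": "https://example.org/observation/streamflow-001",
    "@type": "sosa:Observation",
    # Link back from the observation to the feature it describes.
    "sosa:hasFeatureOfInterest": {"@id": feature["@id"]},
    "sosa:hasSimpleResult": 12.5,  # made-up value
}


def linked_ids(node):
    """Collect every "@id" that a JSON-LD node points to, excluding its own."""
    found = []
    for key, value in node.items():
        if key in ("@id", "@context"):
            continue
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict) and "@id" in item:
                found.append(item["@id"])
    return found


print(json.dumps(feature, indent=2))
print(linked_ids(observation))
```

The point of the pattern is that each document stays small and self-describing, while the "@id" references let a client follow links between features and the observations about them.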


Image from the SELFIE webpage.

CDI Tech Stack meetings are held jointly with the ESIP IT&I Tech Dive. See recording at the ESIP Tech Dive page.

10/17 Risk CoP - Usability of volcano notifications

From the Risk Community of Practice meetings page:

We reviewed a few key announcements, including a preview to the FY20 Risk RFP Opportunity.

Following announcements, we heard from Carolyn Driedger, Hydrologist/PIO/Outreach Coordinator, Cascades Volcano Observatory, and Tina Neal, HVO Scientist-in-Charge, who discussed the process and results from a recent usability study of the USGS Volcano HAzards Notification System (HANS). The Communications Work Group within the USGS Volcano Science Center began to write a 'Style Guide' for its HANS information products, which convey volcano status to the public and officials. As they wrote the style guide, they recognized the need for more formalized user input on which to base their guidance, both with formatting and wording. Carolyn shared some of the documents from this study, and the group had a vibrant conversation about science communication following some questions she posed to everyone at the end of her talk.


How should a daily update for volcano notifications be formatted for best readability and understanding?

See a more detailed meeting recap at the Risk CoP Forum.

10/24 Software Development - Software Release and Inventories; USGS Cloud Environment Cookbook and Assessing lessons in scientific visualization

Recommended links about October's Software Dev topics include the USGS Software Management Website and the USGS Scientific Software IM. The group also heard short talks on two CDI statements of interest that were up for evaluation: Aaron Fox, USGS Cloud Environment Cookbook, and Steve Fick (for Michael Duniway), So you want to build a web-tool?: Assessing successes, pitfalls, and lessons learned in an emerging frontier of scientific visualization.


Front page of the USGS Software Management website.

See more at the Software Development wiki page.

--
More CDI Blog posts

--
See all CDI Blog posts
