
Blog from March 2020

Highlights from the "Advancing FAIR and Go FAIR in the U.S." Workshop

I attended the "Advancing FAIR and Go FAIR in the U.S." workshop in February; it covered how to establish and promote FAIR culture and capabilities within a community. Many of the discussions were synergistic with CDI activities, so I wanted to share some key points from the workshop with the CDI community. - Sophie Hou


(Logo from the Go FAIR Initiative)

Workshop Info 

Title: Advancing FAIR and Go FAIR in the U.S.  

Date: February 24th to 27th, 2020 

Location: Atlanta, Georgia 

Goals: 

  • Facilitate development of a community of practice for FAIR awareness and capacity-building in the US 
  • Improve understanding of FAIR technologies and how to teach them to others
  • Prepare to teach or support FAIR data management and policies for researchers, local institutions, professional organizations, and others

Link: https://www.sdsc.edu/services/data_science/research_data_services.html  

 

Overall Summary: 

  • The workshop highlighted that advancing FAIR requires communal effort. 
  • In order to "FAIRify," it is important for a community to first determine its scope, goals, and objectives. 

 

Key Notes: 

  • FAIR is an acronym for Findable, Accessible, Interoperable, and Reusable.
  • Typical challenges that a community could face when working on FAIR include:
    • Knowledge gap
    • Institutional inertia
    • Community relationship building
    • Expanding FAIR capacity
    • Finding the best way to adapt and adopt available FAIR resources
  • The ultimate goal of enabling FAIR is to allow both humans and machines (especially machines) to use digital resources, so that analytics and re-use can be optimized.
    • According to the Go FAIR Initiative (https://www.go-fair.org), FAIR can also be understood as Fully AI Ready. In other words, machines are able to know what the digital resources mean. Additionally, the digital resources are as distributed/open as possible, but can also be as central/closed as needed. (A sketch of what machine-actionable metadata might look like follows this list.)
  • Implementation of FAIR can be challenging because many concepts in the principles are multifaceted (including social, resource, and technical considerations).
  • In order to advance FAIR, it is important to first establish a good (common) understanding of the FAIR principles.
  • FAIR requires technical and disciplinary resources, but it also requires community support.
    • When implementing FAIR, we need to review choices and accept challenges: for example, deciding who our "community" is and determining what is specific to our "community".
    • FAIR is not a “standard”. The local community context is important and necessary.
  • The Go FAIR Initiative offers a 7-step "FAIRification" process: https://www.go-fair.org/fair-principles/fairification-process/ 
  • Options for conducting a FAIR event/activity with one's community include:
    • Multi-day events, expert convenings, tutorials/webinars, conferences, unconferences, hackathons, symposia, sprints, posters, etc.
  • Participants of a FAIR event/activity might have the following expectations:
    • Share best practices/resources/learn new skills
    • Tackle a problem
    • Learn new concepts/skills
    • Use FAIR as a theme or track for other topics
    • Collaborate to create a resource to be shared
    • And more!
  • Once a community has established its version of FAIR, it is important to connect with other communities. Convergence with different communities is key to growing FAIR.
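
The machine-actionability point above is worth making concrete. Below is a minimal sketch (in Python) of what machine-actionable dataset metadata might look like; the field names and URLs are hypothetical illustrations in the spirit of the FAIR principles, not a USGS or Go FAIR standard.

    # Illustrative machine-actionable metadata; all names and URLs are hypothetical.
    dataset_metadata = {
        "identifier": "https://doi.org/10.5066/EXAMPLE",  # persistent ID (Findable)
        "title": "Example streamflow observations",
        "access_url": "https://example.usgs.gov/data/streamflow.csv",  # Accessible
        "schema": "https://example.org/schemas/streamflow.json",  # shared vocabulary (Interoperable)
        "license": "https://creativecommons.org/publicdomain/zero/1.0/",  # Reusable
        "provenance": {"creator": "USGS", "created": "2020-02-27"},
    }

    # Because the fields are resolvable URIs tied to a shared schema, a program,
    # not just a person, can discover, fetch, and interpret the resource.
    for field in ("identifier", "access_url", "license"):
        print(field, "->", dataset_metadata[field])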


CDI's February meeting featured a discussion on the value of CDI to you, and a deep dive into Pangeo.

Pangeo: A flexible open-source framework for scalable, data-proximate analysis and visualization

Rich Signell, a Research Oceanographer at the Coastal and Marine Science Center in Woods Hole and a member of the Pangeo Steering Council, presented an overview of Pangeo and examples of its use in several different types of USGS workflows. The Pangeo framework is deployed by Cloud Hosting Solutions (CHS) and funded by EarthMAP as a new form of cloud-based model data analysis. Community-driven, flexible, and collaborative, Pangeo is slowly building out a set of tools with a common philosophy. In one example, Rich used a Pangeo Jupyter Notebook to process, in one minute, a dataset that had previously taken two weeks. Cloud costs, skills, cloud-optimized data, and Pangeo development are issues currently being addressed.
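
To make that workflow concrete, here is a minimal sketch of the data-proximate analysis pattern Pangeo enables, using the xarray/dask/zarr stack at its core; the bucket path and variable name are made up for illustration.

    import fsspec
    import xarray as xr

    # Hypothetical cloud-hosted Zarr store; any analysis-ready dataset works similarly.
    store = fsspec.get_mapper("s3://example-bucket/model-output.zarr", anon=True)

    # Lazy open: only metadata is read, and the data stay in object storage.
    ds = xr.open_zarr(store, consolidated=True)

    # Build the computation as a dask task graph (still no data movement)...
    monthly_mean = ds["sea_surface_temperature"].resample(time="1M").mean()

    # ...then execute it in parallel, next to the data, on a dask cluster.
    result = monthly_mean.compute()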

For more:

https://medium.com/pangeo

https://discourse.pangeo.io/

https://gitter.im/pangeo-data

Pangeo and Landsat in the Cloud

Renee Pieschke, a Technical Specialist for the Technical Services Support Contract at the Earth Resources Observation and Science Center in Sioux Falls, SD, continued our Pangeo focus with some information on Landsat in the cloud. Renee and her team are looking toward a spring release of Collection 2 data, which will dramatically increase the amount of data available. Level 2 processing will be required for the Collection 2 data (processing that tries to approximate what you would see looking at the ground, by removing disturbances such as clouds).

The Landsat Look upgrade uses a cloud-native infrastructure and the cloud-optimized GeoTIFF format. It uses the new SpatioTemporal Asset Catalog metadata to access the data programmatically. The new Landsat Look can filter pixels with a QA band so that clouds, shadows, snow, ice, or water are removed to produce the best possible image.
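
As a rough illustration of that QA-band filtering, the sketch below masks flagged pixels with simple bit tests; the file names and bit positions are placeholders, since the authoritative flag layout is defined by the product's QA specification.

    import numpy as np
    import rasterio

    # Placeholder bit positions for a Landsat-style QA band.
    FLAG_BITS = {"cloud": 3, "shadow": 4, "snow": 5, "water": 7}

    with rasterio.open("qa_pixel.tif") as src:   # hypothetical file names
        qa = src.read(1)
    with rasterio.open("red_band.tif") as src:
        red = src.read(1).astype("float32")

    # A pixel is unwanted if any of the flag bits is set.
    unwanted = np.zeros(qa.shape, dtype=bool)
    for bit in FLAG_BITS.values():
        unwanted |= ((qa >> bit) & 1).astype(bool)

    red[unwanted] = np.nan  # exclude flagged pixels when compositing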

The SpatioTemporal Asset Catalog was developed to help standardize metadata across the entire geospatial data provider community, using a simple JSON structure. It normalizes common names, simplifies the development of third-party applications, and helps enable querying in Pangeo. Another in-progress goal is connecting with Landsat data in the cloud: getting the data there involves converting it to the cloud-optimized GeoTIFF format, and data in this form is already fueling the back end of Landsat Look.
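
To give a feel for what that programmatic access looks like, here is a minimal search sketch using the pystac-client library; the catalog URL and collection name are assumptions for illustration.

    from pystac_client import Client

    # Hypothetical STAC API endpoint and collection id, for illustration only.
    catalog = Client.open("https://landsatlook.usgs.gov/stac-server")
    search = catalog.search(
        collections=["landsat-c2l2-sr"],     # assumed collection id
        bbox=[-103.0, 43.0, -96.0, 46.0],    # roughly South Dakota
        datetime="2020-06-01/2020-06-30",
        max_items=10,
    )

    for item in search.items():
        # Each STAC item is simple JSON: an id, a geometry, a datetime, and
        # "assets" linking to the cloud-optimized GeoTIFF files.
        print(item.id, item.datetime, list(item.assets))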

USGS users can access Pangeo and some test notebooks through http://support.chs.usgs.gov/ and code.usgs.gov. More information is available on the meeting slides.

Why is CDI valuable to you? Why do you participate?

A poll administered on sli.do asked participants what the value of CDI is to them. Some responses are below.

"I like to hear about (and share) the cool work folks are doing throughout the USGS! The Communities are valuable because they allow folks to share innovative research and discuss ways we can do so while following Department, Bureau, Mission Area policy."
"CDI provides relevant, useful, and timely data management related issues, projects, and tools."
"I learn about new technology applications and learn of colleagues I might collaborate with."
"The CDI helps me to get my work done in my daily job! I find the people who are part of the CDI are amazing to interact with - they are engaged, enthusiastic, and interested in making things better at USGS. CDI has made me feel like I am more in touch with the USGS - there is so much going on in this Bureau, and CDI keeps me informed and makes me feel like I am part of something bigger than just my daily job."
"Demonstrate that best practices in data sci/software/etc. is important to colleagues."
"Diverse community, wide range of experience and expertise."

More information on the meeting, including notes, links, slides, and video recordings, is available here.


January's monthly meeting covered how to evaluate web applications and better understand how they are working for users, and explored well-established strategies for USGS crowdsourcing, citizen science, and prize competition projects. 

Application Evaluation: How to get to a Portfolio of Mission Effective Applications 

Nicole Herman-Mercer, a social scientist in the Decision Support Branch of the Water Resources Mission Area's Integrated Information Dissemination Division, presented on how to evaluate web applications based on use, value, impact, and reach, as defined below. 

Use  

Definition: take, hold, view, and/or deploy the data/application as a means of accomplishing or achieving something. 

  • How many people use this application? 
  • How many are new users? 
  • How many are returning users? 
  • Are users finding what they need through this site/application? 

Herman-Mercer used Google Analytics to answer some of these questions. Google Analytics provided information such as total daily visits, visits through time, what pages users are visiting and how they're getting there (links from another website, search, or direct visits), how often they're visiting, how many repeat visits occur, and how long users spend on individual pages. 
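
As a rough sketch of this kind of analysis, the snippet below summarizes new versus returning users from a hypothetical CSV export of Google Analytics daily metrics (the column names are assumptions).

    import pandas as pd

    # Hypothetical export with columns: date, users, newUsers, sessions.
    ga = pd.read_csv("analytics_export.csv", parse_dates=["date"])

    # Note: summing daily user counts double-counts repeat visitors across days,
    # so this is an approximation, not a count of unique people.
    total_users = ga["users"].sum()
    new_users = ga["newUsers"].sum()
    print(f"total: {total_users}, new: {new_users}, returning: {total_users - new_users}")

    # Visits through time, aggregated by month.
    monthly_sessions = ga.set_index("date")["sessions"].resample("MS").sum()
    print(monthly_sessions)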

Value 

Definition: The importance, worth, and/or usefulness of the application to the user(s) 

  • How willing are users to pay for the application? 
  • How important is this application to the user's work and/or life? 
  • What/how large would the impact of the loss of this application be to the user? 

To estimate the value of selected applications to users, an electronic survey was sent to internal water enterprise staff, which asked respondents to indicate which applications they used for work, and then to answer a series of questions about those applications. Questions attempted to pinpoint how important applications were to users, and how affected their work would be should the application be decommissioned. 

Impact 

Definition: The effect the application has on science, policy, or emergency management 

  • How many scientific journal articles use this application? 
  • Is this application relevant for policy decisions? 
  • Do emergency managers use this application? 

Publish or Perish, a software tool for text mining, was used to get at some of these data points. Publish or Perish searches a variety of sources (Google Scholar, Scopus, Web of Science, etc.) and returns any citations that applications are getting. Attempts to search for policy document citations have proven more difficult, so that dimension was not factored into this evaluation.

Reach 

Definition: How broadly the application reaches across the country and into society 

  • Where are users? (Geographically) 
  • Who are users? (Scientists? Academia? Government?) 

Google Analytics was again used to gather visits by state, which was then compared with the state population to get an idea of use. These analytics could also identify which networks users are on, i.e., .usgs, .gov, or .edu. Finally, an expert survey was deployed, surveying users who developed the application or currently manage it to get a sense of who the experts think the intended and actual audience is. 
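
A minimal sketch of that normalization step, assuming two hypothetical CSVs (sessions by state from Google Analytics, and state population estimates):

    import pandas as pd

    visits = pd.read_csv("visits_by_state.csv")       # columns: state, sessions
    population = pd.read_csv("state_population.csv")  # columns: state, population

    reach = visits.merge(population, on="state")

    # Sessions per 100,000 residents puts small and large states on one scale.
    reach["sessions_per_100k"] = reach["sessions"] / reach["population"] * 100_000
    print(reach.sort_values("sessions_per_100k", ascending=False).head(10))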

Contact Nicole at nhmercer@usgs.gov for a detailed report on the full evaluation. 

Herman-Mercer's team was inspired by Landsat Imagery Use Case studies. 

USGS Open Innovation Strategy for Crowdsourcing, Citizen Science, and Competitions 

Sophia Liu, an Innovation Specialist at the USGS Science and Decisions Center in Reston, VA, as well as the USGS Crowdsourcing and Citizen Science Coordinator and Co-Chair of the Federal Community of Practice for Crowdsourcing and Citizen Science, presented an overview of well-established USGS crowdsourcing, citizen science, and prize competition projects. 

Citizen science, crowdsourcing, and competitions are all considered by Liu to be types of open innovation. Definitions of these terms are as follows: 

  • Citizen science: public participation or collaboration with professional scientists requesting voluntary contributions to any part of the scientific research process to enhance science. 
  • Crowdsourcing: a way to quickly obtain services, ideas, or content from a large group of people, often through simple and repeatable micro tasks. 
  • Competitions: challenges that use prize incentives to spur a broad range of innovative ideas or solutions to a well-defined problem. 

A popular example of citizen science/crowdsourcing is citizen seismology or public reports of earthquakes, like Did You Feel It? 

Liu has documented about 44 USGS crowdsourcing and citizen science projects and 19 USGS prize competitions. Some examples of open innovation projects and information sources are listed here.

During the presentation, participants were asked to use a Mentimeter poll to answer short questions and provide feedback on the talk.

Sophia is looking for representatives from across all USGS mission areas, regions, and science support offices interested in giving feedback on the guidance, catalog, toolkit, and policies she is developing for the USGS Open Innovation Strategy. Feedback can be provided by joining the USGS Open Innovation Strategy Teams Site or emailing her at sophialiu@usgs.gov. 

See the recording and slides at the meeting page. 

You'll probably want to join a new collaboration area after reading all of this exciting news. You can do that by following the instructions on this wiki page. You can also get to all CDI Collaboration Area wiki pages here. We recently added quick-link buttons for meeting content to most collaboration area wiki pages.

2/3/20 Metadata Reviewers - Persistent unique identifiers for USGS metadata records

The group had a conversation about the need for persistent unique identifiers in USGS metadata records that could be used across different government systems, including usgs.gov and data.gov. Lisa Zolly presented some slides to frame the conversation, and takeaways are on the Metadata Reviewers Meetings Page.


Slide from the Metadata Reviewers Community of Practice February meeting.

2/4/20 DevOps - USGS Map Production on Demand (POD)

February's DevOps meeting was like a Valentine's Day-themed love letter to the partnership between Dev and Ops. (Yes, that is a subjective opinion.)  Andy Stauffer (Dev) and Robert Djurasaj (DevOps) combined to present on "Automating the Deployment of a National Geospatial Map Production Platform Using DevOps Workflows." A dedicated DevOps team was critical for scaling up workflow and infrastructure for the Map Production On Demand (POD) system. See the recording at the DevOps Meeting page.


Slide from the DevOps February meeting presentation.

2/10/20 Data Management - What's the value of your project?

The Data Management working group meeting used an interactive virtual format for small-group discussions about developing data management value propositions. Science Gateways Community Institute superstars Claire Stirm and Juliana Casavan presented the essence of value propositions: a clear understanding of the unique value your project delivers to your users or stakeholders. Virtual breakout groups held discussions and came up with many answers to "Why is CDI important for data managers?" Claire and Juliana's tips on value propositions included being succinct and developing different value propositions for different audiences. See the slides and the value propositions at the meeting notes page!


A general formula for developing a value proposition statement.

2/13/20 Tech Stack - Urban Flooding Open Knowledge Network

Mike Johnson from UC Santa Barbara presented on the Urban Flooding Open Knowledge Network. This is an exciting stakeholder-driven knowledge network project with emphasis on prototyping interfaces and web resources. See other joint CDI Tech Stack and ESIP IT&I webinars on the ESIP page.

2/19/20 Usability - Choosing Usability Techniques

Sophie Hou presented on "Choosing Usability Techniques." The process starts with establishing the context: What are you trying to learn, and why? What are the gaps? Next, you decide on the types of data needed: attitudinal vs. behavioral, and qualitative vs. quantitative. Thank you to Sophie for continuing to build our knowledge about how to improve usability in our projects! See the slides and the notes on the Usability meeting page.


Slide from the Usability group's February presentation showing the differences between different types of usability data.

2/20/20 Risk CoP - Human-centered design, frame innovation, and systems theory (Panarchy)

The Risk CoP hosted guest Scott Miles from Impact360 to kick off a special training series. Scott's presentation was loaded with information!

From the Risk Meetings page: This was the kickoff meeting for a series of training webinars provided by Impact360 Alliance on human-centered design thinking and inclusive problem solving. We met the Impact360 team and were introduced to the foundations and terminology of Impact360's toolkit, Toolkit360. Toolkit360 is a rigorous, intentional process to collectively amplify researcher and practitioner superpowers to integrate knowledge and action, unlike doing it "the way we've always done it" or working in silos. Toolkit360 fuses processes and methods from human-centered design, frame innovation, and systems theory (Panarchy). The Toolkit360 process uses 12 tools to bridge the problem and solution spaces with situation assessment, stakeholder alignment, problem framing, and prototyping. Future training webinars during the Risk CoP's March and April monthly meetings will take deeper dives into these tools.

Access the slides and recording at the Risk Meetings page (must log in as a CDI member, join here if you're not a member yet).


Slide from the Risk Community of Practice's February meeting.

2/26/20 Open Innovation

Sheree Watson presented on Using Open Innovation to Engage Youth and Underserved Communities in the Pacific Islands. Sophia B. Liu followed with a discussion of why Open Innovation matters and how to participate in the USGS Open Innovation Strategy. See more ways to get involved in USGS Open Innovation at the Open Innovation wiki site!

2/27/20 Geomorphology Tools - Floodplain and Channel Evaluation Tool (FACET) and Hyper-Res Hydrography

Peter Claggett and Labeeb Ahmed presented on Mapping Channels & Floodplains: Hyper-Res Hydrography and FACET. FACET stands for Floodplain and Channel Evaluation Tool; more information can be found at https://code.usgs.gov/water/facet.

Matt Baker (University of Maryland, Baltimore County) presented on 'Hyper'-resolution Geomorphic Hydrography: Methods, advantages, and shifting paradigm.

I always try to make myself look like I've been around for a long time with references to "do you remember when?" and I will do that now. I remember when the first LIDAR images started showing up at conferences: presenters would first show a standard (at the time) 30-meter DEM and then flip to the next slide showing 1-meter LIDAR, and the whole audience would gasp in astonishment. Then people would say, "Well, none of the old tools work on LIDAR data; we have to build a whole bunch of new tools to analyze this higher resolution data." That is what I thought of when watching these presentations. We've come a long way!

Access the recording at the Geomorphology Tools meeting page.


Slide from the Hyper-Resolution Geomorphic Hydrography presentation.

2/27/20 Software Development - API development for everyone

Jeremy Fee presented on Swagger and Micronaut. Jim Kreft showed the example of the Water Quality Data Portal at https://www.waterqualitydata.us/; a minimal sketch of querying its web services follows.
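
The snippet below requests station records from the portal with Python; the endpoint and parameter names follow the portal's documented web services as best I recall, so treat them as illustrative rather than authoritative.

    import requests

    # Illustrative station query; parameter names are assumptions based on the
    # portal's web-service documentation, not verified here.
    response = requests.get(
        "https://www.waterqualitydata.us/data/Station/search",
        params={"statecode": "US:55", "characteristicName": "Nitrate", "mimeType": "csv"},
        timeout=60,
    )
    response.raise_for_status()
    print(response.text[:500])  # first few rows of the returned CSV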

Learn more at:

  • https://swagger.io/ - "API Development for Everyone"
  • https://micronaut.io/ - "A modern, JVM-based, full-stack framework for building modular, easily testable microservice and serverless applications."
  • http://www.ogcapi.org/ - The OGC API family of standards is being developed to make it easy for anyone to provide geospatial data to the web.

See the recording at the Software Dev meeting page!


Jim Kreft demonstrated some of the details of the National Water Quality Data Portal.


Thanks to our collaboration area group leads, who organized topics and speakers! Lightsom, Frances L.; Masaki, Derek; Hughes, David R.; Langseth, Madison Lee; Hutchison, Vivian B.; Blodgett, David L.; Signell, Richard P.; Unknown User (chungyihou@usgs.gov); Ludwig, Kristin A.; Emily Brooks; Ramsey, David W.; Liu, Sophia; Ladino, Cassandra C.; Guy, Michelle; Newson, Jeremy K.


--

More CDI blog posts