Blog

  • 7/2/18 Metadata Reviewers Community of Practice - metadata requirements for legacy data sets
  • 7/9/18 Data Management Working Group - Metadata Wizard and a Science Data Catalog and data.gov update.
  • 7/17/18 eDNA Community of Practice call - Data Release
  • 7/18/18 Citizen-Centered Innovation Monthly Meeting - report to Congress about crowdsourcing, citizen science, and prizes and challenge competitions
  • 7/26/18 Software Development Cluster - Git Fork and Feature Branch Workflow

7/2/18 Metadata Reviewers Community of Practice - metadata requirements for legacy data sets

In July, the Metadata Reviewers Community took a look at the newly proposed metadata requirements for the old "legacy" data sets that have been traditionally released on USGS web sites. The USGS Web Re-engineering Team ("WRET") is planning on using these metadata to display legacy data. Lisa Zolly introduced the topic and led the discussion.

Metadata Reviewers Community Meeting page.

7/9/18 Data Management Working Group - Metadata Wizard and a Science Data Catalog and data.gov update.

July’s theme was Metadata! Ben Wheeler gave a brief update on Science Data Catalog (SDC), an important part of the USGS Public Access Plan.

Colin Talbert provided an overview and demo of the Metadata Wizard (version 2.0!), a tool for creating robust metadata.


See slides and recording on the meeting wiki page.

7/17/18 eDNA Community of Practice call - Data Release

JC Nelson led a discussion on data release issues, and recent developments in guidelines for data sharing agreements and software release policies and issues. Pete Ruhl is currently compiling examples of eDNA (environmental DNA)-related data releases. He can be contacted at pmruhl@usgs.gov.

Visit the eDNA wiki page.

7/18/18 Citizen-Centered Innovation Monthly Meeting - report to Congress about crowdsourcing, citizen science, and prizes and challenge competitions

The group began discussions on some upcoming federal activities related to crowdsourcing, citizen science, and prizes & challenge competitions as well as upcoming meetings in DC related to crowdsourcing and citizen science. The group will discuss the same topic at next month's meeting on Wednesday, August 15 with more updated materials.

  • A report to Congress about federal crowdsourcing and citizen science activities is due this winter, at the end of 2018. This report is required by the American Innovation and Competitiveness Act (of which the Citizen Science and Crowdsourcing Act is a component).

  • A draft of the Form for Collecting CCS Projects from the White House Office of Science and Technology Policy (OSTP) and the Science and Technology Policy Institute (STPI).

For more information, contact Sophia Liu, sophialiu@usgs.gov.

Citizen-Centered Innovation wiki page.

7/26/18 Software Development Cluster - Git Fork and Feature Branch Workflow

Carl Schroedl from the USGS Water Mission Area gave us an in-depth look at the Git Fork and Feature Branch Workflow. This method works well for his group, which incorporates code reviews in their workflow. 

  • Read more at this blog post on using the Fork-and-Branch Git Workflow

  • There are countless possible workflows in Git; adjust one to make it work for you. Atlassian is a good resource for learning about different workflows: https://www.atlassian.com/git/tutorials/comparing-workflows

  • If this workflow seems too complicated for your purposes, you could simplify it by skipping branches within a fork, or possibly by skipping forks altogether, but doing so takes away from the motivation and the benefits of getting code reviews.

  • Q: What about branch naming conventions? A: You can use an issue code/identifier. This can help you trace back to details on the issue or motivation for the code changes.
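The branch naming answer above can be sketched in a few lines of Python. This is only an illustration, not something shown at the meeting: the function names, the `WMA-123` style issue identifier, and the 50-character limit are all assumptions made up for the example.

```python
import re
from typing import Optional


def feature_branch_name(issue_id: str, summary: str, max_len: int = 50) -> str:
    """Build a branch name such as 'WMA-123-fix-gauge-parser' (hypothetical format)."""
    # Lowercase the summary and collapse every run of non-alphanumeric
    # characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    name = f"{issue_id}-{slug}" if slug else issue_id
    # Trim long names, then drop any hyphen left dangling by the cut.
    return name[:max_len].rstrip("-")


def issue_id_from_branch(branch: str) -> Optional[str]:
    """Recover a leading issue identifier (letters, hyphen, digits) from a branch name."""
    match = re.match(r"[A-Za-z]+-\d+", branch)
    return match.group(0) if match else None


branch = feature_branch_name("WMA-123", "Fix gauge parser!")
print(branch)                        # WMA-123-fix-gauge-parser
print(issue_id_from_branch(branch))  # WMA-123
```

Keeping the identifier at the front of the name means a reviewer can go from any branch straight back to the issue that motivated it.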


This is a diagram of a branching workflow. The new feature (purple) is merged into the stable project. More info at the Atlassian page on Feature Branch Workflow.

Software Development Cluster wiki page

--
More CDI Blog posts



At the Community for Data Integration July 11, 2018 monthly meeting, we heard about two programs in the USGS, the STEP-UP program and the Cloud Hosting Solutions program.

Chris Hammond told us about the STEP-UP (Secondary Transition to Employment Program - USGS Partnership) program. STEP-UP provides employment training to young adults (ages 18-22) with cognitive and other disabilities. Despite these disabilities, they may be highly competent at certain tasks, for example data preparation tasks. The overview explained how the program works and described several success stories. The CDI has recently heard from a number of research groups that are looking for solutions for migrating legacy data or websites, so we introduced the STEP-UP program as a possible solution to investigate further. To learn more, get in touch with Chris at chammond@usgs.gov.

We also heard an update on the latest services provided by USGS Cloud Hosting Solutions (CHS) from Jennifer Erxleben and Harry House. Cloud Hosting Solutions (CHS) is the required, supported, secure Cloud offering for USGS Science Centers and mission programs. Jennifer and Harry told us about CHS managed services, CHS custom services, and the sandbox environment. They also went over some example projects and costs. After the presentation, we had an active 30-minute long Q&A session, which is documented on the meeting page.

The best way to get started with CHS is to email cloudservices@usgs.gov.

See the recording and slides, and Q&A here.


--

Things learned in the writing of this blog post: What is SNP, how do you pronounce SNP, and what USGS research involves SNPs? What is cloud.gov? Where can I find an up-to-date list of USGS science centers that are used in ScienceBase and the USGS Science Data Catalog?

  • 6/4/2018 Metadata Reviewers Community of Practice - metadata implementation and FAIR metrics
  • 6/5/2018 DevOps - cloud.gov and NASA’s EOSDIS data system
  • 6/11/2018 Data Management Working Group - USGS data source list and metadata implementation
  • 6/14/2018 TechStack Meeting - analyzing massive video data in the cloud
  • 6/19/2018 Bioinformatics Community of Practice and SNPs

6/4/2018 Metadata Reviewers Community of Practice - metadata implementation and FAIR metrics

Ray Obuch provided an overview of the new Department of the Interior Metadata Implementation Guide, available at https://doi.org/10.3133/tm16A1.


The group also examined the proposed FAIR metrics, mentioned at a recent CDI Monthly Meeting, which have a lot to do with metadata. (FAIR stands for findable, accessible, interoperable, reusable.)

Peter Schweitzer shared this link about a similar but different way of thinking about the problem, "5 Star Open Data," https://5stardata.info/en/.

See meeting notes here.

6/5/2018 DevOps - cloud.gov and NASA’s EOSDIS data system

At the DevOps Project Management Sync meeting, topics included

  • An update from USGS Cloud Hosting Solutions (CHS)

  • An update on the USGS Software Management website, which is under development (Cassandra Ladino, USGS)

  • A whirlwind tour of cloud.gov and the default DevOps pipeline to deploy applications to it (Andrew Burnes, 18F). Cloud.gov is a secure, fully compliant Platform as a Service (PaaS), built specifically for government work. Find out more at What is cloud.gov?

At the DevOps SysAd/Dev Sync, Dan Pilone of Element84 presented on Supporting NASA’s Earth Observing System Data and Information System (EOSDIS). “NASA's Earth Observing System Data and Information System (EOSDIS) is working towards a vision of a cloud-based, highly-flexible, ingest, archive, management, and distribution system for its ever-growing and evolving data holdings. This effort is emerging from its prototype stages and is poised to make a huge impact on how NASA manages and disseminates its nearly 30PBs Earth science data as that grows to over 300PBs in the coming years. This talk outlines the motivation for this work, presents the achievements and hurdles of the past 18 months and charts a course for the future expansion of NASA’s cloud based EOSDIS.”

Bonus reading: EOSDIS Science System Description; EOSDIS Handbook v1.3

6/11/2018 Data Management Working Group - USGS data source list and metadata implementation

Drew Ignizio presented on the data source list that is used in ScienceBase and the USGS Science Data Catalog. One way this data source list is used is to attribute official USGS data releases to their related USGS science center (the data source). As USGS science centers merge or otherwise change name, having an up-to-date authoritative list is important, not just for ScienceBase and Science Data Catalog, but for linking many other systems in the USGS. The list that Drew previewed during the talk can be accessed at https://www.sciencebase.gov/directory/organizations?displayHints=SDC_List.

Ray Obuch (USGS) provided an overview of the new Department of the Interior Metadata Implementation Guide, available at https://doi.org/10.3133/tm16A1.

See the recording here.

6/14/2018 TechStack Meeting - analyzing massive video data in the cloud

Tim Crone of Lamont-Doherty Earth Observatory presented on Analysis of Massive Underwater Video Data in the Cloud using Pangeo.

Summary: An open-source environment for parallel analysis of massive (100TB) image data in the Cloud is now available via the Pangeo environment, which allows you to apply the power of the Python ecosystem from your browser. Technologies include JupyterHub, Kubernetes, Docker, and Dask distributed. Learn more about Pangeo at https://pangeo-data.github.io.

See the recording here.

6/19/2018 Bioinformatics Community of Practice and SNPs

The Bioinformatics Group conducted a survey and had a discussion to gauge interest in various genome analysis topics and outlets for technical exchange. There was interest in analyzing SNP datasets and sharing knowledge about current SNP projects. See meeting notes here.

Perhaps you are wondering, “What is an SNP, and how does USGS use SNPs?”

SNP stands for single nucleotide polymorphism, and is pronounced “snip” (plural “snips”).

An SNP is a variation in a single nucleotide that occurs at a specific position in the genome, where each variation is present to some appreciable degree within a population (e.g. > 1%). (https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism).

SNPs can generate biological variation between two members of a species. Those differences can in turn influence a variety of traits such as appearance, disease susceptibility or response to drugs. (https://www.23andme.com/gen101/snps/)
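To make the "> 1%" part of the definition concrete, here is a minimal Python sketch. It is a toy illustration with made-up allele calls, not a real genotyping workflow, and the function names and the default threshold are assumptions chosen to mirror the definition quoted above.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Return {allele: fraction of samples} for the calls at one genome position."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_snp(alleles, threshold=0.01):
    """True if at least two different alleles each exceed the frequency threshold."""
    freqs = allele_frequencies(alleles)
    common = [allele for allele, freq in freqs.items() if freq > threshold]
    return len(common) >= 2


# Made-up calls at a single position for 200 samples: 5% carry a G instead of an A.
position_calls = ["A"] * 190 + ["G"] * 10
print(allele_frequencies(position_calls))  # {'A': 0.95, 'G': 0.05}
print(is_snp(position_calls))              # True
```

A variant seen in only one sample out of a thousand would fall below the threshold and would not count as an SNP under this definition.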

USGS studies SNPs of many species. A quick Pubs Warehouse search returns recent studies on steelhead trout, wolves, prairie falcon, fungus, salmonid, and the Florida panther. See more at https://pubs.er.usgs.gov/search?q=SNPs


SNPs explanation from https://www.23andme.com/gen101/snps/


--


The USGS has a large and active community using Geographic Information Systems tools, including Esri ArcGIS, QGIS, Python, R, gdal, and many others. Recently, the CDI has become more involved in co-hosting presentations and activities to discuss GIS technology, enterprise solutions, and challenges.

Shane Wright, Roland Viger, and Andy Lamotte are a few of the USGS folks that are helping to coordinate this community. (Thanks!)

After gauging community interest last winter, the CDI recently helped to host a two-part series on ArcGIS Pro.

Part 1: Transition from ArcGIS Desktop to ArcGIS Pro, and

Part 2: ArcGIS Python API.


James Sill from Esri demonstrated capabilities of the new ArcGIS Pro in two presentations.

The CDI is also happy to help promote the meetings of the Alaska GIS and Data Science Webinar (contact: Evan Thoms).

You can access further information, presentation recordings, and the GIS Forum at the CDI GIS Focus Group wiki page.

--

June’s Python for Data Management Training Series reached over 380 participants, addressing the topics of Working with Local Files, Batch Creating and Updating Metadata, and Automation with PySB (python tools for working with ScienceBase).

Drew Ignizio and Madison Langseth of Core Science Analytics, Synthesis, and Library, led the three 1.5 hour sessions for participants of varying levels of Python experience. The training made use of the Jupyter notebook and Python bundle that ships with the USGS Metadata Wizard 2.0. Especially helpful were the example Jupyter notebooks and data files that were supplied as course resources, allowing participants to execute code in real time, and have a copy of the code for future modification and use.

I attended the first two sessions and am getting ready to watch the recording of the third session, and I know I am not alone in telling Drew and Madison: Thanks! Great job! Very helpful! Good work! YOU WERE AMAZING! (Because those are all direct quotes from the feedback form.)

After these short sessions, I feel that I have the knowledge I need to get started with Jupyter notebooks for my own purposes.

If you missed them, you can download the course resources and watch the recordings on the course wiki site: Python for Data Management.


Behind the scenes with our excellent instructors for the Python for Data Management Training Series:

--

At the June 13, 2018 Monthly Meeting, CDI had a data visualization extravaganza.

Timely and digestible data visualizations

Jordan Read, the chief of the Water Mission Area’s Data Science Branch, spoke on “Amplifying USGS science with timely and digestible data visualizations.” Data show that web visualizations can reach a far larger audience than formal reports. Jordan gave us a view into the process of creating time-sensitive visualizations, such as those that address incoming hurricanes. He also noted that methods that allow reproducibility are key for efficiency, and that there are benefits of communicating more frequently between different mission areas in the USGS about data visualization techniques and projects.



Dashboard building software packages - Tableau, ArcGIS Online, and PowerBI

A team from the USGS Western Geographic Science Center presented on “Data visualization for science: comparing 3 dashboard building software packages.” Despite an ill-timed power outage at the Menlo office, Kevin Henry and Jason Sherba (and Jeff Peters in spirit) told us about their experiences with Tableau, ArcGIS Online, and PowerBI in visualizing data for hazard exposure analysis. They told us about the pros and cons, summarized in a slide that may “live on in infamy” (shown below). They stressed that the best platform will depend on your specific case, and encouraged further sharing of people’s experiences with data visualization platforms.



Other news from the monthly meeting:

  • To help the process of formalizing USGS data sharing agreement guidelines, JC Nelson asked for your examples of when you needed to sign data sharing agreements with another agency. Contribute here.

  • We polled you on tracks and topics for the 2019 CDI Workshop (June 4-7, 2019 in Boulder, CO). Top three tracks (voted by the CDI distribution list): Data visualization, data management, and data science! See more results at the monthly meeting page.


See the full meeting notes, slides, and recording at the monthly meeting page.



Disclaimer: Any use of trade, product, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the U.S. Government. 

--


See all of the CDI Collaboration Areas


Git, Remedy, and JIRA

USGS DevOps 5/1/2018

If you are interested in updates on Git and different task management systems like Remedy and JIRA, this is the group to watch. Eric Martinez from USGS kept us up to date with Git, and two presenters from Tasktop (a provider of software integration solutions) presented at the May 1st meeting about integrating USGS IT and System Development teams by sharing information from their ticketing systems. See the DevOps wiki space for more information.

Genetics Guide to Data Release and Associated Data Dictionary

Metadata Reviewers Community of Practice 5/7/2018

The group discussed the Genetics Guide to Data Release and Associated Data Dictionary, which was compiled by Barbara Pierson at the Alaska Science Center. See the Metadata Reviewers Meetings page for more information.

Semantic Web news and ideas

Semantic Web Working Group 5/10/2018

The group shared news and ideas, such as using the new Quality Management System (QMS) for USGS as a good example with which to start developing a data dictionary database, a blog post about the Semantic Web by Ken Bagstad, and a competency framework for professional development in the use of linked data at http://explore.dublincore.net/. Semantic Web working group meetings page.

Major enhancements to NetCDF-CF

Tech Stack Working Group 5/10/2018

What’s new with the Network Common Data Form - Climate and Forecast?

"NetCDF-CF Advances - Simple Geometries, Swaths, and Groups" was the May topic for the Tech Stack group. Speakers were Dave Blodgett (USGS), Tim Whiteaker (UT Austin), Aleksandar Jelenak (HDF Group) and Daniel Lee (EUMETSAT). Simple geometry (points, lines, and polygons) has now been accepted as part of the Open Geospatial Consortium’s NetCDF-CF specification. This is a major enhancement to a widely used standard whose utility has previously been limited to time-series of point or (raster) coverage data only. Advances on Groups and Swaths were also presented. Exciting! Tech Stack and ESIP Tech Dive meeting page.

Integrative, FAIR, multidisciplinary modeling

Data Management Working Group 5/14/2018

Data management to support integrative, FAIR, multidisciplinary modeling: Lessons from the last decade and paths forward - Ken Bagstad, USGS.

Since 2007, the Artificial Intelligence for Ecosystem Services (ARIES) project has been developing an open-source software package, modeling language, and data repository to enable integrated, multidisciplinary environmental and Earth systems modeling (more details here http://www.integratedmodelling.org/). Data Management Working Group May Meeting page.

Open Source Software Licenses - way more interesting than I thought

Software Development Cluster 5/31/2018

Software Licensing Aspects, Leon Foks, USGS

There are many ways to license software as open source. Using his own example of software developed for USGS, Leon Foks walked us through what he learned about placing proper licenses on USGS-produced software that may have some special considerations. It is much more interesting than just “Use CC0”. USGS software products must be in the public domain, meaning that copyright is waived. However, it is best practice to apply an approved Open Source license to the software. Usually it is advised that we apply the CC0 license. However, we learned that in some cases, it is necessary and possible to release code with a dual license, with different licenses (e.g., CC0, MIT, BSD, LGPL, GPL) applying to different parts of the code. (You would explain the intricacies of the licensing in the README.md.) Maybe I’m weird, but this presentation blew my mind. See the recording on the Software Development Cluster meetings page, and those with access to the USGS GitLab instance can view an .ipynb at https://code.usgs.gov/nfoks1/Software_Licensing.

--

At the May 9, 2018 CDI Monthly Meeting, we heard from three FY17 CDI Funded Projects:

Developing APIs to support enterprise level monitoring using existing tools

(Brian Reichert, Fort Collins Science Center and Becca Scully, PNAMP). NABat database (North American Bat database), NABat web portal and MonitoringResources.org are now linked with two way APIs, which will help collaborators to coordinate their sampling efforts.


Extending ScienceCache—a Mobile Application for Data Collection—to Accommodate Broader Use within USGS

(Mark Wiltermuth, Northern Prairie Wildlife Research Center). The ScienceCache application has been extended to use a more flexible data model, allowing more types of mobile app data collection. The testing phase will come later this year; contact Mark Wiltermuth (mwiltermuth@usgs.gov) if interested.


Evaluation and testing of standardized forest vegetation metrics derived from lidar data

(John Young, Leetown Science Center). A project on deriving vegetation metrics from lidar data provides 10m and 25m products for use by stakeholders such as staff at Shenandoah National Park.



A little bit more information about USGS lidar in the cloud from Jason Stoker:

There was a question from someone about the status of lidar in the cloud. We are in the process of replicating all lidar point cloud data that we serve on FTP in S3 as well. Our plan is to have all ~10 Trillion lidar points (110+ TB) that we have in archive on S3 by the end of the year. I believe we have a little more than half out there now, and growing.

Due to the large volumes of data (and small budgets) we only provide lidar point cloud data as a 'requester pays' option in S3. It is still free to access via FTP.

A web page with instructions can be found here:

https://viewer.nationalmap.gov/laz_in_cloud_instructions/


We also announced the save-the-date for the next CDI Workshop: June 4-7, 2019 in Boulder, CO!

Other topics included the FAIR data principles and metrics (https://github.com/FAIRMetrics/Metrics) for measuring how “FAIR” data is (Findable, Accessible, Interoperable, Reusable). It's so unFAIR!


See the meeting recording, slides, and more details at the May Monthly Meeting Page!

--

Highlights from the last month of CDI Collaboration Area activity:

Metadata Reviewers Community of Practice 4/2/18

The group took a look at the Genetics Guide to Data Release and Associated Data Dictionary, which was spearheaded by Bobbi Pierson, Alaska Science Center Geneticist, and the Genetics Metadata Working Group. They found it to be a great resource for those that need to author genetics metadata under USGS guidelines. More meeting information.

USGS DevOps Sync 4/3/18

In April, the DevOps Project Manager Sync had several topics:

  • Update from USGS Cloud Hosting Solutions

  • Software management website update (Cassandra Ladino)

  • Zero Trust Networking: What is it? (internal link) (Tom Van Dreser)

  • Overview of Cloud activities at Cal Poly (Paul Jurasin)

 
The Zero Trust Model: Should we be taking the information security advice from Congressmen? Drawbridge Network, November 2016

Data Management Working Group 4/9/18

Lance Everette and Tara Bell presented in the theme of “Preserve”: Taking action against USGS legacy data challenges. See recording and slides at the meeting page.

Semantic Web Working Group 4/12/18

Alan Allwardt demonstrated the creation of a new set of persistent identifiers using the PURL system (https://archive.org/services/purl/). He used the example of the Data Categories of Marine Planning vocabulary, and described other use cases. (Notes)

Tech Stack Working Group 4/12/18

Jeremy Fischer from Indiana University presented on "Jetstream: A free national science and engineering cloud environment on XSEDE." (video)

Bioinformatics Community of Practice 4/17/18

Dr. Bonnie Hurwitz from University of Arizona demonstrated the iMicrobe platform that runs on CyVerse: https://www.imicrobe.us/. Slides and recording available at the meeting page.


GIS Community of Practice 4/25/18

The GIS Community of Practice hosted a webinar on the ArcGIS Desktop to ArcGIS Pro Transition. James Sill and Stephen Zahniser of Esri gave an overview of the user interface and architecture and a demo. We received over 50 questions and comments during the presentation via sli.do and chat, and we’re working on getting the Q&A up on our wiki. Recording is available at the meeting page.


New ArcGIS Pro interface.


--



April 2018 Monthly Meeting

At the CDI monthly meetings, our goal is to bring you tools and information to help you do your daily work.

On April 11, 2018, we started with a review of the Reproducible Notebook Series, started in October 2017. The series has been showcasing different examples of reproducible and executable online notebooks. These notebooks are cast as the successor to the traditional scientific paper in a recent Atlantic article that has been making the rounds: The Scientific Paper is Obsolete.


Read about the history of reproducible notebooks in this article from The Atlantic.

Reproducible notebooks to access ocean biogeographic information

April’s reproducible notebook installment: OBIS (Ocean Biogeographic Information System) and R - Filipe Fernandes, SECOORA/IOOS (Southeast Coastal Ocean Observing Regional Association/Integrated Ocean Observing System). Filipe’s presentation used the Jupyter nbviewer, creating a presentation directly from the notebook! He showed how to connect sea turtle observation points to create possible migration paths in the Atlantic Ocean.


Screenshot from Filipe's notebook, plotting and connecting sea turtle observations.
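The idea of connecting observation points into a possible path can be sketched in plain Python. This is not Filipe's notebook code: the coordinates and dates are made up, the function names are hypothetical, and it simply assumes that ordering sightings by date and joining them with great-circle segments is a reasonable first approximation of a path.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def path_from_observations(observations):
    """Order (iso_date, lat, lon) tuples by date; return the ordered path and its length in km."""
    ordered = sorted(observations, key=lambda obs: obs[0])
    length = sum(
        haversine_km(a[1], a[2], b[1], b[2])
        for a, b in zip(ordered, ordered[1:])
    )
    return ordered, length


# Made-up sightings, deliberately out of date order.
sightings = [
    ("2017-08-15", 33.0, -70.0),
    ("2017-06-01", 28.5, -80.0),
    ("2017-07-04", 31.0, -76.5),
]
path, km = path_from_observations(sightings)
print([date for date, _, _ in path])
print(round(km), "km")
```

ISO-formatted date strings sort correctly as plain text, which keeps the sketch free of date parsing.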

Biodiversity monitoring and citizen science

Taxa Taxi: An automated process for using citizen science data to facilitate biodiversity monitoring (Erin Boydston and Toni Lyn Morelli)

iNaturalist citizen science observations are helping researchers understand biodiversity monitoring (after some automated data processing). iNaturalist got a thumbs up from a meeting participant as a neat mobile app to take on your hikes.

Preserving USGS Legacy Data

USGS Data at Risk: Expanding Legacy Data Inventory and Preservation Strategies (Lance Everette and Tara Bell)

Rescuing legacy data at the USGS remains a Herculean effort. The Legacy Data Inventory Reporting System (LDIRS) and its evaluation criteria can help the USGS address this need.


Web mapping for photo collections

Web Mapping Application for a Historical Geologic Field Photo Collection (Sarah Nagorsen and Jason Sherba)

Need guidance for proper documentation and publication of geolocated photo collections? See the CDI-funded project on a web mapping application for photo collections.


Visit the April 11, 2018 CDI Monthly Meeting Wiki Page.

--

Some highlights from March 2018 CDI collaboration area activity:

Metadata Reviewers Community of Practice 3/5/18

The group discussed a CDI proposal to create specifications for USGS data products so that ISO standard metadata records can be created in tools like the ADIwg metadata toolkit (mdEditor, mdTools). (Update: funded). The group also got a sneak peek at the new Data Dictionary page on the USGS Data Management Website (Update: published).


Screenshot from the new data dictionaries web page.

USGS DevOps Sync 3/6/18

Brian Fox shared a cloud training resources wiki page.
Ross Wickman gave an update from Cloud Hosting Solutions (CHS).
Eric Martinez gave a presentation entitled Software Inventory, What it is, how it's made, and how you can make it better (internal link). More info: https://sourcecode.cio.gov/


Tech Stack Working Group 3/8/18

"Zarr: A simple, open, scalable solution for big NetCDF/HDF data on the Cloud": Alistair Miles, University of Oxford. The motivation, current status and future plans for Zarr were discussed, along with a demo of basic functionality and an analogy between virtual machines and cows. (link to video)

Data Management Working Group 3/12/18

Capturing your processing and analysis workflow in R - Alison Appling. Alison introduced tools in R for dealing with reproducibility of analysis, size and complexity of analysis, collaboration on analysis, and dissemination. (Just a sampling of tools: remake, drake, googledrive, sbtools, whisker.) (slides)


R tools for modern data analyses

eDNA Community of Practice 3/20/18

The group discussed potential activities for future conference calls. The group also maintains links to eDNA talks being hosted outside of the CDI on their wiki page.

Software Development Cluster 3/29/18

Chris Johnson presented on USGS EDGE (Equipment Development Grade Evaluation): What is it, how does it apply to you, and why you may be interested in participating. You can access the recording on their meetings page if you are logged in.

 
EDGE presentation


--


The CDI Pi Day (3.14.2018) monthly meeting was overflowing with content - here are some highlights. Check out all the details, including recording and slides, at the monthly meeting page.

FAIR Data

We continued learning about the FAIR Data Principles - I and R stand for Interoperable and Reusable. Awareness of these principles is growing within the CDI.

February (left) and March (right) polls about the FAIR data principles - growing the awareness of Findable, Accessible, Interoperable, and Reusable data!

CDI proposals advancing to full proposal stage

Cheryl Morris gave the opening announcements, displaying the 18 CDI Proposals that moved to Phase 2, shown around the CDI Science Support Framework. For teams that did not advance to Phase 2, we always welcome further discussion about how to better frame projects with CDI principles (and we’re not just saying that). She also reminded us that the FY19 Request for Proposals is not too far away, and encouraged groups to start the discussion.


FY18 proposals advancing to full proposal stage, around the CDI Science Support Framework.


Group Announcements - help design the USGS Software management website

Group Announcements - A USGS Software Management Website is being planned and Cassandra Ladino (ccladino@usgs.gov) is looking for volunteers to help with the design - this includes everyone from the individual scientist developing software to large development teams. See all announcements.

Science for a Risky World

Kristin Ludwig briefed us on “Science for a Risky World: A USGS Plan for Risk Research & Applications,” giving us more information about efforts around the USGS that we can join.


Funded project presentations

There were four CDI funded project presentations from last year, sharing their findings regarding making data more accessible, high throughput computing and docker containers, benefits and limitations of using Tableau for USGS data, and new technologies that allow us to “do science” in the cloud.


Kate Allstadt - An Interactive Web-based Application for Earthquake-triggered Ground Failure Inventories

Richard Erickson - Flocks of a feather dock together: Using Docker and HTCondor to link high-throughput computing across the USGS

Jeff Peters - Visualizing community exposure and evacuation potential to tsunami hazards using an interactive Tableau dashboard

Rich Signell - Exploring the USGS Science Data Life Cycle in the Cloud

Tell us about your experience in this year's Community Voting on Statements of Interest

New Monthly Meeting Highlights

  • We’re trying out a new “Highlights” section on the Monthly Meeting pages that will list major links and resources presented at the meeting. These will be posted well before my blog posts!

Selected links from March 14, 2018

  1. Landslide Inventory Web Application: http://doi.org/10.5066/F7D799CT
  2. Data Series Report for the Earthquake-triggered ground-failure inventories: https://pubs.er.usgs.gov/publication/ds1064
  3. Code for semi-automation of metadata creation for landslide inventories: https://github.com/usgs/landslides-metadata
  4. Code repository for the HTCondor project: https://my.usgs.gov/bitbucket/projects/CDI/repos/hunting_invasive_species_with_htcondor

  5. Tsunami Evacuation Tableau app: https://geography.wr.usgs.gov/science/vulnerability/oahuEvacDashboard.html

  6. USGS Data Life Cycle in the Cloud: https://github.com/USGS-CMG/data-life-cycle-cloud


The frequency of exciting CDI collaboration area meetings is far greater than the frequency of my writing about them. Here are some highlights from the past two months:

The friendly metadata editor

Looking for a new, browser-based, easy-to-use metadata authoring tool? mdEditor is here!! Visit mdeditor.org to check it out. (See slides at the Metadata Reviewers Meetings page.) 


mdEditor: the friendly metadata editor (best tagline ever)

Cloud Training Resources

The DevOps group has started a Cloud Training Resources page. (Example: Amazon Web Services "What is Cloud Computing?") If you find other training opportunities, please let Brian Fox (bfox@usgs.gov) know, so that he can add them to the list.

Online platforms for data analysis

Online platforms for data analysis have arrived, as illustrated by the recent Tech Stack/Tech Dive presentations. The webinar page has links and recordings for The Pangeo Project (an open-source big data science platform), and the National Data Service Labs Workbench (a scalable platform for research data access, education, and training).

Tidy data and more

The Data Management Working Group has covered several topics including: Publishing metadata to the Science Data Catalog and Data Management Challenges (Jan 2018); Tidy data, Biological Analysis Packages, and Volunteered Geographic Information (Feb 2018).

Bonus: Read the original Tidy Data paper (Wickham, Journal of Statistical Software, 2014).  Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.


Is your data tidy?
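
To make that structure concrete, here is a minimal sketch in plain Python. It is not taken from the Tidy Data paper or from any USGS tool: the site names, years, values, and the depth_m column are made up for illustration, and the tidy() helper is a hypothetical name. It reshapes a "wide" table (one column per year) into a tidy one (one row per observation).

```python
# Hypothetical "wide" table: one row per site, one column per year.
# All names and values here are invented for illustration.
wide_rows = [
    {"site": "A", "2016": 4.1, "2017": 3.8},
    {"site": "B", "2016": 5.0, "2017": 5.6},
]


def tidy(rows, id_field, variable_name, value_name):
    """Reshape wide rows so that each output row is a single observation."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the identifier is repeated on every output row
            tidy_rows.append(
                {id_field: row[id_field], variable_name: key, value_name: value}
            )
    return tidy_rows


# Each variable (site, year, depth_m) is now a column,
# and each site-year measurement is its own row.
tidy_rows = tidy(wide_rows, "site", "year", "depth_m")
for row in tidy_rows:
    print(row)
```

In the tidy version, filtering by year or grouping by site no longer requires knowing which column names happen to be years.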

Linked data to protect cultural heritage 

In February, the Semantic Web Working Group tuned in to a National Park Service presentation on the use of linked data to protect cultural heritage resources in the national parks from climate change, using the Digital Index of North American Archaeology.


The Digital Index of North American Archaeology interface.

All things microbiome

In February, the Bioinformatics group covered All things microbiome. Many different groups within the USGS have some element of microbiome research - check out the USGS Fact Sheet on microbiome research for more information.


Image from the USGS Fact Sheet on microbiome research.

The USGS EDGE program and more

Open Source Coffee Talks combined forces with the Software Development Cluster as of February 2018. The Software Development Cluster discussed the USGS HPC/HTC Workshop, the USGS EDGE (Equipment Development Grade Evaluation) program, and 508 Compliance (IT Accessibility) for websites and web applications. In March, Chris Johnson will give a presentation with further information on the EDGE Program.

New Groups! Citizen-Centered Innovation, eDNA, and GIS

The Citizen-Centered Innovation group held its inaugural call on February 21, 2018. Anyone interested in crowdsourcing, citizen science, civic hacking, and challenge & prize competitions is encouraged to join. Contact Sophia B Liu at sophialiu@usgs.gov for more information.

The eDNA Community of Practice held its inaugural call on January 16, 2018. Calls will be held every other month in the same time slot as the Bioinformatics group (3rd Tuesday, 2-3 p.m. ET). Contact Pete Ruhl (pmruhl@usgs.gov) for more information.

The USGS GIS community followed up on their inaugural call with a message about next steps. You can fill out this short form to log your interest in future talks and topics, including ArcGIS Pro, Serving GIS Data with ESRI, Open-source GIS topics, GIS on the cloud, and Global Mapper. You can also suggest a new topic!


Whew. My next goal: Update the blog with collaboration area news in less than two months! 

View all CDI Collaboration Areas

--
More CDI Blog posts

On February 14, 2018, the CDI Monthly Meeting started off with an introduction to the FAIR data principles and what you can do about them. We learned about "F" for Findable and "A" for Accessible.

Kyle Enns and Cristiana Falvo from the USGS gave a presentation on "Using Python to Bring Geophysical Data to the Surface", showing the CDI another way to officially share Python scripts for reproducibility. Kyle and Cristiana also shared the documents they use for Pre-review quality control, Releasing accessible python code, and their Technical peer review checklist (log in at the meeting page to view).


The feature presentation was "Semantic web for scientific information: streamlining how we write, find, link, and reuse data and models" by Ferdinando Villa of the Basque Centre for Climate Change. After describing the challenge of data and model integration and reuse, and a project he is working on to address the problem (The Integrated Modelling Partnership, www.integratedmodelling.org), he invited us all to come join in the adventure of working together in partnership to build an integrated information landscape! You can contact him at ferdinando.villa@bc3research.org.


Ferdinando's presentation was followed by a panel discussion on the semantic web and the USGS with Ken Bagstad, Dalia Varanka, and Julia Moriarty, hosted by me. It was clear that we need more time to learn from each other about the challenges and opportunities of the semantic web!

You can view the recording and slides on the February 2018 monthly meeting page.

--


Kevin Gallagher opened the January 10, 2018 meeting by announcing the CDI FY18 Request for Proposals, which was released on December 18, 2017. This year, there is a topical focus on Risk Assessment and Hazards Vulnerability.

In our Reproducible Notebook Series, Chris Sherwood described his experience publishing a Jupyter Notebook on code.usgs.gov as part of an official code release. You can see the finished product at https://code.usgs.gov/usgs/whcmsc-rdc/tree/v1.0.


Our reproducible notebook series highlights repeatable, executable, and documented methods in Jupyter Notebooks.


Brian May, who manages the USGS FOIA (Freedom of Information Act) program, presented "How the Freedom of Information Act impacts Data." Did you know that the USGS receives and processes over 200 FOIA requests a year? Brian’s talk touched on some of the more routine questions posed to the FOIA program; however, he is happy to provide more detailed training or discussion on the topic for smaller groups. You can reach him at foia@usgs.gov.



Finally, I gave a brief overview of the CDI Request for Proposals Process: Past, Present, and Future. This presentation was an opportunity for me to emphasize some of the unique features of our RFP, such as the community commenting and voting, encouragement to make new connections and discuss and promote in-progress ideas, and the benefits of participating in the voting process. Your vote counts, and as CDI members, you are registered voters!


The CDI has funded over 80 projects since 2010.

All slides and the recording are available on the January 10, 2018 Meeting Page.

--