Blog

Metadata Reviewers, 4/1/2019: MonitoringResources.org

Summary extracted from notes of Fran Lightsom, lead of the Metadata Reviewers group:

Sheryn Olson demonstrated the metadata collection system used by MonitoringResources.org to encourage discussion of how it might be made simpler and easier to use, and of good ideas that the rest of us can copy. MonitoringResources.org is part of the Pacific Northwest Aquatic Monitoring Partnership (PNAMP) and uses the metadata to provide an index of monitoring activities, especially those concerning the ecology of streams in the U.S. Pacific Northwest, along with the procedures, protocols, and monitoring designs that are in use.

View more notes and the presentation slides on the Metadata Reviewers Meetings page.


DevOps Sync, 4/2/2019: DevOps at Housing and Urban Development

Summary provided by Derek Masaki, co-lead of the DevOps group:

Presenters: Kevin Portanova, Director of IT for Public and Indian Housing, and Mel Hurley, DevOps Manager. The presentation provided an overview of the shift that HUD is making away from traditional on-premises IT operations toward cloud-focused DevOps. Kevin and Mel took us through their process of reorganizing a contractor-based IT environment, refactoring their development process, and building a staff centered on Federal employees and oriented toward Agile and a DevOps workflow in the Microsoft Azure environment.

See the slides on the DevOps Meetings page.

Data Management Working Group, 4/8/2019: USGS Trusted Digital Repositories and a USGS Data Manager Position Description Series

The DMWG heard two presentations: first from John Faundeen and Natalie Latysh on “Becoming a USGS Trusted Digital Repository,” and second from Viv Hutchison and John Faundeen on “Progress on a USGS Data Manager Position Description Series.”

The slides and recording are posted on the meeting page.

Tech Stack Working Group, 4/11/2019: Pachyderm

John Karabaic presented on Pachyderm, a data science platform that lets you deploy and manage multi-stage, language-agnostic data pipelines while maintaining complete reproducibility and provenance. Read the docs here: http://docs.pachyderm.io/en/latest/index.html
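
To make the idea of pipeline provenance a little more concrete, here is a minimal Python sketch of a two-stage pipeline in which every output records the hashes of the inputs that produced it. It does not use Pachyderm or its API. The helper names (`content_hash`, `source`, `run_stage`) and the sample readings are assumptions invented purely for illustration.

```python
import hashlib
import json


def content_hash(obj):
    """Return a stable SHA-256 hex digest of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def source(name, data):
    """Wrap raw input data as a record with no parents."""
    return {"stage": name, "data": data, "hash": content_hash(data), "parents": []}


def run_stage(name, func, inputs):
    """Run one stage and record which input hashes produced its output.

    `inputs` is a list of records shaped like the ones returned here.
    """
    data = func(*[rec["data"] for rec in inputs])
    return {
        "stage": name,
        "data": data,
        "hash": content_hash(data),
        "parents": [rec["hash"] for rec in inputs],
    }


# Hypothetical pipeline: drop missing readings, then average what is left.
raw = source("raw", [3.0, None, 5.0, 4.0])
cleaned = run_stage("clean", lambda xs: [x for x in xs if x is not None], [raw])
summary = run_stage("mean", lambda xs: sum(xs) / len(xs), [cleaned])

# Walk the recorded parents to see where the summary came from.
for rec in (summary, cleaned, raw):
    print(rec["stage"], rec["hash"][:8], [p[:8] for p in rec["parents"]])
```

Because each record's hash depends only on its content, rerunning the same stages on the same inputs yields the same hashes, which is the property that makes a result traceable and reproducible in this toy setting.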

Tech Stack calls are joint with the ESIP Interoperability and Technology Tech Dive Webinars. You can review the recording here.

Bioinformatics Community of Practice, 4/16/2019: White Shark eDNA

Kevin Lafferty, senior ecologist at the Western Ecological Research Center, presented on white shark eDNA. In recent work he has been refining methods to get better data from white shark eDNA. Kevin is based in Santa Barbara, CA, and surely made many people jealous while describing data collection with instruments on paddle boards.

View the recording on the Bioinformatics Meetings page.

Kevin is looking for new collaborations within USGS and you can email him at klafferty@usgs.gov if interested. (Remember: data collection with instruments on paddle boards.)

Sophia Liu led a discussion covering many topics, including the OSTP Draft Report to Congress for the Crowdsourcing and Citizen Science Act; a Department of the Interior Generic Information Collection Request; the USGS Open Innovation Strategy; the CitizenScience.gov website, including USGS CCS projects; and past and upcoming events such as the Citizen Science Association (CSA) Conference (March 13-17, 2019), the Federal Crowdsourcing Webinar, Episode 1: Citizen Science, and upcoming Federal Crowdsourcing Webinars, which can currently be found at https://digital.gov/events/. Sophia’s use of Mentimeter added a great element of interactivity to the meeting. See more on the group wiki page.

Risk Community of Practice, 4/18/2019: Communities of Practice and User Engagement in ShakeCast

Kris Ludwig and Dave Ramsey lead the Risk CoP and hosted a call with presentations about the benefits of communities of practice (Leslie Hsu, CDI Coordinator) and user engagement in the development of ShakeCast (Dave Wald, Seismologist).

With respect to user engagement, Dave shared several titles that present “logical approaches for bringing products to users,” including The Power of Habit, Contagious, To Sell is Human, Nudge, Made to Stick, Diffusion of Innovations, and The Undoing Project. Book club, anyone?

View the presentations and recording on the Risk Meetings page.



Reads related to user engagement recommended by David Wald.

Software Development Cluster, 4/25/2019: Building Connections and Desktop Installers/Code Signing

Cassandra Ladino led a discussion on building connections, inspired by this Better Scientific Software post: Building Connections and Community within an Institution.

The group had recently fielded a question about desktop installers and the challenges of code signing. An internal site on application and script signing was shared. Some group members also felt that providing a way to install your application with Anaconda (on all operating systems) was adequate.


--
More CDI Blog posts

A huge thanks to the three CDI Project teams who presented at our April Monthly Meeting.

An Interactive Web-based Tool for Anticipating Long-term Drought Risk

Caitlin Andrews, a landscape ecologist in the Southwest Biological Science Center, explained how she used R Shiny and Amazon Web Services to create an interactive online front end for a proven model of ecosystem water balance, SOILWAT2. This tool helps to predict and understand site-specific risk of future drought. Lots of lessons here for people who want to make user-friendly online tools out of more traditional scientific models within the USGS IT ecosystem. Code repository at https://github.com/DrylandEcology

Knowledge Extraction Algorithms (KEA): Turning Literature Into Data 

Matt Neilson, a fishery biologist and co-lead for the Nonindigenous Aquatic Species Database program, delivered the line of the day: “We are living in a machine-readable world.” His project uses natural language processing and the xDD (eXtract Dark Data, formerly GeoDeepDive) literature database to improve, modernize, and greatly increase the efficiency of literature review. For people who used to walk to the library and photocopy stuff (and record radio songs on cassettes and dial with rotary phones), this is strange, but I will attempt to evolve with the times. See more information, like code repositories, in the Related External Resources links on the project's ScienceBase page.

Mapping land-use, hazard vulnerability and habitat suitability using deep neural networks

Jon Warrick, research geologist in the Coastal/Marine Hazards and Resources Program, described the software tools, resources, and training workshops developed to allow USGS scientists to apply deep learning to remotely sensed imagery and better understand natural hazards and habitats. The two in-person workshops on these tools held in 2018 were able to accommodate only a fraction of the interested applicants. The CDI hopes to be able to provide more trainings like this to help build deep learning expertise and capacity in the USGS. See more at https://github.com/dbuscombe-usgs/cdi_dl_workshop and https://github.com/dbuscombe-usgs/dl_tools.


Log in to see the meeting recording and slides at the meeting page.

--

Software Development Cluster, 3/28/2019: Brainstorming future discussion topics

Cassandra Ladino led a brainstorming session for topics that could be discussed within the Software Development Cluster, using sli.do to collect ideas and Trello to organize them. Some ideas included:

    • code.usgs.gov - what is it, who should use it and when

    • Using the US Web Design System in USGS web sites

    • Docker training for distributing scientific software

    • Python APIs using Swagger and/or Flask

    • How to grow grassroots development efforts to enterprise systems

    • Creating a community of practice for unit testing code so that it can be easily reviewed by anyone in the software dev community

    • Should there be separation between scientific software and web development software discussions? (pros and cons)

Lots of exciting topics!

Software Dev contacts are Michelle Guy, Blake Draper, and Cassandra Ladino.

Risk Community of Practice, 3/21/2019: Inaugural meeting

Risk Community of Practice leads Kris Ludwig and Dave Ramsey introduced the new Risk Community of Practice, reviewed the USGS Risk Plan and implementation plans for FY19, and announced the FY19 Risk RFP. The purpose of the group is to:

    • build connections across centers, programs, mission areas

    • create a central point of contact for USGS risk research and applications

    • identify needs and opportunities to benefit the community

    • generate project ideas

    • share resources, expertise

Besides the Risk Plan, another recent publication mentioned was Assessing Hazards and Risks at the Department of the Interior—A Workshop Report, by Nate Wood, Alice Pennaz, Kristin Ludwig, Jeanne Jones, Kevin Henry, Jason Sherba, Peter Ng, and others.


Risk Community of Practice contacts are Kris Ludwig and Dave Ramsey, who can be reached at riskyworld@usgs.gov.

Tech Stack Working Group, 3/14/2019: Integrating SciServer and OceanSpy to enable easy access to oceanographic model output

Mattia Almansi from Johns Hopkins University presented on Integrating SciServer and OceanSpy. OceanSpy is an open-source and user-friendly Python package that enables scientists and interested amateurs to use ocean model data sets with out-of-the-box analysis tools. OceanSpy builds on software packages developed by the Pangeo community (in particular xarray, dask, and xgcm). OceanSpy accelerates and facilitates exploration (including visualization) of terascale data. (Adapted from the presentation abstract.)


See more, including a link to the recorded session, on the group presentation website, hosted by ESIP, the Earth Science Information Partners. TSWG contacts are Dave Blodgett and Rich Signell.

Semantic Web Working Group, 3/14/2019: Semantic Web at the CDI Workshop

The Semantic Web Working Group held a discussion about Semantic Web elements at the upcoming CDI Workshop. Ken Bagstad mentioned the breakout session he is co-leading at the workshop, which will include semantics in the context of predictive modelling, intersecting with artificial intelligence and machine learning. Other topics included FAIR (findability, accessibility, interoperability, and reusability) in machine- and human-readable contexts and the importance of standard data dictionaries.

See more at the group meeting notes page. The SMWG contact is Fran Lightsom.

Artificial Intelligence/Machine Learning Community of Practice, 3/12/2019: New Infrastructure for Deep Learning at the USGS and AI for Ecosystem Services

Jeff Falgout presented on infrastructure for deep learning and Ken Bagstad presented on AI for Ecosystem Services.

What I learned at the AI/ML group call:

    • USGS is setting up a new machine for AI; it is named Tallgrass after the Tallgrass Prairie National Preserve, an NPS site in Kansas

    • Projected timeline for the setup: mid-April - Tallgrass installation; early May - friendly testing; early June - general availability.

    • Reminder of what GPUs are vs. CPUs

    • AI for Ecosystem Services: What if our data and models could talk to one another, and decision makers could use scientific information to more quickly and reliably answer questions about today’s most urgent problems? Find out more at http://www.integratedmodelling.org

    • JC pointed out some activity on the AI/ML forum and encouraged members to post

    • Group leads reminded members to contribute to a spreadsheet for collecting USGS AI/ML project descriptions to communicate to USGS leadership.


You should think of this image whenever we mention the Tallgrass infrastructure. (From the NPS Tallgrass Prairie website.)

Meeting notes and recordings are at the group meetings page. Contacts for the group are JC Nelson and Pete Doucette.

Data Management Working Group, 3/11/2019: Brainstorming topic ideas

Cassandra Ladino led the working group in a discussion of topics to be discussed at the CDI Workshop or at future DMWG meetings. Some ideas for further discussion included:

    • Data Management Plans - streamlining process from DMP to publishing; enforcing; hosting

    • QMS (Quality Management System for USGS labs) integration with data management and records management

    • Metadata for the National Digital Catalog

    • More information and guidance on USGS Software Release

    • UAS (Unmanned Aircraft Systems/AKA Drone) data

    • Data sharing agreements

See more information in the attached slides at the meeting page. The DMWG contacts are Viv Hutchison and Cassandra Ladino.

DevOps Sync, 3/5/2019: Booz Allen Hamilton DevOps Environment and CI/CD pipeline

Martin Folkoff, lead DevOps engineer at Booz Allen Hamilton, provided a technical overview of the DevOps environment he has designed and the CI/CD (continuous integration/continuous deployment) pipeline employed by his teams at BAH. He provided a look at the tools he uses to orchestrate his production environments.

See more information at the DevOps Meeting page. The DevOps contacts are Derek Masaki and David Hughes.

Metadata Reviewers Community of Practice, 3/4/2019: News and Updates

The Metadata Reviewers Community of Practice will be hosting a breakout session at the CDI Workshop to provide guidance for data and metadata review, and tips and tricks for data and metadata authors. Virtual participation is planned.

The ISO Content Specs project will be hosting workshop sessions on Thursday and Friday at the CDI Workshop. The sessions will focus on collecting requirements for metadata specification modules, most likely modules for experimental data, computational data, and observational data. To learn more, contact Dennis Walworth, Fran Lightsom, or Lisa Zolly.

See more news on the group meeting notes page. The Metadata Reviewers CoP contact is Fran Lightsom.


--

CDI Workshop - From Big Data to Smart Data

At the March 13, 2019 monthly meeting, CDI’s executive sponsor Kevin Gallagher talked about the theme of this year’s CDI workshop, From Big Data to Smart Data: turning our huge volumes of diverse data into usable, actionable, integratable, or “smart” data. Registration for the workshop (June 4-7, 2019 in Boulder, CO) is open and can be found on the workshop wiki page.



We heard presentations from three FY18 CDI Funded Projects:

Nonindigenous Aquatic Species Alert Risk Mapper

Wesley Daniel presented on the Nonindigenous Aquatic Species Alert Risk Mapper and reported that the team will be posting a write-up of their challenges transitioning to ArcGIS Pro as part of their outcomes. See more accomplishments on their ScienceBase page.


Transition to ISO metadata

Dennis Walworth and Fran Lightsom presented on the Transition to ISO metadata project and reported that the project team will host several activities at the CDI workshop. They are looking for users to test their interface. They are using the previously funded mdEditor application (ScienceBase page) in their work.


CDI Risk Map, Risk Workshop Report, and New Risk Community of Practice

Nate Wood and Jeanne Jones presented on the Department of the Interior Risk and CDI Risk Map. They shared many links that are available for Department of the Interior users to test out, including the data description, codebase, risk map, GeoServer, and API. CDI members, go to the meeting page and log in to view their slides; the links are on the last slide.


The DOI Risk Workshop Report is out! Wood, N., Pennaz, A., Ludwig, K., Jones, J., Henry, K., Sherba, J., Ng, P., Marineau, J., and Juskie, J., 2019, Assessing hazards and risks at the Department of the Interior—A workshop report: U.S. Geological Survey Circular 1453, 42 p., https://doi.org/10.3133/cir1453.

The newly formed Risk Research Community of Practice is getting started. Get in touch with Kris Ludwig and Dave Ramsey at riskyworld@usgs.gov to learn more.


--

DevOps Sync 2/5/2019 - Web Informatics and Mapping Program, WIM

Hans Vraga from the Web Informatics and Mapping Program (WIM, wim.usgs.gov) gave an overview of the group, of which he is the Project Manager. WIM is a web development shop that has cooperators from both within and outside of the USGS. Some of their products include a SPARROW model output visualizer, StreamStats, and a WHISPers wildlife event reporting system (coming soon).

As you can imagine, their expertise is in high demand. In cooperators, they look for scientific or subject matter expertise that complements their group’s technical expertise, a cooperator who will act as an active product owner, a focus on development with minimal time spent on operations, and projects with fast turnaround times. Check out their website or contact Hans Vraga for more information.

Derek Masaki and David Hughes are the points of contact for the DevOps group.


From the Web Informatics and Mapping homepage at wim.usgs.gov.

Metadata Reviewers Community of Practice, 2/4/2019

In February, two major questions came up for discussion. They were passed along to the appropriate committees and officials for guidance, and answers were produced quickly!

First: Is there updated guidance on the volume of data necessary to trigger a separate data release? (As opposed to a table in a publication.) Short answer: Having the data in the paper is ok - however, if data is big enough to be moved into a supplemental section of the paper, it has to be a USGS data release.

Second: How should authors reference data that is not publicly available when writing a manuscript? Short answer: there is updated guidance on the FSP “Guide to Data Releases” page for data that are not available at the time of publication, or that have limited availability owing to restrictions, in the section Data Associated with a Publication.

Madison Langseth helped to facilitate February’s discussion and replies. See past notes and future topics on their meetings page.

Artificial Intelligence/Machine Learning, 2/12/2019

John Stock of the USGS Innovation Center joined to talk about some opportunities available for postdoctoral research, future workshops, and future discussions related to AI/ML in the USGS. The joint USGS-NASA postdoctoral fellowships are now posted: https://geography.wr.usgs.gov/InnovationCenter/fellowship.html

Pete Doucette presented a talk, “Ruminations on AI and Land Imaging.” He included a great intro on the difference between the AI and machine learning of decades ago and the capabilities now (e.g., neural networks versus DEEP neural networks). Several land imaging projects and datasets at the USGS are becoming more “analysis-ready” for data science, predictive analytics, and informing decisions. For example, see “Continuous change detection and classification of land cover using all available Landsat data” (Zhu and Woodcock, 2014).

A major theme was the need for the combination of disciplinary expertise and AI/ML expertise, essentially team science, in order to reach the full potential of AI/ML. (See the NAS report Enhancing the Effectiveness of Team Science.)

A White House Fact Sheet on “Accelerating America’s Leadership in Artificial Intelligence” was shared with the group by Mona Khalil and Leah Colasuonno.


A few slides from Pete Doucette's talk on AI and Land Imaging.

Semantic Web Working Group, 2/14/2019

Cassandra Ladino stepped in to lead the February Semantic Web Working Group discussion, which focused on the theme of FAIR (Findable, Accessible, Interoperable, Reusable) in USGS. The group discussed ideas for a proposed FAIR Workshop, including the topic of new approaches and technologies to further enhance FAIRness at USGS. See the meeting notes for more resources and references.

Tech Stack Working Group, 2/14/2019

The joint ESIP Tech Dive - CDI Tech Stack presentation was on “Cloud Native Geoprocessing of Earth Observation Satellite Data with Pangeo,” by Scott Henderson, University of Washington. “The integration of new technologies with several high-level Python packages are enabling Cloud-native workflows and circumvent the bottleneck of downloading large amounts of data.”

Aptly summarized: “If that doesn’t get people excited I don’t know what will,” said Rich Signell, co-chair of the Tech Stack Group.

Link to slides, data, tutorial, and blog post on the ESIP Tech Dive page.


Screenshot from a demo linked to the post "Cloud Native Geoprocessing of Earth Observation Satellite Data with Pangeo."

Bioinformatics Community of Practice, 2/19/2019

The latest of the monthly eDNA webinars organized by Scott Cornman was on CALeDNA (California Environmental DNA), by Rachel Meyer of UCLA. CALeDNA capitalizes on the enthusiasm of citizen scientists by providing kits for collecting data in the field. Data collectors also take iNaturalist observations for benchmarking. The data are provided online for the public to identify patterns, and are also used for academic research on topics like phylogenetic diversity and functional diversity.

CALeDNA used the Kobo toolbox to build their data collection form; they found it to be the most robust platform for cell phone data collection. https://www.kobotoolbox.org/

rANACAPA is an R package developed so that non-specialists without a community ecology background can generate the relevant plots. See “Ranacapa: An R package and Shiny web app to explore environmental DNA data with exploratory statistics and interactive visualizations”: https://f1000research.com/articles/7-1734/v1

Check out one of their case studies and the data visualizations available! https://data.ucedna.com/research_projects/pillar-point

The Bioinformatics and eDNA groups alternate months that they meet but have some overlap in content and membership.


A few slides from Rachel Meyer's talk on the California eDNA program. 

Software Development Cluster, 2/28/2019

The Software Development Cluster hosted a discussion on Cloud and Big Data in the Cloud. Cassandra Ladino started off the discussion with a presentation on Cloud and Big Data, including a summary of resources she has been using to learn more. There is information in the notes on how to sign up for a USGS Cloud Hosting Solutions Sandbox.

Contact: Michelle Guy. See the Software Development Cluster page for more info and a link to the meeting notes.


--

Our first monthly meeting of 2019 was on February 13, and we heard about forward-looking water research tools, new outputs to help resource managers deal with invasive species, and information about how to get the most out of the upcoming June CDI workshop. View the recording and slides on the February 13 Monthly Meeting page.

Water Research Tools

Tony Castronova of CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.) gave an overview of HydroShare and CUAHSI-JupyterHub, tools that help researchers to develop, save, and share water research workflows. This gave a cool perspective on tools that use USGS water data and complement existing USGS tools. CUAHSI has a large education component, including plentiful cyberseminar presentations that address topics of interest overlapping with the CDI!


HydroShare workflow at https://www.hydroshare.org/

Forecasting Invasive Species and stakeholder engagement

Jake Weltzin opened a series of CDI-funded project presentations that will occur over the next few months, presenting on “Workflows to support integrated predictive science capacity: Forecasting invasive species for natural resource planning and risk assessment.” In addition to the daily map forecasts and other outputs about invasive insect activity, the project team is working on a report that will outline their experiences with stakeholder engagement.


Screenshot of an animated slide showing invasive insect activity through time. More at www.usanpn.org/data/visualizations and www.usanpn.org/data/forecasts

Community for Data Integration in-person workshop in Boulder, June 4-7, 2019

Finally, Madison Langseth and I gave some of the latest information about how everyone can benefit from the upcoming CDI Workshop in Boulder, June 4-7. Right now we are focusing on getting community members to submit and comment on session ideas by the end of February, so that we can organize the topics in early March. Also, we are working on stepping up our game for virtual participation and interactive content that will help members meet and connect with each other.


Join us on March 13 for our next monthly meeting and more presentations from CDI community selected and supported projects!


--

Metadata Reviewers, 12/3/2018 - Q&A and resources on creating and reviewing metadata

The Metadata Reviewers group met and continued to share resources for effective metadata review. Among the topics that they discussed:

The group also has recent Q&A posted on their Metadata Discussion Forum.

Group contact: Lightsom, Frances L. (flightsom@usgs.gov).

Data Management Working Group, 12/10/2018 - Pubs Warehouse, ISO for USGS, and the DOI Tool

The DMWG had three very informative presentations in December!

    • Kelly Haberstroh – Updates about the Publications Warehouse

    • Dennis Walworth – Updates on ISO for USGS: content specifications and current status of ADIwg

    • Lisa Zolly – Updates to the Digital Object Identifier Tool

Group contacts: Hutchison, Vivian B. (vhutchison@usgs.gov) and Ladino, Cassandra C. (ccladino@usgs.gov)

CDI DMWG wiki page


Artificial Intelligence/Machine Learning, 12/11/2018 - Sharing AI/ML work at the USGS

JC Nelson hosted the first AI/ML CDI call and discussed plans for the group. Over the next several months, we will hear from different researchers around the USGS who are incorporating AI/ML techniques into their work. The group will also be a forum for questions for practitioners, such as one asked by Michelle Guy: Are people doing AI/ML work in the cloud, on local GPU hardware, or another option?

The group will stay in touch with another USGS effort focused on AI and image processing. This group was initiated in the Ecosystems Mission Area and is led by Mona Khalil. Mona held calls on 12/18 and 12/19 that focused on hearing about current activities and resources for AI and image processing.

Both groups mentioned the dl_tools lectures and toolboxes that were developed and presented by Daniel Buscombe and others with support from the CDI!

For lessons: https://github.com/dbuscombe-usgs/cdi_dl_workshop
For tools:  https://github.com/dbuscombe-usgs/dl_tools

Group contacts: Nelson, John C. (jcnelson@usgs.gov) and Doucette, Peter Joseph (pdoucette@usgs.gov)


dl_tools method schematic from https://dbuscombe-usgs.github.io/dl_tools

Tech Stack, 12/13/2018 - JupyterLab Extensions

The Tech Stack group invited Ian Rose (University of California, Berkeley) to demo “Developing JupyterLab Extensions.” Ian took us through a live demo of the process of building a JupyterLab extension. “In fact, the whole of JupyterLab itself is simply a collection of extensions that are no more powerful or privileged than any custom extension.”

For fun: From the JupyterLab documentation: Let's Make an xkcd JupyterLab Extension

Group contacts: Blodgett, David L. and Signell, Richard P.

Visit the joint Tech Stack and ESIP Tech Dive webinar page to see the next few months of topics!

Completed xkcd extension screenshot
Image from the JupyterLab documentation: https://jupyterlab.readthedocs.io/en/stable/developer/xkcd_extension_tutorial.html


--


Another month and another group of topics - stay informed!

Metadata Reviewers, 11/5/2018 - Guidelines for metadata review

The group, led by Tamar Norkin, had a discussion on the Guidelines for Metadata Review. They discussed ways to improve the usability of the document as an actual checklist, and what information would be good to include, such as “tips and tricks” for metadata reviewers. Looks like a great resource for anyone who is called upon to review metadata!

See more notes on their meetings page.

DevOps, 11/6/2018 - Recreation.gov

In addition to regular updates on the USGS Git Hosting Platform and the USGS Software Management website, in November the DevOps group heard about recent Recreation.gov activities from Shums Hoda and Martin Folkoff of Booz Allen Hamilton.

Recreation.gov is a gateway to discover America's Outdoors and more, a place for trip planning, information sharing and reservations with information from 12 federal Participating Partners.

The website is at https://www.recreation.gov. API documentation for the RESTful services of the Recreation Information Database is at https://ridb.recreation.gov/docs. Other topics covered included microservices, domain-driven design, and high-level architecture.


What's the tech behind reserving your campsites at recreation.gov?

Tech Stack, 11/8/2018 - Intake

Martin Durant (Anaconda) presented on "Intake: Lightweight tools for loading and sharing data in data science projects."

Intake has a nice tag line: “Taking the pain out of data access and distribution”

Intake is a set of free, open-source Python tools that help load data from a variety of formats into familiar containers like Pandas dataframes, Xarray datasets, and more. Boilerplate data loading code can be transformed into reusable Intake plugins. Datasets can be described for easy reuse and sharing using Intake catalog files. Martin gave an overview of Intake and demonstrated its use via Jupyter Notebooks. You can check out the video here.

https://intake.readthedocs.io

https://github.com/ContinuumIO/intake
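
As a rough illustration of the catalog idea described above (describe each dataset once, then load it by name), here is a small standard-library Python sketch. It is not Intake and does not use Intake's API. The names `CATALOG`, `LOADERS`, `FAKE_FILES`, and `read`, along with the sample data, are assumptions made up for this example.

```python
import csv
import io
import json

# Hypothetical in-memory "files" so the sketch runs without touching disk.
FAKE_FILES = {
    "gauges.csv": "site,flow\nA,1.5\nB,2.0\n",
    "sites.json": '[{"site": "A"}, {"site": "B"}]',
}


def load_csv(text):
    """Parse CSV text into a list of row dictionaries (values stay strings)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def load_json(text):
    """Parse JSON text into Python objects."""
    return json.loads(text)


# One loader per format: the reusable piece that replaces boilerplate.
LOADERS = {"csv": load_csv, "json": load_json}

# The catalog describes each dataset: which loader to use and where it lives.
CATALOG = {
    "gauges": {"driver": "csv", "path": "gauges.csv",
               "description": "Made-up stream gauge readings"},
    "sites": {"driver": "json", "path": "sites.json",
              "description": "Made-up site list"},
}


def read(name):
    """Look up a dataset by name and load it with the matching loader."""
    entry = CATALOG[name]
    return LOADERS[entry["driver"]](FAKE_FILES[entry["path"]])


for name, entry in CATALOG.items():
    print(name, "-", entry["description"], "-", len(read(name)), "records")
```

The point of the design is that users ask for `read("gauges")` without knowing the file format or location; changing either means editing the catalog entry, not the analysis code.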



eDNA, 11/13/2018

Austen Thomas presented data on a backpack-style eDNA acquisition device, including aspects of flow regulation and filter pore size. Austen also presented data on the performance of a field test for specific targets relative to conventional laboratory approaches. A paper describing some of these results is available here:

Thomas, A. C., Howard, J., Nguyen, P. L., Seimon, T. A., & Goldberg, C. S. (2018). ANDe™: A fully integrated environmental DNA sampling system. Methods in Ecology and Evolution, v. 9(6), 1379-1385. https://doi.org/10.1111/2041-210X.12994   

Semantic Web, 11/15/2018

The group had a discussion about what's happening with the FAIR Principles (here is just one explanatory website about FAIR), the CDI Proposal Process, and the CDI 2019 Workshop (June 4-7, 2019 in Boulder, CO).




--


In November, we heard more about the CDI Request for Proposals and about commenting and voting in this year’s process. The proposals process is one of the major ways that we are able to share our ideas and comments as a community of practice. We are using new tools this year, and so far the commenting on our wiki and the voting through SimplyVoting seem to be working. All CDI members should have received a ballot on November 30, and the deadline to vote is Friday, December 14 at midnight!

USGS Director Reilly dropped by to talk about Artificial Intelligence and Machine Learning and opportunities for the USGS to capitalize on these techniques. JC Nelson and Pete Doucette will be leading a new CDI Collaboration Area in Artificial Intelligence and Machine Learning, and they are having their first meeting on December 11; more details are on the group’s wiki page.

3DEP Lidar Products and Elevation Services

Rob Dollison from the National Geospatial Program presented on “The new 3D Elevation Program Lidar Products and Elevation Services from the National Map.” The National Map has a new web presence, map service notifications, and several viewers to browse the data, including the National Map Viewer, Elevation Viewer, and a Lidar explorer. They are moving to a system where you don’t need to download large volumes to your local drives; instead, basic visualization, analysis, and extraction functions are available through services on an open platform.



ESIP Lab Opportunities

Annie Burgess from ESIP spoke about ESIP Lab Opportunities - funding from the Earth Science Information Partners and ways that CDI members could participate. Their community and goals are very similar to the CDI, but within a larger context of other agencies and institutions. The latest ESIP Lab round closes on December 18. Check out previous projects and outputs on their webpage.


ESIP Lab - facilitating pathways for 'data people' to engage with critical developer communities.


We're taking a break from monthly meetings in December and will see you on January 9, 2019!

--


At the October 10, 2018 CDI monthly meeting, we heard about ongoing projects that could help us with our spatial data workflow, share solutions for the challenges of integrating incomplete and disparate data, and allow us to test and use technologies for storing and managing large volumes of data.

First, Kevin Gallagher gave us a preview of the FY19 CDI Request for Proposals themes: biosurveillance of emerging invasive species and health threats, building national datasets, reusing previously funded CDI outputs, and enabling FAIR (Findable, Accessible, Interoperable, Reusable) data. The official Request for Proposals was released the following week and you can see the details here: https://my.usgs.gov/confluence/display/cdi/2019+Proposals

The deadline for 2-page statements of interest is November 16, 2018!

Next, I had a brief Q&A with Sky Bristol about building a spatiotemporal feature registry. This is a concept about designing and building a system for usable and repeatable processes that use spatial features. Sky is looking for feedback on how such a system can be built broadly to benefit many people. I hope to have more Q&A with CDI members and their projects in the future!

Ben Mirus from the Geologic Hazards Science Center presented on Assembling a National Scale Map of Landslide Inventories from Incomplete and Disparate Spatial Data. His presentation raised some topics to explore further with CDI: which other disciplines work with similarly incomplete and disparate data (for example, species occurrence), and what theory exists for quantitatively analyzing such data (for example, a dataset that is a mix of point locations and polygons of landslide scars).

Previous landslide compilation.

Matt Davis, from the Advanced Research Computing group, presented on A Cost Effective Approach to Scientific Data Storage and Management: BlackPearl and Globus. This presentation was exciting because we often get questions about how we in the USGS are supposed to meet data release requirements for large volumes of data, or even share them within a group of researchers. Here, "large" means files much bigger than 10 GB. Matt let us know that YES, there are new options for storing and managing large data that are available to USGS researchers now (in beta). To get started, contact hpc@usgs.gov and tell the Advanced Research Computing team about your data needs.


An image from Matt Davis' presentation.

--

Looks like October brought back collaboration area activity in full swing. Here are October’s topics and discussions in reverse chronological order!

Data Management 10/29/2018: Trust Repositories, but FAIR-ify, puns on FAIR data

The Data Management Working Group held a special session in which Wade Bishop of the University of Tennessee presented his findings from a data fitness-for-use study. In his study he asked participants to consider a recent example of when they searched for data and decided whether it was fit for them to (re)use. Then he asked questions related to each of the elements in the FAIR data framework (Findable, Accessible, Interoperable, Reusable). Wade provided many fine puns on “FAIR” (if that is FAIR to say) and quotes such as “Deciding if data is fit for reuse is kind of like thumping on a melon or smelling bread before you buy it.” (Maybe you had to be there?) Participant quotes provided interesting insights, such as the metadata-data disconnect: do people understand how metadata and keywords are helping them to discover or use data? Perhaps if data providers do a good enough job of making data FAIR, the data consumers will not even notice; they will just happily reuse the data. Slides can be found on the DMWG meeting page.


Software Development 10/25/2018: Discussing the USGS Git Migration Plan

The Software Development Cluster discussed a draft Git migration plan (link accessible by Dept of Int) for USGS. Last June, an announcement about the USGS Git Platform (link accessible on the USGS network) was distributed. Members of the Software Development Cluster are providing information to help USGS code repository owners meet the requirements in the announcement. Note that the plan is still an early draft and open to suggestions. The contact for the plan is Eric Martinez, emartinez@usgs.gov.

Subduction Zone Focus Group 10/18/2018: Cascadia Recurrence Project Meeting Notes

The Subduction Zone Focus Group posted notes from their October meeting, summarizing ongoing projects, new members, and other opportunities. Topics included land-level changes along the Olympic Peninsula, SZ4D Research Coordination Networks, a Cascadia Recurrence database, a Mendenhall Fellowship focused on Cascadia landslides now being advertised, automated turbidite analysis, tsunamis, and recent papers and reports from the M(agnitude)9 project.


Snapshot of a data compilation for a Cascadia 3D seismic model, summary of the locations of 34 individual controlled-source wide-angle seismic imaging experiments dating to the 1960s. (T. Brocher)

Bioinformatics 10/16/2018: CDI Request for Proposals Discussion

The Bioinformatics Community of Practice had a discussion about the newly released CDI Request for Proposals, including what is in scope, how to meet the 30% in-kind match, and how the two-phase selection process works. Notes can be found on the RFP Collaboration forum.

Tech Stack 10/11/2018: Building a SpatioTemporal Feature Registry

The Tech Stack group didn’t have a live meeting, but Sky Bristol made a video demonstrating some of the concepts behind a SpatioTemporal Feature Registry. The group was encouraged to ask questions about the video using our wiki page. Further discussion is at the ESIP-hosted IdeaScale ideation page.


Semantic Web 10/11/2018: semantic approaches to enable FAIR data at USGS

The Semantic Web working group discussed semantic approaches to enable USGS data to be FAIR (Findable, Accessible, Interoperable, Reusable). They used the list of FAIR Principles at https://www.go-fair.org/fair-principles/, which includes links to explanations. Notes can be viewed at their meeting page.

eDNA CoP 10/2/2018: Examples of recent eDNA data releases

The eDNA community of practice created a page sharing recent example data releases for environmental DNA.

DevOps 10/2/18: Develop Intelligence training opportunities

In FY19, DevOps will consolidate to one meeting per month with both Project Management and SysAd/Developer Topics. Sarah Battani from Develop Intelligence gave an introduction to their DevOps Academy training opportunities.

https://www.developintelligence.com/catalog


Metadata Reviewers 10/1/2018: the state of USGS Metadata

The group took stock of the state of USGS metadata, including challenges and needs.

Fran set up a wiki page at https://my.usgs.gov/confluence/display/cdi/Metadata+Reviewers+Training+Collection as a place to share resources on Metadata Reviewers training.


Learn more at the CDI Collaboration Area Page.

View the CDI Calendar to see upcoming meetings.

--



  • Data Management Working Group, 9/10/18 - connecting existing data assets and our new USGS websites

  • Semantic Web Working Group, 9/13/18 - potential future plans with the USGS Thesaurus and with FAIR data

  • Citizen-Centered Innovation Monthly Meeting, 9/17/18 - the upcoming Crowdsourcing and Citizen Science Report to Congress

Data Management Working Group, 9/10/18

Lance Everette presented information to help update data managers and web masters on how to use new tools that are available for connecting the Science Data Catalog, ScienceBase, and the new USGS websites.

Raad Saleh (rsaleh@usgs.gov) from EROS sought ideas and examples of processes for transitioning research data to operational data.


Presentations given by Lance Everette and Raad Saleh can be found at the meeting page.

Semantic Web Working Group, 9/13/18

A small group talked about the history, status, and future plans of the working group. Some potential future plans included working on the USGS Thesaurus, as presented at the September CDI Monthly Meeting, and activities to make USGS data more consistent with the FAIR Data Principles – not just focusing on integrating data to support a particular use, but improving our data practices so that all USGS data is findable, accessible, interoperable, and re-usable for multiple unanticipated uses. Contact Fran Lightsom (flightsom@usgs.gov) if you are interested in participating. More information is at the SWWG Meetings page.

Citizen-Centered Innovation Monthly Meeting, 9/17/18

Sophia Liu led the monthly meeting, addressing questions or issues that people had about the Crowdsourcing and Citizen Science Report to Congress. She is working on getting the USGS contribution reviewed as it gets closer to the deadline - the final report will be submitted in January 2019 or later. Let Sophia (sophialiu@usgs.gov) know if you would like to schedule a meeting to discuss your report in more detail before she begins the review process.


Learn more at the CDI Collaboration Area Page.

View the CDI Calendar to see upcoming meetings.

--

At the September 12, 2018 CDI Monthly Meeting, topics included sedimentary geology data, online Python training, the CDI Request for Proposals, a spatiotemporal feature registry challenge, STEP-UP student opportunities at the USGS, Bayesian networks, and the USGS Thesaurus. View the recording, slides, Q&A, and highlighted links on the meeting page.


Field and Outcrop Data Challenge

September’s Scientist’s Challenge came from Anjali Fernandes at University of Connecticut - “do you know of an open access database that offers archival of outcrop scans (geo-referenced point clouds) & surfaces mapped on said scans, as well as geo-referenced grain-size distributions, geochemical analyses, sedimentary facies descriptions, etc.?” Initial answers include OpenTopography, Safaridb, and resources at virtualoutcrop.com.

Learn Python for Data Science Together

After August’s successful foray into online learning with DataCamp’s Git tutorial, we’re going to try the Introduction to Python for Data Science module next. It is about a 4-hour commitment and I will send reminders during the period October 3-October 24. Read more here and sign up here.


What is the CDI Request for Proposals all about?

We’ve updated the 2019 Proposals wiki space in preparation for the next round of CDI project ideas!

SpatioTemporal Feature Registry

Sky Bristol presented a challenge in finding the appropriate and best sources for spatial features including boundaries, identifiers, and associated information. Read more and add your ideas at the ESIP-hosted IdeaScale site.

STEP-UP Student helps with a legacy data management challenge

Sue Kemp presented on the experience of working remotely with a STEP-UP student on a legacy data management challenge - the SageMap site. If you think your center has a STEP-UP opportunity for a student, you can submit it at this Google Form.

Empowering Decision Makers

Erika Lentz presented some lessons learned through the ongoing conversion of a probabilistic modeling framework from proprietary to freely available open-source software. The project goal is to create a portable interactive web-interface to demonstrate how interdisciplinary USGS science and models can be transformed into an approachable format for decision-makers, such as those making decisions about impacts of sea level rise.


Using and Improving the USGS Thesaurus

Peter Schweitzer presented on the USGS Thesaurus: what it is, how you can use it, and how you can improve it. The USGS Thesaurus is an important resource that helps us to categorize, browse, and compare the data and science at USGS by using a controlled vocabulary. It is incorporated into multiple USGS data management tools, and is accessible here: https://www2.usgs.gov/science/about/.  

Peter described opportunities to correct, refine, and extend Thesaurus concepts; create cross-walks to other controlled vocabularies; build more web services and application interfaces; and help other people use this resource effectively. The presentation led to an extensive Q&A which can be found on the meeting page. Contact Peter (pschweitzer@usgs.gov) if you are interested in learning more.


--

8/7/18 DevOps: Meet the Software Development Cluster, Migrating to Amazon Web Services at Cal Poly
8/9/18 Tech Stack: EarthSim Lightweight Python Tools  
8/13/18 Data Management: Preserve
8/15/18 Citizen-Centered Innovation: Report to Congress
8/30/18 Software Development: A Deeper Dive into Git

8/7/2018: DevOps meets the Software Development Cluster

Project Management Sync

Michelle Guy gave an overview of the CDI Software Development Cluster activities. Sharing information across different CDI collaboration areas is a great way to learn from related, but separate, groups of expertise. DevOps expressed an interest in being more informed of other CDI activities and we shared the CDI Calendar.

SysAd and Developer Sync

Paul Jurasin, Theresa May, and Ben Butler of California Polytechnic State University shared their experiences from their institution’s migration to Amazon Web Services. They stressed the importance of introducing enough training to accompany new tools, the importance of putting people first and keeping them informed during major institutional shifts in technology, and the importance of acknowledging different skills, values, and priorities of different groups of people (such as developers, systems people, and infrastructure people). Thanks to the presentation team for sharing their experiences in a major enterprise migration to the cloud.


A slide from the Cal Poly team's presentation on migration to Amazon Web Services. 

8/9/18 Tech Stack: EarthSim Lightweight Python Tools

Dharhas Pothina, from the US Army Engineer Research and Development Center, presented on EarthSim: lightweight Python tools for environmental simulation. EarthSim provides a set of tools that can easily be reconfigured and repurposed as needed to rapidly solve specific emerging issues. Interacting with and visualizing data in the browser makes it easier to deliver products to customers, and users can run the tools locally or on HPC.

EarthSim is a website and GitHub repo, a place to try things out and see examples. http://earthsim.pyviz.org/

See the recording on the joint CDI Tech Stack / ESIP Tech Dive website.


8/13/18 Data Management: Preserve

The Data Management Working Group focused on the “preserve” theme in August, with two presentations.

Chris Bartlett, USGS Records Officer and Chief, Information Management Branch, presented on the relationship between Records Management and Science Data.

Larry Reedy, Records Disposition Coordinator, presented on the NARA ARCIS system to submit scientific records to Federal Records Centers.

See the slides and recording on the DMWG meeting page.


A slide on records management areas from Chris Bartlett's talk. 


8/15/18 Citizen-Centered Innovation: Report to Congress

Topics discussed at the August Citizen-Centered Innovation call included

Contact Sophia Liu, sophialiu@usgs.gov, for more details.


From http://www.citizenscience.org/association/conferences/citsci2019/

8/30/18 Software Development: A Deeper Dive into Git

George Rolston from USGS Cloud Hosting Solutions shared his knowledge and enthusiasm for Git, in particular, different Git branching strategies.

He shared the following resources:


The recording is accessible from the Software Development meetings wiki page.



From https://git-scm.com/book/en/v2


See all CDI Collaboration Areas.

--


At our last virtual monthly meeting on August 8, 2018, we heard about the upcoming Community for Data Integration Request for Proposals, opportunities for group learning about Git, and recent data-related activities at the USGS Office of Enterprise Information.

CDI sponsor Kevin T. Gallagher thanked the current CDI Project Teams that shared their progress at the Summer Earth Science Information Partners (ESIP) meeting.


Participants at the CDI Session at the ESIP Summer Meeting in July 2018: Supporting integrated and predictive science: Community for Data Integration focus on risk assessment.

Kevin also reminded us that the next CDI Request for Proposals will be happening soon, hopefully in September! Like last year, Kevin and Tim Quinn will select a theme or themes for us to work on together as a community. Kevin stressed that the CDI is aiming to help develop the capacity of the entire USGS for data integration and management through this proposals process, and therefore we should always keep an eye out for project outcomes that may be relevant to our own work. Selecting projects with wide applicability is a priority. Stay tuned for more info.

I announced that in the next month, I will complete the DataCamp Git for Data Science module, and I invite anyone interested to join me. Announcing this goal to the entire CDI is the best shot I have at actually doing it. This is a new CDI experiment in group learning. Read more at this wiki page and sign up to get weekly reminders and updates on my progress - this way we will complete the 4-hour module together.

Tim Quinn (Chief, USGS Office of Enterprise Information) and Nancy Sternberg (Senior Advisor for Strategic Planning, USGS OEI) presented their vision of Component Architecture for Integrated Science and some recent workshops that have helped them to identify next steps for implementation. The OEI oversees a tremendous amount of data and hardware in the USGS, and is working hard to modernize the entire system. In the past year, they have held workshops on High Performance Computing and High Throughput Computing, Data Storage, and Sensor Networks to guide their plans. They also gave us insight into the data storage strategy for the next 5 years and beyond. Tim, Nancy, and Paul Exter welcomed feedback from the community on their presentation.


Insights into a strategy for storing our increasing data volumes.

Meeting recording and slides are posted on the August 2018 Meeting page.

--