Blog

Highlights from the last month of CDI Collaboration Area activity:

Metadata Reviewers Community of Practice 4/2/18

The group took a look at the Genetics Guide to Data Release and Associated Data Dictionary, which was spearheaded by Bobbi Pierson, Alaska Science Center Geneticist, and the Genetics Metadata Working Group. They found it to be a great resource for anyone who needs to author genetics metadata under USGS guidelines. More meeting information.

USGS DevOps Sync 4/3/18

In April, the DevOps Project Manager Sync covered several topics:

  • Update from USGS Cloud Hosting Solutions

  • Software management website update (Cassandra Ladino)

  • Zero Trust Networking: What is it? (internal link) (Tom Van Dreser)

  • Overview of Cloud activities at Cal Poly (Paul Jurasin)

 
The Zero Trust Model: Should we be taking the information security advice from Congressmen? Drawbridge Network, November 2016

Data Management Working Group 4/9/18

Lance Everette and Tara Bell presented on the theme of “Preserve”: Taking action against USGS legacy data challenges. See the recording and slides at the meeting page.

Semantic Web Working Group 4/12/18

Alan Allwardt demonstrated the creation of a new set of persistent identifiers using the PURL system (https://archive.org/services/purl/). He used the example of the Data Categories of Marine Planning vocabulary, and described other use cases. (Notes)
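The point of a PURL is that the identifier stays fixed while a resolver redirects it to wherever the resource currently lives. The sketch below illustrates that indirection with a plain Python dictionary standing in for the resolver's table. The paths and target URLs are hypothetical placeholders, not the PURLs Alan created, and a real PURL service answers with an actual HTTP redirect rather than a function return value.

```python
from typing import Optional, Tuple

# Hypothetical resolver table: persistent path -> current location.
# These paths and URLs are placeholders for illustration only.
purl_table = {
    "/example/vocab/marine-planning": "https://example.org/vocabs/marine-planning-v1.html",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return an (HTTP status, Location) pair for a persistent path.

    A PURL-style resolver answers with a redirect (302 in this sketch)
    to the resource's current home, or 404 if the path is unregistered.
    """
    target = purl_table.get(path)
    if target is None:
        return 404, None
    return 302, target


def move_resource(path: str, new_target: str) -> None:
    """Relocate a resource: only the table entry changes, never the identifier."""
    if path not in purl_table:
        raise KeyError(f"unregistered PURL: {path}")
    purl_table[path] = new_target


# Anything citing the persistent path keeps working after the move.
print(resolve("/example/vocab/marine-planning"))
move_resource("/example/vocab/marine-planning",
              "https://example.org/vocabs/marine-planning-v2.html")
print(resolve("/example/vocab/marine-planning"))
```

Because users only ever cite the persistent path, moving a vocabulary to a new server is a one-line table update rather than a hunt for every broken link.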

Tech Stack Working Group 4/12/18

Jeremy Fischer from Indiana University presented on "Jetstream: A free national science and engineering cloud environment on XSEDE." (video)

Bioinformatics Community of Practice 4/17/18

Dr. Bonnie Hurwitz from University of Arizona demonstrated the iMicrobe platform that runs on CyVerse: https://www.imicrobe.us/. Slides and recording available at the meeting page.


GIS Community of Practice 4/25/18

The GIS Community of Practice hosted a webinar on the transition from ArcGIS Desktop to ArcGIS Pro. James Sill and Stephen Zahniser of Esri gave an overview of the user interface and architecture, followed by a demo. We received over 50 questions and comments during the presentation via sli.do and chat, and we’re working on getting the Q&A up on our wiki. The recording is available at the meeting page.


New ArcGIS Pro interface.


--
More CDI Blog posts



April 2018 Monthly Meeting

At the CDI monthly meetings, our goal is to bring you tools and information to help you do your daily work.

On April 11, 2018, we started with a review of the Reproducible Notebook Series, which began in October 2017. The series showcases examples of reproducible, executable online notebooks. A recent Atlantic article that has been making the rounds, The Scientific Paper is Obsolete, casts these notebooks as the successor to the traditional scientific paper.


Read about the history of reproducible notebooks in this article from The Atlantic.

Reproducible notebooks to access ocean biogeographic information

April’s reproducible notebook installment: OBIS (Ocean Biogeographic Information System) and R - Filipe Fernandes, SECOORA/IOOS (Southeast Coastal Ocean Observing Regional Association/Integrated Ocean Observing System). Filipe presented using Jupyter nbviewer, creating a presentation directly from the notebook! He showed how to connect sea turtle observation points to map possible migration paths in the Atlantic Ocean.


Screenshot from Filipe's notebook, plotting and connecting sea turtle observations.

Biodiversity monitoring and citizen science

Taxa Taxi: An automated process for using citizen science data to facilitate biodiversity monitoring (Erin Boydston and Toni Lyn Morelli)

iNaturalist citizen science observations (after some automated data processing) are helping researchers with biodiversity monitoring. One meeting participant gave iNaturalist a thumbs up as a neat mobile app to take on your hikes.

Preserving USGS Legacy Data

USGS Data at Risk: Expanding Legacy Data Inventory and Preservation Strategies (Lance Everette and Tara Bell)

Rescuing legacy data at the USGS remains a Herculean effort. The Legacy Data Inventory Reporting System (LDIRS) and its evaluation criteria can help the USGS address this need.


Web mapping for photo collections

Web Mapping Application for a Historical Geologic Field Photo Collection (Sarah Nagorsen and Jason Sherba)

Need guidance for proper documentation and publication of geolocated photo collections? See the CDI-funded project on a web mapping application for photo collections.


Visit the April 11, 2018 CDI Monthly Meeting Wiki Page.

--

Some highlights from March 2018 CDI collaboration area activity:

Metadata Reviewers Community of Practice 3/5/18

The group discussed a CDI proposal to create specifications for USGS data products so that ISO standard metadata records can be created in tools like the ADIwg metadata toolkit (mdEditor, mdTools). (Update: funded). The group also got a sneak peek at the new Data Dictionary page on the USGS Data Management Website (Update: published).


Screenshot from the new data dictionaries web page.

USGS DevOps Sync 3/6/18

Brian Fox shared a cloud training resources wiki page.
Ross Wickman gave an update from Cloud Hosting Solutions (CHS).
Eric Martinez gave a presentation entitled "Software Inventory: What it is, how it's made, and how you can make it better" (internal link). More info: https://sourcecode.cio.gov/


Tech Stack Working Group 3/8/18

"Zarr: A simple, open, scalable solution for big NetCDF/HDF data on the Cloud": Alistair Miles, University of Oxford. The motivation, current status, and future plans for Zarr were discussed, along with a demo of basic functionality and an analogy between virtual machines and cows. (link to video)
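Part of what makes Zarr cloud-friendly is that an array is split into fixed-size chunks, each stored as a separate object under a key built from its chunk-grid indices, so a reader fetches only the pieces it needs. The standard-library sketch below computes those keys. It mimics the dot-separated chunk naming of the Zarr v2 format's default layout but does not use the zarr library itself, and the example shapes are arbitrary.

```python
import itertools
import math
from typing import List, Sequence


def chunk_keys(shape: Sequence[int], chunks: Sequence[int]) -> List[str]:
    """List storage keys for every chunk of an array.

    Each dimension is divided into ceil(size / chunk) pieces; a chunk's key
    joins its grid indices with '.', mirroring Zarr v2's default naming.
    """
    if len(shape) != len(chunks):
        raise ValueError("shape and chunks must have the same length")
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]
    return [".".join(map(str, idx))
            for idx in itertools.product(*(range(n) for n in grid))]


def chunks_for_slice(start: Sequence[int], stop: Sequence[int],
                     chunks: Sequence[int]) -> List[str]:
    """Keys of only the chunks overlapping the half-open box [start, stop)."""
    ranges = [range(b // c, (e - 1) // c + 1)
              for b, e, c in zip(start, stop, chunks)]
    return [".".join(map(str, idx)) for idx in itertools.product(*ranges)]


# A 4 x 6 array in 2 x 3 chunks is stored as four separate objects.
print(chunk_keys((4, 6), (2, 3)))                 # ['0.0', '0.1', '1.0', '1.1']
# Reading rows 0-1 and columns 3-5 touches a single chunk.
print(chunks_for_slice((0, 3), (2, 6), (2, 3)))   # ['0.1']
```

On object storage such as S3, each key maps to one independent object, which is why many readers can pull different chunks in parallel without a shared file handle.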

Data Management Working Group 3/12/18

Capturing your processing and analysis workflow in R - Alison Appling. Alison introduced tools in R for dealing with reproducibility of analysis, size and complexity of analysis, collaboration on analysis, and dissemination. (Just a sampling of tools: remake, drake, googledrive, sbtools, whisker). (slides)


R tools for modern data analyses

eDNA Community of Practice 3/20/18

The group discussed potential activities for future conference calls. The group also maintains links to eDNA talks being hosted outside of the CDI on their wiki page.

Software Development Cluster 3/29/18

Chris Johnson presented on USGS EDGE (Equipment Development Grade Evaluation): What is it, how does it apply to you, and why you may be interested in participating. You can access the recording on their meetings page if you are logged in.

 
EDGE presentation


--


The CDI Pi Day (3.14.2018) monthly meeting was overflowing with content - here are some highlights. Check out all the details, including recording and slides, at the monthly meeting page.

FAIR Data

We continued learning about the FAIR Data Principles - I and R stand for Interoperable and Reusable. Awareness of these principles is growing within the CDI.

February (left) and March (right) polls about the FAIR data principles - growing the awareness of Findable, Accessible, Interoperable, and Reusable data!

CDI proposals advancing to full proposal stage

Cheryl Morris gave the opening announcements, displaying the 18 CDI Proposals that moved to Phase 2, shown around the CDI Science Support Framework. For teams that did not advance to Phase 2, we always welcome further discussion about how to better frame projects with CDI principles (and we’re not just saying that). She also reminded us that the FY19 Request for Proposals is not too far away, and encouraged groups to start the discussion.


FY18 proposals advancing to full proposal stage, around the CDI Science Support Framework.


Group Announcements - help design the USGS Software management website

A USGS Software Management Website is being planned, and Cassandra Ladino (ccladino@usgs.gov) is looking for volunteers to help with the design, from individual scientists developing software to large development teams. See all announcements.

Science for a Risky World

Kristin Ludwig briefed us on “Science for a Risky World: A USGS Plan for Risk Research & Applications,” giving us more information about efforts around the USGS that we can join.


Funded project presentations

There were four presentations from projects the CDI funded last year, sharing their findings on making data more accessible, high-throughput computing and Docker containers, the benefits and limitations of using Tableau for USGS data, and new technologies that allow us to “do science” in the cloud.


Kate Allstadt - An Interactive Web-based Application for Earthquake-triggered Ground Failure Inventories

Richard Erickson - Flocks of a feather dock together: Using Docker and HTCondor to link high-throughput computing across the USGS

Jeff Peters - Visualizing community exposure and evacuation potential to tsunami hazards using an interactive Tableau dashboard

Rich Signell - Exploring the USGS Science Data Life Cycle in the Cloud

Tell us about your experience in this year's Community Voting on Statements of Interest

New Monthly Meeting Highlights

  • We’re trying out a new “Highlights” section on the Monthly Meeting pages that will list major links and resources presented at the meeting. These will be posted well before my blog posts!

Selected links from March 14, 2018

  1. Landslide Inventory Web Application: http://doi.org/10.5066/F7D799CT
  2. Data Series Report for the Earthquake-triggered ground-failure inventories: https://pubs.er.usgs.gov/publication/ds1064
  3. Code for semi-automation of metadata creation for landslide inventories: https://github.com/usgs/landslides-metadata
  4. Code repository for the HTCondor project: https://my.usgs.gov/bitbucket/projects/CDI/repos/hunting_invasive_species_with_htcondor
  5. Tsunami Evacuation Tableau app: https://geography.wr.usgs.gov/science/vulnerability/oahuEvacDashboard.html
  6. USGS Data Life Cycle in the Cloud: https://github.com/USGS-CMG/data-life-cycle-cloud


The frequency of exciting CDI collaboration area meetings is far greater than the frequency of my writing about them. Here are some highlights from the past two months:

The friendly metadata editor

Looking for a new, browser-based, easy-to-use metadata authoring tool? mdEditor is here!! Visit mdeditor.org to check it out. (See slides at the Metadata Reviewers Meetings page.) 


mdEditor: the friendly metadata editor (best tagline ever)

Cloud Training Resources

The DevOps group has started a Cloud Training Resources page. (Example: Amazon Web Services "What is Cloud Computing?") If you find other training opportunities, please let Brian Fox (bfox@usgs.gov) know, so that he can add them to the list.

Online platforms for data analysis

Online platforms for data analysis have arrived, as illustrated by the recent Tech Stack/Tech Dive presentations. The webinar page has links and recordings for The Pangeo Project (an open-source big data science platform), and the National Data Service Labs Workbench (a scalable platform for research data access, education, and training).

Tidy data and more

The Data Management Working Group has covered several topics including: Publishing metadata to the Science Data Catalog and Data Management Challenges (Jan 2018); Tidy data, Biological Analysis Packages, and Volunteered Geographic Information (Feb 2018).

Bonus: Read the original Tidy Data paper (Wickham, Journal of Statistical Software, 2014).  Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
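To make the three rules concrete, here is a small sketch that reshapes a "wide" table, where one variable (year) is spread across the column headers, into tidy form with one row per observation. It uses only the Python standard library, and the site names and counts are made-up example values; libraries such as pandas (`melt`) provide the same operation.

```python
from typing import Dict, List

# Messy (wide) layout: the "year" variable is hidden in the column names.
# All values are made up for illustration.
wide = [
    {"site": "A", "2016": 12, "2017": 15},
    {"site": "B", "2016": 7, "2017": 9},
]


def melt(rows: List[Dict], id_col: str, var_name: str,
         value_name: str) -> List[Dict]:
    """Turn every non-id column into its own (id, variable, value) row."""
    tidy = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue
            tidy.append({id_col: row[id_col], var_name: col,
                         value_name: value})
    return tidy


tidy = melt(wide, id_col="site", var_name="year", value_name="count")
for obs in tidy:
    print(obs)
# Each variable (site, year, count) is now a column,
# and each observation (one site in one year) is a row.
```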


Is your data tidy?

Linked data to protect cultural heritage 

In February, the Semantic Web Working Group tuned in to a National Park Service presentation on the use of linked data to protect cultural heritage resources in the national parks from climate change, using the Digital Index of North American Archaeology.


The Digital Index of North American Archaeology interface.

All things microbiome

In February, the Bioinformatics group covered All things microbiome. Many different groups within the USGS have some element of microbiome research - check out the USGS Fact Sheet on microbiome research for more information.


Image from the USGS Fact Sheet on microbiome research.

The USGS EDGE program and more

Open Source Coffee Talks decided to combine forces with the Software Development Cluster as of February 2018. The Software Development cluster discussed the topics of the USGS HPC/HTC Workshop, the USGS EDGE (Equipment Development Grade Evaluation) program (more info), and 508 Compliance (IT Accessibility) for websites and web applications. In March, Chris Johnson will give a presentation with further information on the EDGE Program.

New Groups! Citizen-Centered Innovation, eDNA, and GIS

The Citizen-Centered Innovation group held its inaugural call on February 21, 2018. Anyone interested in crowdsourcing, citizen science, civic hacking, and challenge & prize competitions is encouraged to join. Contact Sophia B Liu at sophialiu@usgs.gov for more information.

The eDNA Community of Practice held its inaugural call on January 16, 2018. Calls will be held every other month in the same time slot as the Bioinformatics group (3rd Tuesday from 2-3p ET). Contact Pete Ruhl (pmruhl@usgs.gov) for more information.

The USGS GIS community followed up on their inaugural call with a message about next steps. You can reply to this short form to log your interest in future talks and topics, including ArcGIS Pro, Serving GIS Data with ESRI, Open-source GIS topics, GIS on the cloud, and Global mapper. You can also suggest a new topic!


Whew. My next goal: Update the blog with collaboration area news in less than two months! 

View all CDI Collaboration Areas

--

On February 14, 2018, the CDI Monthly Meeting started off with an introduction to the FAIR data principles and what you can do about them. We learned about "F" for Findable and "A" for Accessible.

Kyle Enns and Cristiana Falvo from the USGS gave a presentation on "Using Python to Bring Geophysical Data to the Surface," showing the CDI another way to officially share Python scripts for reproducibility. Kyle and Cristiana also shared the documents they use for Pre-review quality control, Releasing accessible python code, and their Technical peer review checklist (log in at the meeting page to view).


The feature presentation was "Semantic web for scientific information: streamlining how we write, find, link, and reuse data and models" by Ferdinando Villa of the Basque Centre for Climate Change. After describing the challenge of data and model integration and reuse, and a project he is working on to address the problem (The Integrated Modelling Partnership, www.integratedmodelling.org), he invited us all to come join in the adventure of working together in partnership to build an integrated information landscape! You can contact him at ferdinando.villa@bc3research.org.


Ferdinando's presentation was followed by a panel discussion on the semantic web and the USGS with Ken Bagstad, Dalia Varanka, and Julia Moriarty, which I hosted. It was clear that we need more time to learn from each other about the challenges and opportunities of the semantic web!

You can view the recording and slides on the February 2018 monthly meeting page.

--


Kevin Gallagher opened the January 10, 2018 meeting by announcing the CDI FY18 Request for Proposals, which was released on December 18, 2017. This year, there is a topical focus on Risk Assessment and Hazards Vulnerability.

In our Reproducible Notebook Series, Chris Sherwood presented his experience with officially publishing a Jupyter Notebook on code.usgs.gov as part of an official code release. You can see the finished product at https://code.usgs.gov/usgs/whcmsc-rdc/tree/v1.0.


Our reproducible notebook series highlights repeatable, executable, and documented methods in Jupyter Notebooks.


Brian May, who manages the USGS FOIA (Freedom of Information Act) program, presented "How the Freedom of Information Act impacts Data." Did you know that the USGS receives and processes over 200 FOIA requests a year? Brian’s talk touched on some of the more routine questions posed to the FOIA program; however, he is happy to provide more detailed trainings or discussions on the topic to a smaller group. You can reach him at foia@usgs.gov.



Finally, I gave a brief overview of the CDI Request for Proposals Process: Past, Present, and Future. This presentation was an opportunity for me to emphasize some of the unique features of our RFP, such as the community commenting and voting, encouragement to make new connections and discuss and promote in-progress ideas, and the benefits of participating in the voting process. Your vote counts, and as CDI members, you are registered voters!


The CDI has funded over 80 projects since 2010.

All slides and the recording are available on the January 10, 2018 Meeting Page.

--

12/19/17, USGS GIS Community

The USGS GIS Community had a discussion on 12/19/17 about how GIS users and enthusiasts at USGS can share information and tools as a community. The importance of the topic was illustrated by the fact that the call was so well attended that we ran out of phone lines (sorry about that - recording linked below.) Shane Wright and Roland Viger led the discussion, including the current state of USGS Enterprise GIS Help. CDI helped to facilitate the call.

Participants answered polls about which open source GIS tools they use, which technical support mechanisms seemed most promising, and what the most important needs of the GIS community will be over the next 5 years. This was the start of a community of practice that will help to communicate and advance GIS capabilities at the USGS. To get involved in the conversation, contact Shane (wright@usgs.gov) or Roland (rviger@usgs.gov).

See the notes and recording at this page

See the TSWG GIS Focus Group Forum


--

This post rounds out the 2017 CDI Collaboration Area Activity. It's been such a full year, I'm looking forward to more great topics in 2018!

Some of these topics do not really lend themselves to images, but we must have an image. So here is last month's ball of CDI Collaboration Area words:


11/30 Software Development Cluster - future topics

The group discussed goals to help guide how this group could collaborate and benefit from each other (in order of priority and likelihood):

  • Share awareness of what is going on (software efforts, tool exploration, best practices, metadata standards)

  • Share lessons learned

  • Share configurations (software, tools, architectures, ...)

  • Share data, services, and/or maybe even code

Let the group leads, Michelle Guy (mguy) and Blake Draper (bdraper), know if you have specific topics or goals you’d like to see addressed. Software Development Cluster Page

12/4 Metadata Reviewers - data quality information

The group talked about a specific field in the USGS data release metadata: that pesky data quality information. The Data Quality field is challenging because many metadata creators and reviewers are not sure what to put there, so it often ends up with no useful content. Madison Langseth brought up a current effort to compile Data Quality Documentation Examples. See the rest of the discussion at the Metadata Reviewers page.

12/5 DevOps - streaming data, agile contracts, Cloudfront

DevOps had three presentations across its Project Management and SysAd and Developer sessions.

  • SCAPE (Secure Cloud Analytic Processing Environment): A Framework for adaptable and secure analysis of streaming data. (Ginny Cevasco - Booz Allen Hamilton)

  • GHSC (Geologic Hazards Science Center) experience with an Agile Contract (Lynda Lastowka, USGS). Shared link on agile contracts in government.

  • CHS (Cloud Hosting Solutions) Cloudfront/WAF service (Jonathan Russo - CHS)

DevOps wiki page.

12/11 Data Management Working Group - the business of science

The focus was the Data Management Theme: Acquire. Brian Reece spoke on the topic "Data integration, fiscal accountability, and the 'business of science.'" He presented an evolving suite of web services and procedures that improve the ability to access and integrate data from Bureau systems such as BASIS+ (used to track projects and financial info), FBMS (tracks agreements and sales), and IPDS (used to track publications). Data Management WG page

12/14 Semantic Web Working Group - integrated modeling

The group discussed plans for possible future work on an integrated modeling project and a data dictionary database, as well as content for a future CDI Monthly Meeting. Semantic Web WG page.

12/14 Tech Stack Working Group - more Jupyter widgets

"Mini-Hack-Session: Developing and extending Jupyter Widgets": Jason Grout, Bloomberg. Jason walked through the thought and technical processes involved with developing new widget capability. See the recording. Tech Stack WG page.

12/27 Open Source Coffee Talks - code inventories and metadata

The December Open Source topic was code inventories and metadata. Eric Martinez has been working on leveraging open APIs to aggregate code.json files from individual USGS projects into a software inventory compatible with code.gov. Eric was unavailable for this month's call, so Cian Dawson volunteered to talk about the Water Mission Area activities and the Software IM instead. The Software IM is currently under heavy revision by the Fundamental Science Practices Advisory Committee, and any feedback is welcome. (See details at the first comment on this page.) Open Source Coffee Talks page.
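Aggregating code.json files essentially means collecting each project's list of releases into one agency-level inventory. The sketch below shows just that merge step with Python's json module. It assumes a simplified code.json shape, a top-level "releases" list whose entries carry "name" and "repositoryURL" fields, loosely modeled on the code.gov metadata schema; the sample documents, the field subset, and the de-duplication rule are illustrative assumptions, not a description of Eric's actual tool, and fetching the files over an API is left out.

```python
import json
from typing import Iterable

# Hypothetical per-project code.json documents (simplified shape).
project_files = [
    '{"releases": [{"name": "tool-a", "repositoryURL": "https://example.org/a"}]}',
    '{"releases": [{"name": "tool-b", "repositoryURL": "https://example.org/b"},'
    ' {"name": "tool-a", "repositoryURL": "https://example.org/a"}]}',
]


def aggregate(documents: Iterable[str], agency: str) -> dict:
    """Merge the 'releases' lists of several code.json documents.

    Releases are de-duplicated by (name, repositoryURL) so a project
    listed in more than one file appears only once in the inventory.
    """
    seen = set()
    merged = []
    for text in documents:
        for release in json.loads(text).get("releases", []):
            key = (release.get("name"), release.get("repositoryURL"))
            if key not in seen:
                seen.add(key)
                merged.append(release)
    return {"agency": agency, "releases": merged}


inventory = aggregate(project_files, agency="USGS")
print(json.dumps(inventory, indent=2))
```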


--

Here's another installment of all the topics being explored in the CDI Collaboration Areas. I'll get up to date yet!

11/2/17 Software Development Cluster - How do we use version control?

The Software Development group discussed how people use GitHub or other version control systems, for example, how they set release schedules and when in the development cycle releases begin. Eric Martinez led this conversation with a presentation on how the GHSC (Geologic Hazards Science Center) is using GitLab. Slides available to Dept. of Interior users.

Examples of using GitLab

11/6/17 Metadata Reviewers - learning about different types of specialized metadata - the Biological Data Profile

The Metadata Reviewers group had earlier decided to learn together about different types of specialized metadata. Pai and Erika shared examples of using the Biological Data Profile for data from Sea Otter Surveys. (See Western Ecological Research Center Approved Data Releases) Read more.

Where to go to find Sea Otters... Figure from a publication supported by Sea Otter Survey Data that is documented with the Biological Data Profile.

11/7/17 DevOps - USGS Git Hosting News, Content Delivery Networks

The DevOps meetings continue to bring us explanations of new and evolving capabilities available to groups in the USGS, as well as opportunities for me to learn new acronyms.

    • Intro USGS Git Hosting Platform (Eric Martinez). Slides available to Dept. of Interior users. Questions, comments, and feedback: GS Help GIT (gs_help_git@usgs.gov).

    • Announcing the CHS (Cloud Hosting Solutions) CDN (Content Delivery Network)/WAF (Web Application Firewall) Service (Jonathan Russo). This managed service is intended for people who have a public-facing, internally hosted site and want to use Cloudfront.

    • GIT Hosting and Version Control (George Rolston). George presented code.chs.usgs.gov and gitlab-ci, which are currently running and available for use. If you are not yet familiar with gitlab-ci (CI = Continuous Integration), now is a great time to learn how you can automate your builds with nothing more than a commit to master on code.chs.usgs.gov.
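The "a commit triggers the pipeline" idea behind gitlab-ci boils down to running ordered stages and stopping at the first failure, so a deploy happens only after the build and tests pass. The Python sketch below mimics that control flow. It is not GitLab's implementation or configuration format (real pipelines are defined in a `.gitlab-ci.yml` file), and the stage functions are hypothetical placeholders.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure, like a CI pipeline."""
    log = []
    for name, step in stages:
        ok = step()
        log.append(f"{name}: {'passed' if ok else 'failed'}")
        if not ok:
            break  # later stages (e.g. deploy) never run
    return log


# Hypothetical stages that a commit to master might trigger.
def build() -> bool:
    return True


def run_tests() -> bool:
    return 2 + 2 == 4  # stand-in for a real test suite


def deploy() -> bool:
    return True


print(run_pipeline([("build", build), ("test", run_tests), ("deploy", deploy)]))
# A failing test stage blocks the deploy stage entirely.
print(run_pipeline([("build", build), ("test", lambda: False), ("deploy", deploy)]))
```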

Access more information at the DevOps wiki page.

11/9/17 Semantic Web Working Group - a USGS triple store

This is the place you can go to learn about user stories for a USGS triple store, picking a system of persistent identifiers for linked data components, and choosing between 303 URIs and hash URIs. We are all learning together!
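The difference between the two URI styles comes down to what the client sends to the server. With a hash URI, the fragment after `#` is never sent, so every term in a vocabulary resolves to one document; with a 303 URI, each term has its own path, and the server answers with a 303 See Other redirect to a document describing it. The sketch below uses Python's `urllib.parse.urldefrag` to show the hash case. The example.org URIs are placeholders, and `respond_303` is a hypothetical illustration, not a USGS design.

```python
from typing import Optional, Tuple
from urllib.parse import urldefrag

# Hash URIs: the fragment is stripped before the HTTP request is made,
# so every term below is served by the same vocabulary document.
hash_uris = [
    "https://example.org/vocab#Streamflow",
    "https://example.org/vocab#WaterTemperature",
]
for uri in hash_uris:
    document, fragment = urldefrag(uri)
    print(f"request {document}, then look up '{fragment}' inside it")


# 303 URIs: each term has its own path; a hypothetical server maps the
# identifier of the "thing" to a separate document that describes it.
def respond_303(uri: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a 303-style identifier."""
    prefix = "https://example.org/id/"
    if uri.startswith(prefix):
        return 303, "https://example.org/doc/" + uri[len(prefix):]
    return 404, None


print(respond_303("https://example.org/id/Streamflow"))
```

Hash URIs are simpler to serve (one static file), while 303 URIs let each term be fetched on its own, at the cost of an extra round trip per lookup.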

11/9 Tech Stack Working Group - Jupyter Widgets

"Jupyter Widgets": Jason Grout, Bloomberg. Jupyter Widgets (aka ipywidgets) enable building interactive GUIs for Python code using standard form controls (sliders, dropdowns, textboxes, etc.), as well as providing a framework for building complex interactive controls such as interactive 2D graphs, 3D graphics, maps, and more.

See the recording.


11/13 Data Management Working Group - "Plan"

The focus was the Data Management Theme: Plan, and the group welcomed speakers on three topics:

    • Guidance on how to release USGS model output files – Fran Lightsom

    • Examples of building data management plans as code – Sky Bristol

    • Data Management activities in the Water Mission Area – Linda Debrewer

See the slides at the DMWG meeting page.

11/29 Open Source Coffee Talks - estimating software development tasks

Estimating Software Development Tasks. Discussion: "What approach has worked to best determine when a (software development) task will be completed on time, within scope, and within budget? Single point estimating? Three Point Estimating? Story Point Estimating? 50%-90% Estimating? Padding your initial thought by a factor of 2,4,8 estimating?" The group discussed these options and also created a new #projectmanagement slack channel on USGS slack. (If you do not have a Slack account, email Paul Moreland (pmoreland@usgs.gov) and he will get you set up.)
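As one example of the options the group weighed, three-point estimating combines an optimistic (O), most likely (M), and pessimistic (P) guess. A common textbook form is the PERT weighted mean, E = (O + 4M + P) / 6, with (P - O) / 6 as a rough standard deviation. The sketch below applies that formula; the task durations are made-up numbers, and this is one standard variant rather than anything the group settled on.

```python
from typing import Tuple


def pert_estimate(optimistic: float, most_likely: float,
                  pessimistic: float) -> Tuple[float, float]:
    """Three-point (PERT) estimate: weighted mean and rough standard deviation."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    mean = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return mean, std_dev


# Hypothetical task: 2 days at best, 4 days most likely, 12 days at worst.
mean, sd = pert_estimate(2, 4, 12)
print(f"expected {mean:.1f} days, +/- {sd:.1f}")  # expected 5.0 days, +/- 1.7
```

Note how the long pessimistic tail pulls the expected value above the most-likely guess, which is the kind of built-in padding the "factor of 2, 4, 8" approach applies by hand.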

Learn more about the group at their wiki page.


--


Announcements included some teasers for the FY18 CDI Request for Proposals. We hope to release the guidance for the proposals process in December; in the meantime, you can check out the current proposals page to prepare.

Sophia Liu, who is on Mission Assignment to FEMA, presented on Leveraging Crowdsourcing in FEMA-led Response Efforts.


Colin Talbert showed us some awesome notebook capabilities in the Reproducible Notebook Series: Notebooks as a Data Management Superpower. These included examples of batch metadata propagation and upload, and a way to visualize a summary of a Science Center’s records in the USGS internal publication system.


Demo of an app that shows a timeline and the status of different publications in the USGS internal publication system.

Michelle Guy presented on National Earthquake Information Center: Overview of real-time data acquisition, processing, and archive.


NEIC data flow.

Lynda Lastowka presented on National Earthquake Information Center: Data-first concept for presentation and delivery.

The data-first approach (providing quality data that can be then used in a variety of ways by users) supports minimum viable products and early adopters. Data is presented for both human and programmatic users. 

See more at https://earthquake.usgs.gov/

Highlights from Q/A:

  • Regarding official USGS data release: Some of the earthquake data presented qualify as web data services. This category was designed for ‘data in motion’. Individual approved data releases are not required.
  • Some advice for other groups looking to improve their data flow for rapid response: Decouple process and presentation. All of their data streams come in and then go into the Derived Product Data Distribution Layer, which is a common communication mechanism for all other processes. The one common communication mechanism is key.
  • What is CI/CD? Continuous Integration / Continuous Deployment. In software development, this means an automated suite of tests and deployment: if the tests pass, the code is automatically pushed to the production environment.
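The advice to decouple processing from presentation can be illustrated with a minimal publish/subscribe sketch: processing steps publish derived products to one shared distribution layer, and any number of presentation consumers (web pages, feeds, alerts) subscribe to it without knowing about each other. The `DistributionLayer` class, topic names, and event fields below are hypothetical stand-ins for the Derived Product Data Distribution Layer, whose real implementation was not described.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class DistributionLayer:
    """Hypothetical in-memory stand-in for a shared product distribution layer."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, product: Dict) -> None:
        # Producers never call presentation code directly; they only publish.
        for handler in self._subscribers[topic]:
            handler(product)


layer = DistributionLayer()
web_page: List[str] = []
alerts: List[str] = []


def update_web_page(product: Dict) -> None:
    web_page.append(f"M{product['mag']} {product['place']}")


def send_alert(product: Dict) -> None:
    if product["mag"] >= 5:  # hypothetical alert threshold
        alerts.append(product["id"])


# Two independent consumers of the same product stream.
layer.subscribe("origin", update_web_page)
layer.subscribe("origin", send_alert)

# A processing step publishes a (made-up) derived product once.
layer.publish("origin", {"id": "ex0001", "mag": 5.4, "place": "Example Region"})
print(web_page, alerts)  # ['M5.4 Example Region'] ['ex0001']
```

Adding a new presentation (say, a machine-readable feed) means adding a subscriber; the processing side does not change.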

Visit the November meeting wiki page for slides, reproducible notebook links, and recording.


--


I’m a bit behind on showing off all of the different topics being explored in CDI, but here is the next installment!

Wait, what’s a collaboration area? Collaboration Area is just our new term that includes both our familiar working groups and other groups with different communication formats (like Slack and Google Hangouts). We won’t get upset if you keep saying Working Group.

You can always email cdi@usgs.gov for more information on a particular group or to request to join a group’s announcement list. Also, let us know if we missed your activity!

10/2 Metadata Reviewers Community

The group is planning metadata training for the USGS. They discussed the results of their metadata training priority survey, typical metadata shortcomings, and ideal activities and outcomes from the viewpoint of metadata reviewers. Contact: Fran Lightsom. Read more!


Some results from the metadata training priority survey.

10/3 DevOps Syncs

OpenShift Demo (Chuck Svoboda, OpenShift Practice Lead, Public Sector)

A quick baseline on DevOps and containerized platforms, an overview of OpenShift, and how Red Hat solutions enable DevOps through a trusted software supply chain.  The presenter also compared and contrasted capabilities/features between OpenShift and PCF (Pivotal Cloud Foundry).

What's OpenShift? Develop, Deploy, and Manage Your Containers


Automating ESRI Services with Jenkins (Robert Djurasaj)

Presentation on using Jenkins to automate complex workflows for delivering and publishing the latest data and service updates.

What is Jenkins? A self-contained, open source automation server which can be used to automate all sorts of tasks related to building, testing, and delivering or deploying software.

Access the recordings from the DevOps Page. Contact: Brian Fox

10/12 Semantic Web Working Group

The group looked at the user stories developed in September and discussed paths forward. They discussed the concept of a data dictionary element database: it would provide descriptions of data fields, for the things you have measured or are going to measure, that you could reuse in your metadata.
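The data dictionary element idea can be sketched as a shared lookup table: a dataset's column names are matched against stored element definitions, the matches supply ready-made attribute descriptions for the metadata, and unmatched columns are flagged for the author. The element names, definitions, and units below are invented examples, and the output only loosely echoes metadata attribute fields; it is not a USGS schema.

```python
from typing import Dict, List, Tuple

# Hypothetical shared dictionary of reusable data elements.
ELEMENTS: Dict[str, Dict[str, str]] = {
    "water_temp": {"definition": "Temperature of the water sample",
                   "units": "degrees Celsius"},
    "site_id": {"definition": "Identifier of the sampling site",
                "units": "none"},
}


def describe_columns(columns: List[str]) -> Tuple[List[Dict[str, str]], List[str]]:
    """Reuse stored definitions for known columns; report the rest."""
    described, missing = [], []
    for col in columns:
        element = ELEMENTS.get(col)
        if element is None:
            missing.append(col)
        else:
            described.append({"label": col, **element})
    return described, missing


described, missing = describe_columns(["site_id", "water_temp", "turbidity"])
print(described)
print("needs a definition:", missing)  # needs a definition: ['turbidity']
```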

SWWG Meetings. Contact: Fran Lightsom

10/12 Tech Stack Working Group

"Research Workspace: A web-based tool for data sharing, documentation, analysis, and publication": Rob Bochenek, Axiom Data Science.

Research Workspace is a web-based tool designed to support collaborative science and data management tasks throughout the data lifecycle.

See the video. Contact: Rich Signell


10/16 Data Management Working Group

Managing Data with Partners – Donn Holmes (Western Ecological Research Center; San Diego Field Station)

Data Management Planning in NCCWSC – Emily Fort (NCCWSC; Reston)

Presentations included discussion of basic questions for the data management planning stage: Who are the users? What direction is the content flowing? What security is needed?

Meeting page. Contacts: Viv Hutchison and Cassandra Ladino


There are many considerations during the data management planning stage.


10/17 Bioinformatics Community of Practice

The group heard about ongoing efforts led by the Alaska Science Center to provide templates and guidance for genetics data release. Speaker: Barbara (Bobbi) Pierson.

Meeting notes. Contacts: Robert (Scott) Cornman, Denise Akob, Chris Kellogg.

10/25 Open Source Coffee Talks

The group didn’t hold a meeting in October, but tried out the Tricider app for voting on new ideas each month, setting reminders and deadlines, and providing a cleaner presentation of ideas. Tricider doesn't require authentication to vote or suggest ideas.

Learn more about the Open Source group. Contact: Cassandra Ladino


--


At the October 11, 2017 Monthly Meeting, we had our first episode of the Reproducible Notebook Series. These notebooks, rather than being college-ruled and spiral bound, are a web-based interactive computing platform where you can execute blocks of code and view results. In these segments we find examples of notebooks doing useful things (e.g., accessing data from a database and visualizing them) and give you a demo.

Rich Signell demonstrated his Dust Bowl Notebook. At the link, you can click the .ipynb file to view the notebook, or click the "launch binder" button to execute it!


Daniel Pearson, Joe Vrabel, and Ramona Neafie from the Texas Water Science Center presented "From API to Apps: USGS Texas Water Science Center Web Development Approaches and Analytics."


Find out more about their group's products, including the Texas Water Dashboard, Water-On-the-Go, and Graphing Water Information System (GWIS) at https://webapps.usgs.gov/.

--

Check out the CDI calendar for future group meetings. All groups are open to everyone.

Data Management Working Group, 9/11

September’s DMWG topics were:

    • DMWG updates and introduction to data management for integrated science, Cassandra Ladino, USGS

    • Overview of DMBOK v2 (Data Management Body of Knowledge), Lowell Fryman, Collibra

See more at their meeting page.


Updated knowledge areas in the Data Management Body of Knowledge v2.


DevOps, 9/12

Topics discussed at the DevOps group call on September 12 included:

    • Cloud Hosting Solutions Docker Managed Service (Jonathan Russo)

    • Cloud Hosting Solutions Overview and Road to a Test/Dev Environment (Courtney Owens, Eric Larson, Emma Sirr)

    • Terraform (Ivan Fetch)

    • Automating SSL Certificate Creation (Shawn Noble, WMA)


Semantic Web, 9/14

This month, the SWWG developed user stories for at least two potential future projects: a permanent USGS triple store and a USGS database of data dictionary elements. Next steps: clean up the stories, identify interested development team members and real customers, and look for resources.

More details: https://my.usgs.gov/confluence/display/cdi/2017+SWWG+Meetings

Tech Stack, 9/14

In September, the group heard a presentation on JupyterHub and JupyterLab developments by Brian Granger of Cal Poly.

View the recording

Highlights:

JupyterLab will one day replace the current Jupyter Notebook interface. It’s currently in alpha preview.

Cool features:

    • Real-time markdown updates

    • Automatic block detection

    • Ability to work with a .csv file of 1.3 million rows (smooth scrolling)

    • Drag and drop cells from one notebook to another

    • “Hide all code” in a notebook

    • Single-document mode: Cmd-Shift-Enter to focus on one document

    • Command palette

    • Extension: integration with Google Drive (double-click to open); it will have Google Drive's full real-time editing options

    • Chat


Are you ready for the new features in JupyterLab?

Software Development, 9/28

The Software Development Cluster held its first meeting on September 28, an informal "coffee-talk" style gathering via WebEx and phone. Everyone was welcome to bring questions and topics of interest related to software development and operations at the USGS.

Topics covered included:

    • Software repository requirements and recommendations

    • Software releases and DOIs

    • Credit for original authors, especially since we work in the public domain: we can request and encourage credit, but we cannot enforce it.

    • Updates on Code.usgs.gov


Want to receive more information about a particular CDI group?

Join any CDI Collaboration Area Group using this form.

Find out more about CDI Collaboration Areas on our wiki.

Q: What's a "collaboration area"?

A: "Collaboration area" is just a broader way to describe all of the subgroups in CDI that have formed around member interests. The groups have a wide range of goals and meeting styles, so we are refraining from calling them all "working groups." However, if you are used to thinking of all of the CDI subgroups as working groups, this is essentially what collaboration areas are.


--


The September 13, 2017 CDI monthly meeting happened in the wake of major hurricanes and earthquakes, and in the midst of a severe wildfire season in the western U.S. We heard from USGS speakers who lead efforts and build applications that work with hazards data.

Subduction zones! Earthquakes, tsunamis, landslides, and volcanic eruptions

Joan Gomberg presented on the USGS plan to reduce risk where tectonic plates collide, and posed the question of how her group could engage with the CDI. (See the USGS circular.) We are now exploring the best way to help facilitate their group's work within the CDI Earth Science Themes Working Group.

Wildfires!

Elizabeth Lile and Jodi Riegle gave an overview of the GeoMAC wildfire application, a multi-agency project that displays near real-time information on fire perimeters.

Hurricanes and floods!

Blake Draper demoed the USGS Flood Event Viewer, which allows members of the public to access flood data associated with events such as specific hurricanes.

 

In the opening Scientist’s Challenge, we brought up the topic of getting started with reproducible notebooks and R Shiny apps, two tools that are helping to improve the way processes are documented and visualizations are shared! We welcome suggestions for specific topics about these tools: make a note on our forum or send an email to cdi@usgs.gov.

In our interactive segment, we asked the audience about their preferred learning style for new tools. There wasn’t an overwhelming winner: the largest group of respondents preferred to learn in a group and then work alone, a close second preferred hands-on sessions with experts, and a smaller group preferred to learn on their own. We’ll try to offer a variety of methods when providing training resources. It’s always great to hear from the community!



Check out more details, including slides, on the September Monthly Meeting wiki page.