The USGS GIS Community had a discussion on 12/19/17 about how GIS users and enthusiasts at USGS can share information and tools as a community. The importance of the topic was illustrated by the fact that the call was so well attended that we ran out of phone lines (sorry about that - recording linked below.) Shane Wright and Roland Viger led the discussion, including the current state of USGS Enterprise GIS Help. CDI helped to facilitate the call.
Participants answered polls about what open source GIS tools they use, what technical support mechanisms seem most promising, and what the most important needs of the GIS community will be over the next 5 years. This was the start of a community of practice that will help to communicate and advance GIS capabilities at the USGS. To get involved in the conversation, contact Shane (email@example.com) or Roland (firstname.lastname@example.org).
This post rounds out the 2017 CDI Collaboration Area Activity. It's been such a full year, I'm looking forward to more great topics in 2018!
Some of these topics do not really lend themselves to images, but we must have an image. So here is last month's ball of CDI Collaboration Area words:
The group discussed goals to help guide how this group could collaborate and benefit from each other (in order of priority and likelihood):
Share awareness of what is going on (software efforts, tool exploration, best practices, metadata standards)
Share lessons learned
Share configurations (software, tools, architectures, ...)
Share data, services, and/or maybe even code
Let the group leads, Michelle Guy (mguy) and Blake Draper (bdraper), know if you have specific topics or goals you’d like to see addressed. Software Development Cluster Page
The group talked about a specific field in the USGS data release metadata: that pesky data quality information. The Data Quality field is challenging because many metadata creators and reviewers are not sure what to put there, and many times there is no useful content in that field. Madison Langseth brought up a current effort to compile Data Quality Documentation Examples. See the rest of the discussion at the Metadata Reviewers page.
DevOps had three presentations, one in Project Management and two in SysAd and Developer.
SCAPE (Secure Cloud Analytic Processing Environment): A Framework for adaptable and secure analysis of streaming data. (Ginny Cevasco - Booz Allen Hamilton)
GHSC (Geologic Hazards Science Center) experience with an Agile Contract (Lynda Lastowka, USGS). Shared link on agile contracts in government.
CHS (Cloud Hosting Solutions) Cloudfront/WAF service (Jonathan Russo - CHS)
The focus was the Data Management Theme: Acquire. Brian Reece spoke on the topic "Data integration, fiscal accountability, and the 'business of science.'" He presented an evolving suite of web services and procedures that improve the ability to access and integrate data from Bureau systems such as BASIS+ (used to track projects and financial info), FBMS (tracks agreements and sales), and IPDS (used to track publications). Data Management WG page
"Mini-Hack-Session: Developing and extending Jupyter Widgets": Jason Grout, Bloomberg. Jason walked through the thought and technical processes involved with developing new widget capability. See the recording. Tech Stack WG page.
The December Open Source topic was code inventories and metadata. Eric Martinez has been working on leveraging open APIs to aggregate code.json files from individual USGS projects into a software inventory compatible with code.gov. Eric was unavailable for this month's call, so Cian Dawson volunteered to talk about the Water Mission Area activities and the Software IM instead. The Software IM is currently under heavy revision by the Fundamental Science Practices Advisory Committee and any feedback is welcome. (See details at the first comment on this page.) Open Source Coffee Talks page.
Here's another installment of all the topics being explored in the CDI Collaboration Areas. I'll get up to date yet!
The Software Development group discussed how people use GitHub or other version control, for example how they handle release schedules and when in the dev cycle releases begin. Eric Martinez led this conversation with a presentation on how the GHSC (Geologic Hazards Science Center) is using GitLab. Slides available to Dept. of Interior users.
Examples of using GitLab
The Metadata Reviewers group had earlier decided to learn together about different types of specialized metadata. Pai and Erika shared examples of using the Biological Data Profile for data from Sea Otter Surveys. (See Western Ecological Research Center Approved Data Releases) Read more.
The DevOps meetings continue to bring us explanations of new and evolving capabilities available to groups in the USGS, as well as opportunities for me to learn new acronyms.
Announcing the CHS (Cloud Hosting Solutions) CDN/WAF (Content Delivery Network / Web Application Firewall) Service (Jonathan Russo). This is a managed service intended for people who have a public-facing, internally hosted site and want to use CloudFront.
GIT Hosting and Version Control (George Rolston). George presented code.chs.usgs.gov and gitlab-ci which is currently running and available for use. If you are not aware of what gitlab-ci is, it is a great time to learn how you can automate your builds with nothing more than a commit to master on code.chs.usgs.gov (CI = Continuous Integration)
This is the place you can go to learn about user stories for a USGS triple store, picking a system of persistent identifiers for linked data components, and choosing between 303 URIs and hash URIs. We are all learning together!
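For anyone new to the 303-versus-hash question, here is a minimal sketch of the difference. It is only an illustration: the example.org addresses, the function names, and the redirect rule are all made up, and a real linked data service would make these choices in its web server configuration.

```python
from urllib.parse import urldefrag


def document_for_hash_uri(uri):
    """Hash URI: the client drops the '#fragment' before making the HTTP request,
    so the identifier and the document describing it share one address."""
    document_url, fragment = urldefrag(uri)
    return document_url, fragment


def redirect_for_303_uri(uri, id_prefix, doc_prefix):
    """303 URI: a toy server rule that answers '303 See Other' and points the
    client at a separate document describing the identified thing."""
    if not uri.startswith(id_prefix):
        raise ValueError("not an identifier URI")
    return 303, doc_prefix + uri[len(id_prefix):]


# Hypothetical identifiers for the same concept, one in each style.
hash_result = document_for_hash_uri("https://example.org/vocab#Streamgage")
redirect_result = redirect_for_303_uri(
    "https://example.org/id/streamgage",
    id_prefix="https://example.org/id/",
    doc_prefix="https://example.org/doc/",
)
print(hash_result)
print(redirect_result)
```

The hash style needs no special server behavior but returns the whole vocabulary document for every term. The 303 style costs an extra round trip but lets each identifier have its own description.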
"Jupyter Widgets": Jason Grout, Bloomberg. Jupyter Widgets (aka ipywidgets) enables building interactive GUIs for Python code using standard form controls (sliders, dropdowns, textboxes, etc.), as well as providing a framework for building complex interactive controls such as interactive 2d graphs, 3d graphics, maps, and more.
The focus was the Data Management Theme: Plan, and the group welcomed speakers on three topics:
Guidance on how to release USGS model output files – Fran Lightsom
Examples of building data management plans as code – Sky Bristol
Data Management activities in the Water Mission Area – Linda Debrewer
See the slides at the DMWG meeting page.
Estimating Software Development Tasks. Discussion: "What approach has worked to best determine when a (software development) task will be completed on time, within scope, and within budget? Single point estimating? Three Point Estimating? Story Point Estimating? 50%-90% Estimating? Padding your initial thought by a factor of 2,4,8 estimating?" The group discussed these options and also created a new #projectmanagement slack channel on USGS slack. (If you do not have a Slack account, email Paul Moreland (email@example.com) and he will get you set up.)
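For readers who have not met three-point estimating, here is a minimal sketch of the common PERT-style version. It is not necessarily the variant the group discussed, and the function name and sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    expected: float  # weighted mean of the three guesses
    std_dev: float   # rough spread: (pessimistic - optimistic) / 6


def three_point_estimate(optimistic, most_likely, pessimistic):
    """PERT-style estimate: the most likely value gets four times the weight."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return Estimate(expected, std_dev)


# Hypothetical task: 2 days if all goes well, 4 most likely, 12 if it goes badly.
task = three_point_estimate(2, 4, 12)
print(f"expected {task.expected:.1f} days, +/- {task.std_dev:.1f}")
```

Compared with a single-point guess, the long pessimistic tail pulls the expected value above the "most likely" number, which is one way to make padding explicit rather than multiplying by a fudge factor.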
Learn more about the group at their wiki page.
Announcements included some teasers for the FY18 CDI Request for Proposals. We hope to release the guidance for the proposals process in December. In the meantime, you can check out the current proposals page to prepare.
Sophia Liu, who is on Mission Assignment to FEMA, presented on Leveraging Crowdsourcing in FEMA-led Response Efforts.
Colin Talbert showed us some awesome notebook capabilities in the Reproducible Notebook Series: Notebooks as a Data Management Superpower. These included examples of batch metadata propagation and upload, and a way to visualize a summary of a Science Center’s records in the USGS internal publication system.
Demo of an app that shows a timeline and different status for publications in the USGS internal publication system.
Michelle Guy presented on National Earthquake Information Center: Overview real-time data acquisition, processing, and archive.
NEIC data flow.
Lynda Lastowka presented on National Earthquake Information Center: Data-first concept for presentation and delivery.
The data-first approach (providing quality data that can then be used in a variety of ways by users) supports minimum viable products and early adopters. Data is presented for both human and programmatic users.
See more at https://earthquake.usgs.gov/
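To illustrate what serving both human and programmatic users can look like, here is a small sketch that reads one earthquake record in GeoJSON and prints a readable summary. The sample record is invented (it is not real event data) and the helper name is hypothetical. The property names (`mag`, `place`, `time`) and the longitude, latitude, depth coordinate order are assumptions based on the public GeoJSON feeds and should be checked against the feed documentation.

```python
import json
from datetime import datetime, timezone

# Invented sample in the shape of a GeoJSON FeatureCollection (not real data).
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"mag": 4.6, "place": "10 km N of Exampleville", "time": 1513641600000},
      "geometry": {"type": "Point", "coordinates": [-120.5, 36.2, 8.1]}
    }
  ]
}
"""


def summarize(feature):
    """Turn one GeoJSON feature into a one-line, human-readable summary."""
    props = feature["properties"]
    lon, lat, depth_km = feature["geometry"]["coordinates"]
    # Assumption: "time" is milliseconds since the Unix epoch, in UTC.
    when = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
    return (f"M{props['mag']} {props['place']} at {when:%Y-%m-%d %H:%M} UTC "
            f"(lat {lat}, lon {lon}, depth {depth_km} km)")


collection = json.loads(SAMPLE)
summaries = [summarize(feature) for feature in collection["features"]]
print("\n".join(summaries))
```

The same structured record feeds both audiences: a script can filter on the numeric fields, while the summary line is what a person would read.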
Highlights from Q/A:
I’m a bit behind on showing off all of the different topics being explored in CDI, but here is the next installment!
Wait, what’s a collaboration area? Collaboration Area is just our new term that includes both our familiar working groups and other groups with different communication formats (like Slack and Google Hangouts). We won’t get upset if you keep saying Working Group.
You can always email firstname.lastname@example.org for more information on a particular group or to request to join a group’s announcement list. Also, let us know if we missed your activity!
The group is planning metadata training for the USGS. They discussed the results of their metadata training priority survey, typical metadata shortcomings, and ideal activities and outcomes from the viewpoint of metadata reviewers. Contact: Fran Lightsom. Read more!
Open Shift Demo (Chuck Svoboda, OpenShift Practice Lead, Public Sector)
A quick baseline on DevOps and containerized platforms, an overview of OpenShift, and how Red Hat solutions enable DevOps through a trusted software supply chain. The presenter also compared and contrasted capabilities/features between OpenShift and PCF (Pivotal Cloud Foundry).
What's OpenShift? Develop, Deploy, and Manage Your Containers
Automating ESRI Services with Jenkins (Robert Djurasaj)
Presentation on using Jenkins to automate complex workflows for delivering and publishing latest data and service updates.
What is Jenkins? A self-contained, open source automation server which can be used to automate all sorts of tasks related to building, testing, and delivering or deploying software.
Access the recordings from the DevOps Page. Contact: Brian Fox
The group looked at the user stories developed in September and discussed paths forward. They discussed the concept of a data dictionary element database: It provides descriptions that you could use in metadata for data fields for the things you have measured, or are going to measure.
SWWG Meetings. Contact: Fran Lightsom
"Research Workspace: A web-based tool for data sharing, documentation, analysis, and publication": Rob Bochenek, Axiom Data Science.
Research Workspace is a web-based tool designed to support collaborative science and data management tasks throughout the data lifecycle.
See the video. Contact: Rich Signell
Managing Data with Partners – Donn Holmes (Western Ecological Research Center; San Diego Field Station)
Data Management Planning in NCCWSC – Emily Fort (NCCWSC; Reston)
Presentations included discussion of basic questions for the data management planning stage: Who are the users? What direction is the content flowing? What security is needed?
Meeting page. Contacts: Viv Hutchison and Cassandra Ladino
The group heard about ongoing efforts led by the Alaska Science Center to provide templates and guidance for genetics data release. Speaker: Barbara (Bobbi) Pierson.
Meeting notes. Contacts: Robert (Scott) Cornman, Denise Akob, Chris Kellogg.
The group didn’t hold a meeting in October, but tried out the Tricider app for voting on new ideas each month, setting reminders and deadlines, and providing a cleaner presentation of ideas. Tricider doesn't require authentication to vote or suggest ideas.
Learn more about the Open Source group. Contact: Cassandra Ladino
At the October 11, 2017 Monthly Meeting, we had our first episode of the Reproducible Notebook Series. These notebooks, rather than being college-ruled and spiral bound, are a web-based interactive computing platform where you can execute blocks of code and view results. In these segments we find examples of notebooks doing useful things (e.g., accessing data from a database and visualizing them) and give you a demo.
Rich Signell demonstrated his Dust Bowl Notebook. At the link you can click on the .ipynb file to see the notebook, or click the "launch binder" button to execute the notebook!
Daniel Pearson, Joe Vrabel, and Ramona Neafie from the Texas Water Science Center presented "From API to Apps: USGS Texas Water Science Center Web Development Approaches and Analytics."
Find out more about their group's products, including the Texas Water Dashboard, Water-On-the-Go, and Graphing Water Information System (GWIS) at https://webapps.usgs.gov/.
Check out the CDI calendar for future group meetings. Groups are open to all.
September’s DMWG topics were:
DMWG updates and introduction to data management for integrated science, Cassandra Ladino, USGS
Overview of DMBOK v2 (Data Management Body of Knowledge), Lowell Fryman, Collibra
See more at their meeting page.
Updated Knowledge areas in the Data Management Body of Knowledge v.2.
Topics discussed at the DevOps group calls on September 12 included:
Cloud Hosting Solutions Docker Managed Service (Jonathan Russo)
Cloud Hosting Solutions Overview and Road to a Test/Dev Environment (Courtney Owens, Eric Larson, Emma Sirr)
Terraform (Ivan Fetch)
Automating SSL Certificate Creation (Shawn Noble, WMA)
This month SWWG developed user stories for at least two potential future projects: a permanent USGS triple store and a USGS database of data dictionary elements. Next steps: clean up the stories, identify interested development team members and real customers, look for resources.
In September the group heard a presentation on JupyterHub and JupyterLab Developments by Brian Granger of Cal Poly.
JupyterLab will one day replace the current Jupyter Notebook interface. It’s in alpha preview.
Real-time markdown updates
Automatic block detection
Ability to work with .csv files with 1.3 million rows (smooth scrolling)
Drag and drop cells from one notebook to another
“Hide all code” in a notebook
Single document mode: cmd-shift-enter to FOCUS on one document
Extension: integration with Google Drive - double click to open, will have full real-time editing options of Google Drive
The Software Development Cluster had its first meeting, an informal "coffee-talk" style gathering via WebEx and phone, on September 28. Everyone was welcome to bring questions and topics of interest on anything related to software development and operations at the USGS.
Topics covered included:
Software repository requirements and recommendations
Software releases and DOIs
Credit for original authors, especially as we work in the public domain? We can request and encourage, but we cannot enforce.
Updates on Code.usgs.gov
Q: What's a "collaboration area"?
A: "Collaboration area" is just a broader way to describe all of the subgroups in CDI that have formed around member interests. The groups have a wide range of goals and meeting styles, so we are refraining from calling them all "working groups." However, if you are used to thinking of all of the CDI subgroups as working groups, this is essentially what collaboration areas are.
The September 13, 2017 CDI monthly meeting happened in the wake of major hurricanes and earthquakes, and in the midst of a severe wildfire season in the western U.S. We heard from USGS speakers who lead efforts and applications that work with hazards data.
Joan Gomberg presented on the USGS plan to reduce risk where tectonic plates collide, and posed the question of how her group could engage with the CDI. (See the USGS circular.) We are now exploring the best way to help facilitate their group in the CDI Earth Science Themes Working Group.
Elizabeth Lile and Jodi Riegle gave an overview of the GeoMAC wildfire application, a multi-agency project that displays near real-time information on fire perimeters.
Blake Draper demoed the USGS Flood Event Viewer, which allows members of the public to access flood data associated with events like specific hurricanes.
In the opening Scientist’s Challenge, we brought up the topic of getting started with reproducible notebooks and R Shiny apps, two tools that are helping to improve the way processes are documented and visualizations are shared! We welcome suggestions on specific topics about these tools - make a note on our forum or send an email to email@example.com.
In our interactive segment, we heard from the audience about their preferred learning style for new tools. There wasn’t an overwhelming winner: most respondents preferred to learn in a group and then work alone, a close second was the group that preferred hands-on sessions with experts, and a smaller group preferred to learn on their own. We’ll try to have a variety of methods when offering training resources. It’s always great to hear from the community!
The joint CDI Tech Stack / ESIP Tech Dive group held a special bonus session on 8/31/17 that showcased ERDDAP examples. (See their August 10th webinar for an introduction to ERDDAP: Easier access to scientific data.)
Speakers included: Jenn Sevadjian, Jim Potemra, Conor Delaney, Kevin O'Brien, John Kerfoot, Stephanie Petillo, Charles Carleton, Eli Hunter
One example: ERDDAP makes it possible for oyster farmers to view data dashboards of real-time environmental conditions that may affect their crop.
Humboldt Bay Oyster Conditions dashboard shown by Jenn Sevadjian
Jenn Sevadjian posted some example ERDDAP code snippets on jsfiddle.net:
You can test out using your own data by just editing a few lines of code.
Watch the full video for more examples and tips!
On Sept 5, 2017 the Metadata Reviewers Community of Practice worked on improving some of the data management resources that the USGS shares on its Data Management website, including the Data Review Checklist and the Metadata Review Checklist.
User needs for checking the proper encoding of XML metadata records are not currently documented, so Fran Lightsom will be leading a use case process to clarify those needs. If you are interested in participating in the use case process, contact Fran at firstname.lastname@example.org.
See the recommended revisions and more detail on their meeting notes page.
See the current information for reviews of data and metadata on the Data Management Website Data Release Page.
Image of the Data Release process in IPDS (Information Product Data System) from the Data Management Website Data Release Page.
On August 15, the Bioinformatics Community of Practice hosted a presentation on KBase, a large-scale bioinformatics system that enables bench biologists and bioinformaticists to upload their own data, analyze it alongside collaborator and public data, build increasingly realistic models, and share their workflows and conclusions.
Ben Allen from Oak Ridge National Laboratory gave an overview and demo.
At the August 9, 2017 Monthly Meeting, we heard about several topics that help when designing apps, software, and websites to communicate about our work, including information architecture, software development, and user experience. See the recording and slides at the meeting page.
Kevin Gallagher opened by thanking everyone for the ideas for integrated science that have come in since his call for ideas at the July CDI meeting. He also let us know that the CDI Request for Proposals is going to be different this year, in light of the USGS Executive Leadership Team’s ongoing discussions about Integrated Predictive Science Capacity. We won’t have the same process or timeline as the past few years with the CDI RFP Guidance going out in September. The ELT is focusing in on some topics and projects that will influence the RFP process - we’ll let you know more as soon as we have more information.
I gave a brief review of some of the Information Architecture (IA) topics we’ve been touching on recently, including not just the organization and design of websites or apps, but also factoring in user experience, usability, and information design. I recommended the site abbytheia.com for lots of IA tips. Asking the audience members if they managed something that takes into account design, usability, or information architecture, it turned out that 70% of us manage a web application or tool that takes this into account!
Blake Draper introduced the CDI Software Development Cluster to help spread knowledge among software developers. (A cluster is an informal working group in CDI, borrowing from the ESIP terminology.) To join, email email@example.com and we’ll get you on the list. Check out their wiki page to learn more.
Eric Martinez introduced a proposal for a simple, standardized way of documenting software so that existing solutions are easier to discover and therefore easier to re-use. He presented the metadata recommended by code.gov, a government platform facilitating code discovery and re-use.
The best practice is to create a code.json file for each software project.
See an Example: https://usgs.github.io/best-practices/code.json
A few other resources:
USGS Best Practices: https://usgs.github.io/best-practices/software/metadata.html
Creating your enterprise code inventory: https://code.gov/#/policy-guide/docs/compliance/inventory-code
Federal Source Code Policy: https://code.gov/#/policy-guide/policy/introduction
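As a rough sketch of how per-project code.json files might be combined into one inventory and checked for gaps, consider the following. The field list is an assumed subset based on the code.gov metadata schema, and the project entries, URLs, and function names are hypothetical. The USGS best-practices page and code.gov remain the authoritative sources for the actual required fields.

```python
import json

# Assumed subset of fields, for illustration only.
ASSUMED_FIELDS = ("name", "description", "repositoryURL", "permissions",
                  "laborHours", "tags", "contact")


def missing_fields(entry):
    """Return the assumed fields that are absent or empty in one entry."""
    return [field for field in ASSUMED_FIELDS
            if entry.get(field) in (None, "", [], {})]


def build_inventory(project_files):
    """Merge per-project code.json texts (each a JSON list of entries)."""
    inventory = []
    for text in project_files:
        inventory.extend(json.loads(text))
    return inventory


# Two hypothetical projects: one filled in, one with gaps.
project_a = json.dumps([{
    "name": "example-tool",
    "description": "Hypothetical project used only for illustration.",
    "repositoryURL": "https://example.org/example-tool.git",
    "permissions": {"usageType": "openSource", "licenses": [{"name": "CC0-1.0"}]},
    "laborHours": 0,
    "tags": ["example"],
    "contact": {"email": "someone@example.org"},
}])
project_b = json.dumps([{"name": "incomplete-tool", "tags": []}])

inventory = build_inventory([project_a, project_b])
report = {entry["name"]: missing_fields(entry) for entry in inventory}
print(json.dumps(report, indent=2))
```

Keeping one small file next to each project means the inventory can be rebuilt at any time instead of being maintained by hand.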
Rachel Volentine from the User-eXperience Lab at the University of Tennessee, Knoxville, gave an overview of activities, equipment, and lessons learned from her lab. She showed some results and findings from the DataONE search interface testing and the USGS Science Data Catalog. One of her UX lessons learned is “We are not the user,” so we must develop trust, work with users’ existing habits, and not make the user think too much!
You can see more about the lab at http://cics.cci.utk.edu/user-experience-lab
Finally, we ended with another poll: “What kind of content would you like to see more of in CDI Monthly Meetings? (choose up to two).” Yes, it was mentioned that the choices were not completely exclusive of each other, but in any case we ended up with a three-way tie, meaning we will continue to see a variety of formats in our monthly meetings.
See you at the next monthly meeting on September 13, 2017!
On August 10, 2017, as on the second Thursday of each month, CDI had a double-header with the Semantic Web Working Group and Tech Stack Working Group calls.
In the Semantic Web Working Group (SWWG) call, Madison and I brainstormed with the SWWG team on a conceptual model for a CDI Knowledge Base. Such a knowledge base would help us to visualize and query for connections between groups, projects, people, and other resources made available by the CDI.
Using the CMap tool, we had a fun conversation, led by Alan Allwardt and Dave Govoni, about a draft map and how we might take the next steps in using such a map. If you have ideas about such a map, don't hesitate to join in the conversation. You can email firstname.lastname@example.org to reach us and learn more.
More notes are available on the SWWG meetings page.
Results of brainstorming so far. DRAFT!
On the Tech Stack call, Bob Simons, an IT specialist with NOAA’s Environmental Research Division, presented on ERDDAP. ERDDAP is a data server that gives you a simple, consistent way to download subsets of scientific datasets in common file formats and make graphs and maps.
ERDDAP was developed to address challenges in finding and downloading data to your favorite client software.
ERDDAP works with gridded and tabular data.
There are many different installations of ERDDAP. Here is one with oceanographic data (e.g. data from satellites and buoys) and more information: https://coastwatch.pfeg.noaa.gov/erddap/index.html
Fun fact: "ERDDAP" used to be an acronym (Environmental Research Division Data Access Program), but it outgrew that original description. Now, please just think of it as a name, not an acronym. (from the ERDDAP > Information page)
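To give a feel for the "simple, consistent way" of requesting subsets, here is a sketch that builds a tabledap request URL for tabular data. The dataset ID and variable names are hypothetical placeholders and the helper function is made up for this post. The general pattern (dataset ID, file type extension, then variables and constraints in the query) follows the ERDDAP documentation, but check a real dataset's page for its actual variables.

```python
from urllib.parse import quote


def tabledap_url(server, dataset_id, variables, constraints=(), file_type="csv"):
    """Build an ERDDAP tabledap URL requesting a subset of a tabular dataset."""
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as '>' and ':' but keep '=' readable.
        query += "&" + quote(constraint, safe="=,")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


# "exampleDataset" and the variable names are placeholders, not a real dataset.
url = tabledap_url(
    "https://coastwatch.pfeg.noaa.gov/erddap",
    "exampleDataset",
    ["time", "latitude", "longitude", "sea_water_temperature"],
    ["time>=2017-08-01T00:00:00Z", "time<=2017-08-31T00:00:00Z"],
)
print(url)
```

Changing `file_type` is the only edit needed to ask for the same subset in another format the server supports, which is much of ERDDAP's appeal.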
The Metadata Reviewers Community of Practice had a presentation from Lisa Zolly about Metadata Tips for Better Discoverability of Data in the USGS Science Data Catalog on August 7, 2017.
Using USGS keywords in your metadata is critical to making your data browsable!
The USGS keywords, from the USGS Thesaurus, are not meant to be discipline specific. They are like an index to the corporate holdings. You can always use another vocabulary to add other, more specific keywords to your data.
When you are filling out metadata, putting the dataset DOI in the <onlink> field makes the “Get Data” link point to the data correctly.
It's important to have keywords that serve both coarse-level data browsers and more granular-level advanced users.
Thanks, Lisa, for the tips on achieving better discoverability of data, and the mechanics behind it.
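As a quick way to act on the <onlink> tip, here is a sketch that checks whether a metadata record has a DOI among its online linkages. The sample record, its placeholder DOI, and the function name are invented, and the element path assumes the usual FGDC CSDGM layout (metadata > idinfo > citation > citeinfo > onlink).

```python
import xml.etree.ElementTree as ET

# Invented record with a placeholder DOI, for illustration only.
SAMPLE = """<metadata><idinfo><citation><citeinfo>
<title>Hypothetical data release</title>
<onlink>https://doi.org/10.5066/EXAMPLE</onlink>
<onlink>https://example.org/project-page</onlink>
</citeinfo></citation></idinfo></metadata>"""

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/")


def doi_onlinks(xml_text):
    """Return the citation <onlink> values that look like DOI links."""
    root = ET.fromstring(xml_text)
    links = [(element.text or "").strip()
             for element in root.findall("./idinfo/citation/citeinfo/onlink")]
    return [link for link in links if link.startswith(DOI_PREFIXES)]


found = doi_onlinks(SAMPLE)
print(found if found else "No DOI found in <onlink>")
```

A check like this could run over a folder of records before review, flagging those where the DOI is missing.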
The next meeting of the Metadata Reviewers CoP will be on Tuesday, Sept 5 - note the new date to avoid Labor Day.
CDI's virtual meeting on July 12, 2017 drew a big crowd to learn about uses of augmented and virtual reality in the Earth Sciences. View the recording and slides on the July meeting page.
In his introductory remarks, Kevin Gallagher urged the community to continue discussing integrated science pilot projects that will speak to the USGS FY18 Bureau Priority of building an integrated predictive science capacity. We were encouraged to suggest projects we would like to augment, projects we would like to share with the community, and projects we want to learn more about, at http://bit.ly/2v5niZf or by email to email@example.com.
In the Scientist’s Challenge segment, Jordan Read and Lindsay Carr presented on Data-driven web design with A/B testing and experimentation. They started the discussion about obtaining quantitative information for improving science communication through our websites and apps. "Use data, not opinion!"
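As one way to "use data, not opinion," here is a sketch of a two-proportion z-test comparing click-through rates for two page versions. This is a standard textbook test and not necessarily the method Jordan and Lindsay used. The visitor and click counts are made up.

```python
import math


def two_proportion_z_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """Compare two conversion rates; return (z, two-sided p-value)."""
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: version A got 120 clicks from 2400 visitors, B got 156 from 2400.
z, p_value = two_proportion_z_test(120, 2400, 156, 2400)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between versions is unlikely to be chance alone. Deciding the sample size before the experiment starts keeps the result honest.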
Pete Cinotto from the USGS Indiana-Kentucky Water Science Center demonstrated how the center uses augmented reality, including to communicate stream gage information and to demonstrate contour lines and water flow in a cool augmented reality sandbox. Augmented reality adds information to the surrounding world (locations, images, objects, etc.) as opposed to virtual reality, which creates a new reality.
David Krum and Ryan Spicer from the University of Southern California Institute for Creative Technologies spoke about Drone Based Terrain Capture and Virtual Reality and the potential for collaborating with them on projects that are using virtual reality to explore terrain elevation and related characteristics like line of sight and slope. They can be contacted at firstname.lastname@example.org and email@example.com.