The frequency of exciting CDI collaboration area meetings is far greater than the frequency of my writing about them. Here are some highlights from the past two months:
The DevOps group has started a Cloud Training Resources page. (Example: Amazon Web Services "What is Cloud Computing?") If you find other training opportunities, please let Brian Fox (firstname.lastname@example.org) know, so that he can add them to the list.
Online platforms for data analysis have arrived, as illustrated by the recent Tech Stack/Tech Dive presentations. The webinar page has links and recordings for The Pangeo Project (an open-source big data science platform), and the National Data Service Labs Workbench (a scalable platform for research data access, education, and training).
The Data Management Working Group has covered several topics including: Publishing metadata to the Science Data Catalog and Data Management Challenges (Jan 2018); Tidy data, Biological Analysis Packages, and Volunteered Geographic Information (Feb 2018).
Bonus: Read the original Tidy Data paper (Wickham, Journal of Statistical Software, 2014). Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
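To make that structure concrete, here is a minimal sketch in plain Python (no pandas) that reshapes a hypothetical "wide" table, with one column per sampling year, into tidy form, where year becomes its own variable and each row is one observation. The site names, years, and counts are made up for illustration.

```python
# Hypothetical "wide" table: one row per site, one column per year.
# Year is really a variable hiding in the column headers, so this is not tidy.
wide = [
    {"site": "A", "2016": 12, "2017": 15},
    {"site": "B", "2016": 7, "2017": 9},
]


def tidy(rows, id_col, var_name, value_name):
    """Melt every non-id column into one (id, variable, value) row."""
    out = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            out.append({id_col: row[id_col], var_name: key, value_name: value})
    return out


long_rows = tidy(wide, id_col="site", var_name="year", value_name="count")
for r in long_rows:
    print(r)
```

In pandas, `DataFrame.melt` (with `id_vars`, `var_name`, and `value_name`) does the same job on larger tables.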
In February, the Semantic Web Working Group tuned in to a National Park Service presentation on the use of linked data to protect cultural heritage resources in the national parks from climate change, using the Digital Index of North American Archaeology.
In February, the Bioinformatics group covered "All things microbiome." Many different groups within the USGS have some element of microbiome research; check out the USGS Fact Sheet on microbiome research for more information.
Open Source Coffee Talks decided to combine forces with the Software Development Cluster as of February 2018. The Software Development Cluster discussed the USGS HPC/HTC Workshop, the USGS EDGE (Equipment Development Grade Evaluation) program, and 508 Compliance (IT Accessibility) for websites and web applications. In March, Chris Johnson will give a presentation with further information on the EDGE Program.
The Citizen-Centered Innovation group held its inaugural call on February 21, 2018. Anyone interested in crowdsourcing, citizen science, civic hacking, and challenge & prize competitions is encouraged to join. Contact Sophia B Liu at email@example.com for more information.
The eDNA Community of Practice held its inaugural call on January 16, 2018. Calls will be held every other month in the same time slot as the Bioinformatics group (3rd Tuesday, 2-3 p.m. ET). Contact Pete Ruhl (firstname.lastname@example.org) for more information.
The USGS GIS community followed up on its inaugural call with a message about next steps. You can fill out this short form to log your interest in future talks and topics, including ArcGIS Pro, Serving GIS Data with ESRI, Open-source GIS topics, GIS on the cloud, and Global Mapper. You can also suggest a new topic!
Whew. My next goal: Update the blog with collaboration area news in less than two months!
Kyle Enns and Cristiana Falvo from the USGS gave a presentation on "Using Python to Bring Geophysical Data to the Surface", showing the CDI another example of how to officially share Python scripts for reproducibility. Kyle and Cristiana also shared the documents they use for pre-review quality control, releasing accessible Python code, and their technical peer review checklist (log in at the meeting page to view).
The feature presentation was "Semantic web for scientific information: streamlining how we write, find, link, and reuse data and models" by Ferdinando Villa of the Basque Centre for Climate Change. After describing the challenge of data and model integration and reuse, and a project he is working on to address the problem (The Integrated Modelling Partnership, www.integratedmodelling.org), he invited us all to come join in the adventure of working together in partnership to build an integrated information landscape! You can contact him at email@example.com.
Ferdinando's presentation was followed by a panel discussion on the semantic web and the USGS with Ken Bagstad, Dalia Varanka, and Julia Moriarty, which I hosted. It was clear that we need more time to learn from each other about the challenges and opportunities of the semantic web!
You can view the recording and slides on the February 2018 monthly meeting page.
Kevin Gallagher opened the January 10, 2018 meeting by announcing the CDI FY18 Request for Proposals, which was released on December 18, 2017. This year, there is a topical focus on Risk Assessment and Hazards Vulnerability.
In our Reproducible Notebook Series, Chris Sherwood presented his experience with officially publishing a Jupyter Notebook on code.usgs.gov as part of an official code release. You can see the finished product at https://code.usgs.gov/usgs/whcmsc-rdc/tree/v1.0.
Our reproducible notebook series highlights repeatable, executable, and documented methods in Jupyter Notebooks.
Brian May, who manages the USGS FOIA (Freedom of Information Act) program, presented "How the Freedom of Information Act impacts Data." Did you know that the USGS receives and processes over 200 FOIA requests a year? Brian’s talk touched on some of the more routine questions posed to the FOIA program; however, he is happy to provide more detailed training or discussion on the topic to a smaller group. You can reach him at firstname.lastname@example.org.
Finally, I gave a brief overview of the CDI Request for Proposals Process: Past, Present, and Future. This presentation was an opportunity for me to emphasize some of the unique features of our RFP, such as the community commenting and voting, encouragement to make new connections and discuss and promote in-progress ideas, and the benefits of participating in the voting process. Your vote counts, and as CDI members, you are registered voters!
The CDI has funded over 80 projects since 2010.
All slides and the recording are available on the January 10, 2018 Meeting Page.
The USGS GIS Community had a discussion on 12/19/17 about how GIS users and enthusiasts at USGS can share information and tools as a community. The importance of the topic was illustrated by the fact that the call was so well attended that we ran out of phone lines (sorry about that - recording linked below.) Shane Wright and Roland Viger led the discussion, which covered the current state of USGS Enterprise GIS Help. CDI helped to facilitate the call.
Participants answered polls about which open source GIS tools they use, which technical support mechanisms seemed most promising, and what the most important needs of the GIS community will be over the next 5 years. This was the start of a community of practice that will help to communicate and advance GIS capabilities at the USGS. To get involved in the conversation, contact Shane (email@example.com) or Roland (firstname.lastname@example.org).
This post rounds out the 2017 CDI Collaboration Area activity. It's been such a full year, and I'm looking forward to more great topics in 2018!
Some of these topics do not really lend themselves to images, but we must have an image. So here is last month's ball of CDI Collaboration Area words:
The group discussed goals to help guide how this group could collaborate and benefit from each other (in order of priority and likelihood):
Share awareness of what is going on (software efforts, tool exploration, best practices, metadata standards)
Share lessons learned
Share configurations (software, tools, architectures, ...)
Share data, services, and/or maybe even code
Let the group leads, Michelle Guy (mguy) and Blake Draper (bdraper), know if you have specific topics or goals you’d like to see addressed. See the Software Development Cluster page for more.
The group talked about a specific field in the USGS data release metadata: that pesky data quality information. The Data Quality field is challenging because many metadata creators and reviewers are not sure what to put there, and the field often ends up with no useful content. Madison Langseth brought up a current effort to compile Data Quality Documentation Examples. See the rest of the discussion at the Metadata Reviewers page.
DevOps had three presentations: one in Project Management and two in SysAd and Developer.
SCAPE (Secure Cloud Analytic Processing Environment): A Framework for adaptable and secure analysis of streaming data. (Ginny Cevasco - Booz Allen Hamilton)
GHSC (Geologic Hazards Science Center) experience with an Agile Contract (Lynda Lastowka, USGS). Lynda also shared a link on agile contracts in government.
CHS (Cloud Hosting Solutions) Cloudfront/WAF service (Jonathan Russo - CHS)
The focus was the Data Management Theme: Acquire. Brian Reece spoke on the topic "Data integration, fiscal accountability, and the 'business of science.'" He presented an evolving suite of web services and procedures that improve the ability to access and integrate data from Bureau systems such as BASIS+ (used to track projects and financial info), FBMS (tracks agreements and sales), and IPDS (used to track publications). See the Data Management WG page.
"Mini-Hack-Session: Developing and extending Jupyter Widgets": Jason Grout, Bloomberg. Jason walked through the thought and technical processes involved in developing new widget capabilities. See the recording on the Tech Stack WG page.
The December Open Source topic was code inventories and metadata. Eric Martinez has been working on leveraging open APIs to aggregate code.json files from individual USGS projects into a software inventory compatible with code.gov. Eric was unavailable for this month's call, so Cian Dawson volunteered to talk about Water Mission Area activities and the Software IM instead. The Software IM is currently under heavy revision by the Fundamental Science Practices Advisory Committee, and any feedback is welcome. (See details in the first comment on this page.) See the Open Source Coffee Talks page.
Here's another installment of all the topics being explored in the CDI Collaboration Areas. I'll get up to date yet!
The Software Development group discussed how people use GitHub or other version control systems, for example, how they set release schedules and at what point in the development cycle releases begin. Eric Martinez led this conversation with a presentation on how the GHSC (Geologic Hazards Science Center) is using GitLab. Slides are available to Dept. of the Interior users.
Examples of using GitLab
The Metadata Reviewers group had earlier decided to learn together about different types of specialized metadata. Pai and Erika shared examples of using the Biological Data Profile for data from Sea Otter Surveys. (See Western Ecological Research Center Approved Data Releases) Read more.
The DevOps meetings continue to bring us explanations of new and evolving capabilities available to groups in the USGS, as well as opportunities for me to learn new acronyms.
Announcing CHS CDN/WAF Service (CHS: Cloud Hosting Solutions; CDN: Content Delivery Network; WAF: Web Application Firewall) (Jonathan Russo). This is a managed service intended for groups that have a public-facing, internally hosted site and want to use CloudFront.
Git Hosting and Version Control (George Rolston). George presented code.chs.usgs.gov and GitLab CI, which are currently running and available for use. If you are not familiar with GitLab CI (CI = Continuous Integration), now is a great time to learn how you can automate your builds with nothing more than a commit to master on code.chs.usgs.gov.
This is the place you can go to learn about user stories for a USGS triple store, picking a system of persistent identifiers for linked data components, and choosing between 303 URIs and hash URIs. We are all learning together!
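The practical difference between the two URI styles is what a client actually sends to the server. Here is a minimal sketch using hypothetical example.org identifiers: with a hash URI, the fragment never leaves the client, so every term in a vocabulary resolves to the same document; with a slash URI, each term is its own HTTP resource, and a server following the 303 pattern answers with a redirect to a document describing it.

```python
from urllib.parse import urldefrag


def request_target(uri):
    """Return (URL fetched over HTTP, local fragment or None) for a URI.

    HTTP clients drop the '#fragment' part before sending a request, so for
    hash URIs the server only ever sees the document URL.
    """
    url, fragment = urldefrag(uri)
    return url, fragment or None


# Hypothetical identifiers, for illustration only.
hash_uri = "https://example.org/vocab/gauges#StreamGauge"
slash_uri = "https://example.org/id/gauge/station-123"

print(request_target(hash_uri))   # ('https://example.org/vocab/gauges', 'StreamGauge')
print(request_target(slash_uri))  # ('https://example.org/id/gauge/station-123', None)
```

For the slash URI, the server would reply with `303 See Other` and a `Location` header pointing at a separate document URL; that redirect lives in server configuration and is not shown here.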
"Jupyter Widgets": Jason Grout, Bloomberg. Jupyter Widgets (aka ipywidgets) enable building interactive GUIs for Python code using standard form controls (sliders, dropdowns, text boxes, etc.), and also provide a framework for building complex interactive controls such as interactive 2D graphs, 3D graphics, maps, and more.
The focus was the Data Management Theme: Plan, and the group welcomed speakers on three topics:
Guidance on how to release USGS model output files – Fran Lightsom
Examples of building data management plans as code – Sky Bristol
Data Management activities in the Water Mission Area – Linda Debrewer
See the slides at the DMWG meeting page.
Estimating Software Development Tasks. Discussion: "What approach has worked to best determine when a (software development) task will be completed on time, within scope, and within budget? Single point estimating? Three Point Estimating? Story Point Estimating? 50%-90% Estimating? Padding your initial thought by a factor of 2, 4, 8 estimating?" The group discussed these options and also created a new #projectmanagement channel on USGS Slack. (If you do not have a Slack account, email Paul Moreland (email@example.com) and he will get you set up.)
Learn more about the group at their wiki page.
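Of the approaches the group listed, three-point estimating is the easiest to show in a few lines. Here is a minimal sketch of the common PERT (beta) weighting, expected = (O + 4M + P) / 6, with the spread approximated as (P - O) / 6. The task durations are hypothetical, and this is one convention among several, not a method the group endorsed.

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Three-point (PERT) estimate: weighted mean and rough standard deviation."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    mean = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return mean, std_dev


# Hypothetical task: 3 days if all goes well, 5 typically, 13 if it goes badly.
mean, sd = pert_estimate(3, 5, 13)
print(f"expected {mean:.1f} days, roughly +/- {sd:.1f}")
```

Here the expected value (6.0 days) lands above the most likely value (5 days) because the pessimistic tail is long, which is one argument for three-point estimates over single-point ones.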
Announcements included some teasers for the FY18 CDI Request for Proposals. We hope to release the guidance for the proposals process in December; in the meantime, you can check out the current proposals page to prepare.
Sophia Liu, who is on Mission Assignment to FEMA, presented on Leveraging Crowdsourcing in FEMA-led Response Efforts.
Colin Talbert showed us some awesome notebook capabilities in the Reproducible Notebook Series: Notebooks as a Data Management Superpower. These included examples of batch metadata propagation and upload, and a way to visualize a summary of a Science Center’s records in the USGS internal publication system.
Demo of an app that shows a timeline and different status for publications in the USGS internal publication system.
Michelle Guy presented on National Earthquake Information Center: Overview of real-time data acquisition, processing, and archive.
NEIC data flow.
Lynda Lastowka presented on National Earthquake Information Center: Data-first concept for presentation and delivery.
The data-first approach (providing quality data that can be then used in a variety of ways by users) supports minimum viable products and early adopters. Data is presented for both human and programmatic users.
See more at https://earthquake.usgs.gov/
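As one illustration of data served to programmatic users, the USGS earthquake catalog is available through an FDSN event web service. Here is a minimal sketch that only builds a query URL (it does not fetch anything). The endpoint and parameter names reflect the public service as I understand it, so check the API documentation on earthquake.usgs.gov before relying on them; the date range and magnitude are arbitrary examples.

```python
from urllib.parse import urlencode

# Public FDSN event service endpoint (verify against the current API docs).
BASE = "https://earthquake.usgs.gov/fdsnws/event/1/query"


def event_query_url(start, end, min_magnitude, fmt="geojson"):
    """Build a query URL for events between start and end at or above min_magnitude."""
    params = {
        "format": fmt,
        "starttime": start,  # ISO 8601 date or datetime
        "endtime": end,
        "minmagnitude": min_magnitude,
    }
    return f"{BASE}?{urlencode(params)}"


url = event_query_url("2017-09-01", "2017-09-30", 6)
print(url)
```

The same URL pasted into a browser serves a human user, while a script can open it with `urllib.request.urlopen` and parse the GeoJSON response with `json.load`, which is the data-first idea in small form.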
Highlights from Q/A:
I’m a bit behind on showing off all of the different topics being explored in CDI, but here is the next installment!
Wait, what’s a collaboration area? Collaboration Area is just our new term that includes both our familiar working groups and other groups with different communication formats (like Slack and Google Hangouts). We won’t get upset if you keep saying Working Group.
You can always email firstname.lastname@example.org for more information on a particular group or to request to join a group’s announcement list. Also, let us know if we missed your activity!
The group is planning metadata training for the USGS. They discussed the results of their metadata training priority survey, typical metadata shortcomings, and ideal activities and outcomes from the viewpoint of metadata reviewers. Contact: Fran Lightsom. Read more!
OpenShift Demo (Chuck Svoboda, OpenShift Practice Lead, Public Sector)
A quick baseline on DevOps and containerized platforms, an overview of OpenShift, and how Red Hat solutions enable DevOps through a trusted software supply chain. The presenter also compared and contrasted capabilities/features between OpenShift and PCF (Pivotal Cloud Foundry).
What's OpenShift? Develop, Deploy, and Manage Your Containers
Automating ESRI Services with Jenkins (Robert Djurasaj)
Presentation on using Jenkins to automate complex workflows for delivering and publishing latest data and service updates.
What is Jenkins? A self-contained, open source automation server which can be used to automate all sorts of tasks related to building, testing, and delivering or deploying software.
Access the recordings from the DevOps Page. Contact: Brian Fox
The group looked at the user stories developed in September and discussed paths forward. They discussed the concept of a data dictionary element database, which would provide definitions that could be reused in metadata to describe the data fields for the things you have measured or are going to measure.
SWWG Meetings. Contact: Fran Lightsom
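To make the data dictionary idea concrete: an element could be stored once and rendered into the entity and attribute section of a CSDGM (FGDC) metadata record whenever a dataset uses that field. Here is a minimal sketch with a hypothetical element. The `attr`, `attrlabl`, `attrdef`, and `attrdefs` tags are standard CSDGM element names, but the dictionary layout is an assumption for illustration, not the SWWG design.

```python
import xml.etree.ElementTree as ET

# Hypothetical data dictionary entry; the keys are illustrative only.
element = {
    "label": "water_temp",
    "definition": "Water temperature at the sensor depth, in degrees Celsius.",
    "source": "Producer defined",
}


def to_fgdc_attr(entry):
    """Render one dictionary entry as a CSDGM <attr> element."""
    attr = ET.Element("attr")
    ET.SubElement(attr, "attrlabl").text = entry["label"]
    ET.SubElement(attr, "attrdef").text = entry["definition"]
    ET.SubElement(attr, "attrdefs").text = entry["source"]
    return attr


xml_text = ET.tostring(to_fgdc_attr(element), encoding="unicode")
print(xml_text)
```

A complete attribute entry would also need domain information (`attrdomv`), which is omitted here to keep the sketch short.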
"Research Workspace: A web-based tool for data sharing, documentation, analysis, and publication": Rob Bochenek, Axiom Data Science.
Research Workspace is a web-based tool designed to support collaborative science and data management tasks throughout the data lifecycle.
See the video. Contact: Rich Signell
Managing Data with Partners – Donn Holmes (Western Ecological Research Center; San Diego Field Station)
Data Management Planning in NCCWSC – Emily Fort (NCCWSC; Reston)
Presentations included discussion of basic questions for the data management planning stage: Who are the users? What direction is the content flowing? What security is needed?
Meeting page. Contacts: Viv Hutchison and Cassandra Ladino
The group heard about ongoing efforts led by the Alaska Science Center to provide templates and guidance for genetics data releases (speaker: Barbara (Bobbi) Pierson).
Meeting notes. Contacts: Robert (Scott) Cornman, Denise Akob, Chris Kellogg.
The group didn’t hold a meeting in October, but tried out the Tricider app for voting on new ideas each month, setting reminders and deadlines, and providing a cleaner presentation of ideas. Tricider doesn't require authentication to vote or suggest ideas.
Learn more about the Open Source group. Contact: Cassandra Ladino
At the October 11, 2017 Monthly Meeting, we had our first episode of the Reproducible Notebook Series. These notebooks, rather than being college-ruled and spiral bound, are a web-based interactive computing platform where you can execute blocks of code and view results. In these segments we find examples of notebooks doing useful things (e.g., accessing data from a database and visualizing them) and give you a demo.
Rich Signell demonstrated his Dust Bowl Notebook. At the link, you can click on the .ipynb file to see the notebook, or click the "launch binder" button to execute it!
Daniel Pearson, Joe Vrabel, and Ramona Neafie from the Texas Water Science Center presented "From API to Apps: USGS Texas Water Science Center Web Development Approaches and Analytics."
Find out more about their group's products, including the Texas Water Dashboard, Water-On-the-Go, and Graphing Water Information System (GWIS) at https://webapps.usgs.gov/.
Check out the CDI calendar for future group meetings; all groups are open to everyone.
September’s DMWG topics were:
DMWG updates and introduction to data management for integrated science, Cassandra Ladino, USGS
Overview of DMBOK v2 (Data Management Body of Knowledge), Lowell Fryman, Collibra
See more at their meeting page.
Updated knowledge areas in the Data Management Body of Knowledge v2.
Topics discussed at the DevOps group calls on September 12 included:
Cloud Hosting Solutions Docker Managed Service (Jonathan Russo)
Cloud Hosting Solutions Overview and Road to a Test/Dev Environment (Courtney Owens, Eric Larson, Emma Sirr)
Terraform (Ivan Fetch)
Automating SSL Certificate Creation (Shawn Noble, WMA)
This month SWWG developed user stories for at least two potential future projects: a permanent USGS triple store and a USGS database of data dictionary elements. Next steps: clean up the stories, identify interested development team members and real customers, look for resources.
In September the group heard a presentation on JupyterHub and JupyterLab Developments by Brian Granger of Cal Poly.
JupyterLab will one day replace the current Jupyter Notebook interface. It is currently in alpha preview.
Real-time markdown updates
Automatic block detection
Ability to work with .csv files of 1.3 million rows (smooth scrolling)
Drag and drop cells from one notebook to another
“Hide all code” in a notebook
Single document mode: cmd-shift-enter to FOCUS on one document
Extension: integration with Google Drive - double click to open, will have full real-time editing options of Google Drive
The Software Development Cluster held its first meeting on September 28, an informal "coffee-talk" style gathering via WebEx and phone. Everyone was welcome to bring questions and topics of interest on anything related to software development and operations at the USGS.
Topics covered included:
Software repository requirements and recommendations
Software releases and DOIs
Credit for original authors, especially since our work is in the public domain: we can request and encourage credit, but we cannot enforce it.
Updates on Code.usgs.gov
Q: What's a "collaboration area"?
A: "Collaboration area" is just a broader way to describe all of the subgroups in CDI that have formed around member interests. The groups have a wide range of goals and meeting styles, so we are refraining from calling them all "working groups." However, if you are used to thinking of all of the CDI subgroups as working groups, this is essentially what collaboration areas are.
The September 13, 2017 CDI monthly meeting happened in the wake of major hurricanes and earthquakes, and in the midst of a severe wildfire season in the western U.S. We heard from USGS speakers that lead efforts and applications that work with hazards data.
Joan Gomberg presented on the USGS plan to reduce risk where tectonic plates collide, and posed the question of how her group could engage with the CDI. (See the USGS circular.) We are now exploring the best way to help facilitate their group in the CDI Earth Science Themes Working Group.
Elizabeth Lile and Jodi Riegle gave an overview of the GeoMAC wildfire application, a multi-agency project that displays near real-time information on fire perimeters.
Blake Draper demoed the USGS Flood Event Viewer, which allows members of the public to access flood data associated with events like specific hurricanes.
In the opening Scientist’s Challenge, we brought up the topic of getting started with reproducible notebooks and R Shiny apps, two tools that are helping to improve the way processes are documented and visualizations are shared! We welcome suggestions on specific topics about these tools - make a note on our forum or send an email to email@example.com.
In our interactive segment, we heard from the audience about their preferred learning style for new tools. There wasn’t an overwhelming winner: the largest group of respondents preferred to learn in a group and then work alone, a close second preferred hands-on sessions with experts, and a smaller group prefers to learn on their own. We’ll try to offer a variety of methods when providing training resources. It’s always great to hear from the community!
The joint CDI Tech Stack / ESIP Tech Dive group held a special bonus session on 8/31/17 that showcased ERDDAP examples. (See their August 10th webinar for an introduction to ERDDAP: Easier access to scientific data.)
Speakers included: Jenn Sevadjian, Jim Potemra, Conor Delaney, Kevin O'Brien, John Kerfoot, Stephanie Petillo, Charles Carleton, Eli Hunter
One example: ERDDAP makes it possible for oyster farmers to view data dashboards of real-time environmental conditions that may affect their crop.
Humboldt Bay Oyster Conditions dashboard shown by Jenn Sevadjian
Jenn Sevadjian posted some example ERDDAP code snippets on jsfiddle.net:
You can try them out with your own data by editing just a few lines of code.
Watch the full video for more examples and tips!
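The core of most ERDDAP examples is a request URL. Here is a minimal Python sketch of a tabledap query builder following ERDDAP's `datasetID.fileType?variables&constraints` pattern (the jsfiddle snippets themselves were JavaScript and are not reproduced here). The server address, dataset ID, and variable names are hypothetical placeholders to swap for those of a real ERDDAP dataset.

```python
from urllib.parse import quote


def tabledap_url(server, dataset_id, variables, constraints, file_type="csv"):
    """Build an ERDDAP tabledap request: datasetID.fileType?vars&constraints."""
    query = ",".join(variables)
    for c in constraints:
        # Percent-encode operators such as '>' while keeping '=' and ':' readable.
        query += "&" + quote(c, safe="=:")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


# Hypothetical server and dataset, for illustration only.
url = tabledap_url(
    "https://erddap.example.org/erddap",
    "bay_water_quality",
    ["time", "sea_water_temperature"],
    ["time>=2017-08-01T00:00:00Z"],
)
print(url)
```

Changing `file_type` to another ERDDAP output format (for example `json` or `htmlTable`) is how the same query can feed either a script-driven dashboard or a table in the browser.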
On Sept 5, 2017 the Metadata Reviewers Community of Practice worked on improving some of the data management resources that the USGS shares on its Data Management website, including the Data Review Checklist and the Metadata Review Checklist.
User needs for checking the proper encoding of XML metadata records are not currently documented, so Fran Lightsom will be leading a use case process to clarify them. Contact Fran at firstname.lastname@example.org if you are interested in participating.
See the recommended revisions and more detail on their meeting notes page.
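As a starting point for thinking about those needs, here is a minimal sketch of two automated checks a reviewer might want: that a record is well-formed XML, and that selected top-level CSDGM sections are present and not empty (an empty data quality section being the pesky case discussed above). The list of sections to check is an assumption for illustration, not a documented requirement, and real validation against the FGDC schema would go much further.

```python
import xml.etree.ElementTree as ET

# Assumed list of sections to check; adjust to the review checklist in use.
REQUIRED_SECTIONS = ("idinfo", "dataqual", "metainfo")


def check_record(xml_text):
    """Return a list of problems found in a CSDGM metadata record."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    for tag in REQUIRED_SECTIONS:
        section = root.find(tag)
        if section is None:
            problems.append(f"missing <{tag}>")
        elif not "".join(section.itertext()).strip():
            problems.append(f"<{tag}> has no content")
    return problems


sample = "<metadata><idinfo><citation>x</citation></idinfo><dataqual/></metadata>"
print(check_record(sample))
```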
See the current information for reviews of data and metadata on the Data Management Website Data Release Page.
Image of the Data Release process in IPDS (Information Product Data System) from the Data Management Website Data Release Page.
On August 15, the Bioinformatics Community of Practice hosted a presentation on KBase, a large-scale bioinformatics system that enables bench biologists and bioinformaticists to upload their own data, analyze it alongside collaborator and public data, build increasingly realistic models, and share their workflows and conclusions.
Ben Allen from Oak Ridge National Laboratory gave an overview and demo.
At the August 9, 2017 Monthly Meeting, we heard about several topics that help when designing apps, software, and websites to communicate about our work, including information architecture, software development, and user experience. See the recording and slides at the meeting page.
Kevin Gallagher opened by thanking everyone for the ideas for integrated science that came in after his call for ideas at the July CDI meeting. He also let us know that the CDI Request for Proposals is going to be different this year, in light of the USGS Executive Leadership Team’s ongoing discussions about Integrated Predictive Science Capacity. We won’t have the same process or timeline as in the past few years, when the CDI RFP Guidance went out in September. The ELT is focusing on some topics and projects that will influence the RFP process; we’ll let you know more as soon as we have more information.
I gave a brief review of some of the Information Architecture (IA) topics we’ve been touching on recently, including not just the organization and design of websites or apps, but also user experience, usability, and information design. I recommended the site abbytheia.com for lots of IA tips. When I asked audience members whether they managed something that takes design, usability, or information architecture into account, it turned out that 70% of us manage a web application or tool that does!
Blake Draper introduced the CDI Software Development Cluster to help spread knowledge among software developers. (A cluster is an informal working group in CDI, borrowing from the ESIP terminology.) To join, email email@example.com and we’ll get you on the list. Check out their wiki page to learn more.
Eric Martinez introduced a proposal for a simple, standardized way of documenting software so it can be easier to discover, and therefore easier to re-use existing solutions. He presented the metadata recommended by code.gov, a government platform facilitating code discovery and re-use.
The best practice is to create a code.json file for each software project.
See an Example: https://usgs.github.io/best-practices/code.json
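Here is a minimal sketch of generating such a file with Python's standard library. The field names follow the code.gov metadata schema (version 2.0) as I recall it, and the project values are hypothetical; the USGS example above is the authority on the exact layout and required fields.

```python
import json

# Hypothetical project entry. Field names follow the code.gov 2.0 schema as
# understood here; check them against the USGS example before publishing.
project = {
    "name": "example-flow-tool",
    "version": "1.0.0",
    "organization": "U.S. Geological Survey",
    "description": "Scripts that summarize daily streamflow records.",
    "permissions": {
        "usageType": "openSource",
        "licenses": [
            {
                "name": "CC0-1.0",
                "URL": "https://creativecommons.org/publicdomain/zero/1.0/",
            }
        ],
    },
    "tags": ["hydrology", "streamflow"],
    "contact": {"email": "someone@example.gov"},
    "laborHours": 0,
}

# Serialize to text; this string would be saved as code.json in the repository.
text = json.dumps(project, indent=2)
print(text)
```

Keeping the file in each repository is what lets an aggregator collect many projects into one inventory without manual data entry.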
A few other resources:
USGS Best Practices: https://usgs.github.io/best-practices/software/metadata.html
Creating your enterprise code inventory: https://code.gov/#/policy-guide/docs/compliance/inventory-code
Federal Source Code Policy: https://code.gov/#/policy-guide/policy/introduction
Rachel Volentine from the User-eXperience Lab at the University of Tennessee, Knoxville, gave an overview of activities, equipment, and lessons learned from her lab. She showed some results and findings from the DataONE search interface testing and the USGS Science Data Catalog. Her UX lessons learned include “We are not the user,” so we must develop trust, work with users’ existing habits, and not make the user think too much!
You can see more about the lab at http://cics.cci.utk.edu/user-experience-lab
Finally, we ended with another poll: “What kind of content would you like to see more of in CDI Monthly Meetings? (Choose up to two.)” It was pointed out that the choices were not completely exclusive of each other, but in any case we ended up with a three-way tie, meaning we will continue to see a variety of formats in our monthly meetings.
See you at the next monthly meeting on September 13, 2017!