
December 9, 2020: Software and Data Carpentries Online training and Telling data stories with Jupyter Notebooks and Pangeo

The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.

Connection information

Connection information is sent to the CDI mailing list

Join Microsoft Teams Meeting

+1 719-733-3211   United States, Pueblo (Toll) – See more Local numbers at link below

Conference ID: 522 981 927#

Local numbers 


Meeting Recording and Slides

Recordings and slides are available to CDI Members approximately 24 hours after the completion of the meeting.

Log in to view the meeting resources. If you would like to become a member of CDI, join at https://listserv.usgs.gov/mailman/listinfo/cdi-all.

Agenda (in Eastern time)


11:00 am Welcome and Opening Announcements

11:15 am Working Group Announcements

11:25 am  Software and Data Carpentries: Transitioning a successful in-person workshop model to function online - Karen Word, The Carpentries

11:55 am Jupyter Data Stories with Pangeo - Richie Erickson and Ed Bulliner, USGS

12:30 pm  Adjourn

Abstracts

Software and Data Carpentries: Transitioning a successful in-person workshop model to function online

The Carpentries is a community-led nonprofit that provides training in introductory computational skills for researchers and librarians. We create curricula, train volunteer Instructors in evidence-based teaching practices, and organize workshops. Prior to COVID-19, only our Instructor Training workshops were held online. Transitioning our technical workshop program to function online was a major undertaking. In this talk, Dr. Karen Word will share background on The Carpentries training programs and teaching practices, as well as lessons learned from the community effort in adjusting to teaching online.

Dr. Karen Word serves as Director of Instructor Training for The Carpentries. Professionally trained as a biologist and educator, she supports The Carpentries community of Instructor Trainers through training and onboarding, curriculum maintenance, community management, and development of continuing education opportunities. She additionally oversees assessment efforts related to Instructor Training for The Carpentries, and has previously performed qualitative and quantitative evaluation for the community-based Data Intensive Biology Summer Institute (DIBSI) at UC Davis.


Jupyter Data Stories with Pangeo

Jupyter Notebooks allow people to tell stories with their data and code. Pangeo allows people to work with Jupyter Notebooks and with large data in the cloud. We created curated examples of Jupyter Notebooks highlighting USGS data. We also created an annotated set of references for using Jupyter Notebooks inside of USGS, including with Pangeo. We provide an overview of these resources and lessons learned from this CDI-funded project. View our materials at https://code.usgs.gov/cdi/cdi-fy20/jupyter-data-stories (Note: Access is currently limited to people with code.usgs.gov accounts until the project completes the USGS review process).

Richie Erickson is a Research Quantitative Ecologist with the Upper Midwest Environmental Sciences Center. His current research focuses on developing population models of bighead and silver carp and modeling eDNA.

Edward Bulliner (USGS-CERC) is an ecologist whose primary research focuses on relations between large-river and floodplain morphology and ecology using a variety of tools, including physical modeling, remote sensing, and hydroacoustic field measurements. Large portions of this research are based on creating automated workflows using the Python programming language, with an example including summarizing hydraulic model outputs into metrics of floodplain inundation based on ecological criteria. Other ongoing research includes assisting in CERC's comprehensive sturgeon research program for aiding recovery of the pallid sturgeon on the Missouri River.

Highlights

  1. 27 FY21 proposals have been invited to submit full proposals.
  2. Put in your nominations now for the USGS CDI Leadership and Innovation Award
  3. The FY21 CDI Workshop will be held from May 25-28
  4. Apply to be a Carpentries instructor
  5. See the code repository (link currently USGS only) for instructions and examples on using Jupyter Notebooks

Notes

  1. Welcome and Opening Announcements
    1. Link for easy animated GIFs and Sketching your science: https://blogs.agu.org/sciencecommunication/2020/12/02/how-to-sketch-your-science/ 
  2. Kevin Gallagher comments
    1. Kevin introduced the 27 FY21 proposals that were invited to submit full proposals.
    2. CDI FY21 Full Proposal Invitees: 2021 Community Voting Results
    3. If you did not make it to the next stage, please re-group and resubmit!
    4. Nominations for FY21 CDI Leadership and Innovation Awards are due February 15, 2021: USGS Community for Data Integration Leadership and Innovation Award
  3. Tim Quinn comments
    1. CDI FY21 Workshop will be from May 25-28.
  4. Working Group Announcements
    1. Access all CDI Collaboration Areas: https://my.usgs.gov/confluence/x/yhv1I 
    2. See slides for more.
    3. Risk
      1. Jan 15, 2021 proposal deadline: https://my.usgs.gov/confluence/display/cdi/ETWG+Risk+Community+of+Practice
    4. #DataHelpDesk, happening NOW (this week, in conjunction with AGU): https://twitter.com/search?q=%23datahelpdesk&f=live
    5. Tech Stack
      1. https://wiki.esipfed.org/Interoperability_and_Technology/Tech_Dive_Webinar_Series#10_December_2020:_.22Environmental_Data_Retrieval_API.22_EDR-API_Standard_Working_Group_Members
    6. Wanted: CDI Workshop Volunteers: Sign up: (Dept of Int access only)  http://ow.ly/iMTH50CDijH
  5. Software and Data Carpentries: Transitioning a successful in-person workshop model to function online - Karen Word, The Carpentries
    1. The Carpentries is a nonprofit community of practice
    2. In the intersection of software, data, and library carpentry
    3. Over 40 active lessons, collaboratively developed and community maintained. 30 lessons in development. 
    4. Instructor certification consists of a 2-day workshop, plus participation in a lesson contribution, a community discussion, and a teaching demonstration.
    5. Anyone can self-organize a Carpentries workshop, or a centrally-organized workshop can be requested.
    6. In-person workshop features:
      1. participatory live coding - teaching slowly, and explaining as they type
      2. helpers circulate and help struggling learners
      3. Code of Conduct committee for handling issues
    7. Challenges in going virtual:
      1. limited screen space. Zoom/Teams crowds out coding windows
      2. Hidden faces make it hard to read the room
      3. No sticky notes to indicate that the learner needs help
      4. side conversations need dedicated channels
      5. helpers can't take over a learner's computer
      6. socialization doesn't just happen
      7. prior to COVID-19, Carpentries recommended against online workshops
    8. community change
      1. community blogging allowed exchange of tips and reports on individual workshops
      2. task force recommendations emerged for Teaching Carpentries Workshops Online
        1. overview:
          1. live/synchronous, via video conferencing
          2. minimalist in demand for adoption of unfamiliar technology
          3. communication has to be structured
            1. chat: public and private, plus a back channel
            2. collaborative notes
            3. breakout rooms
          4. formative assessment is important
            1. response tools include: nonverbal feedback on Zoom/Teams, polling apps, chat responses, surveys
    9. supporting instructors (new role) can help with planning, facilitating video conferencing, and managing breakouts
    10. instructors should ideally be JUST teaching
    11. helpers assist learners during the workshop (minimal planning, etc.)
    12. So far...
      1. learner and instructor feedback remains positive
      2. communities want to keep online options post-pandemic
      3. online workshops are slower; curricula don't fit time frame
      4. online format seems less resilient to variation in instructional approach
    13. advancement in 2021
      1. recommendations update early in 2021, with a more prescriptive structure and fewer instructional choices
      2. working on incorporating networking, socializing, learning from one another
      3. development and expansion of offerings of 3-hour bonus modules
      4. technical curricula might need to be adapted
    14. USGS Carpentries Instructor Application: https://forms.office.com/Pages/ResponsePage.aspx?id=urWTBhhLe02TQfMvQApUlAxdiRifVmlAg0g-PN54QUVUNksxRzkwOVoyVEdQM0hDWlFWMzZVVFczOCQlQCN0PWcu 
  6. Jupyter Data Stories with Pangeo - Richie Erickson and Ed Bulliner, USGS
    1. Current problems
      1. USGS scientists use unique methods with data and would like to promote these methods and datasets, especially to people who are not experts in the subject matter, the statistical methods, or both
      2. Also, we have large datasets in the cloud that we cannot download onto a local computer
    2. Existing USGS products
      1. data releases focus on static data
      2. manuscripts focus on research (reports, articles)
      3. code releases
        1. software, focused on source code
        2. analysis code tends to be static, if released at all
      4. How can we make code and data interactive?
        1. Jupyter Notebooks allow commenting and instructions within the notebook
      5. How do you start using Jupyter Notebooks?
        1. resources like Software Carpentry
        2. but USGS computing requires tweaks that are unique to us
      6. Online Repository
        1. https://code.usgs.gov/cdi/cdi-fy20/jupyter-data-stories (requires USGS login currently, will be public soon)
        2. includes examples, tutorials, and a curated list of resources
      7. Lessons learned
        1. core computing fluency required for advanced computing, regardless of HTC, HPC, or Cloud.
        2. "future-proof" or "future-resistant" workflows
          1. reproducible computer environment, using scripted environments like Conda or Docker
          2. when packages are updated often, it's useful to be able to go back to a previous computer environment or specific version
          3. version control (code.usgs.gov); can share with collaborators, and won't lose the code or lose track of it
      8. invitation to collaborate
        1. pull requests welcome for examples
        2. happy if others want to reuse/expand tutorials
      9. See recording for overview of code repository
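
The "future-resistant" lesson above recommends scripted environments such as Conda or Docker, so that a previous set of package versions can be recovered later. As a rough illustration of the same idea, and not the presenters' own method, the sketch below records the exact version of every Python package installed in the current environment using only the standard library. The function names and the output file name are hypothetical, and a snapshot like this covers Python packages only, not the full environment that Conda or Docker would capture.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' strings for the installed distributions.

    Hypothetical helper for illustration; it is not part of the USGS repository.
    """
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            pins.add(f"{name}=={version}")
    # Case-insensitive sort, with the raw string as a tie-breaker for stability.
    return sorted(pins, key=lambda pin: (pin.lower(), pin))


def write_snapshot(path="requirements-snapshot.txt"):
    """Write one pinned requirement per line and return how many were written."""
    lines = pinned_requirements()
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(line + "\n" for line in lines)
    return len(lines)


if __name__ == "__main__":
    count = write_snapshot()
    print(f"Recorded {count} pinned packages")
```

Committing a snapshot file like this alongside the analysis code in version control keeps the code and the package versions it ran against together.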

Questions

  1. You have collaborations with several federal agencies. Do we need some sort of ethics approval to host Carpentries trainings? Any considerations?
    1. Karen Word: I have not come across any constraints in relation to teaching. The only constraints I've been familiar with have been not being able to use GitHub and other access to technology.
  2. Does Pangeo allow persistence of multiple conda environments?
    1. Richie Erickson: Yes. You do need to tweak a setting. There's a note on how to do that in the code repo.
  3. Can you install custom packages in USGS Pangeo?
    1. Richie Erickson: Yes, in your own environment. You can git clone it in.
      1. Can you share the personal environment with others?
        1. Richie Erickson: No, not directly. You could put everything into a conda file though.
  4. Without a PR account, how can you get, say, a docker container?
    1. Richie Erickson: Your local IT can help with this.
    2. Rich Signell: You do not need to build a docker container with Pangeo. You do need VPN or Amazon WorkSpaces. If you want to access a big data set, you can spin up a cluster on CHS Pangeo.