GHSC Software Release Process

Last Updated December 19, 2017

 

This document describes the process to publish, or release, software to the public and includes links to supporting information. In some cases the software may accompany a reviewed paper, Open-File Report (OFR), or other kind of reviewed publication, and/or produce a data product that also needs a release process. The following focuses on the software portion. The current expectation is that papers, software, and data sets are distinct information products, and thus each has its own entry in IPDS.

 

The motivation behind this process is to meet the USGS software release policy (which will itself hopefully be released by the end of fiscal year 2016). The general goal is to ensure that all USGS information products, of which software is one example, meet the USGS Fundamental Science Practices and result in reviewed, high-quality products.

 

Please note the process is a work in progress and feedback is encouraged.

 

 

  1. All software is subject to some kind of license, often public domain for government-written software, or sometimes a combination of licenses to cover third-party software packages used, and USGS policy requires acknowledgement of this. Make sure your repository has a LICENSE.md file documenting your public domain licensing and any other software packages included in the repository, as well as direct software package dependencies (e.g. software that is required to compile/build your software). The LICENSE.md should go into the top level of your repository. A few examples:
    1. https://github.com/usgs/earthquake-event-ws/blob/master/LICENSE.md
    2. https://github.com/usgs/nshmp-haz/blob/master/LICENSE
  2. Along with licensing, the code is in either a provisional state (a work in progress that has not yet been reviewed and released) or an approved state (a version has gone through the USGS software release process; note that the code can continue to be a work in progress beyond released versions). Add the appropriate disclaimer for your code (provisional vs. approved) to the LICENSE.md.
    1. Provisional http://www.usgs.gov/fsp/fsp_disclaimers.asp#11 (most software will start with this disclaimer)
    2. Approved http://www.usgs.gov/fsp/fsp_disclaimers.asp#5
  3. Code packages need a basic overview of the purpose of the code. This is easily accomplished by ensuring your repository has a README.md that contains, at a minimum, a basic description of the code. This is a good place to provide references to any published/corresponding paper(s), supporting information such as any testing/validation/verification efforts, and dependencies on any other software, compilers, operating systems, etc. The README.md should go into the top level of your repository. A few examples:
    1. https://github.com/usgs/earthquake-eventpages/blob/master/README.md
    2. https://github.com/usgs/nshmp-haz/blob/master/README.md
  4. Per best practices, your code should be in a version control system, reviewed, and accessible. A great way to accomplish all of this is to get your code onto GitHub and through an administrative review (looking for security concerns, PII, etc.). Note that members of the HazDev team (under Lynda Lastowka) can be resources to help with such a review. Detailed instructions and supporting links for all of these steps are at:
    1. The administrative review (looking for any sensitive information and/or PII) is outlined, along with details on review types and other best practices, at https://github.com/usgs/best-practices (administrative review: https://github.com/usgs/best-practices/blob/master/software/reviews.md#administrative-security-review).
    2. To request a new repository on GitHub.com/usgs (public-facing repository system) or code.usgs.gov (public or internal), create an issue on code.usgs.gov under the software-release project at https://code.usgs.gov/software-release/reviews/issues and indicate who will be doing the administrative/security review, as this is required prior to making the code publicly accessible. (Note: use the “USGS Login” button to authenticate with your GS domain (AD) credentials.)
    3. A “snapshot” of the code must be made to be reviewed and eventually marked or identified as the released software version. To do this you can make a candidate release branch (see Branches in Git Commands), with the suggested naming convention “vMAJOR.MINOR.BUGFIX-rc” (e.g. v1.0.1-rc). Peer review and final approval can be done from this branch, which is then merged back into master.
  5. The code is now ready to go through a software peer review process (just as papers, abstracts, etc. do) that looks for adherence to best practices. The review process is documented via a routing sheet similar to that for papers/abstracts. You can fill out the Information Product Review and Approval Sheet, available at: http://ghsc.cr.usgs.gov/science/RoutingSheet.pdf . In the Title section, provide a link to the code repository. There must be at least two reviewers, where one does an administrative/security review (step 4) and the other is a peer or domain expert (step 6). Please see the code review guidelines in the development section of the USGS best practices for more details if needed. With thanks to Josh, an example routing sheet is provided below under the “Supporting Information” section.
    1. Code or software review - https://github.com/usgs/best-practices/blob/master/software/reviews.md#code-review
    2. At least two reviewers are required, but more are allowed. There are three basic things to be reviewed:
      1. Handling of any sensitive/security information (the “Administrative review” from item #4 above) - https://github.com/usgs/best-practices/blob/master/software/reviews.md#administrative-security-review
      2. Software best practices review, or code review, described here in item #5 - https://github.com/usgs/best-practices/blob/master/software/reviews.md#code-review
      3. Domain expert or scientific review, described in item #6 below - https://github.com/usgs/best-practices/blob/master/software/reviews.md#subject-matter-expert-review
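
Before requesting the reviews above, it can help to confirm that the repository carries the files asked for in items 1 through 3. The following is a minimal Python sketch of such a check, not an official USGS tool. The function names are hypothetical, and the keyword matching in disclaimer_state is an assumption about the disclaimer wording; adjust it to the text you actually paste into LICENSE.md.

```python
from pathlib import Path

# Files that items 1-3 above ask for at the top level of the repository.
REQUIRED_FILES = ("LICENSE.md", "README.md")


def missing_release_files(repo_root):
    """Return the required top-level files that are absent from repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def disclaimer_state(license_text):
    """Guess which disclaimer a LICENSE.md carries.

    Assumption: the disclaimer contains the word "provisional" or "approved".
    This is a keyword heuristic only, not a check against the official text.
    """
    lowered = license_text.lower()
    if "provisional" in lowered:
        return "provisional"
    if "approved" in lowered:
        return "approved"
    return "unknown"


if __name__ == "__main__":
    # Run from the top level of a working copy of the repository.
    missing = missing_release_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        text = Path("LICENSE.md").read_text(encoding="utf-8")
        print("Disclaimer state:", disclaimer_state(text))
```

Note that some repositories (for example, one of the license examples linked in item 1) name the file LICENSE without the .md extension; extend REQUIRED_FILES if your project follows that pattern.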

 

  6. The code, like a paper, needs to be reviewed by a scientific peer or domain expert who can look at the code, the verification and validation efforts, and/or the inputs and outputs to ensure the code is doing what the documentation says it is doing. Ask for a scientific or domain review from one or more people with expertise in, or context for, the code, the task(s) it performs (e.g. another scientist, a user of your code, etc.), and/or its verification/validation tasks (e.g. IPython notebook, test cases with results, etc.). This kind of review may be a demo of the system, walking through use cases or scenarios, reviewing tests and results, or viewing graphs and outputs. Resolution of feedback from such a review needs to be documented and provided in IPDS. This can be a written document, a link to filtered issues on GitHub, etc. A benefit of managing the aggregated feedback as issues in GitHub (via the Issues link just under the repository name/title on GitHub) is that any resolution, whether documentation, further testing, or code changes, can be tracked and related directly to the software. Some examples can be found at:
    1. https://github.com/usgs/nshmp-haz/issues
    2. Domain expert review - https://github.com/usgs/best-practices/blob/master/software/reviews.md#subject-matter-expert-review
    3. With a tool such as GitHub Issues, peer review feedback can be addressed and tracked directly in GitHub over time (adding comments, making any code changes, marking issues as resolved, etc.).
  7. Get a digital object identifier (DOI) for your source code. Note that if your software accompanies a paper going to SPN via IPDS, the paper will have a DOI created by SPN, and you can use the same DOI if that is a good fit. If you need or want a different DOI for the software (which is often the case), you can get one via the USGS DOI Tool ( https://www1.usgs.gov/csas/doi/ ), or use ScienceBase ( www.sciencebase.gov ) as a landing page for your software release and generate a DOI there. There is more information about DOIs and tools at the USGS Software Best Practices community site, https://github.com/usgs/best-practices/blob/master/doi.md , and more general DOI information at the USGS data management website, https://www2.usgs.gov/datamanagement/preserve/persistentIDs.php
  8. Now the code is reviewed and can be released and made publicly accessible. As appropriate, update the disclaimer in the license file from provisional to approved. This step is the most vague at this point and will evolve over time. There will likely be some involvement from Jill (the science center director), where you provide the review feedback and how it was addressed, similar to what is done for publishing a paper. Additionally, some description of any testing/verification/validation process should be documented (e.g. GitHub wiki pages, part of the README, labeled issues) and provided.
    1. Fill out the Information Product Review and Approval Sheet, available at: http://ghsc.cr.usgs.gov/science/RoutingSheet.pdf . In the Title section provide link to code repository. There must be at least two reviewers, where one does an administrative/security review (step 4) and the other is a peer, or domain expert (step 6). Please see code review guidelines in the development section of the USGS best practices for more details.
    2. Approved software disclaimer text - http://www.usgs.gov/fsp/fsp_disclaimers.asp#5
    3. Merge the release candidate branch (e.g. v1.0.1-rc) into master.
    4. Tag master (vMAJOR.MINOR.BUGFIX), removing the release candidate (-rc) notation.
    5. Make release
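
The merge and tag steps above can be sketched in Python as follows. This is an illustrative sketch rather than a prescribed workflow: the function names are hypothetical, the remote name "origin" and the use of an annotated tag with a "Release ..." message are assumptions, and --no-ff is one possible merge style chosen here so the candidate branch remains visible in history. The functions only build the git command lines; nothing is executed unless you pass them to something like subprocess.run.

```python
import re

# Naming convention from the process above: vMAJOR.MINOR.BUGFIX-rc
RC_BRANCH = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-rc$")


def release_tag(rc_branch):
    """Return the release tag for a candidate branch, e.g. v1.0.1-rc -> v1.0.1."""
    if RC_BRANCH.match(rc_branch) is None:
        raise ValueError(f"not a vMAJOR.MINOR.BUGFIX-rc name: {rc_branch!r}")
    return rc_branch[: -len("-rc")]


def release_commands(rc_branch, remote="origin"):
    """List the git commands that merge the candidate branch and tag master."""
    tag = release_tag(rc_branch)
    return [
        ["git", "checkout", "master"],
        ["git", "merge", "--no-ff", rc_branch],
        ["git", "tag", "-a", tag, "-m", f"Release {tag}"],
        ["git", "push", remote, "master"],
        ["git", "push", remote, tag],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for command in release_commands("v1.0.1-rc"):
        print(" ".join(command))
```

Printing the commands first gives a chance to review them against the routing sheet before anything is pushed; a branch name that does not follow the convention (for example 1.0.1rc) raises a ValueError instead of producing a mismatched tag.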

Supporting Information

      Overview on git, github and gitlab

      Common Git Commands

      Diagram of suggested Git workflow (e.g. so you can work on your own branch and not impact others as you implement new features)

      Starting out with a new repository on git

      How to set up SSH keys for Git (you can also just authenticate every time, but this is more convenient for repetitive access)

      USGS Software Release Policy

      Updated policy in progress, will post as soon as it is released

      Cloning a repository from GitHub

      Cloning a repository from GitLab

      An example IPython notebook demonstrating testing and validation for scientific review, thanks to Josh Rigler

      An example routing sheet, thanks to Josh Rigler

 

Template Routing sheet for software release:

https://drive.google.com/drive/u/1/folders/0B101UEJHbXWYUzlaNGl5U2JBMHc