Note: to edit this page, you will need to have a Confluence account (click on "Sign Up") and then get added to the member list for the CDI Tech Stack Working Group (instructions are here). 

A variety of management and collaboration tools is available to the USGS scientific community. Most handle the mechanics of software "revision control," but the category includes other technology as well. This technology tends to evolve faster than institutional policy, such as the USGS Fundamental Science Practices for software publication and for participation in more open-source models of development, which leaves the average project-level scientist with unclear guidelines.

Even if both of these aspects (revision control and policy) were stable, there is no single "silver bullet" solution for all projects. Projects differ in size (the number of people or institutions involved), in coding complexity, and in coding languages and environments (Python, arcpy, R, SAS, etc.). This page is a preliminary attempt to identify a few topics in this area that would be of interest to CDI members.

Another component of efficient software development is leveraging pre-existing software libraries. This usually means installing extra libraries and making them work with your development environment, for example getting the pandas module to work in Python. Once this is done and you want to share your code with others, you face the problem of ensuring that users also have these extra pieces (that you didn't write) installed alongside your own code base. These issues are collectively referred to here as environment configuration. A number of technologies exist for handling this, usually varying with the choice of programming language.
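
As a rough illustration of the environment configuration problem, here is a minimal Python sketch that reports which third-party modules are missing from a user's environment before your own code runs. The module list and the function name `find_missing` are made up for this example; a real project would more likely declare its dependencies in its packaging metadata.

```python
import importlib.util

# Hypothetical list of third-party modules this project depends on.
REQUIRED_MODULES = ["pandas", "numpy"]


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are available.")
```

A check like this only tells the user what is missing; installing the packages is still left to tools such as pip or conda.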

Please feel free to make comments or edits to this page! The hope is to improve this outline and then line up some presentations or panels to cover what CDI members want to hear about, maybe develop recommendations for "best practices".


Version Control, Code Sharing, Environment Configuration

  • Overview of the concept and some major examples (could just be some wiki pages and links)
    • generic: code development, code sharing, code documentation, code publication
    • basics: version control
    • Client tools: Command-line vs. GUI-based tools for accessing software repositories
      • TortoiseSVN (can be used as a single-user Windows desktop tool with no DB)
    • USGS services
  • Open-source: what does it mean??? Can I choose a point along the spectrum?
    • developing in the open, sharing as you go
    • accepting/moderating non-USGS contributions
    • working w/non-USGS moderators of a repository
    • free-for-all
    • USGS review/publication process
  • Policy on Software Release
  • Release Management (seems language specific, needs to be integrated into other parts of outline)
  • Language Specific:
    • Python:
      • SVN, Stash, GitHub
      • USGS-specific communities
        • Training Course - GW1774: Python Programming Language and Groundwater Modeling
      • other considerations: 
      • distribution/documentation/communication using IPython Notebook, wakari.io, binstar.org
      • environment configuration
        • getting other packages into your python environment
          • PyPI, setup.py install, easy_install, pip, Gohlke binaries
          • *.pth files, PYTHONPATH, etc.
          • Enthought Python Distribution (EPD), Anaconda/conda/wakari.io
          • getting things to work with arcpy
          • virtualenv, virtualenvwrapper
        • making code into packages
        • wakari.io, binstar.org
        • Working with IDEs
    • R programmers/users, USGS-specific considerations
    • Options for Java programmers
    • .NET
      • visualstudioonline.com – an online Git repository (free for <5 users). MS-centric; requires MS Visual Studio 2012 or higher.
      • MS Visual Studio has local Git capabilities and can be configured to use GitHub/Gitorious
      • Microsoft source forge is MS's traditional source control technology (requires dedicated infrastructure + licensing)
    • Web / HTML / JavaScript
  • Topic/Discipline specific
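
For the environment configuration items in the Python branch of the outline (virtualenv, PYTHONPATH), a small diagnostic sketch may help show what is involved. It reports whether the interpreter appears to be running inside a virtual environment and which directories PYTHONPATH contributes. The function names are hypothetical and the check is a common heuristic, not an official test.

```python
import os
import sys


def in_virtual_environment():
    """Heuristically detect a virtualenv/venv interpreter."""
    # Older virtualenv releases set sys.real_prefix; the stdlib venv module
    # (and newer virtualenv) make sys.prefix differ from sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


def pythonpath_entries(environ=None):
    """Return the non-empty directories listed in PYTHONPATH."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


if __name__ == "__main__":
    print("Virtual environment:", in_virtual_environment())
    print("PYTHONPATH entries:", pythonpath_entries())
    print("Interpreter prefix:", sys.prefix)
```

Output like this is useful to include when asking a colleague why a package imports on one machine but not another.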


10 Comments

  1. Some folks have been experiencing access issues. I'm thinking this is because the wiki only allows one editor at a time. Please be patient and keep trying to add your thoughts. Great that we're having competition to respond!

  2. Any discussion of tools needs to also consider applicable policy.

    For example, all Water software is subject to Water Memo 2009.01, Policy on the distribution of publically available USGS Water Resources Software on the Internet. This outlines the minimum review requirements and content for distribution of any water software online: http://water.usgs.gov/admin/memo/policy/wrdpolicy09.01.html

    1. Cian, good point. Added a policy branch to the outline. I know next to nothing about this end of things (and don't want to!). I'll leave it to others to populate this stuff. We could think about presentations, but could also point to a sub-page with a list of resources like the one you suggested. Go ahead and add things. We'll iterate over this stuff as we go (might make subpages later, etc).

  3. In order to grow the community of people who are able to use these technologies, we will need how-to's that are short and written well, minimizing jargon.  And I think we also need some way to foster the sort of peer-to-peer mentoring relationships that might help people adjust their own work processes to take advantage of the benefits these technologies promise.

    1. (as opposed to what's here. ha ha). Hopefully between this type of volunteer CDI stuff and some other funded support, we can crystallize what we want and how much we need it--likely needed to rationalize funding.

    2. We must obfuscate to ensure our positions remain highly esteemed

      1. Priestly, even. I got dibs on "Father Phoney-Baloney".

  4. The page title is Software Management, but it seems to be focused on revision control, mostly.  How much "other" stuff would we want to add here?  Code Review, Coding Style, Logging/Debugging, Security, Testing, Issue Tracking, Hosting, Metrics, etc., etc.  Do we want links to tools, libraries, etc?

     

    We actually still use CVS because we have lots of scripts developed on it and it works.  (Not broken - don't fix.)  Generally we use TortoiseCVS and WinCVS for a front-end on this.  We have Issue tracking in a system called GNATS.  The issue tracking and the source code control are not integrated.  We have been considering changing to a system that integrates source code control, issue tracking, scheduling, releases, etc. but the conversion would be pretty labor intensive so it has always been low-priority.

    1. You mention a number of additional important issues. I think a key concern for us now is managing our focus. There's a lot of directions we can go right now. I chose things related to code repositories because that seemed fundamental to a lot of other tech and best practices. That said, I think brainstorming (and recording) these ideas is great for plotting out a road map.

      I think things like issue tracking will be natural outgrowths of any code repository system that goes beyond the basics, although as you mentioned you can do this in a non-integrated way to good effect too. Might be helpful to have that as a later talk/topic (but could be integrated into the outline). Lots of options (including github, which is already in the mix). Jeff Falgout mentioned "release management", which could be tacked onto this kind of stuff.

      Code style, review, metrics (unit tests), debugging tools and approaches, and the rest could be more advanced topics, too. I would tend to lump those under a higher level label of "Software Carpentry". These would also be hugely valuable types of knowledge for us. If you or others think this stuff is needed sooner rather than later, definitely add to the outline. Figured I would make a straw man based on what I was most immediately interested in, but that the end product would indicate what we as a group are most concerned with...

       

       

  5. I think the first step is to aggregate a complete list of ALL the existing resources that we already have in place (and eventually resources outside of USGS) to handle the mechanics of software "revision control": where they are, how to get to them, their utilization, etc. … python, arcpy, R, SAS, programming languages, GIS, java, adobe, vb, js, github, cool_help, talks, etc. … This would let us see what we have in place, and where and how we can expand upon those areas.