Python-Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. While it is not quite as rich as R in terms of statistics, it's really getting there. One major benefit is that its data structures are Python-native. This means that if you exploit a lot of Python-specific functionality to clean up your data, you will not have to transform it for use in R. In some cases, that transformation can be a substantial bottleneck. There's a pretty decent tutorial video (~30 minutes; the sound cuts out from 20 to 23 minutes), a class video (~3 hours), and a "tour" video (~10 minutes), among many others.
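
As a rough illustration of that point, here is a minimal sketch of cleaning data with ordinary Python and handing it straight to Pandas. The records, the column names (site, depth_m), and the clean_record helper are all made up for this example; they are assumptions, not part of any real dataset or of the Pandas API.

```python
import pandas as pd

# Hypothetical raw records, as they might come out of a messy text file.
raw_records = [
    {"site": "  Alpha ", "depth_m": "1.5"},
    {"site": "beta", "depth_m": "n/a"},
    {"site": "GAMMA  ", "depth_m": "3.0"},
]


def clean_record(record):
    """Tidy one record using plain Python string methods (hypothetical helper)."""
    site = record["site"].strip().lower()
    try:
        depth = float(record["depth_m"])
    except ValueError:
        depth = None  # stored as NaN once the column is numeric
    return {"site": site, "depth_m": depth}


cleaned = [clean_record(r) for r in raw_records]

# The cleaned list of dicts goes directly into a DataFrame; no export step.
df = pd.DataFrame(cleaned)

# Missing values are skipped by default when computing the mean.
mean_depth = df["depth_m"].mean()

print(df)
print(mean_depth)
```

The cleanup and the analysis happen in the same process, so there is no intermediate file to write out and read back in.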


  • NumPy is a really important add-on for Python. It provides low-level functions for handling arrays. It comes with ArcGIS.
  • SciPy is another really important add-on for Python. It's built on top of NumPy and provides higher-level (i.e., more user-friendly) functionality. It also comes with ArcGIS.
  • Python is distributed with a large standard library of modules that support various tasks, but many more are available online. An extensive set of pre-compiled libraries is available in this collection posted by Christoph Gohlke. Key libraries of interest to scientific computing include NumPy, SciPy, matplotlib, and netCDF4.
  • Python bindings for the GDAL and OGR libraries are now available from PyPI (the Python Package Index), in a package called GDAL.
  • Using Python with Fortran or C sub-page
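
To make the NumPy/SciPy split above a bit more concrete, here is a small sketch, assuming both libraries are installed. The data are invented for the example (points lying exactly on a line); NumPy does the array handling, and SciPy supplies a higher-level routine that operates on those arrays.

```python
import numpy as np
from scipy import stats

# NumPy: low-level array handling (creation and element-wise arithmetic).
x = np.arange(5, dtype=float)  # five evenly spaced values starting at 0
y = 2.0 * x + 1.0              # computed element-wise, no explicit loop

# SciPy: a higher-level routine that takes NumPy arrays as input.
fit = stats.linregress(x, y)

print(fit.slope, fit.intercept)
```

Because the made-up points lie exactly on y = 2x + 1, the fitted slope and intercept should come out very close to 2 and 1.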

In addition to a truly dizzying number of individual add-on libraries for Python, there are a few distributions that bundle sets of Python libraries to eliminate the hassle of pulling them together yourself. We should investigate these!