Scientific Computing Topics

Scientific Computing Environments

Statistical environments

S-Plus

USGS holds a license for a version of the S-Plus statistical package, and the USGS internal distribution includes USGS-developed statistical and graphics tools.

R

R is an open-source statistical analysis system built as an implementation of the S language, and it is largely compatible with S. It is gaining in popularity, and has a body of GUIs (such as RCommander) and interfaces available for it. Since R has a command-line interface, it is fairly easy to connect with other software, for example, ArcGIS and Jython (Python for the Java platform). Some have described R as a "statistical scripting language."

Although the user community is very good at answering questions, the volume of questions and answers may be overwhelming.
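As a sketch of connecting other software to R through its command line, the Python snippet below calls R's `Rscript` front end with the `-e` option, which evaluates a single R expression, and captures what R prints. It assumes `Rscript` is installed and on the PATH; the helper names `build_rscript_command` and `run_r` are hypothetical.

```python
import shutil
import subprocess


def build_rscript_command(expression):
    """Return the argument list that evaluates one R expression via Rscript -e."""
    return ["Rscript", "-e", expression]


def run_r(expression):
    """Run an R expression and return its standard output, or None if R is absent."""
    if shutil.which("Rscript") is None:
        return None  # R is not installed, or Rscript is not on the PATH
    result = subprocess.run(
        build_rscript_command(expression),
        capture_output=True,  # collect R's printed output instead of echoing it
        text=True,            # decode stdout/stderr as strings
        check=True,           # raise CalledProcessError if R reports an error
    )
    return result.stdout


if __name__ == "__main__":
    # cat() prints the mean without R's usual "[1]" prefix, which keeps parsing simple.
    print(run_r("cat(mean(c(1, 2, 3, 4)))"))
```

The same pattern works from any program that can launch a process, which is what makes R's command-line interface convenient for gluing it to other tools.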

Python-Pandas

Python-Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. While it is not quite as rich as R in terms of statistics, it is catching up. One major benefit is that its data structures are Python-native: if you rely heavily on Python-specific functionality to clean up your data, you will not have to convert the data before analyzing it, as you would to move it into R. In some cases, that conversion can be a substantial bottleneck. There is a pretty decent tutorial video (~30 minutes; the sound cuts out from 20 to 23 minutes), a class video (~3 hours), and a "tour" video (~10 minutes), among many others.
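To illustrate the point about Python-native data structures, here is a minimal sketch, assuming pandas is installed. The raw records, column names, and the `clean` helper are hypothetical: the lines are tidied with ordinary Python string methods, and the resulting list of dictionaries goes straight into `pandas.DataFrame` with no export or conversion step.

```python
import pandas as pd

# Hypothetical raw lines, e.g. read from a field log; spacing and case are inconsistent.
raw = [
    "site-01, 2011-05-03, 12.5",
    "SITE-02 ,2011-05-03, n/a",
    "site-01,2011-05-04,13.1",
]


def clean(line):
    """Split one comma-separated line and normalize its fields with str methods."""
    site, date, value = (field.strip() for field in line.split(","))
    return {
        "site": site.lower(),
        "date": date,
        # Missing readings become NaN so the column stays numeric.
        "value": float(value) if value.lower() != "n/a" else float("nan"),
    }


records = [clean(line) for line in raw]  # plain Python list of dicts

# The cleaned records are used directly; no intermediate file or format change.
df = pd.DataFrame(records)
df["date"] = pd.to_datetime(df["date"])

# Mean value per site; pandas skips NaN when computing the mean.
summary = df.groupby("site")["value"].mean()
print(summary)
```

Because the cleaning stage produces ordinary Python objects, any other Python tooling can be used there before handing the result to pandas.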

Python

Python is an interpreted, interactive, object-oriented, extensible programming language, supported on numerous computing platforms. In addition to being relatively easy to use and good for numerical analysis and web programming, it is the de facto scripting language for our corporate GIS platform, ArcGIS.

Packages for Python

Generally useful stuff

In addition to a truly dizzying number of individual add-on libraries for Python, there are a few distributions that bundle sets of Python libraries to eliminate the hassle of pulling them together one by one. We should investigate these!

More advanced stuff

Python and ArcGIS

Discussion topics

MATLAB

MATLAB is commonly used for data- and compute-intensive scientific analysis.

Known USGS MATLAB users: Rich Signell, Ashley Van Beusekom

Microsoft Office

Microsoft Office is widely used in science and is very useful for general-purpose computing, but it has also been widely criticized by the scientific community (especially by statisticians). By far the largest problems are data import/export and misuse of the tools, for example the (far too common) use of Excel as a database, and errors in worksheet cell references.
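One defensive habit for these import/export problems is to scan a spreadsheet's CSV export before analysis. The sketch below uses only the Python standard library to flag rows with the wrong number of fields and cells holding Excel error values such as `#REF!`, which Excel shows for a broken cell reference and which survives export as plain text. The function name and sample data are hypothetical, and the check is illustrative rather than exhaustive.

```python
import csv
import io

# Error values Excel displays when a formula fails; #REF! marks a broken cell reference.
EXCEL_ERRORS = {"#REF!", "#DIV/0!", "#VALUE!", "#N/A", "#NAME?", "#NUM!", "#NULL!"}


def find_export_problems(text):
    """Return (row_number, message) pairs for suspicious rows and cells in CSV text."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first row holds the column names
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append((row_number, f"expected {len(header)} fields, got {len(row)}"))
        # zip() stops at the shorter sequence, so short rows are only checked as far as they go.
        for column, cell in zip(header, row):
            if cell.strip() in EXCEL_ERRORS:
                problems.append((row_number, f"{column}: spreadsheet error {cell.strip()}"))
    return problems


sample = "site,flow,ratio\nA,10,0.5\nB,0,#DIV/0!\nC,7,#REF!\nD,3\n"
for row_number, message in find_export_problems(sample):
    print(row_number, message)
```

Catching these values at import time keeps a spreadsheet error from silently becoming a text value or a missing value further down the analysis.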

USGS holds a site license for MS Office, through the Bureau Windows Technical Support Team (BWTST).

Geographic Information Systems (GIS)

Since much of USGS scientific computing involves spatial data, it is no surprise that more than half of the attendees of the 2011 CDI meeting who were polled identified themselves as users of Esri's ArcGIS product.

USGS Core Science Systems supports the Enterprise GIS (EGIS) team, which supports GIS activities across the Bureau. EGIS supports USGS-wide site licenses for Esri's ArcGIS suite and for Global Mapper.

Contributors