CDI's February meeting featured a discussion on the value of CDI to you, and a deep dive into Pangeo.
Rich Signell, a Research Oceanographer at the Coastal and Marine Science Center in Woods Hole and member of the Pangeo Steering Council, presented an overview of Pangeo and examples of how it can support several different types of USGS workflows. The Pangeo framework is deployed by Cloud Hosting Solutions (CHS) and funded by EarthMAP as a new form of cloud-based model data analysis. Community-driven, flexible, and collaborative, Pangeo is slowly building out a set of tools with a common philosophy. In one example, Rich used a Pangeo Jupyter Notebook to process in one minute a dataset that had previously taken two weeks. Issues currently being addressed include cloud costs, skills, cloud-optimized data, and Pangeo development.
Renee Pieschke, a Technical Specialist for the Technical Services Support Contract at the Earth Resources Observation and Science Center in Sioux Falls, SD, continued our Pangeo focus with some information on Landsat in the cloud. Renee and her team are looking ahead to a spring release of Collection 2 data, which will substantially increase the amount of data available. The Collection 2 data will require Level 2 processing, which tries to get close to what you would see if you were looking at the ground by taking out disturbances, clouds, and so on.
The Landsat Look upgrade uses a cloud-native infrastructure and a cloud-optimized GeoTIFF format. It uses new SpatioTemporal Asset Catalog metadata so that the data can be accessed programmatically. The new Landsat Look can filter pixels with a QA Band so that any clouds, shadows, snow, ice, or water are removed to produce the best possible image.
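As a rough illustration of how QA-band filtering of this kind can work, the sketch below treats each QA value as a set of bit flags and drops any pixel whose flags include an unwanted condition. The bit positions in `QA_BITS`, the function names, and the sample values are all assumptions made for illustration; they are not taken from the presentation, and the real bit assignments should be checked against the product documentation.

```python
# Hypothetical bit positions for a pixel-quality band (illustrative only).
QA_BITS = {
    "cloud": 3,
    "cloud_shadow": 4,
    "snow": 5,
    "water": 7,
}


def build_mask(flags):
    """Combine the named flags into a single integer bit mask."""
    mask = 0
    for name in flags:
        mask |= 1 << QA_BITS[name]
    return mask


def filter_pixels(values, qa_values, flags):
    """Return the pixel values with flagged pixels replaced by None."""
    mask = build_mask(flags)
    return [None if (qa & mask) else v for v, qa in zip(values, qa_values)]


# Four made-up pixels: clear, cloud, cloud shadow, water.
reflectance = [0.12, 0.45, 0.30, 0.08]
qa = [0b00000000, 0b00001000, 0b00010000, 0b10000000]

clean = filter_pixels(reflectance, qa, ["cloud", "cloud_shadow", "snow", "water"])
print(clean)
```

Passing a shorter list of flags, such as `["cloud"]`, keeps the shadow and water pixels, which is one way a viewer could let users choose what counts as the "best" image.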
The SpatioTemporal Asset Catalog was developed to help standardize metadata across the entire geospatial data provider community, using a simple JSON structure. It normalizes common names, simplifies the development of third-party applications, and helps enable querying in Pangeo. Another in-progress goal is connecting with Landsat data in the cloud. Getting this Landsat data into the cloud involves converting it to a cloud-optimized GeoTIFF format, and this kind of data is already fueling the backend of Landsat Look.
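To give a sense of the simple JSON structure described above, here is a minimal sketch of a catalog item written as a Python dictionary, with a small helper that picks out the assets stored as cloud-optimized GeoTIFFs. The top-level fields follow the general shape of a SpatioTemporal Asset Catalog item; the scene ID, dates, coordinates, asset names, and `example.com` URLs are hypothetical placeholders, not real Landsat records.

```python
import json

# Media type commonly used to label cloud-optimized GeoTIFF assets.
COG_TYPE = "image/tiff; application=geotiff; profile=cloud-optimized"

# A hypothetical item; every value below is a placeholder.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",
    "bbox": [-97.5, 43.0, -96.0, 44.0],
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-97.5, 43.0], [-96.0, 43.0], [-96.0, 44.0],
            [-97.5, 44.0], [-97.5, 43.0],
        ]],
    },
    "properties": {"datetime": "2020-02-12T17:00:00Z"},
    "links": [],
    "assets": {
        "red": {"href": "https://example.com/example-scene-001/red.tif", "type": COG_TYPE},
        "qa": {"href": "https://example.com/example-scene-001/qa.tif", "type": COG_TYPE},
        "thumbnail": {"href": "https://example.com/example-scene-001/thumb.jpg", "type": "image/jpeg"},
    },
}


def cog_assets(stac_item):
    """Map asset name to href for every cloud-optimized GeoTIFF asset."""
    return {
        name: asset["href"]
        for name, asset in stac_item["assets"].items()
        if asset.get("type") == COG_TYPE
    }


# The item is plain JSON, so it survives a round trip through text.
round_tripped = json.loads(json.dumps(item))
print(cog_assets(round_tripped))
```

Because every provider uses the same keys (`assets`, `href`, `type`, and so on), a third-party tool can run a helper like `cog_assets` against any conforming catalog without provider-specific parsing.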
USGS users can access Pangeo and some test notebooks through http://support.chs.usgs.gov/ and code.usgs. More information is available on the meeting slides.
Participants were polled on sli.do about what the value of CDI is to them. Some responses are below.
"I like to hear about (and share) the cool work folks are doing throughout the USGS! The Communities are valuable because they allow folks to share innovative research and discuss ways we can do so while following Department, Bureau, Mission Area policy."
"CDI provides relevant, useful, and timely data management related issues, projects, and tools."
"I learn about new technology applications and learn of colleagues I might collaborate with."
"The CDI helps me to get my work done in my daily job! I find the people who are part of the CDI are amazing to interact with - they are engaged, enthusiastic, and interested in making things better at USGS. CDI has made me feel like I am more in touch with the USGS - there is so much going on in this Bureau, and CDI keeps me informed and makes me feel like I am part of something bigger than just my daily job."
"Demonstrate that best practices in data sci/software/etc. is important to colleagues."
"Diverse community, wide range of experience and expertise."
More information on the meeting, including notes, links, slides, and video recordings, is available here.