
Research organizations may deal in different topical areas and use varied tools and approaches in their day-to-day operations, but one thing is true for everyone in the science and research domain: data are getting bigger. Whether it is the size of the input datasets or of the output that a particular analysis produces, authors and data managers are often hastily playing catch-up in response to today’s data storage and access demands. How is this affecting the USGS? What are the current trends and the latest developments in data storage available to our researchers? Does the ‘Click to Download’ model still work for the data we are producing, given the size of our products and the workflows of other researchers? This session will provide an update on the current capabilities provided by the SAS mission area to support USGS scientists, as well as an open discussion for anyone dealing with large data challenges.

The working plan for this session is for interested participants to present in 10-15 minute time slots, sharing experiences and lessons learned related to this topic.

Planned speakers are listed below. Please feel free to comment or ask questions below and stop by the talk in June!


Talk 1: Black Pearl - storing large data for use in High Performance Computing (HPC) systems - AND - GLOBUS: large data transfer, sharing, and publishing. (Jeff Falgout / Matt Davis)

Talk 2: ScienceBase integration with Black Pearl, Amazon S3, and brokered relationship with EROS EE for large data handling. (Drew Ignizio)

Talk 3: Using the Unidata THREDDS Data Server to Provide Access to Large Datasets - NCAR's Research Data Archive Perspective (Doug Schuster, NCAR)

Talk 4: Cloud-friendly data formats. (Rich Signell)


Science Support Framework Category: Data Management

Author(s): Drew Ignizio (dignizio@usgs.gov) - USGS Science Analytics and Synthesis


Notes Document: https://tinyurl.com/CDI0605-Ignizio

6 Comments

  1. Hi Ignizio, Drew A.: NCAR has experience with the netCDF file format and with THREDDS services as well. If you would be interested in hearing NCAR's perspective during your session, please let me know. Thanks.


  2. Sophie Hou Thanks for chiming in on the idea. Yes, I would love to hear about the work taking place at NCAR! Perhaps if you're interested, we could even touch base some time over the next few weeks to discuss more? If you know of other colleagues or teams that might be interested in sharing their experiences, please feel free to extend the invite as well.

    1. That sounds good, Ignizio, Drew A. Please let me know when you would like to meet. Meanwhile, I will reach out to my colleagues so that they can also be available for this meeting. Thanks!

  3. Hi Gordon, Janice: We spoke about dividing this up a bit to allow you to speak about Black Pearl in relation to assisting researchers with storing and working with data as part of the HPC workflow.

    Davis, Matthew (Contractor) J or Falgout, Jeff T.: Would you be willing to speak briefly about GLOBUS and the status of this resource in the USGS for helping with large file handling? We can discuss more as needed (or determine if presenting is possible for either of you), but it might be valuable to cover this component or provide an update.

  4. Cross, VeeAnn A - please feel free to share notes on the EE case you've been working on! I think your experience and the work you've done would be a great contribution.

  5. One of the concerns in our office is the duplication of large datasets. The reality is that our network connection isn't always the best, so uploading a large dataset (20 GB or more) can take several hours. That's problem number 1. Problem number 2 is data access. Even once that large dataset is uploaded, it's in a zip file, which means users have to download the whole dataset (or whatever we've placed in the zip), even if they're only interested in a small part of it. In our case, one situation in which this occurs is the release of drone imagery. Each individual image isn't that big, but there are several thousand of them, and together they bundle into a large dataset (tens of GB). So we talked to ScienceBase and EROS about working together to help alleviate this problem. EROS has EarthExplorer, a user interface that gives access to individual images. ScienceBase is our typical data release mechanism. Both entities are trusted digital repositories, so it seemed redundant to have the large zip files on SB and the same collection of individual images at EROS available through EarthExplorer.