
In July's Scientist's Challenge, Kate Allstadt and Brennah McVey from the Landslide and Earthquake Hazards Programs present their challenge of making a diverse database accessible!

In this cross-disciplinary research, seismic wave data are used to study processes besides earthquakes, such as landslides, debris flows, and floods. But what is the best way to catalog these diverse data types for efficient discovery and analysis?

There are many ways you can contribute to this challenge: suggest ideas or recommendations as a comment on this forum (you must be logged in), email Kate or Brennah directly at kallstadt@usgs.gov and bmcvey@usgs.gov, or email cdi@usgs.gov and the CDI coordinators will pass the information on.


5 Comments

  1. Depending on what your team has in mind for the end product, the current SQLite database could remain (or be updated) and an API could be built around it; that would be great for accessibility. There could also be different views for exploration and basic searching. The miscellaneous data could perhaps be addressed by adding related attributes or links to the current database. A sketch of the API idea is below.
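
    A minimal sketch of that read-only API idea, assuming the catalog is a SQLite file named events.sqlite containing an events table with a type column (all hypothetical placeholders for whatever the actual catalog uses):

        import sqlite3
        from flask import Flask, g, jsonify, request

        DB_PATH = "events.sqlite"  # hypothetical path to the existing catalog
        app = Flask(__name__)

        def get_db():
            # One connection per request; rows come back as dict-like objects.
            if "db" not in g:
                g.db = sqlite3.connect(DB_PATH)
                g.db.row_factory = sqlite3.Row
            return g.db

        @app.teardown_appcontext
        def close_db(exc):
            db = g.pop("db", None)
            if db is not None:
                db.close()

        @app.route("/events")
        def list_events():
            # e.g. GET /events?type=landslide (the "type" column is assumed)
            event_type = request.args.get("type")
            sql, args = "SELECT * FROM events", ()
            if event_type:
                sql, args = sql + " WHERE type = ?", (event_type,)
            rows = get_db().execute(sql, args).fetchall()
            return jsonify([dict(r) for r in rows])

        if __name__ == "__main__":
            app.run()

    An API like this keeps the SQLite file as the single source of truth while letting web viewers, scripts, and GIS tools all pull from the same endpoint.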

  2. I'm wondering: what is the approximate volume of data for the different data types?

    I know of another project that used PostGIS to store points, multipoints, lines, and polygons in a SQL database; I'm not sure if that is something you would consider. A sketch of that approach follows.
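
    A small sketch of the PostGIS approach, assuming a PostGIS-enabled PostgreSQL server; the connection parameters, table name, and coordinates are placeholders:

        import psycopg2  # assumes the PostGIS extension is installed on the server

        conn = psycopg2.connect(dbname="hazards", user="postgres", host="localhost")
        cur = conn.cursor()

        # A generic geometry column lets one table hold points, multipoints,
        # lines, and polygons together (SRID 4326 = WGS 84 lon/lat).
        cur.execute("""
            CREATE TABLE IF NOT EXISTS features (
                id   serial PRIMARY KEY,
                name text,
                geom geometry(Geometry, 4326)
            )
        """)

        # Mixed geometry types in the same column:
        cur.execute(
            "INSERT INTO features (name, geom) VALUES (%s, ST_GeomFromText(%s, 4326))",
            ("landslide scarp", "POINT(-120.5 46.2)"),
        )
        cur.execute(
            "INSERT INTO features (name, geom) VALUES (%s, ST_GeomFromText(%s, 4326))",
            ("debris-flow path", "LINESTRING(-120.5 46.2, -120.4 46.1)"),
        )
        conn.commit()

        # Spatial query: all features within 50 km of a point.
        cur.execute("""
            SELECT name, ST_AsGeoJSON(geom)
            FROM features
            WHERE ST_DWithin(geom::geography,
                             ST_SetSRID(ST_MakePoint(-120.5, 46.2), 4326)::geography,
                             50000)
        """)
        for name, geojson in cur.fetchall():
            print(name, geojson)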


  3. From Ray Obuch:

    Hi Kate and Brennah

    This is the approach I would take in selecting a database to meet the requirements gathered from
    your presentation.

    Summary:

    I would use Linux for the server OS (I use Red Hat).
    I would use EnterpriseDB for the structured data piece.
    I would use MongoDB or Hadoop for the unstructured data piece.

    Why use EnterpriseDB (PostgreSQL on steroids) vs. plain PostgreSQL?


    PostgreSQL vs. EnterpriseDB Postgres Advanced Server
    http://www.enterprisedb.com/products-services-training/products/postgres-plus-advanced-server?quicktabs_advanceservertab=3#quicktabs-advanceservertab

    Requirements:

    1. You have a mix of structured and unstructured data.
    2. Some of your unstructured data is approaching "big data"; when I see seismic data and imagery data, that means big data.
    3. The database will serve public-facing web portals and may also need to support ESRI mapping middleware or
    other middleware applications.

    Cost considerations:

    The big driver here is selecting software whose licensing costs are not astronomical. It's the usual issue of
    proprietary databases vs. open source.

    Enterprise databases such as SQL Server and Oracle require processor-based software licensing when they support
    public access: you can't license named users for general public access, and your user count varies over time, hence processor-based
    licensing. Oracle licensing for a dual-processor, quad-core server can cost $55,000.00 for the license alone. Enterprise SQL Server isn't far behind.

    So for a single server running EnterpriseDB and Red Hat Linux, you're looking at about $2,000/yr per node.

    Data Structure Considerations

    For structured data, I would look at supported versions of PostgreSQL such as EnterpriseDB;
    two integrators come to mind (listed below). Basically, EnterpriseDB is to PostgreSQL what Red Hat Linux is to Linux.
    Typical cost for these supported services: $1,000.00/node, regardless of processor count or public vs. internal access.


    Supported by Integrators such as:

    Carahsoft
    http://www.carahsoft.com/vendors/enterprisedb

    and

    Oteemo

    Oteemo, Inc.
    Chris Scheich
    Email: chris@oteemo.com
    Phone: 703-282-1636
    Website: www.oteemo.com
    Office Address: 1765 Greensboro Station Place, 9th Floor
    McLean, VA 22102

    *I think Oteemo is on the NASA SEWP contract. Check for Carahsoft as well.

    For the unstructured data piece, which can be linked up to PostgreSQL/EDB,
    I would look at MongoDB and Hadoop (see the sketch at the end of this comment).

    Both of these big data solutions have integrators that can help.
    This software requires a bit more technical expertise around JavaScript, etc.

    You can build this locally if you have the Linux experience and can get the hardware.
    The cloud is an option, but you will need help getting this into the cloud.

    One thing that worries me about the cloud: how long is your cloud lifecycle?
    What happens if you need to move between cloud providers, and who helps with that? You need good
    service-level agreements for the cloud.


    What does the Energy Program use?


    Internal:
    Oracle Enterprise on Red Hat Linux for internal use, with an
    ArcSDE middleware stack for the map services piece.

    Public:
    Oracle Standard Edition 2 and Red Hat Linux, along
    with ArcSDE.

    If our data requirements grew, forcing faster processors on the public side, I would consider migrating to
    EnterpriseDB on Linux.


    Hope this helps.

    Ray

    Raymond C. Obuch
    U.S. Geological Survey
    Energy Program Data Management - Oracle DBA
    USGS OEI Oracle Technical Support
    USGS Open Data

    This may be of interest as well:

    http://www.enterprisedb.com/resources-community/webcasts-podcasts-videos/webcasts/using-postgres-integrate-mongodb-hadoop-and--0
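
    For concreteness, here is a minimal sketch of the MongoDB side Ray describes, assuming a local mongod is running; the database, collection, and field names are all hypothetical:

        from pymongo import MongoClient

        client = MongoClient("mongodb://localhost:27017")
        events = client["hazards"]["seismic_events"]  # hypothetical names

        # A document store suits unstructured/variable metadata: each event
        # carries only the fields that apply to it.
        events.insert_one({
            "type": "landslide",
            "source": "seismic detection",
            "station_codes": ["UW.RCM", "UW.RCS"],        # placeholder stations
            "waveform_uri": "file:///data/evt001.mseed",  # placeholder path
            "postgres_event_id": 42,  # link to the structured record in PostgreSQL/EDB
        })

        # Query on a field that only some documents have:
        for doc in events.find({"type": "landslide"}):
            print(doc["postgres_event_id"], doc.get("waveform_uri"))

    The postgres_event_id field is one simple way to implement the "linked up to PostgreSQL/EDB" idea: the relational side stays authoritative for structured attributes while MongoDB holds the free-form payloads.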

  4. Update from Kate Allstadt:

    After the CDI talk, the hazdev group here, who created the USGS earthquake catalog, became interested and proposed adapting the infrastructure from the earthquake catalog to make a similarly structured landslide catalog, but with metadata specific to landslides instead of earthquakes. If this goes forward, it would serve more general purposes than just the seismogenic landslide catalog I talked about: it would also be a way to host landslide inventories (e.g., seismically induced landslide inventories), data from individual landslide investigations, and potentially, one day, real-time data from seismically detected landslides. The intention is to make a universal metadata format, like QuakeML, so that other agencies and scientists could also contribute their data (a rough sketch of what such a record might look like is below).

    I think we may try to go the route of that www.sciencebase.gov/drip example you sent for our seismically induced landslide inventories; I'm exploring options for how that would actually happen.
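
    To make the "universal metadata format" idea concrete, here is a rough sketch of what a landslide event record might look like, loosely modeled on QuakeML's event/origin concepts; every field name and value is hypothetical, not a proposed standard:

        import json
        from dataclasses import dataclass, field, asdict
        from typing import Optional

        @dataclass
        class LandslideEvent:
            event_id: str                      # catalog identifier (hypothetical scheme)
            origin_time: str                   # ISO 8601 UTC
            latitude: float
            longitude: float
            landslide_type: str                # e.g. "debris flow", "rock avalanche"
            volume_m3: Optional[float] = None  # often unknown at first
            trigger: Optional[str] = None      # e.g. "earthquake", "rainfall"
            references: list = field(default_factory=list)

        # Placeholder values only:
        event = LandslideEvent(
            event_id="ls0001",
            origin_time="2017-07-01T00:00:00Z",
            latitude=46.2,
            longitude=-120.5,
            landslide_type="debris flow",
            trigger="rainfall",
        )
        print(json.dumps(asdict(event), indent=2))

    As with QuakeML, the point is a shared, well-defined set of core fields (with room for optional ones) so catalogs from different agencies can be merged and queried uniformly.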

  5. Here is an Open File Report describing the DRIP database in more detail:

    Dam Removal Information Portal (DRIP)—A map-based resource linking scientific studies and associated geospatial information about dam removals
    Open-File Report 2016-1132

    By: Jeffrey J. Duda, Daniel J. Wieferich, R. Sky Bristol, J. Ryan Bellmore, Vivian B. Hutchison, Katherine M. Vittum, Laura Craig, and Jonathan A. Warrick

    From the abstract:

    We created a database visualization tool, the Dam Removal Information Portal (DRIP), to display map-based, interactive information about the scientific studies associated with dam removals. Serving both as a bibliographic source as well as a link to other existing databases like the National Hydrography Dataset, the derived National Dam Removal Science Database serves as the foundation for a Web-based application that synthesizes the existing scientific studies associated with dam removals. Thus, using the DRIP application, users can explore information about completed dam removal projects (for example, their location, height, and date removed), as well as discover sources and details of associated scientific studies. As such, DRIP is intended to be a dynamic collection of scientific information related to dams that have been removed in the United States and elsewhere. This report describes the architecture and concepts of this “metaknowledge” database and the DRIP visualization tool.