
For the April 2017 challenge, Peter Schweitzer (pschweitzer@usgs.gov) is presenting on Cooperative distributed spatial search for scientific data: A practical plan to support finding scientific data from many sources inside USGS.

The problem

  •  You're concerned about a specific geographic location; it could be anywhere.
  •  What scientific information does USGS have about that location?
  •  You haven't said you need specific types of information or studies; you just want to know what is available there.

Read more about this challenge and proposed solution 

Watch the recording


CDI would like to host a follow-up conversation at a near-future date convenient for the participants. If you have a question, or are interested in learning more or in making data available to such a search, please leave a comment here or send a note to pschweitzer@usgs.gov or cdi@usgs.gov.


An example map interface:

Caption reads: You are here. Nearby Scientific Data



4 Comments

  1. Peter, I think web service oriented approaches are great, and RESTful APIs are awesome.

    You've proposed a custom search & API approach. Is this because you looked at alternative approaches, e.g. an OGC Catalog Service for the Web (CSW) search on ScienceBase to return standardized metadata containing standardized service endpoints (e.g. WMS, SOS, WCS, WPS), and found them lacking in functionality or performance?

  2. CSW doesn't give you data records; it gives you data sets, and then you have to delve into the metadata to find links that might take you to any interfaces or downloads that would give you the data.  All that has to happen before you know whether you should care.  I see a need to help people make a decision about whether to explore further without requiring them to work hard to get that information.

    It's quite possible to use a WFS as the back-end service that provides information to this, but you would filter out some of the feature information because, at this point in the process, you only need a little bit of the data.  So WFS gives you geometry, but at this point in the user's experience, they don't need the geometry.  I think it's also valuable to express this information using plain language as much as is possible, because the users won't necessarily be specialists.

    And that's really where I'm going with this–at a particular stage in a user's exploration, what do they actually need?  Let's give them just enough information to make the decision to investigate further or to move on to something else.  Saving them time makes them happier with their experience of our information.  

  3. I think this is a great idea. And you are right that this is a simple REST/JSON wrapper on what WFS is intended to support such that WFS could be a backing service that makes implementation of something like this possible.

    We've implemented something similar where we use the river network as the index / link set... not a more general spatial search. https://cida.usgs.gov/nldi/about

    In that, we try to follow the pattern of a search engine. People expose sets of data they have as big datasets that we can "crawl". We crawl and index things and people can use the search services to find how their own data relates to the hydrographic network. I think doing this at a bit more general level would be great! Maybe Sky's work for a spatial feature registry could provide some of the backing functionality for it too?

     

  4. Yes.  One of the side effects here is that if each data record has a URL, it would not be hard to make a web page that gives links to all of the records (probably to their human-readable formats), and from that, the data records could be crawled by external search engines.  Then Google searches can turn up hits on individual records which can lead people to the database home, the project home, the science center home, the program home, if the web presentation of those records includes that contextual information.  Really this is a way to cast a wider net by which people who might be interested in our data can encounter it even if they didn't know about each of our data sets or collections.

    But I really wanted people to see this as having both low cost of entry and, especially, that it doesn't subsume your data into somebody else's system.  That's the "nobody owns it" part–I'm going for "nobody resents it" (wink)
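
As a rough illustration of the idea discussed in the comments (take features from a backing service such as WFS, drop the geometry, and return only enough plain-language information for a user to decide whether to look further), here is a minimal Python sketch. It is not the proposed service itself. It assumes the backing service returns GeoJSON-style features with a "bbox" member in [west, south, east, north] order; the property names "title", "summary", and "url", the function names, and the sample records are all hypothetical.

```python
from typing import Any

Feature = dict[str, Any]


def brief_record(feature: Feature) -> dict[str, Any]:
    """Reduce a feature to the few fields a browsing user needs.

    The geometry is deliberately left out: at this stage the user only
    needs to know that something exists here and where to read more.
    The property names are assumptions made for this sketch.
    """
    props = feature.get("properties", {})
    return {
        "title": props.get("title", "Untitled record"),
        "summary": props.get("summary", ""),
        "url": props.get("url"),
    }


def bbox_contains(bbox: list[float], lon: float, lat: float) -> bool:
    """True if the point lies inside a [west, south, east, north] box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def nearby(features: list[Feature], lon: float, lat: float) -> list[dict[str, Any]]:
    """Return brief records for every feature whose bbox covers the point."""
    return [
        brief_record(f)
        for f in features
        if "bbox" in f and bbox_contains(f["bbox"], lon, lat)
    ]


# Hypothetical sample data standing in for a response from a backing service.
SAMPLE: list[Feature] = [
    {
        "type": "Feature",
        "bbox": [-106.0, 39.0, -104.0, 40.5],
        "geometry": {"type": "Point", "coordinates": [-105.0, 39.7]},
        "properties": {
            "title": "Stream gauge record",
            "summary": "Daily streamflow measurements.",
            "url": "https://example.org/records/1",
        },
    },
    {
        "type": "Feature",
        "bbox": [-80.0, 25.0, -79.0, 26.0],
        "geometry": {"type": "Point", "coordinates": [-79.5, 25.5]},
        "properties": {
            "title": "Coastal sediment sample",
            "summary": "Grain-size analysis of a sediment core.",
            "url": "https://example.org/records/2",
        },
    },
]

if __name__ == "__main__":
    for record in nearby(SAMPLE, lon=-105.0, lat=39.7):
        print(f"{record['title']}: {record['summary']} ({record['url']})")
```

Because each brief record carries its own URL, the same list could also feed a plain page of links for external search engines to crawl, as suggested in the last comment.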