Excerpt |
---|
Semantic Technologies for Integrating USGS Data
Lightsom, Frances L., Varanka, Dalia E., Schweitzer, Peter N., and Gordon, Janice
Problem Statement
Semantic Web technologies represent a promising approach for integrating data from multiple USGS data systems to address interdisciplinary scientific questions. The proposed project will test and demonstrate this approach by integrating data from five USGS data systems into an information foundation appropriate for research on aquatic habitats. Such a task faces numerous challenges, particularly integrating data whose formats, characteristics, and term meanings vary. An emerging community building semantic technology has proposed theoretical and applied solutions to these problems (Berners-Lee and others 2001). Semantic technology is based on a data model for specifically and unambiguously describing data subjects and their relation to other entities. Specific nodes of information are programmed to link to each other according to formal semantic rules provided by an ontology (see Noy and McGuinness 2001). Any part of these automatically created networks of knowledge can be accessed, so information users can query and customize the data. These functions serve to integrate data more precisely and to convert information from one form to another, and thus allow a more complex context of meaning to develop around data. When connected over the Internet, these networks are often called the Semantic Web or linked data.
Objective
This proposal aims to develop and test the semantic approach to data integration by focusing on the problem of fish habitat modeling. Effective prediction of the abundance of particular species at particular locations is a primary objective of both ecology and natural resource management.
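To make the data model from the Problem Statement concrete, the following is a minimal sketch, not the project's implementation. It assumes that records from different data systems have already been expressed as subject-predicate-object statements. Every identifier and value in it (`reach:101`, `ex:observedAt`, `ex:BrookTrout`, and so on) is hypothetical and invented for illustration. The helper names `match` and `reaches_with_species` are likewise assumptions. A real system would more likely use an RDF store and a query language such as SPARQL.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical statements, as if drawn from two separate data systems:
# one describing stream reaches, one recording fish observations.
TRIPLES: List[Triple] = [
    ("reach:101", "rdf:type", "ex:StreamReach"),
    ("reach:101", "ex:meanTemperatureC", "14.5"),
    ("reach:202", "rdf:type", "ex:StreamReach"),
    ("obs:7", "rdf:type", "ex:FishObservation"),
    ("obs:7", "ex:observedAt", "reach:101"),
    ("obs:7", "ex:species", "ex:BrookTrout"),
    ("obs:8", "rdf:type", "ex:FishObservation"),
    ("obs:8", "ex:observedAt", "reach:202"),
    ("obs:8", "ex:species", "ex:BrookTrout"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given pattern.

    A value of None acts as a wildcard for that position.
    """
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def reaches_with_species(species: str) -> List[str]:
    """Follow links across both data sets.

    Finds observations of the species, then the reach each
    observation points to, and returns the reach identifiers sorted.
    """
    reaches = []
    for obs, _, _ in match(p="ex:species", o=species):
        for _, _, reach in match(s=obs, p="ex:observedAt"):
            reaches.append(reach)
    return sorted(reaches)


print(reaches_with_species("ex:BrookTrout"))  # ['reach:101', 'reach:202']
```

Because both data sets refer to the same reach identifiers, a query can follow the links between them without first converting either one to the other's format, which is the property the proposal aims to test.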
Better knowledge of aquatic fish ecology and habitat requirements and improved tools for assessment and planning are needed to help conserve and rehabilitate populations throughout their native range. USGS scientists working on the National Fish Habitat Action Plan (http://www.fishhabitat.org) and aquatic aspects of the GAP Analysis Program (http://gapanalysis.usgs.gov) have these goals: (1) develop empirical species-habitat models that effectively predict the potential of specific stream reaches as habitats for important fish species, (2) describe the predicted distribution of habitats of various qualities, and (3) compare predictions with observed fish abundances. The resulting models, data, and tools will help managers assess the status of their stream habitat resources and prioritize conservation efforts. Evaluation of the model structure and predicted habitat distribution will also provide insight into the suite of conditions that best support important fish species and how those conditions vary within and between watersheds.
Currently the research is conducted by discovering and collecting data, converting them to compatible formats, and using GIS to combine the data and create a model. We propose to investigate whether semantic techniques could automate and expedite data discovery and integration. The proposed semantic demonstration project will produce an information foundation for fish habitat research, a “mashup” of data from multiple USGS data systems that are fragmented among the former USGS Divisions:
Methods
The proposed approach to semantic system development follows prototypes being implemented for Data.gov by researchers from Rensselaer Polytechnic Institute and Stanford University (see http://www.data.gov/semantic). The approach is iterative, with the stages diagrammed in Fig. 1.
Fig. 1. Stages in the prototype development
We propose to complete one cycle of the iterative process by undertaking the following tasks:
Anticipated Outcomes
1. Access points for querying integrated use case data sets
Implications
If successful, the prototype will bring insights to the USGS science community regarding advantages and disadvantages of using semantic technology for scientific monitoring, modeling, and research.
Budget
Consulting Costs: $4,000
--Fees for leadership at a workshop and preparation
--Travel/Mileage Costs
Project Team Costs: $12,000
--Travel Costs
--Salaries: in-kind
Hardware + System Admin Time: in-kind from CSAS
Total Costs = $16,000
References
Berners-Lee, T., Hendler, J., and Lassila, O. 2001. The Semantic Web. Scientific American, May 17, 2001, available online at http://www.scientificamerican.com/article.cfm?id=the-semantic-web
Brady, S.R., Sinha, A.K., and Gundersen, L.C., editors. 2006. Geoinformatics 2006 - Abstracts. U.S. Geological Survey Scientific Investigations Report 2006-5201, 60 p. Section 1 (p. 1-5) has a number of abstracts on semantics and ontologies for geosciences.
Noy, N., and McGuinness, D.L. 2001. Ontology development 101: A guide to creating your first ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and Stanford Medical Informatics Technical Report SMI-2001-0880, available online at http://www.ksl.stanford.edu/KSL_Abstracts/KSL-01-05.html
Sinha, A.K., Malik, Z., Rezgui, A., Barnes, C.G., Lin, K., Heiken, G., Thomas, W.A., Gundersen, L.C., Raskin, R., Jackson, I., Fox, P., McGuinness, D., Seber, D., and Zimmerman, H. 2010. Geoinformatics: Transforming data to knowledge for geosciences. GSA Today, v. 20, no. 12, p. 4-10, available online at http://www.geosociety.org/gsatoday/archive/20/12/article/i1052-5173-20-12-4.htm |