
Metadata Precedence

The Precedence of Different Levels of Metadata within the Stack Developed Around Unidata Technology

There are a fistful of ways to specify metadata when using NetCDF, the CF conventions within a NetCDF file, NcML, THREDDS, and ncISO, which can lead to confusion. As it stands now, the semantics that can be expressed at different places in the technology stack overlap substantially, and there is no clear precedence model (or even a controlled naming scheme for the different metadata tags). Folks are working on it at Unidata, NOAA, USGS, and other places, but this is not a done deal. This wiki page lists some of the more common methods and discusses what the end user will see when one method overrides another within the stack.

  • One can specify metadata about data content within a NetCDF file via attributes in the file's header. How one specifies metadata within a NetCDF file is relatively open. One can choose to follow the Unidata and/or CF conventions, which dictate that certain pieces of information (metadata) must appear. Conflicts between the two conventions are possible, and the precedence of one over the other is not well established. One can also mix CF with non-standard conventions, which makes conflicts even harder to untangle.
  • The data in the NetCDF file is defined using variables which can themselves specify metadata-like details. Further, some software dynamically generates metadata based on its reading of the actual data content.
  • One can specify metadata about data content in an NcML file. NcML is often used to aggregate data content from different NetCDF files into a single "virtual" dataset, but it can also be used to override the metadata within a NetCDF file.
  • When one serves data via a THREDDS server, this is done by specifying the identity of a dataset within the catalog.xml file associated with the THREDDS server.
    • The catalog.xml file can point to a single NetCDF file's data content, defining metadata that potentially overrides metadata given in the header of the NetCDF file.
      • This can be done with NcML tags
      • This can be done with THREDDS tags
    • The catalog.xml file can also point to an NcML file, referencing all or part of the data content described therein--again associating new metadata that potentially overrides the NcML-given metadata.
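
The layering described in the list above can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the three dictionaries, their attribute values, and the ordering (catalog over NcML over NetCDF header) are all hypothetical, since no precedence model is actually established. The helper names `effective_metadata` and `winning_layer` are made up for this example.

```python
from collections import ChainMap

# Hypothetical metadata layers for one dataset; all values are invented.
netcdf_header = {
    "title": "Raw model output",
    "institution": "Example Lab",
    "Conventions": "CF-1.6",
}
ncml_overrides = {"title": "Aggregated model output"}
catalog_overrides = {"title": "Model output (served)", "summary": "Hourly fields."}


def effective_metadata(*layers):
    """Merge layers so that earlier layers override later ones.

    ASSUMPTION: the caller decides the order; this sketch does not claim
    that any real tool resolves conflicts this way.
    """
    return dict(ChainMap(*layers))


def winning_layer(name, labelled_layers):
    """Return the label of the first layer that defines the attribute."""
    for label, layer in labelled_layers:
        if name in layer:
            return label
    return None


# Assumed order: catalog.xml first, then NcML, then the NetCDF header.
effective = effective_metadata(catalog_overrides, ncml_overrides, netcdf_header)
labelled = [
    ("catalog.xml", catalog_overrides),
    ("NcML", ncml_overrides),
    ("NetCDF header", netcdf_header),
]
for name in sorted(effective):
    print(f"{name}: {effective[name]!r} (from {winning_layer(name, labelled)})")
```

Writing the layers down this way, with a note of which layer supplied each value, is one way to check what an end user would see before a dataset is published.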

The folks at NOAA Geo-IDE have done a lot of work and thinking (hopefully not in that order) on this type of issue, with a particular eye towards automating the translation of metadata from one standard of expression to another (such as from NetCDF CF to ISO 19115-2). See their wiki article on the subject for more. There's a figure at the bottom of that page that's particularly relevant. Note that the figure places CF (the Climate and Forecast conventions) at the bottom of its precedence sequence. CF is a set of conventions for specifying metadata in general, and in the context of their wiki it is used to refer to the metadata within the NetCDF file (I think).


Cross-walks between Metadata Standards

An important benefit of clearly establishing a precedence model for metadata is that it becomes possible to relate one metadata format/model to another. Unidata has made recommendations for defining metadata that allow its tools (and those of others in this field) to automate translation into other metadata formats, which greatly enhances discoverability of the data content. The metadata tags in these recommendations are drawn from Unidata's own NetCDF conventions and from the CF conventions. The formats that these tools can translate into include ISO 19115-2, as mentioned above.

A somewhat old and incomplete table shows correspondence between metadata tags from different standards.
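
As a rough picture of what a cross-walk does, the sketch below treats it as a lookup table from NetCDF global attribute names to target elements in another standard. The three entries and the abbreviated ISO 19115 element paths are illustrative assumptions, not copied from the table mentioned above, and `translate` is a hypothetical helper rather than part of any Unidata tool.

```python
# Illustrative cross-walk: NetCDF global attribute -> abbreviated ISO 19115 path.
# ASSUMPTION: these entries are examples only, not an authoritative mapping.
CROSSWALK = {
    "title": "identificationInfo/MD_DataIdentification/citation/CI_Citation/title",
    "summary": "identificationInfo/MD_DataIdentification/abstract",
    "keywords": "identificationInfo/MD_DataIdentification/descriptiveKeywords/MD_Keywords/keyword",
}


def translate(attributes, crosswalk=CROSSWALK):
    """Split attributes into translated values and names with no known target."""
    translated = {}
    unmapped = []
    for name, value in attributes.items():
        target = crosswalk.get(name)
        if target is None:
            unmapped.append(name)  # no correspondence recorded for this tag
        else:
            translated[target] = value
    return translated, sorted(unmapped)


# Made-up global attributes from a hypothetical NetCDF file.
attrs = {
    "title": "Sea surface temperature",
    "summary": "Daily gridded SST.",
    "history": "created by a hypothetical script",
}
translated, unmapped = translate(attrs)
for path, value in translated.items():
    print(f"{path} = {value!r}")
print("no cross-walk entry for:", unmapped)
```

Reporting the unmapped attributes matters as much as the translation itself, because an incomplete table silently drops whatever it does not cover.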

Probably need to get into ncISO/THREDDSISO/UDDC and all that here. Not sure if that should be a separate page.

Best Metadata Writing Practices for Data Use vs. Data Discovery

The NOAA Geo-IDE wiki points out that the CF conventions are optimized for data use (access and processing). The metadata, although relatively sparse, is sufficient to automatically derive numerous other pieces of metadata relatively easily. See this section for an example. The upshot is that while many of us are using CF as a great starting point for describing our data content, we might step back and think about augmenting that metadata to help improve discoverability of our data content.
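
To make the idea of deriving discovery metadata from use metadata concrete, here is a minimal Python sketch. It assumes the coordinate values have already been read from the file into plain lists, that time is stored as hours since a known reference date (a CF-style `units` string such as "hours since 2000-01-01 00:00:00"), and that the output should use the `geospatial_*` and `time_coverage_*` attribute names from the Unidata discovery conventions. The function name and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def derive_discovery_attributes(lat, lon, time_hours, epoch):
    """Compute extent attributes from coordinate values.

    ASSUMPTION: time_hours holds offsets in hours from `epoch`, as a CF
    time variable with units "hours since <epoch>" would.
    """
    start = epoch + timedelta(hours=min(time_hours))
    end = epoch + timedelta(hours=max(time_hours))
    return {
        "geospatial_lat_min": min(lat),
        "geospatial_lat_max": max(lat),
        "geospatial_lon_min": min(lon),
        "geospatial_lon_max": max(lon),
        "time_coverage_start": start.isoformat(),
        "time_coverage_end": end.isoformat(),
    }


# Made-up coordinate values standing in for the contents of a CF file.
derived = derive_discovery_attributes(
    lat=[30.0, 30.5, 31.0],
    lon=[-80.0, -79.5],
    time_hours=[0, 6, 12],
    epoch=datetime(2000, 1, 1, tzinfo=timezone.utc),
)
for name, value in derived.items():
    print(f"{name}: {value}")
```

None of these attributes needs to be written by hand when the coordinate variables follow CF, which is the sense in which sparse CF metadata is still enough to bootstrap discovery metadata.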

In addition, one can specify metadata content in any of a number of places, some better than others depending on the context. It would probably be good to find or develop some guidance on best practices about this for our data producers. NOAA Geo-IDE is probably a good place to start. There are a number of large data production projects (UAF, IOOS) that have aligned themselves with the operational principles developed within Geo-IDE. Unidata also provides guidance (PowerPoint, web site) on best practices for NetCDF in particular.

How does OGC Catalogue Service for the Web (CSW) relate to all this?

By developing metadata that can ultimately be expressed according to ISO 19115, we ensure that any CSW-compliant client software can discover and access our data services (see this subsection of the Geo-IDE wiki). This is to say nothing of all the client software that can deal with data servers that converse according to ISO 19115, or any of the other metadata formats that one can cross-walk to from there.