Adding individual taxonomic classifications for each species in a metadata record often ranks up there with entering Entity & Attributes as the least favorite task of the metadata author. When a dataset involves multiple species, the task is more time consuming; when the dataset includes dozens to hundreds of individual species, it's downright daunting. Does anyone want to read an XML metadata record that could easily surpass 10,000 or 50,000 lines with individual classifications? Can downstream metadata catalogs even index a record that long? Can OME perform that many individual classifications and retain the information? The answers: 'highly unlikely'; 'often not'; and 'it depends.'

Generally speaking, it's probably not a great idea to generate a Biological Data Profile (BDP) record that includes dozens to hundreds of full taxonomic classifications for individual species from the dataset. Doing so makes the XML, or its transformed representation, unwieldy to scroll through and an unpleasant reading experience for an online user. Many metadata catalogs cannot properly index high numbers of individual taxonomic classifications, as the fields overwhelm their indexes; often, when these records are indexed, the taxonomic information is dropped to prevent indexing errors or system crashes. Finally, the OME can often handle 35, 40, or even 50 individual taxonomic classifications, but performance issues often arise because the XML being generated behind the scenes for these very large records traverses the internet with every save and auto-save. Given the time invested in entering these classifications into a metadata record, and the odds that a user may never see them, an alternative approach may yield better results.

A better option might be a two-phased approach that satisfies the BDP's requirement for classifications and also provides a more readable version of the fuller taxonomic information, one that won't crash, or be dropped from, downstream metadata catalogs.

Phase 1: Providing a higher-level, shared taxonomic classification in the metadata record

If your dataset deals with taxa from a shared grouping higher up in the taxonomic tree, you can do a single/one-time classification at the highest shared level. You could still use the ITIS automated classification for this, and just retrieve it at a higher taxonomic level (Genus, Family, Order) to get fewer classifications in your record.

For example, if your dataset focuses on bluebirds, you could do this:

Taxon_Rank_Name: Kingdom
Taxon_Rank_Value: Animalia

Taxon_Rank_Name: Subkingdom
Taxon_Rank_Value: Bilateria

Taxon_Rank_Name: Infrakingdom
Taxon_Rank_Value: Deuterostomia

Taxon_Rank_Name: Phylum
Taxon_Rank_Value: Chordata

Taxon_Rank_Name: Subphylum
Taxon_Rank_Value: Vertebrata

Taxon_Rank_Name: Infraphylum
Taxon_Rank_Value: Gnathostomata

Taxon_Rank_Name: Superclass
Taxon_Rank_Value: Tetrapoda

Taxon_Rank_Name: Class
Taxon_Rank_Value: Aves

Taxon_Rank_Name: Order
Taxon_Rank_Value: Passeriformes

Taxon_Rank_Name: Family
Taxon_Rank_Value: Turdidae

Taxon_Rank_Name: Genus
Taxon_Rank_Value: Sialia
Applicable_Common_Name: bluebirds

Similarly, if the dataset looks more broadly at most perching birds, you could just take the classification down to the level of Passeriformes.
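For readers who assemble records outside the OME, here is a minimal sketch of how a single higher-level classification like the bluebird example could be nested as XML. It assumes the BDP short element names taxoncl, taxonrn, taxonrv, and common; the function name build_taxoncl and its stop_at parameter are hypothetical and only illustrate cutting the classification off at a higher rank such as Order.

```python
import xml.etree.ElementTree as ET

# (rank name, rank value) pairs from the bluebird example, highest rank first.
BLUEBIRD_RANKS = [
    ("Kingdom", "Animalia"),
    ("Subkingdom", "Bilateria"),
    ("Infrakingdom", "Deuterostomia"),
    ("Phylum", "Chordata"),
    ("Subphylum", "Vertebrata"),
    ("Infraphylum", "Gnathostomata"),
    ("Superclass", "Tetrapoda"),
    ("Class", "Aves"),
    ("Order", "Passeriformes"),
    ("Family", "Turdidae"),
    ("Genus", "Sialia"),
]


def build_taxoncl(ranks, common_names=(), stop_at=None):
    """Nest one <taxoncl> element per rank (hypothetical helper).

    Element names are assumed BDP short names. If stop_at is given,
    nesting ends at the rank with that name (e.g. "Order").
    """
    if not ranks:
        raise ValueError("at least one rank is required")
    root = None
    parent = None
    for name, value in ranks:
        if parent is None:
            node = ET.Element("taxoncl")
            root = node
        else:
            node = ET.SubElement(parent, "taxoncl")
        ET.SubElement(node, "taxonrn").text = name
        ET.SubElement(node, "taxonrv").text = value
        parent = node
        if name == stop_at:
            break
    # Common names are attached to the lowest rank that was emitted.
    for common in common_names:
        ET.SubElement(parent, "common").text = common
    return root


genus_level = build_taxoncl(BLUEBIRD_RANKS, common_names=["bluebirds"])
order_level = build_taxoncl(BLUEBIRD_RANKS, stop_at="Order")
```

Calling ET.tostring(genus_level, encoding="unicode") returns the nested XML as a string; order_level stops at Passeriformes, which matches the perching-birds case.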

If your data deal with species interactions - maybe it's a dataset on aquatic invertebrate preferences of Colorado cutthroat trout - you could do the single classification down to species for the trout, and then do a more general classification down to Order Ephemeroptera for the mayflies, another one down to Order Trichoptera for the caddisflies, and a fourth one down to Order Plecoptera for the stoneflies. That gives the user enough information to know which specific aquatic invertebrates she might find in the dataset, and which are excluded (e.g., sorry, no dragonfly species will be in this dataset). Four classifications, as opposed to several thousand individual species-level classifications for the invertebrates, are manageable for the metadata author and the tool, harvestable and indexable by catalogs, and digestible by the end user.
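Finding the highest shared level can also be done programmatically when you already have each species' classification as an ordered list. The sketch below is illustrative only: the function name shared_classification is hypothetical, the rank lists are abbreviated to the major ranks for brevity, and it assumes every list uses the same ranks in the same order (classifications retrieved from ITIS may contain additional intermediate ranks).

```python
def shared_classification(paths):
    """Return the leading (rank, value) pairs common to every path.

    Each path is a list of (rank name, rank value) tuples ordered from
    the highest rank (Kingdom) downward.
    """
    if not paths:
        return []
    shared = []
    # zip(*paths) walks all classifications one rank level at a time.
    for level in zip(*paths):
        if all(pair == level[0] for pair in level):
            shared.append(level[0])
        else:
            break  # first disagreement ends the shared portion
    return shared


# Abbreviated classifications (major ranks only) used for illustration.
_PASSERINE = [
    ("Kingdom", "Animalia"),
    ("Phylum", "Chordata"),
    ("Class", "Aves"),
    ("Order", "Passeriformes"),
    ("Family", "Turdidae"),
]
EASTERN_BLUEBIRD = _PASSERINE + [("Genus", "Sialia"), ("Species", "Sialia sialis")]
MOUNTAIN_BLUEBIRD = _PASSERINE + [("Genus", "Sialia"), ("Species", "Sialia currucoides")]
AMERICAN_ROBIN = _PASSERINE + [("Genus", "Turdus"), ("Species", "Turdus migratorius")]

bluebirds_only = shared_classification([EASTERN_BLUEBIRD, MOUNTAIN_BLUEBIRD])
with_robin = shared_classification([EASTERN_BLUEBIRD, MOUNTAIN_BLUEBIRD, AMERICAN_ROBIN])
```

Here bluebirds_only ends at Genus Sialia, while adding the robin moves the shared level up to Family Turdidae. The last pair in the result is the rank at which to make the single classification in the record.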

Phase 2: Creating a simple text file with the individual Genus + Species, and relevant common names, for each species in the dataset

Your metadata has addressed the fuller taxonomic classification requirements of BDP at higher shared levels of classification. Now, you can provide a data user with the full species coverage of your dataset by creating a machine-readable text file (.txt, or, alternatively, .xml) that contains the individual Genus + Species names, and relevant common names, for each species in your dataset. (If using ITIS, providing the Taxonomic Serial Number (TSN) for each species is also very useful.) Save that text file and zip it up with your data. Now, anytime a user discovers your dataset via its metadata, s/he can see the general taxonomic coverage, and obtain the full list of individual species included in the data by opening your text file. As a final step, in the 'Range of species addressed' section in OME, you write something like this in the 'brief, general description statement':

With this, a reader of your metadata record will know to consult the .txt file that is zipped with your data to extract the full extent of species covered in your dataset. As an added bonus, you get back 3 hours of your life that would have been spent wrestling 73 species classifications into a metadata record.
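The species list itself can be generated rather than typed by hand. What follows is one possible layout, not a required format: a tab-delimited file with a header row. The column names, the file name species_list.txt, and both function names are assumptions made for illustration. The TSN column is left empty here; fill it in from ITIS if you use it.

```python
from pathlib import Path

# (scientific name, common name, ITIS TSN) for each species in the dataset.
# TSN values are intentionally left blank in this illustration.
SPECIES = [
    ("Sialia sialis", "eastern bluebird", ""),
    ("Sialia mexicana", "western bluebird", ""),
    ("Sialia currucoides", "mountain bluebird", ""),
]

HEADER = "scientific_name\tcommon_name\titis_tsn"


def format_species_list(rows):
    """Return the species list as tab-delimited text, sorted by scientific name."""
    lines = [HEADER]
    for scientific, common, tsn in sorted(rows):
        lines.append(f"{scientific}\t{common}\t{tsn}")
    return "\n".join(lines) + "\n"


def write_species_list(rows, path):
    """Write the formatted list to a UTF-8 text file (hypothetical helper)."""
    Path(path).write_text(format_species_list(rows), encoding="utf-8")


species_text = format_species_list(SPECIES)
# write_species_list(SPECIES, "species_list.txt")  # then zip it with the data
```

A tab-delimited layout keeps the file readable in a plain text editor while still loading cleanly into a spreadsheet or script.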
