
(Like sedimentary layers: the most recent meeting is on top, then reverse chronological to oldest meeting at the bottom. No folds or faults so far.)

Mon. November 4, 2019, 2pm - 3pm EST

Topic: identifying different types of metadata.

Participants in the FAIR Roadmap workshop were once again reminded that "metadata" is a word that refers to a variety of things. When each of us says "metadata" we know exactly what we mean, but listeners might think we're talking about something else. That miscommunication can make it hard to collaborate.

It would be good for USGS to develop accepted terms for different kinds of metadata. If the Metadata Reviewers Community agreed on those terms and their meanings, we could lead USGS just by the way we talk and write. Let's see what we can agree on!

And, no, the FAIR Roadmap workshop report isn't ready to be shared yet. 

Links suggested at the meeting

ESIP work: 

https://github.com/NCEAS/metadig-checks/wiki/Clarify-Nomenclature-and-Revise-Check-Names

https://blog.datacite.org/metadig-recommendations-for-fair-datacite-metadata/

https://github.com/NCEAS/metadig-checks/issues


http://jennriley.com/metadatamap/ (metadata visualization)
Also from the Digital Curation Centre: http://www.dcc.ac.uk/resources/metadata-standards/list?page=1

Example record from GenBank: https://www.ncbi.nlm.nih.gov/genbank/samplerecord/


Example of the ISO 19110 (collection level) - 19115 (item level) relationship that may help bridge the separate-but-related persistent identifier issue: https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19110_(Feature_Catalog)


Highlights of discussion

Yes, there are many kinds of metadata, and many opportunities for miscommunication. We simply need to be clear about what we mean every time we talk about metadata. One handy way of being clear is to say "standard-compliant metadata."

Examples:

  • metadata in the DOI (digital object identifier) record
  • XML format records for FGDC or ISO metadata
  • ScienceBase metadata that appears on a landing page
  • publications metadata that goes into a Pubs Warehouse database
  • the version of metadata used by Google Dataset Search or data.gov
  • metadata encapsulated inside data records, as in netCDF or GenBank (not really self-documenting, because you have to put the metadata into the data record)
  • use metadata vs. discovery metadata vs. administrative metadata
  • metadata in SDC that is used to give us credit for having released data
  • data dictionaries!

We had a long conversation about the desirability of identifiers for metadata records, since a single DataCite DOI might lead to a landing page with multiple metadata records. The use case is keeping track of whether a revised (maintained, updated, improved) metadata record refers to a new data set or to the same one that a previously harvested metadata record described. Identifiers would also help with the need for an authoritative source when downstream metadata users are creating their own versions of our metadata records. We're not sure if using the ISO format will solve this problem. It might be something that we could do in-house, building on the expertise of the DOI tool and ScienceBase.

Persistent identifiers would be very useful for other things as well as metadata.

Mon. October 7, 2019, 2pm - 3pm EDT

We'll talk about the new FSP guidance, a revision of Guidance on Documenting Revisions to USGS Scientific Digital Data Releases.


Note: there was a request for examples of revised data releases in ScienceBase. Here are links to a few examples: 

https://doi.org/10.5066/F7Q23XDH
https://doi.org/10.5066/P9RRBEYK
https://doi.org/10.5066/F77M076K
https://doi.org/10.5066/F79C6VJ0
https://doi.org/10.5066/P9Q8GCLM 

Mon. August 5, 2019, 2pm - 3pm EDT

Madison will lead a discussion about the proposed page on the Data Management Website about reviewing metadata.

Reviewed user stories for Reviewing Metadata page on DM Website:

    • As a technical reviewer of someone else’s metadata, I need to know where to start, how to approach the review, and what to look for, so that I can know that I’ve done a good enough job in my review.
    • As a dataset creator who is writing metadata for my dataset, I need to know what kinds of things the reviewer will be looking at, what they will want to see in the metadata, and how they will judge what I’ve written, so that I can do a good job and so that the review process doesn’t take too long.
    • As a manager who needs to make sure the data releases in my center are reviewed and approved well and in a timely fashion, I need to understand what “reviewing metadata” entails so that I can assign the right people to the job and so that I can have realistic expectations about how much time and effort that job will take.
    • As a member of a group developing policy for my organization, I need to know what aspects of metadata are consistent enough that they can be reviewed with rigor and which are harder to review, so that our policy can support a realistic work process for both producers and reviewers of metadata.

Current resources on Peter's site:

Discussion:

  • Group has experience with people needing reviewers and guidance on reviewing
  • Paul: Considered developing more specific checklist for sections, with boxes to tick
    • E.g., is the date in the right format?
    • Dennis: Has to stay general - identify common things to address and include
    • Tom: Concerns that the checklist would get too long
  • Would it be worth it to put examples/checklists more specific to individual centers/mission areas on the website or should we keep it general?
    • Keep the public-facing website general; thematic topic lists can live “behind closed doors” on the Confluence site
    • If some mission areas want to develop additional standards, they could be supplemental documents to be included on the wiki
  • Checklists for review on Confluence have not been updated since 2016 (at least for Woods Hole)
  • How to find reviewers/time taken is too specific for public-facing website
    • Agreed, and data review takes a substantial amount of time - try to emphasize that since the process can be lengthy, the scientist/reviewer should start early
    • Include this information in the Plan and Metadata Review section of the DM Website
  • Concern that content under data release is currently buried (result of the Drupal/WRET migration)
    • Creating a new Reviewing Metadata page should help with making some of this content more accessible.
  • Different processes for metadata authors who are new and those who are experienced. Depends on the type of data release as well - different scientific subjects will contain different elements.
  • The level of detail of the review depends on the experience of the author
  • Providing tools/example workflows for metadata review?
    • Peter - converts XML to TXT, pastes it into a word processor, and adds comments there (one way to do this conversion is sketched after this list)
    • Sofia - looks at the XML in a text editor and in Metadata Wizard - uses screenshots to show errors and where edits need to be made
    • Colin - in the review tab of metadata wizard, a button will generate a “pretty print” (word document that is exported, opens in word directly from metadata wizard) of the XML, has a list of the schema errors. This is usually the document that goes into XML view. Combines structural review with content review.
    • Mikki - Review checklist (word document) has a place to make comments that looks like review memo, goes into IPDS along with word doc of metadata with comments
    • Andy: Preview of metadata wizard is copied and pasted into a word doc, have places to put comments and replies (reconciliation).
    • Kitty: Uses OME - when you download, it comes out as a couple of different formats; she likes the outline format that it creates. Writes comments and scans it back to the creator. The outline format is HTML.
    • Dennis: Copies one of the formats out of the Metadata Parser and pastes it into a doc; includes validation as part of it. It would be nice to provide mp as an option that you don’t have to download
  • Other tips/tricks for new reviewers
    • Something that could be noted on the dm website
    • Notes for finding a reviewer - note that the reviewer should have written a metadata record and be familiar with the format
    • (difference between content and structural review)
    • Someone new might want to write a metadata record as a training method
    • At the speaker’s center, people who had never reviewed metadata were asked to review - a release can have a team of people with different skill sets: some people review the GIS component, someone else with water quality experience reviews that, etc.
    • Emphasize that the metadata review can be split up by expertise (someone familiar with metadata, someone familiar with subject matter, someone familiar with GIS)
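
Several of the workflows above start by converting the XML record into readable text. As a rough illustration (not anyone's official tool), the Python sketch below walks a CSDGM record and prints one indented line per element; "metadata.xml" is a hypothetical filename, and the output can be pasted into a word processor for comments.

    # Convert a CSDGM XML record into an indented text outline for review.
    import xml.etree.ElementTree as ET

    def to_outline(elem, depth=0, lines=None):
        # One line per element: tag name, then any text content.
        if lines is None:
            lines = []
        text = (elem.text or "").strip()
        lines.append("  " * depth + elem.tag + (": " + text if text else ""))
        for child in elem:
            to_outline(child, depth + 1, lines)
        return lines

    root = ET.parse("metadata.xml").getroot()  # hypothetical filename
    print("\n".join(to_outline(root)))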

Mon. July 1, 2019, 2pm - 3pm EDT

What did we learn from our breakout session at the CDI Workshop? The notes page is here: https://tinyurl.com/CDI0605-Lightsom

We discussed the answers to the first question from the breakout session, and decided that (1) some clean-up is needed before this is a FAQ, and (2) we have at least two FAQs: one for beginners at writing metadata, and one for experienced metadata writers who are starting to review metadata. The information for beginning metadata authors should be on the USGS Data Management Website, but we're not ready to provide it yet. We will begin by collaboratively developing the FAQ for metadata reviewers in the forum section of our Confluence space. Leslie agreed to put in a first topic as an example, and to invite others to work on it.

Another topic was the frequent need to coax people into writing good metadata, or any metadata at all. Fran was reminded that the requirement for metadata comes not from Reston but from OMB and the White House Office of Science and Technology Policy, and probably also the National Archives and Records Administration. Fran wants to look into those policies to see if they are useful for coaxing metadata authors, perhaps because they spell out the purposes of metadata.

Other resources we use: "Ten Common Mistakes" was useful but probably needs updating. Tom Burley has materials from the NBII metadata training that he can share. Several of us like the graphical representations at the FGDC website.

Mon. May 6, 2019, 2pm - 3pm EDT

This month we will test the technology for virtual participation in our breakout session at the June CDI Workshop

Join Zoom Meeting
https://zoom.us/j/472209309

One tap mobile
+16699006833,,472209309# US (San Jose)
+14086380968,,472209309# US (San Jose)

Dial by your location
        +1 669 900 6833 US (San Jose)
        +1 408 638 0968 US (San Jose)
        +1 646 876 9923 US (New York)
Meeting ID: 472 209 309
Find your local number: https://zoom.us/u/acLWIyw37O

We will also have a presentation by VeeAnn Cross and Peter Schweitzer about how the USGS Science Data Catalog could use the keywords in metadata records to improve data discovery, and what that means for those who are authoring, reviewing, and revising USGS metadata records.

Mon. Apr. 1, 2019, 2pm - 3pm EDT

Sheryn demonstrated the metadata collecting system used by MonitoringResources.org to encourage discussion of how it might be simpler and easier to use, as well as good ideas that the rest of us can copy. Sheryn's slides are available.

MonitoringResources.org is part of the Pacific Northwest Aquatic Monitoring Partnership (PNAMP) and uses the metadata to provide an index of monitoring activities (especially stream ecology monitoring in the U.S. Pacific Northwest) and of the procedures, protocols, and monitoring designs that are in use. Currently Sheryn reviews the metadata that are submitted through the site and used in the index. The site could be used for other types of monitoring and other regions, but there are not currently enough metadata reviewers to handle a larger volume of submissions.

Community discussion included questions about connections with the USGS Quality Management System (QMS) and for using the MonitoringResources.org metadata elements to build ISO standard metadata records. Community members are welcome to email Sheryn with additional ideas about ways the site could be made simpler and easier to use.

Mon. Mar. 4, 2019, 2pm - 3pm EST

We are satisfied with the answers we received from FSPAC and glad they are posted on the FSPAC FAQ pages. We might ask some more questions later.

Related to our long-term goal of providing more complete guidance for data and metadata review, as well as tips and tricks for data and metadata authors, we agreed to host a breakout session at the 2019 CDI Workshop. We hope participants will bring questions that we can answer or at least discuss, which will be useful in the future for developing responsive online guidance. A fall-back agenda would be to step through the review checklists and talk about how we address each item on the list. Many of our members will be unable to travel to the workshop, so a virtual participation option is important. Fran agreed to put the session proposal on the Wiki, immediately, since it was due last week.

We discussed what location parameters need to be in a metadata system as opposed to being in the data itself, and came to no answer that fits every case. One guideline is that a metadata system needs to provide the parameters users need to locate the data in the associated database.

Ed mentioned a Jupyter Notebook that he, Erika, and Stephanie have developed for quick evaluation of large data files. The tool is available for others to use, and will be demonstrated at a future meeting, and at the CDI workshop. If you would like to try it sooner, contact Ed Olexa.

The ISO Content Specs project will be hosting workshop sessions on Friday of the CDI Workshop. The sessions will focus on collecting requirements for metadata specification modules, most likely modules for experimental data, computational data, and observational data. We are encouraged to plan to stay through Friday, if we can travel to the workshop.


Mon. Feb. 4, 2019, 2pm - 3pm EST

Two major questions came up at today's meeting that we would like to pass along to the FSPAC subcommittee and/or the BAOs for guidance. 

Question 1: Is there updated guidance on the volume of data necessary to trigger a separate data release?

Discussion Notes:

  • Some authors have been caught off guard when they are told they need to do a data release during the publication phase of their project. They were under the impression that having the data in the paper would be sufficient.
  • A couple of years ago, the idea was floated that if data could fit in a table within a manuscript, it wouldn't need a separate data release. Someone mentioned a 3-page limit; however, specific guidance never came out.
  • There is likely a reason that this was left vague. Just because data could meet that threshold doesn't mean they should. It is often necessary to have more knowledge about the specific data when determining whether a separate data release is necessary. Even if the volume of data is small, they still might benefit from the additional documentation that comes with metadata. We certainly don't want to bypass our due diligence with data by just stuffing them into a manuscript.
  • Do we also need to ask SPN if they have a size threshold for what can be included in the main body of a publication?
  • Kevin Breen would always say to publish it and make it a data release. Anytime an author asked whether they needed to do a data release, Kevin would say yes; he always wanted to see it as a data release. He never had anything really short (e.g., just a few lines).
  • If an author just has a few samples, they could just put them in the paper. Often it is still a judgment call – you know if it is overkill to have a data release.
  • Best practice is to address this question during the proposal review process. Authors should just plan to do a data release from the start of the project.
  • Sometimes these things don't come up until the publication phase, and then it's a hard argument to make when the funds are all used up. Put a data release in the proposal, because for most projects it would make sense to do one.
  • Tom Burley: Our cooperators really like DRs because they are citable and they don't have to go into NWIS to get the information.

***UPDATE***

Answer from FSPAC: 

The original guidance about tables/pages has been removed, and more flexibility is now available to authors. There was, in the past, a conversation with OCAP that involved page numbers in relation to data. Now there is an FAQ that addresses it: refer to the “with or without a data release” FAQ. Having the data in the paper is OK; however, if the data are big enough to be moved into a supplemental section of the paper, they must be a data release.


New FAQ (from FAQ): 

Is there a size cutoff for data tables within the body of a publication or in associated appendixes and supplemental files?

  • The size of a data table presented solely within the body of a publication depends on publisher requirements. Journals and other outside publishers generally have a size cutoff for within-article tables. For USGS series publications, authors should contact the Bureau Approving Official (BAO) or local Publishing Service Center Chief during the early stages of product development for guidance on the maximum sizes for tables and associated appendix and supplemental files. Although an appendix or supplemental file may contain a summary of the data that support the publication, the complete dataset may not be contained solely within an appendix or supplemental file, regardless of size.

________________________________________________________________________________________________________________

Question 2: How should authors reference data that is not publicly available when writing a manuscript?

Discussion Notes:

  • Collaborations between USGS scientists and private entities have caused some confusion about how to publish papers when the private entities are responsible for the data. If the private entity is responsible for publishing the data, but the data haven't been published at the time the USGS scientist is set to publish a manuscript analyzing them, how should the scientist reference the data? Likewise, if the data will remain proprietary, what is the best way to reference them in the paper?
  • For sensitive data, we can reference them in manuscripts but can't release them. The same should be true for proprietary data and data that are the responsibility of other entities. Then all of the data sources are cited, and it becomes the responsibility of the user to negotiate with the external entities to get the data.
  • People on the call discussed the importance of establishing roles in the DMP. We may need better guidance for what questions to ask during the data management planning phase to address data sharing and private data.
  • Not only an issue with collaboration with private entities, also an issue with BLM collaborations since BLM doesn't have to put their data through peer review. 
  • Does the FSP site have information on how to handle private data? People are having some trouble finding information as the site is being migrated to the new Drupal environment. There used to be an FSP page that had tangible scenarios related to this topic (e.g., if this is the case, do this...). Is this decision tree still available?

***UPDATE***

Answer from FSPAC:

Refer to https://www2.usgs.gov/fsp/guide_to_datareleases.asp for updated guidance.

For example, ‘data statements’ can be included in the manuscript. (In the FAQ, look for: “What statement(s) must be used to indicate the availability and, if applicable, the location of data that support the conclusions in a publication, and where should the statement(s) be placed?” for further information) 


New FAQ (from FAQ): 

What statement(s) must be used to indicate the availability and, if applicable, the location of data that support the conclusions in a publication, and where should the statement(s) be placed?

    • Below are examples of statements to be used in various cases to describe where the data reside or to clarify disposition of any data or reasons for partial release or lack of release. Add the applicable data statement(s) to the internal USGS Information Product Data System (IPDS) Notes Tab and the publication manuscript before peer review.

      Insert appropriate text for bracketed information and retain parentheses where indicated. See “Data Citation” and USGS Publishing Standards Memorandum 2014.03 for additional citation guidance.

      Case 1. Data are available from an acceptable repository (includes USGS data release products). 

      • IPDS: Data generated during this study are available from the [acceptable repository], [DOI URL].
      • Manuscript: Data generated during this study are available as a USGS data release ([author], [date]).
                            Data generated during this study are available from the [acceptable repository] ([author], [date]).

      Case 2. Data are partially available from an acceptable repository.

      • IPDS: Data generated during this study are partially available from the [acceptable repository], [DOI URL]. Funding for this study was provided by [responsible agency]. [Describe funding and responsibility for data release].
      • Manuscript: Data generated during this study are partially available from the [acceptable repository] ([author],[date]). Funding for this study was provided by [responsible agency]. [Describe funding and responsibility for data release].

      Case 3. Data are not available at time of publication.

      • IPDS and Manuscript: At the time of publication, data are not available from the [responsible non-USGS agency].

      Case 4. Data either are not available or have limited availability owing to restrictions (proprietary or sensitivity).

      • IPDS and Manuscript: Data either are not available or have limited availability owing to restrictions ([state reason for restrictions, such as proprietary interest or sensitivity concern]). Contact [third party name] for more information.

      Case 5. Data generated or analyzed are included in the main text of the publication.

      • IPDS and Manuscript: All data generated or analyzed during this study are included in the main text of this publication.

      Case 6. Data were not generated or analyzed for this publication.

      • IPDS and Manuscript: No datasets were generated or analyzed for this publication.


_______________________________________________________________________________________________________________________________________________

A few months ago, this group talked about ways to improve the metadata/data review guidance documents. What are the next steps to get things updated? Can we address this at a future meeting?


Mon. Dec. 3, 2018, 2pm - 3pm EST

We will GSTalk again this month. 703-648-4848 (or 1-855-547-8255) code 64914. If we need to share screens, use Internet Explorer to go online at https://gstalk.usgs.gov/64914.

Proposed agenda:

  1. Introductions: Welcome new members and address any questions they bring with them. 
  2. Follow up on the Nov. 8 email thread about using links to publications in the Process Step. 
  3. Are there items from last month's discussion of checklists for data and metadata review that need follow-up?

Mon. Nov. 5, 2018, 2pm - 3pm EST

We will GSTalk again this month. 703-648-4848 (or 1-855-547-8255) code 64914. If we need to share screens, use Internet Explorer to go online at https://gstalk.usgs.gov/64914.

Proposed agenda:

  1. Introductions: Welcome new members and address any questions they bring with them. 
  2. Review checklists for data review and metadata review and discuss how we review data and metadata in our science centers. The checklists are online at https://www.usgs.gov/products/data-and-tools/data-management/data-release#checklists.

Report from meeting:

Tamar started two new Google Docs. Please feel free to add comments and content.

Comments about Guidelines for Metadata Review: https://docs.google.com/document/d/1g14C5fusPeGHP3mxERtBLUgxoz-sxqkbnehiJzD98cc/edit

Metadata review tips: https://docs.google.com/document/d/1IqAl70nKGTK71KL1gLvihMrrmfErcBZKVefWozJx8E8/edit


General Comments about the document “Guidelines for Metadata Review”

  • Could be helpful for new reviewers: recommendations on how to start, what are the steps in the process. For example, copy/paste into Word doc for comments
    • Benefit of Word doc – can add comments, include doc in reconciliation materials in IPDS
    • Can be helpful to view with stylesheet as opposed to XML tags
  • A supervisor checking a data release wondered if there was a more specific checklist for supervisors
    • In addition: the supervisor wasn’t sure how to navigate multiple data and metadata files bundled together on landing page of ScienceBase data release.
    • Another meeting participant noted that supervisory review isn’t technically required, just data and metadata peer review.
  • A couple comments addressed the fact the doc is very general and high-level:
    • Meeting participant noted: splitting it into sections would make it FGDC-specific and document is meant to be broad and applicable to all metadata
    • MP validation is often the first step for metadata review – information about MP isn’t separated out from intro paragraphs or marked by a bullet – should it be more emphasized here?
    • Suggestion: maybe split bullets into categories by CSDGM section?
  • A participant’s example of their process: they don't reference checklist for most reviews – instead, read metadata as a whole, check for coherence, completeness etc.
  • Comment: it would be nice to be able to actively use this doc as a checklist, with spaces for checks next to the bullet points
    • Maybe have a box at the end that says “see attached sheet for additional checks”? Or a note that says this is the start of a checklist?
    • Maybe an editable PDF?
    • Participant response: this might suggest to users that the bullet list is a comprehensive check. The language in the doc is important to notice: “For example, verify that:”

Specific Comments about bullet points in the document


  • Questions about recommended elements in the title and whether they are required
    • Note: important to keep in mind that there can be outliers
    • Example: can be helpful to include when there is cooperative work for various agencies
    • A few authors didn’t want to put the “when” in the title because they didn’t think it was necessary there (and might be unclear)
    • Why is “who” included in the title?
    • Possible solution: add “if applicable” to all the elements of the title (instead of just “scale”)
  • Question about coordinate system and datum – what to do if you have an aggregation over many years – do you generalize coordinate system information
    • Example from Woods Hole – data compilations of datasets going back 50 or 60 years. There’s a variety, can sometimes make guesses based on dates collected
  • Note about entity and attribute section review: you can use the Metadata Parser to output the entity/attribute section as a CSV table (a rough do-it-yourself alternative is sketched after this list)
  • Idea: create a 3rd document that contains tips and tricks that could be helpful for reviewers
    • Can contain: options of secondary validation in MP, different ways people can work on a review (e.g., Word doc methods)
  • Future discussion topic: keywords (wait until Peter Schweitzer can join us)
  • Should contact information check be included in the checklist? 
    • Content in these fields is variable, so it may not fit in the list as a bullet-point check.
    • Some centers have this standardized. 
    • Future topic of discussion?
  • Discussion topic: network resource name field – what do people use?
    • This can be helpful for large data releases with complex structures, multiple child items
    • Note: this is the default in the Metadata Wizard, so it's what folks often use
    • Woods Hole Coastal and Marine Science Center enters a direct download link to the data and the URL of the child item.
    • Many others use the data release DOI (which resolves to the landing page) for the <networkr> field
  • Question: how many child items do most data releases have?
    • key consideration: does adding child items/folders improve accessibility? Varies by case
    • Majority have none or only two/three
    • A few have subfolders, but this is rare
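
On the entity/attribute note above: the Metadata Parser already produces the CSV table, but for reviewers who want to script it themselves, here is a hedged sketch in Python. It assumes the standard CSDGM <eainfo>/<detailed>/<attr> structure; both filenames are hypothetical.

    # Dump attribute labels, definitions, and definition sources from a
    # CSDGM record into a CSV table for entity/attribute review.
    import csv
    import xml.etree.ElementTree as ET

    root = ET.parse("metadata.xml").getroot()  # hypothetical filename
    with open("attributes.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["attribute_label", "definition", "definition_source"])
        for attr in root.iter("attr"):
            writer.writerow([
                attr.findtext("attrlabl", default=""),
                attr.findtext("attrdef", default=""),
                attr.findtext("attrdefs", default=""),
            ])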


Mon. Oct. 1, 2018, 2pm - 3pm EDT

We will GSTalk again this month. 703-648-4848 (or 1-855-547-8255) code 64914. If we need to share screens, use Internet Explorer to go online at https://gstalk.usgs.gov/64914.

Proposed agenda:

  1. Introductions: Welcome new members and address any questions they bring with them.
  2. Discussion: What is the current state of metadata for data releases in USGS? Is there anything we could do as a community to improve the situation?
  3. News: Status of ISO Content Specs project, new effort to enable FAIR principles in USGS, what else do we know?


Report from meeting:

Items from the discussion: Generally, the quality of USGS metadata is much improved in the past two years. Several science centers are trying to enlarge the pool of qualified metadata reviewers. Data reviewers are generally required to be familiar with the data type (geospatial, for example) but might be experienced data users or data producers. Metadata reviewers need specific expertise, and the time required to develop this skill depends on multiple factors, such as the background of the new reviewer, their workload, and the degree of variety in the data they will need to review. Similarly, the time an experienced metadata reviewer needs for a single job can vary from days to months, depending on the complexity of the data and metadata as well as their condition (number of errors). It is difficult to teach our scientists to write good metadata, even something as simple as consistently providing the necessary information in the data set title. Abstract and purpose are also hard to teach.

Community actions that would help:

  • Share a list of qualified reviewers for specific data types so that reviews can be shared among science centers when nobody qualified is available at the home science center.
  • Post a list of people who can answer questions, to help new reviewers get started.
  • Collect and share training materials. Several science centers have some to share. This is being done on a confluence page at this site.
  • Revisit our checklists for data review and metadata review and discuss how we do each element. (Planned for Nov. 5)
  • Sheryn will demonstrate the metadata collecting system used by MonitoringResources.org to encourage discussion of how it might be simpler and easier to use, as well as good ideas that the rest of us can copy. (Planned for Jan. 7)

Mon. July 2, 2018, 2pm - 3pm EDT

We're trying GSTalk again this month. 703-648-4848 (or 1-855-547-8255) code 64914. To view the slides, go online at https://gstalk.usgs.gov/64914.

Proposed agenda:

This week we need to advise the USGS Web Re-engineering Team ("WRET") on the proposed metadata requirements for the old "legacy" data sets that have been traditionally released on USGS web sites. Lisa Zolly will introduce the topic.


Mon. June 4, 2018, 2pm - 3pm EDT

We're trying GSTalk again this month. 703-648-4848 (or 1-855-547-8255) code 55793. Also online at https://gstalk.usgs.gov/55793.

Proposed agenda:

  1. Questions, news or announcements?
  2. Ray Obuch will provide an overview of the new Department of the Interior Metadata Implementation Guide. It uses a lot of the same words we use, but perhaps with different meanings. Do we want to help USGS get the implementation right?
  3. I suggest that we look at the proposed FAIR metrics, which have a lot to do with metadata. These links from Leslie:

FAIR metrics: https://github.com/FAIRMetrics/Metrics

Leslie and I prefer this view: http://htmlpreview.github.io/?https://github.com/FAIRMetrics/Metrics/blob/master/ALL.html
A preprint: https://www.biorxiv.org/content/early/2017/12/01/225490

Report from meeting:
The meeting started with unfortunate delays caused by a typo in the calendar item. Fran apologizes.

Ray's presentation was very interesting, although the connection to the metadata review process was not clear.

After Leslie's overview of the FAIR metrics, Peter shared this link about a similar but different way of thinking about the problem, "5 Star Open Data".

Mon. May 7, 2018, 2pm - 3pm EDT

We're trying GSTalk again this month. 703-648-4848 (or 1-855-547-8255) code 64914. To view the slides, go online at https://gstalk.usgs.gov/64914.

Proposed agenda:

Burning questions? Metadata nightmares? Brilliance to brag about?

Barbara Pierson will join us to continue our discussion of the USGS Genetics Metadata Working Group (wiki page, Genetics Guide to Data Release and Associated Data Dictionary).

The project to create metadata content specifications for easing USGS transition to the ISO metadata standard has started planning their workshop. We hope for an informal progress report.

Report from meeting:

GSTalk did not work for sharing desktops. We suspect that we need to be more conscientious about installing updates frequently.

We had a good discussion about the Genetics Guide to Data Release, and agreed to provide some comments to enable the working group to take this document to the next stage. Everybody, please try to do this before our June 4 meeting!

The content specifications project is having trouble finding a workable date for their workshop. They are thinking of a modular approach to the specifications, starting with a basic module that includes identification and discovery information, a biological module, a process steps module, and at least one geospatial module. Any given metadata record would use the set of modules that were appropriate. It seems that quality descriptions might be part of multiple modules.

Mon. April 2, 2018, 2pm - 3pm EDT

We're trying GSTalk again this month. 703-648-4848 (or 1-855-547-8255) code 64914. Also online at https://gstalk.usgs.gov/64914

Proposed agenda:

Burning questions? Metadata nightmares? Brilliance to brag about?

Let's look together at this wiki page created by the USGS Genetics Metadata Working Group, especially the Genetics Guide to Data Release and Associated Data Dictionary.


Report from meeting:

Burning questions centered on review and release of "legacy data sets" that no longer are supported by the project that created them, yet still have scientific value. Some centers are using the IPDS process, while being careful in metadata to identify limitations of the data. Others are updating metadata that was published with old data sets, when possible with the participation of the originating scientists. Advice: be sure to add a process step to the metadata record when you modify it, and update the metadata date. Unresolved question: can legacy data go into the WRET Drupal environment?

About the Genetics Metadata Working Group materials: much of this seems more general, so that a similar document might be useful beyond the genetics community. Dennis will ask Barbara to meet with us next month.

Topic for a future meeting: How can we develop a collection of quality statements to suggest, similar to those in the Genetics guide? Do we want to provide examples, or the sample questions that the statements should answer? Madison is interested in this.

Mon. March 5, 2018, 2pm - 3pm EST

We're trying GSTalk again this month. 703-648-4848 (or 1-855-547-8255) code 64914. Also online at https://gstalk.usgs.gov/64914

Proposed agenda:

Burning questions? Metadata nightmares? Brilliance to brag about?

News: the ISO metadata project is going for a full proposal to CDI; the CMGP meeting with the USGS Thesaurus team; what else?


Report from meeting:

Leslie will be adding some email discussions about metadata topics to our community forum.

The project team will be submitting a full CDI proposal to create specifications for USGS data products so that ISO standard metadata records can be created in tools like the ADIwg metadata toolkit. The proposal is due at the end of March, so next week community members will have a chance to look at a draft of the proposal and suggest improvements.

Coastal and Marine Geology metadata specialists had a meeting last week with the USGS Thesaurus team to improve the usefulness of the Thesaurus as a source of metadata keywords that will improve data discovery.

The USGS data management website is improving its page about data dictionaries and would like our comments. You can comment on a draft of the page at https://docs.google.com/document/d/1140npvNsCb-ixQ-e-dDws2AOU7q2yHtk4pJ1_Ce5HHI/edit

Mon. February 5, 2018, 2pm - 3pm EST

Proposed agenda: Demo of the new ADIwg metadata editor by Dennis Walworth and Josh Bradley.

Report from meeting:

Thanks to the FWS WebEx, we had a great presentation and demonstration of the ADIwg metadata toolkit, which is finally fully functional and ready for widespread use. I had thought of it as a way to make ISO19115 metadata, but it can also be used to make the old CSDGM. Our metadata authors can keep using the same tool through the USGS eventual transition to the ISO standard. It seems like it would work well for metadata review, as well, because it can produce an html output that is easy to read. The slides are attached, but the demonstration – you had to be there. Thank you, Josh and Dennis!


Tue. January 9, 2018, 2pm - 3pm EST (note temporary change in schedule)

Proposed agenda: Demo of the new ADIwg metadata editor by Dennis Walworth and Josh Bradley.


Report from meeting:

We were unable to make GSTalk work for the demonstration. We will try again next month.

Eventually, we discussed working together to propose a CDI project to lay the groundwork for use of ISO metadata in USGS. Dennis, Lisa, Fran and Tara volunteered to work on this proposal.

Mon. December 4, 2017, 2pm - 3pm EST

Proposed agenda: That pesky data quality information.

  • Can you share an example of a correct and helpful data quality assessment?
  • Can you share an example of a data set for which you would like advice on how data quality should be stated?
  • How should data quality items be handled in a metadata tool like Metadata Wizard? (Is it counter-productive to suggest boilerplate answers?)

Report from meeting:

At the meeting we talked about items in Madison's collection of Data Quality Documentation Examples. We didn't finish talking about the collection.

Some things that were said include:

Are these all supposed to be good examples? In any case, they were good discussion starters.

Unanswered question: Should information be given only once in a metadata record, or is redundancy useful? Specifically, should quality control measures be described only as a data processing step, with the data quality elements providing only the quality standards used or the resulting accuracy/precision of the data?

It's important to give a definition of how the project identified "outliers" and what was done with them – flagged? deleted? replaced with interpolations? This could go in completeness report, attribute definitions, or logical consistency.

Completeness report should say what is known to be missing from the data, or what is missing intentionally.

Users like a lot of information to evaluate data before they use it.

One approach to a completeness report is a table that provides the number of missing values for each attribute, but it is a large table and some metadata tools might not allow the table formatting.
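
For tabular data, that table is easy to generate automatically. A minimal sketch, assuming the data are in a hypothetical "data.csv" and pandas is available:

    # Count missing values per attribute for a completeness report.
    import pandas as pd

    df = pd.read_csv("data.csv")  # hypothetical filename
    report = df.isna().sum().rename("missing_values").to_frame()
    report["percent_missing"] = 100 * report["missing_values"] / len(df)
    print(report)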

Logical consistency is a good place to include mismatches between data from different sources that were merged or compiled or tiled into a data set. Do the data always mean the same thing regardless of which record you look at? In some cases it might be more useful to include measures of data quality as data values associated with observations or measurements.

When metadata was mostly used for digital geospatial data, logical consistency was mostly used for topological correctness.

Some possible conclusions:

  • A set of "good examples" only makes sense if it includes information about what kind of data each example would be good for. There is no universal right answer to data quality descriptions.
  • There being no universal right answer to data quality descriptions, metadata tools probably shouldn't suggest any boilerplate.
  • Perhaps, instead of good examples, it would be more useful to provide additional, easier to understand, definitions of each field. What questions should be answered by the information in each field? (The Metadata Workbook did this, but is out of date.)

Mon. November 6, 2017, 2pm - 3pm EST

Proposed agenda: Discussion of the Biological Data Profile, led by Pai, and Erika, and Robin.

Report from meeting:

Pai and Erika shared examples of using the Biological Data Profile for data from Sea Otter Surveys. (See https://www.sciencebase.gov/catalog/item/55b7a980e4b09a3b01b5fa6f.)

Robin raised the question of how to format taxonomy data when the data involve hundreds of species. Lisa said that the metadata would be okay if you do the taxonomy at a more generalized level and provide a complete listing of taxa that can be downloaded. Validation of metadata doesn't require that the taxonomy be complete to the species level.

Robin said that her group is using the CAS registry for identifying chemical substances, which led to a discussion of the usefulness of similar authority files and codesets. We agreed to add the authority files and codelists that we find useful to a list that will be on the Data Management website.

Mon. October 2, 2017, 2pm - 3pm EDT

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Status report on new checklists
  3. Metadata training
    1. Summary of Google form survey results
    2. Do we know enough, or should we collect more data?
    3. What can we do for online training for Metadata Wizard and OME?
    4. What can we do to provide shadowing/mentoring for metadata review?
    5. What else do we want to do?

Report from meeting:

The new checklists were changed slightly by the FSPAC Scientific Data Guidance Subcommittee and sent to the webteam to be posted on the USGS Data Management Website.

Slides summarizing the survey results are attached.

Points made during the discussion:

NOAA training is available for ISO metadata, but it assumes willingness and capability to edit an XML file.

USGS people probably need two levels of training:

  • Beginners who never wrote metadata before.
  • Intermediate learners who need best practices.

Metadata Wizard is in the software release process. The plan is to provide live training at FORT and then publish that as a tutorial. We offered to help, when the time comes.

OME was not represented at our meeting.

What do we notice that people need to learn?

  • What links go where.  (Lisa’s presentation)
  • What information goes in the different metadata elements.

Typical metadata shortcomings

  • Title doesn’t describe data.
  • Abstract is from paper, not about data.
  • Dates used inconsistently or wrong.
  • Keywords missing or misused
  • Data quality elements not robust. “It’s all described in the paper” or obtuse FGDC definitions.
  • How to communicate in plain language a description of their scientific work.
  •  … 25% of elements have misunderstandings, no gaping holes

What helps

  • Having written a first record, beginning to see how to use templates.
  • SOPs for different data types.

What we would like as metadata reviewers

Our general idea is to meet in small groups that share experience with particular data types or formats. What things in the metadata do we look for that lead to conversations with authors and to better metadata? And what are ways to go beyond the boilerplate offered by tools – recognizing boilerplate responses and asking the authors whether something more customized to the data might be possible?

  • How to create proper FGDC metadata for netCDF files, in addition to the CF information that is already there. But we need to provide information to people who don’t know how to use netCDF, so they can evaluate whether to download it or learn to use it. (A sketch of extracting the embedded CF information appears after this list.) Votes: 3
  • How to create metadata for SegY formatted seismic data.
  • What to do with records that include reference to or attributes from the lidar base specification (current version is 1.2 from November 2014). 
  • How to use the biological data profile.
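
For the netCDF item above, the CF information embedded in a file can at least be extracted automatically as raw material for an FGDC record. A minimal sketch, assuming the netCDF4 Python library and a hypothetical file "data.nc":

    # Print the global (CF) attributes and variable units from a netCDF file
    # so a metadata author can transfer them into the FGDC record.
    from netCDF4 import Dataset

    ds = Dataset("data.nc")  # hypothetical filename
    for name in ds.ncattrs():  # global attributes, e.g. title, summary
        print(name, "=", getattr(ds, name))
    for var_name, var in ds.variables.items():
        print(var_name, "units:", getattr(var, "units", "unknown"))
    ds.close()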

Dennis, Lisa, Pai, and Erika volunteered to start off this new kind of "learning together" with a session about the biological data profile on Nov. 6.

Tue. September 5, 2017, 2pm - 3pm EDT

Meeting on a Tuesday because Labor Day is our regular meeting day, and because there is some urgency for us to recommend new checklists for data review and metadata review!

Proposed agenda:

 

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Discuss, possibly revise, and hopefully approve new USGS checklists for data review and metadata review.

Report from meeting:

We discussed the need to check XML metadata records to make sure the text is encoded in a way that will not cause trouble. We don't have an adequate tool for checking or converting to UTF-8 encoding, so we will engage in a use case process to clarify what the tool needs to do, and then likely one of us can develop it. Those interested in participating in this use case process should contact Fran Lightsom, flightsom@usgs.gov.
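
As a starting point for that use case process, the core check might look like the Python sketch below; the filename is hypothetical, and a real tool would also need conversion and batch reporting.

    # Flag the first byte in a metadata file that is not valid UTF-8.
    def check_utf8(path):
        data = open(path, "rb").read()
        try:
            data.decode("utf-8")
            print(path, "is valid UTF-8")
        except UnicodeDecodeError as err:
            print(path, "has an invalid byte at offset", err.start)

    check_utf8("metadata.xml")  # hypothetical filename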

We reviewed the draft checklists and made some improvements. Community members have until September 12 to double-check the following documents and speak up (email Fran, or the whole group) about any remaining problems or omissions. The new versions are here:

Mon. August 7, 2017, 2pm - 3pm EDT

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Discussion with Lisa Zolly about metadata requirements for good functioning in the USGS Science Data Catalog.

Report from meeting:

One burning question: Will the new Metadata Wizard release be open source? Answer: Yes, and it will be stand-alone (independent of ArcGIS).

Presentation by Lisa Zolly (CSASL): "Metadata Tips for Better Discoverability of Data in the USGS Science Data Catalog" (see attached PowerPoint)

The focus of Lisa's presentation was twofold:

1. Optimal use of theme and place keywords in Science Data Catalog (SDC)
       a. SDC browse function utilizes coarsely granular keywords from USGS Thesaurus
       b. SDC search function utilizes finely granular keywords from various disciplinary controlled vocabularies (CV)
       c. Controlled vocabulary resources: Controlled Vocabulary Server maintained by Peter Schweitzer and the USGS Data Management website 

2. Optimal placement of links related to data releases. Specifically, the preferred use of: <onlink> in <citeinfo>; <onlink> in <lworkcit>; <onlink> in <crossref>; and <networkr> in <distinfo>. See PowerPoint for more detail on the problems the SDC team has had in deciphering links (data link? publication link?), and how metadata authors and reviewers can help alleviate those problems.
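
A reviewer could check those placements mechanically. The sketch below (our illustration, not from Lisa's presentation; "metadata.xml" is a hypothetical filename) prints the element path of every <onlink> and <networkr> in a CSDGM record so each link's context can be compared with the guidance above.

    # Report where link elements occur in a CSDGM record.
    import xml.etree.ElementTree as ET

    root = ET.parse("metadata.xml").getroot()
    # ElementTree has no parent pointers, so build a child-to-parent map.
    parents = {child: parent for parent in root.iter() for child in parent}

    def path_to(elem):
        parts = []
        while elem is not None:
            parts.append(elem.tag)
            elem = parents.get(elem)
        return "/".join(reversed(parts))

    for tag in ("onlink", "networkr"):
        for link in root.iter(tag):
            print(path_to(link), "->", (link.text or "").strip())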

The ensuing discussion continued for a second hour and addressed topics such as: the distinction between the USGS Science Data Catalog and ScienceBase; different strategies for constructing ScienceBase landing pages with child items; the ways that metadata records are added to the SDC; recommendations from Force11, DataCite, and other organizations that DOIs point to landing pages (not to XML metadata files); etc.

NEXT MEETING: Tuesday September 5 (to avoid Labor Day, September 4). See proposed agenda, above.

Mon. July 3, 2017, 2pm - 3pm EDT

Proposed agenda:

We anticipate a small group that will work on cleaning up comments that have been made on the google doc versions of the data review checklist and metadata review checklist.

Report from meeting:

It was a good, productive working meeting, with two additional working meetings to finish the job. Results of our work are attached.

Mon. June 5, 2017, 1pm - 2pm EDT (Note time change to accommodate Werkheiser Q&A WebEx.)

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Discussion of ideas arising from CDI Workshop, see notes below.
  3. Form a subcommittee to propose improvements to data and metadata review checklists
    1. Data review checklist and metadata review checklist on google docs for collecting our comments and suggestions.
    2. Metadata Review Checklist on USGS Data mgmt website (CDI, 2014)

Notes from meeting:

  1. Curtis Price had some announcements from EGIS:
    1. Esri UC Metadata SIG on 7/12 will be on WebEx - Esri will give an update on ISO metadata at this meeting
    2. Updated metadata cookbook on draft EGIS website
    3. Metadata Wizard (the last version which works inside ArcGIS) was released in USGS with ArcGIS 10.5.
  2. The workshop idea of most interest is a regular program of training and mentoring for both writing and reviewing metadata (a suggestion from Cian Dawson at the CDI Workshop). The purpose would be to provide assistance to new metadata writers and reviewers, although Peter reminded us that it would also be a learning experience for the teachers or mentors. Fran agreed to follow up on this idea, and will get back to the community for guidance on scheduling and syllabus.
  3. We did not form a subcommittee to work on the data and metadata review checklists, but instead will make progress through this alternate path:
    1. Community members will leave suggestions and comments on the google doc versions of the lists during June. Be careful to switch to suggesting mode. (Look in the upper right corner of the page; a pencil icon means you're still in editing mode!)
    2. Our next community meeting is scheduled for July 3, and looks like a small group. Those of us working that day will constitute a committee to deal with the suggestions and comments on the google doc lists and create clean documents that the community can fine-tune at our August meeting.
    3. We need to get Lisa Zolly involved in identifying metadata requirements because USGS metadata must be functional in the Science Data Catalog.
    4. Discussion lingered over the possibility of having multiple lists that were customized for different types of data, or lists of "submission guidelines" that would be provided to metadata or data authors so that submissions would be higher quality, there would be fewer surprises during review, and the review checklists could be much shorter. Water was mentioned as a source of good lists.
    5. We were in agreement that it would be a very good thing if the general quality of USGS metadata were uniformly excellent, but did not see a path forward to achieve that, with the current policy and management situations.
    6. We need to share information sources about specific disclaimers or similar statements that should be put in specific places in metadata records, and not in others. The ScienceBase data release checklist is one source.
    7. Andy LaMotte and Alan Allwardt volunteered to help get this done.

Fri. May 19, 2017, 9am - 12pm MDT

In the "Open Lab" at the CDI Workshop

After a spirited discussion of the larger work citation and the best place in the metadata record for a citation of an associated publication, we had three demonstrations: the new stand-alone version of Metadata Wizard (Colin Talbert), secondary validation of metadata records at https://mrdata.usgs.gov/validation/, and the Alaska Data Integration work group (ADIwg) metadata editor.

Notes from the session:

Rose has code to pull entities and attributes from an Access database, which could be inserted into a Metadata Wizard record.

ADIwg will soon release mdEditor, which works on ISO-type metadata expressed as mdJSON rather than XML. (An mdJSON schema validator has already been created.)

mdEditor works for ISO metadata, which might be more compatible with data that doesn’t fit cleanly into the FGDC CSDGM.

Ideas:

  1. Make mdJSON the internal USGS standard for writing ISO metadata.
  2. Develop profiles for guidance about which ISO fields should be provided for different kinds of data.
  3. A controlled vocabulary service is needed for GNIS.
  4. Start investigating how we would review ISO metadata using mdJSON.
  5. Can contact items be available as a service for inclusion in mdJSON?
  6. Can data dictionaries be available as a service for inclusion in mdJSON?
  7. The Metadata Reviewers CoP should have a session to experiment with mdEditor when it is ready.

Suggestion from Cian Dawson: interactive training using WebEx training center. Hands-on in small groups with instructor checking in. Separate tracks for metadata creation and metadata review.

Project idea:  A database and interface for a collection of data dictionaries or data dictionary items, for use in designing data collection, and then in metadata, and then in data integration.


Mon. May 1, 2017, 2pm - 3pm EDT

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. New developments with the Metadata Wizard (Colin Talbert)
  3. How much is enough for Data Quality Information? Are there good examples for different situations?

Notes from meeting:

  1. A community member expressed the opinion that the keyword list of the Global Change Master Directory (GCMD) "is a pain."  Painful aspects are the fact that the list is not monohierarchical and the question of whether one uses the whole string of terms or just the last term. The group seemed to be in general agreement with the pain, with one member saying that there were no useful terms in the GCMD list anyway, and suggesting that we need to bring back the USGS Biocomplexity Thesaurus.
  2. Colin Talbert presented a few slides (MetadataWizard2.0.pdf) and a demonstration of the new version of the Metadata Wizard. The Wizard will have many new features:
  • Users will no longer need ArcGIS installed. Instead, an installer will be provided to use Wizard as a stand-alone application.
  • The entity and attribute builder works on CSV or Excel files.
  • The error report is friendlier.
  • Users can copy and paste whole sections from one record to another.
  • A map is provided for the spatial domain, which can be used to modify the domain in the record.
  • Users can easily switch between the biological profile and the basic FGDC CSDGM standard.
  • The components of the application can be used as a Python library.

Colin hopes to have an "early adopter" version of the new Wizard available at the CDI Workshop, with actual release in late summer.

3. The real question about Data Quality Information turned out to be about the "Logical Consistency" item. Peter clarified that when the metadata standard was only used in the GIS community, this item was used to state how much topology was enforced. In general, the idea is to state any inconsistencies between parts of the data that might arise from compiling data from different sources. Dennis further suggested that "Logical Consistency" is a good place to specify exceptions to the values that are expected in data fields. Drew shared the explanations of these fields that are offered by Metadata Wizard:

  • Do all values fall within expected ranges?
  • Has data been checked for omission or commission? Has topology been verified for geographic data to ensure data integrity?
  • What checks have been performed to ensure that the data set matches up with the description provided in the 'abstract' and 'purpose'?
  • Have you verified that features and data entries are not duplicated?
  • Do all values fall in a valid range (for example, a data set of precipitation values should not have negative values)? Provide as much information as possible.
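
Two of those checks (valid ranges and duplicated entries) are easy to automate for tabular data. In the sketch below, the column name, the expected range, and the filename are illustrative assumptions, not from the discussion.

    # Range and duplicate checks that can support a Logical Consistency statement.
    import pandas as pd

    df = pd.read_csv("data.csv")  # hypothetical filename

    # Range check: precipitation values should not be negative (example above).
    print((df["precip_mm"] < 0).sum(), "rows with negative precipitation")

    # Duplicate check: are any records repeated in full?
    print(df.duplicated().sum(), "fully duplicated rows")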


Announcement: the ADIwg Metadata Editor will be displayed at the CDI Workshop.

Mon. April 3, 2017, 2pm - 3pm EDT

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Dealing with the suggestion that some data isn't worth the time and trouble to write complete metadata records. What is our response as individual reviewers, and as a community?
  3. Data review checklist and metadata review checklist: review suggestions from the group.


Notes from meeting:

  1. Question: I'm reviewing data that is associated with a publication, and the author says "all process steps are in the report."
    Responses:
    The bureau has said that the data release is a separate thing from the publication, so it must be able to stand alone.
    In particular, the metadata should allow future data users to know if their data download was successful.
    Metadata process steps might be either a summary of the method section of the report, or they might be more detailed than the method section of the report. The metadata process steps should be a succinct statement of how the data were developed. Use plain language but don't allow the plain language to reduce clarity.

    Question: A PI, after completing the data review step of the FSP process, decided to add additional fields and cells to the data. The metadata needs to be changed to reflect this, but the PI doesn't want to start the review process all over again.
    Responses:
    The PI can check with the reviewer to make sure that the metadata is still good after the changes, which the reviewer could document in the notes section of IPDS. Then the changes, to both the data and the metadata, can be considered responses to review, rather than a new data product. (Alaska has a formal data acceptance step before the product goes to the approving official.)

    Question: What do we do with data from projects that have ended – the scientist might even be gone – and the metadata record is incomplete? The problem is that the organization doesn't have support for anyone to complete the metadata, and the remaining staff give the problem a low priority.
    Responses:
    If a new publication uses the data, the authors will be forced to improve the metadata.
    If the data were made available on a website, the visibility would encourage the organization to improve the metadata in order to look good.
    A case could be made for the value of data re-use, for taking pride in the work the organization has done in the past, and for sustaining the value of that work into the future.

  2. Discussion of the author who suggests that their data is too insignificant to bother with a complete metadata record.
    Example given was a USGS coauthor who contributed 6 numbers to a much larger data table that was published by another organization.
    Responses:
    Could the data be released as part of a larger collection of similar data, which would obviously need metadata?
    Could a minimal metadata record provide only the information that is known?
    The Bureau says that we must make metadata. In IPDS, all released data is considered to be worth metadata.
    In data archives or scientific case files, there are likely to be data sets, such as versions produced during data processing, which do not need a formal metadata record. A question list such as the headers in the "plain English" metadata format would be a good way of collecting the information that will be needed about that data set in the future.
    If the data are not worth metadata, then why were they worth collecting?
    The metadata, at a minimum, need to tell people what the data are.
    The metadata, at a minimum, need to provide the necessary information for future scientists who will re-purpose the data. The goal is to help them do their jobs and achieve their goals.

  3. We ran out of time, again, and didn't deal with the revisions to the checklists.


Mon. March 6, 2017, 2pm - 3pm EST

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Metadata community face-to-face will be held during the CDI Annual Workshop. Current plan is a social gathering. Do we want to do something more substantial? If so, what and when?
  3. Ray Obuch's proposed Energy Program standards for metadata quality.
  4. Can we help science fields that don't mesh with FGDC: genomics, those who contribute to big integrated databases, others?
  5. Data review checklist and metadata review checklist: review suggestions from the group.

Notes from meeting:

  1. We had a question about where position, projection, and datum information belongs in a metadata record for data formatted as an ASCII grid. Consensus: the information is important and should go in the same location as it would in a metadata record for an ArcGIS presentation of the data.
  2. For the metadata community face-to-face, we will try to have our social gathering on Tuesday evening so that we can use one of the Wednesday afternoon breakout times for discussing issues that we identify at the social gathering.
  3. A summary of Ray's proposal is attached. Discussion touched on the value of standard data dictionaries, a need to clarify the minimum required set of metadata elements, and what to do with data sets, for example laboratory experiments, for which no geographic coordinates are really appropriate. Is there a null value for spatial location? Some offices use the global domain.
  4. The groups that "don't mesh with FGDC" were not represented. We seem to have our hands full with our own metadata challenges.
  5. Checklist revision isn't moving forward, at least not fast.

 

Mon. February 6, 2017, 2pm - 3pm EST

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Geospatial Metadata Validation Service (online version of mp): new default setting for Upgrade function; see <https://mrdata.usgs.gov/validation/about-upgrade.php>.
  3. USGS Science Data Catalog to accommodate ISO metadata in near future: implications for USGS metadata reviewers?
  4. Data review checklist and metadata review checklist: review suggestions from the group.

Notes from meeting:

Alan Allwardt and Peter Schweitzer leading, in Fran's absence
Notes by Alan Allwardt

  1. Burning questions and comments: Janelda asked about the practice of creating an "umbrella" metadata record for a collection of datasets (each with its own metadata record). Madison presented an example from ScienceBase (http://dx.doi.org/10.5066/F7M043G7) in which the parent landing page links to child pages that provide the data (in this case, for different subregions); the parent page has a metadata record and so do the child pages. Generalizing this case: the parent page metadata might have an entity and attribute overview, whereas the child page metadata might have more specific entity and attribute information (especially useful if the child pages present different data types).

    This led to a discussion of other models for relating individual datasets to one another: the "Associated Items" feature in ScienceBase is one option (example: http://dx.doi.org/10.5066/F7GQ6VXX), but this kinship will not be reflected in the metadata records for those associated datasets.

    Dennis: his group uses Larger_Work_Citation to point to parent landing pages and Cross_Reference to point to related publications. (NOTE: Metadata Wizard currently does not accommodate Cross_Reference; Madison will bring this issue to the attention of the developers.)

    Dennis again: his group puts ORCID in the Originator value as follows: <origin>Dennis Walworth (ORCID:0000-0003-1256-5458)</origin>. Seems like a simple and effective solution, but the follow-up discussion centered on possible downstream impacts of this practice: in the USGS Science Data Catalog, for instance, "Dennis Walworth" with and without the ORCID would be listed as separate authors in the browsable sidebar. No deal-killers emerged from this discussion, however.
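One downstream consideration: any catalog that wants to treat the name and the identifier separately has to parse them apart. A sketch of that parsing, assuming the convention Dennis described:

    # Sketch: split an Originator value of the form
    # "Name (ORCID:xxxx-xxxx-xxxx-xxxx)" into name and identifier.
    import re

    PATTERN = re.compile(r"^(?P<name>.*?)\s*\(ORCID:\s*(?P<orcid>[\d-]+X?)\)\s*$")

    def split_originator(origin_text):
        m = PATTERN.match(origin_text)
        if m:
            return m.group("name"), m.group("orcid")
        return origin_text, None   # no ORCID present

    print(split_originator("Dennis Walworth (ORCID:0000-0003-1256-5458)"))
    # -> ('Dennis Walworth', '0000-0003-1256-5458')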

    Peter demonstrated his internal website for maintaining authority control of authors in Science Topics and Mineral Resources On-Line Spatial Data.

  2. Peter reviewed recent changes in the upgrade function for his Geospatial Metadata Validation Service (see https://mrdata.usgs.gov/validation/about-upgrade.php). The change in the default setting makes the online validator work in the same way as the command-line version of mp -- and, by changing the upgrade function from an opt-out procedure to an opt-in procedure, makes users aware of certain types of errors in the input file that used to be fixed without their knowledge.

  3. Alan reported this news from Lisa Zolly: the USGS Science Data Catalog will begin harvesting and indexing ISO metadata at the end of February (or so). The group discussed potential impacts on USGS metadata reviewers (primarily: lack of experience and relevant tools).

    Peter likes the practice of Dennis and his group: do as much writing and reviewing as possible before converting to XML (that is, use JSON as an intermediate step).

  4. Data review checklist and metadata review checklist: a few members of the group have begun reviewing and suggesting changes; the others were encouraged to take up the task.

    The importance of having the data reviewer also look at the metadata (and the metadata reviewer also looking at the data) was stressed -- we need to make sure that the checklists get this message across loud and clear. At Alan's request, VeeAnn described how this works in Woods Hole: for instance, when a metadata reviewer is not well-versed in a particular data type.

    For the data review, Janelda wondered if there might be a way for authors to indicate the expected range of values for any given parameter, so that the reviewers could easily identify outliers. Peter suggested using Range_Domain (while acknowledging that there is some difference of opinion about what this element should represent: the range of all conceivable values for the parameter, or the range of actual values within the dataset).
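A sketch of how a reviewer might put a declared Range_Domain to work: read rdommin/rdommax from the CSDGM record and flag data values outside them. File and attribute names are hypothetical:

    # Sketch: flag CSV values outside the Range_Domain declared in the
    # metadata record for the same attribute.
    import csv
    from lxml import etree

    tree = etree.parse("metadata.xml")                       # hypothetical files
    attr = tree.xpath("//attr[attrlabl='precip_mm']")[0]     # find the attribute
    lo = float(attr.findtext(".//rdommin"))
    hi = float(attr.findtext(".//rdommax"))

    with open("precip.csv", newline="") as f:
        for i, row in enumerate(csv.DictReader(f), start=2): # row 1 is the header
            value = float(row["precip_mm"])
            if not lo <= value <= hi:
                print(f"row {i}: {value} outside declared range [{lo}, {hi}]")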

    Peter pointed to some of his handy tools for evaluating datasets: <https://geology.usgs.gov/tools/metadata/>. If you select the "Web services" tab on this page you'll see tools for analyzing DBF and CSV files.

    Finally, Peter and VeeAnn have developed guidance that is too detailed to include in the metadata checklist, but that should be referenced in the checklist:

    Primary validation using mp: <https://mrdata.usgs.gov/validation/how-to-review/>
    Substantive review of metadata elements: <https://mrdata.usgs.gov/validation/how-to-review/elements.html>

POSTSCRIPT: If you are using customized data and metadata review checklists or templates in your science center or program, please share your experiences here: <How have you adapted the data review and metadata review checklists for use in your science center?>

Mon. January 9, 2017, Report to CDI Data Management Working Group

This is not really a meeting of the Community, but the Data Management Working Group asked for a progress report. Attached are the slides prepared for that report.

Mon. December 5, 2016, 2pm - 3pm EST

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Look at suggested revision to the USGS Data Management website > Publish/Share > Data Release <https://www2.usgs.gov/datamanagement/share/datarelease.php> > Section 5.
  3. Are we ready to start reviewing the data and metadata review checklists? (Or wait until January?)
  4. Do we want to sponsor training at the CDI Workshop?
  5. Next meeting (not Jan. 2).

Notes from meeting:

  1. Colin reports seeing good metadata at FORT. Bill is working with data that will be contributed to the California environmental data repository, and reports that our standards for metadata and review are more stringent than theirs.
    There has been some recent email discussion of where data producers' ORCIDs should go in the metadata record. There seems to be no place to put ORCIDs where they would be immediately useful in systems like the Science Data Catalog or data.gov, but there are several places where they might reasonably be found and would not cause a record to fail standard validation checks. Peter Schweitzer will start a discussion at our community confluence site so that we can decide on a consistent approach.
  2. Alan introduced the revision to section 5 of the website, explaining that much of the information in the present section is off subject, for example, about publications that are not data. Fran added that the new USGS policy is that no data is interpretive, so we decided to drop the sentence about interpretive data from the revised section. We would like to add specifics of how IPDS is used for data and metadata reviews, and Fran made a suggestion for that in the Google document. We would also like the webpage to provide more easily found links to guidance and policy.
    Susie showed the IP record in progress for a Santa Cruz data release in ScienceBase. The record shows original metadata files and reviewed metadata files, as well as reviewed ScienceBase pages. In this case the metadata are not harvested from ScienceBase to the Science Data Catalog. Conversation continued on the question of whether the short metadata records for ScienceBase project pages, which do not include data but describe a collection of data, need to pass metadata validation, for example, by mp. Tamar shared information that in the future such metadata will be harvested and thus will need to be validated. Peter said that a basic metadata record that has only sections 1 & 7 could be validated (see the skeleton sketched after this list). ISO metadata more intrinsically accounts for relationships between collections and the items they contain.
    Decisions: We will leave the revision on Google docs and encourage community members to suggest improvements, using "suggesting" mode instead of "editing" (the mode choice is available in the upper right corner, under the Comments button). Also suggest guidance and policy links that should be provided on the webpage. Fran will negotiate the webpage changes with Viv Hutchison.
  3. Peter will put the data and metadata review checklists on Google docs so that community members can start suggesting modifications (see links below). Our goal is to have fairly generic checklists, helpfully grouped and chunked, with links to more detailed lists for particular kinds of data.
  4. We did not have time for discussion of the CDI workshop.
  5. We decided to skip the January phone call, since Jan. 2 is a holiday and the Data Management Working Group is likely to be meeting on Jan. 9.
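To illustrate Peter's point above about a record with only sections 1 & 7 (Identification_Information and Metadata_Reference_Information), here is a schematic skeleton built with lxml. The titles and dates are made up, and element completeness within each section would still need to be checked with mp:

    # Sketch: skeleton of a collection-level CSDGM record containing only
    # sections 1 (idinfo) and 7 (metainfo). Structure only; verify with mp.
    from lxml import etree

    record = etree.XML("""
    <metadata>
      <idinfo>
        <citation><citeinfo>
          <origin>U.S. Geological Survey</origin>
          <pubdate>2016</pubdate>
          <title>Hypothetical project page: a collection of data releases</title>
        </citeinfo></citation>
        <descript>
          <abstract>Describes the collection, not an individual data set.</abstract>
          <purpose>Landing page for a collection of related data releases.</purpose>
        </descript>
      </idinfo>
      <metainfo>
        <metd>20161205</metd>
        <metstdn>FGDC Content Standard for Digital Geospatial Metadata</metstdn>
        <metstdv>FGDC-STD-001-1998</metstdv>
      </metainfo>
    </metadata>
    """)
    print(etree.tostring(record, pretty_print=True).decode())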

Other discussion topics:

Briefly raised, what about data that is included in administrative reports and proprietary data? POSTSCRIPT – January 4, 2017: Alan spoke with Keith Kirk (FSP committee) and he says this issue is currently under consideration by FSP. He also said that the USGS report series called "Administrative Report" will be renamed/redefined in the near future. Stay tuned.

Briefly raised, how can we deal with the issue of links to files changing in ScienceBase, when the data is modified, and the challenge of keeping links correct in metadata?

Google docs for community review before our Feb. meeting:

Data Review Checklist is a copy of the existing checklist formatted as a Google Doc and shared for edit and comment.

Guidelines for Metadata Review of Data is a copy of the existing checklist formatted as a Google Doc and shared for edit and comment.

 

Mon. November 7, 2016, 2pm - 3pm EST

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Start reviewing the data and metadata review checklists on the Data Management Website.
  3. Any ideas about how we might get together at the CDI Workshop?
  4. Next steps?

Notes from meeting:


Peter Schweitzer leading, in Fran's absence, with input from Alan Allwardt, VeeAnn Cross, and the group
Notes by Alan Allwardt


Agenda Item 1. Burning questions

Peter Schweitzer: told story of someone asking him what to do about a non-geospatial dataset for which the metadata failed mp because there was no spatial domain information. In the past Peter would have recommended ignoring the mp error, but now he recommends entering a global extent to avoid validation errors in downstream catalogs like data.gov.

Lisa Zolly: confirmed that data.gov will flag and quarantine CSDGM records lacking a spatial domain (USGS Science Data Catalog will not).

Members of the group shared their strategies for dealing with metadata for non-spatial data: some create global spatial extents (sketched below); others use the bounding box of the parent project for non-geospatial, supplementary, or lab data. It was generally agreed that using the coordinates of the science center where non-geospatial lab results were obtained is a BAD idea.
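For reference, a global extent in CSDGM is just a whole-earth bounding box. A sketch of generating one with lxml; the enclosing record is omitted:

    # Sketch: build a global Spatial_Domain/Bounding_Coordinates element,
    # the workaround discussed above for non-geospatial data.
    from lxml import etree

    spdom = etree.Element("spdom")
    bounding = etree.SubElement(spdom, "bounding")
    for tag, value in [("westbc", "-180"), ("eastbc", "180"),
                       ("northbc", "90"), ("southbc", "-90")]:
        etree.SubElement(bounding, tag).text = value

    print(etree.tostring(spdom, pretty_print=True).decode())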

ACTION ITEM: Peter will add a paragraph to his "Substantive review of metadata" training page <http://geo-nsdi.er.usgs.gov/validation/how-to-review/elements.html> to deal with spatial domain conundrums.


Agenda Item 2. Revising the data review and metadata review checklists

Peter suggested stepping back from the checklists and looking at the context in which they are presented: USGS Data Management website > Publish/Share > Data Release <https://www2.usgs.gov/datamanagement/share/datarelease.php> > Section 5.

ACTION ITEM: After extensive discussion, the group decided that the text of Section 5 -- which provides context for the checklists -- should be revisited and revised as necessary FIRST, and only then should we consider how to revise the checklists themselves. (Revising the text of Section 5 will inform the process of revising the checklists.) This plan met with general approval. Alan will begin revising Section 5 and get input from Peter, VeeAnn, and Fran before it is posted on Google Docs for the group to consider.


Highlights of the discussion leading to the action item above:

Peter: data review and metadata review not clearly separated (lots of agreement on that point from the group).

VeeAnn: noted that the revision dates of the checklists (March/April 2014) predate the OSQI IMs on data management, data release, and metadata (IM 2015-01 through 2015-04): <https://www2.usgs.gov/usgs-manual/95imlist.html>. We need to examine the checklists and, at the very least, bring them into alignment with these IMs. NOTE: IM OSQI 2015-03, Section 5A <https://www2.usgs.gov/usgs-manual/im/IM-OSQI-2015-03.html> links directly to the checklists, so we are constrained to revising the checklists individually (we can't combine them, for instance).

Several members of the group shared how they've used the data review and metadata review checklists in their science centers: they've used the checklists as a starting point for creating more specific guidance documents for their particular science centers. Alan created a thread in the Metadata Reviewers Forum where members can share their experiences in adapting the checklists (with encouragement to upload examples of specialized checklists, review templates, etc.): <https://my.usgs.gov/confluence/pages/viewpage.action?pageId=558860218>.

Peter created another thread in the Forum for members to share their thoughts on how the data/metadata review process might be documented for IPDS: <https://my.usgs.gov/confluence/pages/viewpage.action?pageId=558860180>.

Peter: What about revising "Metadata in Plain Language" <http://geology.usgs.gov/tools/metadata/tools/doc/ctc/> so that it is less CSDGM-specific?

VeeAnn: noted that two reviews are necessary -- of data and metadata -- although they can be performed by the same person. She proposed another strategy: use two people. The first would emphasize the data review (but also look at the metadata), the second would emphasize the metadata review (but also look at the data).


Agenda Item 3. 2017 CDI Workshop

Brief discussion at the top of the hour, will continue next time.

Peter suggested considering hands-on training, in one of the following areas:

- Helping metadata reviewers who are new to the USGS
- Strategies for documenting the review process
- Keywords (utilizing controlled vocabularies)
- Strategies for integrating data and metadata reviews
- Sharing useful tricks of the trade

Mon. October 3, 2016, 2pm - 3pm EDT

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Keywords in Metadata, a presentation from the USGS Thesaurus Team
  3. Next steps?

Notes from meeting:

Question: How can we deal with metadata records that use the EML standard?

  • Lisa Zolly: EML metadata will need to be converted to the CSDGM standard, or in the future to ISO. There is an XSL transform to do that (a sketch of applying such a transform follows this list); email Lisa directly if you need it.
  • Peter Schweitzer: If you're worried about losing information in the format conversion, you can link to the original EML record.
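Applying such a transform is scriptable. A sketch with lxml, where the stylesheet and file names are hypothetical stand-ins for whatever Lisa provides:

    # Sketch: apply an EML-to-CSDGM XSL transform with lxml.
    from lxml import etree

    xslt = etree.XSLT(etree.parse("eml2fgdc.xsl"))   # hypothetical stylesheet
    eml_doc = etree.parse("dataset_eml.xml")
    csdgm_doc = xslt(eml_doc)

    with open("dataset_fgdc.xml", "wb") as out:
        out.write(etree.tostring(csdgm_doc, pretty_print=True))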

Question: Metadata records are being written by project members who are not USGS employees but are students at a university, so they are unable to authenticate with OME, which only accepts USGS Active Directory credentials. Can we get them guest user permission?

  • Lisa Zolly: The OME team intends to do that some day, but does not have the resources to do it any time soon. OME relies on Active Directory for authentication; a separate module could be leveraged for external accounts, but the database would need to be built for it, and CSASL would have to dedicate staff to supporting management of non-AD accounts. 
  • Tom Burley: Could the students sign up as USGS volunteers? (Well, no, that is expensive.) Or use Metavist.
  • Aaron Freeman: Could the students use Metadata Wizard in the Esri context? The record could be exported and more information could be added in Metavist.
  • Follow-up: Isn't Metadata Wizard being de-coupled from the Esri environment? (That's the hope, but no resources to do it yet. You could import a CSV into a geodatabase, though.)
  • NOAA has shut down Mermaid, and EPA has also shut down its CSDGM metadata editor, because both are going to the ISO metadata standard.

Presentation, see Peter Schweitzer's outline linked in the agenda above.

  • Peter's concern is how well metadata works in situations where people are using it to find information. Because people often don't know what to ask for, what to call it, or who to ask, Peter was led to the use of controlled vocabularies.
  • He prefers keywords that say what the data are, instead of those that say what purposes the data could be used for.
  • He cautions that names can be interpreted in different ways, so more keywords are necessary to clarify what the data are.

Lisa Zolly shared the list of USGS Thesaurus terms that is being used in the USGS Science Data Catalog to provide a browse interface. SDC also allows full-text searches of metadata records, with some fields weighted more heavily than others. As more metadata records provide one of the keywords on the browse list, the browse interface will become faster and more useful.

General tips for metadata reviewers:

  • Some accurately spelled keywords from accurately identified thesauri are an important part of good metadata.
  • Metadata keywords should include some general terms that allow people to narrow down their search, and also specific terms that allow people to rule out the data sets that are not what they need, and rule in data sets that might be what they need.
  • USGS has some tools to help reviewers compare keywords to thesauri, and more thesauri could be added to them (a minimal scripted check is sketched after this list).
  • If you want your data to show up well in the USGS Science Data Catalog, make sure there is a keyword from Lisa's list.
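A minimal scripted version of the last two tips, assuming the reviewer has the browse-list terms in a local text file (the file names are hypothetical; the list itself comes from Lisa):

    # Sketch: check whether a CSDGM record carries at least one theme
    # keyword from the SDC browse list.
    from lxml import etree

    with open("sdc_browse_terms.txt") as f:          # hypothetical file
        browse_terms = {line.strip().lower() for line in f if line.strip()}

    tree = etree.parse("metadata.xml")
    keywords = {kw.text.strip().lower()
                for kw in tree.iterfind(".//theme/themekey") if kw.text}

    hits = keywords & browse_terms
    print("browse-list keywords found:" if hits else "WARNING: no browse-list keyword",
          ", ".join(sorted(hits)))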

First meeting, Mon. September 12, 2016, 3pm – 4pm EDT (after the CDI Data Management Working Group meeting).

Proposed agenda:

  1. Burning questions? Metadata nightmares? Brilliance to brag about?
  2. Our Community
    1. Focus: Review of USGS metadata
    2. Community: Share knowledge, questions, and puzzles
    3. Knowledge: Develop, share, and maintain know-how for review of USGS metadata
  3. Community Resources
    1. Confluence Site: Member list, link to training, examples, discussion “forum”
    2. Data Management Website
  4. Next steps?
    1. A session on keywords?
    2. Distinguish clear USGS requirements from matters for criteria and considerations?
    3. Help desk? Monthly meetings? Review & revise online checklist?

Notes from meeting:

Question: Will there be a similar group for data review, or does this group include data review?

  • We agreed to expand our scope to include data review, especially technical aspects of data release such as packaging, format, and documentation.
  • We agreed that a good metadata review requires looking at the data to ensure that the metadata represents it correctly.
  • USGS policy requires two reviews (metadata and data) but not two separate reviewers; this is a minimum standard. Science center directors can require additional reviews before they approve the data release. Our group agreed that having two different reviewers look at a data release would be good for ensuring quality, and for high-profile datasets more than two might be appropriate – we might want to think about defining levels of review. The Alaska Science Center has a scientist peer-review the data content and a data manager review technical aspects including metadata.

Observation: As USGS implements the new policy with unprepared reviewers, it’s almost inevitable that some “horror story” data will be released that will be embarrassing. It would be good for us to keep our ears to the ground – who needs help in reviewing data and metadata?

Issue: Metadata writing and reviewing are a significant time investment. How can we help our scientists and managers plan realistically?

  • We could “pass around notes” about how long it takes, producing a community estimate that could be shared more widely.
  • Data management plans will be helpful when they are required.
  • Metadata reviewers tend to also become metadata counselors, helping new metadata writers avoid difficult and time-consuming approaches, and even providing training.
  • Another way of helping research projects get started with metadata is to provide templates customized with appropriate contacts and disclaimers, which simplifies the project’s work, helps standardize their metadata, and makes review easier.
  • We agreed to enlarge the scope of our community to include metadata counseling, training, and resources.

Future community meetings.

  • We agreed to meet monthly at 2:00 Eastern Time on the first or third Mondays of the month. (This is the same time of the week as the CDI Data Management Working Group and the Science Center Points of Contact for the new policies, but different weeks.) If we meet on the first week, the third week might be used for subgroup meetings.
  • We agreed to have a session on keywords.
  • We can post possible discussion topics on the confluence forum.

Future community activities.

  • Similar to the library of recommended disclaimers, we could recommend wording that can be used in metadata records for referring to information that is documented somewhere else, for example in data dictionaries, “techniques and methods” publications, or NWIS documentation.
  • We could revise the checklists for data reviewers and metadata reviewers on the USGS Data Management website, incorporating the separate list that VeeAnn and Peter provided as part of their training.
  • We will start a confluence forum topic on recommended resources, and start it off with the “green workbook” which several people recommend.

A question was raised about metadata for a geodatabase that includes multiple data sets. The discussion was diverted to one about acceptable data release formats for GIS data. SDTS has been withdrawn with no replacement, geodatabases are proprietary, shapefiles are said to have problems with spatial reproducibility. The discussion will continue on the forum page on our confluence site. The larger question is how we as reviewers should advise authors about data distribution packaging (convenience, clarity, longevity).





