Our second call will focus on Data Releases for Genetic Data/Bioinformatics. Please add any questions you have on this topic below. We will try to find answers and related information before the call (anyone can respond or add information related to the topic here). During the call, we hope that users will share their previous experiences with data releases. Optionally, please add your name if you asked the question.

 

  1. What are acceptable data repositories for sequence data? (Denise Akob)
  2. Are there any data models for capturing and storing molecular data? Has anyone used a Laboratory Information Management System (LIMS) in the management of molecular data? How are other labs collecting and storing molecular data in a systematic/standardized fashion? Are there any existing data models in the USGS community for capturing the gamut of molecular data (e.g., extractions, amplifications, PCR, sequencing, analyses)? (Neil Baertlein)

10 Comments

  1. Here is the list that I have been pointing people to.  https://www2.usgs.gov/fsp/acceptable_repositories_digital_assets.asp  NCBI is considered a trusted digital repository.

  2. I need the most basic information since I haven't published under the USGS Fundamental Science Practices before. If we put our metadata in the USGS Publications Warehouse or one of the other repositories, what form should it be in? My guess is an original OTU table prior to any modifications such as rarefying, removing unassigned sequences, etc. Or do I put the OTU table in the format that I'll be working with and note the modifications that were made? With very large data sets, is there anything I should know before attempting to upload an Excel or txt file? Would we also put our sequences in NCBI GenBank?

    1. I will try to put together some examples before the next call.

      1. Hi, I hope to join you all tomorrow morning for the call.  I'm at the Alaska Science Center and would say that we have a fairly well-developed framework for microsatellite data, but of course would also like to expand and standardize with other labs for our NGS data.

        Could we discuss releasing "original OTUs" vs. "filtered OTUs"? Our preference is to release only the sequences used for analyses in journal publications, with references to "filtered/processed/excluded" sequences handled via the metadata Lineage/Process Steps or Logical Consistency/Attribute Accuracy. We have not seen absolute guidelines regarding sequences not used in current journal publications, which might well be saved for later analysis or journal publication. Is there official guidance on reporting requirements for extraneous sequences?
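
To make the "original vs. filtered" question concrete, here is a minimal Python sketch of one way to keep the original OTU table untouched while recording each filtering step as text that could be pasted into the metadata Lineage/Process Steps. Everything in it is an assumption for illustration: the `OtuTable` class, its method names, the "Unassigned" label, the read-count threshold, and the example OTUs are hypothetical and not taken from any USGS guidance or existing tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OtuTable:
    """Hypothetical container: OTU read counts plus a log of processing steps."""
    counts: dict[str, dict[str, int]]   # otu_id -> {sample_id: read count}
    taxonomy: dict[str, str]            # otu_id -> taxonomic assignment
    process_steps: list[str] = field(default_factory=list)

    def _subset(self, kept: dict[str, dict[str, int]], step: str) -> OtuTable:
        # Return a new table so the original is never modified in place.
        kept_taxonomy = {o: t for o, t in self.taxonomy.items() if o in kept}
        return OtuTable(kept, kept_taxonomy, self.process_steps + [step])

    def drop_unassigned(self, label: str = "Unassigned") -> OtuTable:
        # OTUs with no taxonomy entry are treated as unassigned (an assumption).
        kept = {otu: row for otu, row in self.counts.items()
                if self.taxonomy.get(otu, label) != label}
        removed = len(self.counts) - len(kept)
        return self._subset(kept, f"Removed {removed} OTU(s) assigned as '{label}'")

    def drop_low_abundance(self, min_total: int) -> OtuTable:
        # Keep OTUs whose reads summed over all samples reach the threshold.
        kept = {otu: row for otu, row in self.counts.items()
                if sum(row.values()) >= min_total}
        removed = len(self.counts) - len(kept)
        return self._subset(
            kept, f"Removed {removed} OTU(s) with fewer than {min_total} total reads")


# Made-up example data, for illustration only.
original = OtuTable(
    counts={
        "OTU_1": {"S1": 120, "S2": 80},
        "OTU_2": {"S1": 1, "S2": 0},
        "OTU_3": {"S1": 40, "S2": 35},
    },
    taxonomy={
        "OTU_1": "Bacteria;Proteobacteria",
        "OTU_2": "Bacteria;Firmicutes",
        "OTU_3": "Unassigned",
    },
)

filtered = original.drop_unassigned().drop_low_abundance(min_total=10)

for step in filtered.process_steps:
    print(step)
```

With this kind of structure, either table could be written out for release, and the accumulated `process_steps` text documents how the filtered table was derived from the original.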

  3. I sent an email to data contacts at all ECO MA centers to find out if they have a LIMS in place or have looked into it.  I will compile all the replies I get for the call we have about data

  4. USGS publication on Data Releases:

    Chase, K.J., Bock, A.R., and Sando, Roy, 2017, Sharing our data—An overview of current (2016) USGS policies and practices for publishing data on ScienceBase and an example interactive mapping application: U.S. Geological Survey Open-File Report 2016–1202, 10 p., https://doi.org/10.3133/ofr20161202. (https://pubs.er.usgs.gov/publication/ofr20161202)

    The authors also gave a WebEx on this publication in the Water Science Field Team seminar series.  You can view it here: https://collaboration.usgs.gov/wg/wsft/Shared%20Documents/A%20Hero%27s%20Journey%20170222%20Presentation%20only%20(4).mp4

    Currently the ScienceBase file size limit for uploads is 10 GB per file.  
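
For anyone preparing large sequence or OTU files, a quick local check against that per-file limit can be sketched in Python as follows. This is only a rough sketch: the function name is hypothetical, and interpreting "10 GB" as 10 * 1024**3 bytes is an assumption, since the exact byte count used by ScienceBase is not stated here.

```python
from pathlib import Path

# Assumption: "10 GB" is taken to mean 10 * 1024**3 bytes.
ASSUMED_LIMIT_BYTES = 10 * 1024**3


def files_over_limit(directory, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Return (path, size_in_bytes) for every file under directory above the limit."""
    oversized = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                oversized.append((path, size))
    return oversized
```

Calling `files_over_limit("my_release_folder")` (a placeholder folder name) returns an empty list when every file is within the assumed limit; any file it lists would need to be split or compressed before upload.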
  5. Unknown User (dcoykendall@usgs.gov)

     I've attached a couple of texts that may be relevant.  This may be outside the purview of the bioinformatics group, because it is more how to organize directories so that data is easy to find and analyses are easy to reproduce.  The Yale paper is a broad scope discussion by a group of bioinformaticians about best practices of researchers, funding agencies, journal editors, so that data are robust and results reproducible.  It might be nice to have these as a guideline.

  6. I had a great conversation with Tim Quinn from OEI about the development or procurement of a LIMS.  He is hoping I can put together some user need stories and some steps folks have already taken looking into LIMS.  If anyone is interested in helping me draft this document please hit me up at jcnelson@usgs.gov

  7. Unknown User (dcoykendall@usgs.gov)

    Here is another paper that is more concise than the Chapter I posted.  Good ideas for organization of projects.

  8. The discussion on 3/21/17 was very helpful, but it sounds like there might be a need for further discussions related to data releases and archiving in the future. Please continue to post questions!

    Also, we are thinking about inviting Keith Kirk, lead BAO, to a CDI Bioinformatics call to discuss unique data release issues related to genetics/sequence data.