
 

Participants 

Log in with AD credentials (full email address) and click "edit" in the upper right to attach more information to your name.

 

Introduction

  • Points of Contact: intro, who, what, where
    • Denise Akob leads the Reston Microbiology Lab and has been working for the past 1-2 years to increase the availability of bioinformatics tools for USGS (e.g., computing power, capabilities). Her goal for this call is to facilitate better communication among USGS scientists who use bioinformatics.
    • Scott Cornman is based in Fort Collins, CO and provides genomics support for data analysis.
    • Chris Kellogg leads the Coral Microbial Ecology Lab in St. Petersburg, FL and has been increasingly using bioinformatics tools in her work. She started a Google sheet to help USGS scientists self-identify as bioinformatics users and list the programs they use, so people would know who to reach out to for assistance or data/peer reviews.
    • Leslie Hsu is the coordinator for the Community for Data Integration (CDI), attended a bioinformatics workshop in December, and is making CDI resources (like wiki pages) available to help coordinate the development of a bioinformatics community of practice.
    • Roland Viger comes from a water background and has worked with Leslie in CDI for a long time. 

 

  • How this all got started (Denise)
    • For 2 years Denise and others have been in discussions with OEI (Office of Enterprise Information) about obtaining access to the cloud, e.g., CHS, or other computing resources (e.g., Yeti). Her group, mainly Adam, has been testing options with CHS for 1.5 years. This testing was to pilot a solution that could then be rolled out to the Michigan (Carrie Givens) and St. Pete (Chris Kellogg) groups.
    • A year ago we tried to send out a memo to gather info (e.g., determine who within USGS was working on bioinformatics), but it got stuck in politics.
    • In the fall, the Innovation Center met and started doing what we tried to do a year ago (e.g., the workshop that Leslie was part of). 
    • NOW: we want to put our heads together and move bioinformatics forward for the whole USGS in an efficient way. The main impetus for the memo a year ago was to let people know as progress was being made with cloud negotiations, computing solutions, etc. We are now at a crossroads where we’ve been informed about CDI and the platform they can provide for communications.
    • Example of why we need to communicate: Chris is one of two DOI representatives on an Interagency Microbiome Working Group that is creating a Federal Strategic Plan (as part of the National Microbiome Initiative) to coordinate Federal agencies working on microbiome research. To best represent DOI, she had to track down who was doing microbiome research, since there is not a dedicated microbiome program like there is at NIH. Since microbiome work is intrinsically linked to bioinformatics, if we had a community of practice, she could have reached out and much more easily identified the subset of researchers doing microbiome work.

  • Cloud Hosting Solutions (CHS)
    • First round did not meet the needs for research
      • Adam: A year or so ago, we started working with OEI to bring CHS Amazon cloud resources to where we could use them, rather than owning our own clusters. It turned out to be a larger challenge than anticipated and after a year, we had to give up on the idea of each researcher having a private cloud space. CHS was architecting more than we needed: instead of a private computing space to put data, analyze it using the tools, and then pull down the results, it was as if we were rolling out a public-facing tool like the National Map, and the security requirements were too onerous.
    • Alces Flight is the new option and is in beta testing

      • Since November a number of people (e.g., Adam, Dawn, Grace) have been beta testing Alces Flight. This platform is used as we’d use a university or internal computing cluster, has a large number of programs available, and access is moving in a positive direction. It is possible to add needed software to Alces. Grace asked for a program to be installed and it was done quickly.
      • Link to all of the applications available in the Alces Flight Gridware.

 

  • Yeti, Core Science Systems (CSS) cluster
    • Denise and Adam have been using Yeti to run analyses and have found the staff at CSS to be very responsive to changes needed. 
    • Yeti installed QIIME for our use, but this program is notoriously difficult to install, so it may be easier to use in an environment like Alces Flight where there is (theoretically) a developer team maintaining it.

 

  • Community for Data Integration–CDI (Leslie)
    • CDI was started in 2009 because it was recognized that it could coordinate data integration needs across USGS. It facilitates formation of working groups, in-person workshops, etc., to help people work with their data. Leslie is the coordinator and her role is to get tools and resources out to groups to help them.
    • Roland has worked with Leslie on the Earth-Science Themes Working Groups (ETWG): Bioinformatics Community of Practice.
      • The purpose of this group is to develop a community of practice for people who work in Bioinformatics, in order to connect with each other and share ideas, expertise, information, and resources.
      • Possible overlaps with other groups for cloud computing, data management.
    • Supported at the Enterprise level, CDI has a communication platform with a large wiki space. USGS employees automatically have access and can also invite non-USGS users.
    • Roland: Access to areas of the wiki can be limited, to keep some areas for limited distribution until they are ready for release. A page can always be removed, so don’t be afraid to throw a wiki page up.
    • Leslie asked people on this call to follow a link (bit.ly/2kpFyqS) and provide the following information: Name, Role, Why on the call

 

Next steps/where do we go from here

    • Monthly call to discuss topics of interest (e.g., cloud tools, data release requirements)
    • Monthly presentations along the lines of “this is how I do my analysis” to encompass both scientific and technical aspects

    • Create wiki page of what we have so far?
      • Contact for Yeti and what’s available there
      • Alces Flight bullet points and link
        • Beta testers could add pros/cons to inform other users
    • Create wiki to discuss what bioinformatics needs there are, how people are meeting those needs, what better solutions might be available
    • Create a wiki about data release requirements. USGS appears to still be working through data management/data release requirements and different centers are taking different approaches.
      • Users could discuss experiences and compare notes
      • Could create a list of questions and try to find the answers
      • We could use this platform to open a discussion with BAOs (Bureau Approving Officials) and OSQI (Office of Science Quality and Integrity) to present examples and work towards consistency.
    • Decide if these resources are only for USGS or could be more broadly open (other Fed agencies, university collaborators, etc.)
      • Grace has an FWS colleague who would be interested
      • Denise pointed out university colleagues may be able to offer solutions outside the Fed box

 

Topics for next call (February)

    • Alces Flight
      • Courtney gives a 10-minute summary of what it is and then beta testers describe their experiences
    • Data release requirements
      • Discussion about how much intermediate data, workflow, and executables need to be released in addition to raw data
        • Relevance to simulation modeling as well as bioinformatics (probably other Big Data types also)

 
