The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.
Meeting Recording and Slides
Recordings and slides are available to CDI Members approximately 24 hours after the completion of the meeting.
These are the public slides. Log in as a CDI member to view ALL of the meeting resources, including the recording.
If you would like to become a member of CDI, join at https://listserv.usgs.gov/mailman/listinfo/cdi-all.
Agenda (in Eastern time)
11:00 am Welcome and Opening Announcements - Virtual work and collaboration
11:20 am Collaboration Area Announcements
11:30 am Open-source and open-workflow Climate Scenarios Toolbox for adaptation planning - Aparna Bamzai, USGS
11:45 am Develop Cloud Computing Capability at Streamgages using Amazon Web Services GreenGrass IoT Framework for Camera Image Velocity Gaging - Frank Engel, USGS
12:00 pm Establishing standards and integrating environmental DNA (eDNA) data into the USGS Nonindigenous Aquatic Species database - Jason Ferrante, USGS
12:30 pm Adjourn
- Remote meetings resource: https://about.gitlab.com/company/culture/all-remote/meetings/
- Since the last call, we have tightened security for our Zoom meetings by adding a password to Zoom calls.
- There has been an increase in virtual meeting attendees in the last couple of weeks.
- ESIP Collaboration Areas Highlights Webinar on April 22: https://www.esipfed.org/webinars
- To join any of the CDI collaboration areas, see https://my.usgs.gov/confluence/x/JaapJg
- Kevin Gallagher
- The CDI is more important than ever in maintaining connection and communication.
- Virtual collaboration
- Do you have a "virtual water cooler"? Microsoft Teams and the CDI wiki are possible places for these kinds of conversations.
- Share notes and highlights after virtual meetings so others can benefit from your activity. CDI collaboration areas are great for these kinds of notes.
- Share your tips, tricks and ideas for working virtually with the CDI.
- Tim Quinn
- CDI is "collaboration on a massive scale", and very important in this time.
- Feedback from CDI has been passed on to the EarthMAP team and allowed the team to identify which aspects of EarthMAP are most exciting and most confusing.
- EarthMAP update from Sky Bristol
- Blog post (link for USGS employees)
- Intranet page (link for USGS employees)
- MS Team (link for USGS employees)
- Announcements from Collaboration Areas (see slides for full details)
- Usability
- Town hall meeting: April 15th, Testing Usability with Users
- Resource review: May 20, Usability and Building Trust
- Have usability questions? Post them at: https://my.usgs.gov/confluence/x/yZCpJg
- Interested in being a usability tester? Sign up at: https://my.usgs.gov/confluence/x/ZMmpJg
- Want to stay in touch? Join Listserv via: https://listserv.usgs.gov/mailman/listinfo/cdi-usability
- Semantic Web
- Paper discussion, April 9
- "Best Practices for Implementing FAIR Vocabularies and Ontologies on the Web" by Daniel Garijo and María Poveda Villalón: https://arxiv.org/pdf/2003.13084.pdf
- 2020 SWWG Meetings
- Metadata Reviewers
- Last meeting, April 6
- Next meeting, May 4
- Meetings of the Metadata Reviewers Community
- Tech Stack
- Next meeting, April 9, "Unidata Science Gateway" https://science-gateway.unidata.ucar.edu/ http://wiki.esipfed.org/index.php/Interoperability_and_Technology/Tech_Dive_Webinar_Series#9_April_2020:_.22Unidata_Science_Gateway.22_Julien_Chastang
- Data Management
- Next event, April 13, Upcoming changes to the Science Data Catalog with Lisa Zolly
- Last event, March 9, Value Propositions with Science Gateways
- Software Development
- Next event, April 23
- Open Innovation
- April 17, COVID-19 Open Innovation Efforts: https://my.usgs.gov/confluence/display/cdi/COVID-19+Open+Innovation+Efforts
- The Opportunity Project (TOP) – Earth Sprint (Problem Statement Due Friday, April 10 – email me at firstname.lastname@example.org if you would like to help): https://opportunity.census.gov/sprints/
- TOP Earth Sprint Roundtable Notes: https://docs.google.com/document/d/1UE8cMjDL2_aJpwShHv7gn1hThrvpTQC9K5uBQ0zXadA/edit?usp=sharing
- FEMA PrepTalk on "Crowdsourcing & Citizen Science as Force Multipliers for Emergency Management" by Sophia Liu: https://www.fema.gov/preptalks
- Citizen Science Association Webinar: https://www.citizenscience.org/events/webinars
- Citizen Science Association COVID-19 Resources: https://www.citizenscience.org/covid-19
- Risk
- Next meeting, April 16, Human-Centered Design Thinking with Impact360 Alliance (part 3)
- Risk Community of Practice Community Survey: https://tinyurl.com/vp3xla4
- Risk page: https://listserv.usgs.gov/mailman/listinfo/cdi-risk
- Annual meeting was March 17-18; ICEMM CDI website has all recordings here: Interagency Collaborative for Environmental Modeling and Monitoring
- Open-source and open-workflow Climate Scenarios Toolbox for adaptation planning: Aparna Bamzai-Dodson
- Link to website: https://www.earthdatascience.org/cst/index.html
- Scenario planning - a way to consider the range of possible outcomes; 3-5 plausible divergent scenarios. Managers and scientists can use this information for adaptation strategies.
- The Climate Scenarios Toolbox is attempting to take the pain out of working with climate data
- The Toolbox is open and usable, allowing other users to contribute open code. The Toolbox hopes to do the following:
- lower the barrier to entry
- automate common tasks
- reduce the potential for errors
- empower a larger user community
- The link above includes a getting started guide for the Toolbox.
- There is extra support for the National Park Service, as NPS was a partner for this project.
- Engaging CDI
- Install and use the Toolbox
- Provide feedback on issues/features
- Contribute to the package
- Develop Cloud Computing Capability at Streamgages using Amazon Web Services GreenGrass IoT Framework for Camera Image Velocity Gaging: Frank Engel
- Gaging (measuring water quantity)
- Sometimes we can't measure
- flashy regimes
- indirect (post flood) methods aren't cheap
- How do we get past these issues?
- non-contact methods
- imagery combined with software - gets complicated, requires training, and involves some subjectivity
- want to automate this process and take some of the pain out of it
- CHS/AWS IoT Cloud Processing Goal
- First required building a cloud infrastructure
- Auto-provisioning to the cloud
- MQTT Schema (in progress)
- Generating global actions (see something, do something)
- Initial time-lapse video Lambdas (for SSTL)
- Lessons learned
- Cloud computing knowledge takes a lot of work to acquire
- A lot of hands in the cookie jar
- In the short term, it can be difficult to sort through the differing needs of stakeholders
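The "see something, do something" idea from the talk, and the remote-trigger question raised in the Q&A, can be illustrated with a minimal Python sketch. It is not the project's code: the MQTT schema was still in progress at the time of the call, so the field names, topic layout, and threshold below are all hypothetical, and nothing is actually published to a broker.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class StageReading:
    """One water-level observation. All field names are hypothetical."""
    site_id: str
    timestamp: str   # ISO 8601, UTC
    stage_m: float   # water surface elevation above gage datum, in metres


def should_record(reading: StageReading, threshold_m: float) -> bool:
    """'See something': the stage has reached the trigger threshold."""
    return reading.stage_m >= threshold_m


def build_action(reading: StageReading, threshold_m: float,
                 duration_s: int = 60) -> Optional[dict]:
    """'Do something': build a record-video message, or None if no trigger.

    The topic string and payload keys are illustrative assumptions,
    not the schema used by the project.
    """
    if not should_record(reading, threshold_m):
        return None
    payload = {
        "action": "record_video",
        "duration_s": duration_s,
        "trigger": asdict(reading),
    }
    return {
        "topic": f"gages/{reading.site_id}/camera/record",
        "payload": json.dumps(payload, sort_keys=True),
    }


low = StageReading("site-A", "2020-04-08T15:00:00Z", 1.2)
high = StageReading("site-A", "2020-04-08T15:05:00Z", 2.7)
print(build_action(low, threshold_m=2.0))
print(build_action(high, threshold_m=2.0))
```

Keeping the decision (`should_record`) separate from the message construction (`build_action`) means the same rule could be driven by a reading from another streamgage or sensor, as described in the Q&A.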
- Establishing standards and integrating environmental DNA (eDNA) data into the USGS Nonindigenous Aquatic Species database: Jason Ferrante
- eDNA is genetic material released by an organism into its environment (skin, blood, saliva, feces into surrounding air, water, soil, etc.).
- Why add a data layer to the NAS database specifically for eDNA?
- Want to combine the traditional specimen sightings and eDNA detections for more complete distribution records, to improve response time to new invasions.
- Aquatic invasive species specifically are the species of interest
- Need to establish strong community standards that will allow for high-quality data that can be validated.
- What did we do?
- Experimental Standards
- eDNA literature review
- establish standard criteria regarding sampling design and collection, laboratory processing, and data analysis
- Stakeholder Backing
- Reviewing criteria among stakeholders
- Input by eDNA community of practice
- pre-submission form to vet data before it is included
- Teleconferences to gain consensus (ongoing process)
- Produce a white paper
- Integration into NAS
- Community standards
- Web submission form/template
- Prototype web viewer (map)
- Pre-submission survey
- Two blocks of questions, some that will require a "yes" in order to move forward, some that will vet the data better
- Quick start guide for the database became a need during the feedback process
- See slides or recording for mock-up of map view
- Expected challenges:
- Getting to consensus on submission form
- Staying organized and keeping lines of communication open
- Meeting the needs of managers and researchers (getting feedback)
- town hall style meetings to present ideas and garner feedback
- if you're interested, it will be Monday, April 13 - contact West Daniel if you'd like to attend
- Take aways/follow up
- Networking is very important. Use existing infrastructures (such as CDI!); Teams is also working very well
- Within the CDI group, many are looking for help developing new tools which use eDNA data. Working on a manuscript that provides insight about the process
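The two-block pre-submission survey described above (some questions must be answered "yes" to move forward, others help vet the data) could work along the lines of the following Python sketch. The question keys are hypothetical, loosely based on the topics mentioned in the discussion (controls, assay vetting, multiple field samples); which questions fall in which block is an assumption, not the actual NAS form.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical question keys; the real survey wording is not in these notes.
REQUIRED_YES = ("controls_in_place", "assay_validated")
VETTING = ("multiple_field_samples", "lab_replicates")


@dataclass
class SurveyResult:
    accepted: bool              # True only if every required question is "yes"
    failed_required: List[str]  # required questions not answered "yes"
    vetting_flags: List[str]    # vetting questions not answered "yes"


def evaluate_survey(answers: Dict[str, bool]) -> SurveyResult:
    """Gate on the required block; collect flags from the vetting block.

    A missing answer is treated the same as "no".
    """
    failed = [q for q in REQUIRED_YES if answers.get(q) is not True]
    flags = [q for q in VETTING if answers.get(q) is not True]
    return SurveyResult(accepted=not failed,
                        failed_required=failed,
                        vetting_flags=flags)


result = evaluate_survey({
    "controls_in_place": True,
    "assay_validated": True,
    "multiple_field_samples": False,
})
print(result)
```

In this sketch a submission with vetting flags is still accepted; the flags only mark where reviewers may want to look more closely.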
- Based on recent USGS guidance, will this call be moved off of Zoom to Teams?
- Leslie: We are testing external participation now and will keep the CDI informed of tech choice. Anyone interested in testing or discussing further, get in touch with me!
- Would you be able to see if you have any "surprising" new users as a result of the tool, or do you have ideas of how to learn if you do?
- The package is not officially released on GitHub but we are working on it, and there will be a publication in Journal of Open Source Software. We hope to see people fork the code so that we can incorporate user modifications back into the main branch. Hoping the user community picks this up and makes it into a bigger and better toolbox.
- What was your process for identifying your main users and their needs?
- Our center is part of a USGS network meant to work with natural resource management partners to help understand climate adaptation science. We have worked quite extensively with Fish and Wildlife Service and National Park Service over 7 years on supporting their science needs. We saw inefficiencies in the workflow, and commonalities existed in the data requested, but we were starting from the ground up whenever we had to provide it. Anyone doing research across continental US can use it, so hoping to expand past initial stakeholders.
- Is the climate reanalysis data included so that historical climate (and weather) can be downloaded too?
- Yes, it can do historical and future comparisons.
- Could you say more about the Journal of Open Source Software?
- Open review process; way to release new tools that are allowing people access to new data. Write-ups for publication are pretty short; description of the package/software, what problem it solves, and how you are contributing (not doing something that is already done). Goes through peer review process and is released as a publication. Nice way to get the tool out to a broader community.
- Is it possible to include a vignette that uses the software in a JOSS publication?
- Will find that out and get back to you.
- Is "edge computing" the same as "data proximate analysis"? Can you explain it a bit more?
- No (don't know what data proximate analysis is). Edge computing is putting some sort of internet-connected device (like a Raspberry Pi) at the edge, in the environment it's meant to be in. Camera on the bank of a river, phone in your pocket, etc.
- hrhouse : I am assuming the working definition of "data proximate analysis" just means you run the analysis of the data "near" where the data sits. For instance, running a cloud-based application against data that resides on-premise would not be data proximate analysis. Edge computing, which means you are running jobs "next" to devices in situ where the data is collected, would be "data proximate analysis". In other words, edge computing is a subset of data proximate analysis.
- Can a user trigger the camera remotely? Or, based on water surface elevation?
- Yes. You can trigger the Raspberry Pi to record video based on an external trigger. If it's an internet-connected Pi, you can query another streamgage or other sensor and trigger that way. We are in the process of testing some other methods.
- How do you handle security with IoT? Are these Raspberry Pis protected since they have AWS credentials to upload the data? Is there a DMZ for video uploads?
- We are developing a STIG (Security Technical Implementation Guide) for Raspberry Pis. We enforce how people set up modems in the field so that the device is unreachable from the outside world.
- Are you using the Raspberry Pi camera?
- Yes. We use many different cameras and rely on security web cameras that are NDAA compliant.
- Can you elaborate a bit on Infrastructure as Code practices you're using for this IoT project? You mentioned that you had to create infrastructure first
- Foundry is the infrastructure. Code is Python-based.
- hrhouse : The Infrastructure as Code environment just means that the environment is implemented, typically via CloudFormation scripts, which can easily be replicated as needed. This is a key principle in how the CHS environment is architected and presented. That is the only way we can support the environment at scale, and is really a best practice. The sensor processing system, like all other systems in CHS, is required to adhere to this basic design principle. So the good news is we can support such systems at scale, and the customer can also scale their own work as well. The bad news is, as Frank mentioned, it does take a new level of skill set to understand what needs to be done to work within that framework. The CHS program provides a base environment, but the customers also have an obligation to build out systems within those boundaries in this manner. We recognize how this limits adoption, and are working to bring on some support engineers who can work with customers to help them.
- How many IoT RPi cameras are there? Do they provide a constant video feed?
- Two IoT-enabled cameras. 20-40 connected cameras are not on IoT.
- Is that RPi STIG available anywhere?
- What happens to the video stream after processing? Is it archived somewhere?
- You can do whatever you want with the video stream artifacts. That is up to the owner of the system.
- Can you comment about any "disagreements" that came up on the submission form when you got input from your community?
- Earliest iteration was just a CSV file, and more work would be done to vet data. Idea for pre-submission survey came up as input from the community. There were lots of conversations about controls: confirming that controls were in place, making sure we had questions that vetted the assay being run, and making sure people were taking multiple samples from the field. Wanted to be as inclusive as possible, while maintaining a high level of quality.
- Is the eDNA data being used to validate/reinforce other species detection/occurrence data in NAS?
- Not specifically, but we can work toward the ability to do that. NAS does a lot of work to vet photos/data that come in.
- Can you talk a bit more about spatial controls? Links to NHD?
- This data layer is not going to be linked to anything, but this is one of the types of areas that might help to inform broader understanding of species distribution. We are interested in ways to pair eDNA with covariates.
- Have you looked at how your community standard will translate to the biological data standard: Darwin Core?
- Yes, they are similar. There's going to be a lot of overlap, and would like to make it overlap as much as possible.
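To make the Darwin Core overlap mentioned in the last answer concrete, here is a small Python sketch that renames a few fields of an eDNA record to Darwin Core terms and reports what is left over. The left-hand column names and the sample record are hypothetical, and this is not the NAS mapping; the right-hand names (`scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `materialSampleID`, `occurrenceStatus`, `basisOfRecord`) are standard Darwin Core terms.

```python
# Left: hypothetical eDNA submission column names. Right: Darwin Core terms.
FIELD_TO_DWC = {
    "species": "scientificName",
    "sample_date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "sample_id": "materialSampleID",
}


def to_darwin_core(record):
    """Split a record into Darwin Core fields and fields with no mapping here."""
    # Assumption: an eDNA detection is treated as a material sample.
    mapped = {"basisOfRecord": "MaterialSample"}
    unmapped = {}
    for key, value in record.items():
        if key in FIELD_TO_DWC:
            mapped[FIELD_TO_DWC[key]] = value
        elif key == "detected":
            # A boolean detection becomes present/absent.
            mapped["occurrenceStatus"] = "present" if value else "absent"
        else:
            unmapped[key] = value
    return mapped, unmapped


record = {
    "species": "Dreissena polymorpha",
    "sample_date": "2020-04-08",
    "lat": 29.65,
    "lon": -82.32,
    "detected": True,
    "assay": "example-assay",  # no counterpart in this sketch's mapping
}
mapped, unmapped = to_darwin_core(record)
print(mapped)
print(unmapped)
```

The `unmapped` dictionary is the useful part of the exercise: it lists the eDNA-specific fields (such as assay details) that a community standard would need to cover beyond the terms used here.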
Other comments from the chat
From Sophia Liu : I think we need a “Lending Lab of Low-Cost Instruments and Sensors” as an offshoot of what the USGS HIF provides but bringing together these low-cost sometimes disruptive innovations leveraging IoT and other sensors that we can maybe push out to the public for crowdsourcing or citizen science projects.
From Abby Benson : Jason, you might consider looking at the Hydrolink Tool to link the eDNA occurrences to the NHD.
From Jake Weltzin : check out the hydro-link tool for snapping a sample location to NHD(+) network