CDI Software Development Cluster
Meeting Notes
Topic: Cloud and Big Data in the Cloud; an Open Session Discussion
Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/696628840
US: +16699006833,,696628840# or +14087403766,,696628840#
Or Telephone:
Dial (for higher quality, dial a number based on your current location):
US: +1 669 900 6833 or +1 408 740 3766 or +1 646 876 9923
Meeting ID: 696 628 840
Note that we have switched from GSTalk to Zoom for the time being, due to a number of usability/compatibility issues with the GSTalk platform.
Meeting Notes in Google Drive: Shared Google Drive Folder:
https://docs.google.com/document/d/1tC4Pmmhax_CTL2-wsjKlmqBA8DziRfelvJgbLKWPD5I/edit?usp=sharing
Attendees
Name | Email (if you are new)
Michelle Guy | mguy@usgs.gov
Travis Harrison | tharrison@usgs.gov
Rob Miller | rfmiller@usgs.gov
Steven Predmore | spredmor@usgs.gov
Cassandra Ladino | ccladino@usgs.gov
Elizabeth McCartney | emccartney@usgs.gov
Colin Talbert | talbertc@usgs.gov
Courtney Owens | clowens@usgs.gov
Andy LaMotte | alamotte@usgs.gov
Jeanne Jones | jmjones@usgs.gov
Eric Martinez | emartinez@usgs.gov
Hans Vraga (late) | hvraga@usgs.gov
Sam Pecoraro (late) | specoraro@usgs.gov
Carl Schroedl | cschroedl@usgs.gov
Leslie Hsu (late) | lhsu@usgs.gov
Drew Ignizio | dignizio@usgs.gov
Please take the quick Sli.do poll... https://app.sli.do/event/vhjskdfk
Agenda
● Welcome and announcements
○ Please fill in name and email in the attendees table
○ We are still always looking for topics, and your input and participation!
○ We have created a form for submitting presentation proposals for future Software Dev Cluster meetings
■ https://docs.google.com/forms/d/e/1FAIpQLSccsoCmFH4aT1OQNKaMDG7-ngIAlyGgmqSRQwJc_uYFf_tVUQ/viewform
○ CDI BisonConnect Google Calendar of all the collaboration area meetings and events - the calendar name is “GS CDI” and the owner is gs_cdi@usgs.gov
○ At least one session on cloud - “Let’s talk cloud”
○ Topic for next month: this software dev community will come up with session proposal(s)
● Cloud and Big Data - Cassandra intro
○ Sli.do poll - Have you done anything in the cloud?
○ Get into the cloud
■ Check out YouTube videos
■ Online training - Learning Tree, Cloudera
■ Sign up for a CHS AWS Sandbox
● Information about the Sandbox
○ AWS has a seemingly endless list of products available, but CHS has narrowed the list of what it offers; an overview was provided
○ S3 buckets - fancy folder where data files get a URL, easy to use
○ Data Lake example: how it has changed over the past few years, and now DocumentDB
○ Data types mapped to technology examples (Big Data slide)
○ Adding structure to unstructured data with things like Pig and Hive
○ Text search with Elasticsearch (clusters) and Lucene
○ General information tends to be business intelligence oriented
○ Possibilities
■ What if AWS DocumentDB were applied to scientific data (e.g. released data in ScienceBase)?
■ Graph DBs?
● Open mic
○ Announcements?
○ Questions?
○ Lessons learned?
○ Fun projects under way?
● Next Month: ?
○ Coordinate on proposing CDI Session Topics [1] [2]
○ More cloud topics this summer
○ CHS could present on something
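A small illustration of the "data files get a URL" point about S3 buckets in the agenda above. This is only a sketch, not something shown in the session: the helper name s3_object_url, the bucket name, the key, and the default region are all made up for the example. It builds the standard virtual-hosted-style address for an object; the object is only reachable at that address if it is public or the request is signed.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, key: str, region: str = "us-west-2") -> str:
    """Build the virtual-hosted-style HTTPS URL for an object in an S3 bucket.

    The bucket, key, and region are illustrative assumptions; nothing here
    contacts AWS or checks that the object exists.
    """
    # Percent-encode the key but keep "/" so the folder-like prefix stays readable.
    encoded_key = quote(key, safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"


if __name__ == "__main__":
    # Hypothetical bucket and file names, for illustration only.
    print(s3_object_url("example-science-data", "streamflow/2018/site 01.csv"))
```

The key behaves like a path inside the "fancy folder": everything before the last "/" is just a prefix, not a real directory.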
Discussion/Notes
● Apache Spark (Big Data Space)
https://databricks.com/try-databricks
https://www.cloudera.com/products/data-science-and-engineering/data-science-workbench.html
● Jeanne is looking at graph DBs and thinking about how they could be used
● Get info on the USGS CHS sandbox: https://support.chs.usgs.gov/display/CHSKB/Help+Center
● Sign up for a CHS AWS Sandbox
○ Information about the Sandbox
○ Collaborators space
○ Services are more limited in sandbox
○ Sandbox is wiped clean quarterly
○ Sandbox is shut down nightly (cost savings)
● A simple application can be an EC2 instance and a DB; it doesn’t have to be a complex suite of services
● Serverless? Carl - the Water Mission Area is in the early days of serverless: manually testing and exploring Lambda; their standard deployment pipeline goes into ECS
● Is a Jenkins instance running for CHS customers? How to access it? Courtney - CHS is not running Jenkins anymore; most customers run their own. If there is enough customer interest, CHS can explore setting it up. CHS has moved to other solutions/technologies (AWS Source Catalog)
● CHS user customer group meets monthly - topics include containers, ECS, and OpenShift; next month (March 20th) is Tableau. The meetings are recorded: https://support.chs.usgs.gov/x/IgPv
○ Email clowens@usgs.gov if you are interested in attending and would like to be added to the calendar
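For anyone curious what "exploring Lambda" from the serverless item above looks like at its smallest, here is a hedged sketch of a Python handler. It is not the Water Mission Area's code: the function name lambda_handler is just the common default, and the "name" field in the event and the response shape (the statusCode/body form often used behind API Gateway) are assumptions for illustration. It can be run and tested locally without an AWS account.

```python
import json


def lambda_handler(event, context):
    """Minimal AWS Lambda-style handler: read an input event, return a response.

    The "name" field is a hypothetical input; real events depend on whatever
    service triggers the function.
    """
    # Tolerate an empty or missing event when invoked by hand.
    name = (event or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }


if __name__ == "__main__":
    # Local manual test, no deployment needed; context is unused here.
    print(lambda_handler({"name": "CDI"}, None))
```

Because the handler is a plain function, manual testing can start on a laptop before anything is deployed.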
[1] Just a note that after March 1 we will be trying to organize what has come in with hopes to have a draft agenda in early March!
[2] Ok, we might try to help out at our meeting next month and make some structure and organization out of the software suggestions. :)