CDI Conference Call - December 9, 2015

The Community for Data Integration (CDI) meetings are held the 2nd Wednesday of each month from 11:00 a.m. to 12:30 p.m. Eastern Time.

USGS/DOI Dial In Number: (703) 648-4848 (for USGS and DOI offices)
Toll Free Dial In Number: (855) 547-8255 (for other offices and telecommute locations)
Conference Code: 47919# (same for both numbers)

Webex Recording

Webex recordings are available to CDI Members. Please log in to view the recording. If you would like to become a member of CDI, please email

Agenda (in Eastern time)

11:00a Welcome - Kevin Gallagher, Associate Director for Core Science Systems

11:10a  Engaging Citizens and Communicating Science through Open Innovation at the U.S. Geological Survey - Sophia Liu, U.S. Geological Survey  

Presentation: Slides are available to CDI Members. Please log in to download the slides. If you would like to become a member of CDI, please email



Dr. Sophia B Liu will discuss the rise of open innovation through crowdsourcing and citizen science efforts at the USGS based on her experience with expanding the data analysis and visualization component of the Tweet Earthquake Dispatch (TED) project, coordinating the design and development of the "iCoast - Did the Coast Change?" citizen science project, and facilitating energy, minerals, and environmental health civic hacking projects related to data science and visualization. Dr. Liu will also discuss recent federal policies related to crowdsourcing, citizen science, and competitions and what role USGS can play in these emerging efforts in engaging the public through open innovation.

Presentation Q/A

Fran Lightsom: What is your one take away message for CDI from your presentation?

Sophia Liu: I really hoped to use this talk as an opportunity to see to what extent CDI might be interested in becoming more involved in more of these efforts. The Citizen Science Working Group spawned from CDI and we discussed at the recent CDI Workshop how we might want to cross-pollinate our working groups. I think there is a lot of value that we can gain from the other working groups to make the OSTP Holdren Memo mandate realized in terms of figuring out all of the crowdsourcing and citizen science projects out there and retooling the myScience piece to enable that...making it easier for future USGS scientists to submit new projects and making that data of all our crowdsourcing and citizen science projects available through some kind of API, so that way it can be used easily by the folks at GSA through the OSTP and the White House who are trying to merge all of the projects across the Federal agencies. So, I think an immediate request that I have is to see who might be interested to work on this project to expand the myScience application and to see how we can make that user experience around approaching crowdsourcing and citizen science at the USGS much more accessible, both internally and externally.

Daniella Birch: When you were talking about the different Science Centers and how some of them have full Web teams and some just have one person or no one, what do you think is ideal?

Sophia: I think what I noticed is just what happened, which is the nature of it and we largely had produced a lot of our data and work in the Web, so I know that it is an emerging position. And it is great to see that we have hired some folks like the person I had worked with. I think it would be nice to have something like the SPN, but at an IT/data science support level across the agency because it’s just sort of a mix of support currently across the agency. It amazes me how varied it is and I think there needs to be an agency-wide function. You know, we have CDI, a community of folks or something where it is integrated as part of our work practice and it’s not something that is volunteered per se. And it being a function where we can all leverage our different expertise and skills and make that available across our agency and across the country. I don’t think it is as easy because we are all remote, but finding a way in which we can do it or find a function that we can make Bureau-wide to build that accessibility.

Rich Signell: You talked about APIs, which I totally am a fan of also. As you have worked with all these different groups that you have been involved with, do you sense any convergence of APIs? I remember one time on, I think they took it down, but they used to advertise something like 8,000 datasets and 10,000 APIs.  Are you seeing the OGC API being used or being regarded by developers as being too slow or they don’t have time? How do you see the API picture shaking out?

Sophia: That’s a good question. I feel like I’m not in the position to answer that because I think that I’m still learning about that and ultimately my goal is to engage folks like yourself and other scientists who are interested and in some ways need to know, what are the standards that are emerging. I have been attending the weekly open data meetings and the mydata meetings that are hosted at GSA and are primarily facilitated by folks at OSTP and OMB. We have these discussions there and I think this meeting is open to anyone in the Federal government, so you can call in, but really raising awareness of these particular conversations so that people that are part of this community are learning about it and aware of the tools and resources particularly across the Federal government instead of just our own agency. I thought I would also share what NASA has done:

Rich: I remember when NASA announced this API and I, and I think others, were like, “what?” It was a totally new API without any discussion about what other APIs could have worked.

Sophia: Right, that is a good point. What I was trying to look for on their site is they have listed the total number of data sets and APIs that they have available. And I think it would be interesting for USGS to find a way to...and maybe there is someone who knows here in this community, but is there a way that we can find out how many data sets and APIs we have in total? We could start to calculate these numbers and externalize them to the public and making them accessible to others who might be interested in including work from the USGS in what they do. It would involve using some of these tools and making it a whole lot easier to look at.

Madison Langseth: Something that you were talking about just before this conversation about having a network of people that can help with data, I was reminded of the CDI Workshop that we had in May and the Data Management Working Group talked about developing this sort of SWAT Team for data management type things, which sounded very much along the lines of what you were describing in your vision.

Sophia: Yeah, and I hope we can do that. I was talking with Chris Garrity, who had suggested this as well and I think this is a shared interest and I would love to see movement toward that as well. It would help raise awareness for the need for this around the USGS and the opportunities and how we can provide that support.

Madison: We also have a question from the chat: “Have you met or worked with any staff internally that are implementing some of the workflows that you mentioned in your presentation? If so, can you provide any contact information?”

Sophia: No, I have not, not directly. Maybe some of them were using it to some extent, but no, I have not done that as much. With this new role, what I’d like to do is get more involved with CDI and Core Science, and learning about what is happening in CDI and how, of the tools that are being developed in CDI, they are being used. I have generally been an ethnographer, I like to say, I use ethnographic methods. Throughout my research, and part of it was just being an observer, as well as a participant in various activities, to understand how people are working...what are people doing, what are they using? I love just going to people’s offices and understanding where they work. All of it matters and that is what I would like to do is learn more about how it’s being used. What’s interesting is that the “Manage Your Data” section in the Citizen Science Federal Toolkit is largely based on the USGS Science Data Lifecycle Model. I think this is why we got an award and were recognized because we really do this quite well. It would be great to understand its use across the USGS as well as how it is being applied across other agencies, especially in this toolkit.

Lance Everette: I have a question about APIs and “publishing” them through the USGS system. I was just digging around on and noticed that I couldn’t quickly find any APIs for USGS. So, what is the process right now and how do we get started?

Sophia: So, I think there are some other folks within CDI who understand this process better than I do, but what I understand is some of our sexiest data….I sometimes try to find various users of USGS data. I go to random events and tell people that I work at USGS and it’s great when they say, “Oh I use USGS data.” And I ask, “How?” or “Why?” and “What challenges have you had?” And some of our best data are the earthquake and streamgage data because that is more or less in some API format, machine readable format that is streamed in some way. But most of our other data...we have really great data, like the minerals data that we have had for a really long time... they are in PDF or a lot of our data that is in Excel format that is being presented through our various websites that could be made available, not just as CSV rather than Excel format but also as APIs. One way that that is happening is by putting it into the ScienceBase system. What I have found in talking to other USGS folks is that a large number of them have not heard of ScienceBase. ScienceBase is meant to make the data available in a machine-readable format. It tries to automate it so it can be available in a JSON format or various formats that are more standardized. And the end goal of that is to get it into But what I have heard from people is that that process was very painful. And in general, has its own pain points. So I think there is a push towards it, but I would like to see and learn how we can raise awareness of things like ScienceBase and machine-readable formats and what APIs mean.
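
As a rough illustration of the machine-readable formats mentioned in the answer above, the following minimal Python sketch turns tabular CSV text into JSON records using only the standard library. The function name `csv_to_json_records` and the sample rows are hypothetical placeholders for illustration; they are not USGS data and not how ScienceBase itself works.

```python
import csv
import io
import json


def csv_to_json_records(csv_text):
    """Convert CSV text (first row = column names) into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object keyed by the column names.
    # Values stay as strings; no type inference is attempted here.
    return json.dumps(list(reader), indent=2)


if __name__ == "__main__":
    # Hypothetical placeholder rows, not real measurements.
    sample = "commodity,year,unit\nexample_a,2014,metric tons\nexample_b,2014,metric tons\n"
    print(csv_to_json_records(sample))
```

A spreadsheet exported to CSV could be passed through a step like this before being served from an API, though a real workflow would also need metadata and type handling.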

12:20p  Working Group Reports

  • Citizen Science - Sophia Liu and Dave Govoni (N/A)
  • Data Management - Heather Henkel and Viv Hutchison (N/A)
  • Earth-Science Themes - Roland Viger (N/A)
  • Semantic Web  - Fran Lightsom and Janice Gordon
    • Semantic Web is continuing to explore our GeoSPARQL endpoint. We are having working meetings to figure things out. Anyone is welcome to come and join us. Our next meeting is tomorrow, Thursday, December 10 at 2:00 ET.
  • Tech Stack  - Daniella Birch
    • Daniella: We have been doing a lot around open source and sharing code. I will be stepping down as coordinator for the Tech Stack working group and Rich Signell will be the new working group coordinator. He has already been organizing a lot of talks for us anyway so it only seems natural.
    • Rich Signell: Thank you Daniella for picking up the ball when we dropped it way back when. She really picked it up and ran with it for a while. We will be continuing to drill into the technologies and getting demonstrations for some of the tools that are really behind what Sophia was talking about there, so if you see something interesting, please join us.
  • Connected Devices - Tim Kern and Lance Everette
    • Lance: We are planning a January meeting for this group and we may be talking about publishing APIs. It has come up in a number of calls that Tim and I have been on recently. So, join us for that.

12:30p  Adjourn