If you want to improve your skills as a data scientist, you have to practice. If you’re not sure where to start, chances are you can find an engaging, considerate, and driven civic tech community in your area. I started attending Hack for LA’s weekly meetups to work on real problems in my community, and it’s been a great way to get involved locally and to hone my data analysis skill set.
Our team decided to tackle LA’s out-of-control homelessness problem by using data to inform the public, the organizations that provide services to the homeless, and hopefully the policy makers who can effect real change on the issue. There are so many individuals and organizations working on this problem, but either it’s not enough or the efforts are not aligned for maximum impact, because homelessness in Los Angeles is worse than anywhere else in the country, and it is increasing year after year. Our overall mission is to “help the helpers” in order to optimize these efforts throughout the LA community.
This past weekend was Code for America’s National Day of Civic Hacking. I attended the LA Meetup, where I went to a number of panels and workshops on the growing impact of data on civic issues. One of the most insightful and inspiring panels was on how to get involved in civic tech and how to use data metrics to effect change. Here are my notes from that panel.
Panel on Civic Tech and Data Metrics
- Leonard Hyman, Creator of WeLobby
- Steven Corwin, Founder at City Grows and Compiler LA
- Jeanne Holm, Deputy CIO City of LA
- Juan Lopez, City of LA Deputy for Tech and Innovation
Leonard: What’s a data metric that you’re excited about in your own work or in the community that seems to really be working?
Stephen: The push from discrete data to real-time data. For example, a company called Placemeter sets up a video camera and you can monitor how many people walk by in a given timeframe.
Jeanne: We’re working on making things work better for all Angelenos. I’m interested in that data that’s hard to get. How do we understand how many homeless people are in the city? Those kinds of surveys are actually very hard to do.
Juan: The City Controller is working toward financial transparency. We see data as a resource as well. We’re working on the proprietaries: ports, water/power, etc. Their metrics are bad and sometimes just made up, so we created a set of metrics for these proprietaries. This will improve services, which at the end of the day is what we all want.
Leonard: Where do you start? Where do you find data?
Stephen: My interest is in neighborhoods and livability. There are a lot of blogs that make data maps, and you can see where they got their data from. LA Times has a great open data portal.
Jeanne: There are a lot of formal government sites: data.lacity.org, data.gov, data.lacounty.gov. These give you a combined 500,000 datasets. The other thing I like to do is go into Google Scholar. It’s hard to search Google for data that’s embedded in PDFs, but if I’m interested in a very specific problem, then looking at academic research gives you really rigorous data. ArcGIS has a bunch of map data available, too.
Juan: You have to start with the question. Like, how many parking tickets are issued on my street? And then you start saying: Who would have this data? Who would track this stuff? Then you can peel back the layers until you get the data you need. Talk to people and find out what they can give you. A lot of people do it this way.
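As a rough illustration of Juan's "start with the question" approach, here is a minimal Python sketch that counts parking tickets per street. The sample rows and the column names (`ticket_id`, `street`, `violation`) are made up for illustration and do not come from the panel or from any real city dataset; an actual export would have its own schema.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for a real export.
# A real dataset will use different column names and values.
SAMPLE_CSV = """ticket_id,street,violation
1,Main St,No Parking
2,Spring St,Expired Meter
3,Main St,Street Cleaning
4,Main St,Expired Meter
"""


def tickets_per_street(csv_file):
    """Count rows per street in a CSV that has a 'street' column."""
    reader = csv.DictReader(csv_file)
    return Counter(row["street"].strip() for row in reader)


counts = tickets_per_street(io.StringIO(SAMPLE_CSV))

# Print streets from most to fewest tickets.
for street, n in counts.most_common():
    print(f"{street}: {n}")
```

To run this against a downloaded file instead of the inline sample, you could pass an open file handle, for example `tickets_per_street(open("tickets.csv", newline=""))`, where `tickets.csv` is a placeholder name.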
Stephen: If there’s data that you’re not seeing, it might not be true that someone doesn’t want you to have it, but maybe no one has ever asked for it before.
Leonard: With that in mind, if the public can ask for data, what’s the best way that the public can use that data? What are the hardest things for governments to do that citizens can help with?
Stephen: The value in open data is that people can be inspired to do something about a problem. Look at the open source software community. People don’t contribute because they think “This software needs my help.” They think, “This software does 95% of what I need, but I can code the remaining 5% and add it to the repository.” The same is true of open data. Something that is pertinent to your life, like affordable housing, can be addressed by anyone. Anyone can be the scientist.
Jeanne: When you think of the size of the city government and the size of the city, you realize that not all of the smartest people are working in the government. There are 4 million people out there. The ideas that you have at the simplest level, we [the government] really want to hear about that. These government organizations have some input channels that people can speak through to get the government’s ear. LA Open Acres is an example of a great idea that came out of a personal passion and became a government program. It’s your ideas and your passion we’re looking for.
Juan: Jeanne is right. You have to act. You have to do. If you’re looking to make an impact, then you need to get involved. Do you know who your representatives are? Find out who they are. Follow them on Twitter. Tweet at them when you see something wrong. Some of them will see 5 tweets and think something is a big deal. We had an anonymous tipster bring an anomaly in our overtime data to our attention, and we looked into it, and it was right, so we made the appropriate change. So please, go out and do something.
Leonard: How do you frame good questions? For example, if you’re looking at a single metric, rent price, it varies wildly across the city, but so does dwelling size and other variables. How do you get at the heart of the problem?
Stephen: The days of just having an idea for a solution and executing it are long gone. It’s important to use data. When you’re looking at a single metric, you often don’t know if you have a correlation or a causation. You know when you take the DMV test and there are questions unrelated to driving a car? I wonder if there is research behind that to show that these questions are worthwhile. You have to analyze things around you and ask questions about them.
Jeanne: It’s important to take your question and make it specific. People we’re working with are looking at homeless information, and so you look at a dataset with homeless counts and you find an answer. But then you go to the county and there’s a different number. United Way might have a third number. And it keeps going. So when I’m approaching problems, I take all of these perspectives into account. This might not be a bias issue, but it’s just a different sample. That data helps inform an action that I want to take. Take that big question you’ve got, and you dig down to get something really specific, and don’t trust just one source.
Juan: How do you ask the right questions? You have to be critical. Sometimes you’re looking at something and you get stuck, but you just shift your point of view by 3 degrees, and all of a sudden you see it in a new light. People start off at the beginning with a hunch. Explore it. Find the meat of it. You have to peel everything away and get at the core of the issue.
Leonard: Jeanne, you mentioned bias in data. With that in mind, who needs to be at the table when government is determining which data they’re looking at and which metrics are being used? Policy makers, for sure, but who else?
Stephen: My answer is the obvious answer: the stakeholders. If you’re trying to solve homeless issues, you need homeless people at the table.
Jeanne: I agree there, and that’s where government sometimes fails. They talk to nonprofits and policy makers and forget who it is they’re trying to serve. It’s also important to have a diverse group: young, old, different races. In LA, we’re one of the most diverse cities there is. I think having that sense of diversity brings new ideas, new solutions, and breaks through places where we’ve all made assumptions that don’t hold true.
Juan: That’s true. Sometimes we [the government] make solutions in a vacuum. The people who are closest to us are labor unions and organizations of people in suits, not everyday people.
Stephen: This idea comes from the Agile methodology. You should never lose sight of your goal. People sometimes get so caught up in the goal of what they’re working on, but that might have shifted from your overall goal, and you have to realign.
Audience Question: What kinds of statistics does a layperson need to know to approach these problems?
Stephen: That’s a great question, and unfortunately there’s no answer to it. There’s no specific data standard out there. You really need to figure out what you need and ask people for that. It’s hard if you can’t interpret what data you’re seeing. Instead of asking what you need to know from the data, ask what you need to answer the question, and ask the people around you to help you get there.
Jeanne: Some levels of government do have some data standards. There are standards to try to get datasets talking better with each other. And at the federal level there are initiatives as well. There’s one initiative to open source policy, where anyone could offer solutions. We’re interested in data standards, and if you come back to us we will try to fix our data, too.
Juan: I’m a political science major, so I never thought I’d work in data. You don’t have to be a data scientist; you can use Excel. Yeah, data cleaning is a huge issue, but you can at least get started. You can also use Tableau and that will take you a long way. Also, you can use Google Sheets, which has an Explore feature that shows a lot of statistics and anomalies to you. You don’t need to be a data scientist, you just need to be curious.
Stephen: I want to expand on Juan’s point: if you just play around with the data, you’ll realize you know more than you think. As soon as you see a simple visualization, you might be like “Oh, I discovered something here, and I can share this with people.” This open data initiative is very new, and the government response originally was just to get data out there without as much thought to it. But that’s changing, too. And just remember to explore.
Juan: The city’s data is also in Socrata, so you don’t even need Excel to visualize it.
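For readers who would rather script the kind of first-pass exploration the panelists describe, here is a minimal Python sketch that summarizes a column of numbers and flags values far from the median. This is not how Google Sheets, Tableau, or Socrata detect anomalies; the median-absolute-deviation rule, the `threshold` of 3, and the sample values are all assumptions chosen for illustration.

```python
import statistics

# Hypothetical values, e.g. hours logged per week; placeholders, not real data.
values = [10, 12, 11, 13, 12, 48]


def flag_outliers(data, threshold=3):
    """Return values whose distance from the median exceeds
    `threshold` times the median absolute deviation (MAD)."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    if mad == 0:
        # All deviations collapse to zero; this rule cannot rank anything.
        return []
    return [x for x in data if abs(x - med) > threshold * mad]


print("median:", statistics.median(values))
print("mean:", statistics.mean(values))
print("flagged:", flag_outliers(values))
```

A flagged value is only a prompt to ask questions, in the spirit of the panel's advice: it may be an error, or it may be the most interesting row in the dataset.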
Audience Question: What software do you recommend for data mining?
Stephen: My experience with data mining applications is pretty minor. I usually use scripting languages like Ruby. A lot of people use R, too, to find statistical insights in data. I would definitely recommend R. And there’s also MATLAB.
Jeanne: I use R too. At some point, I get pretty far in Excel, but then I go into R or Hadoop.
Juan: We use R and Tableau. The city has been talking about BI tools for years, like IBM’s Watson, etc., but it’s so new that I don’t think it’ll be used to its potential right now. It’ll mostly be power users for now but that will change in the coming years.
Audience Question: I’ve heard that it’s hard to get feedback to the LA government. Some people just bypass the government altogether because it’s so difficult to work with. How do you think that can be reconciled?
Jeanne: I think that’s totally fair. Government is this huge monster of organizations. We’re treating homelessness in LA like we should treat everything. There’s this No Wrong Door policy where homeless people should be directed to the services they need no matter if they go to a library or a fire department. There are 40,000 city employees we’re trying to get on board with this and it’s hard. But we’re working to be better.
Juan: Government can seem like a fortress sometimes. LAUSD is a fortress. The county is getting better. When I first worked for the City Controller and tried to get some data from LAUSD, I was shut down. This is a problem I’ve been working on for years. Some of these organizations don’t have a culture of openness. You have the power to vote other people into those jobs. Michelle Kane says she’ll try to change this culture within LAUSD, so we’ll see if that happens.
Stephen: From a private sector angle, you kind of have to put yourself in the shoes of the person you need something from. The people working in these organizations are the same as you, and what you’re asking for probably doesn’t fall under their job description. But if you approach the situation in a way that makes it easy for them to give you what you want - for example, don’t go in saying you want to talk, but say, I’ve done this research and I have specific questions - you’re more likely to convince somebody to help. People just need a reason for something and it doesn’t always matter what the reason is. There was an experiment where a guy cuts in line and says “I’m cutting in front of you,” and then he did the same thing saying “I’m going to cut in front of you because I need to cut in front of you,” and people were less resistant to the second approach.
There are people and initiatives in government that are trying to foster openness and data-centric approaches to age-old civic problems. As more of this data becomes available, it can yield more insights into the communities around us. Start by looking around you and asking questions. Reaching out to civic tech groups that are engaged in this problem is a great way to create connections with people who can help you answer the questions you have. By practicing and talking to others about the data and the questions you have, you can simultaneously improve your community and improve your data analysis skills.