Ledge: Hey, welcome to the Gun.io Frontier Podcast. I’m Ledge. Today, I have Josh Hendler with us. Josh is the chief technology officer of Purpose, a global strategic consultancy and creative agency that builds and supports campaigns and tools for leading organizations, activists, businesses and philanthropies.
Josh is a political technology leader. He has served as a technology chief for the Democratic National Committee and for Organizing for America. He has also worked with the Obama campaign, Rock the Vote, and Major League Baseball.
Josh, thanks for joining us.
Josh: Happy to be here.
Ledge: Awesome! You and I met on Twitter of all things. I didn’t know that that really happened. And I reached out. I was really interested in this project you and Purpose are doing called “Crush the Midterms.”
Tell us quickly about it. After that, I want to dive into the technology picture behind something of this nature because I think it will really be interesting to our technology leadership audience.
Josh: Absolutely! It turns out that this election happening in fifty-odd days is really important. And so, way back in May or June, given all the elections that are happening everywhere across the country, we came up with this idea, because it’s actually really complicated to figure out, “What’s the best way for an individual to actually get involved?”
If you’re trying to volunteer time, if you’re trying to give your money, if you’re trying to activate your friend network, answering that question seems like it should be really straightforward.
But it turns out that it’s far from straightforward.
Obviously, there are elections happening at the federal level, for the House and the Senate, but there are also incredibly important elections happening at the local level. State House and state Senate elections are some of the most important ones happening.
We think of this as an experiment. So, essentially, we decided to come up with a way to take a bunch of information from folks and, based on about eight questions, we have an algorithm that makes an actual recommendation: “Here’s the most impactful thing that you can do over the next fifty-five days to help win the midterm election.”
I will put my partisan hat on and say that it’s definitely set up to help Democrats. Not everyone agrees with that. But that’s really where we came up with the idea.
We launched at the tail end of July and just have found really fantastic feedback; and we’re continuing to build it and grow it and make it even more useful and help folks crush the midterms.
Ledge: Fantastic! That’s really exciting. I was saying that this thing is on a roll in the way that everybody hopes their thing will go viral.
Maybe you planned on that. Maybe you’re surprised at the results. Either way, there is a technology scaling question going on there. And I would love it if you would dive into the stack and how you’re handling that: the load, the distribution of traffic, the hosting, everything there.
What does this look like on the backside to make something like that happen?
Josh: That’s a great question. When we were running these sorts of projects, as I’ve said before, they’re always experiments. What we don’t want to do is spend months and months building infrastructure and making it as scalable as possible before it actually meets the
That said, we also know that, in politics especially though this is generalizable beyond that, your big moments are often the most unexpected ones. It’s often not launch day. It’s often when some random celebrity decides to shine some light on your project and, all of a sudden, you go from a couple of dozen users in an hour to hundreds of thousands.
That is definitely something I’ve experienced before in politics. And so, that has definitely led us to a way of thinking about how to build these things and deploy them in a way that’s not necessarily overengineered but in a way that can scale in big moments.
Typically, at Purpose and Crush the Midterms, we are very at home in the AWS world. In terms of the tech itself, we’ve built in a relatively way. It is Python and Django. We are absolutely in love with AWS database that just makes scaling dramatically easier.
And so, we’ve set this up on Amazon using their container service, and it’s set up in such a way that we’ve been able to deploy it such that, “Yes, we’re not going to overinvest in infrastructure on Day 1, but if this does get really popular, then we’re able to scale it really easily and deal with whatever that unexpected load is.”
The other thing I will say is that we’ve got plenty of experience in big moments to think about, okay, what can this possible scale look like? And, as much as possible, we try to run some testing ahead of time to see, okay, how many visitors can we actually deal with and how many plans can we create in a given moment?
That’s definitely something we did a little bit. And, as I’ve said before, it’s like a balancing act where we didn’t want to overengineer this, but we also know that this is incredibly important and it’s happening in fifty days.
If we crashed during the moment when some major celebrity featured us, that would be really bad.
Ledge: And so, is the scaling automatic? Is that a threshold-based thing? I’m not a DevOps guy. How does that work from a horizontal-vertical standpoint?
Josh: We have a little bit of both horizontal and vertical. We’ve tried to set this up in such a way that we can scale horizontally, at the very least, on the upside. We basically have taken the approach of trying to have enough capacity such that if we need to scale horizontally at any moment, we can do that manually. We haven’t yet set it up at a place where it’s scaling automatically.
That has served us really well. I think the other thing is that we’ve tried to take this approach by guessing what traffic could be in a really good world and multiplying that by five or so and making sure we’re prepared for that.
If we’re doing really well and we’re seeing this amount of traffic, we’re making sure that we have the infrastructure for five times that.
This is all like a totally rough guessing game but that’s been really helpful. And then, things like really help us on the database side with the ability to add replicas and scale that to some extent horizontally as well.
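The "multiply by five" rule of thumb Josh describes can be written down as a small calculation. This is only a sketch: the function name, the requests-per-second figures, and the per-instance capacity are hypothetical and do not come from the interview.

```python
import math


def instances_needed(expected_peak_rps, per_instance_rps, headroom=5.0):
    """Instances to provision for `headroom` times the optimistic peak.

    expected_peak_rps: guess at traffic "in a really good world" (hypothetical).
    per_instance_rps: what one app instance handled in load testing (hypothetical).
    headroom: safety multiplier; the interview mentions "five or so".
    """
    target_rps = expected_peak_rps * headroom
    # Always keep at least one instance running, even for a zero estimate.
    return max(1, math.ceil(target_rps / per_instance_rps))


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(instances_needed(expected_peak_rps=200, per_instance_rps=150))
```

Because the scaling described here is manual rather than automatic, a number like this is what an operator would compare against current capacity before a big moment.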
Ledge: You use a bunch of data to help people make their plans. That’s not data that’s probably easy to access, I imagine. There is no open API for all these things.
Where did it come from? How did you get it? Is this scraping or crawling? What’s going on here?
Josh: That’s a great question. We realized pretty early on that part of the challenge here was that this data just wasn’t all in the same place and that’s one of the reasons that this was a particularly hard problem. So the answer is a little bit of all of the above.
Number one, understanding what political geography someone lives in is the first problem. What House district are you in? What state Senate district are you in?
The good news about that is we’re able to pull in a lot of publicly available census data and shapefiles to actually make those determinations using to actually bring in those shapefiles and understand exactly what district people are in.
That is all largely publicly available. But especially when you get to newer districts that have been changed recently, those aren’t all available from the census, which is a little bit of a pain in the butt.
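The district lookup described here boils down to a point-in-polygon test: given a location, find which district boundary contains it. The sketch below uses a plain ray-casting test on made-up rectangles. It is not the team's implementation; real district shapefiles would normally be loaded with a GIS library or spatial database, and the `DISTRICTS` data and function names are hypothetical.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test. `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_district(lon, lat, districts):
    """Return the name of the first district containing the point, or None."""
    for name, polygon in districts.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical toy boundaries; real ones come from census shapefiles.
DISTRICTS = {
    "District A": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "District B": [(4, 0), (8, 0), (8, 4), (4, 4)],
}

if __name__ == "__main__":
    print(find_district(1, 1, DISTRICTS))
    print(find_district(10, 10, DISTRICTS))
```

The same lookup would be repeated per layer (House, state Senate, and so on), since each layer has its own set of boundaries.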
So, number one, we’re able to pool together a lot of publicly available information. And, number two, we absolutely just had to do some scraping and pulling data from sites that aren’t available in open format.
One challenge was “Okay, how do we even figure out where the volunteer ─ we want to feature thirty or forty different organizations that are out there. What’s the best way to volunteer?”
There’s a little bit of just a brute force thing here of getting some of our team in a room to search for things and build up some of these databases ourselves. And that was challenging.
Finally, it was finding a number of different partners who actually were trying to solve, I would say, different slices of this problem. In some cases, those partners had readily available APIs; and, in some cases, we had to work with them to figure out how to get access to that sort of documented APIs.
But it was really a combination of trying to pull data from eight to ten different sources and combining it. And going back to the scaling question, when we heavily rely on our partners, it’s making sure that that doesn’t create a potential single point of failure for the entire app.
So we tried to deploy pretty aggressive caching strategies so that, in case one of our partners fails, we would be in a reasonably good position until they came back up.
That’s something we really thought about a lot. Obviously, when you’re reliant on a number of different APIs, which we are, it’s something that we thought about ahead of time because we’ve seen these sort of failures happen before when we were not entirely in control of our own destiny.
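One common way to keep a partner API from becoming a single point of failure is a cache that serves the last good response when a refresh fails. The sketch below shows that general pattern under stated assumptions. The interview does not describe the actual caching layer, so the class name, the TTL, and the `fetch` callable are all hypothetical.

```python
import time


class StaleOnErrorCache:
    """Cache that falls back to the last good value if the partner call fails."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch      # hypothetical callable that hits a partner API
        self._ttl = ttl_seconds  # how long a value counts as fresh
        self._clock = clock      # injectable so the behavior can be tested
        self._store = {}         # key -> (value, fetched_at)

    def get(self, key):
        now = self._clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # still fresh, no partner call needed
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[0]  # partner is down: serve the stale copy
            raise  # nothing cached yet, so the failure has to surface
        self._store[key] = (value, now)
        return value
```

With this shape, an outage only affects keys that were never fetched; everything seen before keeps working until the partner comes back.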
A great example of open data that we were able to use came from the DNC (Democratic National Committee), which early this year or, actually, I think last year, released a ton of political information, including things like voter registration deadlines and vote-by-mail deadlines, just as publicly available data.
And so, that wasn’t an API. It was just like data that we were able to pull out. And that was incredibly helpful for us. I think it’s super cool that you have the party doing things like that because it’s not just helping them; it’s helping Democrats all over the country.
And so, we’re able to both use that and also submit a bunch of changes, either correcting inaccurate data or adding data that we’ve collected ourselves.
So that was a super cool example of using open data as part of this.
But I think the big story is if we’re trying to get information about who is on the ballot and in what states and what candidates and how you can volunteer for different candidates and who are the best candidates and what races are competitive, very little of this information is public.
Even the competitiveness information is super interesting. People are, obviously, obsessed right now with looking at FiveThirtyEight to understand the races and how competitive they are.
But one challenge for us is we’re just like, okay, there are races all over and in order to give someone a real recommendation for what races I should care about, we need to know how competitive these races are, right? Is your congressional race like Alaska’s?
And so, for us, it was just that we identified five top providers of race competitiveness data and just scraped that every week.
It was super hacky but there was no other way to do it.
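Ratings scraped from several providers rarely use identical labels, so one plausible step is mapping them onto a shared scale and taking a middle value. The sketch below does that with a median. The interview does not say how the ratings were combined, so the `SCALE` mapping, the label spellings, and the 0-to-1 competitiveness score are all assumptions.

```python
from statistics import median

# Hypothetical mapping of provider labels onto a shared -3..3 scale
# (negative leans Democratic, positive leans Republican, 0 is a toss-up).
SCALE = {
    "safe d": -3, "solid d": -3, "likely d": -2, "lean d": -1,
    "toss-up": 0, "tossup": 0, "toss up": 0,
    "lean r": 1, "likely r": 2, "safe r": 3, "solid r": 3,
}


def normalize(label):
    """Lowercase and collapse whitespace, then look up the shared score.

    Raises KeyError for an unknown label, so a changed site fails loudly.
    """
    key = " ".join(label.lower().split())
    return SCALE[key]


def competitiveness(labels):
    """Return 0.0 (safe seat) to 1.0 (pure toss-up) from provider labels."""
    scores = [normalize(label) for label in labels]
    return 1.0 - abs(median(scores)) / 3.0


if __name__ == "__main__":
    print(competitiveness(["Toss-up", "Lean D", "Tossup"]))
```

Using the median rather than the mean keeps one outlier provider from moving the score much, which matters when the inputs are scraped and occasionally wrong.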
Ledge: What does the ETL setup look like? You’re pulling in all this quasi-structured data, none of which, I assume, is in a similar format at all, or certainly not a commensurate format.
What are the actual technologies that you’re using to normalize that?
Josh: I will admit that it’s reasonably hacky just using Python to pull down the sites and scrape as much information as we can and load it into basically. It’s, as you’d expect, super fragile.
And so, as stuff changes, we need to revisit it. And we’ve done some combination of trying to scrape things automatically and using to double check that.
Basically, it’s some combination of automatically pulling stuff in but also creating a fairly simple CMS for our team to go in and make modifications when things aren’t exactly right.
And when races get upgraded, those are things that we want to make sure to try to bring into the app as quickly as possible.
Looking back at this, I think we could definitely have done something more elegant on the ETL side. But we did kind of what we needed to do.
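The combination Josh describes, automated scraping plus manual corrections entered through a simple CMS, can be sketched as a merge in which human edits always win. The record fields and function name below are hypothetical; the interview does not describe the actual data model.

```python
def apply_overrides(scraped, overrides):
    """Merge scraped race records with manual corrections, keyed by race id.

    Fields set by a human override replace the scraped value; other scraped
    fields are kept. Neither input dictionary is modified.
    """
    merged = {}
    for race_id, record in scraped.items():
        merged[race_id] = {**record, **overrides.get(race_id, {})}
    # Races the scraper missed entirely can still be added by hand.
    for race_id, fix in overrides.items():
        merged.setdefault(race_id, dict(fix))
    return merged


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    scraped = {"race-1": {"rating": "Lean R", "candidate": "A. Example"}}
    overrides = {"race-1": {"rating": "Toss-up"}}
    print(apply_overrides(scraped, overrides))
```

Keeping overrides separate from scraped data means a weekly re-scrape never erases a correction someone made by hand.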
Ledge: So you said you went from zero idea to implementation, starting in July, to now. You went into production in a month then?
Josh: We really started this in May, June when we really started moving. We ran some very early experimentation to try to get a sense of how interesting this was. We did some paper-based user testing in San Francisco that we worked basically.
I would say that it was a couple of months to three months to get everything out the door.
At the last minute, we decided to change that that created a ton of work. But it was reasonably quick from start to finish.
The hardest thing here was trying to pool together all the data and then think about how to develop an algorithm that is reasonably sophisticated that gives people good suggestions but is also like human explainable so that we can say, “Here’s the recommendation and here’s why we made that recommendation” which, for us, is really important because we knew that trust is really important in politics.
And so, it’s making sure that people aren’t just going to say, “Oh, this algorithm told me to volunteer for this candidate so I’m going to do that.”
They want to know why.
And so, I think that potentially forced us to make it a little bit less complex. But that was an important part of it as well.
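A simple way to keep a recommendation "human explainable" is to have every scoring rule record a plain-language reason when it fires, so the result carries its own justification. The sketch below illustrates that idea only. The actual algorithm and its roughly eight questions are not described in the interview, so the `Race` fields, thresholds, and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Race:
    name: str
    competitiveness: float  # 0.0 (safe) to 1.0 (toss-up); hypothetical scale
    distance_miles: float   # distance from the volunteer; hypothetical input


def score_race(race, can_travel):
    """Return (score, reasons); each rule that fires adds a readable reason."""
    score = 0.0
    reasons = []
    if race.competitiveness >= 0.7:
        score += 10
        reasons.append(f"{race.name} is rated as highly competitive")
    if race.distance_miles <= 25:
        score += 5
        reasons.append(f"it is about {race.distance_miles:.0f} miles from you")
    elif can_travel:
        score += 2
        reasons.append("you said you are willing to travel")
    return score, reasons


def recommend(races, can_travel):
    """Pick the highest-scoring race and return it with its reasons.

    Raises ValueError if `races` is empty.
    """
    scored = [(score_race(race, can_travel), race) for race in races]
    (_, reasons), best = max(scored, key=lambda item: item[0][0])
    return best, reasons
```

A rule list like this is less sophisticated than a learned model, which matches the trade-off Josh mentions: some complexity is given up so that every suggestion can be shown with its "why".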
Ledge: What are the roles on the team and how did you organize it? That’s fast. That’s a lot of people doing a lot of things, a lot of integration. What’s the project management paradigm? How did you organize? Who were the people? What are they doing?
I think that’s a really important question for anybody trying to pull off an effort at all that fast. So how did you do that and how did that work?
Josh: We were doing weekly sprints and when we started we had the benefit of Purpose’s really award-winning creative team which was fantastic.
So, basically, we had a creative team that was riffing on the design and trying to get that as locked as possible. And we had a full time frontend and backend engineer who was helping do some of the early experimentation.
Actually, when we were locking in production, we had our writer and editor who was part of the team who was creating a lot of contentthird-party sources for all the data.
And I was sort of playing the jack of all trades, scrum master as well as product and pulling that together. And then, we also had a DevOps person who was making sure that everything was automatable and that our deployment procedure made sense and that our deployment gets up in AWS.
So that was roughly the team. We got into our groove in June, being clear about what stories we were playing every week, doing demos and showcases every week, and making sure we’re all clear on what we’re building and how.
It was a very ambitious project but we had a really awesome team of different types of folks coming together.
Ledge: Fantastic! Last question: So you built all this technology; election comes and goes; you’ve collected a bunch of data. What do you see as the future of it? Does it become something else? Does it evolve? Do you open source the data? What are you thinking about there?
Josh: We think that there is a lot of promise in this idea of helping individuals have more agency in their activism, in thinking about how they’re going to solve some of the biggest issues facing our world right now.
And so, we are really intrigued about how we take the same approach that we took here, which is, based on a bunch of information, making a recommendation of a plan for how to solve a particular problem.
In this case, that’s winning the election. But we think that type of solution might work in a number of different domains in the social sector.
And so, I think, coming out of the election, first of all, there’s definitely a lot of stuff that we’ve built which is very specific for the election but we do think we are keen to start experimenting with other domains.
We think that there’s a here that can be shared across them ─ essentially, an API that can give a bunch of information, can make recommendations on the most impactful plan for solving a particular problem.
And so, I think we’re going to spend a bunch of time trying to focus on that and trying to see if this approach will work in other domains, not least the 2020 election which is right around the corner. But we think on the issue side as well.
We know people are so interested in not just politics but anything relating to how we can solve some really tough problems in the world right now.
So we’re excited about that, and we think we’re probably going to have a fair amount of technical debt to clean up after election day, but we think there’s a lot of promise here.
Ledge: We did a great episode about technical debt and legacy code. So you’re welcome to check that out after the election and when you have some spare time.
Josh, thank you. This is super enlightening. It’s so much fun. It’s really good to have you here. Best of luck with the mission and you’ve got a crazy fifty days.
Keep up the good work! Thanks so much.