Using Elections Data With the Chicago City Data Users Group

“Democracy depends on people giving scrutiny to all of this at all levels; the machinery of elections and the politicians making decisions about all that they do.” – Ryan Chew, Deputy Director of Elections, Cook County

We are in an election season in the Chicago area. It was therefore timely for the Chicago City Data Users Group to hear from the folks at Cook County who put an enormous amount of effort into what is nothing less than the key process for a functioning liberal democracy. Ryan Chew is the Deputy Director of Elections for Cook County. Geetha Lingham is the Manager for the voter and ballot data. Both addressed voter-related data and the complexity involved in turning data sets into accurate ballots for voters.

Cook County manages elections for a large suburban area (the City of Chicago has its own elections board) that includes 129 villages and cities, covers roughly 40×20 miles, and has 1.5 million voters spread across hundreds of jurisdictions. And that is not what makes it complicated. What makes it complicated is that each person is part of many districts that each have their own geographical boundaries…and elected officials.

For example, look at the candidates a voter in a single village can elect: that will include candidates for the school district, the high school district, the community college district, the park district, the library district, maybe the aldermanic district, along with the mayoral district and the congressional district, all with different boundary lines. At any given election, any resident belongs to between 4 and 16 different districts. And the proper ballot has to be created for each citizen (called ballot entitlement).

For each of the 1.5 million voters, the county has to aggregate all that information into a ballot that is ready for them on the day they vote. If one of those boundaries is off, incorrect, or unclear, their ballot is wrong. For this last election, there were nearly 800 different ballot styles for those 1.5M registered voters of the 129 municipalities. That number is probably not as high as it gets, as this was an election for officials of taxing bodies (as opposed to political officials). So again, think of all that is at stake: the process is key to the functioning of the democracy; the volume of voters is high; each one is a priority; and standardization is low. That is the job facing the County Elections officials who spoke to us about their use of data.
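To make the idea of a "ballot style" concrete, here is a minimal Python sketch. It assumes that a ballot style is simply a distinct combination of districts, so that voters who share exactly the same set of districts receive the same ballot. The voter IDs, district names, and the `ballot_styles` function are hypothetical illustrations, not the County's actual system.

```python
from collections import defaultdict


def ballot_styles(voter_districts):
    """Group voters by the exact combination of districts they belong to.

    voter_districts maps a voter ID to an iterable of district names.
    Returns a dict mapping each distinct district combination (a frozenset)
    to the list of voter IDs that share it.
    """
    styles = defaultdict(list)
    for voter_id, districts in voter_districts.items():
        # frozenset ignores ordering, so the same districts listed in a
        # different order still produce the same ballot style.
        styles[frozenset(districts)].append(voter_id)
    return dict(styles)


# Hypothetical voters and district names, for illustration only.
voters = {
    "v1": ["School 21", "Park A", "Library 3", "Congress 9"],
    "v2": ["Park A", "School 21", "Congress 9", "Library 3"],
    "v3": ["School 25", "Park A", "Library 3", "Congress 9"],
}

styles = ballot_styles(voters)
print(len(styles), "distinct ballot styles")
```

Here v1 and v2 share a style, while v3 differs only in school district and therefore needs a different ballot. One mismatched boundary is enough to put a voter in the wrong group.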

To really appreciate the process today, you have to take a look at a brief timeline of Cook County Elections:

1950’s: 60 years ago, voter registration meant a trip to city hall, or an election judge came to your door. A voter was removed from the list if the election judge determined that the voter didn’t live there anymore. So this was a process open to both fraud and honest mistakes. Not very efficient.

1959: The voter list was so ungainly that the decision was made to throw it out in its entirety and start all over. By the time the new list was ready, it was 1961, and all would-be voters needed to re-register. That itself was an exercise on a massive scale, prone to error and vagaries. For example, someone would register with an address that said “the corner of First and Main”. Which corner? Northeast? Southwest? It makes a difference to the ballot you are going to receive. Did you register with a Rural Route? Those change to street names over time.

1970’s: The County saw that this method of registration was not going to scale. Too few election judges, too many questions, too many people. So they decided that organizations could take training and register people to vote. Ryan used the example of the League of Women Voters. That expanded through the 80’s. Having more people drive registration helps with scale, but it doesn’t help the accuracy of the registrars.

1990’s: The Motor Voter law was enacted. Federal law stated that when you get your driver’s license or state ID, you can register on the spot (i.e. using Secretary of State data). This is something we take for granted today. They also expanded mail-in registration. It was much better, but a mail-in registration where handwritten cards need to be computerized is still prone to error.

2000’s: Today you can register online with a few uniquely identifying pieces of data, such as a social security number or driver’s license number. Privacy-conscious citizens may bristle, but that data is needed because the Elections team has to balance privacy concerns with the requirement of ensuring you are who you say you are.

This in itself is better, but not perfect. The County is getting all of these streams of data with varying levels of quality from a variety of sources. Since the goal is to take these various streams and use them to understand who is who and where they live, a good amount of this data needs to be shaped and re-shaped. When they send data to the Social Security Administration for validation, and it comes back saying “there is no such person”, they need to figure out why – every time. When they send a driver’s license number to the Secretary of State to ensure that the voter is a real person, and that comes back negative, the County has to figure out why – every time.

And then there are the “DelVanO’s”. Ryan said that these are people with names beginning with “Del”, “Van”, or “O’”. And there are lots of these in Chicago. They may, at different times, have put a space after the first part, and other times not! A data cleansing nightmare. This is just one example, but the Elections team has all sorts of data issues that require them to go to many other systems to determine where there was a typo and where there are actually two different people, and to confirm that every voter registration has a real identity behind it.
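One common way to approach the “DelVanO” problem is to compare names on a normalized key rather than on the raw text. The Python sketch below is only an illustration of that general technique, assuming that spacing, apostrophes, and capitalization are the parts that vary; `name_key` is a hypothetical helper, not something described in the talk.

```python
def name_key(surname):
    """Build a comparison key that ignores case, spaces, and punctuation.

    Only letters are kept, so "O'Brien", "O Brien", and "OBRIEN" all
    produce the same key.
    """
    return "".join(ch for ch in surname.casefold() if ch.isalpha())


# Hypothetical spellings of the same surnames, for illustration only.
print(name_key("Del Rio") == name_key("DelRio"))
print(name_key("Van Buren") == name_key("VANBUREN"))
print(name_key("O'Brien") == name_key("O Brien"))
```

A matching key only flags two records as candidates for being the same person. Deciding whether they really are one voter or two still requires checking other fields, such as birth date or address.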

This is not to say that this is necessarily a manual process (although a certain amount of it is). For new voter registrations, there are good tools that can validate that you are who you say you are against another system (SSN validation, DL validation, etc). They are now starting to use these tools to scrub the entire list, not just new registrations, to correct mistakes. For example, if someone dies, the next time the list is scrubbed it can be matched against a data set of death certificates, and the dead can be removed from the voter rolls (Ryan was careful to note that the dead do not vote in Cook County, they just may be on the voter rolls).
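As a rough illustration of that kind of scrub, the Python sketch below matches a voter roll against a set of death records. The field names (`voter_id`, `name`, `birth_date`), the sample records, and the choice to match on name plus birth date are all assumptions made for the example; the talk did not describe which fields or tools the County actually uses.

```python
def flag_possible_deceased(voter_roll, death_records):
    """Return voter IDs whose (name, birth date) appears in the death records.

    Names are compared case-insensitively. The result is a list of records
    to review, not an automatic removal.
    """
    deceased = {
        (d["name"].casefold(), d["birth_date"]) for d in death_records
    }
    return [
        v["voter_id"]
        for v in voter_roll
        if (v["name"].casefold(), v["birth_date"]) in deceased
    ]


# Hypothetical records, for illustration only.
voter_roll = [
    {"voter_id": 1, "name": "Pat Example", "birth_date": "1930-01-15"},
    {"voter_id": 2, "name": "Sam Sample", "birth_date": "1975-06-02"},
]
death_records = [
    {"name": "PAT EXAMPLE", "birth_date": "1930-01-15"},
]

print(flag_possible_deceased(voter_roll, death_records))
```

Returning a list to review, rather than deleting on a match, reflects the point made earlier: two records that look alike may still be two different people.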

So that is the trajectory of where we have come from and where we are today. Diving a little deeper into today’s process gives you some appreciation for what the Elections team goes through, and the data required to do it, in order to protect the democratic process. Geetha Lingham is the Manager of Data for Voter Data and Elections Data. It is her job to ensure that they take the voter registration data and accurately associate it with the complex system of jurisdictions.

For even the smallest of villages, that means figuring out where you pay taxes, to whom, and whom you are entitled to vote for and elect. Again, think of all the districts that a single person belongs to. In some cases, your district may be clear (e.g. the park district). In others, the boundary for the school district may literally go right through your house. So aggregating all that information to get a completely accurate ballot is a complex task.
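Deciding which side of a boundary an address falls on is, at its core, a point-in-polygon test. The sketch below uses the standard ray-casting approach in plain Python purely to illustrate the idea. The square "district" and the coordinates are made up, and a real GIS would handle projections, precision, and points that sit exactly on a boundary far more carefully.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?

    polygon is a list of (x, y) vertices in order. A horizontal ray is cast
    to the right of the point; an odd number of edge crossings means the
    point is inside. Points exactly on an edge are not handled specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the ray's height can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical square district boundary, for illustration only.
district = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, district))  # a house inside the district
print(point_in_polygon(5, 2, district))  # a house just outside it
```

When a boundary runs through a building, the answer depends on which single point is chosen to represent the address, which is one reason such cases need human review.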

Geetha works with three primary types of data:

Parcel data: This is where they try to locate the parcel of property where a citizen resides, primarily by address. The parcel data contains a five-digit tax code that identifies the taxing bodies relevant to you. They relate this data to the voter. The good news is that this is an accurate way to identify where you live. The downside is that, while census data changes every decade, taxing data can change every day.

Census data: This is used to identify the voter’s political districts.

Precinct data: This is for the administrative units for the elections.

These three come together in a GIS, using ArcGIS. The GIS provides an index of the nearly 40,000 streets and 630,000 unique addresses. Each of these datasets uses its own notion of streets (addresses have not yet been standardized – more on this topic in a future blog). The GIS helps to reconcile all of that spatial data. Recall that the goal here is to make sure that they get the right people in the right location so that they can create the right ballot.
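Stripped of the spatial work, relating a voter to taxing bodies through parcel data is a chain of two lookups: address to tax code, then tax code to taxing bodies. The Python sketch below shows only that chain. The address, the tax code value, the body names, and both lookup tables are invented for the example and do not come from the County's data.

```python
# Hypothetical lookup tables, for illustration only.
PARCEL_TAX_CODE = {
    "100 N MAIN ST": "24017",
}
TAX_CODE_BODIES = {
    "24017": ["Park District A", "School District 21", "Library District 3"],
}


def taxing_bodies(address):
    """Return the taxing bodies for an address, or None if no parcel matches.

    The address must already be in the same form used as the parcel key.
    """
    tax_code = PARCEL_TAX_CODE.get(address)
    if tax_code is None:
        return None  # no parcel found: the record needs manual review
    return TAX_CODE_BODIES.get(tax_code, [])


print(taxing_bodies("100 N MAIN ST"))
print(taxing_bodies("100 MAIN STREET"))
```

The second call returns `None` because the address is written differently from the parcel key, which shows why inconsistent street naming across datasets matters so much.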

GIS helps significantly, but there is a good deal of manual cleansing that takes place. This data is coming from many sources that have different ways of looking at things like street addresses (Main Street vs. N. Main Street; Randolph Drive vs. Randolph Street) and that are all updated at different frequencies. The volume is huge, the data is critically important, and the process is fundamental to our functioning form of government. It takes dedicated people.
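Part of that cleansing can be automated by normalizing addresses into one consistent form before comparing them. The Python sketch below is a minimal, assumed example of that technique. The abbreviation table is a tiny hypothetical subset, and `normalize_address` is not a function from ArcGIS or from the County's tooling.

```python
# A small, hypothetical abbreviation table; real tables are far larger.
ABBREVIATIONS = {
    "STREET": "ST",
    "DRIVE": "DR",
    "AVENUE": "AVE",
    "NORTH": "N",
    "SOUTH": "S",
    "EAST": "E",
    "WEST": "W",
}


def normalize_address(raw):
    """Uppercase, drop periods and commas, and abbreviate common words."""
    tokens = raw.upper().replace(".", "").replace(",", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


print(normalize_address("100 N. Main Street"))
print(normalize_address("100 north main st"))
print(normalize_address("100 Main Street"))
```

The first two inputs normalize to the same string, so they can be matched automatically. The third lacks a directional and stays different, and "Randolph Drive" versus "Randolph Street" would stay different too. Normalization cannot tell whether those are the same place, so such cases still fall to manual review.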

Ryan ended his portion of the evening by saying that “Democracy depends on people giving scrutiny to all of this at all levels; the machinery of elections and the politicians making decisions about all that they do”. It also requires passionate people with strong data skills, such as those at Cook County, to make it work for everyone. Watch the full presentation here.

Adam J. Hecktman