The Lack of Data

If I wanted to, I could tell you the precise angle and speed of every pitch thrown in a game of professional baseball. I could tell you all sorts of details about each of the nearly 300 pitches thrown in any of the 162 games each baseball team plays every year.

As someone who cares about data, I have found that you can learn a lot simply by looking at what data is, and is not, collected. If I knew nothing about human society except the detail with which we record each of our activities, I could reasonably infer that Americans care a lot about baseball. Every detail of these games, from each player's position on the field to the number of people watching from the stands, is carefully recorded, stored, and archived for later use.

When I want to learn more about something, one of the first things I do is look for the data that has been gathered about it. Data, as you may have heard, has been called the new oil. It is a commodity to be bought and sold, and the data about our interests, our desires, and our willingness to part with our money is invisibly traded and forms the cornerstone of some of America's largest businesses: Facebook, Google, Amazon, and a tumultuous school of would-be competitors following closely behind.

But when you go looking for data about, say, the number of people whom police departments across America have shot or killed, there is an eerie absence.

And once you have noticed one huge, important absence of data, you can't stop seeing others.

How many rape kits remain untested in America's largest cities? We don't really track that.

How many guns were registered per county in the last year? We don't really track that.

How many people were killed by police in the last year? Well, we didn't really track that either, until several newsrooms and nonprofit organizations took it upon themselves to do the work the police should have been doing, using their own resources to track and publish the statistics needed to understand one of the biggest problems our society faces.

It is certainly not because of a lack of ability to track these things. If I drive by a house and want to know what it costs, the prices of all of its previous sales, and its scores on various indices that rate its proximity to schools, grocery stores, and public transit, I can look all of that up easily.

If I own a mall and want to track the phone of every person who walks through the door, I can rent a service that will do so. It tracks each person's device via Bluetooth or Wi-Fi and records their path around the mall and how long they linger in different sections. I might use this data to rearrange the layout to encourage more purchases.

We are able to track the things we care about in incredible, awe-inspiring detail. Sometimes, though not always, that detail helps us understand the world around us, design ways to do things better (or at least differently), and then see how the data changes.

But without a baseline, there is no way to determine whether an intervention is successful. If we don't know a patient's cholesterol level before treatment, we can't tell whether the medicine we have given them is working. So, for the things we care about, we track them and use that data to guide our actions.

#data #dataviz