As big data becomes more and more popular, the push for bigger and bigger data sets becomes inevitable. Our appetite for data goes hand in hand with the desire for transparency from our corporations, governments, and other institutions. Data can answer so many questions and do so much good, but coming to the party on a data high with a “you can never have too much” mentality can be dangerous.
One person who got bitten by this “more data and transparency is better” philosophy was Gavin Newsom, current Lieutenant Governor of California and former mayor of San Francisco. During his time as mayor, he was a big proponent of total government transparency. But he quickly realized that this creates a lot of potential for your words to be “misinterpreted or [to] undermine your cause”. As this happened more and more, he found himself self-censoring to avoid the headache of a quote (possibly misused or taken out of context) being used against him. He goes on to say that “if you can write in emails only the things that you’d be happy to see on the front page of the local newspaper, you have to leave a lot out.”
This effect goes beyond government officials, though. It can also affect citizens. In 2013, Brian Bergstein, executive editor of the MIT Technology Review, wrote a piece describing how citizens who donated over $100 to Prop 8 had their personal information, including names, zip codes, and some of their employers, published online. Some of these individuals then reported being harassed or having their businesses boycotted. In response to this, Evgeny Morozov, senior editor at The New Republic, says that “we are too often making this trade-off –– opting to publish more information to increase transparency even if it undermines principles such as privacy or civic involvement.”
“If you want to keep a secret, you must also hide it from yourself.”
― George Orwell, 1984
The dangers of too much data can go beyond the institutional level and affect people at a much more personal level as well.
There are many criticisms of the quantified self, which is the practice of tracking facets of your life such as steps, sleep, weight, and so on. The argument against this trend is that people who are predisposed to this kind of tracking are more susceptible to anxiety around health and to more extreme problems like body dysmorphia or eating disorders.
Granted, the arguments I’ve found along these lines are only speculative, and my personal experiences with the quantified self have not been so disastrous. I enjoy keeping track of various aspects of my day-to-day life. However, I also have to stay aware of when it does more harm than good. One way it does me harm is when my sleep tracker malfunctions and I lose several nights of sleep data. I find myself getting extremely frustrated, as if this sleep was somehow “lost”. In retrospect, this frustration is doing me harm. However, because I track my sleep, I do find myself sleeping more and on a more consistent schedule, so there are beneficial changes to my behavior along with the harmful ones.
But this last point is the common thread of all of these examples: knowing you’re being watched causes you to change your behavior. This “Heisenberg uncertainty principle” for data science is reason enough to think twice before adopting a more-data-is-unquestionably-better mentality. Data collection, just like every other step of data analysis, should therefore be mindful and measured. Opening up data to the public is not always the panacea that it’s touted to be, and privacy (even from yourself) can be valuable and freeing and, as such, should be part of the equation.