by Ben Tarnoff
What is big data? And how do we democratize it?
Google knows you’re pregnant. Spotify knows your favorite throwback jams.
Is this convenient or creepy? It depends. Anxiety is a feeling that tends to come in waves—big data anxiety is no different. One minute, you’re grateful for the personalized precision of Netflix’s recommendations. The next, you’re nauseated by the personalized precision of a Facebook ad.
Big data has been around for a while, but our discomfort with it is relatively recent. We’ve always had dissenters sounding the alarm about Silicon Valley’s surveillance-based business model. It’s only since 2016, however, that their message has gone mainstream. The election of Donald Trump punctured many powerful fictions, among them the belief in the beneficence of the tech industry. The media, long captive to the tales that Silicon Valley tells about itself, has turned a sharper eye on tech. Among other things, this has meant greater public awareness of how a handful of large companies use technology to monitor and manipulate us.
This awareness is a wonderful thing. But if we want to harvest the political opportunity it presents, and channel the bad feelings swirling around tech into something more enduring and transformative, we need to radicalize the conversation. The techno-skeptical turn is fragile, incomplete—it needs to be consolidated, intensified. It’s good that more people see a problem where they didn’t before. The next step is showing them that the problem is much larger than they think.
The problem is not personal. Yes, our private lives are being pillaged on an unprecedented scale. Information as trivial or as intimate as our favorite sandwich or our weirdest sexual fantasy is being hoarded in data centers and strip-mined for profit.
But big data is bigger than that. It is not merely the mechanism whereby Google learns you’re pregnant. It is not confined to the cluster of companies that we know, somewhat imprecisely, as the tech industry.
Rather, big data describes a particular way of acquiring and organizing information that is increasingly indispensable to the economy as a whole. When you think about big data, you shouldn’t just think about Google and Facebook; you should think about manufacturing and retail and logistics and healthcare. You should think about pretty much everything.
Understanding big data, then, is crucial for understanding what capitalism currently is and what it is becoming—and how we might transform it.
What Makes Data Big?
As long as capitalism has existed, data has helped it grow. The boss watches how workers work, and rearranges them to be more efficient—this is a good example of how surveillance generates information that’s used to improve productivity. In the early twentieth century, Frederick Winslow Taylor made systematic surveillance of the productive process a key part of “scientific management,” a set of widely influential ideas about how to increase industrial efficiency.
Data is useful for capitalism. That’s not new. What’s new is the scale and significance of data, thanks to breakthroughs in information technology.
Take scale. Digitization makes data infinitely more abundant, because it becomes much easier to create, store, and transmit. You can slap a sensor on almost anything and stream data from it—an assembly line, a gas turbine, a shipping container. Our ability to extract information from the productive process in order to optimize it has reached a level of sophistication far beyond anything Taylor could’ve ever imagined.
But observing the productive process isn’t the only way we create data. More broadly, we create data whenever we do anything that is mediated or monitored by a computer—which, at this point, is almost everything. Information technology has been woven into the entire fabric of the economy. Just because you’re not directly using a computer doesn’t mean you’re not making information for someone somewhere. Your credit score, your healthcare history—simply by virtue of being alive in an advanced capitalist country, you are constantly hemorrhaging data.
No single technology contributes more powerfully to our perpetual data hemorrhage than the internet, of course. The internet both facilitates the flow of data and constantly creates more of it. It goes without saying that everything we do online leaves a trace. And companies are working hard to ensure that we leave more traces, by putting more of our lives online.
This is broadly known as the “Internet of Things”: by placing connected devices everywhere, businesses hope to make corporate surveillance as deeply embedded in our physical environment as it is in our virtual one. Imagine a brick-and-mortar store that watches you as closely as Facebook, or a car that tracks you as thoroughly as Google. This kind of data capture will only grow in coming years, as the already porous boundary between online and off disappears.
At one level, then, big data is about literal bigness: the datasets are larger and more diverse because they are drawn from so many different sources. But big data also means that data can be made more meaningful—it can yield valuable lessons about how people or processes behave, and how they’re likely to behave in the future.
This is true for a few reasons. It’s partly because we have more data, partly because we have faster computers, and partly because developments in fields like machine learning have given us better tools for analysis. But the bottom line is that big data is driving the digitization of everything because any scrap of information, when combined with many other scraps and interpreted en masse, may reveal actionable knowledge about the world. It might teach a manufacturer how to make a factory more efficient, or an advertiser what kind of stuff you might buy, or a self-driving car how to drive.
If information can come from anywhere, then it can hold lucrative lessons for any industry. That’s why digitization is becoming as important to capitalism as financialization became during and after the 1970s. Digitization, as scholars like Shoshana Zuboff and Nick Srnicek have shown, offers a new engine of capital accumulation. It gives capitalism a new way to grow.
Rosa Luxemburg once observed that capitalism grows by consuming anything that isn’t capitalist. It eats the world, to adapt Marc Andreessen’s famous phrase. Historically, this has often involved literal imperialism: a developed country uses force against an undeveloped one in order to extract raw materials, exploit cheap labor, and create markets. With digitization, however, capitalism starts to eat reality itself. It becomes an imperialism of everyday life—it begins to consume moments.
Because any moment may be valuable, every moment must be made into data. This is the logical conclusion of our current trajectory: the total enclosure of reality by capital. In the classic science-fiction film The Blob, a meteorite carrying an alien amoeba lands in a small town. The amoeba starts expanding, swallowing up people and structures, threatening to envelop the whole town, until the Air Force swoops in and airlifts it to the Arctic.
Big data will eventually become so big that it devours everything. One way to respond is to try to kill it: to rip out the Blob and dump it in the Arctic. That seems to be what a certain school of technology critics wants. Writers like Franklin Foer denounce digitization as a threat to our essential humanity, while tech industry “refuseniks” warn us about the damaging psychological effects of the technologies they helped create.
This is the path of retreat from the digital, towards the “authentically human”—an idea that’s constantly invoked by the new techno-moralists but rarely defined, although it’s generally associated with reading more books and having more face-to-face conversations.
The other route is to build a better Blob.
Building a Better Blob
Data is the new oil, says everyone. The analogy has become something of a cliche, widely deployed in media coverage of the digital economy.
But there’s a reason it keeps coming back. It’s a useful comparison—more useful, in fact, than many of the people using it realize. Thinking of data as a resource like oil helps illuminate not only how it functions, but how we might organize it differently.
Big data is extractive. It involves extracting data from various “mines”—Facebook, say, or a connected piece of industrial equipment. This raw material must then be “refined” into potentially valuable knowledge by combining it with other data and analyzing it.
Extractive industries need to be closely regulated because they generate all sorts of externalities: costs that aren’t borne by the company, but are instead passed on to society as a whole. There are certain kinds of resources, like fossil fuels, that we shouldn’t be extracting at all, because those costs are far too high. There are others that we should only be extracting under very specific conditions, with adequate protections for workers, the environment, and the broader public. And democratic participation is crucial: you shouldn’t build a mine in a community that doesn’t want it.
These principles offer a framework for governing big data. There are certain kinds of data we shouldn’t be extracting. There are certain places where we shouldn’t build data mines. And the incredibly complex and opaque process whereby raw data is refined into knowledge needs to be cracked wide open, so we can figure out what further rules are required.
Like any extractive endeavor, big data produces externalities. The extractors reap profits, while the rest of us are left with the personal, social, and environmental consequences. These range from the annihilation of privacy to algorithmic racism to a rapidly warming climate—the world’s data centers, for instance, put about as much carbon into the atmosphere as air travel.
Society, not industry, should decide how and where resources are extracted and refined. Big data is no different.
Giving People Stuff
Regulating big data is a good start, but it’s far from revolutionary. In fact, it’s already begun: the General Data Protection Regulation (GDPR) that takes effect in the European Union in 2018 embodies aspects of this approach, imposing new obligations on companies that collect personal data. Congress isn’t anywhere close to passing something similar, but it’s not impossible to imagine some basic protections around data privacy and algorithmic transparency emerging within the next decade.
More public oversight is welcome, but insufficient. Regulating how data is extracted and refined is necessary. To democratize big data, however, we need to change who benefits from its use.
Under the current model, data is owned largely by big companies and used for profit. Under a more democratic model, what would it look like instead?
Again, the oil metaphor is useful. Developing countries have often embraced “resource nationalism”: the idea that a state, not foreign corporations, should control the resources found within its borders. A famous example is Mexico: in 1938, President Lázaro Cárdenas nationalized the country’s oil reserves and expropriated the equipment of foreign-owned oil companies. “The oil is ours!” Mexicans cheered.
Resource nationalism isn’t necessarily democratic. Revenues from nationalized resources can flow to dictators, cronies, and militaries. But they can also fund social welfare initiatives that empower working people to lead freer, more self-directed lives. The left-wing governments of Latin America’s “pink tide,” for instance, plowed resource revenues into education, healthcare, and a raft of anti-poverty programs.
In a democracy, everyone should have the power to participate in the decisions that affect their lives. But that’s impossible if they don’t have access to the things they need to survive—and, further, to fulfill their full potential. Human potential is infinite. “You can be anything when you grow up,” parents tell their kids, a phrase we’ve heard so often it’s become a cliche—but which, when taken literally, is a genuinely radical thing to say. It’s a statement that would’ve been considered laughable for most of human history, and remains quite obviously untrue for the vast majority of the human race today.
How could we make it true? In part, by giving people stuff. And this stuff can be financed out of the wealth that society holds and creates in common, including the natural wealth that Thomas Paine once called “the common property of the human race.”
Data isn’t natural, but it’s no less a form of common property than oil or soil or copper. Resources that come from a planet we all happened to be born onto belong to everyone—they’re our “natural inheritance,” said Paine. Data is similar. Data is made collectively, and made valuable collectively.
We all make data: as users, workers, consumers, borrowers, drivers. More broadly, we all make the reality that is recorded as data—we supply the something or someone to be recorded. Perhaps most importantly, we all make data meaningful together, because the whole point of big data is that interesting patterns emerge from collecting and analyzing large quantities of information.
This is where the excessive emphasis on personal data is misleading. Personal data represents only one portion of the overall data pool. And even our personal data isn’t especially personal to the companies that acquire it: our information may have enormous significance for us, but it’s not particularly significant until it’s combined with lots of other people’s information.
There’s a contradiction here, the most fundamental contradiction in capitalism: wealth is made collectively, but owned privately. We make data together, and make it meaningful together, but its value is captured by the companies that own it, and the investors who own those companies. We find ourselves in the position of a colonized country, our resources extracted to fill faraway pockets. Wealth that belongs to the many—wealth that could help feed, educate, house, and heal people—is used to enrich the few.
The solution is to take up the template of resource nationalism, and nationalize our data reserves. This isn’t as abstract as it sounds. It would begin with the recognition, enshrined in law, that all of the data extracted within a country is the common property of everyone who lives in that country.
Such a move wouldn’t necessarily require seizing the extractive apparatus itself. You don’t have to nationalize the data centers to nationalize the data. Companies could continue to extract and refine data—under democratically determined rules—but with the crucial distinction that they are doing so on our behalf, and for our benefit.
In the oil industry, companies often sign “production sharing agreements” (PSAs) with governments. The government hires the company as a contractor to explore, develop, and produce the oil, but retains ownership of the oil itself. The company bears the cost and risk of the venture, and in exchange receives a portion of the revenue. The rest goes to the government.
Production sharing agreements are particularly useful for governments that don’t have the machinery or expertise to exploit a resource themselves. This is certainly true in the case of big data: there is no government in the world that can match the capacity of the private sector. But governments have something the private sector doesn’t: the power to make and enforce laws. And they can use that power to ensure that data extractors pay for the privilege of making a profit from common property.
The Data Dividend
Bringing data revenues into public coffers is only the first step. To avoid the bad forms of resource nationalism, we would also need to distribute those revenues as widely as possible.
In 1976, Alaska established a sovereign wealth fund with a share of the rents and royalties collected from oil companies drilling on state lands. Since 1982, the fund has paid out an annual dividend to every eligible Alaska resident. The exact amount fluctuates with the fund’s performance, but in the last few years, it’s generally ranged from $1,000 to $2,000.
We could do the same with data. In exchange for permission to extract and refine our data, companies would be required to pay a certain percentage of their data revenue into a sovereign wealth fund, either in cash or stock. The fund could use that capital to acquire other income-producing assets, as the Alaskan fund has, and pay out an annual dividend to all citizens. If it were generous enough, this dividend could even function as a universal basic income, along the lines of what Matt Bruenig has proposed.
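To make the mechanics concrete, here is a minimal sketch of how such a fund might work, written in Python. Everything in it is an assumption made for illustration: the `DataFund` class, the levy rate, the investment return, the payout rate, and the revenue and population figures are all hypothetical, and none of them come from the Alaskan fund or from any actual proposal.

```python
from dataclasses import dataclass


@dataclass
class DataFund:
    """Hypothetical sovereign data fund. All names and rates are illustrative."""

    balance: float = 0.0

    def collect(self, data_revenues: list[float], levy_rate: float) -> float:
        """Charge each company a share of its data revenue and bank the total."""
        levy = sum(revenue * levy_rate for revenue in data_revenues)
        self.balance += levy
        return levy

    def grow(self, annual_return: float) -> None:
        """Apply one year of investment returns to the fund's balance."""
        self.balance *= 1 + annual_return

    def pay_dividend(self, payout_rate: float, population: int) -> float:
        """Pay out a fraction of the balance, split equally among everyone."""
        total_payout = self.balance * payout_rate
        self.balance -= total_payout
        return total_payout / population


# Invented figures: two companies with $100 billion and $50 billion in
# data revenue, a 10% levy, a 5% return, and a 4% payout to 300 million people.
fund = DataFund()
fund.collect([100e9, 50e9], levy_rate=0.10)
fund.grow(annual_return=0.05)
dividend = fund.pay_dividend(payout_rate=0.04, population=300_000_000)
print(f"Dividend per person: ${dividend:.2f}")
```

The sketch pays out only a fraction of the balance each year, so the fund keeps growing as new levies arrive. How large the dividend gets depends entirely on the rates chosen, which are political decisions rather than technical ones.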
A data fund that distributes a data dividend would help democratize big data. It would enable us to collectively benefit from a resource we collectively create. It would transform data from a private asset stockpiled by corporations to make a small number of people rich into a form of social property held in common by everyone who helps create it.
If we’re going to require companies to pay a chunk of their data revenue into a fund, however, we first have to measure that revenue. This isn’t always easy. A company like Facebook, by virtue of its business model, is wholly dependent on data extraction—all its revenue is data revenue. But most companies don’t fall into that category.
Boeing, for instance, uses big data to help manufacture and maintain its planes. A 787 can produce more than half a terabyte of data per flight, thanks to sensors attached to various components like the engines and the landing gear. This information is then analyzed for insights into how to better preserve existing planes and build new ones. So, how much of Boeing’s total revenue is derived from data?
Further, how much of a company’s data revenue can be attributed to one country? Big data is global, after all. If an interaction between an American and a Brazilian generates data for Facebook, where was that data extracted? And if Facebook then refines that data by combining it with information sourced from dozens of other countries, how much of the value that’s subsequently created should be considered taxable for our data fund?
Measuring data’s value can be tricky. Fortunately, scholars are developing tools for it. And politics can help: in the past, political necessity has motivated the creation of new economic measurements. In the 1930s, the economist Simon Kuznets laid the basis for modern GDP because FDR needed to measure how badly the Great Depression had hurt the economy in order to justify the New Deal.
Economic measurement doesn’t happen in a vacuum. Political power helps determine which parts of the economy are worth measuring, and how those measurements are understood. If we can build enough power to make our data ours, we can build enough power to measure what it’s worth.
Every analogy breaks down eventually. Thinking about data as the new oil takes us a fair distance towards understanding how it works, how to regulate it, and how to socialize it.
But data is also very different from oil, or any other resource. That’s because it has genuinely radical potential. It’s not just a source of profit; it’s also, possibly, a mechanism for moving beyond profit as the organizing principle of our economic life.
Maybe the most intriguing idea from the Marxist tradition is that capitalism creates the conditions for its overcoming—that the building blocks for making a better world are already present in our own. Information technology is almost certainly one of those building blocks. Data gives capitalism a new way to grow, yes, but it also might give us a way to turn capitalism into something else.
One of capitalism’s sustaining myths is that it’s unplanned. Markets impartially, impersonally allocate wealth; Detroit goes bankrupt, Jeff Bezos makes another billion dollars, all because of something called the market. In truth, however, capitalism is planned. The planners are banks and other large financial institutions, as the economist J.W. Mason has pointed out—they make the decisions about how to allocate wealth, and their decisions are anything but impartial or impersonal.
What if those decisions were democratic? What if everyone had the power to help make them? Such an economy would still be planned, of course. But planning would have to become more explicit and more participatory. This would also presumably change what an economy is for: if everyone had a say over how society organizes its wealth, the economy would no longer be run solely for the purpose of profit-making. It would become a machine for fulfilling human needs.
Fulfilling human needs is a daunting task. After all, people’s needs vary. We all share some big ones, like the need for food, shelter, healthcare, and a habitable planet. But beyond the basics, needs can get pretty varied.
For that reason, democratic planning is likely to be more complex than the capitalist variety. Drug addicts often talk about the clarity of addiction, how it simplifies one’s life by structuring it around a single goal: scoring the next dose. The clarity of capitalism is similar: it structures the economy around profit-making. In a society without this compulsion, the economy becomes less simple. Planning no longer serves a single goal, but many.
This is where data comes in. Information technology has the potential to be planning’s killer app. It offers tools for meeting the complexity of the task, by enlarging our capacities for economic coordination.
Better Laid Plans
The idea of using computers to plan an economy isn’t new. The Soviets briefly experimented with it in the 1960s, Salvador Allende’s Chile explored it in the 1970s, and Western leftists have been particularly interested in it since the 1990s. In 1993, W. Paul Cockshott and Allin Cottrell published Towards a New Socialism, which proposed that advances in computing made a more efficient, flexible, and liberating form of planning possible—a theme picked up by more recent “accelerationist” works like Inventing the Future by Nick Srnicek and Alex Williams.
The dream of a digitally run economy is an old one, then. But it’s rapidly becoming more workable, as vast new quantities of information become available.
The problem of planning is primarily a problem of information. Friedrich Hayek famously said that planning couldn’t work because markets have more information than the planners. Markets give us prices, and prices determine what to produce, how to allocate assets, and so on. Without markets, you don’t have the price mechanism, and thus you lose a critical source of information. In Hayek’s view, this explained the inefficiencies of Soviet-style command economies, and their failure to meet people’s material demands.
As more of our economy is encoded as data, however, Hayek’s critique no longer holds. The Soviet planner couldn’t possibly see the entire economy. But the planner of the near future might. Data is like the dye that doctors inject into a patient’s veins for an MRI—it illuminates the entire organism. The information delivered by prices looks crude by comparison. Who needs prices when you know everything?
Greater transparency enables greater coordination. Imagine a continuous stream of data that describes all economic activity in granular detail. This data could be analyzed to obtain a clearer picture of people’s needs, and to figure out how to fulfill those needs in the most efficient and sustainable way.
Even better, much of this process could be automated. Economic democracy has the potential to be terribly time-consuming. Everyone should have the opportunity to participate in the decisions that most affect them, but nobody wants to make every decision. Nick Srnicek and Alex Williams offer one possible solution: rather than subject every last detail of the economy to democratic deliberation, we could come up with our preferred outcomes—“energy input, carbon output, level of inequality, level of research investment and so on”—and let the algorithms worry about how to get there.
This is an excellent future, and an entirely feasible one. But it’s far from guaranteed. Transparency, coordination, automation—if these have democratic possibilities, they have authoritarian ones as well.
China is likely to be the innovator in this respect. The government is developing a “Social Credit System” that uses big data to rate citizens’ “trustworthiness.” China also happens to be investing heavily in big data and artificial intelligence, which suggests that more sophisticated forms of surveillance and control will soon emerge.
Technology helps set the parameters of possibility. It frames our range of potential futures, but it doesn’t select one for us. The potential futures framed by big data have a particularly wide range: they run from the somewhat annoying to the very miserable, from the reasonably humane to the delightfully utopian. Where we land in this grid will come down to who owns the machines, and how they’re used—a matter for power, and politics, to decide.
Ben Tarnoff is a founding editor of Logic.