Get a demo of a Bloomberg terminal. You’ll be blown away by the depth of the available data: thousands of statistics, historical tables, sources… Everything is accessible through the proprietary terminal. Bloomberg started by offering a real-time news flow dedicated to the needs of the financial community: traders, analysts, etc. Over the years, the system expanded in two directions. First, remarkable journalistic work grew Bloomberg from a one-dimensional newswire into a multi-product company providing breaking news, feature stories, in-depth reporting, a TV feed, radio, podcasts, even a magazine. The service is encapsulated in a terminal rented for a fixed price (€1,800 a month): no discounts, no complex pricing structure, just one product, that’s it. (This choice of integrating content into a piece of hardware reminds me of a famous Cupertino-based fruit company.) Bundled with the product, you get raw data, lots of it. That’s Bloomberg’s other gem. The ability to tap into big databases is an essential journalistic tool, and it undoubtedly helped Bloomberg reach its status in the financial information sector.
Access to a world of cross-referenced historical data dramatically improves a journalist’s ability to put events in perspective, quickly and accurately. Consider the ambitious journalistic project initiated by the New York Times last spring: titled Remade in America, this seven-part series examines how immigration is reshaping the country. The treatment relied on databases like the “Immigration Explorer” interactive map, which displays demographic changes, over more than a century (1880-2000), for each of the 3,140 counties that (now) constitute the country. The project required aggregating huge amounts of data from many sources, one of them being the Social Explorer, which offers 15,000 interactive maps reconstructed from publicly available data.
Here are other examples that demonstrate the newsworthiness of well-processed data.
This map shows the structure of the US workforce by country of origin, skills, etc.
This one, produced by the Associated Press, shows the evolution of a US economic “stress index” combining unemployment, foreclosures and bankruptcies for each county (note the 27% unemployment rate in the highlighted region).
Or this view of the spectacular degradation of the credit market in the US.
Or even this city map of New York, which displays, for each neighborhood, the evolution of crime over the years.
For more infographic resources (galleries, books, links), visit the website of Nicolas Rapp, art director at the Associated Press.
This is just the beginning. A massive shift towards making publicly available as much data as possible is under way. Many countries are heading in that direction, though at different paces. For the most part, progress depends on who’s in office and on their real (as opposed to campaign-trail) priorities. The US is determined to push hard in that direction, thanks to its tech-savvy president. Earlier this month, the new administration’s recently appointed Chief Information Officer launched Data.gov, which will be the main repository for all public databases. In a Wired interview, Vivek Kundra, Obama’s 34-year-old CIO, explains how he started the project, which datasets are his priorities, and how to convince reluctant branches of government to actually release their stats. Healthcare, energy and education are, unsurprisingly, at the top of the agenda.
This is not an easy task. Just remember the heated criticism the Clinton administration faced when it decided to open up the Global Positioning System, which is operated by the US Air Force. At first, the Pentagon was quite upset. Now GPS devices and GPS-related data are part of everyone’s life.
For now, hundreds of feeds are already available — and practical. No need to be a statistician to understand what a dataset is about. Most come in several levels of complexity, depending on what you intend to do: a simple graph or a more elaborate dissection of the facts. Current subjects include serious stuff such as socio-economic data, but also more diverse topics such as Airline On-Time Performance and Causes of Flight Delays, all accessible in ways that allow multiple treatments. Weirder: you can access a complete catalogue (1,000+ items) of every Space Shuttle collision with minuscule orbital debris, with the size of the crater and the estimated velocity of the object. Sounds eccentric, until someone has to produce a story with it.
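To give a sense of how little statistical machinery such a feed requires, here is a minimal Python sketch that computes the average departure delay per carrier from a handful of sample rows. The column names and values below are invented for illustration; the real Airline On-Time Performance files on Data.gov have their own, much richer schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample mimicking the shape of an on-time-performance
# extract: one row per flight, carrier code and departure delay (minutes).
SAMPLE_CSV = """carrier,dep_delay
AA,10
AA,0
DL,25
DL,5
UA,60
"""

def average_delay_by_carrier(csv_text):
    """Return a dict mapping each carrier to its mean departure delay."""
    totals = defaultdict(lambda: [0.0, 0])  # carrier -> [delay sum, flight count]
    for row in csv.DictReader(io.StringIO(csv_text)):
        acc = totals[row["carrier"]]
        acc[0] += float(row["dep_delay"])
        acc[1] += 1
    return {carrier: s / n for carrier, (s, n) in totals.items()}

print(average_delay_by_carrier(SAMPLE_CSV))
# prints {'AA': 5.0, 'DL': 15.0, 'UA': 60.0}
```

A dozen lines of standard-library code already yield a publishable figure; the step from there to a ranked table or a simple chart is small.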
Vivek Kundra expects everyone — the news media at the forefront — to take advantage of this initiative for the public’s benefit: “The key is recognizing that we don’t have a monopoly on good ideas and that the federal government doesn’t have infinite resources. (…) Democratizing data enables comparative analysis of the services the government provides and the investments it makes, leading to a better government”, he told Wired.
Will such an initiative save journalism? Certainly not. Will it empower it? Undoubtedly. As an example, consider how such editorial tools would have impacted major stories, from the Los Angeles riots of 1992 to the Paris unrest of December 2005. All of a sudden, every structural imbalance afflicting a city or a region, involving social ghettos, racial divides, educational failures, poverty, health issues, public and private expenditures, could be exposed in a compelling, understandable way. Data treatment raises the objectivity of a story: instead of, or in addition to, a piece about the economic context of an event based on interviews, with their human limitations, a clever presentation of raw data can be a great tool for offering facts in an unbiased way.
There is one condition, though, to making good use of this ocean of raw data: training. I’m suggesting young journalists will greatly benefit from being technically trained. (Coincidentally, David Castello-Lopes, a young French-Portuguese journalist who gave me insights into data-driven journalism, spent a year at Berkeley learning the tech trade; when he came back this month, he found a job at a TV network almost right away.)
What about monetization? Well, first of all, many private entities already make a nice living processing public data. Why not the news media? Take the education market: why not have editorial products, designed by professional journalists, capitalizing on powerful labels such as Le Monde, VG or The Guardian, to address this audience with well-designed products, in print or online? Think about students and how they could use this new knowledge on their laptops or iPhones. This market is up for grabs, and the media are well positioned to enter it. (Or someone else will.) –FF