How do mainstream media and blogs interact? How do they feed each other? Everyone in the news media would love to get a better view of the mating dance. A few weeks ago, scientists at Cornell University unveiled a thorough analysis of the relationship between the two universes. Borrowing techniques from genomics, they dug into a huge corpus of politics-related sentences and tracked how they bounced between mainstream media (MSM) and the blogosphere.
- About 90 million documents (blog posts and news-site articles) collected between August 1 and October 31, 2008, i.e. at the height of the last US presidential race.
- 1.65 million blogs scanned.
- 20,000 media sites reviewed, marked as mainstream because they are part of Google News.
- From this dataset, researchers extracted 112 million quotes leading to 47 million phrases, out of which 22 million were deemed “distinct”. These phrases were important enough to be considered as news.
- The phrases were political statements or sound bites pertaining to the political race and uttered by the two candidates, their running mates or their staff.
- Processing these 390GB of data took about nine hours of computer time (using a complex set of algorithms, involving “markers”, as in genetics).
The findings, in a nutshell:
- Mainstream media lead the news cycle. They are the first to report a quote, the story behind it, the context, etc.
- The 20,000 MSM sites generate 30% of the documents in the entire dataset and 44% of the documents that contain frequent phrases.
- It takes about 2.5 hours for a phrase to reverberate through the blogosphere.
- The phrases that propagate in the opposite direction (from blogs to MSM) amount to a mere 3.5%.
- A news piece decays faster in the MSM than in the blogosphere.
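To make the 2.5-hour figure more concrete, here is a minimal sketch of how such a lag could be measured for a single phrase: count mentions in fixed time buckets for each universe, find each peak, and take the difference. This is not the Cornell team's method. The function names, the 30-minute bucket size and the timestamps are all invented for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Arbitrary reference point for bucketing (assumption: start of the study window).
EPOCH = datetime(2008, 8, 1)

def bucket(ts, minutes=30):
    """Round a timestamp down to the start of its time bucket."""
    n = int((ts - EPOCH).total_seconds() // (minutes * 60))
    return EPOCH + timedelta(minutes=n * minutes)

def peak_bucket(timestamps, minutes=30):
    """Return the bucket with the most mentions (earliest bucket wins ties)."""
    counts = Counter(bucket(ts, minutes) for ts in timestamps)
    return min(counts, key=lambda b: (-counts[b], b))

def peak_lag_hours(msm_timestamps, blog_timestamps, minutes=30):
    """Hours between the MSM peak and the blog peak for one phrase."""
    lag = peak_bucket(blog_timestamps, minutes) - peak_bucket(msm_timestamps, minutes)
    return lag.total_seconds() / 3600

# Toy mention times for one hypothetical phrase (made up, not from the study).
msm = [
    datetime(2008, 9, 3, 11, 40),
    datetime(2008, 9, 3, 12, 5),
    datetime(2008, 9, 3, 12, 10),
    datetime(2008, 9, 3, 12, 25),
    datetime(2008, 9, 3, 13, 15),
]
blog = [
    datetime(2008, 9, 3, 12, 20),
    datetime(2008, 9, 3, 14, 35),
    datetime(2008, 9, 3, 14, 40),
    datetime(2008, 9, 3, 14, 55),
    datetime(2008, 9, 3, 16, 10),
]

print(peak_lag_hours(msm, blog))
```

A positive result means the blogs peaked after the mainstream sites; a negative one would point to the rarer blog-to-MSM direction.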
The comparative curve looks like this:
For those who want the complete analysis, the full report is available here.
As expected, this research triggered controversy.