For publishers: How much money is lost to stolen content? Of that, how much can realistically be reclaimed? Before getting into the numbers, an overview.

In recent weeks, I’ve gained a first-hand media perspective on anti-piracy technology. The technology is Attributor’s, and the media company is Agence France-Presse, one of the big three global newswires along with AP and Reuters. (Disclosure: I recently produced a 15,000-word report for AFP covering its strategy and its future. I no longer work for AFP, but I keep close links with the company.)

Every day, AFP sends about 400 news items to Attributor – a fraction of its daily production. These items are then matched against a set of websites, both subscribers and non-subscribers of the newswire’s services. Through a simple interface, Attributor ranks the sites by their propensity to reuse content. For regular clients, the system shows how stories are used, what percentage of each is used, and whether they are properly credited or even linked. For non-clients, it offers a great way to track down stolen content and to distinguish between minor abuses, honest mistakes and systematic infringement, since the data can be viewed from statistical and time-related angles.
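To make the matching-and-ranking idea concrete, here is a minimal sketch. It is not Attributor’s method, which I haven’t seen described in detail; it assumes a simple comparison of overlapping five-word sequences ("shingles"), and the function names and the `blog.example` site are hypothetical.

```python
import re
from collections import defaultdict


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def reuse_share(story, page, n=5):
    """Fraction of the story's shingles that also appear on the page."""
    story_shingles = shingles(story, n)
    if not story_shingles:
        return 0.0
    return len(story_shingles & shingles(page, n)) / len(story_shingles)


def rank_sites(stories, pages, n=5):
    """Rank sites by the average of each page's best match against any story.

    pages is a list of (site, page_text) pairs (an assumed input format).
    """
    best_per_site = defaultdict(list)
    for site, page_text in pages:
        best = max((reuse_share(s, page_text, n) for s in stories), default=0.0)
        best_per_site[site].append(best)
    ranking = [(site, sum(v) / len(v)) for site, v in best_per_site.items()]
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```

A site whose pages consistently score close to 1.0 is pasting whole stories; a low but non-zero score suggests quotation or partial reuse.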

For obvious reasons, I can’t disclose the media outlets I’ve been reviewing in detail. Let’s simply say that the results are stunning. AFP material is widespread. In short, there are three types of abuse of copyrighted material.

- The first one is insufficient attribution by a client. Typically, a journalist puts his or her byline on a story largely taken from a newswire. In most cases, the byline is reduced to initials along with an “avec agence” (with newswire) mention, as in this example, where the text borrowed from AFP is automatically highlighted…

For this piece, we can safely say that “M.D.’s” input was minimal, and it would have been nicer to simply put the mention “AFP” at the bottom of this cut-and-paste performance. (In that particular case, the newswire story is itself an explicit recycling of a scoop by Le Figaro, a typical illustration of the Internet’s endless content loop.) From a legal perspective, there is no particular issue. It’s only an ethical matter.

Another case involves misuse of content by bloggers. In most cases, bloggers have no clue about the meaning and use of copyright, and the big media companies that host them don’t really help. Typically, a young and passionate blogger “covering” his beat will, in good faith, take an entire AFP (or AP or Reuters) story and paste it on his blog, this time with proper attribution. Except that he has no right whatsoever to do so. I’ve seen one big French site, whose boss loathes the AFP, end up with 60% of its content illegally “borrowed” from the AFP (confronted with the facts, the site has made serious efforts to correct the situation). Hypocritically, many sites shield themselves behind the fine print of terms of service buried deep in their pages, reminding bloggers not to steal copyrighted content. The fact is, most of them, including big media companies, do not properly educate their legally challenged blogging contributors.

The third case involves pure and deliberate looting. It could be anyone: foreign-based sites, activist groups with small audiences, all sorts of people who hope to avoid detection.

How important is this stolen content? “For newspapers and the three main wire agencies, the net present value of stolen content is about $250m on the American market alone”, says Jim Pitkow, whom I met last week in Paris. To come up with such estimates, Attributor counts the advertising associated with the infringing content and multiplies it by the CPM (cost per thousand impressions) and the audience of the site. Last December, Attributor released a study that showed, for the first time, the extent of the illegal reuse of news material.
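As a back-of-the-envelope illustration of that calculation, here is a small sketch. The function name and every figure in the example are hypothetical; only the principle (ads on the infringing page, times audience, times CPM) comes from Attributor’s description.

```python
def estimated_ad_value(page_views, ads_per_page, cpm_usd):
    """Ad value attached to an infringing page, in dollars.

    CPM is the price of 1,000 ad impressions, so the number of
    impressions (views x ads shown per view) is divided by 1,000.
    """
    impressions = page_views * ads_per_page
    return impressions / 1000 * cpm_usd


# Hypothetical page: 200,000 views, 3 ads per view, $2 CPM.
value = estimated_ad_value(page_views=200_000, ads_per_page=3, cpm_usd=2.0)
print(f"${value:,.0f}")  # $1,200
```

Summed over every infringing page found for a publisher, this is the kind of "face value" figure discussed in the rest of this piece.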

The study covered a corpus of 100,000 articles from 157 American newspapers monitored for one month. Here are the key findings:

- 112,000 unlicensed, full copies of US newspaper articles were found on more than 75,000 sites across the internet. A full copy means that more than 80% of the original article has been reproduced illegally.

- If we extend the notion of copy to excerpts (i.e., less than 80% of the original story but more than 125 words, roughly half a typewritten page), we add 163,000 more references.

- On average, an article is illegally reused 4.4 times, whole or in part; but for large national papers reuse can go up to 15 times per story!

- On the money side, not surprisingly, Google captures 53% of the value that is unrealized by publishers; next come Yahoo (19%), Microsoft (5%), scientific sites (5%) and AOL (3%); the rest is atomized. Bloggers represent only 10%.
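The two thresholds used in the study (more than 80% of the article for a full copy, more than 125 words for an excerpt) can be expressed as a simple rule. This is only my reading of the definitions above; the function name, the "below threshold" label and the word counts in the example are assumptions.

```python
FULL_COPY_SHARE = 0.80    # more than 80% of the original reproduced
EXCERPT_MIN_WORDS = 125   # roughly half a typewritten page


def classify_reuse(original_words, reused_words):
    """Label a match using the study's two thresholds."""
    if original_words <= 0:
        raise ValueError("original_words must be positive")
    share = reused_words / original_words
    if share > FULL_COPY_SHARE:
        return "full copy"
    if reused_words > EXCERPT_MIN_WORDS:
        return "excerpt"
    return "below threshold"


# Hypothetical 600-word article.
print(classify_reuse(600, 540))  # full copy
print(classify_reuse(600, 200))  # excerpt
print(classify_reuse(600, 100))  # below threshold
```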

Books are not spared. According to another research project conducted by Attributor, online book piracy represents about 10% of total book sales in the US. The most stolen genres are business and investing books, with an average of 13,000 downloads per title, followed by professional and science titles. In these categories, Attributor found that each title was losing over $1m to online piracy!

Which leads us to the last question: what part of this face value can realistically be collected? Let’s put aside book publishing, which shares the same characteristics as music: massive peer-to-peer availability and poorly protected files. The rise of ebooks should slow down the problem.

Attributor’s CEO Jim Pitkow admits that only a fraction of the potential loss of news content can be reclaimed. Most of it lies in the abundant newswire production. Pitkow says the face value loss for news agencies – again calculated from the surrounding ads – amounts to roughly 40% of the worldwide revenues of AFP, AP and Reuters Media combined.

The recovery operation goes like this. Phase one: detection and evaluation of the most obvious copyright violators (individual bloggers are left out, as are accidental violators, who are too widespread to collect from). Phase two: an attorney’s letter is sent to the core abusers saying: cease right away or negotiate.

How much will it yield? Too early to tell (AFP, for instance, has just started the enforcement phase). But it could be around 10% of the face value loss, translating into millions for each market.
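To put an order of magnitude on "millions for each market", here is the arithmetic applied to the only market-level figure quoted in this piece, the $250m estimate for the US. Treating that figure as the face value loss is my assumption, and the 10% rate is Pitkow’s rough guess, not a measured result.

```python
def recoverable_amount(face_value_usd, recovery_percent):
    """Share of the face value loss that enforcement might bring back."""
    return face_value_usd * recovery_percent / 100


# Assumption: US face value loss of $250m, 10% recovery rate.
us_estimate = recoverable_amount(250_000_000, 10)
print(f"${us_estimate / 1_000_000:.0f}m")  # $25m
```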

This is just the beginning. Attributor is in close contact with Google, which, as in the rest of the internet, holds the keys to the vault. Aside from that, an important part of Attributor’s strategy will involve ad networks, which aggregate unsold inventory and resell it at a bargain price to advertisers. Ad networks play an important role in the general deflation of advertising on the web. One of them, Ad Brite, has joined the Fair Syndication Consortium set up by Attributor to get some money back. Several more could follow. Call it the redemption of the bottom feeders.


