copyright

Linking: Scraping vs. Copyright

 

Irish newspapers created quite a stir when they demanded a fee for incoming links to their content. Actually, this is a mere prelude to a much more crucial debate on copyrights,  robotic scraping and subsequent synthetic content re-creation from scraps. 

The controversy erupted on December 30th, when an attorney from the Irish law firm McGarr Solicitors exposed the case of one of its clients, the Women’s Aid organization, which was being asked to pay Irish newspapers a fee for each link it sent their way. The main quote from McGarr’s post:

They wrote to Women’s Aid, (amongst others) who became our clients when they received letters, emails and phone calls asserting that they needed to buy a licence because they had linked to articles in newspapers carrying positive stories about their fundraising efforts.
These are the prices for linking they were supplied with:

1 – 5 €300.00
6 – 10 €500.00
11 – 15 €700.00
16 – 25 €950.00
26 – 50 €1,350.00
50 + Negotiable

They were quite clear in their demands. They told Women’s Aid “a licence is required to link directly to an online article even without uploading any of the content directly onto your own website.”

Recap: The Newspapers’ agent demanded an annual payment from a women’s domestic violence charity because they said they owned copyright in a link to the newspapers’ public website.

Needless to say, the twittersphere, the blogosphere and, by and large, every self-proclaimed cyber moral authority, reacted in anger to Irish newspapers’ demands that go against common sense as well as against the most basic business judgement.

But on closer examination, the Irish dead tree media (soon to be dead for good if they stay on that path) is just the tip of the iceberg for an industry facing issues that go well beyond its reluctance to embrace the culture of web links.

Try googling the following French legalese: “A défaut d’autorisation, un tel lien pourra être considéré comme constitutif du délit de contrefaçon”. (It means any unauthorized incoming link to a site will be seen as a copyright infringement.) This search gets dozens of results. OK, most come from large consumer brands (carmakers, food industry, cosmetics) who don’t want a link attached to an unflattering term sending the reader to their product description… Imagine lemon linked to a car brand.

Until recently, you couldn’t find many media companies invoking such a no-link policy. Only large TV networks such as TF1 or M6 warn that any incoming link is subject to written approval.

In reality, except for obvious libel, no-links policies are rarely enforced. M6 Television even lost a court case against a third party website that was deep-linking to its catch-up programs. As for the Irish newspapers, despite their dumb rate card for links, they claimed to be open to “arrangements” (in the ill-chosen case of a non-profit organization fighting violence against women, flexibility sounds like a good idea.)

Having said that, such a posture reflects a key fact: Traditional media, newspapers or broadcast media, send contradictory messages when it comes to links that are simply not part of their original culture.

The position paper of the National Newspapers of Ireland association deserves a closer look (PDF here). It actually contains a set of concepts that resonate with the position defended by the European press in its current dispute with Google (see background story in the NYTimes); here are a few:

– It is the view of NNI that a link to copyright material does constitute infringement of copyright, and would be so found by the Courts.
– [NNI then refers to a decision of the UK court of Appeal in a case involving Meltwater Holding BV, a company specialized in media monitoring], that upheld the findings of the High Court which findings included:
- that headlines are capable of being independent literary works and so copying just a headline can infringe copyright
- that text extracts (headline plus opening sentence plus “hit” sentence) can be substantial enough to benefit from copyright protection
- that an end user client who receives a paid for monitoring report of search results (incorporating a headline, text extract and/or link) is very likely to infringe copyright unless they have a licence from the Newspaper Licencing Agency or directly from a publisher.
– NNI proposes that, in fact, any amendment to the existing copyright legislation with regard to deep-linking should specifically provide that deep-linking to content protected by copyright without respect for  the linked website’s terms and conditions of use and without regard for the publisher’s legitimate commercial interest in protecting its own copyright is unlawful.

Let’s face it, most publishers I know would not disagree with the basis of such statements. In the many jurisdictions where a journalist’s most mundane work is protected by copyright laws, what can be seen as acceptable in terms of linking policy?

The answer seems to revolve around matters of purpose and volume.

To put it another way, if a link serves as a kind of helper or reference, publishers will likely tolerate it. (In due fairness, NNI explicitly “accepts that linking for personal use is a part of how individuals communicate online and has no issue with that” — even if the notion of “personal use” is pretty vague.) Now, if the purpose is commercial and if linking is aimed at generating traffic, NNI raises the red flag (even though legal grounds are rather brittle.) Hence the particular Google case that also carries a notion of volume as the search engine claims to harvest thousands of sources for its Google News service.

There is a catch. The case raised by NNI and its putative followers is weakened by a major contradiction: everywhere, Ireland included, news websites invest a great deal of resources in order to achieve the highest possible rank in Google News. Unless specific laws are passed (German lawmakers are working on such a bill), attorneys will have a hard time invoking copyright infringements that in fact stem from the very Search Engine Optimization tactics publishers encourage.

But there might be more at stake. For news organizations, the future carries obvious threats that require urgent consideration: In coming years, we’ll see great progress, so to speak, in automated content production systems. With or without link permissions, algorithmic content generators will be able (in fact, they already are) to scrape sites’ original articles, aggregate and reprocess those into seemingly original content, without any mention, quotation, links, or reference of any kind. What awaits the news industry is much more complex than dealing with links from an aggregator.

It boils down to this: The legal debate on linking as copyright infringement will soon be obsolete. The real question will emerge as a much more complex one: Should a news site protect itself from being “read”  by a robot? The consequences for doing so are stark: except for a small cohort of loyal readers, the site would purely and simply vanish from cyberspace… Conversely, by staying open to searches, the site exposes itself to forms of automated and stealthy depletion that will be virtually impossible to combat. Is the situation binary — allowing “bots” or not — or is there middle ground? That’s a fascinating playground for lawyers and techies, for parsers of words and bits.
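One candidate for middle ground already exists: a robots.txt file can set different rules for different crawlers, and well-behaved bots consult it before fetching pages. The sketch below uses Python's standard urllib.robotparser to test a hypothetical policy. The crawler name ExampleScraperBot, the example.com URLs and the paths are assumptions made up for illustration, and a stealthy scraper can simply ignore such rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy (an assumption, not any real site's file):
# a search crawler may read everything, a made-up scraper is shut out,
# and every other bot is kept away from the archive only.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: ExampleScraperBot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Load robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
ARTICLE = "https://example.com/news/story.html"
ARCHIVE = "https://example.com/archive/2012/story.html"

# Ask, crawler by crawler, whether a given page may be fetched.
search_can_read = parser.can_fetch("Googlebot", ARTICLE)
scraper_can_read = parser.can_fetch("ExampleScraperBot", ARTICLE)
other_can_read_article = parser.can_fetch("SomeOtherBot", ARTICLE)
other_can_read_archive = parser.can_fetch("SomeOtherBot", ARCHIVE)
```

Such a file only expresses the publisher's wishes; it is a convention that polite crawlers honor, not a lock, which is why the lawyers stay in the picture.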

frederic.filloux@mondaynote.com

Gunning for the Copyright Reformers

Going after copyright reformers is risky business. To digital zealots, defending copyright is like advocating the return to the typewriter. (I personally like typewriters; I own several and I recommend a wonderful 1997 Atlantic piece on them at Longform.org). Going after sworn copyright opponents is what Robert Levine does in his just-published  book Free Ride — How the Internet is Destroying the Culture Business and How the Culture Business can Fight Back.

The pitch: Digital corporations are conspiring to promote the free ideology that has been plaguing the internet over the last decade. With their immense financial firepower, the Googles and the Apples and the Silicon Valley venture capital firms that funded Napster did whatever it took to undermine the concept of copyright. From lobbying the United States Congress to funding free-culture advocates, they created a groundswell for rip-and-burn products that would sell their MP3 devices. They got lawmakers and pundits to pave the way for a general ransacking of intellectual property — from music to journalistic content. Once Levine makes his point, he explores possible solutions to restore value to creativity (We’ll address these in a future column).

Needless to say, Robert Levine has produced a non-politically correct opus. And that’s what makes his book fascinating.

To start, the author reframes the famous quote, “Information wants to be free.” Free Ride recalls the complete sentence as far more nuanced. This is actually what tech writer Stewart Brand said at a 1984 hacker conference:

“On the one hand information wants to be expensive because it’s so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time. So you have these two fighting against each other.”

Few quotes in recent history have been more twisted and misinterpreted than this one. Everyone jumped on Stewart Brand’s distinction between collecting information and making it available to the audience. While the cost of the former remains high — at least for those producing original information, or content — the marginal cost of broadcasting it fell dramatically, and that is what sparked the idea of a zero-cost culture. Yet, “media products have never been priced according to their marginal cost,” Levine says, and therefore, free is an idea that’s hard to defend.

As described in Free Ride, US lawmakers played a critical role in opening the floodgates of piracy and copyright violation on the internet. On October 28, 1998, Bill Clinton signed the Digital Millennium Copyright Act. That law, says Levine, gave a “safe harbor” to internet service providers and some online companies, which were no longer liable for copyright infringement based on the actions of users. Levine writes that the “safe harbor made it easier for sites like YouTube to become valuable forums for amateur creativity. But it also let them build big businesses out of professional content they didn’t pay for.” That, he says, is how Congress created YouTube. (Google purchased it in 2006 for $1.65 billion.)

The book’s most spectacular deconstruction involves Lawrence Lessig. The Harvard law professor is one of the most outspoken opponents of tough copyright. For years, he’s been criss-crossing the world delivering well-crafted, compelling presentations about the need to overhaul copyright. When, in 2007, Viacom sued YouTube for copyright infringement, seeking more than a billion dollars in damages, Lessig accused Viacom of trying to overturn the Digital Millennium Copyright Act. It was a de facto defense of Google by Lessig who at the time was head of the Center for Internet and Society at Stanford University. What Lessig failed to disclose is that two weeks after closing the deal to acquire YouTube, Google made a $2-million donation to the Stanford Center, and a year later gave another $1.5 million to Creative Commons, Lessig’s most famous intellectual baby. To be fair, Levine told me he didn’t believe Lessig’s positions on copyright were influenced by the grants from Google. Moreover, Google set aside $100 million to fight the Viacom lawsuit. Numerous examples throughout Free Ride show how technology companies are committed to influencing public policy. Ironically, Lawrence Lessig’s newest crusade at Harvard is about corruption in Washington.

Robert Levine’s book could be disputed on a few items.

- One, he’s too kind to the music industry. (His view may have been influenced by his tenure as executive editor of Billboard magazine where he witnessed first-hand the self-inflicted deterioration of the music industry.) The music business missed all the trains: (a) it defended the physical model up to the last minute even as its annihilation seemed unavoidable; (b) it extended as long as it could the double screwing of consumers and artists alike (sadly, poor analog artists have been replaced by poor digital ones).

- Two, he tends to forget the general complacency of content creators toward all forms of digital looting. I’ve often described in the Monday Note how publishers – blinded by the short-term appeal of the eyeball count – became consenting victims of all sorts of aggregators (see my Lenin’s Rope series).

- Three, the advent of free content has in fact unleashed talent. Unknown authors have been able to rise from obscurity thanks to direct access to the audience. And some have found alternative ways to make money (more on this in another future column).

Lastly, the unfolding of technology made the relaxing of copyright unavoidable. The Digital Millennium Copyright Act may have accelerated the transition but it didn’t cause the upheaval. Today, BitTorrent file transfer for music and movies accounts for about 10-12% of the internet bandwidth consumption, and YouTube accounts for 11%. Pirated content represents almost 100% of the former and about a third of the latter. Huge numbers, indeed, and huge losses for the music and movie industries. But Netflix with its legitimate content now accounts for 30% of the entire internet traffic (Hulu has less than 2%) and iTunes is growing faster than ever. And some economists do consider that giving up a large quantity of content for free is the price that must be paid to preserve a marketable share.

The music industry paid a terrible price during the digital transition, with a 50% drop in sales in one decade. But it would be unfair to make lenient lawmakers and internet pirates the main culprits. Unbundling played a critical role as well, just as in the newspaper industry. Being able to buy a single song on iTunes (instead of an album), or hoping that a single article on a web page will generate enough viewers to pay for itself (instead of purchasing an entire bundled newspaper) caused a great deal of damage.

As plagued as it is by piracy, the movie industry is immune to the notion of unbundling, which partly explains why box office revenue between 2006 and 2010 rose by 30% outside the United States and by 15% in the US/Canada market. Although the number of moviegoers is slipping, the industry has been able to find its way into the digital world.

Robert Levine’s book is a must-read that reframes the debate on the evolution of copyright. In an unusual way, it encompasses a European view on the issue (Levine lives part-time in Berlin). That makes the book even more interesting as countries explore ways for content creators to finance their work while not killing the formidable creative freedom unleashed by the digital world.

frederic.filloux@mondaynote.com

Free Ride, by Robert Levine, is published by Bodley Head in the UK (available now on Amazon UK) and by Doubleday in the US (available Oct. 25 on Amazon US), and is also available on the iTunes iBook Store.

Fighting Unlicensed Content With Algorithms

It’s high time to fight the theft of news-related content, really. A couple of weeks ago, Attributor, a US company, released the conclusions of a five-month study covering the use of unauthorized content on the internet. The project was called Graduated Response Trial for News and relied on one strong core idea: once a significant breach is established, instead of an all-out legal offensive, a “friendly email”, in Attributor’s parlance, kindly asks the perpetrator to remove the illegal content. Without a response within 14 days, a second email arrives. As a second step, Attributor warns it will contact search engines and advertising networks. The first will be asked to suppress links and indexation for the offending pages; the second will be requested to remove ads, thus killing the monetization of illegal content. After another 14 days, the misbehaving site receives a “cease and desist” notice and faces full-blown legal action (see details on the Fair Syndication Consortium Blog). Attributor and the FSC pride themselves on achieving a 75% compliance rate, with negligent web sites taking action after step 2. In other words, once kindly warned, looters change their mind and behave nicely. Cool.
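The escalation ladder described above can be read as a small timetable. Here is a minimal sketch of it in Python; the 14-day interval comes from the trial as described, while the function name, the step labels and the idea of counting days from the first email are assumptions for illustration, not Attributor's actual system.

```python
from enum import Enum


class Step(Enum):
    FRIENDLY_EMAIL = "friendly email asking for removal"
    SECOND_EMAIL = "second email: search engines and ad networks will be contacted"
    CEASE_AND_DESIST = "cease and desist notice, legal action follows"
    CLOSED = "content removed, case closed"


NOTICE_INTERVAL_DAYS = 14  # spacing between notices, as described in the trial


def current_step(days_since_first_email: int, content_removed: bool) -> Step:
    """Return where an infringing site stands on the escalation ladder."""
    if content_removed:
        return Step.CLOSED
    if days_since_first_email < NOTICE_INTERVAL_DAYS:
        return Step.FRIENDLY_EMAIL
    if days_since_first_email < 2 * NOTICE_INTERVAL_DAYS:
        return Step.SECOND_EMAIL
    return Step.CEASE_AND_DESIST


# A site that ignores every notice moves one rung every 14 days.
ignoring_site = [current_step(day, False) for day in (0, 14, 28)]
# A site that complies after the second email drops out of the process.
compliant_site = current_step(20, True)
```

Written this way, the weakness is easy to see: nothing happens to a site that stays silent until day 28.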

To put numbers on this, the Graduated Response Trial for News spotted 400,000 unlicensed cloned items on 45,000 sites. That is roughly 9 illegal uses per site. As reported in a February 2010 Monday Note (see Cashing in on stolen contents), a previous analysis conducted by Attributor pointed to 112,000 unlicensed copies of US newspaper articles found on 75,000 sites; this is a rate of 1.5 stolen articles per site. Granted, we can’t jump to the conclusion of a sixfold increase; the two studies were not designed to be comparable, the tracking power of Attributor is growing fast, the perimeter was different, etc. Still. When, last Friday, I asked Attributor’s CEO Jim Pitkow how he felt about those numbers, he acknowledged that the use of stolen content on the internet is indeed on the rise.

No doubt: the technology and the deals organized by Attributor with content providers and search engines are steps in the right direction. But let’s face it: so far, this is a drop in the ocean.
First, the nice “Graduated Response” tested by the San Mateo company and its partners needs time to produce its effects. A duo of 14-day notices before rolling out the legal howitzer doesn’t make much sense considering the news cycle’s duration: the value of a news item decays by 80% in about 48 hours. The 14-day spacing of the two warning shots isn’t exactly a deterrent for those who do business stealing content.
Second, the tactics described above rely too much on manual operations: assessing the scope of the infringement, determining the response, notifying, monitoring, re-notifying, etc. A bit counter, to say the least, to the nature of the internet with its 23 billion pages.
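To give a rough sense of the first objection, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that a news item keeps losing 80% of its remaining value every 48 hours (a constant exponential decay); only the "80% in about 48 hours" figure comes from the text above, the rest is a simplifying assumption.

```python
def remaining_value(hours: float,
                    loss_per_period: float = 0.80,
                    period_hours: float = 48.0) -> float:
    """Fraction of the initial value left after `hours`.

    Assumes the same share of the remaining value is lost in every
    period (exponential decay), which is a modeling assumption.
    """
    return (1.0 - loss_per_period) ** (hours / period_hours)


after_two_days = remaining_value(48)             # one 48-hour period
when_second_email_goes_out = remaining_value(14 * 24)
when_cease_and_desist_goes_out = remaining_value(28 * 24)
```

Under that assumption, roughly one hundred-thousandth of the item's initial value is left when the second email goes out, so the notice protects almost nothing of that particular story.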

You get my point. The problem requires a much more decisive and scalable response involving all the players: content providers, aggregators, search engines, advertising networks and sales houses. Here is a possible outline:

1/ Attributor needs to be acquired. The company is simply too small for the scope of the work. A few days of Google’s revenue ($68m per 24 hrs) or less than a month for Bing would do the job. Even smarter, a group of American newspapers and book publishers gathered in an ad hoc consortium could be a perfect fit.

2 / Let’s say Google or Bing buys Attributor’s core engineering know-how. It then becomes feasible to adapt and expand its crawling algorithm so it runs against the entire world wide web, in real time. Two hours after a piece of news is “borrowed” from a publisher, it is flagged and the site receives a pointed notification. This could be an email, or an automatically generated comment below the article, re-posted every few hours. Or, even better, a well-placed sponsored link like the fictitious one below:

Inevitably, ads dry up. First, ad networks affiliated to the system stop serving display ads. And second, since the search engine severed hyperlinks, ads on orphan pages become irrelevant. Every step is automated.
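Automated flagging of the kind imagined in step 2 needs some way to decide that a page reproduces an article. One generic, widely known approach is to compare overlapping word sequences ("shingles") and measure how many the two texts share (Jaccard similarity). The sketch below illustrates that idea only: it is not Attributor's algorithm, and the function names, the three-word shingle size and the 0.5 threshold are assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of length `size`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str, size: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def looks_copied(original: str, candidate: str, threshold: float = 0.5) -> bool:
    """Flag a candidate page whose overlap with the original is high."""
    return similarity(original, candidate) >= threshold


original = "Irish newspapers demanded a fee for incoming links to their content"
clone = "IRISH newspapers demanded a fee for incoming links to their content!"
unrelated = "The movie industry found its way into the digital world"

clone_flagged = looks_copied(original, clone)
unrelated_flagged = looks_copied(original, unrelated)
```

A real system would have to do this across billions of pages and cope with partial copies and rewording, which is where the engineering muscle of a large search engine would matter.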