The press, Google, its algorithm, their scale

 

In their fight against Google, traditional media firmly believe the search engine needs them to refine (and monetize) its algorithm. Let’s explore the facts.

The European press has got itself into a bitter battle with Google. In a nutshell, legacy media want money from the search engine: first, for the snippets of news it grabs and feeds into its Google News service; second, on a broader basis, for all the referencing Google builds with news media material. In Germany, the Bundestag is working on a bill to force all news aggregators to pay their toll; in France, the executive is pushing for a negotiated solution before year-end. Italy is more or less following the same path. (For a detailed and balanced background, see this Eric Pfanner story in the International Herald Tribune.)

In the controversy, one argument keeps rearing its head. According to the proponents of a “Google Tax”, media content greatly improves the contextualization of advertising. Therefore, the search engine giant ought to pay for such value. Financially speaking, without media articles Google would not perform as well as it does, hence the European media’s hunt for a piece of the pie.

Last week, digging for facts, I spoke with several people possessing deep knowledge of Google’s inner mechanics; they ranged from Search Engine Marketing specialists to a Stanford Computer Science professor who taught Larry Page and Sergey Brin back in the mid-’90s.

First of all, pretending to know Google is indeed… pretentious. In order to outwit both competitors and manipulators (a.k.a. Search Engine Optimization gurus), the search engine keeps tweaking its secret sauce. For the August-September period alone, Google made no fewer than 65 alterations to its algorithm (list here). And that’s only the known part of the changes; in fact, Google allocates large resources to countering people who try to game its algorithm with an endless stream of tricks.

Maintaining such a moving target also preserves Google’s lead: along with its distributed computing capabilities (called MapReduce), its proprietary data storage system BigTable, and its immense infrastructure, Google’s PageRank algorithm is at the core of the search engine’s competitive edge. Allowing anyone to catch up, even a little, is strategically inconceivable.

Coming back to the press issue, let’s consider both quantitative and qualitative approaches. In the Google universe (currently about 40 billion indexed pages), content coming from media amounts to a small fraction. It is said to be a low single-digit percentage. To put things in perspective, on average, an online newspaper adds between 20,000 and 100,000 new URLs per year. Collectively, the scale roughly looks like millions of news articles versus a web growing by billions of pages each year.
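For readers who want to play with the orders of magnitude, here is a rough back-of-the-envelope sketch in Python. Only two figures come from the column: the 40 billion indexed pages and the 20,000 to 100,000 new URLs per newspaper per year. The number of outlets (`n_outlets`) is a purely hypothetical parameter, not a figure from my sources, and the function names are mine. Note that the sketch compares one year of new news URLs with the whole index; it does not try to reproduce the “low single-digit percentage” estimate, which concerns the total stock of media pages.

```python
# Figures quoted in the column.
INDEXED_PAGES = 40_000_000_000          # about 40 billion indexed pages
URLS_PER_OUTLET = (20_000, 100_000)     # new URLs per newspaper per year (low, high)


def yearly_news_urls(n_outlets):
    """Return the (low, high) range of new news URLs per year.

    n_outlets is a hypothetical count of online newspapers; it is an
    assumption for illustration, not a number taken from the column.
    """
    low, high = URLS_PER_OUTLET
    return n_outlets * low, n_outlets * high


def share_of_index(urls):
    """Return the given number of URLs as a fraction of the quoted index size."""
    return urls / INDEXED_PAGES


if __name__ == "__main__":
    # Two hypothetical outlet counts, to show how the ratio moves.
    for n_outlets in (100, 1_000):
        low, high = yearly_news_urls(n_outlets)
        print(
            f"{n_outlets} outlets: {low:,} to {high:,} new URLs per year, "
            f"i.e. {share_of_index(low):.4%} to {share_of_index(high):.4%} of the index"
        )
```

Under these assumptions, even a thousand outlets publishing at the top of the range would add new pages worth only a fraction of one percent of the index in a year.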

Now, let’s consider the nature of searches. Using Google Trends for the last three months, the charts below rank the most searched terms in the United States, France and Germany:


Do the test yourself by going to the actual page: you’ll notice that, except for large dominant American news topics (“Hurricane Sandy” or “presidential debate”), very few search results bring back content from mainstream media. Because Google rewards fresh content as well as sharp SEO tactics, “web native” media and specialized web sites perform much better than their elder “migrants”, that is, the web versions of traditional media.

What about monetization? How does media content contribute to Google’s bottom line? Again, let’s look at the independent rankings of the most expensive keywords, those that can bring $50 per click to Google through its opaque pay-per-click bidding system. For instance, here is a recent WordStream ranking (example keywords in parentheses):

Insurance (“buy car insurance online” and “auto insurance price quotes”)
Loans (“consolidate graduate student loans” and “cheapest homeowner loans”)
Mortgage (“refinanced second mortgages” and “remortgage with bad credit”)
Attorney (“personal injury attorney” and “dui defense attorney”)
Credit (“home equity line of credit” and “bad credit home buyer”)
Lawyer (“personal injury lawyer”, “criminal defense lawyer”)
Donate (“car donation centers”, “donating a used car”)
Degree (“criminal justice degrees online”, “psychology bachelors degree online”)
Hosting (“hosting ms exchange”, “managed web hosting solution”)
Claim (“personal injury claim”, “accident claims no win no fee”)
Conference Call (“best conference call service”, “conference calls toll free”)
Trading (“cheap online trading”, “stock trades online”)
Software (“crm software programs”, “help desk software cheap”)
Recovery (“raid server data recovery”, “hard drive recovery laptop”)
Transfer (“zero apr balance transfer”, “credit card balance transfer zero interest”)
Gas/Electricity (“business electricity price comparison”, “switch gas and electricity suppliers”)
Classes (“criminal justice online classes”, “online classes business administration”)
Rehab (“alcohol rehab centers”, “crack rehab centers”)
Treatment (“mesothelioma treatment options”, “drug treatment centers”)
Cord Blood (“cordblood bank”, “store umbilical cord blood”)

(In my research, several Search Engine Marketing specialists came up with similar results.)

You see where I’m heading. By construction, traditional media bring no money to the ranking above. In addition, as an insider said to me this week, no one is putting ads against keywords such as “war in Syria” or against the 3.2 billion results of a “Hurricane Sandy” query. Indeed, on the curve of keyword ad value, news slides into the long tail.

Why, then, is Google so interested in news content? Why has it been maintaining Google News for the past ten years, in so many languages, without making a dime from it (there are no ads on the service)?

The answer pertains to the notion of Google’s general internet “footprint”. Being number one in search is fine, but not sufficient. In its goal to own the semantic universe, taking over “territories” is critical. In that context, a “territory” could be a semantic environment that is seen as critical to everyone’s daily life, or one with high monetization potential.

Here are two recent examples of monetization potential as viewed by Google: Flights and Insurance. Having (easily) determined that flight schedules were among the most sought-after information on the web, Google dipped into its deep cash reserve and, for $700m, acquired ITA Software in July 2010. ITA was the world’s largest airline search company, powering sites such as Expedia or TripAdvisor. Unsurprisingly, the search giant launched Google Flight Search in September 2011.

In passing, Google showed its ability to kill any price comparator of its choosing. As for Insurance, the most expensive keyword, Google recently launched its own insurance comparison service in the United Kingdom… just after launching a similar system for credit cards and bank services.

Over the last ten years, Google has become the search tool of choice for Patents, and for scientific papers with Google Scholar. This came after shopping, books, Hotel Finder, etc.

Aside from this strategy of making Google the main, if not the only, entry point to the web, the search engine is working hard on its next transition: going from a search engine to a knowledge engine.

Early this year, Google created Knowledge Graph, a system that connects search terms to what are known as entities (names, places, events, things), millions of them. This is Google’s next quantum leap. Again, you might think news-related corpora could constitute the most abundant trove of information to be fed into the Knowledge Graph. Unfortunately, this is not the case. At the core of the Knowledge Graph resides Metaweb, acquired by Google in July 2010. One of its key assets was a database of 12 million entities (now 23m) called Freebase. This database is fed by sources (listed here) ranging from the International Federation of Association Football (FIFA) to the Library of Congress, Eurostat or the India Times. (The only French source on the list is the movie database AlloCine.)

Out of about 230 sources, there are fewer than 10 media outlets. Why? Again, volume and, perhaps even more important, the ability to properly structure data. While the New York Times has about 14,000 topics, most newspapers only have hundreds of those, and a similar number of named entities in their database. (As a comparison, web-native media are much more skilled at indexing: the Huffington Post assigns between 12 and 20 keywords to each story.) By building upon acquisitions such as Metaweb’s Freebase, Google now has about half a billion entries of all kinds.

Legacy media must deal with a harsh reality: despite their role in promoting and defending democracy, in lifting the veil on things that mean much for society, or in propagating new ideas, when it comes to data, news media compete in the junior leagues. And for Google, the most data-driven company in the world, having newspaper articles in its search system is no more than small cool stuff.

frederic.filloux@mondaynote.com

15 Comments

  1. Posted November 5, 2012 at 8:28 am | Permalink

    Good points, all. But imagine what would happen if Bing would pay newspapers, and Google would not. Then if Bing created a page that looked like Google News, and made its use exclusive to those who would make Bing their default search engine, then I suspect that I, along with tens of millions of others, would switch.

  2. Posted November 5, 2012 at 11:51 am | Permalink

    Hi Fred,
    Two points are missing from your analysis:
    - Google absolutely needs very fresh content to improve, second by second, the quality of its answers. And Google can only use three kinds of real-time content: Twitter (which Google paid to use), Gmail (which belongs to Google) and… press websites. Today Google crawls press websites every ten minutes, and it’s not just because Google has nothing else to do.
    - The most expensive keywords do not change much, but the average price of keywords increases year after year, and a large part of this increase comes from the improved relevance of every AdWords position, in part due to better knowledge of real-time demand…

  3. Lawrence Neumann
    Posted November 5, 2012 at 12:37 pm | Permalink

    If all it takes to have a lucrative search engine is a knowledge base of 10 to 20 million feeds, then GOOG would be in big trouble. That’s a helluva lot easier to copy than indexing the web.

  4. fjpoblam
    Posted November 5, 2012 at 4:54 pm | Permalink

    Yes, I agree with what you’ve said about Google’s practices. It’s been discussed among pro webmasters, and Google reps have even stated *explicitly* that they no longer consider, or want or plan to consider, their product as a “search engine”. However, there’s one crucial flaw in your argument. You say, “Aside from this strategy of making Google the main, if not the only, entry point to the web, the search engine is working hard on its next transition: going from a search engine to a knowledge engine.” The fact is, these days, all browsers (in my experience) allow users to select a different SE. *Please* educate and *encourage* your users to do so! (I have, for quite a while.)

  5. Posted November 5, 2012 at 5:21 pm | Permalink

    I personally would LOVE for all my competitors to get greedy and demand a “Google Tax”. Guess where they would rank on the SERP vs. my page?

    Keep the internet free. Free of controls, regulations, and tax.

  6. James in LA
    Posted November 5, 2012 at 6:26 pm | Permalink

    Meshnets will make a mockery of “legacy money.” With any 4G device, you can create a cloud and invite other 4G devices. Carriers typically throttle this back to under 10 clients, but today’s phones can support many more. “Always on” devices, such as common appliances, become the new infrastructure, and ISPs become obsolete.

    Without ISPs, bills like the one Germany is contemplating for aggregators are dead in the water. They are a stupid idea on the face of it.

    Meshnets are coming. They will use strong encryption and will be a rebirth of desperately needed freedom.

  7. Haluk Akin
    Posted November 5, 2012 at 8:24 pm | Permalink

    The “News” link is placed in the top navigation bar on every Google page. It’s accompanied by links such as “YouTube”, “Gmail” and “Documents”. All very important products to Google.

    Then there is this link called “More”, into which less important links are stuffed. If “News” were just another piece of cool stuff, it would be under that “More” link.

    Despite the detailed analysis above, news appears to be pretty important to Google.

  8. Posted November 6, 2012 at 2:55 am | Permalink

    The price of the most expensive keywords is determined by the large margins on the initial payment and the possibility of recurring payments for the products and services that are being advertised through SE, like insurance, loans, or umbilical cord blood banks, for that matter. It has little to do with their popularity online. A sales agent will pay $50 for a lead that can give him/her a couple of hundred in commission for years to come (and no additional expenses). But…

    It is not the info on current car insurance rates or online degrees that brings people to Google every day; it is third parties’ free content. That is why Google needs it, and the more of it the better, to claim it has the necessary number of eyeballs to generate those $50 leads.

  9. Antoine
    Posted November 7, 2012 at 8:12 am | Permalink

    Thanks for this article Frederic. I think you are right that News primarily helps with user loyalty, not advertising inventory or ad targeting capability.

    The real question here is how does one encourage the production (and consumption) of quality news (assuming we should do so in the first place). I would love to brainstorm and maybe co-write something with you on this – Let me know if you are interested.

    Phil Hood, I think this is a good thought experiment and I would suspect the smart folks at Bing are regularly reviewing these types of options. If you push the thought experiment though, I would imagine your conclusion to be: it does not make economic sense!

  10. Markus Allen
    Posted November 7, 2012 at 2:47 pm | Permalink

    I took this one step further and compiled a list of the 250 most expensive keywords in Google here:

    http://www.fetch123.com/SEM/the-most-expensive-keywords-in-google

    (My list also includes month-over-month trending data – the results are fascinating.)

  11. Mathieu Belay
    Posted November 14, 2012 at 1:19 am | Permalink

    Really interesting point of view, even if I disagree with the conclusion.
    The news and all the content created by legacy media might not add a lot of value from an advertising perspective, but it’s a lot more than “small cool stuff”. Google needs this content to remain what it is now: the fastest way to find the most relevant information about any topic.

    Imagine that Google stops indexing sites from legacy media in retaliation for the tax. The impact could be huge:
    - Image: it would go against Google’s philosophy (organize the world’s information) and its motto, Don’t be evil: “a company that does good things for the world even if we forgo some short term gains.”

    - Usage: a lot of users would no longer trust Google if it were in a position to filter out results from legacy media.
    And we can assume that this specific category of users, the most interested in finding quality news, would influence all users, as they are more likely to be opinion makers / trendsetters.
    By doing that, Google may create a vicious circle.
