About Frédéric Filloux


Posts by Frédéric Filloux:

Video will be the online advertising engine

Last week, Akamai quietly rerouted loads of its clients’ traffic to deflect Wikileaks-related attacks. The company, based in Cambridge (Massachusetts), had a surfeit of busy days fighting massive DDoS (Distributed Denial of Service) attacks. These raids were directed at companies seen as too complacent with the US government (the so-called “Wikichickens”, as coined by the financial site Breakingviews). Akamai’s countermeasures involved quickly moving data from one server to another or, when the origin of a DDoS was detected, rerouting the flood of aggressive requests to decoy URLs.

Akamai Technologies Inc. specializes in distributed computing platforms called CDNs (Content Distribution Networks). Its business mainly consists of reducing internet latency and offloading its customers’ servers. As its president David Kenny told me last week, Akamai runs on three main business drivers: cloud computing, e-commerce, and video delivery (with the associated advertising).
The first driver is straightforward: as applications move away from the desktop, users need to feel they get about the same response time from the cloud as they do from their hard drive. The same is true for infrastructure-as-a-service. Everything is built around the idea of elasticity: servers, storage capacity and networks dynamically adjusting to demand.
The second component of Akamai’s business stems from e-commerce sites’ need for availability. On Thanksgiving, Akamai said it saved about $50m in sales for its e-commerce clients who came under a series of cyber attacks. On a routine basis, the company also stores thousands of videos and other bandwidth-intensive items on its servers.
The third pillar is the biggest, and the most challenging, not just for Akamai but for the commercial internet as a whole: the growth of video, and of its monetization, will become more bandwidth-hungry as advertising migrates from contextual to behavioral.

A couple of weeks ago, David Kenny was in Paris at a gathering hosted by Weborama, the European specialist in behavioral targeting (described in a previous Monday Note, How the Web Talks to Us). He presented stunning projections for the growth of internet video.
Here are the key numbers:
- Global IP traffic will quadruple between 2009 and 2014, as the number of internet users grows from 1.7 billion today to 4 billion in 2020.
- In 2014, the Internet will be four times larger than it was in 2009. By year-end 2014, the equivalent of 12 billion DVDs will cross the Internet each month.
- It would take over two years to watch the amount of video that will cross global IP networks every second in 2014.

Traffic evolution goes like this:

Let’s pause for a moment and look at the technical side. Akamai relies on a distributed infrastructure as opposed to a centralized one. It operates 77,000 servers, which is comparatively small next to Google’s infrastructure (between 1m and 1.5m servers in 30 data centers). The difference is that Akamai’s strategy is to get as close as possible to the user thanks to agreements with local Internet Service Providers. There are 12,000 ISPs in the world, and Akamai says it has deals with the top 1,000. This results in multiple storage and caching capabilities in more than 700 of the world’s biggest cities.

This works for a page of the New York Times or for a popular iPhone application (Apple, like Facebook, is a big Akamai client). In Paris, Cairo or Manila, the first customer who requests an item gets it from the company — whether it comes from the NY Times’ or Apple’s servers — and also causes the page or the app to be “cached” by the ISP. This ISP could rely on storage leased from a university or a third-party hosting facility. From there, the next user gets the content in a blink without triggering a much slower transcontinental request. That’s how distributed infrastructure works. Of course, companies such as Akamai have developed powerful algorithms to determine which pages, services, applications or video streams are most likely to be in high demand at a given moment, and to adjust storage and network capacities accordingly.
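To make the caching mechanism concrete, here is a minimal sketch, in Python, of the cache-or-fetch decision an edge node makes (hypothetical class and names; Akamai’s real logic is vastly more elaborate):

    import time
    import urllib.request

    # Minimal sketch of an edge node's cache-or-fetch logic (hypothetical names,
    # not Akamai's actual implementation).
    class EdgeCache:
        def __init__(self, ttl_seconds=300):
            self.ttl = ttl_seconds
            self.store = {}  # url -> (content, fetched_at)

        def get(self, url):
            entry = self.store.get(url)
            if entry and time.time() - entry[1] < self.ttl:
                return entry[0]  # cache hit: served from the local ISP facility
            # Cache miss: one slow trip to the origin server, then every nearby
            # user is served locally until the TTL expires.
            content = urllib.request.urlopen(url).read()
            self.store[url] = (content, time.time())
            return content

    edge = EdgeCache()
    page = edge.get("https://www.example.com/")  # the first user pays the latency
    page = edge.get("https://www.example.com/")  # the next users get the cached copy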

Now, let’s look at the money side. What does advertising have to do with bandwidth issues? The answer: behavioral targeting vs. contextualization. Ads will shift from delivery based on context (I’m watching a home improvement video, I’m getting Ikea ads) to targeted ads (regardless of what I’m watching, I’ve been spotted as a potential motorcycle buyer and I’m getting Harley-Davidson ads). Such ads could come in the usual pre-roll format (15 seconds before the start of the video) or be inserted into the video or the stream, as in this example provided by Akamai.
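The difference between the two targeting modes can be sketched in a few lines (hypothetical ad inventory and profile segments, purely illustrative):

    # Illustrative sketch of contextual vs. behavioral ad selection
    # (hypothetical inventory and profile segments).
    CONTEXTUAL_ADS = {"home_improvement": "Ikea", "cooking": "KitchenAid"}
    BEHAVIORAL_ADS = {"motorcycle_intender": "Harley-Davidson"}

    def pick_ad(video_category, user_segments):
        # Behavioral targeting wins when the viewer matches a known segment,
        # regardless of what the video is about.
        for segment in user_segments:
            if segment in BEHAVIORAL_ADS:
                return BEHAVIORAL_ADS[segment]
        # Otherwise, fall back to the video's context.
        return CONTEXTUAL_ADS.get(video_category, "house ad")

    print(pick_ad("home_improvement", []))                       # Ikea
    print(pick_ad("home_improvement", ["motorcycle_intender"]))  # Harley-Davidson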

As online advertising spending doubles over the next ten years, video is likely to capture a large chunk of it. It will require an increasing amount of technology, both to refine the behavioral targeting component and to deliver it in real time to each individually targeted customer. This is quite a challenge for news media companies. On one hand, they are well placed to produce high-value content; on the other, they will have to learn how to pick the right partners to address the new monetization complexities.

frederic.filloux@mondaynote.com

Measuring the Nomads

The more diverse and ubiquitous the internet gets, the harder it becomes to measure. Especially with the mobile version’s rapid growth. A few weeks ago, my friends from the International Newsmedia Marketing Association (INMA) asked for a presentation discussing audience measurements for smartphones and tablets. The target was a conference held last Friday in Boston. Since I didn’t have a clue, I assumed I could work on the presentation in a journalistic way, by reaching out to people in the trade and by doing my own research. Only to realize the mobile internet is well ahead of any of today’s usage measurement tools.

Audience measurement is much more complex on mobile devices than it is on PCs. The world of personal computers is relatively simple. PCs surf through a well-documented set of browsers: Internet Explorer, Firefox, Chrome and Safari (see their respective market shares here). The connection happens either through an ISP wire or via wifi. On the server side, each request is compiled into a log for further analysis.

In the mobile world, there are many more variations. The first dimension is the diversity of devices and operating systems. The real mobile ecosystem extends well beyond the pristine simplicity of the Apple world with its two main devices — the iPhone and the iPad, only one screen size for each — powered by iOS.

Android, the ultra-fast-growing mobile OS made by Google, is found on 95 170 (!) different devices. Each comes with its (almost) unique combination of screen size and hardware/software features; “small” differences translate into a nightmare for application developers. There is more: the mobile ecosystem also comprises platforms such as Windows Phone devices, the well-controlled BlackBerry, Palm’s WebOS (now in HP’s hands), Samsung’s Bada and the multiple flavors of Symbian, soon to be followed by MeeGo. Each platform sprouts many devices and browsing variants.

Then, we have applications. Apps are fantastic at taking advantage of the senses of smartphones and tablets. An app can see (through the device’s camera), hear (with the microphone), understand language, talk back; it can search — the Yellow Pages for a location or the web for an explanation; it can feel motion thanks to the smartphone’s motion detectors and gyroscopes; it can navigate through GPS or cell tower/wifi triangulation; and of course, it can connect to a world of other devices. This results in an unprecedented canvas for the creativity of app developers. According to recent studies, apps account for about half of the internet connections coming from smartphones. It is therefore critical to analyze such traffic. But, to say the least, we are not there yet.

One example of the measurement challenge: a news-related application. The first measure of an app’s success is its download count. In theory, pretty simple. Each time an app is downloaded, the store (Apple’s or any other) records the transaction. Then, things get fuzzier as the application lives on and gets regular updates. Sometimes, updates are upgrades, with new features. At which point should the app be considered new — especially when it’s free, like most news-related ones? Second difficulty: a growing number of apps will be preloaded onto smartphones and tablets. Rightly or wrongly, Apple nixes such meddling with its devices. But, outside of the iOS world, cellphone carriers do strike deals with content providers and preload apps on Android devices. That’s another hard-to-get number.

We might believe the app’s activation provides a measurable event that settles the issue. It doesn’t. Let’s continue with the news app example. When launched from a smartphone or a tablet, the app sends a burst of “http” requests to the web server. How many? It depends on the app’s design and default settings. There could be 20, 30, or more streams loading in the background. The purpose is instant gratification: when the user requests the most likely item, such as “hottest news”, the content shows instantly because it has already been preloaded. This results in several uncertainties in the counting process.
From the server’s standpoint, the pages have been served. But how many of them have actually been read, and for how long? What if I tweak my app’s settings, selecting some items and removing others? In an ideal world, a tracking task running inside the application would provide accurate, up-to-date information. Each time the app runs, the tracker records every finger stroke (or swipe) and, whenever possible, feeds everything back to the publisher. But the OS gamut and other technical permutations make this difficult. As for Apple, tracker code inside its apps is a no-no (although there are signs of upcoming flexibility on that matter).
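Such a tracking module could look roughly like the sketch below (invented event names and endpoint; nothing here reflects an actual publisher SDK). The point is the distinction between what the app preloads and what the reader actually opens:

    import json
    import time
    import urllib.request

    # Rough sketch of an in-app usage tracker (invented names, not a real SDK).
    # It records what the reader actually opens, as opposed to what the app
    # silently preloaded in the background.
    class UsageTracker:
        def __init__(self, endpoint="https://stats.example-publisher.com/collect"):
            self.endpoint = endpoint
            self.events = []

        def record(self, event, article_id=None):
            self.events.append({"event": event,  # "preload", "open", "swipe"...
                                "article": article_id,
                                "ts": time.time()})

        def flush(self):
            # Send the batch back to the publisher whenever a connection is available.
            payload = json.dumps(self.events).encode()
            req = urllib.request.Request(self.endpoint, data=payload,
                                         headers={"Content-Type": "application/json"})
            try:
                urllib.request.urlopen(req, timeout=5)
                self.events.clear()
            except OSError:
                pass  # offline: keep the batch for the next launch

    tracker = UsageTracker()
    tracker.record("preload", "hottest-news")  # counted as a page view by the server
    tracker.record("open", "hottest-news")     # the only event proving an actual read
    tracker.flush()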

Even a well-implemented tracker module isn’t the perfect solution, though. For example, it doesn’t solve the issue of apps running in the background and downloading streams of data, unbeknownst to the user. Such requests are recorded as page views by the server, but the content is not necessarily seen by the user.

The French company Mediamétrie Net Ratings (in partnership with Nielsen) came up with a solution that might pave the way to usable hybrid measurements. Nielsen Net Ratings, NNR, is known for its technology built around panels of users (details here) who agree to have trackers running on their PCs. To improve mobile measurement, NNR recently teamed up with the three French cellular carriers and built a new, massive log-analysis system. The structure looks like this:

The (simplified) sequence follows:

1 / Cell carriers. They compile millions of logs, i.e. requests coming from their 3G/Edge networks to websites (no distinction between a request coming from an Android web browser or an iPhone application). Basically the log ticket says: P. Smith, number ###, sent this http://www… request on Dec 3rd at 22:34:55.

2/ The third-party aggregator. Its main job is to anonymize data thanks to an encryption key it gets from the carriers. France’s privacy authority is very serious about data protection. Neither the cell carrier nor the measurement company can have a full view of what people do on the internet.

3/ The audience analysis company. Here, Nielsen Net Ratings France. In our example, along with the cell numbers of its 10,000 other panelists, NNR sends the third-party aggregator John Doe’s number.

4/ The aggregator encrypts John Doe’s number into a “fdsg4…” sequence and sends it back to NNR.

5/ The carriers then send huge log files to NNR.

6/ NNR’s job is to retrieve its encrypted panelists from within the logs haystack. When it spots the “fdsg4…” sequence, it can tell that John Doe, whom NNR knows everything about, has gone to xyz websites via his cell phone at such and such dates, times and, perhaps, locations. The rest of the log remains encrypted, therefore useless.
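In spirit, the matching step boils down to a keyed one-way transformation of phone numbers. Here is a schematic version (HMAC stands in for whatever encryption scheme the aggregator actually uses, and all the data below is invented):

    import hashlib
    import hmac

    # Schematic version of the panel/log matching principle. HMAC stands in for
    # the aggregator's real encryption scheme; all data below is invented.
    AGGREGATOR_KEY = b"key-held-by-the-third-party-aggregator"

    def pseudonymize(phone_number):
        return hmac.new(AGGREGATOR_KEY, phone_number.encode(), hashlib.sha256).hexdigest()

    # Steps 3-4: NNR sends its panelists' numbers, gets back opaque identifiers.
    panelists = {"+33600000001": "John Doe"}
    panel_ids = {pseudonymize(number): name for number, name in panelists.items()}

    # Step 5: the carriers ship logs in which numbers were transformed the same way.
    carrier_logs = [
        {"user": pseudonymize("+33600000001"), "url": "http://www.example.com", "ts": "2010-12-03 22:34:55"},
        {"user": pseudonymize("+33699999999"), "url": "http://www.example.org", "ts": "2010-12-03 22:35:10"},
    ]

    # Step 6: NNR keeps only the lines matching a known panelist; the rest stays opaque.
    for line in carrier_logs:
        if line["user"] in panel_ids:
            print(panel_ids[line["user"]], "visited", line["url"], "at", line["ts"])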

This system has only been in operation for a few months. And it is not perfect either. For instance, it tracks only requests going through cell phone networks; it ignores web requests sent through wifi — which account for 30% of total usage! The new system also ignores Blackberry users on RIM’s proprietary network. And the NNR algorithms need help from a huge database of URLs provided by the sites’ publishers. These URLs are used to differentiate web browser requests from those generated by an app; we are talking about millions of URLs here, growing by the thousands every single day. A daunting task. In addition to this complication, large amounts of data still reside on the publishers’ servers. Hence a certification issue, as for all site-centric measurements.

So much work ahead. The future lies in a deeper merger of site-centric (log analysis) and user-centric (panel) techniques. And also in a wider deployment of HTML5 apps. We’ll explore the new web lingua franca’s potential in an upcoming Monday Note.

frederic.filloux@mondaynote.com

Key Success Factors for a tablet-only “paper”

Can it fly? Last week, Rupert Murdoch announced he was plotting a tablet-only newspaper. Or rather, an iPad-only paper — at first; other tablets would follow. The Daily, as it is to be called (how modest and innovative) is to be blessed by Steve Jobs Himself at a media event introducing the new venture. Initially, rumors pointed to a December 9th date; the latest gossip now says the unveiling could be delayed over “issues”. In any case, this is big news: a major media group, crossing the Rubicon to get rid of both paper and web, riding the Apple promotional machine (details and speculations in this story from The Guardian).

Well before the iPad was introduced last Spring, many of us had dreamed of a news product encapsulated inside a self-sustaining iPhone application. The advent of the iPad, with its gorgeous screen, only made the dream more vivid. Then, reality interfered. Even with the combined installed bases of the iPhone and the iPad, the numbers didn’t add up; the dream news product wouldn’t make real money. Could it work this time under Rupert Murdoch’s rule?

Let’s return to Earth and tally the project’s pluses and minuses.

On the plus side

1 / Let’s make quick work of the staffing issue. Media pundits contend you can’t run a serious daily with a staff of a hundred as envisioned by Murdoch. Of course you can have a roaring newsroom with 100 people! As long as such a staff is focused on the paper’s core journalistic beats; in an ideal world, a newsroom should be staffed by a relatively small number of dedicated, well-paid, hard-working reporters and editors, managed by a flat hierarchy. This compact crew only needs to be supplemented by a carefully outsourced network of specialized people whose expertise, while highly valued, isn’t used often enough to justify full-time employment. Exactly the opposite of our dying print dinosaurs.

2 / The tablet immersive experience. Like no other device before, the iPad has the ability to capture the reader’s attention: iPad “sessions” last much longer than browsing expeditions on the internet. According to TigerSpike, the very design company that built apps for News Corp, the average iPad session lasts 30 to 40 minutes (see story in PaidContent).

3/ The market. Rupert Murdoch is convinced that, soon, an iPad or a competing tablet will find its way into almost every household. And he is said to have been impressed by projections of 40 million iPads in circulation by the end of 2011. Spreadsheet magic! Millions of customers… On the revenue side, the numbers can work. A 100-person newsroom should cost no more than $12-15m a year to operate. Assuming $99/year pricing, netting $66 per user after Apple’s fee, plus $10 per user per year of premium advertising (after all, it is a qualified audience), the ARPU can land at around $80, which translates into roughly 150,000 subscribers required to break even (see the back-of-the-envelope check below). Sounds appealing.
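For the record, the arithmetic above can be checked in a few lines; the inputs are this column’s own assumptions, not News Corp figures:

    # Back-of-the-envelope check of the break-even figure quoted above
    # (inputs are this column's assumptions, not News Corp numbers).
    newsroom_cost   = 12_000_000  # low end of the $12-15m yearly estimate
    net_after_apple = 66          # what the publisher keeps on a $99 subscription
    ad_revenue      = 10          # premium advertising per subscriber per year

    arpu = net_after_apple + ad_revenue   # $76 per subscriber, rounded to ~$80 above
    print(round(newsroom_cost / 80))      # ~150,000 subscribers at an $80 ARPU
    print(round(newsroom_cost / arpu))    # ~158,000 at the stricter $76 figure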

On the minus side

1 / Closed environment, no links. That is the side effect of the “cognitive container”: an application such as the Wall Street Journal’s, the Guardian’s or the Economist’s is by definition autistic to the rest of the web. No links to the outside world (unless it has an embedded browser like Dow Jones’ All Things D), and no relation to the social/sharing whirlwind. Some will appreciate the coziness of a newspaper without parasitic external stimuli; others won’t accept being cut off from the social Babel. It could be a matter of generations.

2 / The Apple business model sucks (for media). At first, Apple’s 30% cut of the retail price sounds great compared to the physical world, where production and distribution costs devour 40% to 50%. Not so simple. First, you need at least five times more readers to offset the advertising revenue depletion associated with the move to the digital world.
Second, the tax issue. In many countries, in spite of intense lobbying by media companies, digital products carry the standard VAT. In France, where that VAT is set at 19.6%, internal analyses made by publishers showed that a high-volume daily will net less in the App Store than in a physical kiosk.
Third, Apple’s terms of use. They deprive publishers of two things: first, the ability to set prices outside of Apple-dictated levels (usually too high or too low) and, second, access to customer data, which makes any CRM monetization impossible. The latter is, in itself, a major deterrent to dealing with Apple. Of course, if Steve endorses Rupert’s project, the conditions could be quite different.

Mandatory

1 / Exclusive and proprietary content. If Murdoch’s paper — or any tablet-only publication for that matter — is unable to produce truly original content, it is doomed. The internet is flooded with reverberating newsflows of all kinds, all of them free. Value will inevitably follow uniqueness.

2 / Pricing: simple and adjustable. No one knows what readers will ultimately prefer: the iTunes model (multiple 99-cent transactions) or the cable-TV or Netflix flat-but-fat fees. To find out, the only way is to offer multiple pricing options. Problem is: it goes against simplicity and readability.

3 / Beyond Apple and perhaps beyond the app. For all of its advantages, betting only on the App Store could be risky. The market will soon overflow with other vendors and operating systems. Hedging one’s bets will be key.
Maybe it would be worthwhile to look beyond the application concept. Instead of an autistic app, why not build adaptive web sites that adjust automagically to the device used (thanks to the user-agent technique sketched below)? As screen sizes differ between an iPad, a Samsung Galaxy Tab, or the upcoming BlackBerry PlayBook (see this video), the tablet-dedicated site could adjust and optimize its rendering. In doing so, the service would remain part of the web, connected to its social features; it could operate on a much better business model than Apple’s, and there would be no hassle with the app store application process, upgrades, inexplicable rejections, etc.
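The user-agent technique mentioned above is simply a matter of inspecting the User-Agent header sent by the browser and serving a matching layout. A minimal sketch, with illustrative substrings rather than a production-grade device database:

    # Minimal sketch of user-agent based adaptation (illustrative substrings only;
    # a production detector would rely on a maintained device database).
    LAYOUTS = {
        "ipad": "tablet-10in.css",
        "galaxy tab": "tablet-7in.css",
        "playbook": "tablet-7in.css",
    }

    def pick_layout(user_agent):
        ua = user_agent.lower()
        for marker, stylesheet in LAYOUTS.items():
            if marker in ua:
                return stylesheet
        return "desktop.css"  # fall back to the regular web site

    print(pick_layout("Mozilla/5.0 (iPad; U; CPU OS 4_2 like Mac OS X) AppleWebKit"))
    # -> tablet-10in.css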

4 / Speedy and simple. On both my iPhone and my iPad, the applications I no longer use happen to be the most complicated and the slowest. One such example is the New York Times app: it needs more time to load than it takes to flip through several pages of the paper’s web site. By contrast, the just-released Economist applications are great. Two buttons on the main page: Download (10 seconds for the entire magazine) and Read. That’s all. And if I want to change the font size, it is intuitive: I pinch in or out, and the whole layout resizes. Interestingly enough, The Economist gives its subscribers the choice between a great website experience and the magazine look and feel of its sleek application (I’m curious to see which one will prevail, audience-wise). The beauty of this app resides in what has been removed from it.

Meditate on this: this is at the very core of Apple’s design genius.

frederic.filloux@mondaynote.com

Fighting Unlicensed Content With Algorithms

It’s high time to fight the theft of news-related content, really. A couple of weeks ago, Attributor, a US company, released the conclusions of a five-month study covering the use of unauthorized content on the internet. The project was called the Graduated Response Trial for News and relied on one strong core idea: once a significant breach is established, instead of an all-out legal offensive, a “friendly email”, in Attributor’s parlance, kindly asks the perpetrator to remove the illegal content. Absent a response within 14 days, a second email arrives. As a second step, Attributor warns it will contact search engines and advertising networks. The former will be asked to suppress links and indexation for the offending pages; the latter will be requested to remove ads, thus killing the monetization of the illegal content. After another 14 days, the misbehaving site receives a “cease and desist” notice and faces full-blown legal action (see details on the Fair Syndication Consortium blog). Attributor and the FSC pride themselves on a 75% compliance rate, with negligent web sites taking action after step 2. In other words, once kindly warned, looters change their minds and behave nicely. Cool.

To put numbers on this, the Graduated Response Trial for News spotted 400,000 unlicensed cloned items on 45,000 sites. That is a stunning 9 illegal uses per site. As reported in a February 2010 Monday Note (see Cashing in on stolen contents), a previous analysis conducted by Attributor pointed to 112,000 unlicensed copies of US newspaper articles found on 75,000 sites, a rate of 1.5 stolen articles per site. Granted, we can’t jump to the conclusion of a six-fold increase; the two studies were not designed to be comparable, the tracking power of Attributor is growing fast, the perimeter was different, etc. Still. When, last Friday, I asked Attributor’s CEO Jim Pitkow how he felt about those numbers, he acknowledged that the use of stolen content on the internet is indeed on the rise.

No doubt: the technology and the deals organized by Attributor with content providers and search engines are steps in the right direction. But let’s face it: so far, this is a drop in the ocean.
First, the nice “Graduated Response” tested by the San Mateo company and its partners needs time to produce its effects. A pair of 14-day notices before rolling out the legal howitzer doesn’t make much sense considering the news cycle’s duration: the value of a news item decays by 80% in about 48 hours. The 14-day spacing of the two warning shots isn’t exactly a deterrent for those who do business stealing content.
Second, the tactics described above rely too much on manual operations: assessing the scope of the infringement, determining the response, notifying, monitoring, re-notifying, etc. A bit counter to the nature of the internet, to say the least, with its 23 billion pages.

You get my point. The problem requires a much more decisive and scalable response involving all the players: content providers, aggregators, search engines, advertising networks and sales houses. Here is a possible outline:

1/ Attributor needs to be acquired. The company is simply too small for the scope of the work. A few days of Google’s revenue ($68m per 24 hrs) or less than a month for Bing would do the job. Even smarter, a group of American newspapers and book publishers gathered in an ad hoc consortium could be a perfect fit.

2 / Let’s say Google or Bing buys Attributor’s core engineering know-how. It then becomes feasible to adapt and expand its crawling algorithm so it runs against the entire world wide web — in real time. Two hours after a piece of news is “borrowed” from a publisher, it is flagged and the site receives a pointed notification. This could be an email, or an automatically generated comment below the article, re-posted every few hours. Or, even better, a well-placed sponsored link like the fictitious one below:

Inevitably, ads dry up. First, ad networks affiliated with the system stop serving display ads. And second, since the search engine severed hyperlinks, ads on orphan pages become irrelevant. Every step is automated.
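The detection half of such a pipeline is well understood: fingerprint every licensed article, then look for overlapping fingerprints in crawled pages. A toy version using word shingles, which is not Attributor’s actual algorithm but conveys the idea:

    # Toy detection of copied articles with word shingles (not Attributor's
    # actual algorithm, just the general idea).
    def shingles(text, k=8):
        words = text.lower().split()
        return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

    def overlap(original, suspect):
        a, b = shingles(original), shingles(suspect)
        return len(a & b) / max(len(a), 1)

    licensed_article = ("Akamai quietly rerouted loads of its clients traffic to deflect "
                        "Wikileaks related attacks, fighting massive distributed denial "
                        "of service raids aimed at companies seen as too complacent.")
    crawled_page = "Breaking: " + licensed_article + " (via some aggregator)"

    if overlap(licensed_article, crawled_page) > 0.8:  # the threshold is arbitrary here
        print("flag the page, notify the site, alert the ad networks")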

Ebooks Winners & Losers

Let’s come back to the ebook with more questions. There is no doubt: the digital book will find its place under the sun; its prospects look much better than those of the online press. In the first place, there is no ingrained, now decade-old habit of getting books for free on the internet, as there is for news. Second, the book (in its physical form) is the centuries-old incarnation of the “cognitive container”, with its unparalleled convenience and with a value attached to it. And third, it can’t be unbundled.

For the online press, on the contrary, more than 90% of titles are available for free. The newspaper’s “cognitive container” is totally impractical in terms of size and readability; the interface sucks: most broadsheets’ stories run on two pages, but many readers don’t go beyond the jump. Lastly, the daily news is begging for unbundling (look at the Sunday edition of your favorite newspaper, with its ten-plus sections).

What does the book gain by switching to the electronic format?
Three things:
- new formats with rich media appealing to reluctant book readers (the current Generations X and Y, mainly)
- enhanced capabilities such as search, ability to create a personal table of contents, or to extract and index snippets
- a complete overhaul in the production system, which will breed new market opportunities as editorial works, once finished, will enjoy instant worldwide availability.

Of course, obstacles remain. A recent survey conducted by Bain and Co listed eight obstacles in the way of widespread ebook adoption (PDF here).

Interestingly enough, say its authors Patrick Behar and Laurent Colombani, nostalgia for the “paper experience” is disconnected from the generation factor: all age groups continue to enjoy the book as a physical object. This guarantees some level of coexistence between the two media. But the authors also admit that the next two barriers – the price of the device and reading comfort – will fade quickly as Moore’s Law still rules, both for mass-produced devices and for screen quality (see for instance Qualcomm’s Mirasol display combining the advantages of electronic ink and the color depth of LCD screens — see also this story in The New York Times). On devices’ prices, Bain & Co sees the following evolution and points to the thresholds required to convert purchase intents:

Against this backdrop, let’s now try to see how the different participants might fare. (For a close-up on the digital rights issue, see last week’s Monday Note.)

Manufacturers: uncertain. Users expect to pay around a hundred dollars or euros for an e-reader, and will soon expect about three times this amount for a full-color, full-featured tablet. To put things in perspective, a teardown analysis by iSuppli puts the cost of components for an Amazon Kindle at $176, while the iPad reaches $264 and the Samsung Galaxy Tab $214. This gives an idea of how thin margins are likely to be in the future. In other words, manufacturers who won’t be able to sell the blades (i.e. content) along with the razor will have a hard time making any money.

ebooks: trading digital rights, not files

There are many reasons to be bullish on ebooks. On the device side, the iPad set the standard (rather high) and triggered intense competition among manufacturers and operating system providers. On the people side, just take New York’s subway, or a high-speed train in Europe. And we’ve seen nothing yet: tablet prices will go down as cell phone carriers – and eventually media companies – subsidize e-readers. Before year-end, European telcos will offer the Samsung Galaxy — an Android-powered tablet — for €300 or less, preloaded with access to online bookstores and electronic newsstands. For the industry, this Christmas season is critical: tablet makers must secure defensible market territory before Apple’s probable roll-out of its next-generation iPad.

The content side remains more complicated to figure out. A first phase is likely to consist of an extension of what we have today, i.e. a transaction system based on book files: text-based books or richer media products. The main players will remain Amazon and the Apple iBooks store. But, in five to ten years, this way of dealing with intellectual content will be seen as primitive.

The true revolution will be a shift from a files transaction system to a rights transaction system. This transformation involves radical changes in the way we think of digital content, books, videos or even games.

For now, let’s focus on books. Here is how it could work.

We’re now in 2015. I read book-related content on a number of different devices: my smartphone, my high-definition tablet, and even my PC sometimes. (I personally do not believe in TV for such products.) I want to spend a long weekend in Rome. Instead of buying a couple of books – one to organize my trip and another to use on location – I will buy rights to both.

As I download the books I bought rights to on an iPad or a Samsung Galaxy, the content takes advantage of specific screen features and displays large pictures, some of them in 360° panoramic format and zoomable. My Microsoft tablet uses the extraordinary DeepZoom technology connected to Bing Maps Live View.
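Under such a system, what the store sells is an entitlement record rather than a file. A purely speculative sketch of what that record might contain (every field is invented for illustration):

    from dataclasses import dataclass
    from typing import Optional

    # Purely speculative sketch of a rights-transaction record (all fields invented).
    @dataclass
    class BookRights:
        title: str
        buyer_id: str
        devices_allowed: int = 5           # render on phone, tablet, PC...
        formats: tuple = ("text", "rich")  # plain edition vs. enhanced edition
        expires: Optional[str] = None      # None means perpetual rights

        def can_render(self, device_count, fmt):
            return device_count <= self.devices_allowed and fmt in self.formats

    rome_guide = BookRights(title="Rome: A Long Weekend", buyer_id="reader-2015-001")
    print(rome_guide.can_render(device_count=3, fmt="rich"))  # True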


Time to rethink Word Processors — Seriously

Last Friday, at the Apple Store near the Paris Opera House, I paid my annual Microsoft tax: €140 ($194) for the 2011 edition of Microsoft Office. My hopes: more speed, fewer bugs, and smarter features. All in the service of producing all manner of texts and presentations required by my multiple jobs. So far, no mind-blowing features, nothing more than a superficial makeover.
To look at this new iteration of Word, I use a framework built on my experience of Microsoft’s R&D effort. A few months ago, I spent three days at the Microsoft Tech Fest in Redmond. At first, I felt like a kid in a candy store, chatting with some of Microsoft Research’s 900-plus PhDs who work in exotic fields such as machine learning or epidemiology. But the amazement subsided and was replaced by doubt: how does this tremendous intellectual firepower actually make a difference in the Microsoft products I’ve been using for 15 years? In fact, Microsoft R&D has very little impact on everyday products. This is but one of Microsoft’s many problems: see the long piece I wrote in Le Monde Magazine.

Let’s go back to the subject of this column. Knowing what I know about Microsoft’s vision of computer science, I had envisioned a quantum leap for the applications I use the most, such as the very word processor I’m using “as we speak”. No joy. Let’s ignore the letdown and, instead, speculate a little bit about the next generation of text creation tools branded Microsoft Word, or Apple Pages (which comes with fewer bells and whistles, but is tidier).

First, text creation. One of the biggest challenges, and a growing one, is spelling, syntax, and grammar. In a country such as France, whose language is loaded with utmost (and sometimes absurd) complexity, the quality of writing is in steep decline. For the youngest part of the population, it is accelerated by the demise of a school system where teachers in effect gave up on written language. As for the 30-40 age bracket, the bombardment of daily interactions (email at work, SMS, chat on social networks) has made proper spelling and syntax secondary. Quite often, coming from a manager or even an attorney, you’ll receive a business document riddled with spelling errors well beyond the typos acceptable in a hastily written piece.

Unfortunately, today’s word processors do a very poor job when dealing with mangled spelling and grammar. All of us have in mind examples where the Word application becomes absurdly creative when dealing with the unknown: regardless of context, and with no learning capabilities whatsoever, Word will stubbornly keep suggesting an alternate spelling instead of simply skipping an unrecognized term.

Let’s dream for a moment; let’s picture what a text processing software could look like in the light of existing technologies.

When I install my 2013 version of MS Word or Apple Pages, it asks me to load a “reference corpus” of texts it will learn from. Since I write both in French and in English, I will feed the app with the final versions (edited and proof-read) of articles I published and am comfortable with. Grammar and syntax rules will be helpful for English, and thesauruses will be used for both languages. Since I currently write about media and technology, the application dictionary will soon be filled with the names of people, places and companies I mention, as well as with the technical jargon I allow myself to use. Alternatively, if I don’t want to feed the word processor with my own writings, I can direct it to URLs of texts I find trustworthy: great newspapers, magazines, or academic papers…

Similarly, a lawyer or a doctor will feed the word processor with texts (from his own production, or found online) to be used as reference for professional vocabulary and turns of phrase. In my dream, third-party software vendors have seen a business opportunity: they sell industry- or occupation-specific plugins loaded with high-quality reference corpuses. This results in reliable auto-correct for Word and Pages. Some vendors even provide their corpuses as on-line subscriptions, constantly updated with state-of-the art content.
Then, as I write, the application watches my typing and matches it against the relevant corpus. Instead of relying on rigid hit-or-miss grammatical rules, it uses a statistical algorithm to analyze a word, or a group of words, within its context of intended or inferred meaning. Take this gross mistake: “GM increased its sails by 10 percent”. The word is spelled correctly but, in this context, wrong. Because it lacks a context in which to detect the misspelling, the 1998-vintage word processor won’t change “sails” into “sales”. Conversely, the 2013 statistics-based language model flags the mistake by using the proper body of reference to see that “sails” is unlikely in an auto-industry context.
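The “sails/sales” example can be reproduced with a very small statistical model: count how often each candidate appears next to its neighbors in a domain-specific reference corpus, and keep the likelier one. A toy sketch (a real system would use a far larger n-gram model and a proper noisy-channel formulation):

    from collections import Counter

    # Toy context-based correction: keep whichever candidate word is more frequent
    # after the preceding word in the reference corpus. Here the corpus is a tiny
    # made-up auto-industry body of reference.
    reference_corpus = [
        "gm increased its sales by ten percent",
        "toyota sales rose sharply last month",
        "chrysler said its sales were flat",
    ]

    bigrams = Counter()
    for sentence in reference_corpus:
        words = sentence.split()
        bigrams.update(zip(words, words[1:]))

    def pick(previous_word, candidates):
        return max(candidates, key=lambda w: bigrams[(previous_word, w)])

    sentence = "gm increased its sails by 10 percent".split()
    corrected = [pick(prev, ["sails", "sales"]) if word in ("sails", "sales") else word
                 for prev, word in zip([""] + sentence, sentence)]
    print(" ".join(corrected))  # -> gm increased its sales by 10 percent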

Just a year ago, Google introduced Wave, an ambitious reinvention of email, seemingly ahead of its time. Among other advances, Wave featured a spectacular implementation of Google’s huge statistical model of language. In this video (go to the 45th minute) you’ll see Google Wave’s product manager Lars Rasmussen type the following sentences: “Can I have some been soup? It has bean a long time. Icland is an icland”, etc. Each time, the software automagically corrects the mistakes as they are typed, confident in the power of its algorithm and of its immense body of reference.  This statistical approach works with gross, obvious mistakes, but also with more subtle ones.
Of course, I am aware of the difficulties in applying statistical language models to personal software: such algorithms are bandwidth- and CPU-intensive. This could explain why Google did not deploy the Wave spelling demonstrator on Gmail or on Google Docs. But the underlying algorithms do exist. A less sophisticated version, limited to professional dictionaries and thesauruses at first, could be fantastically helpful in properly spelling Zhengzhou, if you happen to write about Asia, or neuroborreliosis, if you are a medical student.

Second, the use of texts. A significant proportion of writing goes to blogs and other social environments. As a serious user of the WordPress platform [today’s Word can’t even change “Wordpress” into the correct “WordPress”; I had to check on Google…], I would gladly pay for a Word or Pages plug-in allowing me to compose a clean post with text, images, tables, links and typographical enrichments and, when done, letting me click “publish on my blog” or “send it to the mailing list”. No more cut & paste surprises or image-resizing headaches. The word processor plug-in could be provided by the same developer who designed the style sheet (CSS) for my WordPress (or Blogspot, or TypePad) site. Or I could go for the auto-settings by inserting the CSS code in the plug-in, which will, in turn, adjust the word processor’s dials, from fonts and sizes to background colors, etc.
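The “publish on my blog” button is not far-fetched: WordPress already exposes an XML-RPC endpoint that such a plug-in could call. A minimal sketch using the long-standing metaWeblog interface (the URL and credentials are placeholders):

    import xmlrpc.client

    # Minimal sketch of a "publish on my blog" button using WordPress's
    # long-standing XML-RPC endpoint (URL and credentials are placeholders).
    def publish_post(title, html_body,
                     endpoint="https://myblog.example.com/xmlrpc.php",
                     user="author", password="application-password"):
        server = xmlrpc.client.ServerProxy(endpoint)
        content = {"title": title, "description": html_body}
        # metaWeblog.newPost(blogid, username, password, content, publish)
        return server.metaWeblog.newPost("1", user, password, content, True)

    # post_id = publish_post("Measuring the Nomads",
    #                        "<p>Clean HTML straight from the word processor.</p>")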

You get my point: self-correcting spelling systems that guarantee (or at least vastly improve) decent grammar, syntax and the proper spelling of nouns and names can be a huge improvement for all professional writers – especially in a globalized economy where a greater number of us produce documents in a foreign language. Such auto-correct systems can even offer educational value in helping bloggers improve their basic writing skills.

I’m writing this on Word version 14 (yes, fourteen).  How long will I have to wait for this quantum leap, Mr. Ballmer? Or Mr. Jobs?

—frederic.filloux@mondaynote.com

Expanding Into New Territories

In defining business strategies for modern media such as online newspapers, the most difficult part is finding the right combination of revenue streams. Advertising, pay-per-view, flat fees… All are part of the new spectrum media companies now have to deal with.

The gamut looks like this:

As we can see, newspapers mostly consist of one product line, confined to the mainstream, value-added news category. By going digital, this segment is likely to lose most of its value (expect a 60% meltdown as expressed in revenue per reader). Therefore, for these companies, it becomes critical to expand into new territories already taken over by other players. For instance, big media outlets endowed with strong brands should go into commodity news and participatory/social content. This doesn’t mean a frontal attack on Facebook or Twitter, obviously; instead, the new reality dictates using and monetizing through them (see last week’s Monday Note on Facebook monetization).

Ancillary publishing should also be considered a natural expansion: news outlets retain large editorial staffs that could be harnessed to produce high value digital books (see this earlier Monday Note on Profitable Long Form Journalism). The “Events” item, on the list/graph above, is more questionable, but it remains a significant source of potential income tied to the brand’s notoriety. I left aside the classifieds business: except for a few media groups (Schibsted all over Europe or Le Figaro Group in France) that boarded the train on time, positions are now too entrenched to justify an investment to gain a position in that segment.

Advertising is likely to remain the biggest money maker for the two dominant categories: Commodity/Participatory/Social Media and Mainstream Value-Added. Unfortunately, in its digital form, advertising has run in deflationary mode for the past decade due to flat (at best) CPMs, with huge inventories putting further pressure on prices.

Print doesn’t look great either as investments shift en masse to digital; this reflects the growing imbalance between time spent by users on print and advertising investments in the medium. According to Nielsen Media Research, the Internet now accounts for 38% of time spent but only for 8% of ad spending; newspapers are on a symmetrical trend as they captured 20% of advertising dollars for only 8% of users’ time.

The Facebook Money Machine

An update to this column: according to the Wall Street Journal, many of Facebook’s most popular applications have been transmitting identifying information — in effect, providing access to people’s names and, in some cases, their friends’ names — to dozens of advertising and Internet tracking companies. See here (paywall).

This year, Facebook will make about $1.5bn in advertising revenue. On average, this is about three dollars per registered user, a figure that is significantly higher for the 50% of the social network’s population that logs in at least once a day. How does Facebook achieve such numbers? Last week, we looked at the architecture Facebook is building as a kind of internet overlay. Now, let’s take a closer look at the money side.

If Google is a one-cent-at-a-time advertising machine, Facebook is a one-user-at-a-time engine. The social network puts the highest possible value on two things: a) user data, and b) the social graph, i.e. the connections between users.
For a European or American media outlet, one user in, say, Turkey (23m Facebook users) carries little or no value as far as advertising is concerned. To Facebook, this person’s connections will be the key metric of his or her value, especially if she is connected to others living outside Turkey. According to Justin Smith of the research firm Inside Facebook, in any given new market, the social network’s membership really takes off once the number of connections to the outside world exceeds domestic-only connections. A Turkish person whose contacts are solely located within the country is less valuable than an educated individual chatting with people abroad; the latter is expected to travel, has significant purchasing power and carries serious consumer influence over her network. As a result, Facebook extracts much more value from a remote consumer than any other type of media does.

Advertisers rely on three main strategies on Facebook, as explained by Frederic Colas, chief strategic officer for FullSix Group, a Paris-based interactive agency. The first one is the fan page. The goal is to manage and optimize user engagement with a brand through community management. Numbers are impressive.
Here are the top 15 compiled by Facebakers:

Getting high traffic on a fan page is still more art than science; interaction volume varies widely. In a recent study (here, in French), FullSix showed that, within the same market segment such as fashion, the number of monthly interactions per 1,000 fans is 4 times higher for H&M (4.3m fans) than for Gap (0.75m fans), and 25 times higher for Victoria’s Secret (8m fans) than for Ray-Ban (1.4m fans).
The second approach uses social plugins (such as the “Like” button, recommendations, external login, etc.).
And the third strategy is more like classic advertising campaigns, but with an unparalleled degree of targeting: Facebook makes it possible to combine precise parameters, ranging from location to employer to the exact timing of an ad (find the women above 40 who work for IBM in northern New York State, and deliver an ad every Friday between 18:00 and 22:00, for instance). This advertising resource is self-serve, totally automated, and accounts for half of Facebook’s commercial revenue.
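Schematically, that kind of targeting amounts to a conjunction of filters over profile data and delivery time (the field names below are invented for illustration, not Facebook’s actual ad API):

    from datetime import datetime

    # Schematic ad-targeting filter (invented field names, not Facebook's ad API).
    campaign = {
        "gender": "female", "min_age": 40, "employer": "IBM",
        "region": "northern New York State",
        "weekday": 4,            # Friday
        "hours": range(18, 22),  # 18:00 to 22:00
    }

    def matches(profile, now):
        return (profile["gender"] == campaign["gender"]
                and profile["age"] >= campaign["min_age"]
                and profile["employer"] == campaign["employer"]
                and profile["region"] == campaign["region"]
                and now.weekday() == campaign["weekday"]
                and now.hour in campaign["hours"])

    user = {"gender": "female", "age": 43, "employer": "IBM",
            "region": "northern New York State"}
    print(matches(user, datetime(2010, 12, 10, 19, 30)))  # a Friday evening -> True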

Mark Zuckerberg, The Architect

The Social Network is an excellent movie. It’s fast and entertaining. And the words crafted by Aaron Sorkin, one of Hollywood’s most talented screenwriters, flatter the Harvard crowd and make it sound wittier than it actually is. In addition, digital imaging enthusiasts will enjoy the Red Camera’s performance, demonstrating its extraordinary low-light and depth-of-field creative potential. David Fincher’s movie has to be seen as fiction based on a true story. Nothing more. There is no room or need for an exegesis here.

And yet, Facebook’s most game-changing feature couldn’t be rendered into pixels. It is actually encapsulated on page 34 of Sorkin’s script, when Zuckerberg faces the too-perfect Winklevoss twins (played by a single actor in the movie, thanks to special effects) who pitch him their idea of the “HarvardConnection” social network. Their sketchy description triggers a short but intense burst of activity in Mark’s brain. The 20-year-old geek is seen processing the idea at light speed, before mumbling: “I’m in”.

No further questions. In five seconds, we’ve witnessed the fictitious Zuckerberg envisioning the seeds of a grand plan, going well beyond his own (and gross) rate-a-girl algorithm, and beyond the Winklevosses’ project of “an exclusive Harvard-dot-e-d-u” network. (In real life, the twins eventually sued Zuckerberg for stealing their idea, and settled for an alleged $64m.)