The more diverse and ubiquitous the internet gets, the harder it becomes to measure, especially with the rapid growth of its mobile version. A few weeks ago, my friends from the International Newsmedia Marketing Association (INMA) asked for a presentation discussing audience measurements for smartphones and tablets. The target was a conference held last Friday in Boston. Since I didn’t have a clue, I assumed I could work on the presentation in a journalistic way, by reaching out to people in the trade and by doing my own research. I soon realized that the mobile internet is well ahead of any of today’s usage measurement tools.
Audience measurement is much more complex on mobile devices than it is on PCs. The world of personal computers is relatively simple. PCs surf through a well-documented set of browsers: Internet Explorer, Firefox, Chrome and Safari (see their respective market shares here). The connection happens either through an ISP wire or via wifi. On the server side, each request is compiled into a log for further analysis.
In the mobile world, there are many more variations. The first dimension is the diversity of devices and operating systems. The real mobile ecosystem extends well beyond the pristine simplicity of the Apple world with its two main devices — the iPhone and the iPad, only one screen size for each — powered by iOS.
Android, the ultra-fast-growing mobile OS made by Google, is found on 95 170 (!) different devices. Each comes with its (almost) unique combination of screen size and hardware/software features; “small” differences translate into a nightmare for application developers. There is more: the mobile ecosystem also comprises platforms such as Windows Phone devices, the well-controlled Blackberry, Palm’s WebOS (now in HP’s hands), Samsung’s Bada and the multiple flavors of Symbian, to be followed by Meego. Each platform sprouts many devices and browsing variants.
Then, we have applications. Apps are fantastic at taking advantage of the senses of smartphones and tablets. An app can see (through the device’s camera), hear (with the microphone), understand language, talk back; it can search the Yellow Pages for a location or the web for an explanation; it can feel motion thanks to the smartphone’s motion detectors and gyroscopes; it can navigate through GPS or cell tower/wifi triangulation; and of course, it can connect to a world of other devices. This results in an unprecedented canvas for the creativity of app developers. According to recent studies, apps account for about half of the internet connections coming from smartphones. It is therefore critical to analyze such traffic. But, to say the least, we are not there yet.
One example of the measurement challenge: a news-related application. The first measure of an app’s success is its download count. In theory, pretty simple. Each time an app is downloaded, the store (Apple’s or any other) records the transaction. Then, things get fuzzier as the application lives on and gets regular updates. Sometimes, updates are upgrades, with new features. At which point should the app be considered new, especially when it’s free, like most of the news-related ones? Second difficulty: a growing number of apps will be preloaded into smartphones and tablets. Rightly or wrongly, Apple nixes such meddling with its devices. But, outside of the iOS world, cellphone carriers do strike deals with content providers and preload apps on Android devices. That’s another hard-to-get number.
We might believe the app’s activation provides a measurable event that settles the issue. It doesn’t. Let’s continue with the news app example. When launched from a smartphone or a tablet, the app sends a burst of “http” requests to the web server. How many? It depends on the app’s design and default settings. There could be 20, 30, or more streams loading in the background. The purpose is instant gratification: when the user requests the most likely item, such as “hottest news”, the content appears instantly because it has been preloaded. This results in several uncertainties in the counting process.
From the server standpoint, the pages have been served. But how many of those have actually been read, and for how long? What if I tweak my app’s settings, selecting some items and removing others? In an ideal world, a tracking task running inside the application would provide accurate, up-to-date information. Each time the app runs, the tracker records every finger stroke (or swipe) and, whenever possible, feeds everything back to the publisher. But the OS gamut and other technical permutations make this difficult. As for Apple, tracker code inside its apps is a no-no (although there are signs of upcoming flexibility in that matter).
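To make the idea of an in-app tracker more concrete, here is a minimal sketch in Python. It is not how any real app or analytics vendor does it: the `Tracker` class, its field names and the event format are all hypothetical, and the "publisher" is simulated by a plain list instead of a network call.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Hypothetical in-app tracker: buffers events, flushes when online."""
    buffer: list = field(default_factory=list)
    sent: list = field(default_factory=list)  # stands in for the publisher's server

    def record(self, action, item, timestamp=None):
        # One entry per tap or swipe, with the item it touched.
        self.buffer.append({
            "action": action,
            "item": item,
            "ts": time.time() if timestamp is None else timestamp,
        })

    def flush(self, online):
        # "Whenever possible": only send when a connection is available.
        if not online or not self.buffer:
            return 0
        payload = json.dumps(self.buffer)
        self.sent.append(payload)  # a real app would transmit this payload
        count = len(self.buffer)
        self.buffer.clear()
        return count


tracker = Tracker()
tracker.record("tap", "hottest-news", timestamp=0.0)
tracker.record("swipe", "sports", timestamp=4.2)
print(tracker.flush(online=False))  # offline: events stay on the device
print(tracker.flush(online=True))   # online: buffered events are sent together
```

Buffering and sending in batches is one plausible reading of "whenever possible"; a real tracker would also have to cope with each platform's restrictions, which is exactly where the difficulty lies.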
Even a well-implemented tracker module isn’t the perfect solution, though. For example, it doesn’t solve the issue of apps running in the background and downloading streams of data, unbeknownst to the user. Such requests are recorded as page views by the server, but the content is not necessarily seen by the user.
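The gap between "served" and "seen" can be illustrated with a few lines of Python. The data below is entirely made up: six sections preloaded at launch, and a hypothetical tracker reporting that only two of them were ever displayed.

```python
# Hypothetical data: section names and durations are invented for illustration.
server_log = ["hottest-news", "world", "business", "sports", "tech", "culture"]
tracker_events = [
    {"item": "hottest-news", "seconds_on_screen": 45},
    {"item": "sports", "seconds_on_screen": 12},
]

served = len(server_log)                          # what the server counts as page views
seen_items = {event["item"] for event in tracker_events}
seen = len(seen_items & set(server_log))          # what the user actually opened
never_seen = sorted(set(server_log) - seen_items)

print(f"served: {served}, seen: {seen}, ratio: {seen / served:.0%}")
print("served but never displayed:", never_seen)
```

In this toy case the server would report three times as many page views as the user actually looked at, which is the kind of distortion a pure log count cannot reveal.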
The French company Mediamétrie Net Ratings (in partnership with Nielsen) came up with a solution that might pave the way to usable hybrid measurements. Nielsen Net Ratings, NNR, is known for its technology built around panels of users (details here) who agree to have trackers running on their PCs. To improve mobile measurement, NNR recently teamed up with the three French cellular carriers and built a new massive log analysis system. The structure looks like this:
The (simplified) sequence follows:
1/ Cell carriers. They compile millions of logs, i.e. requests coming from their 3G/Edge networks to websites (no distinction between a request coming from an Android web browser or an iPhone application). Basically the log ticket says: P. Smith, number ###, sent this http://www… request on Dec 3rd at 22:34:55.
2/ The third party aggregator. Its main job is to anonymize data thanks to an encryption key it gets from the carriers. France’s privacy authority is very serious about data protection. Neither the cell carrier nor the measurement company can have a full view of what people do on the internet.
3/ The audience analysis company. Here, Nielsen Net Ratings France. In our example, along with cell numbers for its 10,000 other panelists, NNR sends the third party aggregator John Doe’s number.
4/ The aggregator encrypts John Doe’s number into a “fdsg4…” sequence and sends it back to NNR.
5/ The carriers then send huge log files to NNR.
6/ NNR’s job is to retrieve its encrypted panelist from within the haystack of logs. When it does spot the “fdsg4…” sequence, it can tell that John Doe, whom NNR knows everything about, has gone to xyz websites via his cell phone at such and such dates, times and, perhaps, locations. The rest of the log remains encrypted, therefore useless.
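The six steps above can be sketched in Python. This is an illustration under loud assumptions, not a description of the actual system: a keyed hash (HMAC-SHA256) stands in for the unspecified "encryption", the key, phone numbers and URLs are invented, and the sketch assumes the logs reach NNR with numbers already replaced by tokens.

```python
import hashlib
import hmac

# Assumption: a keyed hash stands in for the "encryption" described above;
# the scheme really used by the carriers and the aggregator is not documented here.
SECRET_KEY = b"key-shared-by-carriers-and-aggregator"  # hypothetical


def pseudonymize(phone_number: str) -> str:
    """Turn a phone number into an opaque token using the shared key."""
    return hmac.new(SECRET_KEY, phone_number.encode(), hashlib.sha256).hexdigest()


# Step 1: carrier logs (made-up numbers and URLs).
carrier_logs = [
    {"number": "0600000001", "url": "http://news.example/a", "time": "22:34:55"},
    {"number": "0600000002", "url": "http://shop.example/b", "time": "22:35:10"},
    {"number": "0600000001", "url": "http://news.example/c", "time": "22:40:02"},
]

# Steps 3-4: NNR sends its panelists' numbers; the aggregator returns tokens.
panel = {"0600000001": "John Doe"}
panel_tokens = {pseudonymize(number): name for number, name in panel.items()}

# Step 5: the logs reach NNR with numbers replaced by tokens (assumption).
anonymized_logs = [
    {"token": pseudonymize(row["number"]), "url": row["url"], "time": row["time"]}
    for row in carrier_logs
]

# Step 6: NNR keeps only the rows whose token belongs to a panelist.
matches = [
    (panel_tokens[row["token"]], row["url"], row["time"])
    for row in anonymized_logs
    if row["token"] in panel_tokens
]
for name, url, when in matches:
    print(name, url, when)
```

The point of the design is that NNR can only recognize tokens it was given for its own panelists; every other row stays opaque, so neither party sees the full picture.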
This system has only been in operation for a few months. And it is not perfect either. For instance, it tracks only requests going through cell phone networks; it ignores web requests sent through wifi, which account for 30% of total usage! The new system also ignores Blackberry users using RIM’s proprietary network. And the NNR algorithms need help from a huge database of URLs provided by the sites’ publishers. These URLs will be used to differentiate web browser requests from the ones generated by an app; we are talking of millions of URLs here, growing by the thousands every single day. A daunting task. In addition to this complication, large amounts of data still reside in the publishers’ servers. Hence a certification issue, as for all site-centric measurements.
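How a URL database could separate app traffic from browser traffic can be shown with a small Python sketch. The prefixes, URLs and the longest-prefix rule are all assumptions made for illustration; nothing here reflects NNR's actual algorithms.

```python
from collections import Counter

# Hypothetical publisher-supplied entries: URL prefix -> request source.
URL_DATABASE = {
    "http://news.example/api/app/": "app",
    "http://news.example/mobile/": "browser",
}


def classify(url: str) -> str:
    """Label a logged URL using the longest matching known prefix."""
    best = ""
    for prefix in URL_DATABASE:
        if url.startswith(prefix) and len(prefix) > len(best):
            best = prefix
    # No known prefix matched: the request cannot be attributed.
    return URL_DATABASE.get(best, "unknown")


requests = [
    "http://news.example/api/app/hottest-news",
    "http://news.example/mobile/article/123",
    "http://news.example/api/app/sports",
    "http://other.example/page",
]
counts = Counter(classify(url) for url in requests)
print(counts)
```

A linear scan like this one would not survive millions of entries growing daily; it only shows why the database has to be kept current, since any URL a publisher forgets to declare falls into the "unknown" bucket.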
So much work ahead. The future lies in a deeper merger of site-centric (log analysis) and user-centric (panel) techniques. And also in a wider deployment of HTML5 apps. We’ll explore the potential of the web’s new lingua franca in an upcoming Monday Note.