News sites use trackers more indiscriminately than ever. A random sample of twenty digital properties yields stunning results. Last week, we looked at how long websites take to load; today, we see how messy their user data collection is.
When landing on Politico’s home page, your browser loads about 100 pieces of code known as trackers, behind your back. These trackers are used mostly for advertising: detecting or building user profiles, serving targeted ads, picking the best-fitting brand on a real-time bidding platform. Other trackers are beacons aimed, for instance, at following the reader from one site to another (the kind you gleefully thank when the North Face jacket you once looked at ends up pursuing you for months). Another kind of tracker is quite indispensable: it handles analytics, counting users, sessions, time spent, etc. With the advent of the social web came all sorts of trackers connecting users to social networks or affiliation programs. For good measure, some sites also insert chunks of code aimed at organizing A/B testing: submitting configuration A to one segment of the audience and configuration B to another to see what works best. (Weirdly enough, A/B trackers are by far the least deployed, accounting for 1% of the total.)
In fairness, Politico is often a fast site and doesn’t always load its full stack of trackers. Most likely, the loading process times out (as shown above, when I wanted to take a screenshot of the page, it was stuck at “only” 89 trackers).
Politico might be the most tracker-saturated site of our random sample, but others are not far off. The Daily Mail is one of the most popular news sites in the world, with 26m unique visitors per month at home and 67m UVs in the US, according to Comscore. A single click on its Mail Online flagship sends a whopping 672 requests, but it manages to run them at blazing speed (19 sec loading time) for a featherweight of 3 MB, including 2.7 MB for 578 super-optimized pictures that don’t exceed 120 KB each.
The Mail Online holds many digital speed/weight records. It is one of the most optimized websites in the world (see last week’s story on the obesity plaguing the news industry). But when it comes to monitoring users, The Mail Online also scores high, with 79 trackers loaded in one stroke (see below), of which I was able to detail only 63 in my main table:
A broader analysis conducted last week on a random selection of large news sites shows a surprisingly high reliance on trackers of all types: on average, their home pages load about 30 trackers (article pages usually load fewer). Here is the ranking:
In total, the 20-site sample collected 516 trackers. They come from about 100 vendors, displayed on this column’s header chart. (As I’m sure it will find its way into various presentations in the coming months, the original Keynote file is available upon request; always happy to help.) To measure this, I simply loaded the Ghostery browser extension on my Chrome and Firefox browsers (I wanted to detect discrepancies; none found). Finally, I got a table that looked like this:
The table above is available as a Google Docs Spreadsheet here and in PDF format here.
About 60% of these trackers are ad-related. The crowd is obviously dominated by the two players commanding 60% of global digital advertising: Google (53 trackers spotted) and Facebook (33). Then comes a cohort of players, some serious, others more questionable.
The strangest thing is this: when you look at each of their mission statements, you see a huge overlap in functionality. Here is a sampling of their hardcore sales pitches, often found on the same news site:
“Our unifying DMP (Data Management Platform) helps marketers and publishers drive more revenue, efficiency and engagement through the power of audience data. Working as trusted partners, we help our customers transform the way they do business. Providing an unmatched level of industry knowledge and technical service to help them master the complexities of Big Data and gain the impact they need.” (Lotame)
“We bring web publishers and advertisers together via a single buying and selling solution tuned to enable publishers to maximize ad revenue and advertisers to buy quality impressions.” (Sonobi)
“Crimtan is a technology-rich Digital Advertising Services provider. Our proprietary Data Engine and state-of-the-art technical capability enables Crimtan to offer a wide range of ROI focused products to publishers and advertisers. Crimtan provides advertisers with precise audience targeting, optimisation and reporting. Publishers benefit from enhanced visitor insight and increased revenue.”
“We are a global media valuation platform that enables digital buyers and sellers to assess the value of every ad opportunity across channels and screens, and make informed decisions that maximize ROI. Through constant technological innovation, key partnerships, and strong client relationships, we’re driving the industry toward realizing the full potential of online advertising.” (Integral Ad Science)
“MediaMath is a digital marketing technology company dedicated to reengineering modern marketing to offer transformative results based on tangible goals. Its math-driven Marketing Operating System, TerminalOne™, brings together digital media and data into a powerful and flexible solution that simplifies planning, execution, optimization and analysis of both direct response and branding campaigns.”
Found on a French site (believe it or not):
“Melt is a Brazilian company established with the aim of revolutionizing the Hispanic market buying online media. With this, we launched a self-service tool that helps agencies, medium and large advertisers to optimize the most of their investments in digital advertising.”
Some are groups of brands that work together (sometimes on the same site) or have been acquired. Examples:
Neustar PlatformOne (formerly Aggregate Knowledge) operates: Aggregate Knowledge
Conversant (formerly ValueClick Media) operates: ValueClick Media, ValueClick Notice
On another major French news site, I even found a beacon/widget supplier whose own site is so poorly implemented that it triggers a security warning hinting at a phishing risk (I still wonder what the client’s commercial benefit could be...).
That’s the smell of silicon snake oil. Evidence suggests that the vendors, not their customers, are the ones defining what’s supposedly needed...
This kludge of trackers reflects desperate moves more than thoughtful strategies. Traditional publishers tend to stuff their sites with everything they can think of: in the ranking above, you’ll notice that native media companies (Vox, Vice, etc., or even BuzzFeed with only 11 trackers) are much more selective in their choice of tracking systems than old media (Politico might be a fantastic editorial pure player, but when it comes to analytics it behaves like an old-media outlet).
Evidently, many trackers are useful and sometimes indispensable: optimizing advertising yields, connecting to Data Management Platforms, refining user profiles, etc.
Still, none of these stacks of code are innocuous. Consider Chartbeat, the most powerful audience measurement tool. Its single tracker consists of no less than 29,000 characters of code.
And Chartbeat might be the best-implemented code (it is probably one of the ten or twelve trackers a news site should keep), but in order to perform the increasingly granular analytics publishers ask for, there is no other choice than relying on convoluted and opaque coding.
The web ecosystem is messier than ever. A clever crowd of vendors has taken advantage of the FOMO syndrome (Fear Of Missing Out) that stems from technological uncertainty. In the ’70s, it was the short-sleeved, pocket-protector-wearing IT man who covered himself by recommending IBM hardware and services. Today, it’s the myriad of young people in sales and marketing departments, fearing above all to act differently from their peers, who feed the beast. Higher up the chain of command, very few understand what’s going on; it’s open bar for the opportunists, not for the customers.
— frederic.filloux@mondaynote.com




A lot of these trackers are crawling out of ad units, where publishers don’t have much control over which trackers show up. Any publisher running remnant ads will find their site hit by megabytes of advertiser tracking code from that alone, and that doesn’t count the direct-sale ads whose advertisers usually require running at least 2-5 different tracking codes.
I’d say most publishers are running, at most, 10 of their own tracking codes; the rest get pulled in through various ad networks. The reason new media like Vox show up with less tracking code is that, like a lot of new media players, they’ve turned heavily towards native advertising, where they create, control, and dictate more direct terms on the advertisements, thus avoiding having to run a ton of random, crappy and mostly dysfunctional tracking codes.
As to why advertisers run multiple tracking codes in a single ad? I couldn’t say, but I strongly suspect it’s because every time they add a new tracking code it gives them a different number, mostly because all those tracking companies are rushing to market and what I’ve seen of their code isn’t very good.
Thanks for your insightful comment. The whole thing pleads for native ads, which are lighter and can be managed from an ad server.
Indeed, native ads—or ads that look native, so that they’ll escape being blocked by filters reining in off-site links—are the likely future. But that makes every site responsible for its own ad sales, or means ad networks need to trust the sites for distribution.
This arms race is about to take a new turn, thanks to increasing ad-blocker usage and its endorsement by Apple’s forthcoming upgrade. Some sites will maintain functionality by creating a site-specific database of tracker code that just needs a signed/encrypted key downloaded; that’ll relieve the worst of the bloat.
Or so I imagine. This will probably sit very poorly with the small, single-writer sites that depend on ads, while simply upping the cost of doing business for the few thousand largest sites that can command native ads or work-alike versions they develop with the ad networks.
Aram is right about trackers crawling out of ad units. There are two cases here:
– Direct campaigns: usually not a lot of trackers. You have at least the publisher ad server and the advertiser ad server (two different trackers). But the advertiser ad-server tag can call other third-party trackers, such as an ad-verification tool (verifying the context of the ad, mainly URL and user geolocation), a viewability tool (is the ad viewable or not?) or a panel tool (Comscore or Nielsen: is my ad targeted at the right socio-demographic?).
– Indirect campaigns: here there are a lot more trackers, and it becomes messy. You have the same tags as in a direct campaign (publisher and advertiser ad servers, maybe other tools like viewability, ad verification or panels), but you also have additional tags.
Publishers may opt to call ad networks in a “daisy chain”: the first ad network is called; if no ad is available for the user (the user isn’t interesting to that network), a second ad network is called, and so on. It’s a bit out of date now and terrible for the user experience (very slow).
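To make that fallback concrete, here is a rough sketch of the daisy-chain logic described above; the network list, URLs and the “empty response means no fill” convention are made-up simplifications, not any real ad server's behavior.

```typescript
// Simplified daisy chain: try each ad network in order until one returns a fill.
// Every extra hop is another network request the reader has to wait for.
async function daisyChain(networkTagUrls: string[]): Promise<string | null> {
  for (const url of networkTagUrls) {
    const response = await fetch(url);   // call the next network in the chain
    const ad = await response.text();
    if (ad.trim().length > 0) {
      return ad;                         // first network willing to serve this user wins
    }
  }
  return null;                           // nobody wanted the impression
}
```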
Now, with the rise of programmatic, publishers use an ad exchange (also called an SSP). Here the publisher’s ad exchange will regularly synchronize your user id on the ad exchange with your user id for each of the bidders (also called DSPs). This sync is necessary because those providers don’t work with the same cookie namespace (they all have different user ids on you). That way, when the publisher’s ad exchange calls Criteo (for example), Criteo recognizes you and bids a high price to retarget you with the right product you saw yesterday.
==> In effect, one call to an ad-exchange tag can lead to 10 calls to 10 different DSPs (even if there is only one winner, each will sync your user id).
Then, the winning DSP, after serving the ad, can also decide to synchronize your user id with some other ad exchanges (e.g., not the one used by the publisher).
==> This can lead to 10 more calls to 10 different ad exchanges.
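For readers who want to picture the sync mechanism described above, here is a minimal sketch; the SSP user id and the DSP endpoints are placeholders, not real vendors or real APIs.

```typescript
// The SSP asks the browser to hit each DSP's sync endpoint, passing its own
// user id; each DSP reads or sets its own cookie and stores the id mapping.
// One page view, one winning bidder, but N extra sync requests.
const sspUserId = "ssp-12345";                 // id stored in the SSP's cookie
const dspSyncEndpoints = [                     // hypothetical DSP sync URLs
  "https://sync.dsp-one.invalid/match",
  "https://sync.dsp-two.invalid/match",
];

for (const endpoint of dspSyncEndpoints) {
  const pixel = new Image();                   // classic 1x1 sync pixel
  pixel.src = `${endpoint}?partner_uid=${encodeURIComponent(sspUserId)}`;
}
```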
You get it: programmatic is very messy and leads to a bad user experience.
Then, you have the DMPs:
Theoretically, you can have calls to the publisher’s DMP and multiple advertiser DMPs on the same page. Here the logic is the same as with programmatic advertising: the DMP will synchronize your user id with third-party companies to build a better profile on you (catching all inputs on you from other tools), monetize you better (if it’s a publisher DMP), and eventually push your info to DSPs (and/or Facebook, Twitter, emailers like Mailchimp…) in order to retarget you elsewhere (if it’s an advertiser DMP).
If the DMP is messy or not very well integrated with the other ad-tech tools, it will even retrieve or push your information directly while you surf (instead of establishing server-to-server connections), further slowing down your browsing.
You get the picture: all of this mess is caused by extensive user tracking (you are the product) and the absence of real regulation. It’s actually getting worse because of programmatic and DMPs (which enable advertisers to effectively target every impression).
Yes, the programmatic ads often cause the worst performance of all, though pretty much any display ad may daisy-chain through a number of trackers. A while back I was building a video player to deal with VAST tags (video preroll on the web is delivered through these XML files) and we had trouble because one of our direct-sale ads was calling a chain that ended up being 4 VAST tags in total, each calling the next, each designating at least one tracking code to deploy.
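As a rough illustration of that kind of wrapper chain, here is a sketch that walks a VAST tag and collects the impression trackers each level declares; the regex-based parsing and the function itself are simplifications for readability, not production player code.

```typescript
// Follow a VAST wrapper chain: each wrapper adds its own <Impression> pixels
// and points to the next tag via <VASTAdTagURI>; an inline ad ends the walk.
async function collectVastTrackers(tagUrl: string, maxDepth = 5): Promise<string[]> {
  const trackers: string[] = [];
  let current: string | null = tagUrl;
  let depth = 0;

  while (current !== null && depth < maxDepth) {
    const xml = await (await fetch(current)).text();
    depth++;

    // Collect every impression-tracking URL declared at this level.
    for (const match of xml.matchAll(/<Impression[^>]*>\s*<!\[CDATA\[(.*?)\]\]>/g)) {
      trackers.push(match[1]);
    }

    // A wrapper points to the next tag in the chain; an inline ad has none.
    const next = xml.match(/<VASTAdTagURI[^>]*>\s*<!\[CDATA\[(.*?)\]\]>/);
    current = next ? next[1] : null;
  }
  return trackers;
}
```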
Native ads are arguably a way out, or ‘content’ ads like Taboola or Outbrain, though those deploy their own tracking codes and have their own problems. Even those assumptions may be challenged as programmatic buying hits native and in-content ad types through things like https://gemini.yahoo.com/advertiser/home.
There’s a bigger issue I think, which is that publishers’ advertisers don’t trust them (which has to be one of the reasons why they deploy so many tracking codes). I’m not sure why (though I suspect the inability to detect bots and bad technology choices in the past have played into it) but that problem is going to need to be solved.
One of the things we’ll need to do is figure out *what* is pushing advertisers to deploy all these trackers, many of which aren’t even advanced enough to do the type of user tracking described above. I’d say at least half of the tracking codes I’ve examined do basic stuff, like viewability or plain impression detection.
From experience, I would say that publishers and advertisers deploy all these trackers but are not aware of them. In fact, even when working at the ad-tech companies, you are not aware of every player and mechanism (examples: cookie syncing or the multiple VAST passbacks you were talking about). The ad-tech industry is very, very complex and few have a global view of it.
So I don’t know how this can improve; maybe very educated publishers or advertisers could pressure their ad-tech vendors to limit tracking (and speed up page loading). To take the example Edouard was talking about: SSPs should sync user ids only if needed.
Good point about native ads, and indeed it may change as native can now be transacted over RTB (though it’s only starting now).
Odd. My Ghostery loads 17 trackers for Daily Mail and only 10 for Politico. Could it be that these trackers are location-sensitive?
Could be, but also, different trackers will load on those pages depending on which ads you are presented with, which may indeed vary wildly depending on browsing habits and geolocation.
I found them time-sensitive. For instance, there were far fewer tracking codes on a Saturday than on a Monday morning. My guess is that connection timeouts also play a role…
The number of trackers can also depend on how frequently you visit the site. If you go to the site several times a day, then your cookies are already synced and they don’t need to be synced again.
Also, it really depends on the technology. I found that some SSPs will always call a lot of third-party technologies to sync, while other SSPs will sync only if needed. That really lightens the loading.
Hello,
Fine, and now, is there any way to get rid of those bugs?
[…] takeovers, un-killable auto-play videos and other monstrosities—including the use of literally hundreds of tracking agents, cookies, super-cookies and other invasive […]
> there is no other choice than relying on convoluted and opaque coding.
This is likely a performance measure called “uglification”. The JavaScript source is processed and made to be as minimal as possible. You can read more about the tool here: http://lisperator.net/uglifyjs/
No, I’m quite familiar with uglification. Many of the tracking codes are put together badly and not uglified. Even the ones that are can still be readable.
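For readers unfamiliar with the term, here is a hand-written illustration of the kind of transformation uglification performs; it is not actual UglifyJS output, and the tracker URL is a placeholder.

```typescript
// Readable version: what a tracker author might write.
function buildTrackingUrl(pageUrl: string, visitorId: string): string {
  return (
    "https://stats.example-tracker.invalid/hit" +
    "?page=" + encodeURIComponent(pageUrl) +
    "&visitor=" + encodeURIComponent(visitorId)
  );
}

// Roughly what a minifier emits: same behavior, shorter names, no whitespace.
const b = (a: string, c: string) =>
  "https://stats.example-tracker.invalid/hit?page=" + encodeURIComponent(a) + "&visitor=" + encodeURIComponent(c);
```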
The best way to have a lighter load of trackers, for whatever it is worth, is to run Adblock Plus.
[…] “eating up” a good part of the bandwidth. Frédéric Filloux lays out some data on the issue. That mobile is yet another headache for publishers […]
Hi Frederic, thanks for this article. I work at a TMS company (TagCommander), so we have our share of responsibility for that! More seriously, the arrival of TMS on the market is one of the causes of what you describe (on top of the increasing number of digital marketing solutions, and the fact that the digital medium is the most measurable). I think your article opens up two other interesting subjects. The first is that companies need a new person in their organization with a cross-department role to manage all the data that is collected and sent to partners, in order to avoid the messy user data collection that you describe (a role currently called “Chief Data Officer”). The other is that IT and digital marketing teams should measure and carefully follow tag loading times, because the increasing number of tags slows down page load time, which negatively affects user experience, conversion rate and SEO ranking.
Hey Frédéric,
Awesome post that really underscores the fragmented and scatterbrained approach to collecting data for advertising purposes.
I believe ads and their trackers are a “necessary evil” to support media companies, but I’ve found a bigger issue: trackers which collect private data. It’s not just IP tracking anymore; take text-field monitoring as an example. This leads to trackers having direct access to password fields, credit-card fields and other such sensitive data.
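To illustrate the mechanism (a generic sketch, not any specific vendor's code): once a third-party script runs on the page, nothing stops it from listening to form fields, including password or card-number inputs.

```typescript
// Attach an input listener to every field on the page. A real tracker would
// beacon the captured values somewhere; console.log stands in for that here.
document.querySelectorAll("input").forEach((field) => {
  field.addEventListener("input", () => {
    console.log(`field "${field.name}" now contains: ${field.value}`);
  });
});
```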
Who’s controlling this?
The official stance of marketing firms is that they do not collect private data, but it’s the equivalent of “we’ll close our eyes while you type in your sensitive data”.
While reading this, my biggest concern became how 17 trackers could all be seeing more than they should, and what publishers should do to protect a user’s privacy from being invaded instead of just being advertised to.
[…] are slow to admit any blame. To make things even more confusing, many sites also include dozens (or hundreds) of trackers which – although relatively small – can add […]
[…] Monday Note did a simple exercise: they installed Ghostery to collect the trackers used by 20 […]
Great research and conducive to good discussion. Well done, FF.
[…] like to do, however a part of it was an actual concern, as additionally detailed in “20 Home Pages, 500 Trackers Loaded: Media Succumbs to Monitoring Frenzy” by Frederic […]
Meanwhile, if you want to at least protect your own browser, check out EFF’s Privacy Badger.
[…] According to Monday Note, if you load 20 home pages of well-known Internet sites, you get tracked by some 500 trackers in total, […]
[…] articles, such as 20 Home Pages, 500 Trackers Loaded is a well done look into just how far advertising companies go in tracking you. The article Looking […]
[…] too much into the mass market. But that it needs to stuff its web pages with 50+ trackers (see our last week story on the matter) reveals a hesitant marketing […]
FF
What happens, I’m genuinely curious, if end users find an easy way to change their hosts files (local DNS overrides), so that requests to tracking sites never go out?
Does the extensive JavaScript prevent this?
If not, what if the intrusive tracking of advertising networks causes a significant portion of end users to block millions of websites (ad-tracking IP addresses) with their hosts files?
That can’t be good for the ad industry?
I feel like they are digging their own hole…
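For what it’s worth, here is a sketch of the hosts-file approach asked about above: generating one black-hole entry per tracker domain. The domain names are placeholders, not a curated blocklist.

```typescript
// Each generated line can be appended to /etc/hosts (or the Windows
// equivalent) so lookups for these domains resolve to nowhere and the
// tracking requests never leave the machine.
const trackerDomains = [
  "pixel.tracker-one.invalid",
  "sync.tracker-two.invalid",
];

const hostsLines = trackerDomains.map((domain) => `0.0.0.0 ${domain}`).join("\n");
console.log(hostsLines);
```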
All this prompts only one question: so what?
News sites, like pretty much any other site on the web, use many tracking tags because they can: it only takes a few minutes to add new tags to a site thanks to tag management solutions, and for the vast majority of them, the tags do not impact the visible load time thanks to asynchronous loading.
A very cheap resource might be wasted in search of minimal benefits, but I fail to see how that is a problem for users or publishers.
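As a minimal sketch of the asynchronous loading mentioned above (the vendor URL is made up): this is roughly how a tag manager injects each tag so it stays off the critical rendering path, which is why dozens of them can accumulate without a visible cost.

```typescript
// Inject a vendor script with the async flag so it downloads and executes
// without blocking the HTML parser.
function injectTag(src: string): void {
  const script = document.createElement("script");
  script.src = src;
  script.async = true;
  document.head.appendChild(script);
}

injectTag("https://cdn.example-vendor.invalid/tag.js");
```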
Consider one thing: the page-view model calls for a correlated ad-impression model. The reason there are so many ad trackers is that publishers are obsessed with fill rate versus experience, or even versus optimized revenue per impression.
Let’s throw as many trackers as we can to catch as many ad fishes as we can.
I would be curious to see how this plays out in mobile native apps (not mobile web pages). I suspect it is not as bad because the opportunities to display ads are more limited.
FF, I found a very interesting post about the very same topic. Also interesting was Owen Williams’s comment on that post. Leaving this here for your consideration:
http://blog.lmorchard.com/2015/07/22/the-verge-web-sucks/
[…] didn’t last long. In two previous Monday Notes (News Sites Are Fatter and Slower Than Ever and 20 Home Pages, 500 Trackers Loaded: Media Succumbs to Monitoring Frenzy), my compadre Frédéric Filloux cast a harsh light on bloated, prying pages. Web publishers insert […]
Frederic – Great piece. I went a bit deeper here on LinkedIn: http://mygho.st/wp, as there’s more to this than meets the eye in your analysis.
Scott Meyer
CEO, Ghostery
Scott (at) Ghostery (dot) com