The PDF document format is digital publishing’s worst enemy. To a large extent, the news industry still relies on this 18-year-old format to sell its content online. PDF is to e-publishing what the steam locomotive is to the high-speed train. In our business, progress is called XML and HTML5.

Picture today’s smartphone reading experience. We’ll start with a newspaper purchased from a digital kiosk. For a broadsheet, a format still largely used by dailies, the phone’s “window” covers 1/60th of the paper’s page. Multiply by 30 pages of news. You’ll need 1800 pans and zooms to cover the entire publication (plus, each time you pinch out, you can take a leisurely sip of your coffee as the image redraws).
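The arithmetic behind that estimate can be sketched in a few lines. The function name is invented for this illustration; the two figures come straight from the paragraph above, and the model simply assumes one pan or zoom per screen-sized view.

```python
def views_needed(windows_per_page: int, pages: int) -> int:
    """Screen-sized views required to see every part of every page.

    Assumes one pan or zoom per view, which is a simplification.
    """
    return windows_per_page * pages


# Figures from the text: the phone shows about 1/60th of a broadsheet page,
# and the paper runs to 30 pages of news.
WINDOWS_PER_PAGE = 60
PAGES = 30

print(views_needed(WINDOWS_PER_PAGE, PAGES))  # 1800
```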

Next, we have two iPhone screen captures of American Photo, purchased on Zinio. The more compact magazine format doesn’t help. Note that you need to scroll laterally to read a full line (the “Text” function, meant to ensure easier reading, is ineffective):

Am I being too derisive, or can we say this is not the best way to read?

The battle for online news will be won on mobility. We’re just at the beginning of the smartphone era. We can count on better screens, faster processors combined with extended battery life, more storage, better networks… The bulk of news consumption will come from people on the move, demanding constant updates and taking a quick glance at what is stored on their mobile device, regardless of network conditions. Speed, lightness and versatility will be key success factors. There won’t be much tolerance for latency.

In that respect, PDF is just a lame duck.

Back in 1993, the Portable Document Format was a fantastic digital publishing breakthrough. All of a sudden, using a sophisticated mathematical description of images, texts, typefaces and layout elements, the most complex graphic creation could be encapsulated into a single file. Large font sets and dedicated software were no longer needed. The PDF reader, licensed from Adobe Systems under the name of Acrobat, soon became free or pre-loaded on various OS platforms. PDF became an open standard in 2008. As for the performance, it was stunning: see a 6400% magnification below:

Great for high-quality book publishing… And a completely pointless stunt for a mobile news product.

The newspaper industry jumped on PDF. The new format let a production crew send the full publication to the printing plant using huge, high-definition PDF files directly transferred to the printing plates. When the web arose, the industry kept using the same format to make the publication available for downloading. After years of file optimization, a newspaper or a magazine still weighs 20 to 50 megabytes. The download is manageable over ADSL or cable, but impractical on a mobile network. But wait, it can get worse: on the Android platform, for example, the reader can actually add weight to the original PDF file. This is the consequence of a good intention: giving the publisher the choice between a finished product that is easier to leaf through, but requires a heavier file, and one that downloads faster, but is more difficult to read.
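To put the 20 to 50 megabyte figure in perspective, here is a rough back-of-the-envelope sketch. The link speeds are illustrative assumptions chosen for this example, not measurements of any real network, and the formula ignores latency, protocol overhead and dropped connections, so real downloads would be slower.

```python
def download_seconds(size_megabytes: float, link_megabits_per_s: float) -> float:
    """Ideal transfer time in seconds (8 bits per byte, no overhead)."""
    return size_megabytes * 8 / link_megabits_per_s


# ASSUMED link speeds, for illustration only.
ASSUMED_LINKS_MBPS = {
    "slow mobile": 0.5,
    "good mobile": 2.0,
    "fixed broadband": 8.0,
}

for size_mb in (20, 50):  # file sizes quoted in the text
    for name, mbps in ASSUMED_LINKS_MBPS.items():
        seconds = download_seconds(size_mb, mbps)
        print(f"{size_mb} MB over {name} ({mbps} Mbit/s): {seconds:.0f} s")
```

Under these assumed speeds, the same file that arrives in well under a minute on a fixed line can take many minutes on a weak mobile connection.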

Publishers’ inclination to keep using PDF is based on one idea: the graphical elements of a publication — layout, typefaces — are an essential component of a printed brand. By extension this visual identity is seen as a “label of trust” for the news brand, with the design-perfect PDF being the medium of choice.

Now, three things:
#1, this widely shared assertion is not supported by strong facts. There is no survey (to my knowledge) that links visual identity to reader loyalty, to feelings of trust;
#2, on this matter, if there remains any lingering bond with readers, it will fade away with the new generation of news consumers: they are much less sensitive than their elders to the notion of “trusted brand”, let alone to any design associated with it;
#3, the web has evolved. The HTML5 standard has shown the ability to render any graphic design without the PDF format’s downsides (see this previous Monday Note: Rebooting Web Publishing Design).

Why not, therefore, jump off the PDF train? The short answer is XML management. Our techiest Monday Note readers will forgive this shortcut: the Extensible Markup Language is a close relative of the web’s own markup language, readable by both machines and humans. An article encoded in XML is not an image but a set of character strings associated with various “tags” that describe what they are and where they belong; the description also provides contextual information to be retrieved at will. In theory, any publishing system, big or small, should be able to produce clean XML files. It should also be able to generate a “zoning file” that maps the coordinates of a story, or any other element on the page (see the red box below that indicates the position of the story in a newspaper front page). Armed with such position data, smartphone software can provide the right reading experience, limiting the need for the painful panning and zooming I mentioned above.
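For readers who want to see what this means in practice, here is a minimal sketch. The XML below is entirely hypothetical: the element and attribute names (`article`, `headline`, `zone`, and so on) are invented for illustration and do not follow any particular newspaper or industry schema. The idea is simply that once a story is tagged text with a known position on the page, a reader application can pull out the headline and paragraphs and work out how far to zoom so the story fills the screen.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every tag and attribute name here is an assumption.
ARTICLE_XML = """
<article id="front-001" section="front-page">
  <headline>Example headline</headline>
  <byline>Staff writer</byline>
  <body>
    <p>First paragraph of the story.</p>
    <p>Second paragraph of the story.</p>
  </body>
  <zone page="1" x="120" y="340" width="480" height="610"/>
</article>
"""


def parse_article(xml_text: str) -> dict:
    """Extract the tagged text and the zoning box from one article."""
    root = ET.fromstring(xml_text.strip())
    zone = root.find("zone")
    return {
        "id": root.get("id"),
        "headline": root.findtext("headline"),
        "paragraphs": [p.text for p in root.iter("p")],
        # Zone coordinates, in whatever page units the zoning file uses.
        "zone": {k: int(zone.get(k)) for k in ("page", "x", "y", "width", "height")},
    }


def zoom_to_fit(zone: dict, screen_width: float) -> float:
    """Scale factor that makes the story's box exactly as wide as the screen."""
    return screen_width / zone["width"]


article = parse_article(ARTICLE_XML)
print(article["headline"])
print(article["paragraphs"])
print(article["zone"])
print(zoom_to_fit(article["zone"], 320))
```

With the text available as strings, the software can also skip the page image entirely and reflow the paragraphs in a column that fits the phone.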

Unfortunately, no one lives in theory’s wonderland.

In fact, very few newspapers are able to produce usable XML or zoning files. Part of the reason lies in outdated editorial systems that were neither designed nor upgraded to handle such sophisticated, web-friendly files. IT managers have been slow to embrace the web engineering culture, and it didn’t occur to publishers that a “human upgrade” was badly needed deep in the bowels of their companies… (This, by the way, leaves another wide-open field to internet pure players and their web-savvy tech teams.)

This backwardness has created its own ecosystem… in low-wage countries. Every night, all over the world, highly specialized contractors collect the PDF files of hundreds of newspapers and send them to India, Romania or Madagascar. Down there, it takes a few hours to electronically dismantle the image files and to convert them to dynamic XML text files, with proper tagging and zoning. Thanks to the time difference, the converted static newspaper is sent back to the publishers by dawn, ready to be uploaded on an internet platform, right before the physical version hits the streets.

Many will find these shortcomings appalling. To a large extent, they are. The good news is that the evolution has only just begun. Still, very few publishers realize that upgrading their production chain is a crucial competitive asset. As for PDF, it remains immensely useful for many applications, but it is no longer suitable for news content that thrives on nomadic uses.

