Last Friday, at the Apple Store near the Paris Opera House, I paid my annual Microsoft tax: €140 ($194) for the 2011 edition of Microsoft Office. My hopes: more speed, fewer bugs, and smarter features. All in the service of producing all manners of text and presentations required by my multiple jobs. So far, no mind-blowing features, nothing more than a superficial makeover.
To look at this new iteration of Word, I use the framework built on my experience of Microsoft’s R&D effort. A few months ago, I spent three days at the Microsoft Tech Fest in Redmond. At first, I felt like a kid in a candy store, chatting with some of Microsoft Research’s 900-plus PhDs, who work in exotic fields such as Machine Learning or Epidemiology. But the amazement subsided and was replaced by doubt: How did this tremendous intellectual firepower actually make a difference in the Microsoft products I’ve been using for 15 years? In fact, Microsoft R&D has very little impact on everyday products. This is but one of Microsoft’s many problems: see the long piece I wrote in Le Monde Magazine.
Let’s go back to the subject of this column. Knowing what I know about Microsoft’s vision of computer science, I had envisioned a quantum leap for the applications I use the most, such as the very word processor I’m using “as we speak”. No joy. Let’s ignore the letdown and, instead, speculate a little bit about the next generation of text creation tools branded Microsoft Word, or Apple Pages (which comes with fewer bells and whistles, but is tidier).
First, text creation. One of the biggest challenges, and a growing one, is spelling, syntax, and grammar. In a country such as France, whose language is loaded with utmost (and sometimes absurd) complexity, the quality of writing is in steep decline. For the youngest part of the population, it is accelerated by the demise of a school system where teachers in effect gave up on written language. As for the 30-40 age bracket, the bombardment of daily interactions (email at work, SMS, chat on social networks) has made proper spelling and syntax secondary. Quite often, coming from a manager or even an attorney, you’ll receive a business document riddled with spelling errors well beyond the typos acceptable in a hastily written piece.
Unfortunately, today’s word processors do a very poor job when dealing with mangled spelling and grammar. All of us have in mind examples where the Word application becomes absurdly creative when dealing with the unknown: regardless of context, and with no learning capabilities whatsoever, Word will stubbornly keep suggesting an alternate spelling instead of simply skipping an unrecognized term.
Let’s dream for a moment; let’s picture what text processing software could look like in the light of existing technologies.
When I install my 2013 version of MS Word or Apple Pages, it asks me to load a “reference corpus” of texts it will learn from. Since I write both in French and in English, I will feed the app with the final versions (edited and proofread) of articles I have published and am comfortable with. Grammar and syntax will be helpful for English and thesauruses will be used for both. Since I currently write about media and technology, the application dictionary will soon be filled with the names of people, places, and companies I mention, as well as with the technical jargon I allow myself to use. Alternatively, if I don’t want to feed the word processor with my own writings, I can direct it to URLs of texts I find trustworthy: great newspapers, magazines, or academic papers…
Similarly, a lawyer or a doctor will feed the word processor with texts (from his own production, or found online) to be used as reference for professional vocabulary and turns of phrase. In my dream, third-party software vendors have seen a business opportunity: they sell industry- or occupation-specific plugins loaded with high-quality reference corpuses. This results in reliable auto-correct for Word and Pages. Some vendors even provide their corpuses as online subscriptions, constantly updated with state-of-the-art content.
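To make the “reference corpus” idea concrete, here is a minimal Python sketch of how an application might derive its dictionary from texts the writer trusts. It is only an illustration under stated assumptions: the function names (`build_vocabulary`, `unknown_words`) and the two sample sentences are hypothetical, and nothing here describes how Word or Pages actually work.

```python
import re
from collections import Counter

# A "word" is a run of letters, optionally joined by apostrophes or hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def build_vocabulary(texts, min_count=1):
    """Collect every word seen at least `min_count` times in the trusted texts."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text))
    return {word for word, count in counts.items() if count >= min_count}


def unknown_words(draft, vocabulary):
    """Return the words of `draft` that never appear in the reference corpus."""
    known = {word.lower() for word in vocabulary}
    return [w for w in WORD_RE.findall(draft) if w.lower() not in known]


# Hypothetical reference corpus: edited, proofread texts supplied by the writer.
corpus = [
    "The WordPress team met investors in Zhengzhou.",
    "Neuroborreliosis is a disorder of the nervous system.",
]
vocab = build_vocabulary(corpus)

# "Zhengzhou" is accepted because the corpus vouches for it; "Wordpres" is flagged.
flagged = unknown_words("Investors in Zhengzhou met the Wordpres team.", vocab)
print(flagged)
```

A subscription corpus from a third-party vendor would simply be a larger, regularly refreshed list of texts passed to the same function.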
Then, as I write, the application watches my typing and matches it against the relevant corpus. Instead of relying on rigid hit-or-miss grammatical rules, it uses a statistical algorithm to analyze a word, or a group of words, within their context of intended or inferred meaning. Take this gross mistake: “GM increased its sails by 10 percent”. The word is spelled correctly but, in this context, wrong. Because it lacks a context in which to detect the misspelling, the 1998-vintage word processor won’t change “sails” into “sales”. Conversely, the 2013 statistically based language model flags the mistake by using the proper body of reference to see that “sails” is unlikely in an auto industry context.
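The “sails” versus “sales” case can be sketched in a few lines of Python. This is a toy under loud assumptions, not a description of any shipping product: the class name `ContextSpeller`, the hand-supplied confusion sets, and the three-sentence auto-industry corpus are all invented for illustration, and the scoring is a bare count of adjacent word pairs rather than a real language model.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words only (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class ContextSpeller:
    """Flags correctly spelled words that are unlikely next to their neighbors."""

    def __init__(self, corpus, confusion_sets):
        # Count how often each pair of adjacent words occurs in the corpus.
        self.pair_counts = Counter()
        for document in corpus:
            words = tokenize(document)
            self.pair_counts.update(zip(words, words[1:]))
        # Map each confusable word to its whole group (assumed to be given).
        self.alternatives = {}
        for group in confusion_sets:
            for word in group:
                self.alternatives[word] = set(group)

    def score(self, prev, word, nxt):
        """How often `word` was seen after `prev` plus before `nxt`."""
        return self.pair_counts[(prev, word)] + self.pair_counts[(word, nxt)]

    def check(self, sentence):
        """Return (typed, suggested) pairs where a neighbor-aware swap scores higher."""
        words = tokenize(sentence)
        suggestions = []
        for i, word in enumerate(words):
            if word not in self.alternatives:
                continue
            prev = words[i - 1] if i > 0 else ""
            nxt = words[i + 1] if i + 1 < len(words) else ""
            best = max(sorted(self.alternatives[word]),
                       key=lambda cand: self.score(prev, cand, nxt))
            if self.score(prev, best, nxt) > self.score(prev, word, nxt):
                suggestions.append((word, best))
        return suggestions


# Hypothetical auto-industry reference corpus.
auto_corpus = [
    "GM increased its sales by 10 percent.",
    "Toyota reported that its sales by volume rose in Europe.",
    "Car sales by dealers fell in March.",
]
speller = ContextSpeller(auto_corpus, confusion_sets=[("sails", "sales")])
print(speller.check("GM increased its sails by 10 percent"))
```

With this corpus, “its sales” and “sales by” have been seen several times while “its sails” has never been seen, so the sketch proposes replacing “sails” with “sales”. Fed a sailing magazine instead, the same code would lean the other way, which is the whole point of choosing the reference corpus.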
Just a year ago, Google introduced Wave, an ambitious reinvention of email, seemingly ahead of its time. Among other advances, Wave featured a spectacular implementation of Google’s huge statistical model of language. In this video (go to the 45th minute) you’ll see Google Wave’s product manager Lars Rasmussen type the following sentences: “Can I have some been soup? It has bean a long time. Icland is an icland”, etc. Each time, the software automagically corrects the mistakes as they are typed, confident in the power of its algorithm and of its immense body of reference. This statistical approach works with gross, obvious mistakes, but also with more subtle ones.
Of course, I am aware of the difficulties in applying statistical language models to personal software: such algorithms are bandwidth and CPU intensive. This could explain why Google did not deploy the Wave spelling demonstrator on Gmail, or on Google Docs. But the underlying algorithms do exist. A less sophisticated version, limited to professional dictionaries and thesauruses at first, could be fantastically helpful in properly spelling Zhengzhou, if you happen to write about Asia, or Neuroborreliosis, if you are a medical student.
Second, the use of texts. A significant proportion of writings goes to blogs and other social environments. As a serious user of the WordPress platform [today’s Word can’t even change Wordpress into the correct WordPress, I had to check on Google…], I would gladly pay for a Word or Pages plug-in allowing me to compose a clean post with text, images, tables, links, typographical enrichments and, when done, letting me click “publish on my blog” or “send it to the mailing list”. No more cut & paste surprises or image resizing headaches. The word processor plug-in could be provided by the same developer who designed the style sheet (CSS) for my WordPress (or Blogspot, or TypePad) site. Or I could go for the auto-settings by inserting the CSS code in the plug-in that will, in turn, adjust the word processor’s dials, from fonts and sizes to background colors, etc.
You get my point: self-correcting spelling systems that guarantee (or at least vastly improve) decent grammar, syntax and the proper spelling of nouns and names can be a huge improvement for all professional writers – especially in a globalized economy where a greater number of us produce documents in a foreign language. Such auto-correct systems can even offer educational value in helping bloggers improve their basic writing skills.
I’m writing this on Word version 14 (yes, fourteen). How long will I have to wait for this quantum leap, Mr. Ballmer? Or Mr. Jobs?
—frederic.filloux@mondaynote.com
19 Comments
Interesting thoughts. A company called WordPlace Inc. (founded in the 1990s by one of the people who ran WordPerfect Corp.) tried to do something like that. The result was a word processor called YeahWrite. It still exists (see http://www.yeahwrite.com), but it sadly never caught on.
Personally, I like the TextEdit app that comes for free with your Mac. YeahWrite is only for Windows and unfortunately, I refuse to use or try to use Windows of any flavor. We don’t need more features on these apps. We need many modules that can be added on for the needs of different users. The nascent app store that Apple will offer for the Mac looks to be a likely source for what you are looking for, Fred!
You can apply a CSS to a document if you save the document as HTML first. Sadly, you cannot apply a CSS to a .docx directly. It’s nice to be able to edit a blog entry in the style in which it will appear. If you enable the ‘Developer’ tab in the ribbon via ‘Word Options’, there’s the ‘Document Template’ button where you can link in a cascading style sheet, if you have the .css file from your blog somewhere on your computer.
As for the second part: maybe I got you wrong, but it seems to me that this is exactly what Windows Live Writer is doing?
OK, I’m only being half serious, but I look at bad grammar and bad spelling as a marker for bad thinking. Instead of apps to correct this, I’d like to see an app that could detect the spelling and grammar and apply a rating. Then, based on evaluation of the rate, the app could accept a level I deem a minimal oversight or reject content that exceeds the threshold I feel represents negligence and poor thought.
As you noted, schools are doing a poor job of teaching spelling and grammar. They seem to be doing an even worse job teaching critical thinking. Sorry, Apple, there’s no app for that!
It’s about time to talk about the revolutionary concept of the Opalesup opensource software (http://bit.ly/eVPLd). Developed by the French University in Compiegne, it’s built around the idea of versatile writing: you detach content from style sheets and generate on the fly the text editor, PowerPoint-like or web publishing versions.
I came across the solution as I needed to develop reusable contents for blended learning. It’s still in its infancy due to limited financial resource. Microsoft has built a success in the last 10 yrs around buying or incubating others’ ideas. It’s a pity to learn that Office 2011 will not provide editorial chains when the company masters so many of its components…
When I type “Can I have some been soup? It has bean a long time. Iceland is an iceland” in my version Word 2007, “been”, “bean” and “iceland” are ‘squiggled’, so I am confused by your post. Did Microsoft take their contextual speller out? I’ve found it to be super valuable! Thanks!
Now if we can just get today’s young people to stop writing phrases like “Apple are having a press conference today.” (Or “GM increased their sales by 10 percent” in your example. I won’t bother to get too excited about the general laziness of not knowing when to use “its” or “it’s.”) That one drives me crazy. The second I read that, the writer’s credibility just took a hit. Companies are singular entities, not plural entities. Unfortunately, no matter how good you make software, there is still no substitute for knowing what is correct to begin with. “Helpful” software can also hurt. I can’t help but notice the more my aging father and my two nephews rely on their software to fix their mistakes, the worse their language skills become.
“Microsoft R&D has very little impact of everyday products.”
I disagree. Looks to me like MS R&D designs most Microsoft products. Their UI’s are pathetic – only an MS engineer could love ‘em.
Everyone has to have Office! We love it!
At home I use a Mac and rarely need anything more complex than TextEdit unless I’m authoring or refereeing a scientific paper.
At work, I recently became acquainted with a newer version of MS Office for Windows having previously used– I don’t know– maybe Office 2000. I was absolutely appalled at how they had taken an already opaque and overly-complex product suite and made it ten times worse. Now, they have managed to hide all the menus and have made it very difficult indeed to find any feature you want. No doubt they did this seeking a more streamlined appearance, but, for goodness sakes, when I bother with a real, full-featured Office suite it is because I have complex work to do. I like to be able to find the features. Maybe the most glaring example is the little “Windows” symbol in the top left of documents where they hide the open/save/print commands. It took me quite a while to find these basic functions because the Windows icon in no way offers the user a clue that it is anything other than a decoration.
Wild, we were thinking on nearly the same sides of thought. The idea of a word processor hasn’t really evolved all that much, and much of what has been added into the “applications” has amounted to the kinds of things that are either better served in fully connected environments, or have been better served by other (siloed) applications.
I think that Wave was and still is a great idea not only for collaboration (seemingly its intent) but also for documents that don’t have a siloed meaning. In other words, you began this blog post nearly the same way you’ve begun a word document, a spreadsheet, or a form. It is basically the intent of the document and metadata attached with it that changes its definition. If document applications started from there, I wonder where we’d be (besides the browser).
On another note, when you look at the entire space of inputting text, and then look at paradigms such as mobiles, tablets, and traditional PCs, there’s a different approach to the style and method of input that isn’t much thought about. My thinking recently is challenged on this end because I live on my mobile and am working on that input puzzle on my iPad. It’s interesting that not too many folks are solving this widely, and at the same time, there are issues of ingrained behaviors that also have to be dealt with before a change can happen.
@jsk: you’re hard to take seriously, considering you lack the basic knowledge that there is an English usage outside that of your own. http://en.wikipedia.org/wiki/American_and_British_English_differences#Formal_and_notional_agreement
Grow up!
For:
> All in the service of producing all manners of text
Read:
> All in the service of producing all manner of text
The English grammar in your copy is fairly frequently erroneous. Perhaps because you have the ability to correct the grammar, but not the time to do so. Or because you apply French grammar to your English copy. Publishing is about multiplication and multiplication is about spelling, specifically, spelling that is self-consistent, and spelling that is self-explanatory. In the traditional publishing trade, the corrector checked the printing surface, the rubricator inked the printing surface, and the torquator pulled the platen that pressed the paper onto the printing surface. The Linotype CEO travelled to New York in 1986 to celebrate the first Linotype to be installed a century before at the New York Tribune, and to place three mechanical Linotypes in the Smithsonian Institution. In 1980, there were perhaps 120,000 sites world wide where type could be composed. In 1992 when Microsoft implemented Apple TrueType, there were perhaps 120,000,000 sites world wide. Today, there are perhaps 1.2 billion. This is the impact of graphic information processing.
The issue in twenty-first century image writing is that there is only the author and the audience. In twentieth century impact writing, there used to be the author of the text, the artisan who corrected and composed the text for the printing surface, the archivist who catalogued the corrected and composed text, and the audience that accessed the catalogued text. Is the answer to add value through intelligent application software, or is the answer to add value through interactivity for character input and character identification in a world where glyph identification is less and less important (the audience doesn’t see the author’s choice of glyphs until the document has been found by its content of character information). Probably a bit of the one and a bit of the other.
If the input of character information is not correct, then searching, spelling and sorting collapse. After two centuries of compulsory education, it is still the case that 20% of adult Danes are functionally illiterate. This is in part due to social heritage and in part to problems in industrial design, that is, through the introduction of graphic information processing 1980-2000, industrial design focussed on glyph identification and not on character identification.
Add to this that applications are too big; that input methods are not interactive (e.g. on a Nokia keyboard Danish and Norwegian æÆ, øØ, åÅ are many levels down, never mind Swedish, German or something as exotic as French); and that communication protocols from upper case ASCII/ISO646 for MS DOS file names in 1981, to upper and lower case ASCII/ISO646 for domain company names in 1989, to upper and lower case ASCII/ISO646 long character identifiers in ISO10646 don’t permit natural language identification of character identification.
Best wishes,
Henrik Holmegaard
Let’s not forget that WordPress has also invested in an artificial intelligence plugin for that matter. It works pretty well after a few tests:
http://wordpress.org/extend/plugins/after-the-deadline/
Let me go farther, I’d like to see a good voice to text software with the features you describe in your post. Simple settings, just like one creates a webpage with metadata (keywords, meta description), that defines the context of my future document and that helps find the right “reference corpus”.
The perfect dictation software!
You correctly point out that a word processor based around statistical analysis of word use would require extensive processing, although adding AI would reduce the need for processing power per user over time.
Still, I think your article, while well intentioned, misses the real point. If people doing writing have such poor writing skills that they can’t differentiate between the meaning of “been” and “bean,” as an employer, I don’t want them working for me. As a client, I wouldn’t want to work with such an ill-trained person.
The problem is not with word processing software, it is with the failure of educational systems to teach children and adults spelling and grammar. Let’s put our money where it really belongs, teaching children language skills, not adding CSS export features to a word processing document to further lower excellence in web site design.
I was writing about this a few months ago; of sorts.
I’m not impressed with any particular one right now, nor service. There are so MANY things that could be helping people write. But Microsoft just seems to be focusing on how can we export and share it..
erm… How about composing it?! While dictionaries and grammar are important, I believe helping the users to use NEW words [like from a thesaurus] as they write would be a nice improvement.
Over the years, it seems Microsoft prices their Microsoft Office packages too high, and the users really get WAY too little for what they purchase— they should be getting more.
Better presentation, better compositions, helping the writer get better, helping students learn and expanding their minds…
Alas, that’s not Microsoft. If all I needed were spelling and grammar checkers, I could get those much cheaper, and just use WordPad.
Thanks for spilling your head.
Until next time,
Larry Henry Jr.