On its own, Twitter builds an image for companies; very few are aware of this fact. When a big surprise happens, it is too late: a corporation suddenly sees a facet of its business — most often a looming or developing crisis — flare up on Twitter. As always when a corporation is involved, there is money to be made by converting the problem into an opportunity: Social network intelligence is poised to become a big business.
In theory, when it comes to assessing the social media presence of a brand, Facebook is the place to go. But as brands flock to the dominant social network, the noise becomes overwhelming and the signal — what people really say about the brand — becomes hard to extract.
By comparison, Twitter more swiftly reflects the mood of users of a product or service. Everyone in the marketing/communication field is increasingly eager to know what Twitter is saying about a product defect, the perception of a strike, or an environmental crisis. Twitter is the echo chamber, the pulse of public feelings. It therefore carries tremendous value.
Datamining Twitter is not trivial. By comparison, diving into newspaper or blog archives is easy: phrases are (usually) well constructed, names are spelled in full, and slang and just-invented jargon are relatively rare. On Twitter, by contrast, the 140-character limit forces a great deal of creativity. The Twitter lingo constantly evolves, and new names and characterizations flare up all the time, which rules out straightforward full-text analysis. The 250 million tweets per day are a moving target. A reliable quantitative analysis of the current mood is a big challenge.
Companies such as DataSift (launched last month) exploit the Twitter fire hose by relying on the 40-plus metadata fields included in a post. Because, in case you didn’t know it, an innocent-looking tweet like this one…
…is a rich trove of data. A year ago, Raffi Krikorian, a developer on Twitter’s API Platform team (spotted thanks to this story in ReadWriteWeb) revealed what lies behind the 140 characters. The image below…
…is a tear-down of a much larger one (here, on Krikorian’s blog) showing the depth of metadata associated with a tweet. Each comes with information such as the author’s biography, level of engagement, popularity, assiduity, location (which can be quite precise in the case of a geotagged hotspot), etc. In this Wired UK interview, DataSift’s founder Nick Halstead mentions the example of people tweeting from Starbucks cafés:
I have recorded literally everything over the last few months about people checking in to Starbucks. They don’t need to say they’re in Starbucks, they can just be inside a location that is Starbucks, it may be people allowing Twitter to record where their geolocation is. So, I can tell you the average age of people who check into Starbucks in the UK.
Companies can come along and say: “I am a retail chain, if I supply you with the geodata of where all my stores are, tell me what people are saying when they’re near it, or in it”. Some stores don’t get a huge number of check-ins, but on aggregate over a month it’s very rare you can’t get a good sampling.
Well, think about it next time you tweet from a Starbucks.
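For the curious, the store-matching idea Halstead describes can be sketched in a few lines of Python. This is only an illustration, not DataSift’s implementation: the `geo` field name, the store coordinates, the sample tweets and the 50-metre radius are all assumptions made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def tweets_near_stores(tweets, stores, radius_m=50):
    """Return (store_name, tweet_text) pairs for geotagged tweets near a store."""
    hits = []
    for tweet in tweets:
        geo = tweet.get("geo")  # hypothetical field: (lat, lon) tuple or None
        if geo is None:
            continue  # most tweets carry no location at all
        for name, (store_lat, store_lon) in stores.items():
            if haversine_m(geo[0], geo[1], store_lat, store_lon) <= radius_m:
                hits.append((name, tweet["text"]))
    return hits


# Made-up sample data: one store, three tweets
stores = {"Oxford Street": (51.5154, -0.1410)}
tweets = [
    {"text": "Best flat white in town", "geo": (51.5155, -0.1411)},
    {"text": "Stuck on the M25 again", "geo": (51.3000, -0.5000)},
    {"text": "No location on this one", "geo": None},
]
matches = tweets_near_stores(tweets, stores)
```

Only the first tweet falls within the radius. A real service would need a spatial index rather than this brute-force loop over every store for every tweet.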
DataSift further refined its service by teaming up with Lexalytics, a firm specializing in the new field of “sentiment analysis”, which measures the emotional tone of a text, something very useful for assessing the perception of a brand or a product.
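To give a feel for what “measuring the emotional tone” means, here is a deliberately naive, lexicon-based scorer in Python. It bears no relation to what Lexalytics actually does; the word list, the negation rule and the function name are assumptions chosen for illustration, and production systems rely on far richer models.

```python
import re

# Tiny hand-made lexicon, purely illustrative
LEXICON = {"love": 1, "great": 1, "good": 1, "hate": -1, "broken": -1, "awful": -1}
NEGATIONS = {"not", "no", "never"}


def sentiment_score(text):
    """Sum word polarities; a negation flips the next sentiment-bearing word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        value = LEXICON.get(token, 0)
        if value:
            score += -value if negate else value
            negate = False  # the negation has been consumed
    return score


positive = sentiment_score("I love the new phone, great screen")
negative = sentiment_score("Battery is broken, not good")
neutral = sentiment_score("Heading to the station")
```

A positive total suggests a favorable tweet, a negative one the opposite. The weakness is obvious: any slang or freshly coined term missing from the lexicon scores zero, which is exactly the problem the shifting Twitter vernacular creates.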
Mesagraph, a Paris-based startup with a beachhead in California, plans a different approach. Instead of trying to guess the feeling of a Twitter crowd, it will create a web of connections between people, terms and concepts. Put another way, it creates a “structured serendipity” in which the user will naturally expand the scope of a search way beyond the original query. Through its web-based application called Meaningly, Mesagraph is set to start a private beta this week, and a public one next January.
Here is how Meaningly works: It starts with the timelines of tens of thousands of Twitter feeds. When someone registers, Meaningly will crawl that user’s Twitter timeline and add a second layer composed of the people the new user follows. It can grow very quickly. In this ever-expanding corpus of twitterers, Meaningly detects the influencers, i.e. the people most likely to be mentioned and retweeted, and who have the largest number of qualified followers. To do so, the algorithm applies an “influence index” based on specialized outlets such as Klout or PeerIndex that measure someone’s influence on social media. (I have reservations regarding the actual value of such secret sauces: I see insightful people I follow lag well behind compulsive self-promoters.) Still, such metrics are used by Meaningly to reinforce a recommendation.
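The two ideas in the paragraph above, a corpus that grows by layers and a ranking by mentions and retweets, can be approximated as follows. This is a rough sketch and not Meaningly’s algorithm: the data layout, the field names (`mentions`, `retweet_of`) and the weights are assumptions, and it ignores external indexes such as Klout entirely.

```python
from collections import Counter


def two_layer_corpus(user, follows):
    """Accounts the user follows, plus the accounts those accounts follow."""
    first_layer = set(follows.get(user, ()))
    second_layer = set()
    for followed in first_layer:
        second_layer.update(follows.get(followed, ()))
    return (first_layer | second_layer) - {user}


def rank_influencers(tweets, top_n=3, retweet_weight=2, mention_weight=1):
    """Score accounts by how often they are mentioned or retweeted."""
    scores = Counter()
    for tweet in tweets:
        for mentioned in tweet.get("mentions", []):
            scores[mentioned] += mention_weight
        original_author = tweet.get("retweet_of")
        if original_author:
            scores[original_author] += retweet_weight
    return scores.most_common(top_n)


# Made-up follow graph and tweets
follows = {"me": ["ann", "bob"], "ann": ["carl", "me"], "bob": ["dina"]}
corpus = two_layer_corpus("me", follows)

tweets = [
    {"author": "ann", "mentions": ["bob"], "retweet_of": None},
    {"author": "carl", "mentions": [], "retweet_of": "bob"},
    {"author": "dina", "mentions": ["ann", "bob"], "retweet_of": None},
    {"author": "eve", "mentions": [], "retweet_of": "ann"},
]
ranking = rank_influencers(tweets)
```

Counting a retweet twice as much as a mention is an arbitrary choice. It also shows why such indexes deserve skepticism: whoever gets echoed the most wins, insightful or not.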
Then, there is the search process. To solve the problem of the ever-morphing vernacular used on Twitter, Mesagraph opted to rely on Wikipedia (in English) to analyze the data it targets. Why Wikipedia? Because it’s vast (736,000 subjects), it’s constantly updated (including with the trendiest parlance), it’s linked, it’s copyright-free. From it, Mesagraph’s crew extracted a first batch of 200,000 topics.
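One plausible way to use an encyclopedia as a topic dictionary is an alias table that maps the many spellings seen in tweets onto one canonical subject. The sketch below assumes such a table already exists; the handful of entries are invented for the example and do not come from Mesagraph or from Wikipedia data.

```python
import re

# Hypothetical alias table: surface form seen in tweets -> canonical topic
ALIASES = {
    "nyt": "The New York Times",
    "new york times": "The New York Times",
    "fb": "Facebook",
    "facebook": "Facebook",
    "sbux": "Starbucks",
    "starbucks": "Starbucks",
}


def tag_tweet(text, aliases=ALIASES, max_words=3):
    """Return the sorted canonical topics whose aliases appear in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    topics = set()
    # Try every run of 1..max_words consecutive words against the table
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            if phrase in aliases:
                topics.add(aliases[phrase])
    return sorted(topics)


tags_a = tag_tweet("Paywall at the NYT, discuss on FB #media")
tags_b = tag_tweet("New York Times piece on #sbux")
```

Both tweets end up tagged with “The New York Times” although neither spells it the same way. The hard part, of course, is building and refreshing the table as the lingo shifts.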
To find tweets on a particular subject, you first fill in the usual search box; Meaningly will propose a list of predefined topics, some expressed with its own terminology; then it will show a list of tweets based on the people you’re following, the people they follow, and “influencers” detected by Meaningly’s recommendation engine. Each tweet comes with a set of tags derived from the algorithm’s mapping table. These tags will help to further refine the search with terms users would not have thought of. Naturally, it is possible to create all sorts of custom queries that will capture relevant tweets as they show up; it will then create a specific timeline of tweets pertaining to the subject. At least that’s the idea; the pre-beta version I had access to last week only gave me a sketchy view of the service’s performance. I will do a full test-drive in due course.
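In code, a standing query of this kind boils down to a filter over already tagged tweets, plus a tally of co-occurring tags to suggest refinements. The following is a guess at the general shape, not a description of Meaningly: the `author` and `tags` fields, the function names and the sample stream are all assumptions.

```python
from collections import Counter


def custom_timeline(stream, wanted_topic, network):
    """Yield tweets tagged with wanted_topic whose author is in the network."""
    for tweet in stream:
        if tweet["author"] in network and wanted_topic in tweet["tags"]:
            yield tweet


def suggest_refinements(timeline, wanted_topic, top_n=3):
    """Most frequent tags that co-occur with the topic in the timeline."""
    counts = Counter(
        tag for tweet in timeline for tag in tweet["tags"] if tag != wanted_topic
    )
    return [tag for tag, _ in counts.most_common(top_n)]


# Made-up stream of tagged tweets; "zed" is outside the user's network
stream = [
    {"author": "ann", "tags": ["Paywall", "The New York Times"]},
    {"author": "zed", "tags": ["Paywall"]},
    {"author": "bob", "tags": ["Paywall", "Financial Times"]},
    {"author": "bob", "tags": ["Starbucks"]},
    {"author": "carl", "tags": ["Paywall", "The New York Times"]},
]
network = {"ann", "bob", "carl"}

timeline = list(custom_timeline(stream, "Paywall", network))
refinements = suggest_refinements(timeline, "Paywall")
```

Here the query on “Paywall” keeps three tweets and proposes “The New York Times” and “Financial Times” as ways to narrow the search, terms the user never typed.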
Datamining Twitter has great potential for the news business. Think of it: instead of painstakingly building a list of relevant people who sometimes prattle endlessly, you’ll capture in your web of interests only the relevant tweets produced by your group and the group it follows, all adding up in real time. This could be a great tool to follow developing stories and enhance live coverage. A permanent, precise and noise-free view of what’s hot on Twitter is a key component of the 360° view of the web every media outlet should now offer.
—frederic.filloux@mondaynote.com
6 Comments
Hi Frédéric,
Another great article, as always on the intersection of publishing and social media. May I recommend you take a look at Condé Nast’s Reddit.com? It is an older but fast-growing (revived) online community where prediction and sentiment modeling may actually be more compelling because of the movement of entire communities of subreddits at a time.
This is something that’s not easily tracked or measured on Twitter.
All the best,
Taariq
I’m a bit bewildered by this post. You’re just discovering social media monitoring? Sentiment analysis is an ‘emerging field’? I worked for a company doing these things in 2008 (SM2 from Techrigy). This was the year when these capabilities exploded in usage with companies like Radian 6 (acquired by Salesforce two years ago for $326 million) emerging.
Honestly, because you’ve discovered something, it doesn’t mean it’s new. This is a fundamental problem with publishing and media: those who should be in the forefront of trends are not even aware of them. Write about that. And do your homework.
And, btw, if you think Twitter has metadata, look at Facebook. It has at least 1000 times as much metadata. And that metadata has a far longer life span than any mercurial Tweet stream.