New field of sentiment analysis is crunching emotions into hard data to serve the bottom line
Computers may be good at crunching numbers, but can they crunch feelings?
The rise of blogs and social networks has fuelled a bull market in personal opinion: reviews, ratings, recommendations and other forms of online expression. For computer scientists, this fast-growing mountain of data is opening a tantalizing window into the collective consciousness of Internet users.
An emerging field known as sentiment analysis is taking shape around one of the computer world’s unexplored frontiers: translating the vagaries of human emotion into hard data.
This is more than just an interesting programming exercise. For many businesses, online opinion has turned into a kind of virtual currency that can make or break a product in the marketplace.
Yet many companies struggle to make sense of the caterwaul of complaints and compliments that now swirl around their products online.
As sentiment analysis tools begin to take shape, they could not only help businesses improve their bottom lines, but also eventually transform the experience of searching for information online.
Several new sentiment analysis companies are trying to tap into the growing business interest in what is being said online.
“Social media used to be this cute project for 25-year-old consultants,” said Margaret Francis, vice-president for product at Scout Labs in San Francisco. Now, she said, “top executives are recognizing it as an incredibly rich vein of market intelligence.”
Scout Labs, backed by the venture capital firm started by the CNet founder Halsey Minor, recently introduced a subscription service that allows customers to monitor blogs, news articles, online forums and social networking sites for trends in opinions about products, services or topics in the news.
In early May, the ticket marketplace StubHub used Scout Labs’ monitoring tool to identify a sudden surge of negative blog sentiment after rain delayed a Yankees-Red Sox game.
Stadium officials mistakenly told hundreds of fans that the game had been cancelled, and StubHub denied fans’ requests for refunds, on the grounds that the game had actually been played. But after spotting trouble brewing online, the company offered discounts and credits to the affected fans. It is now re-evaluating its bad-weather policy.
“This is a canary in a coal mine for us,” said John Whelan, StubHub’s director of customer service.
Jodange, based in Yonkers, N.Y., offers a service geared toward online publishers that lets them incorporate opinion data drawn from more than 450,000 sources, including mainstream news sources, blogs and Twitter.
Based on research by Claire Cardie, a Cornell University computer science professor, and Jan Wiebe of the University of Pittsburgh, the service uses a sophisticated algorithm that not only evaluates sentiments about particular topics, but also identifies the most influential opinion holders.
Jodange, whose early investors include the National Science Foundation, is currently working on a new algorithm that could use opinion data to predict future developments, like forecasting the impact of newspaper editorials on a company’s stock price.
In a similar vein, The Financial Times recently introduced Newssift, an experimental program that tracks sentiments about business topics in the news, coupled with a specialized search engine that allows users to organize their queries by topic, organization, place, person and theme.
Using Newssift, a search for Wal-Mart reveals that recent sentiment about the company is running positive by a ratio of slightly better than two to one. When that search is refined with the suggested term “Labour Force and Unions,” however, the ratio of positive to negative sentiments drops closer to one to one.
Such tools could help companies pinpoint the effect of specific issues on customer perceptions, helping them respond with appropriate marketing and public-relations strategies.
For casual web surfers, simpler incarnations of sentiment analysis are sprouting up in the form of lightweight tools like Tweetfeel, Twendz and Twitrratr. These sites allow users to take the pulse of Twitter users about particular topics.
A quick search on Tweetfeel, for example, reveals that 77 per cent of recent tweeters liked the movie Julie & Julia.
But the same search on Twitrratr reveals a few misfires. The site assigned a negative score to a tweet reading “julie and julia was truly delightful!!”
That same message ended with “we all felt very hungry afterwards” – and the system took the word “hungry” to indicate a negative sentiment.
While the more sophisticated algorithms used by Scout Labs, Jodange and Newssift employ advanced analytics to avoid such pitfalls, none of these services works perfectly.
“Our algorithm is about 70 to 80 per cent accurate,” said Francis, who added that its users can reclassify inaccurate results so the system learns from its mistakes.
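How a system might learn from a user's reclassification can be sketched in a few lines. The article does not say how Scout Labs' system actually learns; the perceptron-style update below is an assumption swapped in for illustration, and the class name, seed weights and step size are all hypothetical.

```python
import re


class FeedbackClassifier:
    """Toy word-weight classifier that adjusts itself when a user corrects it.

    Illustrative only: this is NOT Scout Labs' algorithm, just one simple
    way a correction could feed back into a model.
    """

    def __init__(self, seed_weights):
        # Hypothetical starting weights: positive numbers lean positive.
        self.weights = dict(seed_weights)

    @staticmethod
    def _words(text):
        return re.findall(r"[a-z']+", text.lower())

    def classify(self, text):
        score = sum(self.weights.get(w, 0.0) for w in self._words(text))
        return "positive" if score >= 0 else "negative"

    def correct(self, text, true_label, step=0.5):
        """Nudge the weight of every word in a misclassified text."""
        if self.classify(text) == true_label:
            return  # already right, nothing to learn
        direction = 1.0 if true_label == "positive" else -1.0
        for w in set(self._words(text)):
            self.weights[w] = self.weights.get(w, 0.0) + direction * step


clf = FeedbackClassifier({"love": 1.0, "hate": -1.0, "hungry": -1.0})
tweet = "truly delightful, we all felt very hungry afterwards"
before = clf.classify(tweet)       # the seed weights only know "hungry"
clf.correct(tweet, "positive")     # a user reclassifies the result
after = clf.classify(tweet)
```

In this sketch a single correction shifts the weight of every word in the message, so "delightful" picks up a positive weight it did not have before. A production system would presumably need many corrections and a more careful update rule.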
Translating the slippery stuff of human language into binary values will always be an imperfect science, however.
“Sentiments are very different from conventional facts,” said Seth Grimes, the founder of the suburban Maryland consulting firm Alta Plana, who points to the many cultural factors and linguistic nuances that make it difficult to turn a string of written text into a simple pro or con sentiment. “‘Sinful’ is a good thing when applied to chocolate cake,” he said.
The simplest algorithms scan a statement for keywords and categorize it as positive or negative, based on a simple binary analysis (“love” is good, “hate” is bad).
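A minimal sketch of that keyword approach, in Python, shows how little is involved. The word lists and the function name are hypothetical, chosen for illustration; they are not taken from Twitrratr or any other service named here. The sample tweet joins the two fragments quoted earlier, and "hungry" is placed on the negative list to mirror that misfire.

```python
import re

# Hypothetical mini-lexicons; real keyword lists are far larger.
POSITIVE = {"love", "like", "great", "good"}
NEGATIVE = {"hate", "bad", "awful", "hungry"}  # "hungry" mirrors the misfire


def keyword_sentiment(text):
    """Label text by counting positive and negative keyword hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweet = "julie and julia was truly delightful!! we all felt very hungry afterwards"
label = keyword_sentiment(tweet)
```

Because "delightful" is missing from this small positive list while "hungry" sits on the negative one, the tweet comes out labelled negative, the same kind of error described above.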
But that approach fails to capture the subtleties that bring human language to life: irony, sarcasm, slang and other idiomatic expressions.
Reliable sentiment analysis requires parsing many linguistic shades of grey.