Using sketchy sentiment to pump up your post count

Online 7 April 2010

Finally, a post topic that combines both sentiment analysis and the meta-world of professional blogging!

I usually like TechCrunch, but these two articles have annoyed me: ‘Sentiment is split on the iPad’ and ‘More iPad Sentiment Analysis’. Both use poor, crude methods of sentiment analysis to produce posts full of fluff and pretty graphs. The result? Whatever point the blogger wanted to make. (You know what they say about statistics.)

A quick rundown of the problems: spurious classification algorithms, small sample sizes, and non-credible results. An algorithm that analyses every piece of traffic on Twitter and comes up with “51% positive, 49% negative” is Just Plain Wrong. There’s going to be a ton of stuff in the middle: unclassifiable tweets, undecided ones, even plain retweets of blog posts with the word in the title, and any graph should reflect that as well. Even with the neutral tweets stripped out, a 51/49 split seems completely nonsensical to me, and I’ve been working with Twitter sentiment for a long time now.
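To make the point concrete, here is a minimal Python sketch, assuming each tweet has already been given one of four hypothetical labels (`positive`, `negative`, `neutral`, `unclassified`). It reports each label's share of all tweets alongside the polar-only split, so a 51/49 headline number can be seen in the context of everything else. The counts are made up for illustration.

```python
from collections import Counter

def sentiment_breakdown(labels):
    """Return each label's share of *all* tweets, plus the positive/negative split.

    `labels` is a list of strings such as "positive", "negative",
    "neutral" or "unclassified" (hypothetical label names).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    overall = {label: n / total for label, n in counts.items()}
    polar = counts["positive"] + counts["negative"]
    polar_split = (
        {"positive": counts["positive"] / polar,
         "negative": counts["negative"] / polar}
        if polar else {}
    )
    return overall, polar_split

# Made-up sample of 500 tweets, most of them neither clearly positive nor negative.
labels = (["positive"] * 51 + ["negative"] * 49
          + ["neutral"] * 250 + ["unclassified"] * 150)
overall, polar_split = sentiment_breakdown(labels)
print(overall)      # {'positive': 0.102, 'negative': 0.098, 'neutral': 0.5, 'unclassified': 0.3}
print(polar_split)  # {'positive': 0.51, 'negative': 0.49}
```

In this made-up sample the 51/49 split covers only a fifth of the tweets; the other 80% is exactly the middle ground those graphs leave out.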

It’d also be very interesting to know what methods the classifiers use. That’s probably available with some digging, but I fear it’s manual keyword lists that some poor sod had to draw up: “hmm, I think if someone says the iPad is ‘stupid’ that’s probably negative, yah?”
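For illustration, a crude keyword-list classifier of the kind I suspect might look something like the Python sketch below. The word lists and the function name are hypothetical, not taken from either article; the point is how easily negation and mixed phrasing defeat it.

```python
import re

# Hypothetical hand-drawn keyword lists, of the kind described above.
POSITIVE_WORDS = {"love", "great", "awesome", "want"}
NEGATIVE_WORDS = {"stupid", "hate", "fail", "overpriced"}

def keyword_sentiment(tweet: str) -> str:
    """Label a tweet by counting hits against fixed keyword lists."""
    words = re.findall(r"[a-z']+", tweet.lower())
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(keyword_sentiment("The iPad is stupid"))                 # negative
print(keyword_sentiment("Not stupid at all, I love my iPad"))  # neutral: one hit each way
print(keyword_sentiment("I don't want an iPad"))               # positive: negation ignored
```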

Attensity does better, but what on earth does “not thrilled” mean (weak negative?), and again, where’s the neutral or noise category? It’s valuable to know just how many tweets were about the iPad, and how many of those actually expressed sentiment. What if a TechCrunch headline with a negative word got retweeted 2000 times? That’s what we in the trade call “skew”. Plus, classifying only a small sample is just crazy. Why do it? It surely can’t be computational limits. Or were these the only tweets with sentiment information? If so, that’s useful data in itself! Why throw it all away?
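One simple guard against that kind of skew is to collapse retweets before counting, while keeping a tally of how often each message was repeated. This Python sketch assumes the old-style `RT @user:` prefix convention and uses a made-up headline and account name; it ignores edited retweets and any retweet metadata Twitter itself might provide.

```python
import re
from collections import Counter

# Assumed retweet convention: "RT @user: original text" (possibly nested).
RT_PREFIX = re.compile(r"^\s*RT\s+@\w+:?\s*", re.IGNORECASE)

def normalise(tweet: str) -> str:
    """Strip leading retweet markers so copies of one message compare equal."""
    previous = None
    while previous != tweet:
        previous = tweet
        tweet = RT_PREFIX.sub("", tweet)
    return tweet.strip().lower()

def dedupe(tweets):
    """Return the distinct messages plus a count of how often each appeared."""
    counts = Counter(normalise(t) for t in tweets)
    return list(counts), counts

# Hypothetical data: one negative-sounding headline retweeted 2000 times.
tweets = (["RT @someblog: Is the iPad a flop?"] * 2000
          + ["Loving my new iPad", "iPad battery life is great"])
unique, counts = dedupe(tweets)
print(len(tweets), len(unique))  # 2002 3
print(counts.most_common(1))
```

Whether to count one vote per distinct message or weight by repeats is a judgement call, but either way the repeat counts should be reported rather than silently folded into the percentages.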

It also looks like there are some great leaps in logic when it comes to distinguishing between “Like the iPad because it might replace iPhone” and “Don’t like the iPad because it won’t replace my iPhone”. And how do you automatically tell “Can’t replace battery” apart from praise for the battery life? Sigh.

Then there’s the key mistake of not showing error bars, accuracy figures, or misclassifications. Both posts assume the algorithms are 100% correct. That makes for some pretty graphs, but it just isn’t true, and with no idea of the sample size or the number of matching tweets (e.g. for the battery category above), a result of 5% could simply mean that one out of a total of twenty tweets mentioning the battery was negative. It’s the same for intent to purchase. Not every tweet will express any kind of intent, so if you just took the tweets containing “will” “buy” “iPad” or “won’t” “buy” “iPad”…
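A standard way to put bounds on a proportion like that is the Wilson score interval. The Python sketch below assumes tweets are independent samples and that the classifier's labels are correct, which is generous; it captures sampling error only, not classifier error.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    if n == 0:
        raise ValueError("need at least one observation")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half_width, centre + half_width

# One negative tweet out of twenty mentioning the battery...
lo, hi = wilson_interval(1, 20)
print(f"1/20:     {lo:.1%} to {hi:.1%}")
# ...versus the same 5% from a much larger sample.
lo, hi = wilson_interval(100, 2000)
print(f"100/2000: {lo:.1%} to {hi:.1%}")
```

With 1 negative out of 20, the 95% interval runs from roughly 1% to 24%, so a headline figure of 5% tells you very little; the same 5% from 100 out of 2000 tweets is pinned down to within about a percentage point either side.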

Of course, the reason I’m most annoyed at these posts is that I could have helped put together a custom dataset and classifier to provide much more detailed data, and didn’t. But while I can’t go back in time and change things, I can at least point out the flaws in using off-the-shelf graphs to meet your daily post quota as a pro-blogger.
