Using sketchy sentiment to pump up your post count

Online 7 April 2010 | 0 Comments

Finally, a post topic that combines both sentiment analysis and the meta-world of professional blogging!

I like TechCrunch for the most part, but these two articles have annoyed me: ‘Sentiment is split on the iPad’ and ‘More iPad Sentiment Analysis’. Both use crude methods of sentiment analysis to produce posts full of fluff and pretty graphs. Result? Whatever point the blogger wanted to make. (You know what they say about statistics.)

A quick rundown of the problems: spurious classification algorithms, small sample sizes, and non-credible results. An algorithm that analyses every piece of traffic on Twitter and comes up with “51% positive, 49% negative” is Just Plain Wrong. There’s going to be a ton of stuff in the middle: unclassifiable, undecided, even just retweets of blog posts with the word in the title, and any graph should reflect that as well. Even with the neutral stripped out, a result of 51/49 seems completely nonsensical to me, and I’ve been working with Twitter sentiment for a long time now.

It’d also be very interesting to know what methods the classifiers use. That’s probably available with some digging, but I fear it’s manual keyword lists that some poor sod had to draw up: “hmm, I think if someone says the iPad is ‘stupid’ that’s probably negative, yah?”.
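To make the neutral problem concrete, here is a toy sketch of the kind of manual keyword-list classifier I’m afraid of. Everything in it is my own assumption for illustration: the word lists, the function names `classify` and `summarise`, and the tie-breaking rule. It is not how either tool in those posts is known to work. The one thing it does right is keep a neutral bucket and report it.

```python
from collections import Counter

# Hypothetical hand-drawn keyword lists (assumed for illustration only).
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"stupid", "hate", "awful"}

LABELS = ("positive", "negative", "neutral")


def classify(tweet):
    """Label a tweet by counting keyword hits; ties and no hits are neutral."""
    words = {w.strip(".,!?\"'").lower() for w in tweet.split()}
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def summarise(tweets):
    """Return the share of tweets under each label, neutral included."""
    counts = Counter(classify(t) for t in tweets)
    total = len(tweets)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: counts[label] / total for label in LABELS}
```

Feed it two opinionated tweets and two headline retweets and half the sample comes back neutral. Throw that half away and you get a tidy 50/50 graph that says nothing about how little of the traffic carried an opinion at all.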

Attensity does better, but what on earth does “not thrilled” mean (weak negative?) and again, where’s the neutral or noise aspect? It’s valuable to know just how many tweets were about the iPad, and how many of those were about sentiment. What if a TechCrunch headline with a negative word got retweeted 2000 times? That’s what we in the trade call “skew”. Plus, classifying on a small sample is just crazy. Why? Surely it can’t be computational limits; were these the only tweets with sentiment information? That’s useful data! Why throw it all away…
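The retweet skew is easy to check for before any sentiment gets counted. The sketch below assumes the manual “RT @user:” retweet convention and treats tweets that match after stripping those prefixes as the same message. The names `normalise` and `volume_report` are mine, and a real pipeline would want fuzzier matching than this.

```python
import re
from collections import Counter

# One or more leading "RT @user:" markers (assumed retweet convention).
RT_PREFIX = re.compile(r"^(?:RT\s+@\w+:?\s*)+", re.IGNORECASE)


def normalise(tweet):
    """Strip leading retweet markers, then collapse case and whitespace."""
    text = RT_PREFIX.sub("", tweet.strip())
    return " ".join(text.lower().split())


def volume_report(tweets):
    """Report total tweets, distinct messages, and the most repeated one."""
    counts = Counter(normalise(t) for t in tweets)
    if counts:
        top_text, top_count = counts.most_common(1)[0]
    else:
        top_text, top_count = "", 0
    return {
        "total": len(tweets),
        "distinct": len(counts),
        "top_text": top_text,
        "top_count": top_count,
    }
```

If `top_count` is a large slice of `total`, the “sentiment” is mostly one headline echoing around, and the graph should say so.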

It also looks like there are some great leaps in logic in terms of distinguishing between “Like the iPad because it might replace iPhone” and “Don’t like the iPad because it won’t replace my iPhone”. How do you automatically extract the difference between “Can’t replace battery” and praise for the battery life? Sigh.

Plus, there’s the key mistake of not showing error rates or accuracy bounds. Both posts assume the algorithms are 100% correct. While that makes for some pretty graphs, it just isn’t true. With no idea of sample size or result size (e.g. for the battery category above), a result of 5% could just mean that one out of a total of twenty tweets containing the word “battery” was negative. It’s the same for intent to purchase. Not every tweet will have any kind of intent, so if you just took the tweets containing “will” “buy” “iPad” or “won’t” “buy” “iPad”,
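To put a number on the 5% example above: one common way to attach bounds to a proportion from a small sample is the Wilson score interval. This is my choice of method for illustration, not something either post used, and the function name `wilson_interval` is made up. It also only covers sampling error and still assumes every label is correct, which is the other half of the problem.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95%."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point drift at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


low, high = wilson_interval(1, 20)
print(f"1 of 20 negative: {low:.1%} to {high:.1%}")
```

For one negative tweet out of twenty, the interval runs from roughly 1% to roughly 24%. A bar labelled “5%” hides all of that.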

Of course, the reason I’m most annoyed at these posts is that I could have helped put together a custom dataset and classifier to provide much more detailed data, and didn’t. But while I can’t go back in time and change things, I can at least point out the flaws in using off-the-shelf graphs to meet your daily post quota as a pro-blogger.


The future of journalism

Startups 17 August 2009 | 1 Comment

Silicon Valley-based seed fund and incubator Y-Combinator have started nudging smart people without ideas in the direction of a few pet ideas they’re keen to fund. While I think on one level this is a great idea, I also can see a few problems: people abandoning existing half-formed ideas to pursue something they think has greater chance of being funded, people shoehorning themselves into topics off-kilter with their actual instincts and skills, and people trying to game the system by applying with a ‘request for startups’ idea but turning it into something else once funded.

Fortunately, the guys behind YC are pretty smart, and have almost certainly thought of more ways this could go wrong than I have. On the “great idea” level it is definitely a good way to 1. learn how the guys with the money and experience actually think, and 2. encourage people to focus on something worthwhile — though, as the HN comments say, if you don’t already have a few ideas of your own then are you really the right sort of person to be jumping into the startup world?

However, above all, these ideas get people thinking. The HN comments are already raging, the TC commenters are partly missing the point and partly supportive, and I expect to see plenty more discussion on the topic as more ideas to be funded emerge.

As to the RFS1 itself – ‘the future of journalism’ – it’s interesting to see people who are already getting it, and others who are already shooting far off the mark. At the moment, journalism’s trying to marry the need to make money with processes and a business model that stem back to the printing press, evolving over the years but still firmly rooted in a concept that people nowadays have trouble dealing with — paying for stuff.

It’s actually quite interesting to think how we’re innovating in this space – almost accidentally. FestBuzz is competing with several types of journalism, online and off; some of our ideas for later down the line marry different types of media, new and old, with data-mining and magic to do cool stuff. The thing about FestBuzz in particular is we found a way to make money off something that’s free to consumers without using advertising as our main source of income. I can’t help but think that some of the lessons we’re learning about how to bridge the print and online industries, how to deal with information producers and consumers, how to make information free to all in a way they want to consume it… all of that fits right in with this idea of reinventing journalism as something that might actually make money rather than die out.

But enough about us. There are plenty of other business models and ideas floating around to kickstart any thoughts you might be having on reinventing journalism starting with the need to make money, not the assumption that people will pay:

Pro-blogging.
Obviously a subject dear to my heart, this ticks the box of ‘paying people to write content’, which is something most journalists like to hear. On the other hand: a reduced barrier to entry, low pay rates, and a constant small trickle of content being more rewarding than occasional big articles (so the concept of a feature/column is somewhat worn away), yet big flashy content is still needed to attract viewers through viral means (Digg etc.). Less of a focus on daily news and current affairs, partly due to reduced access and budgets to cover them.

Citizen journalism.
A poncy term for “people on Twitter are on the scene of breaking news first”. People submit their news/pictures to a central site, news agencies pay a subscription fee, extra for exclusivity, some of which goes back to the citizen journalists cited/used in a story. Reduces costs for news organisations to have people on the ground in key locations, democratises news, allowing bloggers and online-only organisations to cover breaking news too, but still somewhat reliant on current business models.

Subscription based news access.
Either online or via Kindle/mobile/iPhone, a centralised news gateway that you pay for (possibly freemium). Challenge: Convincing people to pay, and working out where the money goes. Multiple approaches here: personalised aggregation, collaborative news filtering, topic-specific news streams; access to ‘professional’ articles consolidated into one place; cross-media integration with paper headlines, multimedia, known brands; cross-platform access i.e. RSS on steroids, including filtered twitter, facebook, etc, streams.

Would you pay for an iPhone version of HN tailored to your own preferences (no articles on Erlang for me, please!)? Would you pay for a paper, virtual or physical, that consolidated the best of the day’s current affairs as voted by other people – the Times’ political commentary with the Guardian’s media coverage, the FT’s straight-faced finance with a little bit of Daily Mail celebrity-spotting sprinkled in for tea breaks? Would you pay to consume RSS as you do today but with the ability to collaboratively view it, chatting with other readers? The problem is that, to most of these, the answer is never going to be an immediate ‘yes!’. Maybe you’d get a hit from sales of the iPhone app or other one-off costs, but many things along these lines have been tried and have failed admirably.

Topic-specific physical news.
Instead of paying £x for a paper which you don’t read half of, pay £x/2 for two halves of a different paper and build your own. Again, fairly linked to current models, but a sort of physical hybrid of the stuff above with the need (or desire) to consume a dead-tree version.

The final point for today (I could go on all afternoon, but there’s work to be done!) is on how to think about this stuff.

The above are all ideas I’ve been thinking about for a while, in one form or another – and you can tell where my recent thoughts have been focused. But if you’re set on reinventing an industry, you don’t start from an idea or an application, you start from the industry itself. How does journalism work? How do journalists and news providers make money? What do people consume? What do they pay for (note, they may not be paying in money, but in clicks, in eyeballs, in time..)? What other data can you get about the way people consume news and media, and the way it’s delivered to them? Where are the weaknesses in the value chain? Why does the business model look the way it does? (Hint, it’s in pg’s post.)

Then think about how people of 2010 (not that far away any more) might consume news. What’s different? What happens if the news organisations go out of business, or go online-only? Are their current sources of income viable long-term? Short-term? What other ways can they make money? What do they own and what can they sell? How do journalists get paid? How else can they get paid? Who else can write news? Who else can deliver news? Does ‘news’ mean what’s happening now, or anything that someone somewhere finds current and interesting? How do news organisations gain and retain credibility? How do companies and celebrities rely on the news machine to make money and gain fame? How do paparazzi and news photographers fit in? What would the world look like if there was only one video news channel in each country? What levels of competition and collaboration are necessary to keep ‘good’ reporting alive? What can someone with a BBC badge do that someone with a ‘my-blog.co.uk’ business card can’t? Why?

So many questions. Have a cup of tea and a think. And in, ooh, about two or three months, maybe, I’ll write more about my personal view on these things and how what I’m doing at the moment fits into the picture.
