Encapsulating chatter (or why we don’t trust pixies)
I’ve been spending a lot of time playing about with the Twitter API recently, putting together a couple of quick projects which are partially powered by sentiment AI and partially just fun(tm). Along my wanders I appear to be accruing Twitter followers at a rate that, while modest by comparison to most people, means my time spent monitoring the platform and simply engaging in meta activity (checking out new followers etc) is growing.
It gets to the point where, with an unfiltered stream-of-twitter-consciousness, I spend far too long first thing in the morning just catching up on what people have said while I was asleep (a side effect of following Americans and insomniacs). It’s a bit like reading vast numbers of RSS feeds (another habit of mine); my eyes glaze over halfway through and I barely skim read most of the messages, although fortunately due to years of top-notch academic training I’m able to skim read like a pro. Oh yes.
Oh, but a service like filttr would be perfect for me, I hear you cry. A secret, intelligent algorithm that figures out what I want to read so all I have to do in the morning is sit back and trust the magic pixies. Well, that’s the sticking point for me and a lot of people, I guess — and possibly a reason why sentiment classification will never work as a means to cut down personal information overload. I’m an algorithm designer, I have entire farms of pixies at my beck and call, and can I encapsulate an algorithm that will show me exactly what I want to read? Most likely not. My inherent distrust of generalisations means there will always be one tweet that isn’t shown that I would have liked to have read — or even if there isn’t, I’ll think there is, and not want to filter.
Instead, I climb the laborious yet enlightening slope of trying to figure out a system that will more or less get me the information I want without any of that pesky “you might have missed something interesting” feeling nagging at me. I use TweetDeck, as many others do, to get a great visualisation of @replies and DMs as well as the follow firehose. Yet my follows are split into logical (ish) groups, and I can make a couple of friend groups that will streamline the chatter quite effectively. The only problem is keeping said groups up to date — I’m constantly adding new people that seem interesting, and if every follow results in altering two TweetDeck installations’ group settings, urgh. Yet if all new follows end up in the firehose, I haven’t really solved the problem at all.
The other approach which I think will be useful is twofold: links and conversations. As someone (I forget who) recently noted, as you follow more people on Twitter, you end up being privy to more and more @conversations (assuming you don’t already have ‘show all @replies to anyone’ on, which is horrendous to stay on top of and mostly pointless as well). Sometimes it’s useful to join in on these conversations, but at other times, you’re 12 hours late to the party and they’re just not that important. I’ve seen mention of people working on Twitter conversation threading, so I guess this belongs there, but a UI that collapsed conversations (perhaps those older than a set time) would save a lot of screen and brain real estate.
Links are the other half of this. It’s quite fun in the morning, while catching up on Twitter, to click on interesting looking links, open them all in tabs and then browse through them. There are two problems with this: sometimes people RT the same link, and most of the time I forget the link’s origin, so if I want to RT it I can’t give credit. Would pulling out all the links from last night’s twitterstream be useful? I think so, especially if there were some way to associate opened links with the original tweet. Trying to scroll back and see which tinyurl or bit.ly post is actually pointing at the informative Guardian article I have open is a nightmare. But if I had a clean display that pulled out all the links from friends, gave them extra props if linked multiple times, gathered all relevant comments (possibly including non-friend @replies), longurled them for easy back-reference and then, I suppose, hid all the pure-link tweets from last night, URL management might be a little easier.
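As a rough sketch of that link digest, assuming last night's tweets are already available as a chronological list of dicts with hypothetical `user` and `text` keys (nothing here talks to the Twitter API), the grouping step might look like this. Expanding tinyurl or bit.ly links needs a network lookup and is left out, so two different short URLs for the same article would still show up separately.

```python
import re

# Deliberately simple URL matcher; real tweets may need something stricter.
URL_PATTERN = re.compile(r"https?://\S+")


def build_link_digest(tweets):
    """Group tweets by the URLs they contain.

    `tweets` is a list of dicts with hypothetical keys 'user' and 'text',
    assumed to be ordered oldest first, so the first poster seen for a URL
    is treated as its origin for RT credit.
    """
    digest = {}
    for tweet in tweets:
        for url in URL_PATTERN.findall(tweet["text"]):
            url = url.rstrip(".,;:!?)")  # drop trailing punctuation
            entry = digest.setdefault(
                url, {"first_user": tweet["user"], "mentions": []}
            )
            entry["mentions"].append((tweet["user"], tweet["text"]))
    # Most-linked URLs first; the sort is stable, so ties stay chronological.
    return sorted(
        digest.items(), key=lambda item: len(item[1]["mentions"]), reverse=True
    )
```

Each entry keeps the original poster and every comment made alongside the link, which is enough to give credit on an RT and to hide the pure-link tweets from the main stream.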
Of course, a different approach is to manage who I follow. The big question from an NLP/AI point of view here is “can a classifier learn the set {worth following, oh my god no way}?” In other words, if I show an algorithm a Twitter account, will it make the same following decision I do?
It seems fairly simple at first pass to jot down a feature set that’s computationally feasible and probably relevant to how I make following decisions:
- userpic present? (more subjective: userpic interesting?)
- twitterer location (Edinburgh, Cambridge or London = likely to be a yes. Rest of world = no preference)
- twitterer language (only follow English tweets, sorry)
- number of followers and following (don’t follow followbots; generally follow people with a healthy ratio or people with a large following, i.e. net-celebs)
- volume of updates (don’t follow the super spammy; unlikely to follow people with 1 or 2 updates especially if they plug their product)
- quality of updates (decent number of @replies – person is active participant; lots of URLs to same site – account is a blog bot, unlikely to follow unless I like the blog – ah the irony)
- subject matter, both from bio and from content (if person’s bio matches my interests, likely to be a yes; if person’s recent tweets overlap with areas I’m interested in, similar. maybe they’re using the same hashtag as me, i.e. at the same event.)
- interestingness factor (hardest part to quantify. do they post funny photos? do a lot of people I follow follow them? are they witty? will I benefit from hearing what they had for breakfast?)
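The easily computable parts of that list could be turned into numbers along these lines. This is only a sketch: the profile is a plain dict whose keys (`has_userpic`, `location`, `followers`, `following`, `update_count`, `bio`, `recent_tweets`) are invented for illustration rather than taken from the Twitter API, and language detection and the interestingness factor are left out because they need far more than a few lines.

```python
# Cities from the list above that make a follow more likely.
HOME_CITIES = {"edinburgh", "cambridge", "london"}


def extract_features(profile, my_interests):
    """Map a profile dict (hypothetical keys) to a dict of numeric features.

    `my_interests` is a set of lowercase words describing what I care about.
    """
    tweets = profile["recent_tweets"]
    n = len(tweets) or 1  # avoid dividing by zero for empty accounts
    location = profile["location"].lower()
    bio_words = set(profile["bio"].lower().split())
    return {
        "has_userpic": 1.0 if profile["has_userpic"] else 0.0,
        "local": 1.0 if any(city in location for city in HOME_CITIES) else 0.0,
        # Followbots tend to follow far more people than follow them back.
        "follower_ratio": profile["followers"] / max(profile["following"], 1),
        "update_count": float(profile["update_count"]),
        # Share of recent tweets that are @replies (active participant).
        "reply_fraction": sum(t.startswith("@") for t in tweets) / n,
        # Share of recent tweets carrying a link (possible blog bot).
        "link_fraction": sum("http://" in t or "https://" in t for t in tweets) / n,
        # Crude subject match: bio words shared with my interests.
        "bio_overlap": float(len(bio_words & my_interests)),
    }
```

Matching the bio on whole words is very crude; proper term extraction would do better, but it shows the shape of the feature vector.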
Of course, I don’t run through all these factors every time I see a new profile. Unfollowing someone has a low penalty (they might stop following you, oh no!) so it’s very easy to just go “no” at the obvious bots/promoters and “yes” to anyone who seems remotely human and Twitter-savvy, then unfollow them later. But I reckon it would be pretty easy to train up a classifier to decide whether people were worth following, and (a la MrTweet) use shared interests (hello NLP) and friend/follow networks to recommend new people. In fact one of my Google Apps sandbox projects is working on this but it’s not a commercial venture, so I feel quite happy to ramble about it in the hope people will tell me I’m dead wrong and how to fix it.
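To make "train up a classifier" concrete, here is a minimal perceptron in plain Python. It is a stand-in picked for brevity, not a claim about what any real recommender (MrTweet included) uses, and the feature names it would be fed are hypothetical. Each training example is a dict of numeric features plus my own yes/no following decision.

```python
def score(weights, bias, features):
    """Weighted sum of the features; a positive score means 'follow'."""
    return bias + sum(weights.get(name, 0.0) * value
                      for name, value in features.items())


def train_perceptron(examples, epochs=20):
    """Learn weights from (feature_dict, followed) pairs.

    `followed` is True if I chose to follow the account. The weights are
    only nudged when the current guess disagrees with my decision.
    """
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, followed in examples:
            if (score(weights, bias, features) > 0) != followed:
                step = 1.0 if followed else -1.0
                for name, value in features.items():
                    weights[name] = weights.get(name, 0.0) + step * value
                bias += step
    return weights, bias


def worth_following(weights, bias, features):
    """Apply the learned weights to a new account's features."""
    return score(weights, bias, features) > 0
```

A perceptron only learns a straight-line boundary, so it will never capture "unless I like the blog" style exceptions, but it is enough to see whether the features carry any signal before reaching for something heavier.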
Another thought occurs to me when I think about term extraction, frequency and classifying Twitterers into subject buckets. Given the total sum of knowledge available to a casual observer (my update history, my network and extended network, my network’s update history), can I use simple clustering techniques to segregate my follow cloud into distinct groups for easier update browsing? I think with some gentle nudging, I could do so for my own network, but establishing an algorithm that could do so for anyone might be more difficult. Perhaps it needs a little Facebook or Friendfeed integration to pull out some more information (I have some seemingly isolated Twitterfriends from university who I guess it’d be hard to cluster at first pass). The question from a commercial point of view is, of course, would anyone use it and would they pay? Generally, as we saw with the magic pixies above, people trust their own judgement more than a computer’s. And yet I think it’d be quite fun to play around with clustering, visualisations and the magic of the Tweetcloud. Especially if we could change the parameters at will to cluster people by similarity (these guys all post loads of links, these all have conversations with each other) as well as overlap in topic/geography/network. Is anyone doing this? Someone must be, surely.
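One of the simplest clustering techniques that fits here is single-link grouping on term overlap: represent each account by the set of terms extracted from its updates, and put two accounts in the same group whenever their Jaccard similarity clears a threshold. The sketch below assumes the term sets already exist; the extraction step, the account names and the default threshold are all placeholders.

```python
def jaccard(a, b):
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_accounts(term_sets, threshold=0.3):
    """Single-link clustering of accounts by term overlap.

    `term_sets` maps an account name to a set of terms. Accounts whose
    similarity is at least `threshold` are merged, transitively, using a
    small union-find structure.
    """
    names = list(term_sets)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group's root, shortening the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if jaccard(term_sets[first], term_sets[second]) >= threshold:
                parent[find(second)] = find(first)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())
```

Swapping the term sets for something else (sets of people each account @replies, or sets of linked domains) gives the "change the parameters at will" behaviour without touching the clustering itself. Isolated accounts simply come back as groups of one.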
If not, I will.
[Aside: No image, since for some godforsaken reason T-Mobile's mobile broadband service blocks Flickr and then asks for a credit card number over an unsecured connection.]
[Update: Image from dawn_perry]
