Link voting: real-time respect

Featured, Online 22 September 2009

By clickykbd on flickr

Sometimes life just moves too quickly, y’know?

This post over at RWW is surprisingly thought-provoking for all it’s sponsored. (Aside: What a strange grammatical construction.) I’m not really sure I trust or even believe their random numbers, but the concept of implicit vs explicit voting for sites and the interaction of realtime vs old-school search are both interesting.

Implicit voting

So, implicit voting is where you give a site a silent thumbs-up. The most common way of implicitly voting for a site is just to visit it; this actually works in two ways, the action of clicking to get to the site, and what you do once you’re there. Explicit voting, on the other hand, is where you actively promote the site — for example, by tweeting/retweeting a link to it, or linking to it yourself.

Where does submission to a social news site such as Digg or Hacker News fit in? Well, my first thought is that submitting is explicit voting, but simply voting up (I agree with the submitter that this site is interesting) is implicit. By this measure you could say that retweeting links falls somewhere between implicit and explicit: if you model Twitter as a kind of Digg, with retweets as ‘votes’, you can see the parallels. Is del.icio.us’ing a link implicit or explicit? Bookmarking locally? Linking in IRC?

Anyway, that’s a case of detail.

Tracking explicit voting is fairly easy: look around for mentions of the URL. OK, there’s some magic involved in de-obfuscating and unifying references, but that’s just techie icing. Once you know who’s mentioned the URL and when, you can do all sorts of computations to work out some kind of search ranking system. PageRank is just one approach, but there are modifications and things you can borrow from other search algorithms, especially HITS (one of my favourites!), that exploit the social graph as well. If you have more information (perhaps the entire tweet, or blog post, or whatever) you can even do language analysis and add that extra dimension of understanding on to the link. But fundamentally, you’re just looking at links.
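As a rough illustration of that "de-obfuscating and unifying" step, here is a minimal Python sketch. Everything in it is my own assumption rather than anything from the RWW post: the function names, the list of tracking parameters, and the decision to treat "www." and trailing slashes as noise. It also does not resolve shortened links (bit.ly and friends), which would need a network lookup.

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of tracking parameters to strip; extend as needed.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign"}

def canonicalise(url):
    """Reduce superficially different URLs to one comparable form."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    # Simplification: .hostname drops any port and user info.
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so order does not matter.
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS
    ))
    # The fragment (#...) is discarded.
    return urlunsplit((scheme, host, path, query, ""))

def count_mentions(mentions):
    """mentions: iterable of (who, url). Count distinct mentioners per URL."""
    seen = set()
    counts = Counter()
    for who, url in mentions:
        key = (who, canonicalise(url))
        if key not in seen:
            seen.add(key)
            counts[key[1]] += 1
    return counts
```

Feeding it a handful of (who, url) pairs gives a per-URL count of distinct mentioners, which is about the simplest explicit-vote tally you could feed into a ranking.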

It gets a lot more interesting when you try to work out an implicit measurement system. For votes that are click-throughs, there are ways to measure those, although not perfectly: bit.ly statistics, toolbar trackers, etc. For votes that are based within a site, you’re kind of stuck unless you’re a) the site owner or b) embedded in the user’s browser somewhere. The browser is the best place: there, you can measure if the user has it open in a tab for hours untouched, or if they keep flicking to and from it, etc, etc. But by the very nature of such things, you’re going to get a selective set of data. And what about the aforementioned pasting into IRC/IM/email? What about linking the fact that I spent thirty minutes on a site with the fact that I tweeted it and then wrote a blog post about it?

It all comes back to user lifestreams, and the fact that today’s communication is far too disjointed for these types of measurements. Which is a shame, I think. Somehow we must be able to combine the wisdom of the crowd with an individual’s self-knowledge: I know that all these sites belong to me, so I know it’s just me voting for the site with a fairly loud mouth. (Unifying the voter isn’t even a necessary step, but I feel it’s important, especially when you consider fun things such as recommendation algorithms and shill detection.)
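To make "unifying the voter" a bit more concrete, here is a small Python sketch. It assumes the hard part is already solved: that we somehow have a table mapping each account on each service to one person. The table, the account names and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from (service, account) to a single person.
IDENTITIES = {
    ("twitter", "@alice"): "alice",
    ("blog", "alice.example.org"): "alice",
    ("twitter", "@bob"): "bob",
}

def unique_voters(votes, identities=IDENTITIES):
    """votes: iterable of (service, account, url). Return url -> set of people."""
    voters = defaultdict(set)
    for service, account, url in votes:
        # Accounts we cannot link to anyone count as their own person.
        person = identities.get((service, account), f"{service}:{account}")
        voters[url].add(person)
    return dict(voters)
```

With this, a tweet and a blog post from the same person collapse into one vote for the URL, so one loud mouth stays one voice.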

Real-time information

Let’s assume we have some kind of implicit data about links as well as explicit. A key measurement axis we have is time – so we can spot voting spikes, clusters, etc. The long tail is an interesting quandary, though. Do people searching for a term want the most recent/trendy items, or the ones voted most authoritative over time? It depends on the user, and on the search. Even for a trending topic, a user might be searching for the background, not the latest happenings — so you have to offer both, surely, to satisfy user needs. At what point does a short term voting spike become part of a long-term vote? Would a smoothing function of time work?
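One candidate for that smoothing function is exponential decay, where each vote counts for less the older it gets. The Python sketch below is only an illustration of the idea; the function name, the half-life parameter and the choice of days as the time unit are my own assumptions.

```python
def decayed_score(vote_times, now, half_life_days=7.0):
    """Sum of votes, each weighted by 0.5 ** (age / half_life).

    vote_times and now are measured in days on the same clock.
    Votes timestamped after `now` are ignored.
    """
    return sum(
        0.5 ** ((now - t) / half_life_days)
        for t in vote_times
        if t <= now
    )
```

A short half-life favours whatever is spiking right now, while a long one favours items that have collected votes steadily over time. Computing both scores for the same query is one way to offer the "latest" and the "background" views side by side.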

There’s also the option to embed implicit voting within the search system itself, something like Google’s SearchWiki. If a site provided the information you wanted, you somehow give it a thumbs up. (Of course, users do this explicitly at the moment by tweeting links, though in my own behaviour, at least, that’s rarely connected to searching: I’m far more likely to tweet something I’ve browsed to or been sent.) This would provide a trackable form of implicit voting, but still nothing near perfect.

User behaviour could be a problem, of course; what would cause a user to vote up a site? Interestingness? Relevance? I vote things up on Hacker News because they’re interesting, but I’d vote things up on Google if they were relevant. In a way, the real-time, sporadic flurry of retweets is a measure of interestingness and timeliness; the time spent on a site is a measure of interestingness and usefulness; the bounce rate and whether it shows up in search results at all is a measure of relevance. What are we measuring? Until we know that, we can’t rank!

The peer-to-peer system proposed by the RWW article’s author, Faroo, is one way of doing things, but I’m somewhat sceptical. I don’t think it’s going to be possible to get quality implicit voting data in sufficiently representative quantities to do anything particularly accurate just yet, but as our habits and the way we search and browse change, it may become so.

Update: This TechCrunch post about the star rating distribution on YouTube — and, as a side link, this post about web reputation systems — are both interesting and vaguely related. Especially when you consider the proposed measure of implicit voting for YouTube videos: how many times you rewatch it, or whether you even finished watching it at all. (Is that accurate? If I watch a video for a few seconds, long enough to identify it as a decent version of the Black Knight scene so I can link it to a friend, does that mean I dislike it? Or are situations like that mere noise?)


Rebuttal: 6 Reasons Why Twitter Isn’t the Future of Search

Social Media 22 March 2009

Google | Yodal Anecdotal on flickr

I just read an interesting article on the RT wires about why Twitter’s the future of search. A statement that initially got a nod of the head, until I started thinking in a little more detail about the arguments. I think it’s really important here to actually talk a bit about what search is.

@Gyutae’s article seems to simply equate search with ‘finding information’, but there’s a slightly deeper dimension: you want to get all the information, or the most relevant information, or unbiased information, or…

Anyway, it’s not just about finding stuff, but about the quality and source of the stuff you find.

So, six reasons why Twitter isn’t the future of search:

Social isn’t representative

Asking Twitter for an opinion is all very well, but bear in mind you are getting the Twitterverse’s opinion, not everyone’s. Although Twitter is becoming more ‘mainstream’, you’re still looking at a certain type of user, in a somewhat self-selecting crowd.

If you’re after the best restaurants in New York, you’re likely to get a decent cross-section of Twitter replying, but if I’m looking for recommendations for a nail salon in Birmingham and nobody’s mentioned one on Twitter, I have to poll my own network. Which is great, if I have access to the sorts of people who would know. Otherwise I have to seek out a few likely people and @ them the question, wait for replies (if any), etc. A lot of work.

In short: Search queries that don’t match the Twitter userbase don’t get good answers.

Anti-information overload isn’t always informative

Sometimes you’re simply not searching for something that can be answered in 140 characters. Sure, Twitter encourages people to be concise with their information, but if I’m after a fairly detailed explanation of something – or a howto, or a tutorial – I won’t find that on Twitter. If someone’s tweeted a link with the appropriate text, I might find it, but Twitter just isn’t the platform to search for detail on.

Realtime makes overviews hard to find

Realtime search is great for realtime applications, such as finding out the exact response to an ad that just ran during the Super Bowl, or the latest football score. But if you want historical information as well as ‘the latest’, or an overview of an event rather than the blow-by-blow tweets, you get totally overloaded.

For example, digging through #sxsw tweets to find informative nuggets was just a nightmare. Realtime search definitely has its place, but it won’t ever be the only way we search.

It’s hard to pick out accuracy from the masses

This ties in a little with the first reason (‘Social isn’t representative’), in that ‘the masses’ is actually ‘the masses who use Twitter’. A level playing field is great, but the advantage of something like PageRank is that you do gain an idea of how respected, influential, popular, accurate, etc. a web page is: generally, people linking to it are giving it a silent ‘thumbs up’, pushing its PageRank higher.

That’s just not there on Twitter, and for various search tasks, you actually want that sort of ranking and relevance, rather than just a mass of voices all shouting at once.
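For anyone who hasn’t seen how that ‘thumbs up’ turns into a number, here is a minimal Python sketch of the textbook, simplified PageRank iteration. It is not Google’s production algorithm, and the function name, the damping value of 0.85 and the fixed iteration count are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict of page -> list of pages it links to. Returns page -> score."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over everyone.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank
```

In a tiny graph where pages a and b both link to c, and c links back to a, c ends up with the highest score: it has collected the most thumbs-up, and a benefits from being endorsed by a well-endorsed page.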

Direct contact with sources isn’t always the answer

If you had a question about what it’s like to be a comic, sending Stephen Fry an @ might get you a nice 140-character answer. But if you were doing biographical research, or wanted to ask any sort of question requiring a detailed answer, or actually have an in-depth conversation, you wouldn’t use Twitter search.

Leaving aside the fact that some Twitter celebrity accounts have been known to be fake, how much value can you really get from asking someone directly, compared to reading published information about them?

On the flipside, if you have a question that suits a very specific person – maybe not a celebrity, but how about an entrepreneurial mum from Wisconsin? – you can find that person from Twitter, whereas you’d be lost on a more conventional search engine (until you find WorkAtHomeMomsFromWisconsin.com, of course).

I’m not saying that this level of trust and source interaction is bad, but it’s not ‘the future of search’.

Location awareness is unreliable

Using just the location associated with a tweet and saying ‘every piece of content from this location is related to it’ is just plain silly. A lot of Twitter conversation is location-free, and the only real application of this is to resolve statements like ‘Back from Starbucks. Wow, what nice service!’ to mean ‘Starbucks in Edinburgh has nice service’ because Twitter knows I’m in Edinburgh. A lot of statements posted from a location talk about other locations, even.

Having some form of location-knowledge about a person is great, but it’s got to overcome some serious hurdles before it can accurately be used in search. However, it does make finding the aforementioned work at home mother easier, and location definitely is part of the future of the Web.

