
The challenges of hot-or-not-ness

In between spending time playing the latest World of Warcraft expansion (ah, the joys of a sideline job in gaming), I’ve been working on a demo app, codenamed “hot or not”. Not a great codename, mind, since that’s exactly what it is…

However, it’s kind of interesting to see how many design challenges even something this simple throws up: assuming we have an algorithm that can take something and return a number representing popularity, how do we build an app around that? If our algorithm depends on input but we want to find an overall ‘top n’ list, do we need to resolve named entities or can we cheat a little with term frequency? What data sources do we use? Do others publish data we can use to bootstrap or otherwise kickstart the system? Should we restrict by topic (and thus invoke clustering) or use overall statistics? What about different terms that refer to the same thing (lexicon resolution)? The list goes on.
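To make the "cheat with term frequency" option concrete, here is a minimal sketch that skips named-entity resolution entirely and just counts words across a batch of documents. It assumes a toy stopword list and a hypothetical hand-written alias table (`ALIASES`) standing in for lexicon resolution. Neither is part of the app described here, and both would need to be far larger in practice.

```python
import re
from collections import Counter

# Hypothetical alias table standing in for lexicon resolution: it folds
# variant terms onto one canonical form. Illustrative only.
ALIASES = {"wow": "warcraft"}

# A tiny illustrative stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "is", "in", "to", "i", "it", "for"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_n(documents, n=3):
    """Rank terms by raw frequency across all documents, after stopword
    removal and alias folding. No entity recognition is attempted, so
    multi-word names are counted word by word."""
    counts = Counter()
    for doc in documents:
        for token in tokenize(doc):
            if token in STOPWORDS:
                continue
            counts[ALIASES.get(token, token)] += 1
    return counts.most_common(n)


docs = [
    "Playing WoW tonight",
    "The new Warcraft expansion is out",
    "Warcraft servers are busy",
]
print(top_n(docs, 1))  # [('warcraft', 3)]
```

The weakness shows up quickly: "World of Warcraft" would be split into three separate counts, which is exactly where entity resolution (or at least phrase detection) starts to earn its keep.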

I’m certainly beginning to see why natural language projects rarely make it out into the wild, though that’s hardly going to get in my way. It’s fascinating how many different toys and products you can build once you have a core algorithm primed and a language toolbox at the ready; the trouble is pinning everything down!

