Another Real-World Example – Web Development


If you’ve used Reddit before, you know it’s a big list of links and you can just vote on those links up and down. The front page is whatever is the most popular. They can be voted up or voted down. And good links stay at the top of the page for a long time, and mediocre links may make an appearance but disappear after a short while. So this is a really cool feature of Reddit, and it’s actually not that hard to compute. What we do is we use a special index. I’ll show you how that works. So we’ve got a table–a link table– that looks something a lot like what we’ve been dealing with. It’s got an id and it’s got a number of up votes, a number of down votes and a date. And it also has this field called score. It’s a floating-point field. And this is the total score of the link. So you may have 10 ups, 1 down, and the score might be actually 25. And the way the Reddit hot algorithm works is there’s an index on the score field–it used to just be called the hot index– and it was really just on the score field. One approach to generating this front page might be to take all the links submitted in the last 24 hours, do some fancy math, sort them by how many up votes and down votes– the spread between these two votes is– and sort it that way. But that doesn’t really capture that hotness– how things rise and fall. So every time–like for example–you add 1 up vote, we actually increment the score by some other amount. And this other amount is computed through this hot function, and that hot function looks something like this. It’s a hot function, and it takes the number of ups, the number of downs– so it can compute a total score, which is ups minus downs– and the submission date of the link–this field. So what happens is, over time, the value of an up vote increases. So an up vote today might be worth 1 point, and that same up vote tomorrow, on the same link, will be worth 10 points. And what this causes is it causes our scores to, over time, be constantly increasing. So newer links always have higher scores than older links. So a link that’s one day old with lots of votes has the same popularity as a link that’s one minute old with just a few votes, and that’s what keeps our front page churning. And it’s really actually very simple, and it’s a very fast query we can do. We can just say effectively something like this. Select * from links order by score descending. And then we get the hottest links on Reddit. And this is a very simple and fast query, because all it does is really just bounce off the hot index. And it’s neat. You get this really cool effect for not a lot of cost. Now, to be fair, it took us a little while to figure this out, and we actually–the first person who read it, took the approach of taking all of the links from the last 24 hours and doing some math on them and sorting them in memory. But later one of our guys came up with this notion of a hot score, and we can just convert that into an index, then there we go. We’ve got a database that keeps our links always sorted in order of hotness whenever we need them. So I thought you might enjoy that. Let’s move along.

Add a Comment

Your email address will not be published. Required fields are marked *