Bayesian Rating - how to implement a weighted rating system

March 30th, 2006 in Basic Tutorials · By Markus Weichselbaum

Many web sites allow users to provide feedback on products, services or other users. In addition to verbal reviews, rating facilities are typically present that allow visitors to rate an item from 0 to 5 (often in conjuction with stars), from 0 to 10, or simply by voting + or -, respectively.

These visitor ratings are then often used to rank the rated items. And when “rank” comes into play, it gets tricky.

Ranking using Bayesian average

Hopefully the headline hasn’t turned you away yet – it smells of mathematical hardcore. But fear not, once you know how, implementing a robust rating and ranking system using the approach discussed here is really quite simple, very elegant, and most importantly, it works really well!

A basic example using simple + and - votes

In fact, the artworks in TheBroth gallery are visitor rated, using a rather simple + and - system. If you like an item, rate it “plus”. If you don’t like it, give it a “minus”.

The rating of an item would then be: number of positive votes divided by number of total votes. For example, 4 + votes and 1 - vote would correspond to a rating of 0.8, or 80%.

Now if you want to rank the items based on this simple equation, the following happens:

Assume you have on item with a rating of 0.93, based on 100s of votes. Now another new item is rated by a total of 2 visitors (or even just one), and they rate it +. Boom, it goes straight to #1 position in the ranking, as its rating is 100%!

A weighty issue

What we want is this:

If there is only few votes, then these votes should count less than when there are many votes and we can trust that this is how the public feels about it. In other texts this value is also refered to as “certainty” or “believability”.

This means, the more votes an item has, the higher the “weight” of these votes.

Thus, we want to calculate a corrected rating that somehow takes the weight of votes into account:

  • The more votes an item has, the closer the corrected rating value would be to the uncorrected rating value.
  • The less votes an item has – and this is the main trick here – the closer its rating should be to the average rating value of all items!

That way, new votes pull the corrected rating value away from the average rating, towards the uncorrected rating value.

There you have it – this is the main algorithm of what we call “Bayesian rating”, or rather “Bayesian ranking” as it is really about the relation of the item ratings to each other, based on the number of votes of each item.

Using a magic value

We now need to apply a “magic” value that determines how strong the weighting (or dampening, as some consider it) shall be. In other words, how many votes are required until the uncorrected value approximates the corrected value?

It really depends on how many votes the items get, in average. There is no point requiring 1000 votes for the item to rank 60% when each item only gets a handful of votes in average.

Thus, we could make this “magic” value exactly that, namely the average number of votes for all rated items, and voila, our Bayesian rating system is complete. By making the magic value dynamic, it will auto adapt to your system.

Finetuning the magic value

You could opt to create an upper limit to your magic value so that your doesn’t come to a grinding halt when there are many votes per item – an evergrowing magic value would make it less and less possible to actually influence the rating of a new item because it takes so many votes before you believe the rating of a new item.

The finetuning will depend on whether your system has a large influx of new items or not. If there are many new items added all the time, this influx will keep the average number of votes per item low. If your system has a fixed number of items, such as “rate your favorite star of The Beatles”, you may not need an upper limit. If you do add the occasional item, then an upper limit makes sense to give new items a chance to rate highly more quickly.

Bayesian rating for everyone

Now, lets summarize this all and provide a working formula for you to use in your code:

Bayesian Rating is using the Bayesian Average. This is a mathematical term that calculates a rating of an item based on the “believability” of the votes. The greater the certainty based on the number of votes, the more the Bayesian rating approximates the plain, unweighted rating. When there are very few votes, the bayesian rating of an item will be closer to the average rating of all items.

Use this equation:

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)

Legend:

  • avg_num_votes: The average number of votes of all items that have num_votes>0
  • avg_rating: The average rating of each item (again, of those that have num_votes>0)
  • this_num_votes: number of votes for this item
  • this_rating: the rating of this item

Note: avg_num_votes is used as the “magic” weight in this formula. The higher this value, the more votes it takes to influence the bayesian rating value.

How Bayesian Rating is used in TheBroth

We use it to show the “highest rated artworks” in order. We wanted to avoid that a new artwork with 1 vote immediately jumps to first place, as its rating would be 100%. Using Bayesian rating, its starting rating with one positive vote would be a little bit higher than the average rating of all items.

Resources

Share this:
  • del.icio.us
  • StumbleUpon
  • Furl
  • Netscape
  • Reddit
  • Technorati
  • YahooMyWeb

30 Responses to “Bayesian Rating - how to implement a weighted rating system”

ahcoldpizza3 Aug 06

has an upper limit been established yet? i’ve noticed that many excellent works by many people aren’t moving up at all because of the fact that other good but not excellent works have so many more votes because they’ve been on the site for so long.

yet at the same time, its still possible to catapult collaborations to top 20 just by having everyone who worked on it (say liek 10 people) all vote it +.

flipflop3 Aug 06

very well explained, and no doupt that it works. it also makes it nigh impossible to manipulate a mosaic to an undeserved high position.

Keep up the good work, U have a winner with is format

rawc30 Aug 06

Now this seems a very interesting use of Bayesian Theorem. I like the idea. I don’t know if its the most fair one but it certainly does not seem unfair to me..

Albeit it depends on being well-voted(voted by a lot of people) which is not always guarenteed, at this moment i cannot think of anything better.

Nice job.

hasan23 Nov 06

It is useful to describe exactly how to calculate the following variables, e.g SQL statements:

* avg_num_votes:
* avg_rating:
* this_num_votes:
* this_rating:

joe3 Feb 07

Very good explaination of weighted averages. But wonder about all the queries that would take place for a large number of object. For example i want to apply this logic to rate songs. But potentially there could be many thousands of songs. So every time a song is loaded, you’d run this calculation again monay thousands of other songs (everytime). What if votes where collected per song and then a cron could talley up the days events - run/reset the averages once or possible twice a day. You’d reduce the server load and queries by a large factor I image. Of course you’d give up instantaneous results. What is your take on this. Thanks

joe4 Feb 07

for hasan: here is query example, might need to be cleaned up a bit

// Bayesian Rating Calc
$theItem = $_GET[’id’];
if($theItem) {
// all items votes and ratings
$result = mysql_query(”SELECT AVG(item),AVG(vote) FROM itemvotes WHERE vote>’0′ GROUP BY item”) or die(mysql_error());
$row = mysql_fetch_row($result);
$avg_num_votes = $row[0];
$avg_rating = $row[1];

// this item votes and ratings
$result = mysql_query(”SELECT COUNT(item),AVG(vote) FROM itemvotes WHERE item=’$theItem’ AND vote>’0′”) or die(mysql_error());
$row2 = mysql_fetch_row($result);
$this_num_votes = $row2[0];
$this_rating = $row2[1];

if(!$row OR !$row2)
$br = “_”;
else
$br = number_format( ((($avg_num_votes * $avg_rating) + ($this_num_votes * $this_rating))/($avg_num_votes + $this_num_votes)), 1, ‘.’ );
} // end of if item selected

MorningStar5 Feb 07

Hi Joe, you’re quite right. Every single new rating, adding or deleting an item requires all rankings to be recalculated. For this reason, TheBroth just sets a flag if the ranking and rating values have to be recalculated, and recurring cronjob does the rating calculation every N seconds (if the “must_update” flag is set).

For other installations of Bayesian rating: Depending on the server load, capacity and amount of items to be ranked, this cronjob interval will require to be set accordingly to provide a good compromise between processing cost and data freshness.

Tim26 Apr 07

Joe - just wanted to let you know we implemented your algorithm on the kuler website with much success. We are using a mySql table to store avg ratings, total votes, and the Bayesian Ratings. Whenever a user’s rating is submitted or updated in a separate table, an aggregate record is inserted or updated in our Bayesian table. This process is handled via triggers in the database. We currently have almost 9000 items users can vote on, and we have not experienced any performance hits yet.

Ryan18 Jul 07

Markus, thank you so much for this great article. I have been struggling with a very similar issue and was told me Bayesian stats might help. I did a search and ended up here. Exactly what I needed!

Jitendra28 Oct 07

Hey, check out SezWho (http://www.sezwho.com)…We address the issue with weights by assigning a weight to a reputation based on reputation of the rater and then take the aggregate score to judge the quality of the content.

Check it out.

-Jitendra

seo forums and markerplace17 Dec 07

Well Thanks, such that atleast someone think about to implemention of weight rating system.

So Well you also give tutorial in the weight rating system. I need to clarify some queries about it

IntelliJ IDEA中文爱好者博客 » Blog Archive » Bayesian Rating在投票系统中的应用10 Jan 08

[…] 目前很多的网站都提供了投票系统,这个投票系统是和超女的大众团投票不一样,不纯以票数进行PK,你是以打分方式进行投票。通常文章类型的投票都是这样的,你看了一篇文章后,然后从1-5中选择你给的分值,最后网站要给出最优秀文章的排名。让我们看一个例子。CCTV每年都会进行年度人物评选,现在有十名候选人,这次不采用短信投票方式,而是公布在CCTV首页,当你选择某一候选人时,会提示你给其今年的表现打分(1-10)。最后我们要评选一位年度最风云的人物,是根据投票数?还是根据平均得分数?这里面可能需要一个数学的问题。得票数最高的人可能平均分很少,均分高的人可能得票数比较小,究竟如何处理这个情况?这里我们可以考量使用Bayesian Rating来处理,公式如下:br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)参数说明下:avg_num_votes: 平均得票数,这里为候选人的平均得票数,也就是总投票数/10;avg_rating::平均得分数,这里是候选人的平均得分数,也就是 总的分数/投票数this_num_votes:该候选人的得票数this_rating:该候选人的平均分br:贝叶斯得分(Bayesian Rating)在这个公式中,avg_num_votes 作为一个魔幻数,该值越高,那么够票数高的人可能贝叶斯得分也就越高。原文来自: http://www.thebroth.com/blog/118/bayesian-rating […]

Ben11 Jan 08

I’ve implemented your Bayesian weighted rating system on a website that I have built and it works like a charm - http://www.yolo.sg
Thanks for this article it helps heaps!

Bayesian Rating - how to implement a weighted rating system - Developer Blog @ TheBroth.com13 Jan 08

[…] rousejen wrote an interesting post today onHere’s a quick excerpt […]

Paul Harrison14 Jan 08

Ok, so we’re producing a Maximum Likelihood estimate of the parameters of a Gaussian distribution, with a conjugate prior.

Now, one can argue about priors until they invade your galaxy, but I would suggest that using avg_num_votes is a bit harsh. The actual weight given should probably be less. There are ways to estimate the correct weight… if you do this, what you are doing is Empirical Bayesian estimation. It’s a little more involved than what you’ve described here, but worthwhile.

Another neat thing a Bayesian approach gives you is your degree of uncertainty about a rating.

MorningStar14 Jan 08

Paul, you may as well speak in Klingon to me because then I’d understand just as much. :)

What I can say though, empirically, is that our algorithm so far as worked well for us because in our system, avg_num_votes is pretty much a constant.

Ilari Sani14 Jan 08

I think your SQL for item votes could be simplified a bit. The average of votes is their sum divided by the count. This is then multiplied by the count.

AVG(vote) * COUNT(item)

is really

SUM(vote) / COUNT(item) * COUNT(item)

which reduces into

SUM(vote)

You could turn that into a variable called this_sum_votes, and put that in place of this_num_votes * this_rating.

Patrick Moloney14 Jan 08

Thanks Markus,

Very interesting. Two questions:
1. Does it makes sense to apply the same algorithm to a scalar rating?
2. Isn’t the effect of the algorithm that all ratings are pulled towards the mean? So for example assume that we have 1,000 ratings with an average of 0.50/1.00 and one film/song with 100 ratings all or which have a perfect score of 1.00/1.00 that song’s rating is still pulled back to the mean. Does that make sense? This is a genuine question not a critique.

MorningStar14 Jan 08

Patrick,

1. Yes, absolutely.
2. Yes that is the case.

In my algorithm, if the avg_num_votes were 100 and there was one item with 100 votes of rating 1.0, then BR = 0.75. This affects all items equally though, and since the goal is not to get absolute rating for each item but rather relative ranking, the actual BR value doesn’t matter, only the relative BRs to each other so you can rank them.

As many folks have pointed out in comments here and also on reddit, it depends on the system you’re working with. If you have a system that has a steady influx of new items, yet the average number of votes doesn’t change significantly, then you can use avg_num_votes as the magic weight.

And as I’ve written above, for other systems it’s preferable to use a constant or provide an upper limit. In comments on reddit I’ve seen other good suggestions, such as using a function that takes into account, in simple terms, that “N x ‘certain enough’ ” is still pretty much certain enough, and so you could use magic values such as the squareroot of avg_num_votes etc.

links for 2008-01-15 « Talkabout14 Jan 08

[…] Bayesian Rating - how to implement a weighted rating system - Developer Blog @ TheBroth.com “how to implement a weighted rating system” (tags: bayesian rating rank howto how stats statistics viapopular voting) […]

Shannon E. Wells15 Jan 08

Hi Marcus, do you feel an error margin would be appropriate in a ratings system? If so, how would you suggest it be used?

According to the Wikipedia entry on error margins there are a couple of ways of calculating it for polls, but I’m not certain how or whether these would apply to a rating system.

links for 2008-01-19 (Leapfroglog)19 Jan 08

[…] Bayesian Rating - how to implement a weighted rating system - Developer Blog @ TheBroth.com Easy to understand explanation of Bayesian rating. (tags: ratings rankings systems bayes) […]

lucasjosh.com » Blog Archive » Links for 1/30/08 [my NetNewsWire tabs]31 Jan 08

[…] Bayesian Rating - how to implement a weighted rating system - Developer Blog @ TheBroth.com […]

Dan13 Mar 08

Great suggestion. I have some thoughts/questions about the historical relevance of ratings.

1) How would you handle situations where ratings of older things become less and less relevant, regardless of how many people voted for it or how high of a score it got?

For example, a brand new camera comes out and 100 people give it a 10, making it the top recommended camera. A year later the replacement model comes out. If the same 100 people rate it a 10, without any historical weighting, it would rank equally with the older camera.

We probably want the new camera to list higher in any listing, but simply sorting by age may not be appropriate since older cameras that rated highly may still be better than newer cameras that rated lower.

Would we want to keep track of a separate “obsolescence” score that somehow gets combined with the rating score?

2) Another thing to consider… I suspect (I could be wrong) that most people tend to rate things (at least in some categories) based on what else they see. This would indicate that there is probably a difference between when person A rates 1 item out of a total of 10 a year ago and person B rates 1 item out of a total of 100 today. It means that the person A did not consider 90 other items that appeared later. There would now be two ratings in the database for 100 items, but those ratings should probably not carry the same weight.

I think that when applied to art or other submissions, this would encourage fresh submissions by putting more relevance on new things.

Any thoughts?

Analytical Engine » All Reviews Are Not Created Equal26 Mar 08

[…] Weighting a product’s ratings based on the number of ratings received.  When a product has a small number of ratings, these ratings should count less than those for a product rated many times.  To achieve this, you can add a “magic value” into the algorithm that calculates a product’s average ratings.  This “magic value” brings products with few ratings closer to the average ratings of all products, then reduces its effect as more ratings are received, allowing established products’ ratings to float freely and more closely reflect the average of its ratings. […]

Kuutio » Tulosten pisteyttäminen8 Apr 08

[…] Tilastotieteilijä Andrew Gelman on kirjoittanut blogiinsa hyvän, tilastoteknisen intron aiheeseen. Ehkä hieman helpommin lähestyttäviä ovat The Brothin Bayesian Rating - how to implement a weighted rating system”> ja Life with Alacrityn Collective Choice: Rating Systems -artikkelit. […]

the protagonize blog » Blog Archive » Rating system woes9 Apr 08

[…]  I’d like to be very clear about this right off the bat: I will not remove or reject overly negative or positive ratings on anything. While the site is effectively a dictatorship (well, from a management and editorial perspective, at least), people have a right to their opinions, even if they’re extreme. So, there’s definitely no point in asking me to remove ratings. No one has as of yet (though I have received a couple of complaints about what’s been happening, as well as a ream of comments on my profile), but I want to make sure I’m heard loud and clear on that front. I don’t think that’s a real solution to the problem, anyhow. While I’m pleasantly surprised to see people watching the author rankings this closely, it worries me that people are getting so hung up on their ratings. As I’ve reiterated to various people in the last day, the more ratings we have in the system, the less these negative or positive ratings spikes will have an effect. We’re seeing a major impact from it on author rankings right now because several of our top-ranked authors don’t have a huge amount of ratings, so the influence of 5 overly negative or positive votes on their branches or chapters will be much more noticeable. What I want people to keep in mind is that these abnormalities usually iron themselves out over time; as frustrating as they may be in the short term, they will slowly be less and less important the more ratings people add to the system. Now, as many people may already be aware of, 5-star ratings systems such as the one we have on Protagonize are inherently flawed in a variety of ways. At the same time, they are a simple, clean rating mechanic, and they comprise what is likely the most common rating system on the web in the last decade. The flaws with 5-star systems have been discussed to death, but if you’d like an overview, Life With Alacrity is an excellent blog that discusses ratings systems in depth. There are several blog posts there that are worth a read on the subject: Using 5-Star Rating Systems, Collective Choice: Rating Systems and Collective Choice: Experimenting with Ratings. What it comes down to is that 5-star rating systems are in many cases the least of all evils; if you can define the rating values clearly, make it easy to rate content, and accrue a large sample of votes, they become more valuable. However, as the sample size decreases, and the clarity of the different rating values becomes less obvious, votes tend to bunch up at the higher end of the scale and aggregate ratings become less valuable. I’ve also seen several people request that authors no longer be able to rate their own branches and chapters. I tend to disagree with this in principle, but I think that the two solutions I offer below may soften the impact of someone going through and rating all of their own posts 5/5 stars, without removing the ability to rate your own content altogether. Note that many large, well-known existing systems allow you to rate your own material, from Digg, to Reddit, to Amazon. I have a variety of different ways at my disposal to attack this problem, the first being implementing a Bayesian weighted rating system that would work hand-in-hand with the existing ratings. What this essentially means is that items and authors with less ratings would be weighted back towards the average, and statistical anomalies like a batch of 1/5 votes would affect users less. Of course, at the same time, a batch of 5/5 votes would also affect the overall rating less. You see the point. When I make this change, you won’t see anything different on the front-end, but you may notice that average ratings become a little smoother overall. The other option available to me is to move away from a 5-star rating system entirely, and switch over to more of an item popularity system. There are several ways I could do this, but I’m leaning towards more of a Reddit-style system of +/neutral/- votes. There would likely no longer be an average rating on stories, but instead an aggregate count of votes. This would remove a bit of the ambiguity from the current system, and ratings would be a bit more meaningful overall. However, every system has its pitfalls. If we go this route, it may be necessary to institute some kind of karma system to force users to be a little more conscientious when voting. What this means is that you’d be given a certain number of karma points based on the length of your membership, updated regularly, and you’d have to spend those points wisely to rate content. I’d like to hear what our members have to say about this matter; feel free to make yourself heard here and I’ll use this as an informal poll of the community on the subject.  Posted in Collaborative Writing, Features, General, Site mechanics […]

Maxwell17 Apr 08

Joe, I see you say:

SELECT AVG(item),AVG(vote) FROM itemvotes WHERE vote>’0′ GROUP BY item”

however I don’t understand what you mean by AVG(item) ? Presumably item is the unique ID for each vote?

I think there is something I am missing but I don’t know how it’s possible to calculate “the average number of votes for all rated items” because surely it would just be the number of votes that have been made?

Thank you.

snake4 May 08

@Maxwell I think he meant SELECT item, AVG(vote) […]

yabbaYesterday

br = ( (avg_num_votes * avg_rating) + (this_num_votes * this_rating) ) / (avg_num_votes + this_num_votes)

Assuming that this_rating = POSITIVE_VOTES - NEGATIVE_VOTES, then:

1. What happens with this formula if the rating for the item is negative, i.e. there are more negative votes than positive votes? The bayesian rating might not always be negative.

2. What happens if the num of positive votes is the same as the num of negative votes (but non-zero)? I would expect the bayesian rating to be 0 too, but that’s always NOT the case!

Any feedback would be highly appreciated! ;)

Leave a comment