
Bayesian Estimate for Top10

Posted: Tue Aug 10, 2004 7:34 pm
by Enforcer
imdb.com uses the following formula to ensure that the titles in its Top 250 are the ones most users consider good, not ones where a single user voted 10.
imdb.com wrote:
weighted rank (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where:
R = average for the movie (mean) = (Rating)
v = number of votes for the movie = (votes)
m = minimum votes required to be listed in the Top 250 (currently 1250)
C = the mean vote across the whole report (currently 6.8)
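For reference, the quoted formula fits in a few lines of Python. This is a minimal sketch: the function name `weighted_rank` and the two example titles are hypothetical, and the defaults m = 1250 and C = 6.8 are simply the IMDb figures quoted above. AniDB would need its own threshold and site-wide mean.

```python
def weighted_rank(R: float, v: int, m: int = 1250, C: float = 6.8) -> float:
    """Weighted rank as quoted from IMDb.

    R: mean rating of the title
    v: number of votes for the title
    m: minimum votes required to be listed (IMDb figure: 1250)
    C: mean vote across the whole report (IMDb figure: 6.8)
    """
    # With few votes the result stays close to C; as v grows it approaches R.
    return (v / (v + m)) * R + (m / (v + m)) * C


# Hypothetical titles: (name, mean rating, number of votes)
titles = [
    ("niche favourite", 9.8, 5),
    ("popular classic", 8.5, 2000),
]

ranked = sorted(titles, key=lambda t: weighted_rank(t[1], t[2]), reverse=True)
for name, R, v in ranked:
    print(f"{name}: {weighted_rank(R, v):.2f}")
# prints:
# popular classic: 7.85
# niche favourite: 6.81
```

The 9.8 average backed by only 5 votes is pulled almost all the way down to C. This shows both the intended effect (a lone 10 can't dominate the list) and the first disadvantage listed below (good but rarely rated titles sink).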

Advantages:
- a better estimate of the top 10 anime
- the number of votes determines how much weight a title's own average receives

Disadvantages:
- some very good anime drop out of the list because only a few people voted for them
- more load on the DB server

I think this would be more interesting than a top 10 that "only" takes the average vote into account.

So long,
Enforcer

Posted: Tue Aug 10, 2004 8:01 pm
by imokie
IMDb also only considers votes from regular voters. Perhaps that would be a good idea for AniDB too. It would eliminate the votes from some of the 'fanboys' who only vote once or twice to give their favourite anime a 10.

Posted: Wed Aug 11, 2004 2:22 am
by Elberet
imokie's suggestion seems more practicable, but just thinking about the SQL queries gives me a headache. :|

Posted: Wed Aug 11, 2004 8:10 pm
by egg
Elberet wrote:
imokie's suggestion seems more practicable, but just thinking about the SQL queries gives me a headache. :|
It probably wouldn't be that bad... As for the load, it probably wouldn't have a big impact because I think the script is run once every 24h.

Posted: Fri Aug 13, 2004 12:32 pm
by Elberet
No, unfortunately it is quite an expensive query to run at runtime. To determine the rating for an anime, you have to:
- find all votes for that anime,
- for each vote, find the user who cast it,
- for each user, find all votes that user has made,
- count the user's votes,
- include the vote only if that user has cast more than n votes.
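The procedure above can be sketched in Python, assuming the votes are available as a hypothetical list of (user_id, anime_id, rating) tuples and that n is the minimum vote count described in the last step. One design choice worth noting: counting every user's votes once up front avoids repeating the per-user lookup for each individual vote.

```python
from collections import Counter, defaultdict
from statistics import mean


def regular_voter_means(votes, n):
    """Mean rating per anime, counting only votes from users with more than n votes.

    votes: iterable of (user_id, anime_id, rating) tuples (hypothetical layout)
    n: a vote counts only if its author has cast more than n votes in total
    """
    votes = list(votes)
    # Count how many votes each user has cast, once for all anime.
    votes_per_user = Counter(user for user, _, _ in votes)

    ratings = defaultdict(list)
    for user, anime, rating in votes:
        # Keep the vote only if its author is a "regular" voter.
        if votes_per_user[user] > n:
            ratings[anime].append(rating)

    # Anime with no qualifying votes are simply left out.
    return {anime: mean(rs) for anime, rs in ratings.items()}


votes = [
    ("alice", "A", 8), ("alice", "B", 7), ("alice", "C", 6),
    ("bob", "A", 6), ("bob", "B", 9), ("bob", "C", 7),
    ("fanboy", "A", 10),  # only one vote, so it is ignored when n = 2
]
result = regular_voter_means(votes, n=2)
```

Here `result` holds A → 7, B → 8 and C → 6.5; the single 10 from the one-time voter does not affect A.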

In SQL, you'll even have to use a subselect or temporary table to express this.
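As a rough illustration of the subselect approach, here is a sketch using Python's built-in sqlite3 module with an in-memory database. The `votes` table and its columns are hypothetical (AniDB's real schema isn't given in this thread), and n is passed as a named query parameter. The inner query finds users with more than n votes; the outer query averages only their votes per anime.

```python
import sqlite3

# Hypothetical schema: one row per vote.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (user_id INTEGER, anime_id INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [(1, 100, 8), (1, 101, 7), (1, 102, 6),
     (2, 100, 6), (2, 101, 9), (2, 102, 7),
     (3, 100, 10)],  # user 3 voted only once
)

QUERY = """
SELECT anime_id, AVG(rating) AS avg_rating, COUNT(*) AS num_votes
FROM votes
WHERE user_id IN (
    -- subselect: users who have cast more than :n votes
    SELECT user_id FROM votes GROUP BY user_id HAVING COUNT(*) > :n
)
GROUP BY anime_id
ORDER BY anime_id
"""

rows = conn.execute(QUERY, {"n": 2}).fetchall()
for anime_id, avg_rating, num_votes in rows:
    print(anime_id, avg_rating, num_votes)
```

The query also returns `num_votes`, which could serve as v if this filter were combined with the weighted rank formula from the first post.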