Bayesian Estimate for Top10

Forum for discussing AniDB rules & standards. No small talk!

Moderator: AniDB

Locked
Enforcer
Posts: 2
Joined: Wed Nov 13, 2002 4:16 pm

Bayesian Estimate for Top10

Post by Enforcer »

imdb.com uses the following formula to ensure that the titles in its Top 250 are those that the majority of users consider good, not those to which a single user gave a 10.
imdb.com wrote:
weighted rank (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C

where:
R = average for the movie (mean) = (Rating)
v = number of votes for the movie = (votes)
m = minimum votes required to be listed in the Top 250 (currently 1250)
C = the mean vote across the whole report (currently 6.8)
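A minimal Python sketch of the quoted formula follows. The function name weighted_rank is illustrative only; it is not IMDb's or AniDB's code. Rewriting the formula as (v × R + m × C) ÷ (v + m) shows why it is called a Bayesian estimate: every title starts with m imaginary votes at the global mean C, and its own votes only pull it away from C once v becomes comparable to m. The values 1250 and 6.8 are IMDb's quoted figures. AniDB would have to pick its own m and compute its own C.

```python
def weighted_rank(R: float, v: int, m: int, C: float) -> float:
    """Weighted rank as quoted from imdb.com:
    WR = (v / (v + m)) * R + (m / (v + m)) * C
    R: mean vote for the title, v: its number of votes,
    m: vote threshold (acts like m 'pseudo-votes' at C), C: global mean vote.
    """
    if v < 0 or m < 0 or v + m == 0:
        raise ValueError("require v >= 0, m >= 0 and v + m > 0")
    return (v / (v + m)) * R + (m / (v + m)) * C


# A single vote of 10 barely moves the estimate away from C (about 6.803)...
print(round(weighted_rank(R=10.0, v=1, m=1250, C=6.8), 3))
# ...while 5000 votes averaging 9.0 pull it most of the way up (8.56).
print(round(weighted_rank(R=9.0, v=5000, m=1250, C=6.8), 2))
```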
Advantages:
- a better estimate of the top 10 anime
- the more votes a title has, the more weight its rating gets

Disadvantages:
- some very good anime drop out of the list because only a few people voted for them
- more load on the DB server
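Below is a sketch of how such a top list could be built, under several assumptions. The Title record and the top_n function are hypothetical. C is taken as the vote-weighted mean over all titles, and m doubles as the listing threshold, following IMDb's definition of m. The sample data is made up. The sketch shows both the advantage and the first disadvantage above: "Alpha", with three votes averaging 9.8, never appears, however good it may be.

```python
from dataclasses import dataclass


@dataclass
class Title:
    name: str
    mean: float  # R: average vote for this title
    votes: int   # v: number of votes for this title


def top_n(titles: list[Title], m: int, n: int = 10) -> list[tuple[str, float]]:
    """Rank titles by weighted rank; titles with fewer than m votes are not listed."""
    if m < 1:
        raise ValueError("m must be at least 1")
    total_votes = sum(t.votes for t in titles)
    if total_votes == 0:
        return []
    # C: mean vote across all titles, weighted by vote count (an assumption)
    C = sum(t.mean * t.votes for t in titles) / total_votes

    def wr(t: Title) -> float:
        return (t.votes / (t.votes + m)) * t.mean + (m / (t.votes + m)) * C

    eligible = [t for t in titles if t.votes >= m]
    ranked = sorted(eligible, key=wr, reverse=True)[:n]
    return [(t.name, round(wr(t), 2)) for t in ranked]


titles = [
    Title("Alpha", 9.8, 3),     # few enthusiastic votes: not listed (3 < m)
    Title("Beta", 8.9, 900),
    Title("Gamma", 8.5, 2000),
    Title("Delta", 7.0, 1500),
]
print(top_n(titles, m=500))
```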

I think this would be more interesting than a top 10 that takes "only" the average vote into account.

So long,
Enforcer
imokie
Posts: 26
Joined: Thu Jan 01, 2004 4:19 pm
Location: The Netherlands

Post by imokie »

IMDb also counts only votes from regular voters. Perhaps that would be a good idea for AniDB too: it would eliminate votes from some of the 'fanboys' who vote only once or twice, just to give their favourite anime a 10.
Elberet
Posts: 778
Joined: Sat Jul 19, 2003 8:14 pm

Post by Elberet »

imokie's suggestion seems more practicable, but just thinking about the SQL queries gives me a headache. :|
egg
Posts: 769
Joined: Tue Nov 11, 2003 7:17 am

Post by egg »

Elberet wrote:
imokie's suggestion seems more practicable, but just thinking about the SQL queries gives me a headache. :|
It probably wouldn't be that bad... As for the load, it probably wouldn't have a big impact, because I think the script runs only once every 24 hours.
Elberet
Posts: 778
Joined: Sat Jul 19, 2003 8:14 pm

Post by Elberet »

No, unfortunately it is quite an expensive query to run at runtime. To determine the rating for an anime, you have to:
- find all votes for that anime;
- for each vote, find the user who cast it;
- for each such user, find all votes that user has made;
- count that user's votes;
- include the vote only if the user has cast more than n votes.
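The steps above can be sketched in Python on an in-memory list of votes. The function name filtered_means and the (user, anime, rating) tuple layout are hypothetical, not AniDB's schema. Instead of looking up each voter's votes again for every vote, the sketch counts every user's votes once up front. That turns the repeated per-vote lookups into a single extra pass over the data.

```python
from collections import Counter, defaultdict


def filtered_means(votes, n):
    """votes: iterable of (user_id, anime_id, rating) tuples.
    Returns {anime_id: mean rating}, counting only votes from users
    who have cast more than n votes in total."""
    votes = list(votes)
    # Count each user's votes once instead of re-scanning per vote.
    per_user = Counter(user for user, _, _ in votes)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user, anime, rating in votes:
        if per_user[user] > n:  # "more than n" votes, as in the steps above
            sums[anime] += rating
            counts[anime] += 1
    return {anime: sums[anime] / counts[anime] for anime in sums}


votes = [
    ("u1", "x", 10),  # u1 votes only once
    ("u2", "x", 6), ("u2", "y", 7),
    ("u3", "x", 8), ("u3", "y", 9),
]
print(filtered_means(votes, n=1))  # u1's single vote is ignored
```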

In SQL, you'll even have to use a subselect or temporary table to express this.
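In SQL, the per-user vote count fits naturally into a subselect. Below is a sketch using Python's built-in sqlite3 module. The votes(uid, aid, rating) table is a hypothetical schema, not AniDB's actual one, and the threshold n is passed as a query parameter. On a real database, an index on votes(uid) would likely help the inner GROUP BY.

```python
import sqlite3

# Hypothetical schema: one row per vote (user id, anime id, rating).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (uid INTEGER, aid INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [(1, 100, 10), (2, 100, 6), (2, 200, 7), (3, 100, 8), (3, 200, 9)],
)

# The subselect finds "regular" voters: users with more than n votes in total.
QUERY = """
SELECT v.aid, AVG(v.rating) AS mean_rating, COUNT(*) AS n_votes
FROM votes AS v
JOIN (SELECT uid FROM votes GROUP BY uid HAVING COUNT(*) > ?) AS regular
  ON regular.uid = v.uid
GROUP BY v.aid
ORDER BY v.aid
"""

rows = conn.execute(QUERY, (1,)).fetchall()
print(rows)  # user 1 voted only once, so their 10 is not counted
conn.close()
```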