Future of Anime Hint - Anime Referral

egg · Post by **egg** » Mon Jan 16, 2006 5:58 am

[Edit: See AniDB Anime Referral Sample]

Basically I am stuck on what to do with the hint. From my own usage I would say the current system is a combination of surprisingly accurate suggestions and spectacular failures. The things that have been at the top of my personal hints have either been things that I have really liked or things that I have particularly disliked. Although the work I have done has improved the hint, it still is lacking in some of its predictions.

My question is, how should it be improved?

There are a number of options:

Add more logic to the comparisons to try to improve accuracy. There are already a number of suggestions listed in [Anime Hint] - Feature Requests. The trouble with these, though, is that a meaningful equation needs to be created to accommodate them AND I believe they will have only a minimal impact on the results (based on the work I have already done).
Create a new algorithm using different logic or assumptions for finding similar users. I had one user who PM’ed me that presented an algorithm for me to review (very thorough 14-Page PDF document). It has been over a decade since I have done math at this level (differential calculus) and I never dug into it deep enough to figure out if it would work or not. Another user PM’ed me and was interested in trying to apply machine learning to improve the results. There are probably other ideas as well. The trouble with these, though, is that they would take greater resources, and until they are implemented it is hard to determine whether they would help. I have already tried to do something like this with the Pearson’s logic, but it made only marginal improvements.
Find out what other sites are doing and use their logic as a basis for finding the hint. Along those lines, I found this interesting article: Amazon Recommendations. If anyone else has some information about how it is done in other places, I'd be interested in seeing it.

I believe that a major issue with the existing method is that it depends on finding similar users and telling you what they liked. This method will frequently find people who have watched the most popular animes and have often voted highly for them, so popular animes frequently show up in hint results because enough similar users have voted for them. Lesser-known animes have a harder time getting recommended because they don’t have as many votes. I believe the effects of this have been reduced since I started working on the hint, but the issue is still there. I think that as long as there is a system that depends on the votes of similar users, there are always going to be skewed results.

Another issue is that people have different rating scales, so one user may vote 5, 5, 6 and another 8, 8, 9 for the same animes. That shows a trend: both users like the last anime slightly more than the other two. But it is difficult for the system to know whether the users liked the animes about the same and just use a different scale, or whether the second user really liked the animes better than the first one did. I addressed this somewhat with the Pearson’s algorithm, but it is still a challenge.
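
To make the scale problem concrete, here is a minimal Python sketch (not the hint's actual code) that computes Pearson's correlation for the two example users. The helper name `pearson` is made up for illustration. Because the correlation subtracts each user's own mean, the 5, 5, 6 voter and the 8, 8, 9 voter come out perfectly correlated, which is why it cannot tell "same taste, different scale" apart from "liked everything more".

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length vote lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Deviations from each user's own mean remove the personal rating scale.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx)) * sqrt(sum(b * b for b in dy))
    if den == 0:
        return 0.0  # a user who gives every anime the same vote shows no trend
    return num / den

user1 = [5, 5, 6]
user2 = [8, 8, 9]
print(round(pearson(user1, user2), 6))  # 1.0
```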

If we implemented an anime-to-anime filtering system, that would address these issues. First, let me explain how I would envision it working. It would compare each anime to every other anime, and use the votes the users had made on those two anime to come up with a score of how “similar” the anime are. Using my previous example: user1 votes 5 for A, 5 for B, and 6 for C; user2 votes 8 for A, 8 for B, and 9 for C. The system would first compare A to B and find user1 voted 5 for both and user2 voted 8 for both; these animes are similar because each user gave them the same vote, so let's say it assigns a value of 100 for a perfect match. Then the system would compare A to C and find user1 voted 5 and 6, user2 voted 8 and 9; the users gave similar ratings, but not as close as for A and B, so it assigns a similarity score of 90. It would continue doing this through all of the anime. This IS very intensive processing, but it can be done offline and stored.
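
The pairwise comparison could be sketched roughly as follows. This is only an illustration: no scoring formula is given here, so the rule below (100 minus the scaled average gap between each user's own two votes) and the names `similarity`, `votes` and `MAX_GAP` are assumptions. On the toy data it gives 100 for A and B and about 89 for A and C, close to the 100 and 90 used in the example.

```python
from itertools import combinations

# votes[user][anime] = vote on the 1-10 scale (toy data from the example)
votes = {
    "user1": {"A": 5, "B": 5, "C": 6},
    "user2": {"A": 8, "B": 8, "C": 9},
}

MAX_GAP = 9  # largest possible gap between two votes on a 1-10 scale

def similarity(anime_x, anime_y, votes):
    """Assumed scoring rule: 100 minus the scaled mean gap between each
    user's own two votes. Returns None when no user voted on both."""
    gaps = [abs(v[anime_x] - v[anime_y])
            for v in votes.values()
            if anime_x in v and anime_y in v]
    if not gaps:
        return None
    return 100 * (1 - (sum(gaps) / len(gaps)) / MAX_GAP)

animes = sorted({a for v in votes.values() for a in v})
# The O(n^2) pass over all pairs; this is the part that would run offline.
table = {pair: similarity(*pair, votes) for pair in combinations(animes, 2)}
for pair, score in table.items():
    if score is not None:
        print(pair, round(score, 1))
```

Each gap is taken between one user's own two votes, so user1's 5s are never compared against user2's 8s.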

Once a similarity value has been determined for all anime, doing the hint would be very easy: you would go through the anime the user rated highly, find the anime that are similar to those, and output the results. This would make the actual hint much less CPU intensive (and scalable, as demonstrated by Amazon).
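
A minimal sketch of that lookup step, assuming the similarity table has already been computed offline. The table contents, the `hint` function, and the `min_vote` and `limit` thresholds are all hypothetical; the point is only that the online work reduces to a few dictionary lookups.

```python
# Hypothetical precomputed table: similar[anime] = [(other_anime, score), ...]
similar = {
    "A": [("B", 100), ("C", 90)],
    "D": [("C", 95), ("E", 80)],
}

def hint(user_votes, similar, min_vote=8, limit=10):
    """Rank unseen anime by their best similarity to anything the user
    voted min_vote or higher (thresholds are illustrative assumptions)."""
    best = {}
    for anime, vote in user_votes.items():
        if vote < min_vote:
            continue
        for other, score in similar.get(anime, []):
            if other in user_votes:
                continue  # already voted on, nothing to recommend
            best[other] = max(best.get(other, 0), score)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:limit]

print(hint({"A": 9, "D": 8, "B": 4}, similar))
# [('C', 95), ('E', 80)]
```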

This method would also reduce the recommendations of popular animes that show up just because they were watched by similar users. Since it would not care what people's votes were, just whether or not they were similar to a vote on another anime (people voting 1 on two different animes is a similarity), it will not be swayed by such votes. It will go through the anime the user has voted highly for and show the user what is similar to those.

The difference in voting scale (one person voting 5 while another votes 8 for the same anime even if they liked it the same amount) is completely negated because the system does not care what the vote is, just whether the votes are similar or not. Since we would only be comparing a user’s votes to their own votes, they will always be on the same scale.

This also has the benefit of having the capability of now listing similar anime within AniDB, and it would also make the hint useful even for users with relatively few votes.

The biggest drawback of this approach is the amount of processing that needs to be done to find the similar anime. But this processing could potentially be done completely offline (a dump of the voting data could be made, the processing is done on a separate machine, and the results are uploaded back).

Anyway, I would appreciate any feedback. I didn’t know what I wanted when I started writing this, but after working through it, I like the Amazon model. Exp may not want to deal with the processing necessary, so it may not even be an option. I would like to hear any ideas that people have about the hint.

AnimeOtaku · Post by **AnimeOtaku** » Mon Jan 16, 2006 6:46 am

In my opinion you should let it be as it is atm. The results I'm getting now are very good.
The only thing I want is to be able to filter by source (cd, hdd, ...).

Post by **exp** » Tue Jan 17, 2006 6:08 am

I am not very convinced about this approach. Never mind the cpu time required for the offline processing, that can be handled.
What is bothering me is the very concept itself.

How do two animes become similar just because I voted 5 for both of them? What your algorithm would calculate is not similarity of animes, but rather how many ppl liked the anime as much as the other one.
The problem I see there is that this might work for the high votes (8,9,10) but for low votes I don't think you will actually get very useful results out of this.
But this should be fairly easy to test. You could just write up a simple O(n^2) script which does the calculation for similarity and then look at the similarity between animes. I might well be mistaken.

However, I do believe that similarity between animes would be a good additional base for the anime hint. But we might need to come up with a better way to calculate similarity. I.e. we could try to look at the categories added for animes to find animes with similar themes/genres.
We might also go the animeplanet way and have ppl manually add the similar animes.

So in the end I think the current system, although it has its drawbacks, shouldn't be removed entirely but rather extended by making use of more data sources (similar animes, the users' genre preferences, ...).

BYe!
EXP

egg · Post by **egg** » Tue Jan 17, 2006 7:21 am

exp wrote:I am not very convinced about this approach.

I haven't tried to convince you yet, I was just putting the idea out there to get some feedback. If it had merit, THEN I would try to convince you.

But apparently people don't have much to say at the moment.

You are right though, it would not necessarily find similar animes (as far as content or genres). It is more like, people who liked Y also liked Z. So if you voted 9 for Y and people who voted for both Y and Z had similar votes for them (either high or low votes), then you would probably like Z.

I am trying to set up the DB on my Windows machine so that I can run some tests to see how it works (my Linux box crashed a while ago because of H/W issues).

Post by **fahrenheit** » Tue Jan 17, 2006 1:07 pm

I think that you didn't get many comments because the anime hint issue is a complex one. People use it, but the math behind it is a bit complex, so people leave it to those who really know what they are doing; in this case, you, egg.

I would like to see improvements in the anime hint, maybe something like the Amazon recommendation style. If that is done, I would go for showing its suggestions on the mylist page or the main page, a bit like what is done on Amazon: "one or two animes that we think you would like".

Anyway, you have my vote of support.

egg · Post by **egg** » Wed Jan 18, 2006 7:46 pm

I used to think I knew what I was doing. After reading much more about the subject, I'm not sure anymore.

OK, I ran an example. I took Fullmetal Alchemist and tried to find what was similar. I took the votes for users who are allowed to use the AnimeHint (i.e. at least 30 votes) who had voted for FMA and then tallied all of the other animes they voted for. For each one of those animes then I calculated a relative score. I kept the results for those animes that had at least 10 users that had voted for both anime. Note this is based on data from August.

Here are the top 10. I think the best way to word this is, People who voted on this liked:
aid=1218, score=0.985208762404044, # of votes=35
aid=132, score=0.978077439919998, # of votes=202
aid=5, score=0.976402681676406, # of votes=178
aid=26, score=0.973955921844615, # of votes=262
aid=2052, score=0.972426470588235, # of votes=11
aid=84, score=0.972057813243143, # of votes=235
aid=959, score=0.971839449809286, # of votes=371
aid=1868, score=0.971657862657456, # of votes=90
aid=191, score=0.970553511343736, # of votes=311
aid=1, score=0.968710287763689, # of votes=226
aid=23, score=0.967776851145377, # of votes=352

Many of these still seem like a list of the top anime. But, I guess it makes sense, the people who liked this mostly also liked those, that's why they are at the top...
[Edit: It was artificially inflating popular anime, I think I adjusted it, but if you saw the list right after I set it, it is now changed...]
[Edit: OK, shortened the list since it is now obsolete, for better results, look at: AniDB Similar Anime Sample]

fredall · Post by **fredall** » Thu Jan 19, 2006 9:55 pm

It seems to me like the approach suggested by egg might give a necessary condition for similarity but not a sufficient condition. So if the score is low, then the two titles can reliably be claimed not to be similar but if the score is high this may be because of other reasons and doesn't necessarily imply similarity.

Perhaps a combination of the Anime Planet approach and the suggested approach would work? That is, users are encouraged to suggest similar titles, the suggestions are ranked using the new algorithm, and if the score is high enough the suggestion is added as a similar title, ranked according to its score if there have been several accepted suggestions.

egg · Post by **egg** » Thu Jan 19, 2006 10:39 pm

fredall wrote:It seems to me like the approach suggested by egg might give a necessary condition for similarity but not a sufficient condition. So if the score is low, then the two titles can reliably be claimed not to be similar but if the score is high this may be because of other reasons and doesn't necessarily imply similarity.

Perhaps a combination of the Anime Planet approach and the suggested approach would work? That is, users are encouraged to suggest similar titles, the suggestions are ranked using the new algorithm, and if the score is high enough the suggestion is added as a similar title, ranked according to its score if there have been several accepted suggestions.

This is not a measure of similarity. It is a measure of how similarly people voted on the animes, the idea being, if people vote the same on the two animes, then if you liked this anime you would probably like the other one. As far as similarity goes, I could run the same logic through the categories and it would work the same way; I wouldn't even need to change the logic, just the initial tables for the query.

This is meant to be an automated system, and is in no way meant to replace the system on Anime Planet. This is just using data that is already there to find patterns that may not have been noticed before.

BTW, I have done some refinement to the algorithm. I included votes from ALL users, not just the ones with 30 votes. Also, I subtracted 5.5 from the votes, so they are now treated as being from -4.5 to 4.5 instead of 1 to 10; this makes differences in the votes more noticeable. The results appear to be getting better (and in some cases surprising). Once I resolve some issues on my machine (I am having problems with the perl DB libraries) I will post some updates.
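
To illustrate why shifting the votes helps, here is a small sketch. The thread does not state the actual scoring formula, so cosine similarity is used purely as an assumed stand-in, and the vote lists are invented. With raw 1 to 10 votes every number is positive, so even two anime that users disagree on get a positive score; after subtracting 5.5 a "liked" vote and a "disliked" vote have opposite signs, and the same disagreement produces a negative score.

```python
from math import sqrt

def cosine(xs, ys):
    """Cosine similarity of two equal-length vote lists (assumed stand-in
    for the score; the actual formula is not given in the thread)."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = sqrt(sum(x * x for x in xs)) * sqrt(sum(y * y for y in ys))
    return dot / norm if norm else 0.0

def centered(vs, midpoint=5.5):
    # Shift 1..10 votes to -4.5..4.5 so "liked" and "disliked" differ in sign.
    return [v - midpoint for v in vs]

# Invented votes by the same four users on two anime; they disagree every time.
anime_x = [9, 8, 9, 3]
anime_y = [4, 3, 2, 9]

print(round(cosine(anime_x, anime_y), 3))                      # positive
print(round(cosine(centered(anime_x), centered(anime_y)), 3))  # negative
```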

Post by **pelican** » Thu Jan 19, 2006 10:49 pm

The basic (original) anime hint idea is not broken, but the algorithms designed to measure the similarity of users' tastes are.

The faults of the basic numerical system used in the first anime hint version are obvious, but a good correlation coefficient should produce better results than it currently does.

Shortly after the implementation of this alternate method, there was some discussion on #anidb about the problems with said implementation, but unfortunately I cannot recall the details. I would guess that the problem is not compensating for the size of the intersection of votes whose correlation is used to determine the compatibility of those users.

One pair of votes is not an event, two pairs is not a very significant one, three pairs is slightly more significant, etc. The size of the sample determines the confidence you should have in the result and that has to be applied to the calculation for the method to be useful.
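
One common way to apply that idea is to scale the correlation by a confidence factor that grows with the number of shared votes. The sketch below is only an illustration of the principle: the function name `weighted_score` and the cutoff of 50 shared votes are arbitrary assumptions, not anything used by the hint.

```python
def weighted_score(raw_score, n_common, full_confidence_at=50):
    """Scale a correlation by how many vote pairs it rests on.
    The cutoff of 50 is an arbitrary illustrative choice."""
    confidence = min(n_common, full_confidence_at) / full_confidence_at
    return raw_score * confidence

# The same raw correlation of 0.9 counts for little when it rests on few pairs.
for n in (1, 2, 3, 10, 50, 200):
    print(n, round(weighted_score(0.9, n), 3))
```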

Guest · Post by **Guest** » Sat Jan 21, 2006 4:49 pm

egg wrote:This is not a measure of similarity. It is a measure of how similarly people voted on the animes, the idea being, if people vote the same on the two animes, then if you liked this anime you would probably like the other one. As far as similarity goes, I could run the same logic through the categories and it would work the same way; I wouldn't even need to change the logic, just the initial tables for the query.

This is meant to be an automated system, and is in no way meant to replace the system on Anime Planet. This is just using data that is already there to find patterns that may not have been noticed before.

No, I didn't mean that I thought it should replace the Anime Planet system. I meant that if we wanted to have a system for listing similar anime within AniDB, as you suggested in the first post of this thread, then a combination of this system and the Anime Planet system could perhaps be an approach worthy of consideration.

egg · Post by **egg** » Sun Jan 29, 2006 6:43 am

pelican wrote:The basic (original) anime hint idea is not broken, but the algorithms designed to measure the similarity of users' tastes are.

The basic idea is not broken, but the method is limited because it depends on the most prolific users’ votes. This tends to lead to false negatives and false positives. The false positives are recommendations made to the user that the user really is not interested in. These occur because the prolific users have frequently watched many of the same animes, and if the animes are highly rated, they have generally given them fairly high votes. This means that the users that are common with you have a high probability of recommending those common animes even if they do not really match your preferences.

False negatives are recommendations that ARE NOT made to the user: animes that get filtered out because not enough of the users SIMILAR to you have voted for them. This means there may be a number of things out there that you would like but just don't know about. There are many animes that I have heard about, watched and liked that never showed up or had low scores in the hint.

Although I have implemented various things that have helped reduce these issues, my experience has shown me that unless things are done in a drastically different way, these issues will remain prevalent. That does not make the existing system obsolete (I never intended to remove the old one), but I didn't see the point of putting much work into the existing system for only minimal gains.

How does the new system address these? First of all, it looks at all votes in the system, not just those of similar users; this gives a larger basis for finding those animes that are not as common. The number of users that voted for both animes in a comparison is important, which is why I list it as well, but if you only list the animes that have a LOT of users in common, then you end up missing a lot of possibilities. I did use a minimum of 10 common users for the scores so that they have some relevance, but the score is not increased if the number of users increases. As for the false positives, there will probably still be some: since we are dealing with people's votes, and other people aren't exactly like you, there will always be false positives. Hopefully, since fewer animes will be missed or filtered out, the false positives will be farther down the lists.
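
The minimum of 10 common users acts as a gate rather than a weight, which could be sketched like this. The data layout `votes[user][anime]` and the helper names are assumptions for illustration only.

```python
MIN_COMMON = 10  # the minimum number of common users mentioned above

def common_voters(votes, anime_x, anime_y):
    """Users who voted on both anime; votes[user][anime] = vote."""
    return [u for u, v in votes.items() if anime_x in v and anime_y in v]

def keep_pair(votes, anime_x, anime_y, min_common=MIN_COMMON):
    # The count only gates the pair; it does not raise or lower the score.
    return len(common_voters(votes, anime_x, anime_y)) >= min_common

# Hypothetical data: 12 users voted on both X and Y, only 3 on both X and Z.
votes = {f"u{i}": {"X": 8, "Y": 7} for i in range(12)}
for i in range(3):
    votes[f"u{i}"]["Z"] = 9

print(keep_pair(votes, "X", "Y"))  # True
print(keep_pair(votes, "X", "Z"))  # False
```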

Anyway, enough rambling. I have a sample system for people to see the "similar animes". Go to AniDB Anime Referral Sample, put in an aid and then you can see the results. PLEASE let me know what you think.

DonGato · Post by **DonGato** » Sun Jan 29, 2006 9:46 am

I think it sucks. Sorry to say it that bluntly.

aid: 749 [Devilman Lady]

Rank  aid     Name                                                       Score  # of Votes
1     109     Mahoromatic - Automatic Maiden (Mahoromatic)               948    12
2     1726    DearS                                                      927    11
3     834     Green Green (2003)                                         909    11
4     252     Mobile Suit Gundam SEED (Kidou Senshi Gundam SEED)         900    10
5     74      Bluer than Indigo (Ai yori Aoshi)                          900    16
6     113     She ~ the ultimate weapon ~ (Saishuuheiki Kanojo)          897    11
7     27      Oh My Goddess! (AA! Megami-sama)                           894    15
8     170     Angelic Layer (Kidou Tenshi Angelic Layer)                 891    10
9     105     R.O.D - Read or Die                                        889    13
10    12      Chobits                                                    886    18
11    39      Slayers                                                    880    10
12    1052    Chrono Crusade (Chrno Crusade)                             880    11
13    894     R.O.D -THE TV-                                             877    11
14    409     Tenchi Universe (Tenchi Muyo!)                             875    10
15    52      Love Hina Spring Special                                   873    12
16    1327    My Days With Midori (Midori no Hibi)                       869    11
17    76      Jubei-chan the Ninja Girl: Secret of the Lovely Eyepatch   866    10
18    178     VanDread the Second Stage                                  865    12
19    96      Witch Hunter Robin                                         861    15
20    30      X                                                          857    10
21    751     Please Twins (Onegai Twins)                                855    17
22    1057    Maburaho                                                   851    12
23    93      I My Me! Strawberry Eggs (I My Me! Strawberry Egg)         850    10
24    10      Bastard!! (Bastard!! Ankoku no Hakai Kami)                 848    10
25    630     Stellvia of the Universe (Uchuu no Stellvia)               846    10
26    36      Iketeru Futari                                             841    10
27    177     VanDread                                                   838    12
28    890     Ghost in the Shell 2: INNOCENCE (INNOCENCE)                838    12
29    16      Please Teacher (Onegai Teacher)                            836    20
30    1015    Lunar Legend Tsukihime (Shingetsutan Tsukihime)            836    19
31    49      Love Hina Again                                            829    14
32    1431    Kono Minikuku mo Utsukushii Sekai                          823    10
33    53      Trigun                                                     820    14
34    2400    GANTZ                                                      816    10
35    271     Charcoal Feather Federation (Haibane Renmei)               814    12
36    251     Kiddy Grade                                                810    11
37    1       Crest of the Stars (Seikai no Monshou)                     806    14
38    332     Grave of the Fireflies (Hotaru no Haka)                    798    11
39    51      Love Hina Christmas Special                                795    14
40    206     Ah! My Goddess - The Movie (AA! Megami-sama - The Movie)   794    15
41    275     Nausicaa of the Valley of the Wind                         791    11
42    896     The Eternity You Wish For (Kimi ga Nozomu Eien)            791    15
43    65      Puni Puni Poemy (Puni Puni Poemi)                          788    11
44    5       Banner of the Stars II (Seikai no Senki 2)                 786    13
45    4       Banner of the Stars (Seikai no Senki)                      784    14
46    427     Macross Zero                                               782    11
47    26      The Twelve Kingdoms (Juuni Kokuki)                         779    14
48    25      Quack Experimental Anime Excel Saga (Excel Saga)           778    12
49    90      Samurai X (Rurouni Kenshin)                                771    11
50    99      Serial Experiments Lain                                    770    11
51    67      Macross Plus                                               768    10
52    112     Spirited Away (Sen to Chihiro no Kamikakushi)              751    12

Maybe you better rename the feature... opposite animes? non-related animes?

visnu · Post by **visnu** » Sun Jan 29, 2006 10:43 am

Egg wrote:I had one user who PM’ed me that presented an algorithm for me to review (very thorough 14-Page PDF document). It has been over a decade since I have done math at this level (differential calculus) and I never dug into it deep enough to figure out if it would work or not.

Why not instead suggest that the user make a (pseudo) SQL implementation of the algorithm? Then you could try it out on your sample database.

Andemon · Post by **Andemon** » Sun Jan 29, 2006 2:17 pm

aid 1544: Elfen Lied

1 778 Mobile Police Patlabor (1989) (Kidou Keisatsu Patlabor (1989)) 919 10
2 387 Boys Before Flowers (Hana Yori Dango) 905 37
3 1667 Gekijouban Air 889 15
4 2651 Kino`s Journey Movie (Kino no Tabi ~the Beautiful World~ -life goes on-) 879 16
5 728 Waga Seishun no Arcadia 878 15
6 688 Ping Pong Club (Ike! Ina-chuu Takkyuubu) 877 20
7 427 Macross Zero 869 333
8 366 Shin Taketori Monogatari ~ Sen-Nen Joou 868 14
9 284 New Kimagure Orange Road - Summer`s Beginning (Shin Kimagure Orange Road - Soshite, Ano Natsu no Hajimari) 861 61
10 377 Those Obnoxious Aliens (Urusei Yatsura) 859 36
...

Quite... I have to agree that it doesn't seem to work too well at the moment.

egg · Post by **egg** » Sun Jan 29, 2006 6:27 pm

DonGato wrote:I think it sucks. Sorry to say it that bluntly.

aid: 749 [Devilman Lady]
Rank  aid     Name                                                       Score  # of Votes
1     109     Mahoromatic - Automatic Maiden (Mahoromatic)               948    12
Maybe you better rename the feature... opposite animes? non-related animes?

It's OK, be blunt. I would rather have bluntness than no feedback.

First of all, I misused the word Similar, I have added this disclaimer:
This does NOT measure how similar the anime are in content, but it measures how similarly people voted for the two animes. If everyone who votes for both of the anime gives the exact same votes for each that would be a score of 1000.

That would mean that the people who have voted for both Devilman Lady and Mahoromatic had similar votes. Let's look at the votes, the left column is votes for Mahoromatic and the right column is Devilman Lady, each row is a different user.

 vote1 | vote2
-------+-------
   700 |   800
   700 |   900
   900 |   900
  1000 |  1000
   900 |   900
   700 |   700
   700 |   700
   700 |   800
   900 |   800
   800 |   700
   800 |   700
   700 |   600

Those are fairly similar votes, which appears to mean that people who liked one liked the other. Based on this, I think that if you voted highly for one of these anime, it would be reasonable to recommend the other one.
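
For anyone who wants to check how close those twelve vote pairs are, here is a quick Python summary of the table. It does not try to reproduce the 948 score, since the formula behind it is not given; it only measures the gaps. The votes appear to be stored on a 100 to 1000 scale, which is an assumption read off the table.

```python
# (Mahoromatic vote, Devilman Lady vote) per user, copied from the table
pairs = [
    (700, 800), (700, 900), (900, 900), (1000, 1000),
    (900, 900), (700, 700), (700, 700), (700, 800),
    (900, 800), (800, 700), (800, 700), (700, 600),
]

gaps = [abs(a - b) for a, b in pairs]
print(len(pairs))                        # 12 users
print(sum(g == 0 for g in gaps))         # 5 identical votes
print(max(gaps))                         # 200
print(round(sum(gaps) / len(gaps), 1))   # 66.7
```

So the average gap is less than one point on the 1 to 10 scale, and no user differed by more than two points.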

Now that I have explained the purpose behind the similarity score, then is that a little less sucky? Or do you think the recommendation would be off base?