Basically I am stuck on what to do with the hint. From my own usage I would say the current system is a combination of surprisingly accurate suggestions and spectacular failures. The things that have been at the top of my personal hints have either been things that I have really liked or things that I have particularly disliked. Although the work I have done has improved the hint, it still is lacking in some of its predictions.
My question is, how should it be improved?
There are a number of options:
- Add more logic to the comparisons to try to improve accuracy. There are already a number of suggestions listed in [Anime Hint] - Feature Requests. The trouble with these, though, is that a meaningful equation needs to be created to accommodate them AND I believe they will only have a minimal impact on the results (based on the work I have already done).
- Create a new algorithm using different logic or assumptions for finding similar users. One user PM’ed me an algorithm to review (a very thorough 14-page PDF document). It has been over a decade since I have done math at this level (differential calculus), and I never dug into it deeply enough to figure out whether it would work. Another user PM’ed me and was interested in trying to apply machine learning to improve the results. There are probably other ideas as well. The trouble with these, though, is that they would take greater resources, and until they are implemented it is hard to determine if they would help. I have already tried something like this with the Pearson’s logic, but it only made marginal improvements.
- Find out what other sites are doing and use their logic as a basis for finding the hint. Along those lines, I found this interesting article: Amazon Recommendations. If anyone else has information about how it is done in other places, I'd be interested in seeing it.
Another issue is that people have different rating scales, so one user may vote 5, 5, 6 and another 8, 8, 9 for the same animes. That shows a trend: both users like the last anime slightly more than the other two. But it is difficult for the system to know whether the users liked the animes about the same and just use a different scale, or whether the second user really liked the animes better than the first one did. I addressed this somewhat with the Pearson’s algorithm, but it is still a challenge.
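To make the scale problem concrete, here is a minimal Python sketch of a Pearson correlation over the three votes above. The function name `pearson` and the zero-variance fallback are my own choices for illustration, not the code the hint actually runs.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long vote lists (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A user who gave every anime the same vote has no trend to compare.
        # Returning 0.0 here is an assumption, not a standard convention.
        return 0.0
    return cov / sqrt(var_x * var_y)

user1 = [5, 5, 6]  # votes for the same three animes
user2 = [8, 8, 9]
print(pearson(user1, user2))  # very close to 1.0
```

The two users come out as (almost exactly) perfectly correlated, because Pearson only looks at how each vote deviates from that user's own average. That is why it helps with different scales, and also why it cannot tell whether the second user genuinely liked the animes more.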
If we implemented an anime-to-anime filtering system, that would address these issues. First, let me explain how I would envision it working. It would go through and compare each anime to every other anime, using the votes users had made on those two anime to come up with a score of how “similar” the anime are. Using my previous example: user1 votes 5 for A, 5 for B, and 6 for C; user2 votes 8 for A, 8 for B, and 9 for C. The system would first compare A to B and find user1 voted 5 for both and user2 voted 8 for both; these animes are similar because each user gave them the same vote, so let's say it assigns a value of 100 for a perfect match. Then the system would compare A to C and find user1 voted 5 and 6, and user2 voted 8 and 9; the users gave similar ratings, but not as close as for A and B, so it assigns a similarity score of 90. It would continue doing this through all of the anime. This IS very intensive processing, but it can be done offline and stored.
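Here is a rough Python sketch of that pairwise comparison. I have not settled on a real formula, so the scoring rule below (100 minus 10 times the average vote difference) is purely an assumption, picked because it reproduces the 100 and 90 from the example. The names `similarity` and `build_similarity_table` are hypothetical too.

```python
from itertools import combinations

# Votes from the example: user -> {anime: vote}
votes = {
    "user1": {"A": 5, "B": 5, "C": 6},
    "user2": {"A": 8, "B": 8, "C": 9},
}

def similarity(votes, anime_x, anime_y):
    """Score two anime from 0 to 100 using only users who voted on both."""
    diffs = [
        abs(user_votes[anime_x] - user_votes[anime_y])
        for user_votes in votes.values()
        if anime_x in user_votes and anime_y in user_votes
    ]
    if not diffs:
        return None  # no user voted on both, so nothing to compare
    mean_diff = sum(diffs) / len(diffs)
    # Assumed rule: lose 10 points per vote of average difference.
    return max(0.0, 100.0 - 10.0 * mean_diff)

def build_similarity_table(votes):
    """Compare every anime with every other anime (the expensive offline step)."""
    titles = sorted({anime for user_votes in votes.values() for anime in user_votes})
    table = {}
    for x, y in combinations(titles, 2):
        score = similarity(votes, x, y)
        if score is not None:
            table[(x, y)] = score
    return table

table = build_similarity_table(votes)
print(table[("A", "B")])  # 100.0
print(table[("A", "C")])  # 90.0
```

Each user's votes are only ever subtracted from that same user's votes, so user1's 5/6 and user2's 8/9 contribute the same difference of 1. The number of pairs grows with the square of the number of anime, which is where the heavy processing comes from.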
Once a similarity value has been determined for all anime, doing the hint would be very easy: you would go through the anime the user rated highly, find the anime that are similar to those, and output the results. This would make the actual hint much less CPU intensive (and scalable, as demonstrated by Amazon).
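A minimal Python sketch of that lookup step, assuming the similarity scores have already been computed and stored. The table contents, the anime "D", the `min_vote` threshold of 8, and the function name `hint` are all made up for illustration.

```python
# Precomputed similarity scores: anime -> {similar anime: score}.
# These numbers are invented sample data.
similar = {
    "A": {"B": 100.0, "C": 90.0},
    "B": {"A": 100.0, "C": 90.0},
    "C": {"A": 90.0, "B": 90.0, "D": 80.0},
}

def hint(user_votes, similar, min_vote=8, limit=10):
    """Suggest unseen anime similar to the ones the user voted highly."""
    scores = {}
    for anime, vote in user_votes.items():
        if vote < min_vote:
            continue  # only start from anime the user rated highly
        for other, score in similar.get(anime, {}).items():
            if other in user_votes:
                continue  # skip anime the user has already voted on
            # Keep the best score seen for each candidate.
            scores[other] = max(score, scores.get(other, 0.0))
    # Highest score first; ties broken by name so the order is stable.
    return sorted(scores, key=lambda a: (-scores[a], a))[:limit]

print(hint({"A": 9, "B": 4}, similar))  # ['C']
```

The per-user work is just a few dictionary lookups, so the cost depends on how many anime the user has voted on rather than on how many users exist.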
This method would also reduce recommendations of popular animes that show up just because they were watched by similar users. The system would not care what people's votes were, only whether they were similar to the vote on another anime (people voting 1 on two different animes is a similarity), so it will not be swayed by such votes. It will go through the anime the user has voted highly and show the user what is similar to those.
The problem of voting scale (one person voting 5 while another votes 8 for the same anime, even if they liked it the same amount) is completely negated, because the system does not care what the vote is, just whether the votes are similar or not. Since we would only be comparing a user’s votes to their own votes, they will always be on the same scale.
This also has the benefit of making it possible to list similar anime within AniDB, and it would make the hint useful even for users with relatively few votes.
The biggest drawback of this approach is the amount of processing that needs to be done to find the similar anime. But this processing could potentially be done completely offline (a dump of the voting data could be made, the processing is done on a separate machine, and the results are uploaded back).
Anyway, I would appreciate any feedback. I didn’t know what I wanted when I started writing this, but after working through it, I like the Amazon model. Exp may not want to deal with the processing necessary, so it may not even be an option. I would like to hear any ideas that people have about the hint.