nstgc wrote: The thing with the "subtract 5 instead of 5.5" is that the equation uses squares, and if your data is off balance it will cause problems. Not that it really matters.
The data is already off balance because people tend to cast more high votes than low votes overall, so I don't think this will have a major impact on the outcome.
nstgc wrote: What that last part was doing was just showing you that this cosine method is the same as the correlation method at heart.
I agree that they are similar.
nstgc wrote: The middle part shows that your results will be thrown off because you're taking the quadratic mean, which turns out to be influenced by the spread.
OK, I got that, but I'm not sure that would make the result any less valuable. Anyway, I don't think this is a major concern because if there is a large spread, then the cosine result will probably be fairly low anyway and that value will be filtered out.
nstgc wrote: As mentioned before, I hate making examples. I'll try to make an example, but I don't think I can. If you need a better explanation I'll do that. The point I'm making is that it will be off. To show that it will be off with an example you need to be able to compare it against something and that something doesn't exist. The problem can only be expressed properly in the manner that I have provided.
OK, how about I give you an example and you give me a method to resolve the issue.
Let us take the following scenario: there are 6 animes, a1-a6, and 3 users, u1-u3 (the number of users is not really important; I just chose three so that there are not that many users to deal with. If you need more, add more). All of the users vote exactly the same. We are trying to find the anime with the closest voting pattern to a1.
Code:
Anime  u1  u2  u3
-----  --  --  --
a1      8   8   8
a2     10  10  10
a3      9   9   9
a4      8   8   8
a5      7   7   7
a6      6   6   6
With the cosine, all of these animes would have a perfect score (as demonstrated in my previous example). But intuitively a4 should have a better score than the others because it is an EXACT match. I also believe that a3 and a5 should be close, and a2 and a6 a little farther away. With the current system, these would have the following values: a2 = 600, a3 = 750, a4 = 1000, a5 = 666, a6 = 333. If I hadn't subtracted 5 from the values first, it would not have been that big of a difference... I think that the current system needs to be changed, but the question is how...
nstgc wrote: Also, that first equation, was that basically how the system works? It doesn't really matter much now since as long as there is that ratio it's wrong, but I would like to make sure we're on the same page.
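To make those numbers easier to check, here is a rough Python sketch. It rests on an assumption: the listed values are consistent with multiplying the cosine by the ratio of the smaller to the larger vector magnitude (after subtracting 5) and scaling to 1000, so that is what `ratio_score` does. The function names are made up for this illustration and this is not the actual site code.

```python
import math

# Votes from the table above; every user votes the same for each anime.
votes = {
    "a1": [8, 8, 8],
    "a2": [10, 10, 10],
    "a3": [9, 9, 9],
    "a4": [8, 8, 8],
    "a5": [7, 7, 7],
    "a6": [6, 6, 6],
}

def norm(v):
    """Euclidean length of a vote vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    """Plain cosine similarity (fails if either vector is all zeros)."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def ratio_score(a, b, offset=5):
    """ASSUMED scoring: cosine times (smaller magnitude / larger magnitude),
    computed on votes shifted down by `offset` and scaled to 1000."""
    a = [x - offset for x in a]
    b = [x - offset for x in b]
    na, nb = norm(a), norm(b)
    return 1000 * cosine(a, b) * min(na, nb) / max(na, nb)

others = [name for name in votes if name != "a1"]
cosines = {n: cosine(votes["a1"], votes[n]) for n in others}
scores = {n: ratio_score(votes["a1"], votes[n]) for n in others}
raw_scores = {n: ratio_score(votes["a1"], votes[n], offset=0) for n in others}

for n in others:
    print(n, round(cosines[n], 3), round(scores[n], 1), round(raw_scores[n], 1))
```

Under that assumption the cosine column comes out as 1 for every anime, while the ratio column separates them; with `offset=0` the spread between the animes is much smaller, which matches the remark about not subtracting 5.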
Yes, my explanation should be the way it works.
I just had an idea. Another issue with the current system is that what it calculates from the votes does not quite fit what I am looking for at the moment. What I am calculating (ideally, barring errors with logic) is how similarly people voted between two animes. Although this is useful, it is not quite what I want. What I want to know is: if someone likes this anime, what other animes will they like? Well, with the current system, let's say a number of users vote 1s and 2s for two animes. Their voting is similar and would result in a high score, but that does not necessarily mean that someone who liked one of the animes would like the other one. This is something that I have been thinking about for a while. I had implemented one idea in a previous trial, but that turned out to cause other problems. So, I just had a thought. Since I want to know what people who like this anime will like, maybe I should just look at the users who like or recommend the anime (a vote of 7 or more). Lower votes for this anime would get filtered out. If someone votes 7 for this anime, but 1 for another anime, that vote will remain... So let's look at this set of votes:
Code:
Anime  u1  u2  u3  u4  u5  u6  u7  u8  u9
-----  --  --  --  --  --  --  --  --  --
a1      1   7   8   5   9   8   3   7   9
a2     10   6  10   5   9   6   1   5   8
When comparing a1 to a2, the following votes would be used:
Code:
Anime  u2  u3  u5  u6  u8  u9
-----  --  --  --  --  --  --
a1      7   8   9   8   7   9
a2      6  10   9   6   5   8
But when comparing a2 to a1, the following votes would be used:
Code:
Anime  u1  u3  u5  u9
-----  --  --  --  --
a1      1   8   9   9
a2     10  10   9   8
That would mean that a1's referral score for a2 would be different than a2's referral score for a1.
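A minimal Python sketch of that filter, in case it helps anyone follow along. The function name `votes_from_fans` and the `threshold` parameter are made up for this illustration, and it assumes every user has voted on both animes (missing votes are not handled).

```python
# Votes from the table above, in user order u1..u9.
users = [f"u{i}" for i in range(1, 10)]
votes = {
    "a1": [1, 7, 8, 5, 9, 8, 3, 7, 9],
    "a2": [10, 6, 10, 5, 9, 6, 1, 5, 8],
}

def votes_from_fans(source, target, threshold=7):
    """Keep only users who voted `threshold` or higher for `source`.

    Returns (user, source_vote, target_vote) tuples. The target vote is
    kept whatever its value, so a 7 for the source paired with a 1 for
    the target still counts.
    """
    return [
        (user, s, t)
        for user, s, t in zip(users, votes[source], votes[target])
        if s >= threshold
    ]

a1_to_a2 = votes_from_fans("a1", "a2")  # filtered on a1's votes
a2_to_a1 = votes_from_fans("a2", "a1")  # filtered on a2's votes

print([u for u, _, _ in a1_to_a2])
print([u for u, _, _ in a2_to_a1])
```

Because the filter only looks at the source anime's votes, the two directions select different users, which is where the asymmetry in the referral score comes from.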
Any thoughts? The other logic will not go away; it will probably be used for finding similar animes and/or averaged somehow with the new result. I will probably mock this up to see what happens. I know this will significantly reduce the data set, but if the other data is just noise, then that shouldn't be a problem.