Hi folks.
I finally resolved the problems around Love Hina by ADX. Because there are several versions of the LH episodes, and because people often enough download a whole series without watching it, stuff like this can happen.
What happened: we had four episodes. Twice there were two files that had the same filesize. One time, what was labeled Episode 20 was actually another version of Episode 15 (same filesize, different hash and therefore a different ed2k link).
Another time a person simply added only the CRC and MD5 for episode 12, because that file was already in the DB as episode 20.
I would therefore like to request two types of warnings for when a person adds a file to the DB that has the same filesize as a file that is already in the DB.
Warning number 1:
is displayed to the user who adds the file. He/she should be told that there is a file "Anime, Episode, file id, name" already in the database with the same filesize. If the user has additional questions, he/she should check the forums and ask there, file a creq, OR add the file nonetheless if it is for a different anime than "Anime" OR if it is for the same episode of "Anime" and simply a different version of the file.
(red letters somewhere in this warning).
It should only be possible to add the file if the user provides information that is _different_ in all hashes from that file _if_ he wants to add it to the _same_ anime. For instance, if a user wants to add another file to Love Hina that has the same filesize as a Love Hina file that is already in the DB, there should be a check that ALL hashes (ed2k, SHA1, MD5) are different.
(maybe)
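The check proposed for warning 1 could look roughly like the sketch below. This is only an illustration, not how AniDB works: the `FileEntry` record, the field names and `may_add` are made up for the example, and treating a missing hash as "cannot be shown to differ" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

HASH_FIELDS = ("ed2k", "sha1", "md5")

@dataclass(frozen=True)
class FileEntry:
    # Hypothetical record layout, not the real AniDB schema.
    anime: str
    episode: int
    size: int
    ed2k: Optional[str] = None
    sha1: Optional[str] = None
    md5: Optional[str] = None

def same_size_conflicts(new, db):
    """Existing entries of the same anime that have the same filesize."""
    return [f for f in db if f.anime == new.anime and f.size == new.size]

def may_add(new, db):
    """Allow the add only if ALL hashes differ from every same-size entry
    of the same anime. A missing hash counts as 'not proven different'."""
    for old in same_size_conflicts(new, db):
        for field in HASH_FIELDS:
            a, b = getattr(new, field), getattr(old, field)
            if a is None or b is None or a.lower() == b.lower():
                return False
    return True
```

Files of other animes never enter the comparison, so a same-size file for a different anime is always accepted.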
Warning 2:
(if the above is hard to implement)
to an admin. Somebody just added a file.
File 1 info
File 2 info
The admin can simply check whether the files are from different animes or whether it is the exact same episode of the exact same anime.
If someone added a file with the same filesize for two different episodes of the SAME anime, then alarm bells should ring.
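A minimal sketch of the "alarm bells" case for warning 2: group files by anime and filesize, and report only the groups that span more than one episode. The tuple layout and the function name are assumptions for the example. A hit only means "please look at this", since two different episodes can legitimately have the same size.

```python
from collections import defaultdict

def suspicious_groups(files):
    """files: iterable of (anime, episode, size, file_id) tuples
    (hypothetical layout). Returns {(anime, size): [(episode, file_id), ...]}
    for every same-anime, same-size group that covers several episodes."""
    groups = defaultdict(list)
    for anime, episode, size, fid in files:
        groups[(anime, size)].append((episode, fid))

    alarms = {}
    for key, members in groups.items():
        episodes = {ep for ep, _ in members}
        if len(episodes) > 1:  # same size, same anime, different episodes
            alarms[key] = sorted(members)
    return alarms
```

Same-size files in different animes, and several versions of the same episode, are not reported.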
It took me almost five hours to find out what the problems with the ADX Love Hina files 12, 17, 19 and 24 were. I do not want to go through this again, and I guess nobody else wants to either.
Warning when adding file w size of file aready inDB [DENIED]
Yeah, things like this should be avoided, but...
- Two files can have the same filesize, but belong to entirely different animes. I would really not be surprised if that's the case already, with several 10k files in the DB and the majority of them between 150 and 200 MB.
- For the reason above, the chance of false warnings is too high to make that feature worthwhile. Sooner or later, it could become nearly impossible to add new, perfectly good files without triggering the warning.
- Don't forget that providing CRC, MD5 and SHA1 hashes is optional. For your suggestion to work, these would have to be calculated and added for all files that are already in the DB, and providing these hashes would be required for adding any new files. Needless to say, that's a lot of work.
- The admin who receives the second warning cannot check the contents of the file, since all he sees is the anime, episode number and name, size and ed2k hash. In order to tell if the file is in fact only a broken copy of another file, he'd have to download the two files and view them.
Elberet wrote: - Two files can have the same filesize, but belong to entirely different animes. I would really not be surprised if that's the case already, with several 10k files in the DB and the majority of them between 150 and 200 MB.
There is a certain (small, but increasing) chance for that, but if the size check were restricted to the one anime one wants to add the file to, the number of unnecessary warnings should become negligible, although they would still occur (e.g.: Hyper Police/aod, ep 5 and 7).
I like the idea anyway, it's just a warning after all

About comparing some/all hashes other than ed2k:
I think that's already too much action and should anyway (one day) be solved by not allowing lame files at all.
But: How did the user end up adding ep 12 with "only" md5 and CRC? - Probably because he(/she/it) tried it with the ed2k-link first and then got the "file with same hash+size already exists"-error. Since adding that file as ep 12 was totally right, adding it without the ed2k-link is understandable.
I think it'd be enough to make that error-message more user-friendly:
- Include the info of the file that matches the hash/size (to make it possible/encourage requesting a change for the file with the wrong hash)
- Put a link to the creq-forum and/or
- Let the user send some kind of notification to the mods
(to at least tell that there's something wrong)
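Something like the following is what I mean by a friendlier message. It is only a sketch: the dict keys, the wording and the function name are invented for the example, and no real AniDB page or URL is implied.

```python
def duplicate_error(existing):
    """Build a 'file already exists' message that names the clashing entry.
    `existing` is a hypothetical dict with anime, episode, fid and name."""
    return "\n".join([
        "A file with the same hash and size already exists:",
        f"  {existing['anime']}, episode {existing['episode']}, "
        f"file id {existing['fid']}, {existing['name']}",
        "If that entry is listed under the wrong episode, please file a creq",
        "or post in the creq forum so the mods can correct it.",
    ])
```

With the clashing entry shown, the user can see that the other file is the wrong one instead of dropping the ed2k link to get around the error.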
Elberet wrote:- The admin who receives the second warning [...]'d have to download the two files and view them.

wahaha wrote: But: How did the user end up adding ep 12 with "only" md5 and CRC? - Probably because he(/she/it) tried it with the ed2k-link first and then got the "file with same hash+size already exists"-error. Since adding that file as ep 12 was totally right, adding it without the ed2k-link is understandable.
That's my guess too, and look how long it takes me to tell the admins what the problem is and how to resolve it. I filed creqs for the fids a week ago and was told they are now on hold. Then I deleted them and posted a thread in the DBcreq forum, and so far, half a day later, one of the three files has been changed.
I guess there is no way to prevent it. Maybe the DB should require at least one entry of the same category? Like if the already present file in the DB only has an ed2k link then the user would have to add an ed2k link as well which is different? I mean it shouldn't be a big problem to get an ed2k link, right? That user was able to get the MD5 and CRC but unable to get the ed2k right? The early alpha of anidb-o-matic can get you that, you won't even have to share the file in edonkey or emule to get the ed2k link.
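The "same category" idea could be sketched like this: compare only the hash types that both entries actually carry, and ask for more hashes when there is no overlap. The dict layout and the three result strings are assumptions made for the example.

```python
def compare_hashes(new_hashes, old_hashes):
    """Compare two files by the hash types both of them provide.
    Both arguments are dicts such as {"ed2k": "...", "md5": "..."};
    missing or empty values mean the hash was not supplied."""
    shared = [k for k, v in old_hashes.items() if v and new_hashes.get(k)]
    if not shared:
        # e.g. the DB entry only has ed2k, the new file only CRC and MD5
        return "need-more-hashes"
    if any(new_hashes[k].lower() == old_hashes[k].lower() for k in shared):
        return "same-file"
    return "different-file"
```

In the Love Hina case the existing entry had an ed2k link and the new one only CRC and MD5, so the result would be "need-more-hashes" rather than a silent add.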
On a sidenote:
I doubt that a user in the future will be willing to go through this just to get the filesizes of a HKDVDrip right (that's what the ADX release of Love Hina is). With that, the accuracy of AniDB gets lower and lower, and in the end we simply have a DB with a high percentage of wrong entries. Multiple entries per episode are not a problem in themselves, since there can be multiple versions of a file on the network and they should all be in the list. But if it leads to different versions of one episode being entered as different episodes, I think it is a problem.
And don't get me started on the low-quality Love Hina files. There is so much wrong about these files it hurts. People simply assumed that the files are from a group and added it to the DB. I really ask myself whether I want to file another 50 creqs to get all those right in the DB as well.
wahaha wrote: I think it'd be enough to make that error-message more user-friendly:
- Include the info of the file that matches the hash/size (to make it possible/encourage requesting a change for the file with the wrong hash)
- Put a link to the creq-forum and/or
- Let the user send some kind of notification to the mods
(to at least tell that there's something wrong)
I agree.
Elberet wrote: - The admin who receives the second warning [...]'d have to download the two files and view them.
Yeah, you are right, I think that is what was on exp's and everyone else's mind when they got my creqs for the LH files. I mean, I really sat here and watched all 25 files again to check that I really have the right episodes.
Skywalka wrote: I guess there is no way to prevent it. Maybe the DB should require at least one entry of the same category? Like if the already present file in the DB only has an ed2k link then the user would have to add an ed2k link as well which is different?
Disallowing lame files should do this already... Poll anyone? ^^
Skywalka wrote: I mean it shouldn't be a big problem to get an ed2k link, right? That user was able to get the MD5 and CRC but unable to get the ed2k right?
No, I think the user had the ed2k-link, but couldn't add it due to the error-message (and having no idea how to correct the other file's wrong ed2k-link).
Skywalka wrote: I mean it shouldn't be a big problem to get an ed2k link, right? That user was able to get the MD5 and CRC but unable to get the ed2k right?
wahaha wrote: No, I think the user had the ed2k-link, but couldn't add it due to the error-message (and having no idea how to correct the other file's wrong ed2k-link).
i think so too. the question is whether forcing the user to enter an ed2k link wouldn't lead to faked links, i.e. the user changes just one char or types in some random letters/numbers. i think that would hurt anidb way more than some files without an ed2k link.
BYe!
EXP
Elberet wrote: This has happened before. I posted about it in the change req forum, but the thread was abandoned.
i've read it, but was too lazy to do anything back then

nevertheless, i fear that this problem would become more common once we force users to add a (faked) ed2k link on each file add.
BYe!
EXP
I think that once adding files via a client is working (and probably easier to use than the webinterface + various utilities), adding fake hashes (out of stubbornness) should become less of a problem.
Moreover, if the number of fake entries increased, the client (= adding the file to one's mylist with a full set of hashes) could help double-check new entries by building up a list of "trusted" (automatically reviewed) files... but I don't think that this would really be necessary as long as no one starts to actively fake entries.
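A rough sketch of how such a "trusted" list could be built from client reports. Everything here is hypothetical: the report layout, the function name, and the rule that two independent users must report the same full hash set.

```python
from collections import defaultdict

def trusted_files(reports, min_users=2):
    """reports: iterable of (user, file_id, hashes), where hashes is an
    (ed2k, sha1, md5) tuple as computed by a client (assumed layout).
    A file id counts as trusted once `min_users` different users have
    reported the same complete hash set for it."""
    seen = defaultdict(set)
    for user, fid, hashes in reports:
        if all(hashes):  # ignore reports with a missing hash
            seen[(fid, hashes)].add(user)
    return {fid for (fid, _), users in seen.items() if len(users) >= min_users}
```

A typed-in fake ed2k link would never be confirmed by a client that actually hashed the file, so such an entry would simply stay untrusted.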