File add sanity check [DONE]

Skywalka
Posts: 889
Joined: Tue Sep 16, 2003 7:57 pm

Post by Skywalka »

Hi.

I want the following:

Every file that is added should be crosschecked with all the other files from that episode.

If there is another file with the same filesize that is CRC ok/checked/verified then the person who tries to add this new entry with only the ed2k link and no further information should get a very loud ship horn audio file playing, a red flashing background on the next page and the little sentence "HAVE YOU CHECKED IF THIS FILE IS CORRUPT?!?" in <H1>.

Ok you can forget everything after the "...other files from that episode" but please pretty please help me out here. I can't stand it anymore. I don't demand that those users be prohibited from actually adding the files but they need to be reminded _strongly_ that they should re-think adding unnecessary files to the DB.

Thx.
wahaha
AniDB Staff
Posts: 1497
Joined: Sun Nov 17, 2002 3:33 pm

Post by wahaha »

That's a good idea, IMO. (Well, the second part ^^)
Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Post by Rar »

Personally I don't like saying I have a file I don't, as this leads to giving out misleading information about a file's availability to others. Also in many cases, the correct crc is not necessarily clear. A few examples from files I've added:

http://anidb.ath.cx/perl-bin/animedb.pl ... 5#eid_1255
When I ran my copy of Perfect Blue through AOM, it came up with no matches. Turns out there is a file of the right size in AniDB, but strangely not matching the edk-link of the original Gowenna file of Sharereactor. I added a duplicate file, with correct hash info, and marked it crc correct.

A more complicated one:
http://anidb.ath.cx/perl-bin/animedb.pl ... xpandall=1
Ran all thirteen episodes through AOM, four did not have matching entries. Looked at the file list: the file sizes obviously match the a.f.k. files, but eps 2 and 3 have no 'crc correct', and my crc is different. Somewhat hesitant here, as over a hundred people list the other file, but as in pre-AOM days people would only check that the filesize is the same (but not the hash) before adding files, I added my versions. Then realised the other two, eps 5 and 6, did have 'correct crc', so it seems likely it's just my versions that are wrong. Thought 'Skywalka will probably kill me for adding redundant files' - but hey, I'd already done it.

Anyway, back to the main point, if we are going to encourage users not to add files that are duplicates but different hashes, I would require a 'have this file but different hash' option, so I'm not giving out misleading information. Possibly where official crcs are not clear, we would want to have a list of the different hashes too. Oh dear, sounds like major db restructuring to me, do at the same time as shadow files maybe. :)

Rar
Skywalka
Posts: 889
Joined: Tue Sep 16, 2003 7:57 pm

Post by Skywalka »

Rar wrote:Personally I don't like saying I have a file I don't
You lost me here already.

"I have a file I don't" ?

Do you mean "I don't have a file"? I guess that would make sense, so I assume you meant that :-)
Rar wrote: , as this leads to giving out misleading information about a file's availability to others. Also in many cases, the correct crc is not necessarily clear.
Sure, I can understand that. That's what AniDB is there for, and I guess me wanting a warning message for _possibly_ duplicate files is not against this, don't you think?
Rar wrote: A few examples from files I've added:

http://anidb.ath.cx/perl-bin/animedb.pl ... 5#eid_1255
When I ran my copy of Perfect Blue through AOM, it came up with no matches. Turns out there is a file of the right size in AniDB, but strangely not matching the edk-link of the original Gowenna file of Sharereactor. I added a duplicate file, with correct hash info, and marked it crc correct.
See, my warning would simply tell you that there already is a file of the same size, and inform you that you _possibly_ got a wrong file, and you should proceed with caution (and not simply add the file so you have the specific series "complete"). I think this is the main reason why files with those insane values are added to the DB: just the ed2k, no group, nothing changed, just dumping the ed2k and pressing the "add" button. That's what happened most of the time and those are the files I am complaining about :-)
Rar wrote: A more complicated one:
http://anidb.ath.cx/perl-bin/animedb.pl ... xpandall=1
Ran all thirteen episodes through AOM, four did not have matching entries. Looked at the file list: the file sizes obviously match the a.f.k. files, but eps 2 and 3 have no 'crc correct'
Which makes this example not apply to my request :-)

I only want that message to pop up if a file has the same size as another file _for the same episode_ (not the whole of AniDB), with the file that is already in the DB being marked "CRC verified".
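To make the condition concrete, here is a minimal Python sketch of the check as requested in this thread. It is not AniDB's actual code: `FileEntry`, `is_bare_entry` and `needs_duplicate_warning` are hypothetical names, and reading "only the ed2k link and no further information" as "no group, no release date, no CRC status" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FileEntry:
    # Hypothetical record layout; the real AniDB schema is not shown in this thread.
    episode_id: int
    size: int
    ed2k_hash: str
    crc_verified: bool = False
    group: Optional[str] = None
    release_date: Optional[str] = None


def is_bare_entry(entry: FileEntry) -> bool:
    """Assumption: a 'bare' entry carries nothing beyond what the ed2k link provides."""
    return entry.group is None and entry.release_date is None and not entry.crc_verified


def needs_duplicate_warning(new: FileEntry, existing: Iterable[FileEntry]) -> bool:
    """Warn only when a bare entry matches the size of a CRC-verified file of the same episode."""
    if not is_bare_entry(new):
        return False
    return any(
        old.episode_id == new.episode_id      # same episode, not the whole DB
        and old.size == new.size              # identical file size
        and old.crc_verified                  # existing file is marked CRC verified
        and old.ed2k_hash != new.ed2k_hash    # assumption: an identical hash is the same file
        for old in existing
    )
```

A file of the same size in a different episode, or one whose counterpart is not CRC verified, would not trigger the warning under this reading.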
Rar wrote: Anyway, back to the main point, if we are going to encourage users not to add files that are duplicates but different hashes, I would require a 'have this file but different hash' option, so I'm not giving out misleading information. Possibly where official crcs are not clear, we would want to have a list of the different hashes too. Oh dear, sounds like major db restructuring to me, do at the same time as shadow files maybe. :)
Well actually it's not. I simply want a reminder so people are reminded they should act in a _sane_ manner and not just use the "add" function like a braindead zombie.

I really would not mind this, because the "deprecated" marker usually hides all those idiotic entries after a while and the entries do not really clutter up the DB. But one way or the other, I think adding files is so easy (which is good) that it is also easy to make AniDB fit your own collection of files a bit better so that you won't have to download files again. See, most users don't strive so much to have a collection of CRC verified files (which in my opinion is the main goal of AniDB - to have the _right_ files listed, not a collection of files that are generally available). I think that since the groups started giving out hashes to make people share the right files we should help them reach that goal - and adding files to the DB while the correct files are already listed is a bit useless.

See, most of the time those files have been shared for two years and more and it has been a real bitch to get the correct CRC values. When you file a CREQ, the admins check the request very thoroughly. I think the fact that you can simply add files to the DB that are (mostly) very obviously not necessary kind of contradicts this.

But mainly those files simply annoy me because I get notifies. If you, at one time, had files by "no group" of an Anime you will get notifies every time there are files by "no group" added to the DB. Take Inuyasha as an example. Many people follow that series and many (sorry) morons add bogus files to this series. Almost every day I get a notify that there is something new for Inuyasha, I visit the page and find up to 8 exclamation marks, and because there are already so many episodes and so many files it then takes me about five minutes to find out that _no_ file is actually interesting for me.

Since in the beginning there were no real groups publishing Inuyasha files I think this applies to almost every user who got Inuyasha files in the past two years. And only enabling the most restrictive notification (group) does not get rid of this problem so I thought it would be good to get the number of files by "no group" a bit down.

Anyway, I only want the users to be reminded what the idea behind AniDB is, and what should be added and what not. I guess (and I think I am right) that most of the time users simply don't take the "add" function too seriously and just need a little reminder of what's good and what's not. When I first started filing CREQs I was pretty much dragged down to earth by the admins with denies because they did not think my ideas were that good - and I really tried reading up on what's good and what not before filing CREQs.

:-)
Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Post by Rar »

Skywalka wrote:
Rar wrote:Personally I don't like saying I have a file I don't
You lost me here already.
There is a file in the db that matches the one I have, apart from the hash, so in fact I'm sharing a different ed2k link:
This was in the database with good crc and 133 users
ed2k://|file|Aquarian_Age_-_05_-_[a.f.k.](24e57bfd)[AniDB].avi|183265280|dceeded5317ed6855f31e161e239345f|
I added this, bad crc and just me:
ed2k://|file|Aquarian_Age_-_05_-_[a.f.k.](invalid_crc)[AniDB].avi|183265280|1a8d4b6fda5af3c6af4a8763c37714f0|
(edit by wahaha: shortened the links - think of the poor people on 1024x768 ;)
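The two links above share the size field (183265280) but carry different hashes. As a rough illustration, the Python sketch below splits an ed2k file link of the `ed2k://|file|NAME|SIZE|HASH|` shape seen here into its parts and compares two of them. `Ed2kLink`, `parse_ed2k` and `same_size_different_hash` are hypothetical helper names, and the parser only handles the simple form used in this post.

```python
from typing import NamedTuple


class Ed2kLink(NamedTuple):
    name: str
    size: int
    hash: str


def parse_ed2k(link: str) -> Ed2kLink:
    """Parse a link of the form ed2k://|file|NAME|SIZE|HASH| (simple form only)."""
    parts = link.strip().split("|")
    # Expected layout: ["ed2k://", "file", name, size, hash, ...]
    if len(parts) < 5 or parts[0] != "ed2k://" or parts[1] != "file":
        raise ValueError(f"not an ed2k file link: {link!r}")
    name, size, digest = parts[2], parts[3], parts[4].lower()
    if not size.isdigit():
        raise ValueError(f"size is not a number: {size!r}")
    if len(digest) != 32 or any(c not in "0123456789abcdef" for c in digest):
        raise ValueError(f"hash is not 32 hex digits: {digest!r}")
    return Ed2kLink(name, int(size), digest)


def same_size_different_hash(a: Ed2kLink, b: Ed2kLink) -> bool:
    """True for the case discussed here: identical size, but not the same file."""
    return a.size == b.size and a.hash != b.hash
```

Run on the two Aquarian Age links, `same_size_different_hash` returns `True`, which is exactly the situation where a same-size warning would fire.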

You would say I shouldn't do this, as it clutters the database and is generally annoying.
However, if I just added the first file to mylist
1) I have an inconsistency with AOM
2) I'm advertising to the community I have a file that I could not share. This doesn't really matter with popular files, but can be crucial for smaller numbers of users.
For instance, in my Perfect Blue example, people had added the first file based on file size being the same as the one they got from sharereactor, but in fact there was no availability on edonkey as the link was a different one.

Anyway, all I was saying is that if you want to discourage users from adding duplicates of the same file, there should be a system in place to indicate you do not actually HAVE that file, to avoid displaying misleading information.

Rar
Skywalka
Posts: 889
Joined: Tue Sep 16, 2003 7:57 pm

Post by Skywalka »

I just want to prevent users from adding files to the DB because they have a file they _think_ is the right one.

Your sharereactor example is pretty neat. Thought about those files I got via SR a lot actually.

You know it might really be possible that when Gowenna was still alive, there usually were no CRCs given out by the encoders/fansub groups and that Gowenna simply added a file she got herself and found that it was _working_ and that was just that. So taking the SR ed2k links as a hint (or envirosphere for that matter) is not such a good idea. Enviro was proven unreliable and I really think SR is not so much better. So you brought up a valid point, your example is good - and it shows you thought about it quite a bit when you found out that there was an issue with your file, so you added it. But I bet my ass you added more information to the entry than just pasting the ed2k link and then added it to the DB - so this example would also not apply to my feature request because it aims at the lazy users - obviously not at you :-)

See there is always a story behind a file and I simply want the users to _think_ before adding a file.

If there already is a file in the DB that has a "crc correct" status, it is there for a reason. The obvious way is now _not_ to simply add another file and have two files in the DB of the same size by the same group - this is irritating. While it is still somewhat valid for files that were released a couple of years ago, it still is a goal of AniDB to point out which file is actually the one that has the encoder's hard disk as origin and is in no way flawed.

I don't want to prevent users from adding files that might be corrupt either. In some cases these files are most shared - but! You can bet your ass it is now shared mostly by people who started downloading it in the last couple of weeks. Those folks who make up the first couple of hundred users in the DB (let's take the Inuyasha example - file by nogroup, 200 users have the file, release date sometime in february 2002) don't share this file anymore. So actually the current userbase sharing this file simply shares it for lack of being able to get a better version fast - usually it will take them longer to get the correct file by asking others to patch their corrupt version or because getting files that have fewer users simply takes longer to load.

But that will result in this file being shared most forever and I (in my very very humble opinion) think this is something AniDB tries to get rid of. If there is a file that actually is valid and CRC checked and can _not_ be taken as actually being wrongly marked so, this file should be shared, not a corrupt other one.

Now you try making a point that there actually might be files that are not really the CRC correct version of a file but just marked so because the file was ed2k linked somewhere else and there actually is no proof that this is the correct version of a file - NICE! I most of the time think the same about older files and some newer files might also raise suspicions. What I do then is I try to investigate and when I have proof (or I think I have proof) I go and file a CREQ and get things changed.

I really think that this minor request is just pointing out to users that they should not take the "add" feature too lightly, and it will make them investigate their own situation a bit instead of simply adding a file to the DB just because they have a version that is currently not in the DB. A new file to the DB in my opinion needs to have at least (even pretty small) use, and the files I mentioned above in a lengthy manner usually are of no use at all.

It cannot be a good idea to add files to the DB that are of the same size as a file that currently has 300 users and that was released two years ago. There is a very strong chance that the file the user has is simply corrupt, and if he thinks that the new file is actually the right one, then it should be marked CRC correct in the process of adding that new entry - additionally the entries usually have nothing other than an ed2k link - no release date, no group, no crc status, just "add", ed2k link pasted in the right field, and "add" again. That can't be it, sorry.

And honestly, if users are prevented from doing this, I am fine with it.

AniDB is not a petting zoo or a shelter for lost animals. I would not want to stop anybody from adding something to a petting zoo or an animal shelter. But even those have rules.

I know, bad example, but come on, you can't really be _that_ concerned about users not adding files anymore - we now have over 15000 users, there are really enough people to keep the DB going now and we really should think about at least some restrictions - or a bit more complete Userguide for that matter so people know what to do and what not.
Rar
AniDB Staff
Posts: 1471
Joined: Fri Mar 12, 2004 2:41 pm
Location: UK

Post by Rar »

I do sympathise Skywalka, and agree with your suggestion:

Big red banner saying 'Are you sure? [link to guidelines in AniDB Documentation - Website]' when user tries to add file of same size as another one in that ep (whole series maybe?)*
If they do submit, perhaps send a message to admins (like creq) so they can check it over.

PetriW probably has a better long term solution though.

Ideally I'd like to spread only the better rips, but I must admit to seldom managing this. With a shared connection and rare files under-prioritised on p2p, I tend to pick a group's file based largely on availability, thus following and encouraging the fanboy crowd. When I do come across a good, rare series, like SDFMacross by afiends, I try to keep it shared. Only when ISPs become a little better over upload bandwidth, and, dare I say it, the anime community matures a little, I think things will start to change, and AniDB will become increasingly useful.

Rar


* Out of interest, how many legitimate identical sized files are there in AniDB - same xxx.xxx.xxx but totally different anime?
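One way the footnote's question could be answered is by grouping files by size and keeping only the sizes that occur under more than one anime. The Python sketch below assumes the file list is available as `(anime_id, size)` pairs; that input shape and the function name `sizes_shared_across_anime` are assumptions for illustration, not anything AniDB exposes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def sizes_shared_across_anime(files: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each file size to the anime ids using it, keeping sizes seen in 2+ anime.

    `files` is an iterable of (anime_id, size_in_bytes) pairs (hypothetical input).
    """
    anime_by_size: Dict[int, Set[int]] = defaultdict(set)
    for anime_id, size in files:
        anime_by_size[size].add(anime_id)
    # Same-size files within a single anime are ignored; only cross-anime clashes count.
    return {size: ids for size, ids in anime_by_size.items() if len(ids) > 1}
```

The length of the returned dictionary is the number of distinct sizes that collide across different anime, which would indicate how noisy a DB-wide (rather than per-episode) size check would be.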
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

Rar wrote:Big red banner saying 'Are you sure? [link to guidelines in AniDB Documentation - Website]' when user tries to add file of same size as another one in that ep (whole series maybe?)*
...
PetriW probably has a better long term solution though.
Why not have both? :D
exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp »

first version of warning page implemented.

warnings are issued if:
- no ed2k link was given
- another file exists for the ep with the same size
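As a rough sketch of the two rules listed above (not exp's actual implementation, which is not shown in this thread), the warning page logic could look like this in Python. The function name `add_file_warnings` and its parameters are hypothetical.

```python
from typing import Iterable, List, Optional


def add_file_warnings(
    ed2k_link: Optional[str],
    size: int,
    episode_file_sizes: Iterable[int],
) -> List[str]:
    """Collect warnings for a file about to be added to an episode.

    `episode_file_sizes` holds the sizes of files already listed for that episode
    (hypothetical input; how the real site looks these up is unknown).
    """
    warnings: List[str] = []
    # Rule 1: no ed2k link was given (empty or whitespace-only counts as missing).
    if not ed2k_link or not ed2k_link.strip():
        warnings.append("no ed2k link was given")
    # Rule 2: another file exists for the episode with the same size.
    if size in set(episode_file_sizes):
        warnings.append("another file exists for the ep with the same size")
    return warnings
```

An empty list means the add can go through without showing the warning page; either rule firing would show it and let the user decide.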

BYe!
EXP
Fixy
Posts: 26
Joined: Tue Apr 27, 2004 8:59 pm
Location: Sweden, Boras

Post by Fixy »

But now when you request an edit it says a file of the same size already exists and asks whether you still want to add the file. I think it shouldn't warn like that when you request a change, since you don't add the file again, right?

Great, Good work :)
Last edited by Fixy on Sun May 16, 2004 9:50 pm, edited 1 time in total.
exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp »

Fixy wrote:But now when you request an edit it says a file of the same size already exists and asks whether you still want to add the file. I think it shouldn't warn like that when you request a change, since you don't add the file again, right?
already fixed.

BYe!
EXP