AICH extended tag support [tracked]

Post by L'Eliminateur » Tue Sep 14, 2004 12:31 am

Dunno if AniDB supports it now (extended ed2k tags), but it would be a great idea to allow the new AICH tag from eMule 0.44. It would help A LOT, and it's transparent to old non-compliant clients.

L'Eliminateur

DonGato
Posts: 1296
Joined: Sun Nov 17, 2002 9:08 pm
Location: The Pampas, The land of the Gaucho!

Post by DonGato » Tue Sep 14, 2004 7:53 am

I think that first it should prove to be really useful.

rowaasr13
Posts: 415
Joined: Sat Sep 27, 2003 4:57 am

Post by rowaasr13 » Tue Sep 14, 2004 12:50 pm

I've seen ICH recover pretty large chunks (~6-8 MB) lately. So it IS useful, I think. However, I'm not really sure it should be in the DB, as it is generated and passed around by eMule automatically anyway. The only files where AICH from the DB would be useful are old/rare files whose only sources are old clients that do not give you AICH.

DonGato
Posts: 1296
Joined: Sun Nov 17, 2002 9:08 pm
Location: The Pampas, The land of the Gaucho!

Post by DonGato » Tue Sep 14, 2004 1:00 pm

rowaasr13 wrote:I've seen ICH recover pretty large ~6-8Mb chunks lately. So it IS useful, I think. However, I'm not really sure if it really should be in DB, as it is generated and passed around by eMule automatically anyway. Only files where AICH from DB would be useful are old/rare files with only old clients as sources that do not give you AICH.
Get a better client then...
"So it IS useful" != "I think". Either it is or it is not, but not maybe. That's why I said to wait.
About the old records: what's the point of building an AICH entry if you don't have the info for it and the client doesn't support it? Remember, the AICH string in the ed2k link is a mere ed2k hash to the SHA1 stored in the client that has the file (or so they say in the "documentation").

rowaasr13
Posts: 415
Joined: Sat Sep 27, 2003 4:57 am

Post by rowaasr13 » Tue Sep 14, 2004 2:25 pm

Just to be clear: I'm not original poster and I'm not advocating for AICH.
DonGato wrote:Get a better client then...
Maybe I didn't say it clearly enough, so I'll try to reword it. As I understand the AICH implementation (I haven't had time to inspect its code yet), only the downloading client must have AICH in order to use it to restore corrupted data. So it could be useful if I, with a NEW client, try to download a rare file that only uploaders with OLD clients have. I won't get AICH from them, no matter how good MY client is, because normally AICH is provided by the uploader. So, for such files an external AICH (stored in the DB) might be useful, but I don't really think that there are too many such files.
Last edited by rowaasr13 on Tue Sep 14, 2004 2:35 pm, edited 1 time in total.

DonGato
Posts: 1296
Joined: Sun Nov 17, 2002 9:08 pm
Location: The Pampas, The land of the Gaucho!

Post by DonGato » Tue Sep 14, 2004 2:34 pm

Please read before giving opinions. ;)

http://forum.emule-project.net/index.ph ... opic=58114

You're requesting that AniDB read the known2.met and store "about 24000 hash for a 4GB files and 48000 hashs for a complete hashtree". :roll:

rowaasr13
Posts: 415
Joined: Sat Sep 27, 2003 4:57 am

Post by rowaasr13 » Tue Sep 14, 2004 2:48 pm

Got it. Quite simple and works like a charm.

Well, in that case AniDB could be a source of "trusted hashes", adding the AICH master hash to the link automatically. Since most of the hashing is done automatically by AoM on locally stored files, it would be quite hard to compromise the AICH stored here. And since an exchanged AICH is not saved and not trusted until at least 10 AICH sources agree, once again, this could really help with rare files.

PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW » Tue Sep 14, 2004 10:24 pm

This sounds like a really bad idea to me, it's far too much data to justify its usefulness.
20 KB may not sound like much for one file, but there are a lot of files on AniDB and many of them are quite large, which would lead to AniDB hosting something like 1.5 GB of hashes... (13049 GB of files * 6000 hashes per GB * 20 bytes per hash)
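
The estimate above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: all three figures are taken from the post itself, and the constant and function names are made up for illustration.

```python
# Figures quoted in the post above; names are illustrative only.
TOTAL_FILE_SIZE_GB = 13049   # total size of files tracked, in GB
HASHES_PER_GB = 6000         # rough number of block hashes per GB
SHA1_BYTES = 20              # size of one SHA1 digest in bytes


def full_hashset_bytes(total_gb: int = TOTAL_FILE_SIZE_GB) -> int:
    """Bytes needed to store every block hash for total_gb of files."""
    return total_gb * HASHES_PER_GB * SHA1_BYTES


total = full_hashset_bytes()
print(f"{total} bytes, roughly {total / 1e9:.2f} GB")
```

Under these assumptions the total comes out a little above 1.5 GB, which matches the figure in the post.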

Elberet
Posts: 778
Joined: Sat Jul 19, 2003 8:14 pm

Post by Elberet » Wed Sep 15, 2004 1:34 am

Is it just me, or does this whole thing sound exactly like how BitTorrent works?

- Each 9MB chunk is split into many smaller chunks of about 175KB.
- Each mini-chunk is hashed using SHA1.
- The hashes are once again hashed using SHA1.
- The resulting hash is called the root-hash.
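
The four steps above can be sketched roughly as follows. This follows the simplified description in this post only (a flat list of mini-chunk hashes, hashed once more); it is not a reimplementation of eMule's actual AICH code, and the 175 KB size and all function names are assumptions for illustration.

```python
import hashlib
from typing import List

# "About 175KB" per the post; an approximation, not eMule's exact block size.
MINI_CHUNK_SIZE = 175 * 1024


def mini_chunk_hashes(chunk: bytes, size: int = MINI_CHUNK_SIZE) -> List[bytes]:
    """Split a chunk into mini-chunks and SHA1 each one."""
    return [
        hashlib.sha1(chunk[offset:offset + size]).digest()
        for offset in range(0, len(chunk), size)
    ]


def root_hash(chunk: bytes, size: int = MINI_CHUNK_SIZE) -> bytes:
    """SHA1 over the concatenated mini-chunk hashes (the 'root-hash')."""
    return hashlib.sha1(b"".join(mini_chunk_hashes(chunk, size))).digest()


data = bytes(400 * 1024)  # dummy 400 KB chunk of zero bytes
print(len(mini_chunk_hashes(data)), root_hash(data).hex())
```

Changing a single byte of the chunk changes one mini-chunk hash and therefore the root hash, which is what lets a client narrow corruption down to a small region.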

The difference between eMule and BitTorrent is that these hashes are only used to repair broken chunks. The old ed2k-hash, which identifies a file in the network, is, together with the filesize, still the one and only bit of information that determines which file a user is going to receive.

The new AICH root-hash is either obtained through an ed2k:// link that includes it, or from the network itself, where a statistical analysis determines whether it is the right hash or not. The hashset itself, however, is never part of the ed2k:// link and is never transferred over the network, at least not completely. A client may, if it has detected corruption in a downloaded chunk, ask another client to send some of the AICH hashes for the corrupted chunk.

So, for AniDB, this means that it doesn't have to store the sub-hashes if it wants to support AICH. If supporting it is desired, AniDB would only have to store one additional hash in its database: the AICH root-hash for a file. Doing more, e.g. storing the complete AICH hashset, is neither needed nor possible.

Guest

Post by Guest » Wed Sep 15, 2004 4:40 am

Exactly as Elberet said. I don't know why the rest of you assumed that it implied some hash hosting by AniDB; you're wrong.
The AICH hash doesn't need the complete hashset in the ed2k link, and the recovery hashes are passed P2P between clients.

anidb only needs to add the additional AICH hash, for example in appleseed:
ed2k://|file|Appleseed.ogm|724916344|A82CA58D41BD737DD6F41933970221C8|h=D5V6ZM54TG2FOXQEKT3WPF3ZFD3SNY4Q|/

the AICH hash is the h= at the end, that's all that is needed for AICH to work on anidb.
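
For illustration, here is a minimal sketch of pulling the `h=` field out of such a link. The function names are hypothetical, the parser assumes the simple `|`-separated layout of the example link (no escaping, no other link types), and it assumes the `h=` value is a Base32-encoded 20-byte SHA1 digest.

```python
import base64
from typing import Dict, Union


def parse_ed2k_file_link(link: str) -> Dict[str, Union[str, int, None]]:
    """Split an ed2k file link into name, size, ed2k hash and optional AICH root."""
    fields = link.strip().split("|")
    # Expected: ["ed2k://", "file", name, size, ed2k_hash, optional tags..., "/"]
    if len(fields) < 6 or fields[0] != "ed2k://" or fields[1] != "file":
        raise ValueError("not an ed2k file link")
    aich_root = None
    for extra in fields[5:]:
        if extra.startswith("h="):
            aich_root = extra[2:]
    return {
        "name": fields[2],
        "size": int(fields[3]),
        "ed2k_hash": fields[4],
        "aich_root": aich_root,
    }


def aich_root_bytes(aich_root: str) -> bytes:
    """Decode the Base32 text form of the root hash into raw bytes."""
    return base64.b32decode(aich_root)


link = ("ed2k://|file|Appleseed.ogm|724916344|"
        "A82CA58D41BD737DD6F41933970221C8|h=D5V6ZM54TG2FOXQEKT3WPF3ZFD3SNY4Q|/")
info = parse_ed2k_file_link(link)
print(info["aich_root"], len(aich_root_bytes(info["aich_root"])))
```

A link without the `h=` field parses the same way and simply yields no AICH root, which is why the extra field does not bother code that ignores it.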

And if you read the documentation for AICH (this is for the one who said that it propagates automatically), it needs at least 10 different sources with FULLY EQUAL hashes for that AICH, and even then it doesn't trust it and only uses it at runtime (until you finish the file and your client starts propagating the AICH you generated).
If you provide a link with an AICH hash, it is deemed trusted and propagated right away, thus helping prevent corruption when there are not many full sources.
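
The trust rule described above could be sketched like this. It models only what this post says (a hash from the link is trusted outright; otherwise at least 10 sources must report the same hash) and is not eMule's actual logic; the function name and the threshold constant are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Optional

MIN_AGREEING_SOURCES = 10  # threshold as stated in the post


def pick_root_hash(link_hash: Optional[str],
                   reported: Iterable[str]) -> Optional[str]:
    """Return the root hash to use for recovery, or None if none qualifies."""
    if link_hash is not None:
        # A hash supplied via the ed2k link is treated as trusted right away.
        return link_hash
    counts = Counter(reported)
    if not counts:
        return None
    candidate, votes = counts.most_common(1)[0]
    return candidate if votes >= MIN_AGREEING_SOURCES else None


print(pick_root_hash(None, ["A"] * 9))        # too few agreeing sources
print(pick_root_hash(None, ["A"] * 10))       # enough agreeing sources
print(pick_root_hash("FROM_LINK", []))        # link hash wins immediately
```

For a rare file with only a handful of sources the second branch never succeeds, which is the case where a hash carried in the link would make a difference.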


L'Eliminateur

exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp » Wed Sep 15, 2004 7:03 am

as this seems to be a proprietary emule thing I don't see why we should implement it atm.
I wonder why they had to create yet another hashing method anyway.
Doesn't tiger tree or rather THEX in general work almost like this?

BYe!
EXP

wahaha
AniDB Staff
Posts: 1497
Joined: Sun Nov 17, 2002 3:33 pm

Post by wahaha » Wed Sep 15, 2004 7:56 am

exp wrote:I wonder why they had to create yet another hashing method anyway.
Doesn't tiger tree or rather THEX in general work almost like this?
I wonder as well, but maybe they didn't want to create confusion with DC's implementation of TTH, since eMule's (leaf) segments are 180 KB, probably because eMule's compression uses the same segment size. Assuming that DC's segments are smaller, this could quickly create confusion, as there would then be two different valid "TTHs" for the same file.
L'Eliminateur wrote:and if your read the documentantion for aich(for the one that said that it propagates automatically), it needs to have at least 10 different FULL EQUAL sources for that AICH, and even then it doesn't trusts it and only uses it in runtime(until you finish the file and your client starts propagating the AICH you generated).
Well, AICH is still new and leaves room for improvements. One could, for example, easily keep track of (emule) clients who are known to provide valid data / hashes, so the information they send could be considered more trustworthy.
Elberet wrote:Is it just me, or does thing whole thing sound exactly like how BitTorrent works?
Hash trees are not an invention of BT. All the more so since BT doesn't use them, at least no more than ed2k always has.
I'd rather say that BT works (in that respect) like ed2k with smaller chunks: Both use a "flat" hashset (chunks fixed in ed2k: 9500 KiB, "pieces" variable in BT: powers of 2 bytes, claims the documentation). This hashset is then again hashed to create the ed2k-hash / info-hash (BT).
(BT actually hashes more than just the hashset, see the docs)

rowaasr13
Posts: 415
Joined: Sat Sep 27, 2003 4:57 am

Post by rowaasr13 » Wed Sep 15, 2004 8:18 am

PetriW wrote:This sounds like a really bad idea to me, it's far too much data to justify its usefulness.
Only the AICH root hash is added to the link. The tree is exchanged between clients and cannot (and need not) be added.

Check the link DonGato provided. As described there, and as Guest and I already said, the root hash is exchanged between clients too, but you need at least 10 clients to send you the same hash. And this is where AniDB could help, serving as a source for a trusted hash.

DonGato
Posts: 1296
Joined: Sun Nov 17, 2002 9:08 pm
Location: The Pampas, The land of the Gaucho!

Post by DonGato » Wed Sep 15, 2004 10:21 am

exp already stated why this won't be added to AniDB soon. So, we can stop discussing whether it should be added or not for now.

But for the people who tout its usefulness, here are some comments by the current admin of the eMule Plus project, who some months ago was working on that part of the code and even made some changes lately...
Aw3 wrote:Unfortunately, the ICH improvement has nothing to do with AICH. It was just a fix to minimize the amount of redownloaded data.
Before, it worked this way:
* Received the whole 180k block (which fixes current corruption)
* Buffer it
* Send request for the next 3 blocks
...
* Received the next (unrequired block)
* Buffer it
* Send request for the next 3 blocks
...
* It's time for data flushing (enough data was buffered or timeout [3 minutes])
* check a part hash -- it's fine -- mark a part as complete and remove all requested blocks for this part.
* Received 1st of the previously requested blocks -- already written
* Received 2nd of the previously requested blocks -- already written
* Received 3rd of the previously requested blocks -- already written

In this example we received at least 4 additional unrequired 180k blocks. This number can be bigger depending on data buffer size and download speed.

How it works now:
* Received the whole 180k block (which fixes current corruption)
* Buffer it
* Force flush
* Send request for the next 3 blocks
...
* Data flushing
* check a part hash -- it's fine -- mark a part as complete and remove all requested blocks for this part.
* Received 1st of the previously requested blocks -- already written
* Received 2nd of the previously requested blocks -- already written
* Received 3rd of the previously requested blocks -- already written

eMule with the new AICH still has this problem. I took a moment to look at the AICH implementation (2 minutes), and it's badly designed. They use SHA hashes for 180k blocks. There are currently two hash sets in use and being transferred (one of them is huge). A standard 32-bit CRC would be enough for that. There is even a faster 32-bit CRC in the zlib library, used to protect the zstream. I used that possibility for "improved compressed stream handling" to protect against rx data corruption.

Sometimes I feel that we must also add protocol extensions. I don't know, maybe we can add our own AICH and use it between e+ clients.

The stuff which is similar (only in case of compressed stream) to AICH is already in our features list -- "improved compressed stream handling".

I need to study AICH, but I think that in most cases, for compressed streams, "improved compressed stream handling" is still better and much faster than AICH. AICH will redownload more additional data, as data hashing is probably implemented during flush (as before). That will produce the situations I described in detail above.
The only weak point of "improved compressed stream handling" is that it cannot check an incomplete compressed stream (when there is a disconnection before the full zstream is received), but this doesn't happen very often.
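
To make the size difference behind Aw3's CRC suggestion concrete, here is a small sketch that computes both a zlib CRC32 and a SHA1 digest per 180 KB block. It only illustrates the checksum sizes involved and is not code from eMule or eMule Plus; the block size constant and function name are assumptions.

```python
import hashlib
import zlib
from typing import List, Tuple

BLOCK_SIZE = 180 * 1024  # 180 KB blocks, as mentioned in the quote


def block_checksums(data: bytes,
                    block_size: int = BLOCK_SIZE) -> List[Tuple[int, bytes]]:
    """Return (crc32, sha1_digest) for each block of the data."""
    result = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        result.append((zlib.crc32(block), hashlib.sha1(block).digest()))
    return result


sums = block_checksums(bytes(2 * BLOCK_SIZE + 1))
crc_bytes = 4 * len(sums)                       # a CRC32 fits in 4 bytes
sha1_bytes = sum(len(digest) for _, digest in sums)
print(len(sums), crc_bytes, sha1_bytes)
```

The CRC set is a fifth of the size, but a CRC only guards against accidental corruption; it offers no protection against data that was altered on purpose, which is one possible reason to prefer a cryptographic hash here.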

rowaasr13
Posts: 415
Joined: Sat Sep 27, 2003 4:57 am

Post by rowaasr13 » Wed Sep 15, 2004 11:33 am

DonGato wrote:exp already stated why this won't be added to AniDB soon. So, we can stop discussing whether it should be added or not for now.
Why? Continued discussion might reveal additional pros or cons.
Aw3 wrote:The only weak point of "improved compressed stream handling" is that it cannot check an incomplete compressed stream (when there is a disconnection before the full zstream is received), but this doesn't happen very often.
Heh... This happens to be the most important difference between AICH and ICSH for me. I often get lots of files from sources other than ed2k (like someone else's HDD, so I do mean "lots"). I use a simple mod for eMule that I wrote myself, which can import data from local files into running downloads, thus allowing me to quickly fix files that don't match the correct ed2k hash by redownloading only the corrupt parts instead of the entire file. Before, I had to redownload an entire 9.28 MB part for every bit of corruption, and with AICH this amount is reduced dramatically, which ICSH, alas, can't do. Not that there are lots of people who do that, but still...

Moreover, AICH should work in all mods based on latest version of eMule and ICSH works only in e+ and mods that borrowed this feature.
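
The repair workflow described in this post boils down to comparing per-block hashes against a trusted set and redownloading only the blocks that differ. Below is a minimal sketch of that idea; it is not rowaasr13's mod or eMule code, and the helper names and 180 KB block size are assumptions.

```python
import hashlib
from typing import List

BLOCK_SIZE = 180 * 1024  # assumed recovery block size


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """SHA1 digest of every block in the data."""
    return [
        hashlib.sha1(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def corrupt_blocks(data: bytes, trusted: List[bytes],
                   block_size: int = BLOCK_SIZE) -> List[int]:
    """Indexes of blocks whose hash does not match the trusted hash."""
    bad = []
    for index, expected in enumerate(trusted):
        block = data[index * block_size:(index + 1) * block_size]
        if hashlib.sha1(block).digest() != expected:
            bad.append(index)
    return bad


good = bytes(range(256)) * 2000          # 512000 bytes of sample data
trusted = block_hashes(good)
damaged = bytearray(good)
damaged[200000] ^= 0xFF                  # flip one byte inside block 1
print(corrupt_blocks(bytes(damaged), trusted))
```

Only the flagged blocks would need to be fetched again, instead of the whole part that contains them.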
