Other hashes and bitprints like SHA1 [DONE]

darkfader · Post by **darkfader** » Wed Jul 23, 2003 10:40 pm

Any chance someday the database will also include other filehashes such as SHA1?
There are other sites that might be hooked up with anidb... for example
http://bitzi.com/lookup/ed2k:xxxxxxxxxx ... xxxxxxxxxx will give info and other hashtypes. They have XML queries too.
It's no problem if ppl want to stick with eDonkey for now.
But I hope anidb is prepared for future expansion and such.

kidan · Post by **kidan** » Wed Jul 23, 2003 11:18 pm

I think for future development ed2k-network might have to change from md5 to a more secure hash-algo, so when it is time (perhaps some time right ahead) it would be a nice thing to have the db so flexible, that the user could pick his favourite hash/link-type in preferences and it should be easy for exp to add new hash types.

One thing I can say right now: I'd rehash all the stuff I have on CD and add the new hashes when the time comes for such action (hopefully not too soon). It is quite sad that ed2k seems to be on its way down already (as the RIAA and co. try to destroy the network). I'm quite curious what the next generation of p2p will come up with to counter their bullsh**.

Post by **exp** » Thu Jul 24, 2003 8:39 am

hm, do they offer some way to get xml output when querying by ed2k hash?
seems to me as if they require sha hashes there

BYe!
EXP

wahaha · Post by **wahaha** » Thu Jul 24, 2003 8:58 am

Looks like it

Well, I guess you can leave the hashing to the users (anidb should have enough ^^)...

kidan wrote:I think for future development ed2k-network might have to change from md5 to a more secure hash-algo[...]

*picky mode*: md4 ^^

Discussion-part:
I second the request to add more hash-fields, which should have the lowest importance though, like the md5-field (md4/ed2k > CRC > rest).
SHA-1 (20 bytes) would be the best start, since it'd make it possible/easier to deal with BT-files (one could then find out the ed2k-link for a seedless .torrent for example).
On a positive sidenote, it's also used by some Gnutellas and Freenet.

Although, judging by the emule devs' opinions, I don't think they really feel the urge to switch to another hash, adding SHA-1 would broaden the (future) use of anidb, which is a really nice idea.

Implementation-part:
It won't work without an IRC-command to add the SHA-1 or some other way to automatically add the info - issuing 16000 creqs is most likely out of the question, I guess.
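A minimal sketch of what "leaving the hashing to the users" could look like on the client side: read the file once and feed every block to several hash objects, so adding SHA-1 costs no extra disk pass. The function name `multi_hash` and the returned field names are made up for this illustration and are not anidb's. MD4 (the ed2k hash) is left out on purpose, because not every Python build ships it in `hashlib`.

```python
import hashlib
import io
import zlib


def multi_hash(stream, block_size=1 << 20):
    """Read a binary stream once and return several digests as hex strings."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    crc = 0
    size = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        # Every hash sees the same block, so the data is read only once.
        sha1.update(block)
        md5.update(block)
        crc = zlib.crc32(block, crc)
        size += len(block)
    return {
        "size": size,
        "crc32": format(crc & 0xFFFFFFFF, "08x"),
        "md5": md5.hexdigest(),
        "sha1": sha1.hexdigest(),  # 40 hex characters = 20 bytes
    }


# Hypothetical usage on an in-memory stream; a real tool would pass open(path, "rb").
result = multi_hash(io.BytesIO(b"abc"))
print(result["sha1"])  # a9993e364706816aba3e25717850c26c9cd0d89d
```

The same loop works for files of any size, since only one block is held in memory at a time.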

kidan · Post by **kidan** » Thu Jul 24, 2003 8:51 pm

wahaha wrote:*picky mode*: md4 ^^

Right, MD4 is cracked, MD5 is still secure (but it is in question how long it will last). I could now blame it on my keyboard, but well, I guess I was just a little bit tired when typing MD5 in my above post.

darkfader · Post by **darkfader** » Mon Aug 04, 2003 2:23 am

Some elements of MD4 have indeed been compromised, but that doesn't mean it shouldn't be used anymore. It's still perfectly safe for P2P imho, plus it's fast.
MD5 & SHA (not SHA-1) algorithms have theoretical flaws or something, so might be cracked 'soon' too.
But if it requires days of cracking, it will be targeted to latest warez and rips anyway.

Elberet · Post by **Elberet** » Mon Aug 04, 2003 1:20 pm

Speaking of which, what do you all mean with "cracked"?

MD4 is a hashing algorithm, not an encryption algorithm. For every given hash string, there's an infinite number of input files that will have the same hash - hence reverting the algorithm to obtain the file from the hash is simply impossible.

As far as I know, the "cracking" refers to the SMB Protocol which uses MD4 to hash passwords before sending them over the network. If an attacker obtains the hash, he can revert MD4 (or rather, brute-force) to create a password that will be accepted by the SMB server since the MD4 hash of the user's password and the attacker's created password will match. However, even tho the password will work, it doesn't have to be the same after all. (It's like copying a fingerprint. Your thumb would have the same fingerprint as your victim's, but it'd still be a different thumb...)

Post by **exp** » Mon Aug 04, 2003 7:23 pm

"cracked" in this case means that it is possible to calculate a file with the same size but different content which has the same hash.
this could be used to generate fake files with the same hash to disrupt spreading of a file. However, I don't think we'll ever see stuff like that on anime files :o)

BYe!
EXP

Elberet · Post by **Elberet** » Mon Aug 04, 2003 11:02 pm

Hehe, let's hope.

However... isn't there actually more than one hash for each file? I believed there were individual hashes for each 10MB chunk ("hashset"), so that clients can validate the integrity of individual chunks during the download.

If that's the case, one would need to create a file that matches not only the global hash but also has identical hashes for each 10MB of the file. Otherwise, victim clients will start downloading corrupt data but consider all whole chunks they received to be invalid and re-download them from a different source that's hopefully spreading the original file.

And on top of that, what if a chunk is downloaded from different sources? If one of the sources was spreading a manipulated file that has matching chunk hashes, then the chunk that results from the combination of the good and manipulated data would have a different hash once again.
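The hashset idea described in this post can be sketched roughly as follows: hash each fixed-size chunk, combine the chunk hashes into one file-level hash, and compare received chunks against the expected list. This is only an illustration under stated assumptions. `hashlib.sha1` stands in for MD4 (which ed2k actually uses but which is missing from some Python builds), the chunk size is the 9500 KB figure quoted later in the thread, the single-chunk special case reflects my understanding of the ed2k convention, and all function names are invented here.

```python
import hashlib

CHUNK_SIZE = 9500 * 1024  # 9500 KB, the block size quoted later in the thread


def chunk_hashes(data, chunk_size=CHUNK_SIZE, hash_fn=hashlib.sha1):
    """Return one digest per fixed-size chunk of data (the "hashset")."""
    return [
        hash_fn(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def root_hash(hashes, hash_fn=hashlib.sha1):
    """Combine the chunk digests into a single file-level digest."""
    if len(hashes) == 1:
        # Assumed convention: a single-chunk file uses its chunk hash directly.
        return hashes[0]
    return hash_fn(b"".join(hashes)).digest()


def bad_chunks(data, expected, chunk_size=CHUNK_SIZE, hash_fn=hashlib.sha1):
    """Return the indices of chunks whose digest differs from the expected hashset."""
    actual = chunk_hashes(data, chunk_size, hash_fn)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


# Tiny demo with 256-byte chunks so it runs instantly.
original = bytes(range(256)) * 4
expected = chunk_hashes(original, chunk_size=256)
tampered = bytearray(original)
tampered[300] ^= 0xFF  # flip one byte inside the second chunk
print(bad_chunks(bytes(tampered), expected, chunk_size=256))  # [1]
```

Only the chunk containing the flipped byte fails its check, so a client following this scheme would re-download that one chunk rather than the whole file.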

Post by **exp** » Mon Aug 04, 2003 11:57 pm

the first one wouldn't really be a problem as it would be enough to just create a file where the hashes for the internal 9MB chunks match those of the original file. the main hash can easily be "faked" by a client anyway.
but it is of course true that downloading one chunk from multiple sources would give you invalid crcs again.
as i said before i don't think we have to fear anything like that.

BYe!
EXP

Guest · Post by **Guest** » Tue Aug 05, 2003 12:03 am

Somehow they ended up with choosing 9500KB blocks. But since it's a flat hierarchy, you need to download a whole block before you can determine whether it's ok or not. Accidental errors can easily be fixed with the so-called ICH (Intelligent Corruption Handling), which does something like that rsync command in linux does, i.e. only copy the parts that differ. Of course a bad person could try to send corrupt data over and over again, but I think that client will get blocked for a while.
Instead of flat, you can use a tree. Like in the TigerTree, the blocks are only as small as 1024 bytes (hashing occurs on one byte more to differentiate between node and block hashes). With smaller blocks, bad senders can be excluded much faster.
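A rough sketch of the tree layout described here, in the style of the THEX tree-hash scheme as I recall it: each 1024-byte block is hashed with a `0x00` byte in front, each inner node hashes a `0x01` byte plus its two children, and an unpaired node is promoted unchanged to the next level. Real TigerTree uses the Tiger hash, which Python's `hashlib` does not provide, so `hashlib.sha1` is only a stand-in; `tree_root` is a name made up for this example.

```python
import hashlib

LEAF_SIZE = 1024  # block size mentioned in the post


def tree_root(data, hash_fn=hashlib.sha1, leaf_size=LEAF_SIZE):
    """Return the root digest of a binary hash tree over fixed-size blocks."""
    blocks = [data[i:i + leaf_size] for i in range(0, len(data), leaf_size)] or [b""]
    # Leaf hashes get a 0x00 prefix: this is the "one byte more" that
    # keeps block hashes distinct from node hashes.
    level = [hash_fn(b"\x00" + block).digest() for block in blocks]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            # Inner nodes get a 0x01 prefix in front of the two child digests.
            parents.append(hash_fn(b"\x01" + level[i] + level[i + 1]).digest())
        if len(level) % 2:
            parents.append(level[-1])  # unpaired node moves up unchanged
        level = parents
    return level[0]


data = b"a" * 1024 + b"b" * 1024 + b"c" * 10  # three leaves, the last one short
root = tree_root(data)
changed = tree_root(data[:-1] + b"d")  # altering one byte changes the root
print(root.hex(), root != changed)
```

With the inner node digests available, a client could check each 1024-byte block as soon as it arrives instead of waiting for a full 9500KB block, which is the speed-up the post refers to.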

darkfader · Post by **darkfader** » Tue Aug 05, 2003 12:07 am

arf, guest... s/like that/like what/g

wahaha · Post by **wahaha** » Tue Aug 05, 2003 6:22 am

It is a possible attack to send a chunk that (only) matches the partial hash: Since the hash matches, the client wouldn't redownload the chunk or identify a "bad client". Even worse, the client would spread it. When the file "completes", the client could only tell that there's a corruption that can't be fixed...

Post by **exp** » Tue Aug 05, 2003 8:49 am

wahaha wrote:It is a possible attack to send a chunk that (only) matches the partial hash: Since the hash matches, the client wouldn't redownload the chunk or identify a "bad client". Even worse, the client would spread it. When the file "completes", the client could only tell that there's a corruption that can't be fixed...

yeah but that would only work if the client downloads the chunk in question from exactly one other (corrupted) client. in most cases your client will probably download one chunk from multiple locations and would therefore be able to detect the corrupted chunk.
however it would be impossible to tell who sent the corrupted data in this case.

BYe!
EXP

Elberet · Post by **Elberet** » Tue Aug 05, 2003 2:30 pm

Good point(s). However, I wonder whether this is really a problem that's specific to MD4. It doesn't sound like a big difference, time-wise, assuming that the "attacker" relies on a brute-force method, whether one creates a chunk of data that matches a given MD4, MD5 or SHA(1) hash.

Well, maybe this might actually be an idea for one of the next eMule/donkey/Plus releases: When a chunk is completed, download random 50K blocks from at least two different sources (in different IP subnets, if possible) and verify that the chunk's hash is the same with the alternative sources.
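One possible reading of this suggestion, as a hedged sketch: after a chunk completes, pick a few random 50K offsets, fetch the same ranges from other sources, and flag any source whose bytes differ. It compares the sampled bytes directly rather than hashes, which is a simplification. `spot_check` and the `fetchers` callables are hypothetical; no existing client API is implied, and the subnet selection is left out.

```python
import random

BLOCK = 50 * 1024  # the 50K sample size from the post


def spot_check(chunk, fetchers, samples=4, block=BLOCK, rng=None):
    """Compare random blocks of a completed chunk against alternative sources.

    fetchers: callables taking (offset, length) and returning bytes.
    Returns a list of (offset, source_index) pairs that did not match.
    """
    rng = rng or random.Random()
    mismatches = []
    last_start = max(len(chunk) - block, 0)
    for _ in range(samples):
        offset = rng.randint(0, last_start)
        local = chunk[offset:offset + block]
        for index, fetch in enumerate(fetchers):
            if fetch(offset, len(local)) != local:
                mismatches.append((offset, index))
    return mismatches


# Demo with in-memory "sources": one honest, one serving different data.
good = bytes(200 * 1024)
honest = lambda offset, length: good[offset:offset + length]
liar = lambda offset, length: b"\xff" * length
report = spot_check(good, [honest, liar], rng=random.Random(0))
print(sorted({index for _, index in report}))  # [1]
```

Sampling only catches a disagreement that overlaps a sampled block, so this lowers the odds of spreading a manipulated chunk but does not rule it out.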