TTH hash support [tracked]

old granted and denied feature requests

Moderator: AniDB

ender
Posts: 7
Joined: Thu Jan 15, 2004 6:24 pm
Location: The Sunny Side of the Alps

Post by ender »

Now that DC++ and Valknut (formerly DCGui-Qt) support TTH hashes for their files, it would be nice if AniDB added support for them too.
exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp »

If AoM were able to support those, we could think about it.

BYe!
EXP
ender
Posts: 7
Joined: Thu Jan 15, 2004 6:24 pm
Location: The Sunny Side of the Alps

Post by ender »

So I need to poke AoM devs?
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

Hmm, AOM can generate Tiger hashes but I don't really know anything about Tigertree (or well, the terminology)... Give me a link to an explanation and I'll say yes or no. :D
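Some background on the terminology PetriW asks about, as a hedged sketch: a Tiger Tree Hash is a Merkle hash tree built in the THEX layout. The file is split into 1024-byte leaves, each leaf is hashed with a 0x00 prefix byte, each pair of child hashes is hashed with a 0x01 prefix byte, an unpaired node is promoted to the next level unchanged, and the root (24 bytes when the hash is Tiger) is the TTH. Python's `hashlib` does not ship Tiger, so this sketch takes the hash function as a parameter and uses SHA-256 as a stand-in. Its output is therefore NOT a real TTH value; it only shows the tree structure. The names `merkle_root` and `sha256` are illustrative.

```python
import hashlib
from typing import Callable, List

LEAF_SIZE = 1024  # THEX leaf segment size used for TTH

Hash = Callable[[bytes], bytes]


def merkle_root(data: bytes, h: Hash) -> bytes:
    """Root of a THEX-style hash tree over `data`, using hash function `h`."""
    # Leaves: h(0x00 || segment). An empty input still yields one leaf.
    level: List[bytes] = [
        h(b"\x00" + data[i:i + LEAF_SIZE])
        for i in range(0, len(data), LEAF_SIZE)
    ] or [h(b"\x00")]
    while len(level) > 1:
        # Internal nodes: h(0x01 || left || right), pairing left to right.
        nxt = [h(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired node is promoted unchanged
        level = nxt
    return level[0]


def sha256(b: bytes) -> bytes:
    # Stand-in only: a real TTH uses Tiger (192-bit), which hashlib lacks.
    return hashlib.sha256(b).digest()


print(merkle_root(b"x" * 3000, sha256).hex())
```

Because every leaf is hashed separately, a client can verify a download piece by piece against the tree instead of only once the whole file has arrived.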
ender
Posts: 7
Joined: Thu Jan 15, 2004 6:24 pm
Location: The Sunny Side of the Alps

Post by ender »

exp
Site Admin
Posts: 2438
Joined: Tue Oct 01, 2002 9:42 pm
Location: Nowhere

Post by exp »

The question here would also be how much additional overhead those new hashes would add when hashing files.
If that hashing algorithm is really CPU-intensive, it might not be worth it.

BYe!
EXP
analogued
Posts: 54
Joined: Mon Jul 12, 2004 6:53 am

Post by analogued »

You could take a look at Bitcollider. Since it is released into the public domain, there shouldn't be any issues with "stealing" its code for the magnet links. Note that TTH is a different thing (though it is related to the magnet links, which use SHA1).

The SHA1 in the magnet links is in BASE32, I think, as opposed to the SHA1 hashes from AniDB, which are in BASE16. There was also a post on the Shareaza forum listing the alphabet they use (I'll try to find it). I don't know whether it's possible to convert between BASE16 and BASE32 if you have one of them. If it is, you could generate the BASE32 SHA1 hash from the BASE16 hashes that are already in AniDB.
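On the conversion question: BASE16 and BASE32 are just two spellings of the same 20-byte SHA1 digest, so either one can be converted to the other without rehashing. A minimal sketch using Python's standard `base64` module, which implements the RFC 4648 BASE32 alphabet (A-Z, 2-7) seen in magnet links. A 20-byte digest is a multiple of 5 bytes, so no `=` padding appears. The function names are illustrative, not from any existing tool.

```python
import base64


def sha1_hex_to_base32(hex_digest: str) -> str:
    """Re-encode a 40-char BASE16 SHA1 digest as the 32-char BASE32 form."""
    raw = bytes.fromhex(hex_digest)
    if len(raw) != 20:
        raise ValueError("SHA1 digests are 20 bytes")
    return base64.b32encode(raw).decode("ascii")


def sha1_base32_to_hex(b32_digest: str) -> str:
    """Inverse: decode the BASE32 form back to lowercase BASE16."""
    return base64.b32decode(b32_digest.upper()).hex()


# SHA1 of an empty file, in both notations
print(sha1_hex_to_base32("da39a3ee5e6b4b0d3255bfef95601890afd80709"))
# 3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
```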

The Magnet-URI specification is universal and not bound to Gnutella2 (Shareaza implemented it first), Gnutella or Direct Connect clients. It can be implemented by any program.

Using the BASE32 SHA1 hash and the file size, it would be possible to generate Magnet links alongside the Edonkey links that are already generated; this would open the door to a whole new bunch of file-sharing clients.

The most basic Magnet URI has the following structure:
magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C
where YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C is the SHA1 hash in BASE32. Note that it doesn't use forward slashes (//) like the Edonkey links, but a question mark (?).

You can also add a name to the Magnet URI, like this:
magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C&dn=A+Nice+Anime.avi
Plus signs (+) are used instead of spaces (I think you can also use %20; it depends on the client I guess).
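A minimal sketch of assembling the basic magnet URI described above from a BASE32 SHA1 and an optional display name. The function name `sha1_magnet` is hypothetical; `urllib.parse.quote_plus` is one way to get the `+`-for-space encoding, and it percent-escapes other reserved characters in the name.

```python
from typing import Optional
from urllib.parse import quote_plus


def sha1_magnet(sha1_b32: str, name: Optional[str] = None) -> str:
    """Build a basic magnet link from a BASE32 SHA1, optionally with a name."""
    uri = f"magnet:?xt=urn:sha1:{sha1_b32}"
    if name:
        # quote_plus turns spaces into '+' and %-escapes other reserved chars
        uri += "&dn=" + quote_plus(name)
    return uri


print(sha1_magnet("YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C", "A Nice Anime.avi"))
# magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C&dn=A+Nice+Anime.avi
```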

I won't go into the TTH issue here; just remember that TTH hashes can be added to Magnet URIs as extra information (they are not required). Also, on their own, TTH hashes are useless. I may be wrong here, since it's been a while since I read about the subject, but I think the chances of me being wrong on this are pretty slim.

Currently, Magnet URIs are supported by Shareaza (Gnutella2), a bunch of Gnutella clients, Kazaa and DC++. I may have missed a few clients, but that's the majority of them.

Some misc links:
- Some SHA1 stuff in PERL
- Some more PERL and PHP SHA1 stuff. Also some conversion between bases and the alphabet used for the SHA1 hashes
- The RFC that describes the BASE32 SHA1 encoding and the alphabet used
ender
Posts: 7
Joined: Thu Jan 15, 2004 6:24 pm
Location: The Sunny Side of the Alps

Post by ender »

analogued: magnet links don't mean SHA1 hashes, they can contain any hash that can more-or-less uniquely identify a file, be it SHA1, md5 or TTH. DC++ and Valknut only support TTH hashes, so a magnet without one is useless to them.
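To make the distinction concrete: DC++-style magnets identify a file by its TTH root using the `urn:tree:tiger:` exact topic and usually carry the exact length in bytes as `xl`. A minimal sketch, assuming that layout, built from the TTH root and file size that ender's tthsum run in this thread reports. The function name `tth_magnet` is hypothetical.

```python
from urllib.parse import quote_plus


def tth_magnet(tth_b32: str, size: int, name: str) -> str:
    """DC++-style magnet: TTH root, exact length (xl) and display name (dn)."""
    # A Tiger root is 24 bytes, i.e. 39 BASE32 characters without padding.
    if len(tth_b32) != 39:
        raise ValueError("expected a 39-character BASE32 Tiger root")
    return (f"magnet:?xt=urn:tree:tiger:{tth_b32}"
            f"&xl={size}&dn={quote_plus(name)}")


print(tth_magnet("VVTVDOWFUWJTUAIRJJJLJAR3MOTZIR6APXGKJZY", 732177518,
                 "Ah! My Goddess CD1.mkv"))
# magnet:?xt=urn:tree:tiger:VVTVDOWFUWJTUAIRJJJLJAR3MOTZIR6APXGKJZY&xl=732177518&dn=Ah%21+My+Goddess+CD1.mkv
```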

As for hashing speed, I started hashing my complete Valknut share today (1.4 TiB), and I'll report how long it takes (by the way, I read on the DC++ forum that the Shareaza folks have released an assembler-optimized TTH hasher, which is supposed to be 80% faster than the implementation in DC++).

I got these results on my Linux box, while Valknut was hashing files in the background (tried twice, results were about the same):

# ll Ah\!\ My\ Goddess\ CD1.mkv
-rw-rw----  1 ender sambaadmin 732177518 apr 12 08:24 Ah! My Goddess CD1.mkv
# time nice -n -19 md4sum -e Ah\!\ My\ Goddess\ CD1.mkv
ed2k://|file|Ah! My Goddess CD1.mkv|732177518|4f3fde6483f1c90266ae4fba2bbf3c92|

real    0m47.770s
user    0m7.623s
sys     0m4.577s
# time nice -n -19 tthsum Ah\!\ My\ Goddess\ CD1.mkv
VVTVDOWFUWJTUAIRJJJLJAR3MOTZIR6APXGKJZY  Ah! My Goddess CD1.mkv

real    1m0.307s
user    0m20.633s
sys     0m4.094s
tthsum's manpage says it uses TTH code copied directly from DC++.
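These numbers bear on exp's overhead question. The `real` times likely include waiting on a disk that Valknut was reading from at the same time, so the `user` CPU times are probably the fairer comparison of hashing cost. Worked through from the reported figures (one run on one machine, so treat it as a rough indication only):

```python
SIZE = 732_177_518  # bytes, from ender's ls output
MD4_USER = 7.623    # seconds of user CPU for md4sum -e (ed2k)
TTH_USER = 20.633   # seconds of user CPU for tthsum


def mib_per_s(seconds: float) -> float:
    """Throughput in MiB per second of CPU time."""
    return SIZE / seconds / 2**20


print(f"ed2k/MD4: {mib_per_s(MD4_USER):.1f} MiB/s of CPU time")
print(f"TTH:      {mib_per_s(TTH_USER):.1f} MiB/s of CPU time")
print(f"TTH needs {TTH_USER / MD4_USER:.1f}x the CPU time of MD4")
```

This works out to roughly 91.6 MiB/s for MD4 against 33.8 MiB/s for TTH, about 2.7 times the CPU time, measured with tthsum's DC++-derived code rather than the optimized Shareaza hasher.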
analogued
Posts: 54
Joined: Mon Jul 12, 2004 6:53 am

Post by analogued »

ender wrote:analogued: magnet links don't mean SHA1 hashes, they can contain any hash that can more-or-less uniquely identify a file, be it SHA1, md5 or TTH.
Yes, you're right about that... I forgot to mention it in my post. However, I think TTH has a different meaning and use in SHA1 Magnet URIs (or maybe that's just the case in Shareaza).
ender wrote:DC++ and Valknut only support TTH hashes, so a magnet without one is useless to them.
Just checked the DC++ forum, and that's correct. I don't know why I remembered seeing SHA1 over there... Oh well.

However, between the two, I think the SHA1 Magnet-URI is more popular.
PetriW
AniDB Staff
Posts: 1522
Joined: Sat May 24, 2003 2:34 pm

Post by PetriW »

I would prefer to not want to implement this soon. ;)
Question is, how important is it? I mean, how many are using the sha1 and md5 hashes? Anyone?
pelican
AniDB Staff
Posts: 234
Joined: Wed Aug 11, 2004 11:19 pm

Post by pelican »

PetriW wrote:I would prefer to not want to implement this soon. ;)
Question is, how important is it? I mean, how many are using the sha1 and md5 hashes? Anyone?
MD5 is actually my standard file checking mechanism, so I appreciate having it. Certainly just having CRC-32 wouldn't be good, as that's nowhere near being a secure hash.
DonGato
Posts: 1296
Joined: Sun Nov 17, 2002 9:08 pm
Location: The Pampas, The land of the Gaucho!

Post by DonGato »

Have you ever found a problem by using CRC-32?! 8O
I think people are exaggerating its problems...
Yes, MD4 is still insecure, just stop using ed2k!
Snakes
Posts: 60
Joined: Wed May 12, 2004 2:45 pm
Location: Norway

Post by Snakes »

I'd love to see TTH hashes in AOM, as some of the episodes I get come from DC++, where it's sometimes a gamble whether I get the right file or a corrupted one.

example:
I search for 'some series - ep 04' and the results might look like this
some series - ep 04 189.432.500 <no TTH hash>
some series - ep 04 189.432.500 <no TTH hash>
some series - ep 04 189.432.500 <4f3fde6483f1c90266ae>
some series - ep 04 189.432.500 <4f3fde6483f1c90266ae>
some series - ep 04 189.432.500 <65fds98h70j9804jlj099>
some series - ep 04 189.432.500 <65fds98h70j9804jlj099>
some series - ep 04 189.432.500 <432jh7khj5fhg5432345>
some series - ep 04 189.432.500 <432jh7khj5fhg5432345>
(Note: the 'hashes' here are purely fictional; I just hammered the keyboard a little. They're only for demonstration :))

The only thing I know for certain is that the file I want has the correct size, but which of the four different files is the uncorrupted one is a pure gamble. If I knew which TTH hash to look for, it would be easy to spot the right one.
Besides, if AOM had TTH hashes, I could make my DC++ client search for the hash instead of the filename and get the right file right away. Another thing to consider is that even more people could be persuaded to start using AOM if it provided TTH hashes: the anime community on DC++ is fairly large, and a huge database of TTH hashes would let them find the right files. As a side effect, DC++ might in time be cleaned of all the corrupted stuff that's shared there.

As for the SHA and MD5 hashes, I've never used them. ;)
rowaasr13
Posts: 415
Joined: Sat Sep 27, 2003 4:57 am

Post by rowaasr13 »

DonGato wrote:Have you ever found a problem by using CRC-32?!
I had problems with stupid people who think that if they have a corrupt file that doesn't match its CRC32, all they have to do is flip a few bits to make it match, corrupting the file even further; just search Google for such "checksum fixers". It's (almost) impossible to do that with MD5, and as pelican already said, MD5 is widely used for corruption checking (especially in the *nix world), so having it is a really good thing(tm).
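Why such "checksum fixers" are easy to write: CRC-32 is affine over GF(2), so the change that flipping a given bit makes to the checksum does not depend on the rest of the file. A minimal Python sketch of that general technique (not the code of any particular fixer tool) flips bits in the last four bytes of a buffer, chosen by Gaussian elimination, until `zlib.crc32` equals an arbitrary target. The sample data, target value and function names are made up for illustration.

```python
import zlib


def crc_delta(length: int, bit: int) -> int:
    """CRC-32 change caused by flipping one bit of a `length`-byte input.

    Because CRC-32 is affine over GF(2), this is the same for every input
    of that length, whatever its content.
    """
    mask = bytearray(length)
    mask[bit // 8] ^= 1 << (bit % 8)
    return zlib.crc32(bytes(mask)) ^ zlib.crc32(bytes(length))


def forge_crc32(data: bytes, target: int) -> bytes:
    """Flip bits in the last 4 bytes of `data` so its CRC-32 equals `target`."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 bytes to patch")
    first = 8 * (n - 4)  # index of the first bit of the last 4 bytes
    # Gaussian elimination over GF(2): pivot bit -> (delta, bitmask of flips)
    basis = {}
    for i in range(32):
        vec, combo = crc_delta(n, first + i), 1 << i
        for bit in range(31, -1, -1):
            if not (vec >> bit) & 1:
                continue
            if bit not in basis:
                basis[bit] = (vec, combo)
                break
            vec ^= basis[bit][0]
            combo ^= basis[bit][1]
    # 32 consecutive bits reach every possible CRC change, so this solves.
    need, flips = zlib.crc32(data) ^ target, 0
    for bit in range(31, -1, -1):
        if (need >> bit) & 1:
            need ^= basis[bit][0]
            flips ^= basis[bit][1]
    out = bytearray(data)
    for i in range(32):
        if (flips >> i) & 1:
            out[(first + i) // 8] ^= 1 << ((first + i) % 8)
    return bytes(out)


corrupt = b"pretend this is a damaged episode file"
target = 0x4F3FDE64  # hypothetical CRC-32 published for the release
print(zlib.crc32(forge_crc32(corrupt, target)) == target)  # True
```

The patched bytes still differ from the real file, so the CRC-32 "matches" while the file stays corrupt. No comparable shortcut is known for matching a given MD5 or SHA1 value.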
Skywalka
Posts: 889
Joined: Tue Sep 16, 2003 7:57 pm

Post by Skywalka »

/sidenote: I know why I kept all my files on HDD. I knew a new hash would come along some day, and it came less than a year after MD5 and SHA1 were accepted as valid fields :-)

I really think that if TTH is accepted as another hash, we should also think very hard about other hashsums that might become necessary. At some point I won't be willing to pull out all the old discs and hash my whole collection again to get the info right (and since I strive to keep AniDB up to date, it would really, really, really bug me to know a new hashsum had been implemented that I could only fill in for my file entries by running AoM over them again).

So please, if you really consider adding this hash, add all the other hashes that might one day be necessary.
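The cost of adding hashes is mostly in reading the file, so a hasher that feeds every digest from the same buffer pays the disk cost once, however many digests it computes. A minimal sketch of that single-pass approach using only Python's standard library (CRC-32, MD5, SHA1). TTH and ed2k are left out because `hashlib` does not provide Tiger and MD4 availability depends on the OpenSSL build. `hash_all` and the file name in the usage line are hypothetical.

```python
import hashlib
import zlib
from typing import BinaryIO, Dict

CHUNK = 1 << 20  # read 1 MiB at a time


def hash_all(stream: BinaryIO) -> Dict[str, str]:
    """Compute CRC-32, MD5 and SHA1 in a single pass over the stream."""
    md5, sha1, crc = hashlib.md5(), hashlib.sha1(), 0
    while chunk := stream.read(CHUNK):
        md5.update(chunk)
        sha1.update(chunk)
        crc = zlib.crc32(chunk, crc)  # running CRC carried between chunks
    return {"crc32": f"{crc:08x}", "md5": md5.hexdigest(),
            "sha1": sha1.hexdigest()}
```

Usage: `with open("episode.mkv", "rb") as f: print(hash_all(f))`. Adding another digest is one more `update` call per chunk, not another pass over the old discs.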