TTH hash support [tracked]
Moderator: AniDB
Now that DC++ and Valknut (ex DCGui-Qt) support TTH hashes for their files, it would be nice if AniDB added support for those, too.
Here are some links that you may find useful:
http://www.open-content.net/specs/draft ... ex-02.html
http://www.dslreports.com/faq/9677
http://dcplusplus.sourceforge.net/forum ... hp?t=11910
http://wallie.selwerd.nl/source/tthsum/
You could take a look at Bitcollider. Since it is released in the Public Domain there shouldn't be any issues with "stealing" the code for the magnet links. Note that TTH is a different thing (though it is related to the magnet links which use SHA1).
The SHA1 in the magnet links is in BASE32 I think, as opposed to the SHA1 hashes from AniDB which are in BASE16. There was also a post with the alphabet they use on the Shareaza forum (I'll try to find it). I don't know if it's possible to convert between BASE16 and BASE32 if you have one of them. If it is, you could generate the BASE32 SHA1 hash from the BASE16 hashes which are already in AniDB.
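On the BASE16 to BASE32 question: both are just text encodings of the same 20 raw SHA1 bytes, so going from one to the other should not require rehashing anything. A minimal sketch in Python, assuming the standard A-Z, 2-7 BASE32 alphabet; the function name sha1_hex_to_base32 is made up for illustration and is not anything AniDB provides:

```python
import base64


def sha1_hex_to_base32(hex_digest: str) -> str:
    """Re-encode a BASE16 (hex) SHA1 digest as BASE32 text."""
    raw = bytes.fromhex(hex_digest)  # decode the hex text to raw bytes
    if len(raw) != 20:
        raise ValueError("a SHA1 digest is exactly 20 bytes (40 hex characters)")
    # 20 bytes = 160 bits = exactly 32 BASE32 characters, so no '=' padding appears
    return base64.b32encode(raw).decode("ascii")


# SHA1 of an empty file, as it would be stored in BASE16
print(sha1_hex_to_base32("da39a3ee5e6b4b0d3255bfef95601890afd80709"))
```

The reverse direction works the same way with base64.b32decode followed by .hex().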
The Magnet-URI specification is universal and not bound to Gnutella2 (Shareaza first implemented it), Gnutella or Direct Connect clients. It can be implemented by any program.
Using the SHA1 hash in BASE32 and the filesize it would be possible to generate Magnet links, besides the Edonkey links which are already generated; this would open the door for a whole new bunch of file-sharing clients.
The most basic Magnet URI has the following structure:
magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C
where YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C is the SHA1 hash in BASE32. Note that it doesn't use forward slashes (//) like the Edonkey links, but a question mark (?).
You can also add a name to the Magnet URI, like this:
magnet:?xt=urn:sha1:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C&dn=A+Nice+Anime.avi
Plus signs (+) are used instead of spaces (I think you can also use %20; it depends on the client I guess).
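Putting the two forms above together, generating such a link is mostly string concatenation. A rough sketch in Python; make_magnet is a hypothetical helper name, and it only covers the xt and dn parameters described above:

```python
from typing import Optional
from urllib.parse import quote_plus


def make_magnet(sha1_base32: str, name: Optional[str] = None) -> str:
    """Build a basic Magnet URI from a BASE32 SHA1 hash and an optional file name."""
    uri = "magnet:?xt=urn:sha1:" + sha1_base32
    if name:
        # quote_plus turns spaces into '+' and percent-escapes unsafe characters
        uri += "&dn=" + quote_plus(name)
    return uri


print(make_magnet("YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C", "A Nice Anime.avi"))
```

For the example hash and name this yields the same link as the one shown above. Whether a given client prefers + or %20 for spaces would still need checking against that client.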
I won't go into explaining the TTH issue here, but just remember that TTH hashes can be added to Magnet URIs as extra information (however, they are not necessary). Also, on their own, TTH hashes are useless. I may be wrong here, since it's been a while since I read about the subject, but I think the chances of me being wrong on this are pretty slim.
Currently, Magnet URIs are implemented by Shareaza (Gnutella2), a bunch of Gnutella clients, Kazaa and DC++. I may have missed a few clients but that's the majority of them.
Some misc links:
- Some SHA1 stuff in Perl
- Some more Perl and PHP SHA1 stuff. Also some conversion between bases and the alphabet used for the SHA1 hashes
- The RFC that describes the BASE32 SHA1 encoding and the alphabet used
analogued: magnet links don't mean SHA1 hashes, they can contain any hash that can more-or-less uniquely identify a file, be it SHA1, md5 or TTH. DC++ and Valknut only support TTH hashes, so a magnet without one is useless to them.
As for hashing speed, I started hashing my complete Valknut share today (1.4 TiB), and I'll report how long it takes (btw, I read on the DC++ forum that the Shareaza folks have released an assembler-optimized TTH hasher, which is supposed to be 80% faster than the implementation in DC++).
tthsum's manpage says that it uses TTH code copied directly from DC++. I got these results on my Linux box, while Valknut was hashing files in the background (tried twice, results were about the same):
# ll Ah\!\ My\ Goddess\ CD1.mkv
-rw-rw---- 1 ender sambaadmin 732177518 apr 12 08:24 Ah! My Goddess CD1.mkv
# time nice -n -19 md4sum -e Ah\!\ My\ Goddess\ CD1.mkv
ed2k://|file|Ah! My Goddess CD1.mkv|732177518|4f3fde6483f1c90266ae4fba2bbf3c92|
real 0m47.770s
user 0m7.623s
sys 0m4.577s
# time nice -n -19 tthsum Ah\!\ My\ Goddess\ CD1.mkv
VVTVDOWFUWJTUAIRJJJLJAR3MOTZIR6APXGKJZY Ah! My Goddess CD1.mkv
real 1m0.307s
user 0m20.633s
sys 0m4.094s
ender wrote: analogued: magnet links don't mean SHA1 hashes, they can contain any hash that can more-or-less uniquely identify a file, be it SHA1, md5 or TTH.
Yes, you're right about that... I forgot to mention it in my post. However, I think TTH has a different meaning and use in the case of the SHA1 Magnet URIs (or maybe this is just Shareaza's case).
ender wrote: DC++ and Valknut only support TTH hashes, so a magnet without one is useless to them.
Just checked the DC++ forum and that's correct. I don't know why I remembered seeing SHA1 over there... Oh well.
However, between the two, I think the SHA1 Magnet-URI is more popular.
PetriW wrote: I would prefer to not want to implement this soon. ;)
Question is, how important is it? I mean, how many are using the sha1 and md5 hashes? Anyone?
MD5 is actually my standard file checking mechanism, so I appreciate having it. Certainly just having CRC-32 wouldn't be good, as that's nowhere near being a secure hash.
I'd love to see TTH hashes in AOM, as some of the eps I get come from DC++, where at times it's a gamble whether I get the right file or a corrupted one.
example:
I search for 'some series - ep 04' and the results might look like this
some series - ep 04 189.432.500 <no TTH hash>
some series - ep 04 189.432.500 <no TTH hash>
some series - ep 04 189.432.500 <4f3fde6483f1c90266ae>
some series - ep 04 189.432.500 <4f3fde6483f1c90266ae>
some series - ep 04 189.432.500 <65fds98h70j9804jlj099>
some series - ep 04 189.432.500 <65fds98h70j9804jlj099>
some series - ep 04 189.432.500 <432jh7khj5fhg5432345>
some series - ep 04 189.432.500 <432jh7khj5fhg5432345>
(note, the 'hashes' provided here are purely fictional, I just hammered the keyboard a little; it's only for demonstration)
the only thing I know for certain is that the filesize of the file I want is the correct size, but as for which of the four different files would be the uncorrupted one, that's a pure gamble. if I knew what TTH hash to look for it would be easy to spot the right one.
besides, if AOM had TTH hashes I could make my DC++ client search for the hash instead of the file name and then get the right file right away. one other thing to think about is that even more ppl could be persuaded to start using AOM if it provided TTH hashes, as the anime community on DC++ is fairly large and they would get access to a huge database that provides TTH hashes and lets them find the perfect files. as a side effect, DC++ might in time be cleaned of all the corrupted stuff that's shared over there.
as for the sha and md5 hashes, I've never used them..
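To illustrate the idea of picking the right result once a trusted TTH is known, here is a small Python sketch. Everything in it is made up for demonstration: the SearchResult type, the matching_results helper and the placeholder hash strings are assumptions, not anything DC++ or AOM actually exposes.

```python
from typing import List, NamedTuple, Optional


class SearchResult(NamedTuple):
    name: str
    size: int
    tth: Optional[str]  # None when the sharing client published no TTH


def matching_results(results: List[SearchResult], known_tth: str) -> List[SearchResult]:
    """Keep only the results whose TTH equals the trusted one (case-insensitive)."""
    wanted = known_tth.upper()
    return [r for r in results if r.tth is not None and r.tth.upper() == wanted]


# Fictional search results: same name and size, different (placeholder) hashes
results = [
    SearchResult("some series - ep 04", 189432500, None),
    SearchResult("some series - ep 04", 189432500, "FAKEHASHAAAA"),
    SearchResult("some series - ep 04", 189432500, "FAKEHASHBBBB"),
    SearchResult("some series - ep 04", 189432500, "FAKEHASHCCCC"),
]

good = matching_results(results, "fakehashbbbb")
print(len(good), good[0].tth)
```

Name and size are identical in every row, so only the hash comparison separates the wanted file from the others.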
DonGato wrote: Have you ever found a problem by using CRC-32?!
I had problems with stupid people who think that if they have a corrupt file that doesn't match its CRC32, all they have to do is flip a few bits to make it match, corrupting the file even further - just search Google for such "checksum fixers". It's (almost) impossible to do that with MD5, and as pelican already said, MD5 is widely used for corruption checking (especially in the *nix world), so having them is a really good thing(tm).
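For anyone who wants to check a file against both values in one pass, a minimal Python sketch using only the standard library; file_checksums is a name invented here, and the 1 MiB chunk size is an arbitrary choice:

```python
import hashlib
import zlib
from typing import BinaryIO, Tuple


def file_checksums(stream: BinaryIO, chunk_size: int = 1 << 20) -> Tuple[str, str]:
    """Return (crc32_hex, md5_hex) for a binary stream, reading it in chunks."""
    crc = 0
    md5 = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)  # running CRC-32 over the data read so far
        md5.update(chunk)
    return format(crc & 0xFFFFFFFF, "08x"), md5.hexdigest()


# Usage, with a hypothetical file name:
# with open("episode.mkv", "rb") as f:
#     print(file_checksums(f))
```

A file "fixed" to match its CRC32 would still show a different MD5 here, which is the point made above.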
/sidenote: I know why I kept all my files on HDD. I knew a new hash would come along some day, and it came less than a year after MD5 and SHA1 were accepted as valid fields.
I really think that if TTH is accepted as another hash, we should also think very hard about other hashsums that might become necessary. From a certain point on I won't be willing to pull out all the old discs and hash my whole collection again to get the info right (and since I strive to keep AniDB up to date, it would really really really bug me if I knew there was a new hashsum implemented and I wouldn't be able to update the file entries by running AoM over them).
So please, if you really consider adding this hash, add all the other hashes that might one day be necessary as well.