Use NTFS ADS (for 0.6)
Posted: Fri Nov 12, 2004 11:18 pm
ADS are Alternate Data Streams. They are a feature of the NTFS file system on Windows XP (not sure about 2000) and later. These streams are stored alongside the file and persist for the file's lifetime, even if it is copied or moved to another NTFS volume. Regular file access, e.g. sharing the file with a P2P program or watching it, does not interfere with ADS at all.
Now for the actual suggestion...
If a file known to AOM is renamed or moved to a different location (e.g. whenever I move a series to a different folder as soon as it has been completed), AOM needs to re-hash the file and will then think that it's a duplicate. Hashing is a fairly expensive operation, however. Hence my suggestion: After hashing a file, store the ed2k-hash and filesize in an ADS on that file. After discovering a new file, check if an ADS with an ed2k-hash and filesize exists for the file and, if present, use that hash to identify the file (if the filesize matches). Possibly also store the location where the file was first hashed. If that is present, look up that location in the known-files list and check whether the file still exists there; if it does not, assume that the file has been moved and update the known-files list accordingly.