flat assembler
Message board for the users of flat assembler.
file compare utility, loop through drive's folders
kalambong 23 Jan 2013, 05:58
Sleepsleep, when you are awake, maybe you might consider expanding your filecompare utility a bit:
What is needed right now is a bit-by-bit compare utility that compares an ISO file (essentially a DVD image file) against the image that has been written onto a DVD-R (or DVD-RW) disc. Disc burning utilities such as Nero have this function built in. Unfortunately, there is no standalone utility that can compare an ISO file with the DVD (or CD) disc itself.
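(A minimal sketch of such a tool, in Python for brevity: it assumes the burned disc is readable as a raw block device, and the device paths shown, \\.\D: on Windows or /dev/sr0 on Linux, are illustrative and must be adjusted for your system.)

Code:
import sys

# 1 MiB, a multiple of the 2048-byte ISO-9660 sector size, so reads
# from the raw device stay sector-aligned
CHUNK = 1024 * 1024

def compare_iso_to_disc(iso_path, device_path):
    """Compare an ISO image byte by byte against a raw disc device.

    Only the first len(iso) bytes of the device are checked, because a
    burned disc may be padded beyond the image size.
    """
    with open(iso_path, "rb") as iso, open(device_path, "rb") as disc:
        offset = 0
        while True:
            a = iso.read(CHUNK)
            if not a:
                return True                 # whole image matched
            b = disc.read(len(a))
            if a != b:
                print(f"mismatch in the {len(a)} bytes at offset {offset}")
                return False
            offset += len(a)

if __name__ == "__main__":
    # usage: python isocmp.py image.iso \\.\D:   (or /dev/sr0)
    ok = compare_iso_to_disc(sys.argv[1], sys.argv[2])
    print("identical" if ok else "different")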
revolution 23 Jan 2013, 06:14
If you have a database of your drive's contents, then you only need to store each file's hash and associate it with the name and path. After that, finding duplicates is easy with a simple function that sorts the hashes and exposes the duplicates.
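(A minimal sketch of that approach in Python; find_duplicates and the in-memory table are illustrative, a real tool would persist the table as the drive database.)

Code:
import hashlib
import os
from collections import defaultdict

def file_hash(path, chunk=1 << 20):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def find_duplicates(root):
    """Group every file under root by content hash and return the
    groups that contain more than one path."""
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                by_hash[file_hash(path)].append(path)
            except OSError:
                pass                        # unreadable file; skip it
    return {h: p for h, p in by_hash.items() if len(p) > 1}

Grouping with a dictionary is equivalent to the sort-and-scan described above.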
You could also extend this to backup drives, to ensure that the backups have a copy of each file, by comparing the hashes and exposing singletons that exist in only one place.

I don't know how you can keep the hash table synchronised with file system updates. This may be the really difficult part: making sure that the stored hashes are up-to-date each time a file is changed, deleted or added. Perhaps, if you use an FS that supports alternate streams, the hash can be put into a new stream together with a date/time field showing when the hash was last computed? That still requires you to periodically scan the drive for outdated hashes, and it suffers from the problem that the normal last-modified date/time attribute is writeable by applications and might hold a false value. I know that TrueCrypt can be set to do this, so a simple search on the last-modified date/time would not show any new updates.
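(On NTFS the alternate stream can be reached with ordinary file APIs by appending :streamname to the path. A sketch of that caching scheme, assuming Windows/NTFS; the stream name is made up. Writing the stream touches the file's last-modified time, so the sketch restores the timestamps afterwards.)

Code:
import hashlib
import os

STREAM = ":filehash"        # illustrative alternate-data-stream name

def store_hash(path):
    """Cache the file's SHA-256 plus its current mtime in an ADS."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(1 << 20):
            h.update(block)
    times = (os.path.getatime(path), os.path.getmtime(path))
    with open(path + STREAM, "w") as s:
        s.write(f"{h.hexdigest()} {times[1]}")
    os.utime(path, times)    # writing the stream touched the file's
                             # timestamps; put them back

def hash_is_stale(path):
    """True if the file changed (by mtime) since the hash was cached.
    As noted above, mtime can be forged, so this is only a heuristic."""
    try:
        with open(path + STREAM) as s:
            _digest, stamp = s.read().split()
    except OSError:
        return True                         # no cached hash yet
    return os.path.getmtime(path) != float(stamp)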
ejamesr 25 Jan 2013, 22:53
Comparing hash codes does NOT tell you that two files are identical, but it CAN tell you they are not identical (when the hash values do not match). When the hash values do match, it is still possible the files are not the same, so they should be compared bit by bit to determine whether they are, in fact, duplicates. Comparing the file sizes helps, too (if the sizes differ, of course the files differ). But even when the file size and the hash both match, the files can still be different, so a byte comparison is needed.

As revolution pointed out, you need some fool-proof way of making sure that whatever heuristics you use are based upon correct, up-to-date information. Otherwise, you risk accidentally destroying data.
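(That layered check is short to write down; a sketch using Python's standard filecmp, where shallow=False forces the actual byte-by-byte read.)

Code:
import filecmp
import os

def confirmed_duplicate(path_a, path_b):
    """Call two files duplicates only after a full content comparison.

    The size test is free, a stored hash (if available) is a cheap
    filter, and filecmp.cmp(..., shallow=False) performs the final
    byte-by-byte read before anything gets deleted.
    """
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return filecmp.cmp(path_a, path_b, shallow=False)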
revolution 29 Jan 2013, 05:51
ejamesr wrote: Comparing hash codes does NOT tell you that two files are identical ...
Indeed this is theoretically true. But for practical purposes this won't be an issue when using "good" hashes like SHA256, Whirlpool, etc. Hashes have traditionally been used to determine whether a file has been changed at all, and the more hash output bits, the greater the confidence (assuming you have a good way of determining the hash algorithm is good). But it is still possible for two totally different files to have the same hash ...
baldr 29 Jan 2013, 17:27
sleepsleep,
Beware of hard links/symlinks and mount points (in general, reparse points on NTFS).
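(A sketch of that guard in Python: hard links are detected via the (st_dev, st_ino) pair, which Python also fills in on NTFS; junction/reparse-point detection varies by Python version, so os.path.islink is used here as the common denominator.)

Code:
import os

def regular_files(root):
    """Yield each real file exactly once: skip symlinks, and report a
    hard-linked file only the first time its (device, inode) is seen."""
    seen = set()
    # os.walk does not descend into symlinked directories by default
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue                    # symlink; the target is
                                            # visited on its own
            st = os.stat(path)
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue                    # hard link already counted
            seen.add(key)
            yield path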
pelaillo 29 Jan 2013, 19:01
Use git for that. Fast and reliable.
hopcode 29 Jan 2013, 23:48
ejamesr wrote: ...some fool-proof way of making sure that whatever heuristics you use...
Clusters, sectors, etc.: there is a general glossary at http://www.cnwrecovery.com/html/ntfs_forensic.html
Cheers,
_________________
⠓⠕⠏⠉⠕⠙⠑
ejamesr 30 Jan 2013, 01:44
revolution wrote: Indeed this is theoretically true. But for practical purposes this won't be an issue when using "good" hashes like SHA256, Whirlpool, etc. Hashes have traditionally been used to determine whether a file has been changed at all, and the more hash output bits, the greater the confidence (assuming you have a good way of determining the hash algorithm is good). But it is still possible for two totally different files to have the same hash...

You are probably right; I don't know the real probabilities here, but I'm not so sure whether this "won't be an issue" or merely "shouldn't be an issue". To me, it still seems safer to perform a bit comparison before deleting a file that a hash comparison says is an exact duplicate. Or at least, in a commercial product, let the end user decide which method is used to determine duplicates, thereby shifting the burden onto the user.
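(For a rough sense of the numbers, a standard birthday-bound estimate, not a figure from the thread: with n files and a k-bit hash, the probability of any accidental collision is approximately

P(collision) ≈ n² / 2^(k+1)

so a billion files, n ≈ 2^30, under SHA-256 with k = 256 give about 2^60 / 2^257 = 2^-197, far below the chance of a misread sector. A deliberately engineered collision is a different threat, which is one more reason the final bit comparison remains the safer policy.)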
sleepsleep 21 Jun 2018, 19:59
i still think i need such a tool, after 5 years.