Enhanced version of finddupe, a duplicate file detector and eliminator for Windows, originally by Matthias Wandel.
I really like finddupe when I look for duplicate files. It is fast and clever. The match candidates are clustered according to the signature of the first 32k, then checked byte for byte. It can also create and find NTFS hard links. Creating hard links saves you disk space. Listing all existing hard links is very difficult otherwise.
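The two-stage matching described above can be sketched roughly as follows. This is an illustrative Python sketch, not finddupe's actual C implementation: the use of CRC32 and of the file size as part of the signature are assumptions made for the example, and the names `signature`, `same_content`, and `find_duplicates` are hypothetical.

```python
import os
import zlib
from collections import defaultdict

SIGNATURE_BYTES = 32 * 1024  # "first 32k", as in the description above


def signature(path):
    """Cheap fingerprint (assumed here): file size plus CRC32 of the first 32 KiB."""
    with open(path, "rb") as f:
        head = f.read(SIGNATURE_BYTES)
    return os.path.getsize(path), zlib.crc32(head)


def same_content(a, b, chunk=1 << 16):
    """Byte-for-byte comparison, reading both files in chunks."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            ca, cb = fa.read(chunk), fb.read(chunk)
            if ca != cb:
                return False
            if not ca:  # both files ended at the same point
                return True


def find_duplicates(paths):
    """Return lists of paths whose contents are identical."""
    # Stage 1: cluster match candidates by their signature.
    clusters = defaultdict(list)
    for p in paths:
        clusters[signature(p)].append(p)

    # Stage 2: confirm candidates within each cluster byte for byte.
    groups = []
    for candidates in clusters.values():
        remaining = candidates
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            dupes = [first] + [p for p in rest if same_content(first, p)]
            if len(dupes) > 1:
                groups.append(dupes)
            remaining = [p for p in rest if p not in dupes]
    return groups
```

The point of the first stage is that most files can be ruled out after reading only 32 KiB each; the expensive full comparison runs only on files that already share a signature.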
Please refer to Matthias' site for the full description. My favourites are
finddupe -bat d:\ImageLibray\Hardlinks_to_be_created.bat -ref d:\ImageLibray\originals1\** -ref d:\ImageLibray\originals2\** d:\ImageLibray\**\*.jpg
to remove duplicates in an image collection, and
finddupe -listlink d:\ImageLibray
to list the existing hard links.
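To illustrate what listing hard links involves, here is a minimal Python sketch that groups paths pointing at the same file on disk. It is not how finddupe implements -listlink; it assumes that `os.lstat` reports a usable file index (`st_ino`), volume identifier (`st_dev`), and link count (`st_nlink`) on the platform, which Python documents for both Windows and POSIX. The function name `list_hardlinks` is hypothetical.

```python
import os
from collections import defaultdict


def list_hardlinks(root):
    """Group paths under root that refer to the same file on disk."""
    by_identity = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only files with more than one link can be part of a group.
            if st.st_nlink > 1:
                by_identity[(st.st_dev, st.st_ino)].append(path)
    # A file may have further links outside root; report only groups
    # where at least two of the links were actually seen.
    return [sorted(paths) for paths in by_identity.values() if len(paths) > 1]
```

A call such as `list_hardlinks(r"d:\ImageLibray")` would return one sorted list of paths per linked file.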
However, Matthias' current version 1.23 does not meet my requirements. It is also ASCII-only and fails on non-ASCII filenames, which are common nowadays.
I added the following features to finddupe:
- multiple reference directories that shall not be touched (v1.24)
- Unicode support (v1.25)
- alert message if order of options is wrong (v1.26)
- support for ignoring files by patterns (v1.26)
- checking for NTFS file system in batch and hardlink mode (v1.27)
- performance optimizations (especially for very large amounts of files) (v1.28)
- new option to skip linked duplicates in output list (v1.30)
- 64-bit version for addressing more memory (for large amounts of files) (v1.33)
It works for me, but some more testing is desirable.
I've updated the project to use Visual Studio 2019.
finddupe v1.32 compiled Jan 27 2024
an enhanced version by thomas694 (@GH), originally by Matthias Wandel
This program comes with ABSOLUTELY NO WARRANTY. This is free software, and you
are welcome to redistribute it under certain conditions; view GNU GPLv3 for more.
Usage: finddupe [options] [-ign <substr> ...] [-ref <filepat> ...] <filepat>...
Options:
-bat <file.bat> Create batch file with commands to do the hard
linking. run batch file afterwards to do it
-hardlink Create hardlinks. Works on NTFS file systems only.
Use with caution!
-del Delete duplicate files
-v Verbose
-sigs Show signatures calculated based on first 32k for each file
-rdonly Apply to readonly files also (as opposed to skipping them)
-z Do not skip zero length files (zero length files are ignored
by default)
-u Do not print a warning for files that cannot be read
-sl Skip linked duplicates and show only unlinked ones
-p Hide progress indicator (useful when redirecting to a file)
-j Follow NTFS junctions and reparse points (off by default)
-listlink Hardlink list mode. Not valid with -del, -bat, -hardlink,
or -rdonly options
-ign <substr> Ignore file pattern, eg. .bak or .tmp (repeatable)
-ref <filepat> Following file pattern are files that are for reference, NOT to
be eliminated, only used to check duplicates against (repeatable)
filepat Pattern for files. Examples:
c:\** Match everything on drive C
c:\**\*.jpg Match only .jpg files on drive C
**\foo\** Match any path with component foo
from current directory down
Latest release can be found here.
- originator: Matthias Wandel
- additional features: thomas694
finddupe by thomas694
is licensed under GNU GPLv3.
Based on a work at https://www.sentex.ca/~mwandel/finddupe/.