diff --git a/README.md b/README.md
index 90958e1..fac39ea 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,112 @@
 # finddupes
-finds duplicate files based on hash and deletes them based on a given pattern
+
+Finds duplicate files based on hash and deletes them based on a given pattern.
+
+
+## Background
+
+`finddupes` tries to be efficient by
+
+- comparing file sizes before running expensive hash calculations
+- using hash tables to find duplicate sizes/hashes in constant time on average
+- using the fast [xxHash](https://github.com/Cyan4973/xxHash) algorithm to calculate hashes
+- running things in parallel. This only really helps if the directories to be searched reside on different media
+- using an optional "cache" that can be reused and extended for multiple searches/deletions
+
+What `finddupes` does not do:
+
+- try to find very similar files (fuzzy search)
+
+## Usage
+
+Run finddupes with the `-help` flag to get all options:
+
+    finddupes -help
+
+The execution can be interrupted with `Ctrl-c`. This will gracefully finish all calculation
+and write operations before shutting down.
+
+### Find duplicates in given directories
+
+This will list all found duplicates.
+
+    finddupes [path...]
+
+Depending on the number and size of files this can take a long time. For a large number of files
+it is recommended to index all files first and store them in a database file.
+See the next section.
+
+### Index files
+
+Index all files from given directories recursively and store them in a database file.
+
+    finddupes -verbose -storeonly -path [path...]
+
+e.g.
+
+    finddupes -verbose -storeonly -path pics.db ~/Pictures ~/Videos ~/DCIM
+
+
+After indexing files, one or more actions can be run to delete duplicates.
+A single last file will always be kept, regardless of whether there is a match or not.
+
+The default is a dry run. To actually delete files, add the `-delete` flag.
+
+
+As an alternative to indexing first, all actions can be run on the fly by not passing
+the `-path ` parameter.
+
+    finddupes -delmatch ~/Pictures ~/Videos
+
+See the next sections for a list of possible actions.
+
+
+### Delete duplicates based on a pattern
+
+Delete duplicates whose path matches the given regex.
+
+    finddupes -path -delmatch
+
+e.g.
+
+    finddupes -path pics.db -delmatch '\.jpe?g$'
+
+
+### Keep duplicates based on a pattern
+
+Keep duplicates whose path matches the given regex.
+
+    finddupes -path -keepmatch
+
+e.g.
+
+    finddupes -path pics.db -keepmatch '_original$'
+
+
+### Keep most recent duplicate
+
+Keep the most recent duplicate, delete all others. Based on modification time (mtime).
+
+    finddupes -path -keeprecent
+
+
+### Keep oldest duplicate
+
+Keep the oldest duplicate, delete all others. Based on modification time (mtime).
+
+    finddupes -path -keepoldest
+
+
+### Keep first duplicate
+
+Keep the first duplicate based on lexically sorted file *paths* (not file names), delete all others.
+
+    finddupes -path -keepfirst
+
+
+### Keep last duplicate
+
+Keep the last duplicate based on lexically sorted file *paths* (not file names), delete all others.
+
+    finddupes -path -keeplast
+
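The README's Background section describes grouping files by size first and hashing only the candidates that share a size, and the last two sections describe keeping the first or last path in lexical order. The following is a minimal Python sketch of those two ideas, not the actual `finddupes` implementation. It rests on several assumptions: SHA-256 from the standard library stands in for xxHash, there is no parallelism and no cache, and the function names `file_digest`, `find_duplicates`, and `plan_deletions` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    h = hashlib.sha256()  # assumption: stand-in for xxHash (not in the stdlib)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return groups of paths with identical content, each group sorted lexically."""
    # Step 1: bucket every regular file by size (cheap, no file reads).
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    by_size[os.path.getsize(path)].append(path)

    # Step 2: hash only files whose size is shared with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


def plan_deletions(group, keep="first"):
    """Dry run: return the paths that would be deleted from one duplicate group."""
    ordered = sorted(group)  # lexical order of full paths, not file names
    keeper = ordered[0] if keep == "first" else ordered[-1]
    return [p for p in ordered if p != keeper]
```

Because `plan_deletions` only returns a list and never removes anything, it mirrors the dry-run default described in the README; exactly one path per group is always left out of the returned list.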