Commit 7e86790 (parent 8c6ce5e) by lixmal, Sep 14, 2023: "Add some basic docs" (README.md: 111 additions, 1 deletion)

# finddupes

Finds duplicate files based on hash and deletes them based on a given pattern.


## Background

`finddupes` tries to be efficient by:

- comparing file sizes before running expensive hash calculations
- using hash tables to find duplicate sizes/hashes in constant time on average
- using the fast [xxHash](https://github.com/Cyan4973/xxHash) algorithm to calculate hashes
- running work in parallel (this mostly helps when the directories being searched reside on different storage media)
- using an optional "cache" that can be reused and extended across multiple searches/deletions

What `finddupes` does not do:

- find very similar files (fuzzy search)

## Usage

Run finddupes with the `-help` flag to get all options:

    finddupes -help

Execution can be interrupted with `Ctrl-C`. This gracefully finishes all pending calculations
and write operations before shutting down.

### Find duplicates in given directories

This lists all duplicates found in the given paths:

    finddupes <path> [path...]

Depending on the number and size of the files, this can take a long time. For a large number of files,
it is recommended to index them first and store the results in a database file
(see the next section).

### Index files

Recursively index all files in the given directories and store them in a database file:

    finddupes -verbose -storeonly -path <db file path> <path> [path...]

For example:

    finddupes -verbose -storeonly -path pics.db ~/Pictures ~/Videos ~/DCIM


After indexing, one or more actions can be run to delete duplicates.
At least one file of each duplicate set is always kept, regardless of whether it matches.

The default is a dry run. To actually delete files, add the `-delete` flag.


Instead of indexing first, all actions can also be run on the fly by omitting
the `-path <db file path>` parameter:

    finddupes -delmatch <pattern> ~/Pictures ~/Videos

See the next sections for a list of possible actions.


### Delete duplicates based on a pattern

Delete duplicates whose path matches the given regex.

    finddupes -path <db file path> -delmatch <pattern>

For example:

    finddupes -path pics.db -delmatch '\.jpe?g$'


### Keep duplicates based on a pattern

Keep duplicates whose path matches the given regex.

    finddupes -path <db file path> -keepmatch <pattern>

For example:

    finddupes -path pics.db -keepmatch '_original$'


### Keep most recent duplicate

Keep the most recent duplicate, based on modification time (mtime), and delete all others.

    finddupes -path <db file path> -keeprecent


### Keep oldest duplicate

Keep the oldest duplicate, based on modification time (mtime), and delete all others.

    finddupes -path <db file path> -keepoldest


### Keep first duplicate

Keep the first duplicate in lexically sorted order of file *paths* (not file names) and delete all others.

    finddupes -path <db file path> -keepfirst
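
The difference between sorting by path and by file name can be sketched as below. This is a hypothetical Go illustration (`keepFirst` is a made-up name): with `/photos/b/a.jpg` and `/photos/a/z.jpg`, path order picks `/photos/a/z.jpg`, while file-name order would have picked `a.jpg`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// keepFirst sorts the full paths lexically and keeps the first one.
func keepFirst(group []string) (keep string, del []string) {
	if len(group) == 0 {
		return "", nil
	}
	sorted := append([]string(nil), group...) // leave the caller's slice untouched
	sort.Strings(sorted)
	return sorted[0], sorted[1:]
}

func main() {
	group := []string{"/photos/b/a.jpg", "/photos/a/z.jpg"}
	keep, del := keepFirst(group)
	fmt.Println("keep:", keep, "delete:", del)

	// For contrast only: ordering by file name would choose a different file.
	byName := append([]string(nil), group...)
	sort.Slice(byName, func(i, j int) bool {
		return filepath.Base(byName[i]) < filepath.Base(byName[j])
	})
	fmt.Println("first by file name would be:", byName[0])
}
```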


### Keep last duplicate

Keep the last duplicate in lexically sorted order of file *paths* (not file names) and delete all others.

    finddupes -path <db file path> -keeplast
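
If "lexically sorted" means plain byte-wise comparison (an assumption; the README does not specify locale handling), uppercase letters sort before lowercase ones. A hypothetical Go sketch (`keepLast` is a made-up name) using `sort.Strings`, which compares bytes:

```go
package main

import (
	"fmt"
	"sort"
)

// keepLast sorts the full paths byte-wise and keeps the last one.
func keepLast(group []string) (keep string, del []string) {
	if len(group) == 0 {
		return "", nil
	}
	sorted := append([]string(nil), group...) // leave the caller's slice untouched
	sort.Strings(sorted)
	n := len(sorted)
	return sorted[n-1], sorted[:n-1]
}

func main() {
	// Byte order: "/Pics" < "/backup" < "/pics", because 'P' < 'b' < 'p'.
	keep, del := keepLast([]string{"/pics/a.jpg", "/Pics/a.jpg", "/backup/a.jpg"})
	fmt.Println("keep:", keep, "delete:", del)
}
```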
