(actually only two types of file, lol)
A command line tool, written in Rust, to redact texts in file(s) that match a certain regex.
It is inspired by the following:
- Given a file in
.txt
and.docx
and a list of regexes that one wishes to redact,raf
redacts it with random alphanumeric characters in place of all texts that match any of the regexes. - It also provides a mapping in the form of a
.json
file between the original text and the redacted text.
Disclaimer:
Sadly, for now, you can only redact .txt
and .docx
files.
You need to have Rust installed first.
- Git clone this repo
$ git clone https://github.com/jymchng/raf.git
cargo run
it
Example, to redact all phone
and emails
texts for all files in a folder, use:
$ cargo run -- folder ./tests/test_files -t phone emails
Note the folder
subcommand.
Or you can use:
$ cargo run -- folder ./tests/test_files -t phone email
This is because email
and emails
are categorized under the list of types
in the patterns.json
file, i.e.:
{
"pattern": "([a-z0-9_+]([a-z0-9_+.\\-]*[a-z0-9_+\\-])?)@([a-z0-9]+([\\-\\.]{1}[a-z0-9]+)*\\.[a-z]{2,6})",
"type": [
"email",
"emails"
]
}
To redact files in sub-directories of your folder
, recursively, use:
$ cargo run -- folder ./tests/test_files -t phone -r
Note the -r
parameter.
A .json
file is also provided as a mapping between the redacted text and the original text.
Example:
[
{ "unredacted_text": "jane-doe@my-university.edu",
"redacted_text": "QLfyzaoQxQGHDWmaECMl1Gxdf8“
},
...
{ "unredacted_text": "john.doe@example.com",
"redacted_text": "nM0F4AehzbLDw4RYhGn4"
},
…
]
You can compile and build the binary using:
$ cargo install --git https://github.com/jymchng/raf.git
Then use the commands as discussed in the previous section. Example:
$ raf file ./tests/test_files/file1.txt -t phone email
Use the --help
command to read the help
section of raf
.
Example:
$ raf folder --help