domain-sift is a Perl script that extracts unique domains from at least one provided file and prints them to standard output in a given format. If no file is provided, domain-sift reads from standard input instead.
One use of this utility is to extract domains from blocklists that contain known malicious or otherwise undesirable domains, and then format them in such a way that those domains can be blocked by a DNS resolver.
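To make "extracting unique domains" concrete, here is a rough shell approximation of the idea. It is only a sketch and not domain-sift's actual matching logic: the function name `approx_extract_domains` and the regular expression are assumptions made for illustration, and the pattern is far looser than a real domain validator.

```shell
# Hypothetical approximation, NOT how domain-sift is implemented.
# Reads text on standard input and prints each domain-like token once.
approx_extract_domains() {
    # Lowercase everything so that duplicates differing only in case collapse.
    LC_ALL=C tr 'A-Z' 'a-z' |
        # Keep only tokens that look like dotted names ending in an
        # alphabetic label; this skips IP addresses such as 0.0.0.0.
        LC_ALL=C grep -oE '([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}' |
        # Sort and drop duplicates.
        LC_ALL=C sort -u
}
```

Fed a hosts-style blocklist on standard input, this prints one domain per line, which is the general shape of output the default format is meant to produce.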
- Project structure
- Installation
- Documentation
- domain-sift and unwind
- domain-sift and unbound
- domain-sift and unbound (RPZ)
- Regarding blocklist sources
- Caveats
- License
|-- Changes
|-- LICENSE
|-- MANIFEST
|-- Makefile.PL
|-- README.md
|-- bin
|   `-- domain-sift
|-- lib
|   `-- Domain
|       |-- Sift
|       |   |-- Manipulate.pm
|       |   `-- Match.pm
|       `-- Sift.pm
`-- t
    |-- 00-load.t
    |-- Domain-Sift-Manipulate.t
    |-- Domain-Sift-Match.t
    |-- manifest.t
    |-- pod-coverage.t
    `-- pod.t
To install domain-sift, download the most recent release and run the following commands inside the source directory. Note that domain-sift requires Perl 5.36 or later, because it uses subroutine signatures, which are no longer experimental as of that release.
$ perl Makefile.PL
$ make
$ make test
# make install
After installation, you can read the documentation with perldoc. man often works as well.
$ perldoc Domain::Sift
$ perldoc Domain::Sift::Match
$ perldoc Domain::Sift::Manipulate
$ perldoc domain-sift
Here's how to use domain-sift with unwind(8) on OpenBSD.
- Extract domains from your blocklist source:
$ domain-sift /path/to/blocklist_source > blocklist
- Move your blocklist to /etc/blocklist:
# mv blocklist /etc/blocklist
- Then, modify your unwind.conf to include your new blocklist:
block list "/etc/blocklist"
- Restart unwind:
# rcctl restart unwind
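If you automate these steps, it may be worth guarding against replacing a working blocklist with an empty one, for example after a failed download. The following is a minimal sketch; `install_blocklist` is a hypothetical helper and not part of domain-sift.

```shell
# Hypothetical helper, not part of domain-sift: install a freshly
# generated blocklist only if it is non-empty.
install_blocklist() {
    src=$1
    dest=$2
    # -s is true only when the file exists and is larger than zero bytes.
    if [ ! -s "$src" ]; then
        echo "refusing to install empty blocklist: $src" >&2
        return 1
    fi
    # Copy next to the destination, then rename into place, so the
    # destination is never left half-written.
    cp "$src" "$dest.new" && mv "$dest.new" "$dest"
}
```

As root, something like `install_blocklist blocklist /etc/blocklist && rcctl restart unwind` would then restart unwind only when the new list was actually installed.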
Here's how to use domain-sift with unbound(8) on OpenBSD.
- Extract domains from your blocklist source:
$ domain-sift -f unbound /path/to/blocklist_source > blocklist
- Move the blocklist to /var/unbound/etc.
# mv blocklist /var/unbound/etc/blocklist
- Then, modify your unbound.conf to include your new blocklist:
include: "/var/unbound/etc/blocklist"
- Restart Unbound.
# rcctl restart unbound
domain-sift also supports the Response Policy Zone (RPZ) format. RPZ is defined in this Internet Draft.
By using RPZ, you can define DNS blocking policies in a standardized way. A nice perk of using RPZ is the ability to block wildcarded domains (*.example.com will also block subdomain.example.com, subdomain.subdomain.example.com, and so on).
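The wildcard behavior described above can be illustrated with a small shell function. This is only a sketch of the matching rule, not code from domain-sift or Unbound, and `rpz_wildcard_matches` is a hypothetical name.

```shell
# Hypothetical helper: succeeds if NAME is covered by the RPZ-style
# wildcard PATTERN (for example "*.example.com").
rpz_wildcard_matches() {
    pattern=$1
    name=$2
    case $pattern in
        # Strip the leading "*." to get the suffix that must match.
        '*.'*) suffix=${pattern#'*.'} ;;
        # Anything else is not a wildcard pattern.
        *) return 1 ;;
    esac
    case $name in
        # Any name ending in ".suffix" is covered, at any depth.
        *".$suffix") return 0 ;;
        *) return 1 ;;
    esac
}
```

Note that under this rule the wildcard covers subdomain.example.com and subdomain.subdomain.example.com but not example.com itself, so the bare domain needs its own entry in the zone.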
Here's how to use domain-sift with Unbound and RPZ on OpenBSD.
- Extract domains from your blocklist source:
$ domain-sift -f rpz /path/to/blocklist_source > blocklist
- Then, modify your unbound.conf:
rpz:
    name: rpz.home.arpa
    zonefile: /var/unbound/etc/rpz-block.zone
    #rpz-log: yes
    rpz-signal-nxdomain-ra: yes
NOTE: rpz.home.arpa is just an example. The name entry may be different in your case. In a local area network (LAN) where Unbound runs on the gateway/router, ensure that a local-data entry is present somewhere so that the name you chose resolves. Something like this should work:
local-data: "rpz.home.arpa. IN A x.x.x.x"
You'll need to replace x.x.x.x with the machine's actual IP address.
- Create /var/unbound/etc/rpz-block.zone:
$ORIGIN rpz.home.arpa.
$INCLUDE /var/unbound/etc/blocklist
- Make sure that you move blocklist to the correct location:
# mv /path/to/blocklist /var/unbound/etc/blocklist
- Restart Unbound:
# rcctl restart unbound
To keep things simple, domain-sift only deals with extracting domains from text files and formatting them. It doesn't fetch blocklists or provide them.
This is an explicit part of its design for a few reasons.
- It follows the Unix philosophy: do one thing well; read from a file or STDIN; print to STDOUT.
- It allows domain-sift to use a minimum set of pledge(2) promises through OpenBSD::Pledge(3p).
- The simple design makes it much more flexible and portable.
Here is more or less what I use to fetch blocklists:
$ grep -Ev '^#' blocklist_urls | xargs -- ftp -o - | domain-sift > blocklist
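In the pipeline above, the grep step drops lines that start with `#`, so the URL list can carry comments. A slightly more forgiving variant is sketched below; `list_blocklist_urls` is a hypothetical helper name, and skipping blank lines and indented comments is an addition that the original command does not make.

```shell
# Hypothetical helper: print only the URLs from a list file (or from
# standard input), skipping comment lines and blank lines.
list_blocklist_urls() {
    # Drop lines that are empty, whitespace-only, or whose first
    # non-blank character is '#'.
    grep -Ev '^[[:space:]]*(#|$)' "$@"
}
```

It can stand in for the grep step, as in `list_blocklist_urls blocklist_urls | xargs -- ftp -o - | domain-sift > blocklist`.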
You can find blocklist sources in many places, such as firebog.net.
If you've pulled in a lot of domains, Unbound may fail to start on OpenBSD because it doesn't have enough time to process all of them. You can fix this by increasing Unbound's timeout value.
$ rcctl get unbound timeout
30
# rcctl set unbound timeout 120
$ rcctl get unbound timeout
120
This software is Copyright © 2023 by Ashlen.
This is free software, licensed under the ISC License. For more details, see the LICENSE file in the project root.