A Python project for the modio housekeeper.
It runs Zabbix housekeeping tasks in a Postgres database.
- Creates new (future) partitions for the history tables
- Drops large (btree) indexes from older partitions
- Creates smaller (brin) indexes on older partitions (see the sketch after this list)
- Cleans out older data (removed, not used) from all partitions
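
The sketch below illustrates the kind of SQL these tasks translate to. It is a rough sketch only: the partition names, the declarative-partitioning DDL, the epoch bounds, and the connection string are all assumptions, not the project's actual code.

```python
import psycopg2

# Illustrative only: partition names, the PARTITION OF DDL and the connection
# string are assumptions; the real naming scheme and schema come from the
# project itself.
conn = psycopg2.connect("dbname=zabbix")
conn.autocommit = True
with conn.cursor() as cur:
    # Create a future partition for the history table (assumes declarative
    # range partitioning on clock; the project may do this differently).
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS history_y2014m02
            PARTITION OF history
            FOR VALUES FROM (1391212800) TO (1393632000)
        """
    )
    # On an older partition, drop the large btree index ...
    cur.execute("DROP INDEX IF EXISTS history_y2014m01_itemid_clock_btree")
    # ... and replace it with a much smaller brin index.
    cur.execute(
        """
        CREATE INDEX IF NOT EXISTS history_y2014m01_itemid_clock_brin
            ON history_y2014m01 USING brin (itemid, clock)
        """
    )
conn.close()
```

Brin indexes are tiny compared to btree indexes and are good enough for the mostly-sequential scans that old partitions see.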
It takes a configuration variable, via the environment, for how long (in days) to keep data, and removes partitions older than that.
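
For illustration, a minimal way to read such a retention from the environment; the only variable actually named in this document is MODIO_ARCHIVE, so the default used here is just a placeholder:

```python
import os

# Retention in days; MODIO_ARCHIVE is the variable named later in this
# document, and the default value here is only illustrative.
ARCHIVE_DAYS = int(os.environ.get("MODIO_ARCHIVE", "365"))
```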
Maintenance is scheduled to run every day, but in practice executes at two times:
- on the 14th of a month
- when archiving data
Expiration is handled at both of these times, with different retention periods:
items whose own retention (item.history) is shorter than the retention currently being enforced are selected for expiration, and their data points older than that retention (in days) are deleted (see the sketch after the list below).
In practice this means we currently only care about three kinds of values of item.history:
- Below 14 days (cleaned on the 14th of every month)
- Below MODIO_ARCHIVE days (cleaned when data is migrated)
- Above MODIO_ARCHIVE days (never expired)
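
A minimal sketch of what that expiration step could look like, assuming the classic Zabbix schema where items.history is an integer number of days and history.clock is a Unix timestamp; the partition name and connection string are placeholders:

```python
import time
import psycopg2

retention_days = 14  # or int(os.environ["MODIO_ARCHIVE"]) on an archive run
cutoff = int(time.time()) - retention_days * 86400
partition = "history_y2014m01"  # placeholder partition name

with psycopg2.connect("dbname=zabbix") as conn, conn.cursor() as cur:
    # Only items whose own history setting is shorter than the retention
    # being enforced right now, only rows older than the cutoff, and only
    # in the partition we are already touching.
    cur.execute(
        f"""
        DELETE FROM {partition} h
         USING items i
         WHERE i.itemid = h.itemid
           AND i.history < %s
           AND h.clock < %s
        """,
        (retention_days, cutoff),
    )
conn.close()
```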
Archiving is defined as migrating the months that are older than the MODIO_ARCHIVE retention period. This makes every value in the migrated data set older than MODIO_ARCHIVE days.
At this point, since we are already touching all parts of the table, we also run an expiry of old data (items with a history setting smaller than the retention period), and delete all rows for those items that are older than expected, but only on the tables we are about to migrate.
On the 14th of the month, we run maintenance on last month's data: we reindex the table, expire data, clean out duplicates, and cluster the table in (itemid, clock) order for efficient queries.
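
A rough sketch of the reindex-and-cluster part of that maintenance, with assumed partition and index names (the real names follow the project's own partition naming scheme):

```python
import psycopg2

partition = "history_y2014m01"            # placeholder partition name
index = f"{partition}_itemid_clock_idx"   # placeholder btree on (itemid, clock)

conn = psycopg2.connect("dbname=zabbix")
conn.autocommit = True
with conn.cursor() as cur:
    # Rebuild the partition's indexes before clustering
    cur.execute(f"REINDEX TABLE {partition}")
    # Physically reorder the partition by (itemid, clock) so per-item range
    # scans read mostly contiguous pages
    cur.execute(f"CLUSTER {partition} USING {index}")
conn.close()
```

CLUSTER takes an exclusive lock and rewrites the whole table, so it is only practical on partitions that are no longer being written to.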
If you do not run the archiver, expiration only happens for items with a retention below 14 days.
The archiver tool moves data into an archive database.
This requires a LOT of setup between the two databases, parts of which are documented in the one-time setup job(s).
To get the one-time setup output for the archive db, run `migration setup_archive`.
This creates a default user with a default password. Replace this password consistently.
Then, on the "primary" database ( the one hosting your live data), run
migration setup_migration
to set up the same (matching) users & databases
After this, you can run `migration oneshot_archive` to generate tables from the year 2014 until today's date. 2014 was picked because we don't have older data than that.
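
Purely to illustrate the span being generated, here is a sketch that enumerates the months from January 2014 up to today; the name format used is an assumption, and the real tables are created by the migration itself:

```python
from datetime import date

# Enumerate month names from January 2014 up to the current month.
# "history_yYYYYmMM" is an assumed example, not the project's actual scheme.
def months(start=date(2014, 1, 1), end=None):
    end = end or date.today()
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield f"history_y{year}m{month:02d}"
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)

for name in months():
    print(name)
```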
Then, you can do a one-time archive (move all existing data) by running `migration slow_migrate`. This is really REALLY slow; you're usually better off importing a backup (`psql` copy to | `psql` copy from).
The actual offloading assumes you're already using a partitioned database with our naming scheme for partitions. Make sure it's run with the correct user, or your permissions will be off.
The archive DB needs pg_hba set up with users, databases, and hosts for:
- the main db
- the machine where the maintenance job runs

This is needed because the maintenance job (archiver) connects to both the main and the archive db to transfer data.

The one-time setup:
- creates a db user on the archive db server
- creates a db on the archive db server
- creates a foreign data wrapper on the primary server
- creates user mappings on the primary server (see the sketch after this list)
- outputs SQL to create archive tables on the archive db
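
A hedged sketch of the foreign data wrapper and user mapping steps on the primary server; the host, database, user, and password are placeholders, and the real SQL comes from the setup migrations:

```python
import psycopg2

# Placeholder names throughout: the real users, passwords, and server names
# are produced by the setup migrations.
ARCHIVE_HOST = "archive.example.com"
ARCHIVE_DB = "zabbix_archive"
ARCHIVE_USER = "archiver"
ARCHIVE_PASSWORD = "change-me"

conn = psycopg2.connect("dbname=zabbix")
conn.autocommit = True
with conn.cursor() as cur:
    # Foreign data wrapper on the primary server, pointing at the archive db
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
    cur.execute(
        f"""
        CREATE SERVER archive_server
            FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (host '{ARCHIVE_HOST}', dbname '{ARCHIVE_DB}')
        """
    )
    # Map the local maintenance user to the matching archive-side user
    cur.execute(
        f"""
        CREATE USER MAPPING FOR {ARCHIVE_USER}
            SERVER archive_server
            OPTIONS (user '{ARCHIVE_USER}', password '{ARCHIVE_PASSWORD}')
        """
    )
conn.close()
```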
Set up the tool to run weekly or similar, and set a time limit for your retention. The cron job will iterate over all partitions older than the archive retention and migrate them to the archive DB. This is done first with a COPY job, then a second-pass select/insert, and finally a clean-out of the table.
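
A simplified sketch of that flow, assuming the postgres_fdw setup above, placeholder partition and foreign-table names, and a helper that knows the project's partition naming scheme; the real archiver verifies each step before dropping anything:

```python
import os
import psycopg2

# MODIO_ARCHIVE (days) is taken from the description above; partition and
# foreign-table names are assumptions.
archive_days = int(os.environ["MODIO_ARCHIVE"])

def partitions_older_than(days):
    """Placeholder: yield names of partitions that lie entirely outside the
    archive retention, following the project's partition naming scheme."""
    return []

conn = psycopg2.connect("dbname=zabbix")
conn.autocommit = True
with conn.cursor() as cur:
    for partition in partitions_older_than(archive_days):
        foreign = f"archive.{partition}"  # assumed foreign table via postgres_fdw
        # First pass: bulk transfer (the real tool does this as a COPY job)
        cur.execute(f"INSERT INTO {foreign} SELECT * FROM {partition}")
        # Second pass: select/insert any rows the first pass missed
        cur.execute(
            f"INSERT INTO {foreign} SELECT p.* FROM {partition} p "
            f"WHERE NOT EXISTS (SELECT 1 FROM {foreign} f "
            f"WHERE f.itemid = p.itemid AND f.clock = p.clock)"
        )
        # Finally, clean the migrated partition out of the primary
        cur.execute(f"DROP TABLE {partition}")
conn.close()
```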
Open questions:
This doesn't quite work together with the retention tool. Should the two be merged? (Either you have an archival db, or you have retention. Don't do both, as the retention tool doesn't know about archive tables.)