Hello,
I have a couple of weird issues restoring a backup made on our production instance (v1.8.6) to one of our staging environments (v1.8.9). This is an automated process in our case and worked until 2-3 weeks ago.
Since I was hit by #21991 I updated from 1.8.6 to 1.8.9; otherwise I was not even able to start the backup. Long term we want to upgrade to 2.0, but until that happens we're stuck with 1.8.x for some time. The whole idea was to test the switch to tsi1 on staging before doing it on our prod instance, hence the backup/restore investigation listed below. But the "real" motivation is that we want to save some memory on our prod instance.
I'm happy to provide more info if needed. Any idea how to proceed here?
Steps to reproduce:
1. Take a backup of the prod instance using influxd backup -portable /some/dir
2. Transfer it to the staging instance using rsync.
3. Try to restore it with influxd restore -portable /var/lib/influxdb/backup, using various config options.
(NOTE: the actual commands are slightly longer due to the dockerized env, but effectively the same.)
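For reference, in the dockerized setup the commands are just wrapped in docker exec, roughly along these lines (the container name and the bind-mounted backup path are placeholders for illustration, not our exact values):

    # prod host: take the backup ("influxdb" container name is a placeholder)
    docker exec influxdb influxd backup -portable /var/lib/influxdb/backup
    # transfer to the staging host
    rsync -av /var/lib/influxdb/backup/ staging:/var/lib/influxdb/backup/
    # staging host: restore from the transferred backup
    docker exec influxdb influxd restore -portable /var/lib/influxdb/backup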
Expected behavior:
Restore would work with 1.8.6, or at least with 1.8.9.
Actual behavior:
It does not.
1. A 1.8.6 restore fails immediately, with an error message similar to the ones in #9968 (Backup/restore fails with a lot of databases).
2. With 1.8.9 the restore seemingly works at first, but eats huge amounts of memory (I had to increase the instance size to 32G + swap to progress further). After importing roughly 20G of data, it crashed with an out-of-memory error (see log), even though there still seemed to be enough memory available.
3. After switching the index back to "inmem", the excessive memory consumption was gone, but the restore still crashed halfway through (see the other log).
4. After reading up on this, I followed a few suggestions (how I applied them is sketched after this list):
   - Add vm.max_map_count=2048000 to /etc/sysctl.conf and activate it.
   - Set "max-concurrent-compactions" to 0.
   With this setup the restore worked (in the sense that the restore command returned successfully), but influxd still produced an OOM error shortly after. After a restart of the influxd process the data was (mostly?) there, though. I'm not 100% certain the two changes above had any effect; it may just have been luck. I forgot to save that log, but it looked pretty much like the previous ones, except for different timestamps.
5. When I now try to restart with tsi1 enabled, the insane memory consumption happens again. This seems to be a more general issue in our case.
In all of the cases 2 to 4 above I also see plenty of log lines like this:
lvl=warn msg="Error while freeing cold shard resources" service=store error="engine is closed" db_shard_id=23510
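For completeness, this is roughly how the two suggestions from case 4 were applied; the influxdb.conf path and the index-version line are assumptions based on a stock 1.8 setup, not a confirmed fix:

    # host: raise the mmap limit and load the new setting
    echo "vm.max_map_count=2048000" >> /etc/sysctl.conf
    sysctl -p

    # /etc/influxdb/influxdb.conf (default path), [data] section:
    #   index-version = "tsi1"              # or "inmem" to switch back
    #   max-concurrent-compactions = 0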
Environment info:
Linux 5.4.0-1029-aws x86_64
InfluxDB v1.8.9 (git: 1.8 d9b56321d579)
I use the *-alpine variant of the docker images.
The size of the backup is roughly 31G.
The cardinality of our series is 4206 (as shown by SHOW SERIES CARDINALITY),
which does not seem that high...
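For reference, the cardinality figure above comes from the 1.x influx CLI; the database name below is a placeholder:

    influx -database 'mydb' -execute 'SHOW SERIES CARDINALITY'
    # exact (non-estimated) variant, which can be slower:
    influx -database 'mydb' -execute 'SHOW SERIES EXACT CARDINALITY'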
Config:
Config is pretty much default, except the modifications described above.
sahib changed the title from "[1.8.9: troubles with restore + tsi1 + insane memory usage]" to "[1.8.9]: troubles with restore + tsi1 + insane memory usage" on Oct 11, 2021.
I have the same error, e.g.: "2022/03/30 21:41:37 Error writing: [DebugInfo: worker #0, dest url: http://test217:8086] Invalid write response (status 500): {"error":"engine is closed"}"
After a disaster recovery, we cannot import the data from one node to another node. Influx takes up all of the swap/RAM, and after a while the error described above occurs.