Backup/restore fails with a lot of databases #9968
Comments
I can confirm this behavior. Experiencing exactly the same with v1.5.2 and v1.5.4.
I can confirm this too, on v1.6.4. Is there a workaround?
Any word on a fix for this? The issue is still present in v1.7.2.

Edit: Just to say, the original database was v1.3.6, recently upgraded to v1.7.2, with indexes rebuilt to TSI1 immediately afterwards, and it has been operating fine since then. Backups with -portable appear okay, but the first time I attempted a restore into a separate instance of InfluxDB (with 7 other databases already present, but not the one being restored), it failed immediately with the "illegal tag 0 (wire type 0)" error.
The issue is still present in v1.7.4. A restore of a legacy backup with -online also fails with the same error. Legacy backup and legacy restore (using -metadir and -datadir) work, but it seems that some data is not restored correctly.
I found that the restore works on a real machine (a laptop), but not on any server with a virtual disk.

Log from client:

Log from server:
Same issue as @hongquan on Google Cloud's Kubernetes Engine on 1.7.7. Why is restoring on Docker/Kubernetes so hard? Even this new method requires a ton of setup for something that should be easy.
I'm trying to restore just one database; the total size of the backup is about 3.2 GB (portable version) or 4.9 GB (legacy version). The intended restore server is empty, a brand-new install. I've tried both the -host transport option and a plain old rsync of the backup directory; both result in the same error as noted above.
Restoring remotely on InfluxDB 1.7.8, running in Docker.

influxd CLI error message:

InfluxDB log error message:
UPDATE 1: According to this comment, it might be related to Docker networking; the restore works when executed inside the container, not remotely as I do.

UPDATE 2: (docker/network_mode: host) Running the restore from within the InfluxDB Docker container AND from the Docker host (not inside the container) worked, while it fails when running from a remote host.

UPDATE 3: (docker/network_mode: bridge) Running the restore from the Docker host failed, while it worked when running from inside the container.

"SOLUTION": I tested a number of different combinations of remote hosts and influxd versions, and the restore always failed when running remotely, so I ended up scripting a restore with Ansible that runs locally on the container host. When using Docker network_mode bridge, I need to run the restore command inside the container (see the sketch below).

cc: @dgnorton, have you seen anything like this before?
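For reference, a minimal sketch of what "running the restore inside the container" looks like; the container name (influxdb), database name, and backup path are placeholders, not values from this thread:

```sh
# Copy the backup into the container (or bind-mount it), then restore
# locally, avoiding the remote transfer over port 8088 entirely.
docker cp ./mydb-backup influxdb:/backups/mydb
docker exec influxdb influxd restore -portable -db mydb /backups/mydb
```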
Hello, I run standalone InfluxDB 1.7.8. (I have different instances; some of them have only 1 DB, others could have a hundred DBs, but all on 1.7.8.)

In the InfluxDB log:

I don't even know what to do to manage incremental backup/restore now... :(
We hit the same issue when restoring InfluxDB.

Nov 01 16:01:36 xxxxxx influxd[28340]: ts=2019-11-01T05:01:36.659436Z lvl=info msg="failed to decode meta: proto: meta.ShardGroupInfo: illegal tag 0 (wire type 0)" log_id=0IqLGmyW000 service=snapshot

Update: I've tried the legacy offline backup and restore as well. The restore works, but unfortunately either the restore or the backup is not complete; some data points are missing. (We visualized and compared the original and restored InfluxDB data.)

Update 2: We tried better-performing servers for the restore; it worked then and we didn't see the error again.
This seems pretty critical. Is this prioritized?
We had this issue, and we found that using a faster server (more cores, RAM, and IOPS) made the restore work.
Same problem: backup on 1.7.8 and restore on 1.7.9 gives ERROR: failed to decode meta: proto: meta.ShardGroupInfo: illegal tag 0 (wire type 0). Restoring to 1.7.8 gives the same error too.
@superbool please read my comment: #9968 (comment)
I solved the problem by updating the system, but I don't know which package made the difference:

sudo yum update
sudo reboot
Supercool, @superbool :) We were previously seeing failures on 1.7.8 running on Amazon Linux 2. We just applied updates and bumped to 1.7.9, and hey presto, a portable backup from one of our AWS servers has just imported into another server running the same software levels. Happy days!
This is still happening on 1.7.9 on Kubernetes, even on a ridiculous 96 vCPU / 256 GB RAM VM trying to restore a measly 43 GB dump.
My InfluxDB version is 1.7.6.

I have modified and compiled this part of the code in my environment. After the modification, the "proto: meta.data: illegal tag 0 (wire type 0)" problem no longer occurs, and the restore command executes successfully.
And my solution is:
…file is too large, the influxdb server cannot fully receive the meta file sent by the client. issue link: influxdata#9968 (comment)
I submitted the patch above. Anyone interested, please review so we can get this merged as quickly as possible. |
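The patch itself isn't reproduced in this thread, so this is a rough illustration only: the failure mode described above (the server not fully receiving the meta file) is the classic bug of treating a single conn.Read as if it returned the whole message. A minimal Go sketch of the safe pattern, assuming a simple 8-byte length-prefixed wire format rather than InfluxDB's actual protocol:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"log"
	"net"
)

// readMetaBlob reads an 8-byte big-endian length prefix, then exactly that
// many bytes. io.ReadFull loops until the buffer is full, unlike a single
// conn.Read, which may return after one TCP segment and leave the protobuf
// decoder with a truncated buffer.
func readMetaBlob(conn io.Reader) ([]byte, error) {
	var size uint64
	if err := binary.Read(conn, binary.BigEndian, &size); err != nil {
		return nil, fmt.Errorf("read length prefix: %w", err)
	}
	buf := make([]byte, size)
	if _, err := io.ReadFull(conn, buf); err != nil {
		return nil, fmt.Errorf("read meta payload: %w", err)
	}
	return buf, nil
}

func main() {
	client, server := net.Pipe()
	payload := bytes.Repeat([]byte{0xAB}, 1<<20) // a 1 MiB stand-in for the meta blob

	go func() {
		defer client.Close()
		binary.Write(client, binary.BigEndian, uint64(len(payload)))
		client.Write(payload)
	}()

	got, err := readMetaBlob(server)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("received bytes:", len(got))
}
```

With a bare conn.Read, a large meta blob spanning several TCP segments gets truncated, and the protobuf decoder then misreads a stray zero byte as a field tag, which is exactly the "illegal tag 0 (wire type 0)" message seen throughout this thread.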
Did you try adjusting the TCP window size, like in the example below, to 8 MB? It worked for me.
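The example referred to above was not preserved, so this is a guess at the kind of adjustment meant, assuming a Linux host (the sysctl keys are standard; the 8 MB value comes from the comment):

```sh
# Raise the maximum TCP receive/send buffer (window) sizes to 8 MB.
sudo sysctl -w net.core.rmem_max=8388608
sudo sysctl -w net.core.wmem_max=8388608
# min / default / max buffer sizes, in bytes.
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 8388608"
```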
Same problem on versions 1.8.0 and 1.8.2; haven't tried 1.8.1.
The fix from PR #17495 was merged to master-1.x on 31 Mar. So the fix was released neither with 1.8.0 (Jun), nor 1.8.1 (Jul), nor 1.8.2 (Aug) 😐
Is there any indication when this will be fixed? |
Since this is still not released, I made my own build + Docker image that I use for restores only. Do not use this for anything else. It works on Kubernetes too. Here's how I did it (a rough reconstruction of the commands is sketched below).

Building influx (requires Go installed on your computer):

Building the Docker image:

Modify the Dockerfile to this:
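The snippets for this comment were lost; a hedged reconstruction of the steps it describes, assuming the master-1.x branch and a working Go toolchain (the image tag is a placeholder):

```sh
# Build influxd from master-1.x, the branch that contains the #17495 fix.
git clone https://github.com/influxdata/influxdb.git
cd influxdb
git checkout master-1.x
go build -o influxd ./cmd/influxd

# Bake the freshly built binary into an image using the modified Dockerfile.
docker build -t influxd-restore-only:latest .
```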
I just had to move a database and hit the same issues as described here. The easiest solution for me was to use the legacy backup approach, without the `-portable` flag.
@alexferl Thanks, it works with the 1.x branch. You need to build the binary with this command to make it static (sketched below):
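The command itself was lost; presumably it was the usual Go idiom for a static binary, disabling cgo so the result runs in minimal images such as alpine:

```sh
CGO_ENABLED=0 go build -o influxd ./cmd/influxd
```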
Again, the fix is not included in 1.8.3 (Sep). While influxdata doesn't seem to care, here is an all-in-one Dockerfile to build a new release from master-1.x (its likely shape is sketched below):

docker build -t yourimage:1.8.x .
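The Dockerfile body was not preserved in this thread; a sketch of its likely shape as a multi-stage build (base images and Go version are assumptions):

```dockerfile
# Stage 1: build influxd from the master-1.x branch.
FROM golang:1.15 AS builder
RUN git clone --branch master-1.x --depth 1 \
      https://github.com/influxdata/influxdb.git /src
WORKDIR /src
RUN CGO_ENABLED=0 go build -o /influxd ./cmd/influxd

# Stage 2: copy the static binary into a slim runtime image.
FROM debian:buster-slim
COPY --from=builder /influxd /usr/bin/influxd
EXPOSE 8086 8088
ENTRYPOINT ["influxd"]
```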
@roman-vynar I've tried it, and it does not work for me.
@teu what exactly doesn't work, the docker build or the backup tool?
@roman-vynar I've modified the Dockerfile a bit; perhaps I've made a mistake:

I am doing a backup on v1.7.9 in our production env, then trying to restore it with this image via a remote restore over port 8088. Still getting the issue with the metadata.
@roman-vynar Disregard my last message. The Dockerfile was mixing Alpine and Ubuntu images (which obviously won't work). I've built influxd and copied it onto an image. I've tried every combination and still get the same issue:

Note that my backups are being made on 1.7.11 from a tag, not master.
Getting this crap actively on a CircleCI machine executor. It does not happen when a job is re-run with SSH, but happens with a probability of 80% on a regular commit. It also started happening after migrating to the Ubuntu 20.04 image from 16.04. I wonder if that could be the culprit...
We just ran into this issue. It would be nice if the fix #17495 got backported to 1.8.x.
We had the same problem. Luckily the original server was still available, so exporting everything as line protocol and re-importing it worked. However, it took over a week to import everything, and it would have been impossible if the original server had crashed or been deleted. For us, not being able to restore databases reliably was the final straw in deciding to move away from Influx. Lots of valuable business data could have been lost because of this.
Same here. We are moving away from Influx, which lags far behind its competitors.
@roman-vynar @gusutabopb

Am I missing something?

UPDATE: Thank you very much for providing this solution, which allowed me to restore a bunch of databases that would otherwise have been lost!
The problem still exists on 1.8.5 (?)
Yes, tried yesterday.
Same here, just asking to be sure.
It seems #17495 will be merged in the next release, v1.9.0. https://github.com/influxdata/influxdb/blob/b26a2f7a0e41349938cec592a2abac4d93c9ab1c/CHANGELOG.md
Yes, it helps.
This should be fixed by a combination of #21991 (in 1.8.9) and #17495 (in 1.8.10).

I was able to duplicate this with #9968 (comment) on some tries with v1.8.0. I was not able to duplicate it on the latest 1.8, including #22427 (coming in 1.8.10); I ran a script to repeat the repro 20x.

Will close this when the 1.8 backport for #22427 closes.
#22427 is merged - closing. |
Bug report
System info: InfluxDB v1.5.3, installed from `brew`, on Mac OS X 10.12.6

Steps to reproduce:

where `dummy_data.pl` is (a hypothetical equivalent is sketched below)
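The script body is missing from this thread; given that it is invoked as `perl dummy_data.pl 1 500` and its output is sent as a single query string, a hypothetical stand-in (written in shell, with guessed names and statements) would be:

```sh
#!/bin/sh
# Hypothetical equivalent of the missing dummy_data.pl: print one
# CREATE DATABASE statement per database in the range $1..$2, so that
# q=$(./dummy_data.sh 1 500) creates 500 dummy databases in one request.
for i in $(seq "$1" "$2"); do
  printf 'CREATE DATABASE dummy_%d; ' "$i"
done
```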
Expected behavior: The database `test` is restored as `test_bak`.

Actual behavior: Restoring the database fails (most of the time...) with the message `error updating meta: DB metadata not changed. database may already exist`, even if `test_bak` does not exist.

I wasn't able to understand the resulting log line, where `RetentionPolicyInfo` isn't always the same:

Additional info: This behaviour seems to depend on the amount of metadata. If I add only 100 dummy databases instead of 500 (`curl -X POST http://localhost:8086/query --data-urlencode "q=$(perl dummy_data.pl 1 100)"`), everything works well.

Me trying to restore a few times, where the 6th attempt worked:
The corresponding logs: