Attached keys deleted before lease expired #14758
Comments
Thanks for providing steps to reproduce. I will take a look today, as it might be relevant to the upcoming v3.5 patch release. cc @ahrtr |
Quick command to find out which leases contain no keys:
leases="$(etcdctl --user root:password lease list | grep -v found)"
for lease in ${leases}; do etcdctl --user root:password lease timetolive "$lease" --keys | grep -F "attached keys([])"; done
|
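The one-liner above only finds leases with an empty key list. A slightly longer sketch can also tell apart a lease that has already expired from one that is alive but has lost its keys, which is the distinction this issue hinges on. It assumes etcdctl's default (simple) output format for `lease timetolive --keys`. The function names and the `ETCDCTL_ARGS` variable are hypothetical, introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Sketch only. Assumes the default "simple" etcdctl output format.
# classify_lease_line, report_keyless_leases and ETCDCTL_ARGS are
# hypothetical names, not part of etcdctl.

# Prints "expired", "no-keys", "has-keys" or "unknown" for one line of
# `etcdctl lease timetolive <id> --keys` output.
classify_lease_line() {
  case "$1" in
    *"already expired"*)   echo "expired" ;;
    *"attached keys([])"*) echo "no-keys" ;;
    *"attached keys(["*)   echo "has-keys" ;;
    *)                     echo "unknown" ;;
  esac
}

# Walks every lease and prints the ones that are alive but have no keys.
# ETCDCTL_ARGS may carry flags such as --user or --endpoints.
report_keyless_leases() {
  local id line
  for id in $(etcdctl ${ETCDCTL_ARGS:-} lease list | grep -v found); do
    line="$(etcdctl ${ETCDCTL_ARGS:-} lease timetolive "$id" --keys)"
    if [ "$(classify_lease_line "$line")" = "no-keys" ]; then
      echo "$line"
    fi
  done
}
```

Running `ETCDCTL_ARGS="--user root:password" report_keyless_leases` against a live cluster should list the same leases as the one-liner, while keeping the parsing step separate so it can be checked without a server.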
@serathius Are you still working on this issue? |
@serathius @ahrtr - Do you folks think this could be worked on by a person who has very little context? |
@ajityagaty I don't think there is anything to do here. I plan to close it as working as intended after getting confirmation. Please check out the good first issue or help wanted labels. |
Just to be clearer, there are two reasons for why the keys disappear in the
Please let us know which case applies to you. If neither, then it's an issue; in that case, please provide detailed steps to reproduce it. Thanks. |
@serathius @ahrtr It's none of above cases, I use
But can you please explain to me why it disappears randomly for only some keys? |
Please note that I put those keys once and they are never updated, so it's not the 2nd case. I can't provide a reliable way to reproduce this. Since it happens so randomly, I will need more time to find a way to reproduce it reliably. |
I could not reproduce the issue. Please provide a complete program and detailed steps to reproduce it. Is it easy for you to reproduce? |
Based on the description presented so far, it looks like a bug; however, we need more details to confirm it. My current understanding:
Normally I would expect both the key and the lease to be deleted. If there are no other changes to the key or the lease, the only way the key could be deleted is by the lease running out of TTL. This can easily happen if there is a network issue and keepalives don't get delivered. However, in that situation the lease should also be deleted, so later KeepAlives should fail. So it's either a bug on the etcd server/client side, or the client has sent a new LeaseGrant request after the lease was deleted. We need to confirm that this issue happens without an additional LeaseGrant request. Could you provide:
|
@ahrtr It's very hard to reproduce the issue. @serathius Yes, you are correct. The etcd version is in the first comment.
The above is all of the etcd-related code. It's simple: think of it as a kind of service registry. Each client creates a lease once, keeps it alive, puts some keys on start, and never touches them again. Nothing special. I can log key deletions because I use the watch API with a key prefix. I see nothing abnormal in the etcd logs: no crash, no disruption. Clients (pods) are started and terminated frequently, but I see no concurrency issue because clients work with different keys. I have not been able to reproduce this for ~5 days. Can we keep this issue open? When it occurs again, I'll collect and post full logs of both etcd and the client from the time I receive the key-deleted events. |
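For readers without the reporter's code, the register-once pattern described above can be approximated with etcdctl. This is only a sketch under stated assumptions: the reporter's client is a Go program, not a shell script, and the function names, key names, and the default TTL of 60 seconds are all hypothetical. It assumes etcdctl's default output format for `lease grant`.

```shell
#!/usr/bin/env bash
# Sketch of the register-once pattern using etcdctl.
# Function names, keys and the default TTL are hypothetical.

# Extracts the lease ID from `etcdctl lease grant` output, which in the
# default format looks like: lease <id> granted with TTL(60s)
parse_lease_id() {
  printf '%s\n' "$1" | awk '$1 == "lease" && $3 == "granted" { print $2 }'
}

# Grants a lease, attaches one key to it, then keeps the lease alive.
# Blocks for as long as keep-alive runs; the key is never written again.
register_service() {
  local key="$1" value="$2" ttl="${3:-60}" out id
  out="$(etcdctl lease grant "$ttl")" || return 1
  id="$(parse_lease_id "$out")"
  [ -n "$id" ] || return 1
  etcdctl put --lease="$id" "$key" "$value" || return 1
  etcdctl lease keep-alive "$id"
}

# Streams every PUT and DELETE event under a key prefix, which is how
# an unexpected deletion would be observed.
watch_registry() {
  etcdctl watch --prefix "$1"
}
```

With `watch_registry /services/` running in one terminal and `register_service /services/a addr-a` in another, a DELETE event for `/services/a` while keep-alive is still running would correspond to the behavior reported here.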
I asked about the Go module version in your binary, as the example you sent didn't use etcdctl. |
I expect this might be a race condition related to leases not being linearizable (#13915). It would be good to verify the correctness of the solution. Given the difficulty reproducing it, I expect this is a rare, timing-dependent race. I think it could be reproduced by adding sleep failpoints to the lease logic. I would appreciate any help with a repro, as I don't think I will have time to test it. |
Sorry, this is client & api version:
go module version:
Don't mind, I'll monitor it |
After monitoring this issue for a long time, I cannot reproduce it. |
Thanks for confirming! |
What happened?
I use the Watch API and observe that sometimes all keys attached to a lease are deleted without any warning or error log.
When I check with lease timetolive, I expect the attached keys to remain. It should output:
But I actually get:
Note that it happens randomly with some leases.
What did you expect to happen?
Attached keys remain while the lease is still alive.
How can we reproduce it (as minimally and precisely as possible)?
Anything else we need to know?
I run etcd in k8s
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
ETCD_ON_K8S=yes
ETCDCTL_API=3
ETCD_LOG_LEVEL=info
ETCD_AUTH_TOKEN=jwt,priv-key=/opt/bitnami/etcd/certs/token/jwt-token.pem,sign-method=RS256,ttl=10m
ETCD_DISASTER_RECOVERY=no
ETCD_START_FROM_SNAPSHOT=no
ETCD_DATA_DIR=/bitnami/etcd/data
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)