It's possible to get "wal: max entry size limit exceeded" with recommended values #14025
Comments
There are two reasons:
In summary, the flag … Does this cause any issue for you? Do you see a real issue in a production environment, or did you just test it intentionally?
@ahrtr I think there are several concerning issues revealed by this bug report:
I'm pretty sure that once you start enforcing this limit on the write path (1), a lot of production installations will start seeing transaction failures that are currently hidden. (2) and (3) are the solutions for the problems exposed by (1).
@ahrtr, thanks for the quick response. I think I should shift my emphasis to clarify my concern. It's allowed to set any --max-request-bytes, but doing so can leave a node unable to restart.
This happens because of a hardcoded limit in the WAL implementation. Maybe it's possible to make this hardcoded value configurable?
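For context, the failure comes from a size check on the WAL read path. Below is a minimal, self-contained sketch of that kind of check; the constant and function names are illustrative stand-ins, not etcd's actual identifiers (the real check lives in server/storage/wal/decoder.go).

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative stand-in for the hardcoded 10MB limit discussed above.
const maxWALEntrySize = 10 * 1024 * 1024

var errMaxEntrySize = errors.New("wal: max entry size limit exceeded")

// checkEntrySize mimics a read-path guard: entries at or above the
// hardcoded limit are rejected when the WAL is replayed on startup,
// even though the write path accepted them.
func checkEntrySize(recBytes int64) error {
	if recBytes >= maxWALEntrySize {
		return errMaxEntrySize
	}
	return nil
}

func main() {
	// A ~10MB put plus WAL record overhead trips the limit on restart.
	fmt.Println(checkEntrySize(10*1024*1024 - 27 + 100))
}
```

This illustrates the asymmetry being discussed: nothing stops such an entry from being written, but replay refuses to read it back.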
Thanks for all the feedback, which basically makes sense to me. I agree we should never bring down the etcd server. Proposed actions:
cc @ptabor @serathius @spzala @xiang90 @gyuho for opinions.
@ahrtr Do you have any updates? We can work on a PR from our side, but we need to know the preferred solution.
Let me try a bold move and get the ball rolling on proposal 2.
etcd is designed to serve small-sized metadata. And all the write happens sequentially in etcd. What is your use case for storing large values into etcd? Would something like S3 be a better fit for you? |
We are using etcd for that purpose, and S3 would not be a good fit for us. The limit we are encountering is not on the key size but on the transaction size, as the 10MB limit applies to the whole transaction. If you consider the (configurable) default limit of 128 operations per transaction, you can easily reach the 10MB limit with a transaction involving 128 keys of 70KB each, which IMHO is not too far off from the typical etcd use case. Notice that both transaction sizes and max request sizes are configurable, so I find it odd to allow that configurability but then hit a hardcoded limit in the WAL record read path.
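As a back-of-the-envelope check of the arithmetic above (maxBytesPerOp is a made-up helper for illustration, not an etcd API): with a 10MiB request cap and the default 128-op transaction limit, each op only needs to average 80KiB for a full transaction to saturate the cap.

```go
package main

import "fmt"

// maxBytesPerOp: given a request size cap and a transaction op limit,
// how large can each key/value pair be before a full transaction
// reaches the cap?
func maxBytesPerOp(maxRequestBytes, maxTxnOps int) int {
	return maxRequestBytes / maxTxnOps
}

func main() {
	// 10 MiB cap / 128 ops = 81920 bytes (80 KiB) per op.
	fmt.Println(maxBytesPerOp(10*1024*1024, 128))
}
```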
Since the WAL file limit is 64MB, the simplest solution could be to just use SegmentSizeBytes directly as each WAL entry's limit?
I like the idea of capping the limits at SegmentSizeBytes.
To my understanding, SegmentSizeBytes is only the pre-allocated size of the WAL file; it doesn't mean that the actual WAL size is <= SegmentSizeBytes. I'm not deeply familiar with the codebase, but to me the WAL is written to support any size. I feel that the limits should be enforced in the upper layer (the etcd server) rather than in the wal package itself.
Maybe I'm missing the point, but I don't understand why we only talk about decoding. If the WAL has to enforce some limit (segment size or entry size), then it has to do so on the write path as well, or we risk running again into a situation where the server cannot start because it cannot read what it has written to the WAL. From what I have understood so far, however, I don't see the reason to put such limits in the wal package itself, given that the etcd server is already protected against large transactions through the …
Copy one of my previous comments:
And read 14057#discussion_r878844991. Please update the PR per ptabor's comment above.
Guys, I understand that you're the owners and don't have to explain anything. But maybe you can find a couple of minutes to explain your thinking?
I still can't understand your concerns about making these parameters configurable. You've mentioned OOM, but don't those who operate a particular etcd instance know more about their environment's limitations?
Thanks all for the feedback.
The proposal doesn't break anything. Note that the current WAL entry size limit is 10MB, so it is NOT possible to configure a bigger value for … Also keep in mind that it's just a proposal, and it's open for discussion.
Can you elaborate on your use case? @algobardo mentioned "…"
Currently it's possible to set any request size, and the WAL will also accept entries of any size. I bet most people don't restart their etcd nodes very often, because in general it's just not needed, as etcd works remarkably stably. We found the issue under discussion purely by chance. That means a lot of people could be operating clusters with 10+ MB entry sizes in their WAL files, and most of the time everything will be fine. Until they decide to restart a node :) If we introduce a hard limit for …
Yes, in one transaction. It's not part of our daily routine with etcd, of course. But occasionally we have to do a whole-"fleet" update, mostly during incident mitigation. We use a transaction because such changes should be atomic. That's how we end up with 10k keys in one transaction in some cases. Of course we're thinking about splitting such transactions into a bunch of smaller ones. But in that case we lose some important guarantees about consistency and atomicity. You know, it's the first step toward writing straight to files from your code instead of a well-tested db :)
Have you tested the request size when a transaction has 10K keys? I think we should have a limit on both the WAL entry size and the request size, because we need to make sure they are under control and well tested. Regarding the values, it's open to discussion. Based on the current design & implementation, the WAL entry limit can be set to SegmentSizeBytes (64MB) by default, and good candidates for the request size are 16MB, 32MB and (64MB - some_overhead). Note that the new feature (in review) may also throttle big requests.
Our transaction size exceeds 10MB and is below 16MB, but a hardcoded limit of 16MB means we would be facing a hard limit on the scalability of our solution. If the limit were 1000% of the request size, with gradual degradation of performance, I would feel comfortable with it, as we would have enough time to figure out whether we need to find another product to support our needs; with a 60% margin I don't feel comfortable. The problem I see is that introducing a non-configurable limit on the WAL entry size is a breaking change. That limit was not there before #11793 (comment), and when it was introduced (undocumented) it may have passed unnoticed, as it only shows up on node restarts. We noticed it ourselves 2 years after our solution went into production. My opinion is that we should set the WAL entry size limit to 120% of the request size, as that would impose some limit, would match the actual sizes used by people in the wild, and would not be a breaking change.
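To make the "120% of the request size" suggestion concrete, here is a small sketch; walEntryLimit is a hypothetical function written for this thread, not an etcd API.

```go
package main

import "fmt"

// walEntryLimit sketches the proposal above: cap WAL entries at 120%
// of --max-request-bytes, leaving headroom for per-record overhead.
func walEntryLimit(maxRequestBytes int64) int64 {
	return maxRequestBytes * 120 / 100
}

func main() {
	// With --max-request-bytes=10485760 (10 MiB), the WAL entry
	// limit would be 12582912 bytes (12 MiB).
	fmt.Println(walEntryLimit(10485760))
}
```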
It's on my to-do list. I will get this sorted out and ask for opinions from other maintainers and users sometime later.
This issue should have already been resolved in release-3.5 and main. Please refer to 14114 |
The fix will be included in 3.5.5 and 3.6.0. |
Thank you! |
What happened?
I ran a clean etcd node as follows:
etcd --max-request-bytes=10485760
Then, through the Go client, I put to key k1 a value consisting of 10*1024*1024-27 bytes. Then I stopped the server and tried to start it again, but it failed with the error wal: max entry size limit exceeded (https://github.com/etcd-io/etcd/blob/main/server/storage/wal/decoder.go#L88).
What did you expect to happen?
How can we reproduce it (as minimally and precisely as possible)?
Run a clean etcd instance:
etcd --max-request-bytes=10485760
Run this Go code:
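The original reproduction code is not included in this capture; below is a minimal sketch of what it might have looked like. The value size matches the report; the clientv3 usage shown in the comment is an assumption about the original client code, not the reporter's actual program.

```go
package main

import (
	"fmt"
	"strings"
)

// makeValue builds a value sized just under --max-request-bytes=10485760;
// the 27-byte headroom leaves room for the key and request framing.
func makeValue() string {
	return strings.Repeat("a", 10*1024*1024-27)
}

func main() {
	val := makeValue()
	fmt.Println(len(val)) // 10485733 bytes

	// With a real etcd client, the put would look roughly like this
	// (not runnable here without a server and the clientv3 module):
	//
	//   cli, _ := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	//   defer cli.Close()
	//   _, err := cli.Put(context.Background(), "k1", val)
	//
	// The put succeeds, but restarting the server afterwards fails with
	// "wal: max entry size limit exceeded".
}
```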
Anything else we need to know?
Assuming you ran the code above, you will not be able to restart the server. But you can call etcdctl snapshot save while the server is still running, delete all WAL files, then start the server and recover the saved value.
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
--max-request-bytes=10485760
Etcd debug information (please run commands below; feel free to obfuscate the IP address or FQDN in the output)
Relevant log output