Great amount of failed PUTs due to container lock? #1108
Sorry for the mistake. I can purge the delete queues, but I still cannot understand the cause of the unavailable errors.
From the client side I see the following:
The unavailable error caused by "a container being locked" can happen only while compaction is ongoing, so if no compaction is running then there is another reason for the unavailable errors. I've summarized when an unavailable error can happen in the comment below.
I think the most probable one you've hit is the first one, "Any watchdog got triggered". In order to confirm whether my guess is correct, can you share the leo_storage.conf used by your cluster? If some watchdog is enabled, disabling it may solve the problem. The second most probable one is the last one, "No available nodes found during the read-repair process". This can happen when leo_storage nodes are under heavy load. Can you check how much of the system resources (CPU, memory, disk/network I/O) has been consumed on the leo_storage nodes? If it's too high, suspending the queue with "leofs-adm mq-suspend ${storage_node} leo_per_object_queue" for a while may solve (or at least ease) the problem.
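For reference, a minimal sketch of suspending and later resuming that queue on a single node (the node name is a placeholder, and it assumes mq-resume is the counterpart of mq-suspend):

## pause background consumption of the per-object queue on one node
leofs-adm mq-suspend storage_01@192.168.0.11 leo_per_object_queue
## check the queue state and the number of remaining messages
leofs-adm mq-stats storage_01@192.168.0.11
## resume the queue once the load has settled
leofs-adm mq-resume storage_01@192.168.0.11 leo_per_object_queue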
@mocchira Thank you for your answer. It seems I did not provide enough info here. Sorry.
I will try suspending the queues once more and report back here. My leo_storage.conf is here:
@vlakas Thanks for the info. Your leo_storage.conf looks good (no watchdog enabled), so my second guess
might be the reason why the unavailable errors have happened.
I see.
Although it's possible to add 4 nodes and rebalance all at once, that would put the whole cluster under very high load, so I'd recommend attaching a node and rebalancing one by one to keep the cluster under relatively low load during the rebalance.
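A rough sketch of that one-node-at-a-time flow (the node name is a placeholder; it assumes the new node is already configured against the managers and shows up as attached once started):

## start the new leo_storage node, then confirm it appears as "attached"
leofs-adm status
## redistribute data to the newly attached node
leofs-adm rebalance
## watch the rebalance queues drain before attaching the next node
leofs-adm mq-stats storage_19@192.168.0.29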
@mocchira Thank you for the rebalancing tip. It makes sense.
Unfortunately, suspending the MQ queues makes no difference. There are no compaction tasks running. I'm now investigating the bottleneck based on system metrics.
It would be helpful.
I see. Just in case, please share the output of mq-stats on every storage node.
OK. Please tell us about any system metrics if you find something weird.
Please try the tool below on your storage nodes (especially the ones under higher load). It will let us see how many times each module:function is called in a certain period (CPU-bound processes can be found) and also how many queued tasks each Erlang process has (I/O-bound processes can be found). The link below is a leofs_doctor example another LeoFS user provided us to look into a problem. I hope you find it helpful for running leofs_doctor in your environment. Let me know if you have any questions or problems.
mq-stats-active.txt shows only the active queues, and the full output of the mq-stats command on all storage nodes is attached just in case.
I am afraid that the main bottleneck may be the disk subsystem. On each storage node I have 3x4TB HDD (mdadm, stripe) and 1x1TB SSD as a caching device (bcache in writeback mode). In most cases disk usage is no more than 70%, but I have 2 nodes (of 18 in total) that are really slow, with 100% disk utilization (because of a small SSD cache disk - 0.5TB). However, suspending them or shutting down leo_storage on them does not make any difference. Formerly we ran Riak with bitcask (without active anti-entropy) on the same nodes, and it worked almost perfectly (bitcask stores key names in RAM). Currently no more than 30% of the 48GB RAM is consumed on the storage nodes; 15-20% on average. Maybe I need some tuning of the leveldb backend? Does it make sense? My current storage config assumes that 50% of RAM may be used by LeoFS (the rest for the filesystem cache):
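As a rough back-of-the-envelope check (assuming each object container runs its own leveldb instance, so write_buf_size is paid once per container), it is worth verifying that the leveldb write buffers alone fit into that 50% budget:

## illustrative arithmetic only:
## 128 containers x 256 MiB (write_buf_size = 268435456) = 32 GiB of write buffers,
## which already exceeds half of 48 GB RAM; a smaller buffer keeps it bounded, e.g.
## 128 containers x 64 MiB = 8 GiB
backend_db.eleveldb.write_buf_size = 67108864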
@mocchira Thank you very much. I'll give it a try.
Here is the result of leofs_doctor for the 2 loaded nodes (high disk usage). Here is the result for the nodes with non-empty queues. And here is the result for the normal nodes (no system-metric anomalies detected; empty queues).
Maybe; however, your leveldb settings described below look good to me, so I think some other tuning might be needed.
Suggestions based on the results of diagnostic tools and system metrics.
From leofs_docktor_mamba.txt, I can see that logging-related processes are bursting, probably due to the massive amount of PUT errors you are facing now, and this can result in high disk utilization on that node. So the procedure below might ease the problem.

# do the below procedure on all storage nodes
## log into the remote console on leo_storage
${LEOFS_ROOT}/leo_storage/bin/leo_storage remote_console

## run the following in remote_console
## to limit the number of messages per second allowed from error_logger
Handlers = gen_event:which_handlers(lager_event),
lists:foreach(fun(Handler) ->
                  lager:set_loghwm(Handler, 50)
              end, Handlers).
application:set_env(lager, killer_hwm, 100).

These configuration changes drop log messages when the rate of log writes exceeds a certain threshold, so they can avoid high disk utilization on storage nodes. From the other leofs_docktor_*.txt files, I noticed that leo_object_storage_read_* processes run frequently compared to a normal LeoFS cluster, so many GET/HEAD operations might cause frequent disk seeks and result in high disk utilization on storage nodes. I also noticed your cluster has many leo_gateway nodes, so the cache hit rate on leo_gateway is probably low (the cache is scattered across multiple leo_gateway nodes). As a result, many RPC calls from leo_gateway to the leo_storage nodes can happen. I'd therefore suggest reducing the number of leo_gateway nodes (with cache tuning), which will reduce the RPC calls to the leo_storage nodes and also reduce their disk utilization. I hope this will work for you.
@mocchira I appreciate your help. Good point about the leo_gateways. I will try to reduce the number of leo_gateways (with cache tuning) and will try some balancing tricks (i.e. some sort of IP-hash balancing or similar). The "small" caching disks have now been replaced; that may help a bit.
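As a sketch of the hash-balancing idea on the HAProxy side (the backend name, server names and addresses are placeholders): hashing on the request URI pins a given object key to one gateway, which should raise the per-gateway cache hit rate even with several gateways behind the balancer.

backend leofs_gateways
    ## route each URI (object key) to the same gateway to improve cache hits
    balance uri
    hash-type consistent
    server gw01 192.168.0.101:8080 check
    server gw02 192.168.0.102:8080 check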
Unfortunately, it seems to me that nothing has changed at all (I mean the error rate). Can you tell me where exactly the container locks take place: locally on the storage nodes, or globally on the leo_manager nodes?
One more question. There are many requests for non-existent objects (about 1/4) in LeoFS (all the not_founds come from objects we try to write to LeoFS from a backup Amazon S3 bucket). Is that an issue for eleveldb? And is it safe to purge the queue?
@vlakas What you did in #1108 (comment) looks good to me, other than
Please try to set this field to "false". This may reduce the number of read operations on leo_storage(s).
As I said in the previous comment, "container locks" are not the root cause (they can happen only while compaction is running). For more details, please read the comment here: #1108 (comment). Given that many read operations (GET/HEAD) are running on the leo_storage nodes according to the leofs_doctor result, I think "No available nodes found during the read-repair process" could be the root cause of the unavailable errors. So could you run leofs_doctor again and share the result on GitHub? I would also like to know the current disk utilization from iostat or something similar that gives detailed information about the disk load. The full error log from one of the leo_storage nodes would also help us dig further.
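A minimal way to capture what was asked for (the log path is an assumption based on a default-style install; adjust it to your layout):

## per-device extended I/O statistics every 5 seconds (requires sysstat)
iostat -dxm 5
## grab the recent storage error log entries while the PUT failures are happening
tail -n 200 ${LEOFS_ROOT}/leo_storage/log/app/error.*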
Yes, it can, as it may cause additional disk seeks across multiple leveldb files on the leo_storage nodes (the number of files that have to be searched depends on the level).
As long as there are no broken nodes and every object stored in LeoFS has at least one replica, it's safe. However, if the node holding the only replica breaks, data loss could happen, so I'd recommend issuing recover-consistency on all leo_storage nodes one by one ASAP.
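A sketch of doing that node by node (node names are placeholders; it assumes recover-consistency takes the target storage node as its argument, like recover-node does):

for node in storage_01@192.168.0.11 storage_02@192.168.0.12; do
    leofs-adm recover-consistency ${node}
    ## let the resulting queues drain before moving to the next node
    leofs-adm mq-stats ${node}
done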
Nothing.
@vlakas In order to replicate your situation in our testing environment, it would be great if you could tell us
Once we get that info, we will try to replicate the problem in an environment as similar to yours as possible.
@vlakas I've just remembered that another user faced a problem very similar to the one you're facing now, and it was solved by lowering obj_containers.num_of_containers (in their case, from 384 to 64), so it might work for you as well (for example, changing it from 128 to 64 or 32). Please bear in mind that changing obj_containers.num_of_containers on a leo_storage node means all objects stored on that node will be lost, so you need to issue recover-node after changing the configuration and restarting the node.
@mocchira Thank you for the detailed explanation.
Got it.
I forgot to say that I currently have dedicated leo_gateway servers, so that will not help much. Regarding replicating the situation: I really do not want to bother you with it for now. First of all, it seems that I have a server performance issue (in particular the disk subsystem). We decided to add more nodes to the cluster (7 or 8 nodes). It may take a week because of the great amount of small objects. I will then post the results here.
@mocchira Thank you for the valuable info. Unfortunately, the procedure of changing the container count will be very slow (because of the great amount of stored objects), but I think I have to give it a try. If I change the container count on a couple of nodes, can we tell whether the change is positive, or do we need to change it on every storage node in the cluster? Also, do I understand the procedure for changing the container count correctly:
I will post numbers on requests and system metrics here a little later.
Yes, you can.
After starting the storage node, issue leofs-adm status.
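Putting the pieces of this thread together, a hedged sketch of the per-node procedure (the node name is a placeholder; step 2 is an assumption, the rest follows from the comments above):

## 1. stop the node and lower the container count in its leo_storage.conf, e.g.
##    obj_containers.num_of_containers = 64
## 2. clear the old object containers under obj_containers.path (this node's data is lost anyway)
## 3. start the node again and confirm it is back in the cluster
leofs-adm status
## 4. restore its data from the replicas held by the other nodes
leofs-adm recover-node storage_01@192.168.0.11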
@vlakas Thanks for the info. The leofs_doctor results look fine to me.
In order to check whether num_of_containers is the cause, it would be better to compare iostat between a storage node with 32 containers and another one with 64 containers. If the disk utilization is lower on the node with 32 containers, then it works.
@mocchira I have good news. LeoFS now works as expected. The reason for the failures was an over-tuned leo_storage config. I reverted to the default config with minor modifications and reassembled the LeoFS cluster. Later I will post a diff of the configs (currently I do not understand the root cause of my problem).
Great to hear that.
Yes please, that will help a lot in finding the root cause.
leo_storage.conf-new.txt @mocchira The new (actual working) and old (non-working) configs are attached here, along with a diff between them.
@vlakas Thanks for the great feedback.
> ## e.g. Case of plural pathes
> ## obj_containers.path = [/var/leofs/avs/1, /var/leofs/avs/2]
> ## obj_containers.num_of_containers = [32, 64]
>
41c45
< num_of_vnodes = 168
---
> #num_of_vnodes = 168
61c65
< max_num_of_procs = 3000
---
> #max_num_of_procs = 10000
66c70
< ## num_of_obj_storage_read_procs = 3
---
> #num_of_obj_storage_read_procs = 100
79c83
< watchdog.rex.is_enabled = true
---
> watchdog.rex.is_enabled = false
186c190
< compaction.limit_num_of_compaction_procs = 4
---
> compaction.limit_num_of_compaction_procs = 1
218c222
< mq.num_of_batch_process_max = 3000
---
> mq.num_of_batch_process_max = 10000
227c231
< mq.interval_between_batch_procs_max = 3000
---
> mq.interval_between_batch_procs_max = 1000
237,238c241
< backend_db.eleveldb.write_buf_size = 62914560
< #backend_db.eleveldb.write_buf_size = 268435456
---
> backend_db.eleveldb.write_buf_size = 268435456
241c244
< ## backend_db.eleveldb.max_open_files = 1000
---
> backend_db.eleveldb.max_open_files = 10000
333c336
< ## rpc.server.acceptors = 128
---
> rpc.server.acceptors = 5186
339c342
< ## rpc.server.listen_timeout = 30000
---
> #rpc.server.listen_timeout = 30000
342c345
< ## rpc.client.connection_pool_size = 8
---
> #rpc.client.connection_pool_size = 192
345c348
< ## rpc.client.connection_buffer_size = 8
---
> #rpc.client.connection_buffer_size = 192
442c445
< ## snmp_conf = ./snmp/snmpa_storage_0/leo_storage_snmp
---
> ## snmp_conf = ./snmp/snmpa_storage_0/leofs_storage_snmp

The differences that may affect the performance and error rate are mq.num_of_batch_process_max and mq.interval_between_batch_procs_max. Since those two settings control how fast the background processing (recover/rebalance) proceeds (in other words, how many system resources the background processing consumes), the old configuration caused the background processing to consume much more of the resources than the frontend (handling PUT/GET/DELETE/HEAD) and resulted in the problem you faced (many PUT failures). I'm going to add this information to the FAQ section of our official documentation for those who may face the same problem in the future. Thanks again.
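For readers who hit the same symptom, these are the working values from the diff above - effectively throttling the MQ consumers so that background recovery/rebalance does not starve frontend PUT/GET traffic (the right values depend on your hardware):

## consume fewer messages per batch than the over-tuned 10000
mq.num_of_batch_process_max = 3000
## and wait longer between batches than the over-tuned 1000
mq.interval_between_batch_procs_max = 3000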
@vlakas If no problem remains, please close the issue.
@mocchira Thank you very much for the amazing support. This issue is completely solved.
Hello.
Summary
I'm trying to investigate an issue with a great amount of failed PUTs. I have a cluster with 18 nodes (27 in the future). LeoFS version: 1.4.2.
Gateway logs:
On storage nodes I see multiple log records:
I think that my problem is here https://github.com/leo-project/leo_object_storage/blob/v1/src/leo_object_storage_server.erl#L431
Is this problem due to container locks? I am not sure, because I am not that good at Erlang.
State of cluster
It seems to me that there are no issues with ring inconsistency:
But leo_per_object_queue is not empty after the cluster rebalancing - 4 nodes have been added to the cluster (which seems OK to me). I've made an attempt to clear the delete queue with no success; it is still not empty.
According to the haproxy logs (the balancer behind the leo_gateways) there are no DELETE requests at all.
I think this may affect PUT/DELETE requests.
How can I investigate this problem further?