Reduce mapped memory with many bulk clients #963
With this commit we share a parameter source for all bulk-indexing clients per Rally worker. As all clients of a worker run in the same asyncio event loop, they can also share a parameter source. This significantly reduces the number of mmap system calls and thus virtual memory usage: we now map the bulk data file(s) only once per process instead of once per client. We can also take better advantage of prefetching, as multiple clients within a process now read linearly from the mapped file. We have also changed the assignment rules slightly so that successive client ids are assigned to the same worker, which lets that worker read a continuous range of data.
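A minimal sketch of the idea, assuming a simplified model in which a parameter source just hands out batches of lines from a single bulk data file. `SharedBulkSource`, `bulk_client`, `run_worker` and `demo` are hypothetical names, not Rally's actual API. The file is mapped once per worker process and all client tasks pull from the same cursor, so the mapping is read front to back. Because `next_batch` contains no `await`, it cannot be interleaved with other tasks in the same event loop, so no lock is needed.

```python
import asyncio
import mmap
import os
import tempfile


class SharedBulkSource:
    """Hypothetical parameter source shared by all clients of one worker."""

    def __init__(self, path, bulk_size):
        # Map the bulk data file once per process, not once per client.
        self._f = open(path, "rb")
        self._mm = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)
        self._bulk_size = bulk_size

    def next_batch(self):
        # No await in here: within a single event loop this runs without
        # interruption by other tasks, so the shared cursor needs no lock.
        lines = []
        while len(lines) < self._bulk_size:
            line = self._mm.readline()
            if not line:
                break
            lines.append(line)
        return lines

    def close(self):
        self._mm.close()
        self._f.close()


async def bulk_client(client_id, source, results):
    # Each client keeps pulling batches until the shared source is exhausted.
    while True:
        batch = source.next_batch()
        if not batch:
            return
        results.append((client_id, len(batch)))
        await asyncio.sleep(0)  # stand-in for the async bulk request


async def run_worker(path, num_clients, bulk_size):
    source = SharedBulkSource(path, bulk_size)  # one source for all clients
    results = []
    try:
        await asyncio.gather(
            *(bulk_client(i, source, results) for i in range(num_clients))
        )
    finally:
        source.close()
    return results


def demo():
    # Write ten small JSON lines to a temporary file and index them with three clients.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"".join(b'{"doc": %d}\n' % i for i in range(10)))
    try:
        return asyncio.run(run_worker(tmp.name, num_clients=3, bulk_size=4))
    finally:
        os.unlink(tmp.name)
```

With ten documents and a bulk size of four, the three clients together consume batches of 4, 4 and 2 documents, all served from the same mapping.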
LGTM! I tested some things and it resolves the problem of high memory usage with mmap.
```python
worker = host["workers"][worker_idx % len(workers)]
worker.append(client_idx)
host["worker"] = worker_idx + 1
assert remaining_clients == 0
```
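To illustrate the changed assignment rule, here is a hypothetical helper (`assign_clients` is not Rally's code) that hands each worker a contiguous block of client ids instead of distributing them round-robin. It checks the same kind of invariant with an assertion.

```python
def assign_clients(num_clients, num_workers):
    """Hypothetical helper: give each worker a contiguous range of client ids."""
    base, extra = divmod(num_clients, num_workers)
    assignments = []
    next_client = 0
    remaining_clients = num_clients
    for worker_idx in range(num_workers):
        # The first `extra` workers take one additional client.
        count = base + (1 if worker_idx < extra else 0)
        assignments.append(list(range(next_client, next_client + count)))
        next_client += count
        remaining_clients -= count
    # Invariant: every client has been assigned exactly once.
    assert remaining_clients == 0
    return assignments
```

For example, `assign_clients(8, 3)` yields `[[0, 1, 2], [3, 4, 5], [6, 7]]`, whereas round-robin assignment would give `[[0, 3, 6], [1, 4, 7], [2, 5]]`. With contiguous ids, the clients of one worker can read adjacent ranges of the shared mapped file.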
Raise an exception or error out if this is not true?
My intention was that this is a lightweight check for a violated invariant caused by a logical error in the implementation (in the sense of "this should never fail"). Should it ever fail, it would be easy to reproduce, which is why I used an assertion.
OK, I see, so it's used as a debugging aid then. ++
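A sketch of the distinction discussed in this thread, with hypothetical function names: an `assert` documents an internal invariant that only a bug could violate, while invalid user input warrants an explicit exception with an actionable message. One practical difference is that Python skips `assert` statements when run with `-O`, so assertions should only guard conditions that cannot occur in correct use.

```python
def check_all_assigned(remaining_clients):
    # Internal invariant: fails only if the assignment logic itself is buggy.
    # Note: this check disappears entirely when Python runs with -O.
    assert remaining_clients == 0, f"{remaining_clients} clients left unassigned"


def validate_client_count(num_clients):
    # User-supplied configuration can legitimately be wrong, so it gets a
    # real exception that is raised regardless of interpreter flags.
    if num_clients < 1:
        raise ValueError(f"clients must be at least 1 but was {num_clients}")
```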
Thanks for the review! :)