
Reduce mapped memory with many bulk clients #963

Merged


danielmitterdorfer
Member

With this commit we share a parameter source for all bulk indexing
clients per Rally worker. As all clients run in the same asyncio event
loop, they can also share a parameter source. This significantly reduces
the number of mmap system calls and thus virtual memory usage: we now
map the bulk data file(s) only once per process instead of once per
client. We can also take better advantage of prefetching, as multiple
clients within a process now read linearly from the mapped file. We have
also changed the assignment rules slightly so that successive client ids
are assigned to the same worker, which lets it read a contiguous range
of data.
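
As a rough illustration of the idea (not Rally's actual parameter source), the sketch below maps a file once per process and lets several asyncio tasks pull bulks from that single mapping. The names `SharedLineSource`, `client`, `run_clients`, `write_sample_file` and `demo` are hypothetical, and the shared read cursor is a simplification: it only shows that one mapping can serve every client in the same event loop.

```python
import asyncio
import mmap
import os
import tempfile


def write_sample_file(num_lines):
    # Hypothetical stand-in for a bulk data file: one JSON document per line.
    fd, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(fd, "wb") as f:
        for i in range(num_lines):
            f.write(b'{"doc": %d}\n' % i)
    return path


class SharedLineSource:
    """Hypothetical shared source: one mapping per process for all clients."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # A single mmap call for the whole process, regardless of client count.
        self._mm = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def next_bulk(self, size):
        # No lock needed: all clients run in the same event loop thread and
        # this method never awaits, so calls cannot interleave.
        lines = []
        for _ in range(size):
            line = self._mm.readline()
            if not line:
                break
            lines.append(line)
        return lines

    def close(self):
        self._mm.close()
        self._file.close()


async def client(client_id, source, bulk_size, results):
    while True:
        bulk = source.next_bulk(bulk_size)
        if not bulk:
            return
        results.append((client_id, bulk))
        # Stand-in for the awaited bulk request; yields to the other clients.
        await asyncio.sleep(0)


async def run_clients(path, client_count, bulk_size):
    source = SharedLineSource(path)
    results = []
    try:
        await asyncio.gather(
            *(client(c, source, bulk_size, results) for c in range(client_count))
        )
    finally:
        source.close()
    return results


def demo():
    path = write_sample_file(10)
    try:
        return asyncio.run(run_clients(path, client_count=3, bulk_size=2))
    finally:
        os.remove(path)
```

Because the clients take turns on one cursor, the file is consumed front to back, which is the access pattern that benefits from prefetching.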

@danielmitterdorfer danielmitterdorfer added enhancement Improves the status quo :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. labels Apr 15, 2020
@danielmitterdorfer danielmitterdorfer added this to the 1.5.0 milestone Apr 15, 2020
@danielmitterdorfer danielmitterdorfer self-assigned this Apr 15, 2020
@danielmitterdorfer danielmitterdorfer marked this pull request as ready for review April 16, 2020 08:43
Contributor

@dliappis dliappis left a comment


LGTM! I tested some things and it lifts the problem of high mem usage with mmap.

worker = host["workers"][worker_idx % len(workers)]
worker.append(client_idx)
host["worker"] = worker_idx + 1
assert remaining_clients == 0
Contributor


Raise an exception or error out if this is not true?

Member Author


My intention was that this is a lightweight check for the violation of an invariant due to a logical error in the implementation (in the sense of "this thing should never fail"). Should it ever fail, it would be easy to reproduce, and that's why I used an assertion.
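
A minimal sketch of the kind of invariant meant here, assuming a simplified assignment routine rather than the code under review: `assign_clients` and its `worker_counts` parameter are hypothetical. Successive client ids go to the same worker, and the final assertion can only fail if the loop logic itself is wrong.

```python
import math


def assign_clients(worker_counts, client_count):
    """Hypothetical helper: worker_counts[i] is the number of workers on host i."""
    assignments = []
    client_idx = 0
    remaining_clients = client_count
    clients_per_host = math.ceil(client_count / len(worker_counts))
    for workers_on_host in worker_counts:
        # The last host may get fewer clients than the others.
        clients_on_host = min(clients_per_host, remaining_clients)
        # Spread the host's clients as evenly as possible across its workers.
        per_worker = [0] * workers_on_host
        for c in range(clients_on_host):
            per_worker[c % workers_on_host] += 1
        # Hand out successive client ids so each worker covers a contiguous range.
        workers = []
        for count in per_worker:
            workers.append(list(range(client_idx, client_idx + count)))
            client_idx += count
        assignments.append(workers)
        remaining_clients -= clients_on_host
    # Invariant: every client has been placed. A failure here would point to a
    # logic error in this function, not to bad user input.
    assert remaining_clients == 0
    return assignments
```

For example, `assign_clients([2, 2], 5)` places clients 0 and 1 on the first worker of the first host, client 2 on its second worker, and clients 3 and 4 on the two workers of the second host. Note that Python strips `assert` statements when run with `-O`, which is acceptable for a check of this kind but would not be for validating user input.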

Contributor


OK, I see: it's used as a debugging aid then. ++

@danielmitterdorfer
Member Author

Thanks for the review! :)

@danielmitterdorfer danielmitterdorfer merged commit 1fbd0bb into elastic:master Apr 27, 2020
@danielmitterdorfer danielmitterdorfer deleted the shared-mmap-bulk branch April 27, 2020 10:00