Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip coercing to bytes in merge_frames #3960

Merged
merged 1 commit into from
Jul 14, 2020

Conversation

jakirkham
Copy link
Member

@jakirkham jakirkham commented Jul 14, 2020

As the frames we receive are typically mutable, non-bytes objects like bytearrays or NumPy ndarrays, coercing to bytes at this stage triggers a copy of all frames. As we are going to toss those copied versions anyways when joining them into a larger bytes object, this ends up being wasteful with memory. Fortunately bytes.join(...) accepts any and all bytes-like objects. So instead just pass them all through as-is to bytes.join(...), which is free and doesn't require a copy. Should cutdown on the memory usage in this part of the code.

As the frames we receive are typically mutable, non-`bytes` objects like
`bytearray`s or NumPy `ndarray`s, coercing to `bytes` at this stage
triggers a copy of all frames. As we are going to toss those copied
versions anyways when joining them into a larger `bytes` object, this
ends up being wasteful with memory. Fortunately `bytes.join(...)`
accepts any and all `bytes`-like objects. So instead just pass them all
through as-is to `bytes.join(...)`, which is free and doesn't require a
copy.  Should cutdown on the memory usage in this part of the code.
@jakirkham
Copy link
Member Author

jakirkham commented Jul 14, 2020

FWIW here's a little example showing this works.

In [1]: import numpy as np                                                      

In [2]: b"".join([ 
   ...:     b"abc", 
   ...:     bytearray(b"def"), 
   ...:     memoryview(b"ghi"), 
   ...:     np.arange(106, 109, dtype="u1") 
   ...: ])                                                                      
Out[2]: b'abcdefghijkl'

This was raised in Python issue ( https://bugs.python.org/issue15958 ) and implemented in commit ( python/cpython@cfc22b4 ) as part of Python 3.4+.

@quasiben
Copy link
Member

I kicked the CI here so test should be passing now

@mrocklin mrocklin merged commit 1d92125 into dask:master Jul 14, 2020
@jakirkham jakirkham deleted the avoid_copy_merge_frames branch July 14, 2020 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants