Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIXES #81: 400x faster slice assignment #134

Merged
merged 53 commits into from
Mar 16, 2023

Conversation

muayyad-alsadi
Copy link
Contributor

for more information check #81

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Could you be sure to add some tests for this in the test suite, additionally for the cases that throw an error?

@muayyad-alsadi
Copy link
Contributor Author

Thanks! Could you be sure to add some tests for this in the test suite, additionally for the cases that throw an error?

thank you for your quick response.
I hope this will get merged soon.

I have so far demonstrated two cases one is the histogram matching image filter
the other is a simple hashing algorithm taken from the public domain constant CDB

https://github.com/muayyad-alsadi/wasm-demos/blob/main/cdb_djp_hash/

I believe the later one is a good use case for tests (the wasm is ~500 bytes only)

I think it would be better to not have this form of storage and take the store as input to the methods below. Additionally could they perhaps be renamed to read and write to be similar to the Wasmtime embedding and convey that copies are being performed as opposed to acquiring views?

unfortunately we can't change the signature of __getitem__() and __setitem__()
if you have a policy to decouple memory from storage, we can create a new class like this one I called it FastMemory

https://github.com/muayyad-alsadi/wasm-demos/blob/main/cdb_djp_hash/wasmtime_fast_memory.py

which is used like this

https://github.com/muayyad-alsadi/wasm-demos/blob/main/cdb_djp_hash/cdb_djp_hash.py

store = Store()
module = Module.from_file(store.engine, 'cdb_djp_hash.wasm')
instance = Instance(store, module, [])
memory = instance.exports(store)['memory']
fast_mem = FastMemory(memory, store)
# ...
fast_mem[start:stop] = input
output = fast_mem[start_output:stop_output]

any suggestion for its name?

I think this logic is the same as above and here, so could this be refactored out?

let me double check start, stop, step = key.indices(size) because it maybe handles all the edge conditions for us.

@muayyad-alsadi
Copy link
Contributor Author

I've added test, please let me know if you want me to move my logic into a different class and please suggest its name

@muayyad-alsadi
Copy link
Contributor Author

I don't know how do you want to solved those two type hints

91: error: Argument 1 to "len" has incompatible type "Union[int, bytearray]"; expected "Sized"  [arg-type]
86: error: Returning Any from function declared to return "Union[int, bytearray]"  [no-any-return]

should I change return to Any?

@alexcrichton
Copy link
Member

unfortunately we can't change the signature of

I realize this since they're Python builtins, which is why I was wondering if something like renaming the methods to read/write would work? I realize you wouldn't be able to use [...] syntax but that seems ok in the sense that ergonomics aren't of the utmost importance for something like this.

if you have a policy to decouple memory from storage

Personally I think the "stateful" approach you have right now where set_store must be called first is a bit of a footgun. I think it would be better to express this through a different API. One alternative is what I was mentioning where read and write method names were used and one of their arguments is a Storelike. Another alternative is what you mentioned where, from a Memory, you can pair it with a Store and then read/write that returned object. I wouldn't call it a "fast memory" per se but instead probably something like:

class Memory:
    def view(self, store: Storelike) -> MemoryView:
        # ...

where MemoryView has the __{get,set}item__ methods you currently have, and MemoryView internally retains both the Store and Memory.

In either case though I think what's important is to avoid the stateful setting-of-a-store and instead make it explicit as part of the API.

should I change return to Any?

Sorry I'm not familiar enough with mypy myself to know what's going on here.

@muayyad-alsadi
Copy link
Contributor Author

where MemoryView

I've called it MemorySlicer because view might imply zero-copy.

@alexcrichton
Copy link
Member

Thinking more on this, can you say more as to why a dedicated read/write method won't work? Idiomatically with these bindings this new class doesn't really quite fit, since it's the only one that closes over a Store and everything else takes it as an argument.

@muayyad-alsadi
Copy link
Contributor Author

muayyad-alsadi commented Mar 7, 2023

Thinking more on this, can you say more as to why a dedicated read/write method won't work?

we can do them, but I have two concern

  • the slicing operator is more familiar and versatile ex. slicer[-10:] or slicer[5:-10], where as taking arguments will be confusing. I'm even considering implementing step ex. red, green, blue =slicer[0::3], slicer[1::3], slicer[2::3]

https://github.com/muayyad-alsadi/wasm-demos/blob/main/cdb_djp_hash/cdb_djp_hash.py#L44
https://github.com/muayyad-alsadi/wasm-demos/blob/main/histogram-matching/histo_wasmtime_fast.py#L67

    # cdb_djp_hash.py
    fast_mem[start:stop]=input

    # histo_wasmtime_fast.py
    fast_mem[0:] = a_ref
    fast_mem[len(a_ref):] = a_in
    dst = fast_mem[len(a_ref)+len(a_in): len(a_ref)+len(a_in)+len(a_in)]
  • I can't figure what the signature should be
def read(self, store, key=None):
    start, stop, step = key.indices(size)

def write(self, store, value, key=None):
    start, stop, step = key.indices(size)

or

def read(self, store, start=0, stop=None, step=None):
    key = slice(start, stop, step)
    start, stop, step = key.indices(size)
    # ...

def write(self, store, value, start=0, stop=None, step=None):
    key = slice(start, stop, step)
    start, stop, step = key.indices(size)
    # ...
  • should we use built in slice() to get all the goodies of negative indexing ...etc or should we replicate that?
  • should those function handle single offset access? in that case should it return an array of one item or just the item?
  • is it ok for write() to take value after store? while the standard __setitem__() is value-last
  • is it start, stop, step or offset, length?

@muayyad-alsadi
Copy link
Contributor Author

The "NOTE" here disagrees with the code below. The NOTE makes me expect c_ubyte * 1 when below it shows up as * val_size. I think that discrepancy should be fixed.

no, it's totally fine, it will work, I've tested it
the reason that ctypes.memmove() will work even if it's passed pointers of *1 and val_size>1
is that memmove(3) takes a void*
pointer

if you insist I'll remove this comment.

Since this is a relatively low-level interface could this perhaps take bytearray directly? I think that would help with mypy typing as well since the argument would have a specific type.

mypy already passes fine
I want this function to be used with b"123" and with array.array() without casting
because it's very common to pass bytes to bytearrays and vice versa

>>> hashlib.sha256(b"foo").hexdigest()
'2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae'
>>> hashlib.sha256(bytearray(b"foo")).hexdigest()
'2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae'
>>> hashlib.sha256(array.array('B', b"foo")).hexdigest()
'2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae'
>>> hashlib.sha256(array.array('B', bytearray(b"foo"))).hexdigest()
'2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae'

and bytearray and array.array are done without copying
the pointer as seen by ctypes.memmove is equivalent in both bases

if you like I can change the type of value to be typing.Sequence which is anything that implements len and getitem
or typing.

>>> isinstance([], typing.Sequence)
True
>>> isinstance(b"", typing.Sequence)
True
>>> isinstance(array.array('B', []), typing.Sequence)
True
>>> isinstance(bytearray(), typing.Sequence)
True

>>> isinstance(bytearray(), typing.Iterable)
True
>>> isinstance(array.array('B', []), typing.Iterable)
True
>>> isinstance([], typing.Iterable)
True
>>> isinstance(b"", typing.Iterable)
True

but that seems like it should throw an exception rather than silently returning an empty array?

this is the standard behavior in python of all indexable array like objecrs

>>> ls = list(range(0, 100))
>>> len(ls)
100
>>> ls[:200]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
>>> ls[200:]
[]
>>> ls[4:4]
[]
>>> ls[4:3]
[]
>>>
>>> ba=bytearray([i for i in range(ord('a'), ord('z')+1)])
>>> ba
bytearray(b'abcdefghijklmnopqrstuvwxyz')
>>> len(ba)
26
>>> ba[:200]
bytearray(b'abcdefghijklmnopqrstuvwxyz')
>>> ba[200:]
bytearray(b'')
>>> ba[4:4]
bytearray(b'')
>>> ba[4:3]
bytearray(b'')

A downside of having the stop argument here is that this needs to document semantics of "what if the desired range is bigger than the value slice?" and I would prefer to not have to answer such a question.

as you can see in the code, we did not implement the logic, we left it to python builtin slice to document and implement this.

key = slice(start, stop)
start, stop, _ = key.indices(size)

this will do all the cases like

  • [-10 : ] last 10 bytes
  • [ : -10] up to the last 10 bytes
  • [200 : -10] from offset=200 up to the last 10 bytes.

and as mentioned before in python the slice might under-return as in the following (we asked to 200 items, but got 26)

>>> ba[:200]
bytearray(b'abcdefghijklmnopqrstuvwxyz')

I prefer to delegate to python builtin logic than to implement my own, because if we do a custom then we need to do more documentation and more reasoning

I hope I answered all user concerns

@alexcrichton
Copy link
Member

Sorry this PR is taking quite a lot more time than I though it would to review, so I'm going to have to find some dedicated time to review this in the future as it's not working for me to fit it in "gaps" in my day.

@muayyad-alsadi
Copy link
Contributor Author

TYT and I'm open to any suggestion or question.

@j-blue-arz
Copy link

j-blue-arz commented Mar 12, 2023

Firstly, I think this is an important topic and I would like to help where I can.

I think much of the still-open discussions come from slice vs. no slice interface distinction. Both the question about the necessity of the stop parameter and the discussion on negative length can be reduced to that.

@muayyad-alsadi I understand that you would like the interface to behave like a slice. All your arguments on the above two matters are correct for slices. As a python programmer, I would also expect that mem[start:stop] = val works even with negative values for stop, and mem[4:2] will return an empty array.
However, it was already decided that we go with the mem.write(store, start, stop, val) interface. So it is not a slice! Now, negative stop values are somewhat confusing.
Even worse, your argument on the negative start / stop values does not hold when assigning to a slice:

>>> a = [0, 1, 2, 3, 4, 5, 6]
>>> a[2:-2] = [9]
>>> a
[0, 1, 9, 5, 6]

That is not at all what I would expect when writing to a location in memory, and also is (luckily) not what memmove does. Therefore, @alexcrichton's remark on the

semantics of "what if the desired range is bigger than the value slice?"

is completely appropriate.

I would vote to remove the stop parameter for the write method. For the read method, I am undecided.

def write(
self,
store: Storelike,
value: typing.Any,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I said "works with mypy" better I was referring to the fact that this is listed as typing.Any. While I don't dispute "mypy runs without errors" falling back on typing.Any defeats the purpose of using type ascription in the first place.

This function does not take any type in Python. I think you provided a compelling case for it shouldn't just take bytearray, but in that case the list of supported types I think should be listed here under typing.Union[...] or something similar.

Comment on lines 118 to 119
if slice_size != val_size:
raise IndexError(f"mismatched value size, value of size [{val_size}] with slice of size [{slice_size}]")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this logic is correct? Here slice_size is always equal to val_size, so this if never fires?

Comment on lines 123 to 124
src_ptr = (ptr_type).from_buffer(value)
dst_ptr = (ptr_type).from_address(ctypes.addressof(data_ptr.contents) + start)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that the parentheses around ptr_type are necessary here.

Comment on lines 97 to 100
write a bytearray value into a possibly large slice of memory
negative start is allowed in a way similat to list slice mylist[-10:]
if value is not bytearray it will be used to construct an intermediate bytearray
return number of bytes written
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to expand this documentation to explain what happens for the raise cases below, for example writing beyond the end of memory. My impression is that it behaves differently than native Python slicing which I think would be good to call out.

@muayyad-alsadi
Copy link
Contributor Author

I've addressed all of those points in my last push.

if start >= size:
raise IndexError("index out of range")
# value must be bytearray ex. cast bytes() to bytearray
if not isinstance(value, array.array) and not isinstance(value, bytearray):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that if array.array is to be supported here it needs more work and more tests. Currently there are no tests using array.array and for any size bigger than u8 I don't believe it's correct. For example an array.array of 32-bit integers doesn't correctly copy the size of the array into memory. Furthermore endianness is a concern with array.array where I don't think that larger than 1-byte values should be supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are totally right, should I check for value.itemsize==1 and throw ValueError or remove array.array

raise IndexError("index out of range")
# value must be bytearray ex. cast bytes() to bytearray
if not isinstance(value, array.array) and not isinstance(value, bytearray):
# value = array.array('B', value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line can be removed?

# value must be bytearray ex. cast bytes() to bytearray
if not isinstance(value, array.array) and not isinstance(value, bytearray):
# value = array.array('B', value)
value = bytearray(value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is bytearray the best choice of conversion here? If the input is a bytes, for example, wouldn't this create a temporary copy before then copying into the linear memory?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, bytes does not work because src_ptr = ptr_type.from_buffer(value) would give

TypeError: underlying buffer is not writable

@muayyad-alsadi
Copy link
Contributor Author

I would love to include thewtex suggestion

def get_buffer_ptr(self, store):
    """
    buffer suitable for creating zero-copy writable NumPy Buffer Protocol

    np_mem = np.frombuffer(memory.get_buffer_ptr(store), dtype=np.uint8)
    np_mem[start:end] = A
    B = np_mem[start:end]
    """
    ptr_type = ctypes.c_ubyte * (self.data_len(store))
    return ptr_type.from_address(ctypes.addressof(self.data_ptr(store).contents))

@alexcrichton
Copy link
Member

Thanks! This looks good so I'll merge this. I think that method might make a good follow-up PR if you're interested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants