FIXES #81: 400x faster slice assignment #134

muayyad-alsadi · 2023-03-05T09:26:40Z

for more information check #81

alexcrichton

Thanks! Could you be sure to add some tests for this in the test suite, additionally for the cases that throw an error?

wasmtime/_memory.py

muayyad-alsadi · 2023-03-07T07:41:30Z

Thanks! Could you be sure to add some tests for this in the test suite, additionally for the cases that throw an error?

thank you for your quick response.
I hope this will get merged soon.

I have so far demonstrated two cases one is the histogram matching image filter
the other is a simple hashing algorithm taken from the public domain constant CDB

https://github.com/muayyad-alsadi/wasm-demos/blob/main/cdb_djp_hash/

I believe the later one is a good use case for tests (the wasm is ~500 bytes only)

I think it would be better to not have this form of storage and take the store as input to the methods below. Additionally could they perhaps be renamed to read and write to be similar to the Wasmtime embedding and convey that copies are being performed as opposed to acquiring views?

unfortunately we can't change the signature of __getitem__() and __setitem__()
if you have a policy to decouple memory from storage, we can create a new class like this one I called it FastMemory

https://github.com/muayyad-alsadi/wasm-demos/blob/main/cdb_djp_hash/wasmtime_fast_memory.py

which is used like this

https://github.com/muayyad-alsadi/wasm-demos/blob/main/cdb_djp_hash/cdb_djp_hash.py

store = Store()
module = Module.from_file(store.engine, 'cdb_djp_hash.wasm')
instance = Instance(store, module, [])
memory = instance.exports(store)['memory']
fast_mem = FastMemory(memory, store)
# ...
fast_mem[start:stop] = input
output = fast_mem[start_output:stop_output]

any suggestion for its name?

I think this logic is the same as above and here, so could this be refactored out?

let me double check start, stop, step = key.indices(size) because it maybe handles all the edge conditions for us.

muayyad-alsadi · 2023-03-07T09:45:17Z

I've added test, please let me know if you want me to move my logic into a different class and please suggest its name

muayyad-alsadi · 2023-03-07T11:55:02Z

I don't know how do you want to solved those two type hints

91: error: Argument 1 to "len" has incompatible type "Union[int, bytearray]"; expected "Sized"  [arg-type]
86: error: Returning Any from function declared to return "Union[int, bytearray]"  [no-any-return]

should I change return to Any?

alexcrichton · 2023-03-07T15:47:58Z

unfortunately we can't change the signature of

I realize this since they're Python builtins, which is why I was wondering if something like renaming the methods to read/write would work? I realize you wouldn't be able to use [...] syntax but that seems ok in the sense that ergonomics aren't of the utmost importance for something like this.

if you have a policy to decouple memory from storage

Personally I think the "stateful" approach you have right now where set_store must be called first is a bit of a footgun. I think it would be better to express this through a different API. One alternative is what I was mentioning where read and write method names were used and one of their arguments is a Storelike. Another alternative is what you mentioned where, from a Memory, you can pair it with a Store and then read/write that returned object. I wouldn't call it a "fast memory" per se but instead probably something like:

class Memory:
    def view(self, store: Storelike) -> MemoryView:
        # ...

where MemoryView has the __{get,set}item__ methods you currently have, and MemoryView internally retains both the Store and Memory.

In either case though I think what's important is to avoid the stateful setting-of-a-store and instead make it explicit as part of the API.

should I change return to Any?

Sorry I'm not familiar enough with mypy myself to know what's going on here.

muayyad-alsadi · 2023-03-07T18:07:22Z

where MemoryView

I've called it MemorySlicer because view might imply zero-copy.

alexcrichton · 2023-03-07T20:06:32Z

Thinking more on this, can you say more as to why a dedicated read/write method won't work? Idiomatically with these bindings this new class doesn't really quite fit, since it's the only one that closes over a Store and everything else takes it as an argument.

muayyad-alsadi · 2023-03-07T20:48:53Z

Thinking more on this, can you say more as to why a dedicated read/write method won't work?

we can do them, but I have two concern

the slicing operator is more familiar and versatile ex. slicer[-10:] or slicer[5:-10], where as taking arguments will be confusing. I'm even considering implementing step ex. red, green, blue =slicer[0::3], slicer[1::3], slicer[2::3]

https://github.com/muayyad-alsadi/wasm-demos/blob/main/cdb_djp_hash/cdb_djp_hash.py#L44
https://github.com/muayyad-alsadi/wasm-demos/blob/main/histogram-matching/histo_wasmtime_fast.py#L67

    # cdb_djp_hash.py
    fast_mem[start:stop]=input

    # histo_wasmtime_fast.py
    fast_mem[0:] = a_ref
    fast_mem[len(a_ref):] = a_in
    dst = fast_mem[len(a_ref)+len(a_in): len(a_ref)+len(a_in)+len(a_in)]

I can't figure what the signature should be

def read(self, store, key=None):
    start, stop, step = key.indices(size)

def write(self, store, value, key=None):
    start, stop, step = key.indices(size)

or

def read(self, store, start=0, stop=None, step=None):
    key = slice(start, stop, step)
    start, stop, step = key.indices(size)
    # ...

def write(self, store, value, start=0, stop=None, step=None):
    key = slice(start, stop, step)
    start, stop, step = key.indices(size)
    # ...

should we use built in slice() to get all the goodies of negative indexing ...etc or should we replicate that?
should those function handle single offset access? in that case should it return an array of one item or just the item?
is it ok for write() to take value after store? while the standard __setitem__() is value-last
is it start, stop, step or offset, length?

wasmtime/_memory.py

muayyad-alsadi · 2023-03-08T17:21:25Z

The "NOTE" here disagrees with the code below. The NOTE makes me expect c_ubyte * 1 when below it shows up as * val_size. I think that discrepancy should be fixed.

no, it's totally fine, it will work, I've tested it
the reason that ctypes.memmove() will work even if it's passed pointers of *1 and val_size>1
is that memmove(3) takes a void*
pointer

if you insist I'll remove this comment.

Since this is a relatively low-level interface could this perhaps take bytearray directly? I think that would help with mypy typing as well since the argument would have a specific type.

mypy already passes fine
I want this function to be used with b"123" and with array.array() without casting
because it's very common to pass bytes to bytearrays and vice versa

>>> hashlib.sha256(b"foo").hexdigest()
'2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae'
>>> hashlib.sha256(bytearray(b"foo")).hexdigest()
'2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae'
>>> hashlib.sha256(array.array('B', b"foo")).hexdigest()
'2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae'
>>> hashlib.sha256(array.array('B', bytearray(b"foo"))).hexdigest()
'2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae'

and bytearray and array.array are done without copying
the pointer as seen by ctypes.memmove is equivalent in both bases

if you like I can change the type of value to be typing.Sequence which is anything that implements len and getitem
or typing.

>>> isinstance([], typing.Sequence)
True
>>> isinstance(b"", typing.Sequence)
True
>>> isinstance(array.array('B', []), typing.Sequence)
True
>>> isinstance(bytearray(), typing.Sequence)
True

>>> isinstance(bytearray(), typing.Iterable)
True
>>> isinstance(array.array('B', []), typing.Iterable)
True
>>> isinstance([], typing.Iterable)
True
>>> isinstance(b"", typing.Iterable)
True

but that seems like it should throw an exception rather than silently returning an empty array?

this is the standard behavior in python of all indexable array like objecrs

>>> ls = list(range(0, 100))
>>> len(ls)
100
>>> ls[:200]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
>>> ls[200:]
[]
>>> ls[4:4]
[]
>>> ls[4:3]
[]
>>>
>>> ba=bytearray([i for i in range(ord('a'), ord('z')+1)])
>>> ba
bytearray(b'abcdefghijklmnopqrstuvwxyz')
>>> len(ba)
26
>>> ba[:200]
bytearray(b'abcdefghijklmnopqrstuvwxyz')
>>> ba[200:]
bytearray(b'')
>>> ba[4:4]
bytearray(b'')
>>> ba[4:3]
bytearray(b'')

A downside of having the stop argument here is that this needs to document semantics of "what if the desired range is bigger than the value slice?" and I would prefer to not have to answer such a question.

as you can see in the code, we did not implement the logic, we left it to python builtin slice to document and implement this.

key = slice(start, stop)
start, stop, _ = key.indices(size)

this will do all the cases like

[-10 : ] last 10 bytes
[ : -10] up to the last 10 bytes
[200 : -10] from offset=200 up to the last 10 bytes.

and as mentioned before in python the slice might under-return as in the following (we asked to 200 items, but got 26)

>>> ba[:200]
bytearray(b'abcdefghijklmnopqrstuvwxyz')

I prefer to delegate to python builtin logic than to implement my own, because if we do a custom then we need to do more documentation and more reasoning

I hope I answered all user concerns

alexcrichton · 2023-03-09T18:40:22Z

Sorry this PR is taking quite a lot more time than I though it would to review, so I'm going to have to find some dedicated time to review this in the future as it's not working for me to fit it in "gaps" in my day.

muayyad-alsadi · 2023-03-09T21:35:21Z

TYT and I'm open to any suggestion or question.

j-blue-arz · 2023-03-12T13:58:47Z

Firstly, I think this is an important topic and I would like to help where I can.

I think much of the still-open discussions come from slice vs. no slice interface distinction. Both the question about the necessity of the stop parameter and the discussion on negative length can be reduced to that.

@muayyad-alsadi I understand that you would like the interface to behave like a slice. All your arguments on the above two matters are correct for slices. As a python programmer, I would also expect that mem[start:stop] = val works even with negative values for stop, and mem[4:2] will return an empty array.
However, it was already decided that we go with the mem.write(store, start, stop, val) interface. So it is not a slice! Now, negative stop values are somewhat confusing.
Even worse, your argument on the negative start / stop values does not hold when assigning to a slice:

>>> a = [0, 1, 2, 3, 4, 5, 6]
>>> a[2:-2] = [9]
>>> a
[0, 1, 9, 5, 6]

That is not at all what I would expect when writing to a location in memory, and also is (luckily) not what memmove does. Therefore, @alexcrichton's remark on the

semantics of "what if the desired range is bigger than the value slice?"

is completely appropriate.

I would vote to remove the stop parameter for the write method. For the read method, I am undecided.

alexcrichton · 2023-03-13T14:38:54Z

wasmtime/_memory.py

+    def write(
+            self,
+            store: Storelike,
+            value: typing.Any,


When I said "works with mypy" better I was referring to the fact that this is listed as typing.Any. While I don't dispute "mypy runs without errors" falling back on typing.Any defeats the purpose of using type ascription in the first place.

This function does not take any type in Python. I think you provided a compelling case for it shouldn't just take bytearray, but in that case the list of supported types I think should be listed here under typing.Union[...] or something similar.

alexcrichton · 2023-03-13T14:41:09Z

wasmtime/_memory.py

+        if slice_size != val_size:
+            raise IndexError(f"mismatched value size, value of size [{val_size}] with slice of size [{slice_size}]")


I don't think that this logic is correct? Here slice_size is always equal to val_size, so this if never fires?

alexcrichton · 2023-03-13T14:41:43Z

wasmtime/_memory.py

+        src_ptr = (ptr_type).from_buffer(value)
+        dst_ptr = (ptr_type).from_address(ctypes.addressof(data_ptr.contents) + start)


I don't think that the parentheses around ptr_type are necessary here.

alexcrichton · 2023-03-13T14:46:31Z

wasmtime/_memory.py

+        write a bytearray value  into a possibly large slice of memory
+        negative start is allowed in a way similat to list slice mylist[-10:]
+        if value is not bytearray it will be used to construct an intermediate bytearray
+        return number of bytes written


I think it would be good to expand this documentation to explain what happens for the raise cases below, for example writing beyond the end of memory. My impression is that it behaves differently than native Python slicing which I think would be good to call out.

…ment

muayyad-alsadi · 2023-03-13T19:15:03Z

I've addressed all of those points in my last push.

alexcrichton · 2023-03-14T14:24:38Z

wasmtime/_memory.py

+        if start >= size:
+            raise IndexError("index out of range")
+        # value must be bytearray ex. cast bytes() to bytearray
+        if not isinstance(value, array.array) and not isinstance(value, bytearray):


I think that if array.array is to be supported here it needs more work and more tests. Currently there are no tests using array.array and for any size bigger than u8 I don't believe it's correct. For example an array.array of 32-bit integers doesn't correctly copy the size of the array into memory. Furthermore endianness is a concern with array.array where I don't think that larger than 1-byte values should be supported.

you are totally right, should I check for value.itemsize==1 and throw ValueError or remove array.array

alexcrichton · 2023-03-14T14:24:53Z

wasmtime/_memory.py

+            raise IndexError("index out of range")
+        # value must be bytearray ex. cast bytes() to bytearray
+        if not isinstance(value, array.array) and not isinstance(value, bytearray):
+            # value = array.array('B', value)


I think this line can be removed?

alexcrichton · 2023-03-14T14:25:45Z

wasmtime/_memory.py

+        # value must be bytearray ex. cast bytes() to bytearray
+        if not isinstance(value, array.array) and not isinstance(value, bytearray):
+            # value = array.array('B', value)
+            value = bytearray(value)


Is bytearray the best choice of conversion here? If the input is a bytes, for example, wouldn't this create a temporary copy before then copying into the linear memory?

yes, bytes does not work because src_ptr = ptr_type.from_buffer(value) would give

TypeError: underlying buffer is not writable

muayyad-alsadi · 2023-03-14T15:07:29Z

I would love to include thewtex suggestion

def get_buffer_ptr(self, store):
    """
    buffer suitable for creating zero-copy writable NumPy Buffer Protocol

    np_mem = np.frombuffer(memory.get_buffer_ptr(store), dtype=np.uint8)
    np_mem[start:end] = A
    B = np_mem[start:end]
    """
    ptr_type = ctypes.c_ubyte * (self.data_len(store))
    return ptr_type.from_address(ctypes.addressof(self.data_ptr(store).contents))

…ction and document it

alexcrichton · 2023-03-16T14:38:56Z

Thanks! This looks good so I'll merge this. I think that method might make a good follow-up PR if you're interested.

muayyad-alsadi added 3 commits March 5, 2023 12:25

FIXES bytecodealliance#81: 400x faster slice assignment

f7c3110

FIXES bytecodealliance#81: provide faster get too

6da7389

FIXES bytecodealliance#81: use slice.indices

f02e2d3

alexcrichton reviewed Mar 6, 2023

View reviewed changes

wasmtime/_memory.py Outdated Show resolved Hide resolved

wasmtime/_memory.py Outdated Show resolved Hide resolved

wasmtime/_memory.py Outdated Show resolved Hide resolved

wasmtime/_memory.py Outdated Show resolved Hide resolved

muayyad-alsadi added 2 commits March 7, 2023 12:26

rely on slice and add ptr_type once

b3cf058

add tests

f13e159

muayyad-alsadi added 11 commits March 7, 2023 12:46

add tests

2f3aee6

add tests

017d837

add tests

4cd3b72

add tests

46677f6

add more tests

1f8e921

remove ctypes.memmove from __getitem__

1afb8d4

fix test

18f560b

apply flake8

09d37bc

fix test

462efc3

fix test

68f76a8

fix test

9676cea

introduce memorySlicer

274d659

muayyad-alsadi added 2 commits March 7, 2023 21:09

introduce memorySlicer

45e77c0

introduce memorySlicer

4ebd300

muayyad-alsadi added 3 commits March 8, 2023 00:26

replace memorySlicer with read/write

ef85351

replace memorySlicer with read/write

7e2aec2

replace memorySlicer with read/write

e1393bc

alexcrichton reviewed Mar 8, 2023

View reviewed changes

wasmtime/_memory.py Outdated Show resolved Hide resolved

wasmtime/_memory.py Outdated Show resolved Hide resolved

wasmtime/_memory.py Show resolved Hide resolved

wasmtime/_memory.py Outdated Show resolved Hide resolved

j-blue-arz mentioned this pull request Mar 12, 2023

Setting a slice of memory without having to enumerate #81

Closed

muayyad-alsadi added 6 commits March 12, 2023 20:20

remove stop from write

42df875

remove stop from write

d101a70

remove stop from write

7ec6b05

remove stop from write

de30021

remove stop from write

bab36a9

remove stop from write

3b985ee

alexcrichton reviewed Mar 13, 2023

View reviewed changes

muayyad-alsadi added 2 commits March 13, 2023 20:12

types union, remove useless raise, remove () from pointer and add com…

9c15925

…ment

fix typing

86f4730

alexcrichton reviewed Mar 14, 2023

View reviewed changes

muayyad-alsadi added 8 commits March 15, 2023 10:55

write no longer accept array.array, make common code in src_ptr a fun…

a8e0153

…ction and document it

write no longer accept array.array, make common code in src_ptr a fun…

5a0bf2b

…ction and document it

write no longer accept array.array, make common code in src_ptr a fun…

9573435

…ction and document it

write no longer accept array.array, make common code in src_ptr a fun…

86c30d6

…ction and document it

write no longer accept array.array, make common code in src_ptr a fun…

a88fb01

…ction and document it

write no longer accept array.array, make common code in src_ptr a fun…

89a6a14

…ction and document it

write no longer accept array.array, make common code in src_ptr a fun…

25edd03

…ction and document it

write no longer accept array.array, make common code in src_ptr a fun…

1640947

…ction and document it

alexcrichton merged commit e655f68 into bytecodealliance:main Mar 16, 2023

muayyad-alsadi mentioned this pull request Mar 21, 2023

optimize time wasted in passing argument and fetching return #137

Open

muayyad-alsadi mentioned this pull request Apr 5, 2023

Performance questions #96

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIXES #81: 400x faster slice assignment #134

FIXES #81: 400x faster slice assignment #134

muayyad-alsadi commented Mar 5, 2023

alexcrichton left a comment

muayyad-alsadi commented Mar 7, 2023

muayyad-alsadi commented Mar 7, 2023

muayyad-alsadi commented Mar 7, 2023

alexcrichton commented Mar 7, 2023

muayyad-alsadi commented Mar 7, 2023

alexcrichton commented Mar 7, 2023

muayyad-alsadi commented Mar 7, 2023 •

edited

Loading

muayyad-alsadi commented Mar 8, 2023

alexcrichton commented Mar 9, 2023

muayyad-alsadi commented Mar 9, 2023

j-blue-arz commented Mar 12, 2023 •

edited

Loading

alexcrichton Mar 13, 2023

alexcrichton Mar 13, 2023

alexcrichton Mar 13, 2023

alexcrichton Mar 13, 2023

muayyad-alsadi commented Mar 13, 2023

alexcrichton Mar 14, 2023

muayyad-alsadi Mar 14, 2023

alexcrichton Mar 14, 2023

alexcrichton Mar 14, 2023

muayyad-alsadi Mar 14, 2023

muayyad-alsadi commented Mar 14, 2023

alexcrichton commented Mar 16, 2023

		if slice_size != val_size:
		raise IndexError(f"mismatched value size, value of size [{val_size}] with slice of size [{slice_size}]")

		src_ptr = (ptr_type).from_buffer(value)
		dst_ptr = (ptr_type).from_address(ctypes.addressof(data_ptr.contents) + start)

FIXES #81: 400x faster slice assignment #134

FIXES #81: 400x faster slice assignment #134

Conversation

muayyad-alsadi commented Mar 5, 2023

alexcrichton left a comment

Choose a reason for hiding this comment

muayyad-alsadi commented Mar 7, 2023

muayyad-alsadi commented Mar 7, 2023

muayyad-alsadi commented Mar 7, 2023

alexcrichton commented Mar 7, 2023

muayyad-alsadi commented Mar 7, 2023

alexcrichton commented Mar 7, 2023

muayyad-alsadi commented Mar 7, 2023 • edited Loading

muayyad-alsadi commented Mar 8, 2023

alexcrichton commented Mar 9, 2023

muayyad-alsadi commented Mar 9, 2023

j-blue-arz commented Mar 12, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

muayyad-alsadi commented Mar 13, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

muayyad-alsadi commented Mar 14, 2023

alexcrichton commented Mar 16, 2023

muayyad-alsadi commented Mar 7, 2023 •

edited

Loading

j-blue-arz commented Mar 12, 2023 •

edited

Loading