marshall module ideas #8
Comments
Problems would be: no differentiation between tuple and list, or between dict and OrderedDict.
Also, there is no encoding for an array with an 8-bit length; there's a jump from 4 bits to 16 bits (same for maps).
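To make the length jump concrete, here is a minimal sketch of the MsgPack array header rules (fixarray, array 16, array 32) as given in the MsgPack spec. The function name `msgpack_array_header` is made up for illustration and is not part of any existing module.

```python
import struct


def msgpack_array_header(n):
    """Return the MsgPack header bytes for an array of n elements.

    Hypothetical helper for illustration; only the header is produced.
    """
    if n < 0:
        raise ValueError("length must be non-negative")
    if n <= 0x0F:
        # fixarray: the length lives in the low 4 bits of 0x90..0x9f
        return bytes([0x90 | n])
    if n <= 0xFFFF:
        # array 16: 0xdc followed by a big-endian 16-bit length
        return b"\xdc" + struct.pack(">H", n)
    if n <= 0xFFFFFFFF:
        # array 32: 0xdd followed by a big-endian 32-bit length
        return b"\xdd" + struct.pack(">I", n)
    raise ValueError("too many elements for a MsgPack array")


print(msgpack_array_header(15).hex())  # 9f      (1 byte)
print(msgpack_array_header(16).hex())  # dc0010  (3 bytes, no 2-byte form)
```

So a 16-element array already pays for a 16-bit length; maps follow the same pattern with 0x80..0x8f, 0xde and 0xdf.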
There's also CBOR, and teh-drama between it and MsgPack: msgpack/msgpack#129
CBOR is used in CoAP, so kinda would be "more useful" than MsgPack...
MsgPack has random gap in:
I.e., only short text strs can be encoded efficiently; byte strings always require an explicit length byte. CBOR doesn't have that "limitation": https://tools.ietf.org/html/rfc7049#appendix-B (of course, it encodes something else less efficiently instead, since all MsgPack encoding bytes are used (well, one is reserved)).
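A rough sketch of the header sizes being compared here, written from the MsgPack spec (fixstr/str 8/16/32, bin 8/16/32) and RFC 7049 section 2.1 (major type 2 = byte string, 3 = text string). All three function names are invented for this illustration.

```python
import struct


def msgpack_str_header(n):
    """MsgPack header for a text string of n bytes (hypothetical helper)."""
    if n <= 31:
        return bytes([0xA0 | n])               # fixstr: length in low 5 bits
    if n <= 0xFF:
        return bytes([0xD9, n])                # str 8
    if n <= 0xFFFF:
        return b"\xda" + struct.pack(">H", n)  # str 16
    return b"\xdb" + struct.pack(">I", n)      # str 32


def msgpack_bin_header(n):
    """MsgPack header for a byte string of n bytes; there is no 'fixbin'."""
    if n <= 0xFF:
        return bytes([0xC4, n])                # bin 8
    if n <= 0xFFFF:
        return b"\xc5" + struct.pack(">H", n)  # bin 16
    return b"\xc6" + struct.pack(">I", n)      # bin 32


def cbor_head(major, n):
    """CBOR initial byte(s) for a major type (0..7) and argument n."""
    mt = major << 5
    if n < 24:
        return bytes([mt | n])                 # argument fits in the initial byte
    if n <= 0xFF:
        return bytes([mt | 24, n])
    if n <= 0xFFFF:
        return bytes([mt | 25]) + struct.pack(">H", n)
    if n <= 0xFFFFFFFF:
        return bytes([mt | 26]) + struct.pack(">I", n)
    return bytes([mt | 27]) + struct.pack(">Q", n)


# Header for a 5-byte payload in each format:
print(msgpack_str_header(5).hex())  # a5
print(msgpack_bin_header(5).hex())  # c405
print(cbor_head(2, 5).hex())        # 45  (CBOR byte string)
print(cbor_head(3, 5).hex())        # 65  (CBOR text string)
```

The trade-off goes the other way between 24 and 31 bytes: MsgPack's fixstr still has a 1-byte header there, while CBOR needs 2 bytes for any length of 24 or more.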
Note that the motivation for the marshall module is encoding data rows for a btree database. I.e. the motivation is: "need to serialize tuples for btree db" -> "why not implement that by implementing a marshall module which can be used for many other things too". That adds an additional requirement: being able to efficiently compare serialized arrays (i.e. without requiring full decoding).
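To illustrate what "comparable without full decoding" could mean, here is a deliberately simplified sketch. It is neither MsgPack nor CBOR: it assumes rows are tuples of unsigned 32-bit integers and uses a made-up `encode_key` helper that writes each field as fixed-width big-endian, so that plain byte comparison (what memcmp or a btree key comparison would do) agrees with tuple comparison.

```python
import struct


def encode_key(row):
    """Encode a tuple of unsigned 32-bit ints as an order-preserving key.

    Hypothetical format for illustration only: each field is 4 bytes,
    big-endian, so lexicographic byte order equals tuple order.
    """
    return b"".join(struct.pack(">I", field) for field in row)


rows = [(2, 1), (1, 300), (1, 2), (1,), (256, 0)]

# Sorting the raw encoded keys gives the same order as sorting the tuples,
# and no key is ever decoded during the comparison.
by_bytes = sorted(rows, key=encode_key)
print(by_bytes == sorted(rows))  # True
```

A real format would also have to handle signed values, strings and mixed types, which is where the choice of encoding starts to matter.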
CBOR defines encodings for bignums, for example. Looks like it's a winner.
CBOR tags are rather extensible; they are looking to incorporate fixed-point types and arrays for ADCs.
Umm, no? 0xc0 through 0xc3 are None//False/True. CBOR also has gaps in it … A more relevant advantage of CBOR is that you can prefix an item with a rather simple "use the following data as input to Another advantage would be the ability to encode indeterminate-length data (this is basically impossible with msgpack), though I have no idea whether that is actually a relevant use case for micropython/pycopy.
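For reference, indefinite-length encoding in CBOR (RFC 7049 section 2.2) looks roughly like this: an array opens with 0x9f, items follow, and a 0xff "break" byte closes it, so the producer never needs to know the element count up front. The sketch below only handles unsigned integers 0..23 (which CBOR encodes as a single byte), and both function names are invented for illustration.

```python
def cbor_small_uint(n):
    """Encode an unsigned int in 0..23, which CBOR stores in one byte."""
    if not 0 <= n < 24:
        raise ValueError("this sketch only handles 0..23")
    return bytes([n])


def cbor_stream_array(items):
    """Yield the chunks of an indefinite-length CBOR array, one at a time.

    `items` may be any iterable, including a generator of unknown length.
    """
    yield b"\x9f"                  # start of indefinite-length array
    for item in items:
        yield cbor_small_uint(item)
    yield b"\xff"                  # "break" stop code


encoded = b"".join(cbor_stream_array(iter([1, 2, 3])))
print(encoded.hex())  # 9f010203ff
```

MsgPack array headers always carry the element count, so the equivalent would require buffering the whole sequence first.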
asan considers that memcmp(p, q, N) is permitted to access N bytes at each of p and q, even for values of p and q that have a difference earlier. Accessing additional values is frequently done in practice, reading 4 or more bytes from each input at a time for efficiency, so when completing "non_exist<TAB>" in the repl, this causes a diagnostic:

    ==16938==ERROR: AddressSanitizer: global-buffer-overflow on address 0x555555cd8dc8 at pc 0x7ffff726457b bp 0x7fffffffda20 sp 0x7fff
    READ of size 9 at 0x555555cd8dc8 thread T0
        #0 0x7ffff726457a (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xb857a)
        #1 0x555555b0e82a in mp_repl_autocomplete ../../py/repl.c:301
        #2 0x555555c89585 in readline_process_char ../../lib/mp-readline/re
        #3 0x555555c8ac6e in readline ../../lib/mp-readline/readline.c:513
        #4 0x555555b8dcbd in do_repl /home/jepler/src/micropython/ports/uni
        #5 0x555555b90859 in main_ /home/jepler/src/micropython/ports/unix/
        #6 0x555555b90a3a in main /home/jepler/src/micropython/ports/unix/m
        #7 0x7ffff619a09a in __libc_start_main ../csu/libc-start.c:308
        #8 0x55555595fd69 in _start (/home/jepler/src/micropython/ports/uni

    0x555555cd8dc8 is located 0 bytes to the right of global variable 'import_str' defined in '../../py/repl.c:285:23' (0x555555cd8dc0) of size 8
      'import_str' is ascii string 'import '

Signed-off-by: Jeff Epler <jepler@gmail.com>
It seems that MsgPack is a viable choice for implementing the marshall encoding: https://github.com/msgpack/msgpack/blob/master/spec.md
Possibly an ad hoc serialization format would be even more efficient, but at least MsgPack is able to differentiate bytes vs strs, etc.