Overhead of signed types #1

iiSeymour · 2019-05-05T22:25:31Z

Encoding/decoding of int16/int32 arrays has a large overhead (~70 to 75% of the total time) because the delta and zigzag steps are done in Python (via numpy) rather than in the C codec. There is no low-hanging optimization left on the Python side, as the implementation already uses numpy ufuncs.

Total time: 0.052461 s
Function: encode at line 80

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    80                                               @wraps(c_func)
    81                                               def encode(data, prev=0):
    82                                           
    83         1         40.0     40.0      0.1          if np.issubdtype(data.dtype, np.signedinteger):
    84         1      10796.0  10796.0     20.6              diffs = np.ediff1d(data, to_begin=data[0])
    85         1          9.0      9.0      0.0              shift = data.dtype.itemsize * 8 - 1
    86         1      26187.0  26187.0     49.9              data = to_zig_zag(diffs, np.int32(shift))
    87                                           
    88         1         39.0     39.0      0.1          if np.issubdtype(data.dtype, np.uint16):
    89                                                       data = data.astype(np.uint32)
    90                                           
    91         1         42.0     42.0      0.1          output = np.zeros(max_compressed_bytes(len(data)), dtype=np.uint8)
    92         1          1.0      1.0      0.0          encoded_size = c_func(
    93         1        230.0    230.0      0.4              data.ctypes.data_as(ctypes.POINTER(ctypes.c_uint32)),
    94         1          1.0      1.0      0.0              len(data),
    95         1         41.0     41.0      0.1              output.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8)),
    96         1      15066.0  15066.0     28.7              prev
    97                                                   )
    98         1          9.0      9.0      0.0          return output[:encoded_size]
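The `to_zig_zag` helper profiled above is not shown in this issue. For reference, the delta + zigzag preprocessing can be sketched with numpy ufuncs roughly as below. `zigzag_encode` and `delta_zigzag_encode` are hypothetical names, not this package's API, and the sketch assumes the codec takes uint32 input, as the `c_uint32` pointer cast suggests. It is not claimed to be faster than the profiled code.

```python
import numpy as np


def zigzag_encode(x: np.ndarray) -> np.ndarray:
    """Map signed ints to unsigned ints of the same width: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    bits = x.dtype.itemsize * 8
    udtype = np.dtype(f"u{x.dtype.itemsize}")
    # Arithmetic right shift: 0 for x >= 0, all ones (-1) for x < 0.
    sign = (x >> (bits - 1)).view(udtype)
    # Shift the unsigned view so overflow wraps cleanly instead of relying on signed behavior.
    return (x.view(udtype) << 1) ^ sign


def delta_zigzag_encode(data: np.ndarray) -> np.ndarray:
    """Delta + zigzag an int8/int16/int32 array into the uint32 values a uint32 codec expects."""
    if not np.issubdtype(data.dtype, np.signedinteger) or data.dtype.itemsize > 4:
        raise TypeError("expected a signed integer array of at most 32 bits")
    data = np.ascontiguousarray(data)
    diffs = np.empty_like(data)
    if data.size:
        diffs[0] = data[0]
        # Differences wrap modulo 2**bits; a cumulative sum in the same dtype undoes this.
        np.subtract(data[1:], data[:-1], out=diffs[1:])
    return zigzag_encode(diffs).astype(np.uint32)
```

For int16 input, the zigzag step yields uint16 values that are then widened to uint32, mirroring the `astype(np.uint32)` line in the profile. Each step is still a separate full pass over the array.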

Total time: 0.083777 s
Function: decode at line 111

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   111                                               @wraps(c_func)
   112                                               def decode(data, n, prev=0, dtype=None):
   113                                           
   114         1         31.0     31.0      0.0          output = np.zeros(n, dtype=np.uint32)
   115         1          1.0      1.0      0.0          c_func(
   116         1        105.0    105.0      0.1              data.ctypes.data_as(ctypes.POINTER(ctypes.c_uint8)),
   117         1         31.0     31.0      0.0              output.ctypes.data_as(ctypes.POINTER(ctypes.c_uint32)),
   118         1          0.0      0.0      0.0              n,
   119         1      20428.0  20428.0     24.4              prev,
   120                                                   )
   121                                           
   122         1         51.0     51.0      0.1          if dtype and np.issubdtype(dtype, np.signedinteger):
   123         1      20575.0  20575.0     24.6              zigzag = from_zig_zag(output)
   124         1      42553.0  42553.0     50.8              output = np.cumsum(zigzag, dtype=dtype)
   125                                                   elif dtype and output.dtype != dtype:
   126                                                       return output.astype(dtype)
   127         1          2.0      2.0      0.0          return output
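A matching sketch of the decode side (the `from_zig_zag` and `np.cumsum` lines) under the same assumptions. `zigzag_decode` and `zigzag_delta_decode` are hypothetical names. Accumulating in the narrow signed dtype wraps modulo 2**bits, which is what makes wrapped deltas round-trip. As in the profile, this is still two extra passes plus temporaries in numpy, the kind of work native signed support in the codec could avoid.

```python
import numpy as np


def zigzag_decode(z: np.ndarray, dtype) -> np.ndarray:
    """Inverse zigzag: 0, 1, 2, 3 -> 0, -1, 1, -2, returned as the signed `dtype`."""
    dtype = np.dtype(dtype)
    udtype = np.dtype(f"u{dtype.itemsize}")
    u = z.astype(udtype)  # narrow from uint32 storage to the target width
    half = (u >> 1).view(dtype)
    sign = -((u & 1).view(dtype))  # 0 for even codes, -1 (all ones) for odd codes
    return half ^ sign


def zigzag_delta_decode(encoded: np.ndarray, dtype) -> np.ndarray:
    """Undo zigzag, then undo the deltas with a cumulative sum in the target dtype."""
    return np.cumsum(zigzag_decode(encoded, dtype), dtype=dtype)
```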

Maybe @lemire already has an efficient int16/int32 -> uint32 zigzag implementation and/or is interested in supporting signed types in streamvbyte natively?


lemire · 2019-05-06T16:13:42Z

Yes, maybe you could start by creating an issue upstream? This seems like something we want to support.

iiSeymour · 2019-05-15T16:33:45Z

Note: L3 cache size 25MB.

lemire · 2019-05-15T16:49:33Z

It is an interesting plot which you may want to reproduce upstream to motivate further work.

iiSeymour added the "help wanted" label May 6, 2019

iiSeymour mentioned this issue May 6, 2019 in fast-pack/streamvbyte#28, "Support for Signed Types Integrated in the CODECs" (Open)
