encodeVarint64 performance improvement preallocate ArrayBuffer #31

jamesdbrock · 2021-06-24T00:42:53Z

In encodeVarint64, instead of creating a separate ArrayBuffer for each byte, we could preallocate an ArrayBuffer with the right number of bytes for the entire varint by examining its value. And the same for encodeVarint32.

https://github.com/xc-jp/purescript-protobuf/blob/b4ddb67b72b4a3db8b5e5f6ec046182a802e18aa/src/Protobuf/Encode.purs#L311-L321

The text was updated successfully, but these errors were encountered:

jamesdbrock · 2021-06-24T00:56:50Z

case n of
  _ | n < 128 -> do
    Builder.putUint8 n
  _ | n < 16384 -> do
    buf <- ArrayBuffer.empty 2
    dv <- DataView.whole buf
    DataView.setInt8 dv 0 $ contGroup $ n `and` u0x7F
    DataView.setInt8 dv 1 $ (n `zsh` 7) `and` u0x7F
    Builder.putArrayBuffer buf
  _ | n < 2097152 -> do

jamesdbrock · 2021-06-24T01:22:14Z

Just for fun

What if we had a set of builder primitives which returned

The byte size required to write the primitive
A function which takes an offset to an ArrayBuffer and writes the primitive.

type Putter :: Tuple ByteSize (ByteOffset -> ArrayBuffer -> Effect Unit)
putInt8 :: Int -> Putter
putStringUTF8 :: String -> Putter
putFloat64 :: Number -> Putter

Then we could build an ArrayBuffer in two passes, without ever allocating any temporary ArrayBuffers:

Collect the Putters, add their sizes, allocate the final ArrayBuffer.
Call the writer function of each Putter to fill the ArrayBuffer.

Of course we would have to allocate all of these temporary Putters, which is certainly more expensive than allocating the temporary ArrayBuffers.

Maybe there is a way to carry the Putter in a monad and have two different ways to exec the monad, one to calculate ByteSize and one to write to an ArrayBuffer?

Or something like this monoidal append?

<> :: Putter -> Putter -> Putter
Tuple sl wl <> Tuple sr wr = Tuple (sl + sr) (\o ab -> wl o ab *> wr (o + sl) ab)

The thing is that, in cases like encoding a UTF8 string, if you want to know the required size of the encoded string, you have to walk the whole string and do the encoding in order to calculate the size. Won't it always be faster to just write the encoding to a temporary buffer while you're at it, and then copy that temporary buffer later?

I guess in the case of encoding a UTF8 string we could encode the temporary buffer and enclose it with the writer function which, when called, just memcpys the temporary buffer. I'm not saying that's a good idea. But it's a thing we could do.

jamesdbrock · 2021-06-24T01:54:49Z

Just for fun

What we want to do is read a ByteSize in context r and write c in context w, kind of like this:

https://hackage.haskell.org/package/codec-0.2.1/docs/Control-Monad-Codec.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

encodeVarint64 performance improvement preallocate ArrayBuffer #31

encodeVarint64 performance improvement preallocate ArrayBuffer #31

jamesdbrock commented Jun 24, 2021

jamesdbrock commented Jun 24, 2021

jamesdbrock commented Jun 24, 2021 •

edited

Loading

jamesdbrock commented Jun 24, 2021 •

edited

Loading

encodeVarint64 performance improvement preallocate ArrayBuffer #31

encodeVarint64 performance improvement preallocate ArrayBuffer #31

Comments

jamesdbrock commented Jun 24, 2021

jamesdbrock commented Jun 24, 2021

jamesdbrock commented Jun 24, 2021 • edited Loading

Just for fun

jamesdbrock commented Jun 24, 2021 • edited Loading

Just for fun

jamesdbrock commented Jun 24, 2021 •

edited

Loading

jamesdbrock commented Jun 24, 2021 •

edited

Loading