Make FST BytesStore grow smoothly #12619

gf2121 · 2023-10-04T06:53:36Z

Description

Too bad we don't have a writer that uses tiny (like 8 bytes) block at first, but doubles size for each new block (16 bytes, 32 bytes next, etc.). Then we would naturally use log(size) number of blocks without over-allocating.

But then reading bytes is a bit tricky because we'd need to take discrete log (base 2) of the address. Maybe it wouldn't be so bad -- we could do this with Long.numberOfLeadingZeros maybe? But that's a bigger change ... we can do this separately/later.

From #12604 (comment)

mikemccand · 2023-11-14T13:48:36Z

Note that oal.store.ByteBuffersDataOutput takes a different and neat approach to gracefully growing: it picks an initial block size (1024 bytes by default), and appends new blocks as you write bytes, but then if it reaches 100 blocks, it "resizes" itself by doubling the block size and copying over, so that now you have 50 blocks.

So it's still O(N) amortized cost of that doubling/copying with time, and at any given moment you will not be wasting too many %tg of the bytes you've written, except at the start 1 KB block size.

dungba88 · 2023-11-15T00:35:17Z

In #12624, I moved the main FST body out of BytesStore into ByteBuffersDataOutput, and BytesStore becomes only a single byte[] for the currently written node so maybe we don't need to do this?

gf2121 added the type:enhancement label Oct 4, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make FST BytesStore grow smoothly #12619

Make FST BytesStore grow smoothly #12619

gf2121 commented Oct 4, 2023

mikemccand commented Nov 14, 2023 •

edited

Loading

dungba88 commented Nov 15, 2023

Make FST BytesStore grow smoothly #12619

Make FST BytesStore grow smoothly #12619

Comments

gf2121 commented Oct 4, 2023

Description

mikemccand commented Nov 14, 2023 • edited Loading

dungba88 commented Nov 15, 2023

mikemccand commented Nov 14, 2023 •

edited

Loading