Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make FST BytesStore grow smoothly #12619

Open
gf2121 opened this issue Oct 4, 2023 · 2 comments
Open

Make FST BytesStore grow smoothly #12619

gf2121 opened this issue Oct 4, 2023 · 2 comments

Comments

@gf2121
Copy link
Contributor

gf2121 commented Oct 4, 2023

Description

Too bad we don't have a writer that uses tiny (like 8 bytes) block at first, but doubles size for each new block (16 bytes, 32 bytes next, etc.). Then we would naturally use log(size) number of blocks without over-allocating.

But then reading bytes is a bit tricky because we'd need to take discrete log (base 2) of the address. Maybe it wouldn't be so bad -- we could do this with Long.numberOfLeadingZeros maybe? But that's a bigger change ... we can do this separately/later.

From #12604 (comment)

@mikemccand
Copy link
Member

mikemccand commented Nov 14, 2023

Note that oal.store.ByteBuffersDataOutput takes a different and neat approach to gracefully growing: it picks an initial block size (1024 bytes by default), and appends new blocks as you write bytes, but then if it reaches 100 blocks, it "resizes" itself by doubling the block size and copying over, so that now you have 50 blocks.

So it's still O(N) amortized cost of that doubling/copying with time, and at any given moment you will not be wasting too many %tg of the bytes you've written, except at the start 1 KB block size.

@dungba88
Copy link
Contributor

In #12624, I moved the main FST body out of BytesStore into ByteBuffersDataOutput, and BytesStore becomes only a single byte[] for the currently written node so maybe we don't need to do this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants