More buffers #39
Conversation
@NailLegProcessorDivide if you could take a look at this I'd appreciate it :) It makes some changes that are important for calculating larger sizes of N.
(force-pushed from 8f17f8a to cc4c01e)
(force-pushed from 8e7e38a to 74bbeac)
Is this mostly for getting ready to stream to files? I don't quite understand the LEB128 thing.
The leb128 change makes it so that we can write a header of a predetermined length. Now we can write a stream of cubes to the file while keeping a running count, and set the count to the correct value at the end. This means we don't have to re-write an entire file once we know the length (there is, AFAIK, no way to "shift a file to the right" without rewriting all the data following it). Not having to re-write and/or re-read the entire file, when we know we will eventually find out how many cubes we've written, is nice.
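To illustrate the idea, here is a minimal sketch (my own illustration, not the PR's actual code; `write_leb128_padded`, the file name, and the 10-byte width are all assumptions): pad the LEB128 count to a fixed width using redundant zero-payload continuation bytes, then seek back and patch it once the real count is known.

```rust
use std::io::{self, Seek, SeekFrom, Write};

/// Encode `value` as unsigned LEB128, padded to exactly `width` bytes by
/// adding zero-payload continuation bytes. A spec-conforming LEB128
/// reader decodes the padded and the minimal form to the same number.
fn write_leb128_padded<W: Write>(w: &mut W, mut value: u64, width: usize) -> io::Result<()> {
    for i in 0..width {
        let mut byte = (value & 0x7f) as u8;
        value >>= 7;
        if i + 1 < width {
            byte |= 0x80; // continuation bit: another byte follows
        }
        w.write_all(&[byte])?;
    }
    assert_eq!(value, 0, "value does not fit in the requested width");
    Ok(())
}

fn main() -> io::Result<()> {
    let mut file = std::fs::File::create("cubes.pcube")?; // illustrative name
    // ... magic bytes / the rest of the header would go here ...
    let count_pos = file.stream_position()?;
    write_leb128_padded(&mut file, 0, 10)?; // placeholder: 10 bytes holds any u64
    let mut count: u64 = 0;
    // ... stream cubes to the file, incrementing `count` per cube ...
    count += 1; // stand-in for the real streaming loop
    // Because the count field has a fixed width, we can patch it in place.
    file.seek(SeekFrom::Start(count_pos))?;
    write_leb128_padded(&mut file, count, 10)?;
    Ok(())
}
```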
I'm not sure why the format uses leb128 (just a u128 or u64 would have avoided having to do it this messily), but 🤷 This could technically mess up implementations that don't have the read function implemented (correctly!) the way we have it here, but there is nothing about leb128 encoding that says you can't do this (i.e. encode a number with a fixed number of bytes). The main motivation is that I don't want to wait 2.5 hours just to find out whether the file I wrote is valid. It's especially relevant when generating new cubesets, since the whole point is that you can't keep them all in memory, and there is no way to know how many there will be in advance.
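For reference, a sketch of a reader that handles this correctly (again my own illustration; `read_leb128` is a hypothetical helper, not the PR's read function): it keeps consuming bytes while the continuation bit is set, so zero-payload padding bytes contribute nothing and the padded encoding decodes to the same value as the minimal one.

```rust
use std::io::{self, Read};

/// Decode an unsigned LEB128 value. Redundant zero-payload continuation
/// bytes, as produced by a fixed-width encoder, add nothing to `value`.
fn read_leb128<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    loop {
        let mut byte = [0u8; 1];
        r.read_exact(&mut byte)?;
        let payload = u64::from(byte[0] & 0x7f);
        if payload != 0 {
            if shift >= 64 {
                return Err(io::Error::new(io::ErrorKind::InvalidData, "LEB128 overflows u64"));
            }
            value |= payload << shift;
        }
        if byte[0] & 0x80 == 0 {
            return Ok(value); // continuation bit clear: this was the last byte
        }
        shift += 7;
    }
}
```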
Oh, and the rest is for perf improvements (unbuffered writes to disk are slow) and bug fixes.
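As a sketch of what write buffering buys you (illustrative only; the buffer size, file name, and loop body are made up): wrapping the file in `std::io::BufWriter` turns one syscall per cube into one syscall per buffer flush.

```rust
use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let file = File::create("out.bin")?; // illustrative name
    // Batch many small writes in memory; without this, every write_all
    // below would be its own syscall.
    let mut out = BufWriter::with_capacity(1 << 20, file); // 1 MiB, arbitrary
    for i in 0..1_000_000u32 {
        out.write_all(&i.to_le_bytes())?; // stand-in for serialized cube data
    }
    out.flush()?; // push any remaining buffered bytes to the file
    Ok(())
}
```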
Ok yep, I hadn't realised the length field was variable-size; that is a choice, given that a file anywhere near a u64 of polycubes is unstorable. Rest of the changes look good to me.
(force-pushed from ec6ba00 to 0d20878)
Inline review comments:
- ... having to re-read the whole file
- Put this in a const
- Just get rid of this limit, we can handle it without
(force-pushed from 0d20878 to eeb6488)
@bertie2 I think this is ready for merge :) I have rebased the commits to clean up the history a bit (and would prefer a non-squash this time around). I have also verified that the funky trick (ref) for getting a fixed-size header works fine.
(force-pushed from 600aade to a9f9e0c)
(force-pushed from a9f9e0c to a37c83c)
Apologies for the delay in merging this; partly life has gotten in the way, partly I have become fixated on methods for distributing the work.
That sounds cool! Would like to see that too; the way the DFS algorithm works seems to support distribution well :) Also, no worries about delays, things happen. I will gladly take any response, especially one as quick as 3 days! Thanks for merging.
Add some write buffering to the Rust version to significantly speed up writes, and an option to count the cubes in a file when converting stream-oriented files.
Also add a "prefill" option to the PCube header so that we can write a fixed-size header.