This repository has been archived by the owner on Sep 21, 2024. It is now read-only.
Potentially unbounded memory overhead using BodyChunkIpld
#498
Labels: Enhancement (New user-facing feature)
This was referenced Aug 3, 2023
jsantell added a commit that referenced this issue on Aug 10, 2023:
* Introduce streaming `BodyChunkIpld::encode` and `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit, after which data is stored on disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
jsantell added a commit that referenced this issue on Aug 10, 2023:
* Introduce streaming `BodyChunkIpld::encode` and `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit, after which data is stored on disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
* Promote `bytes` to a workspace dependency.
jsantell added a commit that referenced this issue on Aug 14, 2023:
* Introduce a `Scratch` trait, for `Storage` providers to provide a temporary read/write space that does not persist.
* Introduce streaming `BodyChunkIpld::encode` and `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit, after which data is stored on disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
* Promote `bytes` to a workspace dependency.
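The commit message above describes buffering in memory until a threshold is hit and then moving the data to disk, backed by a temporary space that does not persist. What follows is a minimal, std-only sketch of that spill-to-disk idea under stated assumptions, not the project's implementation. `SpillBuffer`, its methods, the temp-file naming scheme, and the use of `std::env::temp_dir()` as a stand-in for a `Scratch` provider are all hypothetical.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::PathBuf;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counter so that each buffer gets its own scratch file name.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical buffer (not a Noosphere type): holds bytes in memory until
/// `threshold` would be exceeded, then moves everything to a temporary file.
pub struct SpillBuffer {
    threshold: usize,
    memory: Vec<u8>,
    disk: Option<(PathBuf, File)>,
    len: usize,
}

impl SpillBuffer {
    pub fn new(threshold: usize) -> Self {
        SpillBuffer { threshold, memory: Vec::new(), disk: None, len: 0 }
    }

    pub fn is_on_disk(&self) -> bool {
        self.disk.is_some()
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn write_all(&mut self, bytes: &[u8]) -> io::Result<()> {
        if self.disk.is_none() && self.memory.len() + bytes.len() > self.threshold {
            // Threshold crossed: copy what is buffered so far into a scratch file.
            let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
            let path = std::env::temp_dir()
                .join(format!("spill-{}-{}.tmp", std::process::id(), id));
            let mut file = OpenOptions::new()
                .read(true)
                .write(true)
                .create(true)
                .truncate(true)
                .open(&path)?;
            file.write_all(&self.memory)?;
            self.memory = Vec::new(); // free the in-memory copy
            self.disk = Some((path, file));
        }
        match &mut self.disk {
            Some((_, file)) => file.write_all(bytes)?,
            None => self.memory.extend_from_slice(bytes),
        }
        self.len += bytes.len();
        Ok(())
    }

    /// Reads all bytes back; for demonstration only, since a real decoder
    /// would stream from the file instead of collecting it.
    pub fn read_back(&mut self) -> io::Result<Vec<u8>> {
        match &mut self.disk {
            Some((_, file)) => {
                let mut out = Vec::with_capacity(self.len);
                file.seek(SeekFrom::Start(0))?;
                file.read_to_end(&mut out)?;
                Ok(out)
            }
            None => Ok(self.memory.clone()),
        }
    }
}

impl Drop for SpillBuffer {
    fn drop(&mut self) {
        // Scratch data is not meant to persist: close and delete the file.
        if let Some((path, file)) = self.disk.take() {
            drop(file);
            let _ = fs::remove_file(path);
        }
    }
}
```

For example, with a threshold of 8 bytes, writing 3 bytes keeps the buffer in memory, and a further 7-byte write moves all 10 bytes into the scratch file. Deleting the file in `Drop` mirrors the requirement that the scratch space not persist.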
jsantell added further commits that referenced this issue, with the same commit message as the Aug 14, 2023 commit, on Aug 15, Aug 16, Aug 17, and Aug 18, 2023.
Basic usage of one of our most venerable constructs, `BodyChunkIpld`, will lead to unacceptable levels of memory overhead when reading large files. Currently, both `BodyChunkIpld::store_bytes` and `BodyChunkIpld::load_all_bytes` allocate memory up to the full size of the file when used.

In the case of `BodyChunkIpld::store_bytes`: in order to efficiently encode and store bytes from a file as `BodyChunkIpld` pages, we would need to read the file back-to-front. This complicates how we compute chunk cut points (using a Rust implementation of FastCDC). We contributed a change to our dependency to enable async streaming support in the chunker, which gets us halfway to where we need to be. However, in order to make use of it, we will most likely have to feed bytes into the chunker in reverse (since `BodyChunkIpld` pages have to be hashed back-to-front).

In the case of `BodyChunkIpld::load_all_bytes`, the solution is simpler: we should deprecate the method and instead prefer a method that yields pages of bytes as an async stream (e.g., `BodyChunkIpld::stream` or something to that effect).
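To illustrate why pages must be hashed back-to-front, and what reading one page at a time looks like, here is a simplified, std-only sketch. It assumes each page holds its bytes plus an optional link to the next page, so a page's address cannot be computed until the next page's address is known. `DefaultHasher` stands in for a CID; `Page`, `store_chunks`, and `stream_pages` are hypothetical names; and a synchronous iterator stands in for the proposed async stream. The sketch collects chunks into a `Vec` for brevity, so it illustrates the ordering constraint and page-at-a-time reading, not bounded memory on the write path.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in for a content address (a real implementation would use a CID).
type Link = u64;

/// Hypothetical, simplified page; not the real `BodyChunkIpld` type.
#[derive(Hash)]
struct Page {
    bytes: Vec<u8>,
    next: Option<Link>,
}

/// Computes a page's address from its bytes *and* its link to the next page.
fn address(page: &Page) -> Link {
    let mut hasher = DefaultHasher::new();
    page.hash(&mut hasher);
    hasher.finish()
}

/// Stores chunks as a linked list. Because each page embeds the address of
/// the next page, the last chunk has to be hashed first.
fn store_chunks(chunks: Vec<Vec<u8>>, store: &mut HashMap<Link, Page>) -> Option<Link> {
    let mut next = None;
    for bytes in chunks.into_iter().rev() {
        let page = Page { bytes, next };
        let link = address(&page);
        store.insert(link, page);
        next = Some(link);
    }
    next // address of the first page, or None if there were no chunks
}

/// Yields pages front to back, one at a time, instead of concatenating the
/// whole body into a single buffer.
fn stream_pages<'a>(
    store: &'a HashMap<Link, Page>,
    head: Option<Link>,
) -> impl Iterator<Item = &'a [u8]> + 'a {
    let mut cursor = head;
    std::iter::from_fn(move || {
        let page = store.get(&cursor?)?;
        cursor = page.next;
        Some(page.bytes.as_slice())
    })
}
```

A consumer of `stream_pages` only ever holds a reference to one page's bytes at a time, which is the property a streaming replacement for `load_all_bytes` would need.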