This repository has been archived by the owner on Sep 21, 2024. It is now read-only.

Potentially unbounded memory overhead using BodyChunkIpld #498

Open
cdata opened this issue Jul 15, 2023 · 0 comments

Comments

@cdata
Collaborator

cdata commented Jul 15, 2023

Basic usage of one of our most venerable constructs, BodyChunkIpld, leads to unacceptable memory overhead when working with large files. Currently, both BodyChunkIpld::store_bytes and BodyChunkIpld::load_all_bytes can allocate as much memory as the full size of the file being processed.

In the case of BodyChunkIpld::store_bytes: in order to efficiently encode and store bytes from a file as BodyChunkIpld pages, we would need to read the file back to front. This complicates how we compute chunk cut points (using a Rust implementation of FastCDC). We contributed a change to our dependency to enable async streaming support in the chunker, which gets us halfway to where we need to be. However, in order to make use of it, we will most likely have to feed bytes into the chunker in reverse (since BodyChunkIpld pages have to be hashed back to front).
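
To illustrate why pages end up being hashed back to front, here is a minimal sketch. It assumes each page holds some bytes plus a link to the page that follows it, so a page's identifier cannot be computed until the identifier of its successor is known. `Page`, `FakeCid`, `fake_cid` and `link_back_to_front` are hypothetical names for illustration, and `DefaultHasher` stands in for a real content hash; none of this is the actual BodyChunkIpld implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a content identifier (a real system would use a CID).
type FakeCid = u64;

/// Hypothetical page shape: some bytes plus a link to the page that follows.
#[derive(Debug, Hash)]
struct Page {
    bytes: Vec<u8>,
    next: Option<FakeCid>,
}

/// Stand-in for content addressing: hash the whole page, including its link.
fn fake_cid(page: &Page) -> FakeCid {
    let mut hasher = DefaultHasher::new();
    page.hash(&mut hasher);
    hasher.finish()
}

/// Links chunks into a chain. Returns the identifier of the first page and
/// the pages in the order they were hashed (last chunk first).
fn link_back_to_front(chunks: &[Vec<u8>]) -> (Option<FakeCid>, Vec<(FakeCid, Page)>) {
    let mut next = None;
    let mut stored = Vec::with_capacity(chunks.len());
    // Walk the chunks in reverse: a page can only be hashed once the
    // identifier of the page after it is known.
    for chunk in chunks.iter().rev() {
        let page = Page { bytes: chunk.clone(), next };
        let cid = fake_cid(&page);
        stored.push((cid, page));
        next = Some(cid);
    }
    (next, stored)
}
```

Note that this sketch still takes every chunk as an in-memory slice, which is exactly the overhead described above; it only shows the ordering constraint, not a streaming solution.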

In the case of BodyChunkIpld::load_all_bytes, the solution is simpler: we should deprecate the method and instead prefer a method that yields pages of bytes as an async stream (e.g., BodyChunkIpld::stream or something to that effect).
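
As a rough sketch of the page-at-a-time idea, the following uses a synchronous `Iterator` as a stand-in for an async stream, so that it needs only the standard library. `Page`, `FakeCid` and `PageIter` are hypothetical names, and the `HashMap` is an assumed stand-in for a block store; the real API may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a content identifier.
type FakeCid = u64;

/// Hypothetical page shape: some bytes plus a link to the next page.
struct Page {
    bytes: Vec<u8>,
    next: Option<FakeCid>,
}

/// Walks a chain of pages lazily, yielding one page of bytes per step
/// instead of concatenating the whole file into a single buffer.
struct PageIter<'a> {
    store: &'a HashMap<FakeCid, Page>,
    cursor: Option<FakeCid>,
}

impl<'a> Iterator for PageIter<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<Self::Item> {
        // Stop at the end of the chain. For brevity, a page that is missing
        // from the store also ends iteration rather than reporting an error.
        let cid = self.cursor?;
        let page = self.store.get(&cid)?;
        self.cursor = page.next;
        Some(page.bytes.as_slice())
    }
}
```

A caller can then process each page as it arrives (for example, writing it to an output) so that only one page needs to be considered at a time, rather than the whole file.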

@cdata cdata added the Enhancement New user-facing feature label Jul 15, 2023
jsantell added a commit that referenced this issue Aug 10, 2023
* Introduce streaming `BodyChunkIpld::encode` and
  `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit,
  which then stores to disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and
  `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
jsantell added a commit that referenced this issue Aug 10, 2023
* Introduce streaming `BodyChunkIpld::encode` and
  `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit,
  which then stores to disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and
  `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
* Promote `bytes` to a workspace dependency.
jsantell added a commit that referenced this issue Aug 14, 2023
* Introduce a `Scratch` trait, for `Storage` providers to provide a
  temporary read/write space that does not persist.
* Introduce streaming `BodyChunkIpld::encode` and
  `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit,
  which then stores to disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and
  `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
* Promote `bytes` to a workspace dependency.
jsantell added a commit that referenced this issue Aug 15, 2023
* Introduce a `Scratch` trait, for `Storage` providers to provide a
  temporary read/write space that does not persist.
* Introduce streaming `BodyChunkIpld::encode` and
  `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit,
  which then stores to disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and
  `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
* Promote `bytes` to a workspace dependency.
jsantell added a commit that referenced this issue Aug 16, 2023
* Introduce a `Scratch` trait, for `Storage` providers to provide a
  temporary read/write space that does not persist.
* Introduce streaming `BodyChunkIpld::encode` and
  `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit,
  which then stores to disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and
  `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
* Promote `bytes` to a workspace dependency.
jsantell added a commit that referenced this issue Aug 17, 2023
* Introduce a `Scratch` trait, for `Storage` providers to provide a
  temporary read/write space that does not persist.
* Introduce streaming `BodyChunkIpld::encode` and
  `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit,
  which then stores to disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and
  `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
* Promote `bytes` to a workspace dependency.
jsantell added a commit that referenced this issue Aug 18, 2023
* Introduce a `Scratch` trait, for `Storage` providers to provide a
  temporary read/write space that does not persist.
* Introduce streaming `BodyChunkIpld::encode` and
  `BodyChunkIpld::decode` methods.
* Streaming mechanisms store data in memory until a threshold is hit,
  which then stores to disk.
* Mark non-streaming functions `BodyChunkIpld::load_all_bytes` and
  `BodyChunkIpld::store_bytes` as deprecated.
* Remove `BodyChunkDecoder` (now implemented as `BodyChunkIpld::decode`).
* Promote `bytes` to a workspace dependency.
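
The commit notes above mention streaming mechanisms that hold data in memory until a threshold is hit and then store to disk. The following is only a sketch of that general pattern, not the code from the referenced commits: `SpillBuffer`, `write_chunk`, `has_spilled` and `open_spill` are hypothetical names, and the spill target is any `std::io::Write` implementation supplied by the caller.

```rust
use std::io::{self, Write};

/// Hypothetical buffer that keeps bytes in memory until `threshold` would be
/// exceeded, then moves everything into a spill writer opened on demand.
struct SpillBuffer<W, F> {
    threshold: usize,
    memory: Vec<u8>,
    spill: Option<W>,
    open_spill: F,
}

impl<W: Write, F: FnMut() -> io::Result<W>> SpillBuffer<W, F> {
    fn new(threshold: usize, open_spill: F) -> Self {
        Self { threshold, memory: Vec::new(), spill: None, open_spill }
    }

    fn write_chunk(&mut self, chunk: &[u8]) -> io::Result<()> {
        // Once spilled, every later chunk goes straight to the spill writer.
        if let Some(spill) = self.spill.as_mut() {
            return spill.write_all(chunk);
        }
        if self.memory.len() + chunk.len() > self.threshold {
            // Crossing the threshold: flush what was buffered so far, then
            // the new chunk, and release the in-memory copy.
            let mut spill = (self.open_spill)()?;
            spill.write_all(&self.memory)?;
            spill.write_all(chunk)?;
            self.memory = Vec::new();
            self.spill = Some(spill);
            return Ok(());
        }
        self.memory.extend_from_slice(chunk);
        Ok(())
    }

    fn has_spilled(&self) -> bool {
        self.spill.is_some()
    }
}
```

With this shape, a caller could pass something like `|| std::fs::File::create(path)` as `open_spill` to spill into a file, where `path` is a location of the caller's choosing; how the actual `Scratch` trait provides its temporary space is not shown here.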
@jsantell jsantell self-assigned this Oct 19, 2023
@jsantell jsantell moved this from 🏔️ Icebox to 🧑‍🌾 In Progress in Subconscious + Noosphere Oct 19, 2023