revisit memory fix for `fastmultigather` - use `into_par_iter` properly #482

ctb · 2024-10-17T13:55:54Z

This commit in #430 deallocates the duplicate memory used for `against_collection` here. Unfortunately, the RSS does briefly spike to 2x because of this. We should switch to using `into_par_iter()`, per Luiz. From Slack:

luizirber: you want `into_par_iter` and `into_iter`

luizirber: `par_iter` and `iter` return references, `into_*` returns values

luizirber:

```rust
pub fn into_par_iter(&self) -> impl IndexedParallelIterator<Item = (&Collection, Idx, &Record)> {
    // first create a Vec of all triples (Collection, Idx, Record)
    self
        .collections
        .into_par_iter() // CTB: are we loading things into memory here? No...
        .flat_map(|c| c.into_iter().map(move |(_idx, record)| (c, _idx, record)))
}
```

luizirber: or something close to that?
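
To make the references-versus-values distinction concrete, here is a minimal sketch using only the standard library's `iter()` and `into_iter()`; rayon's `par_iter()` and `into_par_iter()` follow the same naming convention. The `Sketch` type, the live-object counter, and both `peak_with_*` functions are hypothetical stand-ins invented for illustration and are not part of the plugin or of sourmash. The counter tracks how many `Sketch` values exist at once, as a rough proxy for the 2x spike described above.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a loaded sketch. It bumps a shared counter
/// while alive so we can observe how many copies exist at once.
pub struct Sketch {
    pub hashes: Vec<u64>,
    live: Arc<AtomicUsize>,
}

impl Sketch {
    pub fn new(hashes: Vec<u64>, live: &Arc<AtomicUsize>) -> Self {
        live.fetch_add(1, Ordering::SeqCst);
        Sketch { hashes, live: Arc::clone(live) }
    }
}

impl Clone for Sketch {
    fn clone(&self) -> Self {
        Sketch::new(self.hashes.clone(), &self.live)
    }
}

impl Drop for Sketch {
    fn drop(&mut self) {
        self.live.fetch_sub(1, Ordering::SeqCst);
    }
}

fn make_input(n: usize, live: &Arc<AtomicUsize>) -> Vec<Sketch> {
    (0..n).map(|i| Sketch::new(vec![i as u64], live)).collect()
}

/// Borrowing iteration: `iter()` yields `&Sketch`, so building an owned
/// output requires cloning while the input stays alive the whole time.
pub fn peak_with_iter(n: usize) -> usize {
    let live = Arc::new(AtomicUsize::new(0));
    let input = make_input(n, &live);
    let mut peak = 0;
    let mut out = Vec::with_capacity(n);
    for s in input.iter() {
        out.push(s.clone());
        peak = peak.max(live.load(Ordering::SeqCst));
    }
    peak
}

/// Consuming iteration: `into_iter()` yields `Sketch` by value, so each
/// item is moved into the output and no second copy is created.
pub fn peak_with_into_iter(n: usize) -> usize {
    let live = Arc::new(AtomicUsize::new(0));
    let input = make_input(n, &live);
    let mut peak = 0;
    let mut out = Vec::with_capacity(n);
    for s in input.into_iter() {
        out.push(s);
        peak = peak.max(live.load(Ordering::SeqCst));
    }
    peak
}
```

With `n` inputs, the borrowing version peaks at `2 * n` live values and the consuming version stays at `n`. This only models object counts in a toy setting; actual RSS behaviour depends on the allocator and on what the real collection holds.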


ctb · 2024-11-04T13:26:28Z

After playing with things a bit, I'm pretty sure that the need to keep the Collection and specifically the ZipStorage around while loading things means that there's no practical way to do what I want here. More when I have time.
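
One possible reading of this constraint, not confirmed in the thread, is sketched below with toy types. `ToyStorage`, `ToyRecord`, and `ToyCollection` are hypothetical and do not mirror the real `Collection` or `ZipStorage` APIs. The assumption is that records are lightweight handles and the actual data is loaded on demand from the storage, so consuming the records by value still leaves the storage alive until the last load finishes.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an archive-backed storage (not the real ZipStorage).
pub struct ToyStorage {
    blobs: HashMap<String, Vec<u8>>,
}

impl ToyStorage {
    /// Load one blob by path, returning an owned copy of its bytes.
    pub fn load(&self, path: &str) -> Option<Vec<u8>> {
        self.blobs.get(path).cloned()
    }
}

/// Hypothetical manifest row: a cheap handle naming a blob in the storage.
pub struct ToyRecord {
    pub path: String,
}

/// Hypothetical collection: a list of records plus the storage they point into.
pub struct ToyCollection {
    pub records: Vec<ToyRecord>,
    pub storage: ToyStorage,
}

impl ToyCollection {
    pub fn new(items: &[(&str, &[u8])]) -> Self {
        let mut blobs = HashMap::new();
        let mut records = Vec::new();
        for (path, data) in items {
            blobs.insert(path.to_string(), data.to_vec());
            records.push(ToyRecord { path: path.to_string() });
        }
        ToyCollection { records, storage: ToyStorage { blobs } }
    }

    /// Consume the collection. The records are yielded by value, but every
    /// load still borrows `storage`, which therefore lives until the end.
    pub fn load_all(self) -> Vec<Vec<u8>> {
        let ToyCollection { records, storage } = self;
        records
            .into_iter()
            .filter_map(|r| storage.load(&r.path))
            .collect()
    }
}
```

In this toy model, switching the record iteration from borrowing to consuming frees only the handles; the storage and the loaded output coexist regardless.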

ctb changed the title ~~revisit memory fix for fastmultigather~~ revisit memory fix for fastmultigather - use into_par_iter properly Oct 30, 2024
