compact: Improved memory usage while downsampling #529

Merged

Commits on Feb 6, 2019

  1. compact: avoid memory leak while downsampling

    Add an instant writer implementation to reduce memory consumption during the downsampling stage.
    Encoded chunks are written to the chunk blob files right after each series is handled.
    The Flush method closes the chunk writer and syncs all symbols, series, labels, postings and meta data to files.
    It still runs in a single thread, and hence uses only one core.
    
    Estimated memory consumption is unlikely to exceed 1 GB, but it depends on the data set, label sizes and series density: chunk data size (512 MB) + encoded buffers + index data.
    
    Fixes thanos-io#297
    xjewer committed Feb 6, 2019
    1145585
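The streaming idea in this commit (write each series' encoded chunks out immediately, keep only small references in memory) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `streamedWriter`, `chunkRef`, `WriteSeries` and `writeExample` are hypothetical names, and a plain `io.Writer` stands in for the chunk blob files.

```go
package main

import (
	"bytes"
	"io"
)

// chunkRef records where one encoded chunk landed in the output stream.
// Hypothetical type, used only for this illustration.
type chunkRef struct {
	Offset int64
	Length int
}

// streamedWriter (hypothetical) passes chunk bytes straight through to the
// underlying writer and keeps only per-series references in memory.
type streamedWriter struct {
	out    io.Writer
	offset int64
	refs   map[string][]chunkRef
}

func newStreamedWriter(out io.Writer) *streamedWriter {
	return &streamedWriter{out: out, refs: make(map[string][]chunkRef)}
}

// WriteSeries writes all encoded chunks of one series right away.
// After it returns, the chunk payloads no longer need to stay in RAM.
func (w *streamedWriter) WriteSeries(series string, chunks [][]byte) error {
	for _, c := range chunks {
		n, err := w.out.Write(c)
		if err != nil {
			return err
		}
		w.refs[series] = append(w.refs[series], chunkRef{Offset: w.offset, Length: n})
		w.offset += int64(n)
	}
	return nil
}

// writeExample streams two series into an in-memory buffer that stands in
// for a chunk file on disk.
func writeExample() (*bytes.Buffer, *streamedWriter, error) {
	var buf bytes.Buffer
	w := newStreamedWriter(&buf)
	if err := w.WriteSeries("up", [][]byte{[]byte("aaaa"), []byte("bb")}); err != nil {
		return nil, nil, err
	}
	if err := w.WriteSeries("down", [][]byte{[]byte("ccc")}); err != nil {
		return nil, nil, err
	}
	return &buf, w, nil
}
```

With this shape, memory held by the writer grows with the number of chunk references rather than with the size of the chunk data.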
  2. compact: clarify purpose of streamed block writer

    Add comments and close resources properly.
    xjewer committed Feb 6, 2019
    9467c03
  3. downsample: fix postings index

    Use the proper postings index to fetch series data along with its label set and chunks.
    xjewer committed Feb 6, 2019
    cbca7e4
  4. Add to the stream writer the ability to write index data during

    the downsampling process.
    
    One trade-off is that symbols from the raw blocks are preserved as they are, because they have to be
    written before the series.
    
    The stream writer allows downsampling huge data blocks without keeping
    all series in RAM; it only needs to keep label values and postings references.
    xjewer committed Feb 6, 2019
    b695d13
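To make the "only label values and postings references" point concrete, here is a rough sketch of the bookkeeping that could remain in memory while series are streamed out. It is an assumption-laden illustration, not the PR's implementation: `memIndex`, `label`, `addSeries` and `sortedValues` are hypothetical names.

```go
package main

import "sort"

// label is a hypothetical name/value pair attached to a series.
type label struct{ Name, Value string }

// memIndex (hypothetical) holds only what is needed to write postings and
// label indexes at the end: series references per label pair, and the set
// of values seen for each label name. No chunk or sample data is kept.
type memIndex struct {
	postings    map[label][]uint64
	labelValues map[string]map[string]struct{}
}

func newMemIndex() *memIndex {
	return &memIndex{
		postings:    make(map[label][]uint64),
		labelValues: make(map[string]map[string]struct{}),
	}
}

// addSeries registers a series reference under each of its labels.
func (m *memIndex) addSeries(ref uint64, lset []label) {
	for _, l := range lset {
		m.postings[l] = append(m.postings[l], ref)
		vals, ok := m.labelValues[l.Name]
		if !ok {
			vals = make(map[string]struct{})
			m.labelValues[l.Name] = vals
		}
		vals[l.Value] = struct{}{}
	}
}

// sortedValues returns the values seen for a label name in sorted order.
func (m *memIndex) sortedValues(name string) []string {
	out := make([]string, 0, len(m.labelValues[name]))
	for v := range m.labelValues[name] {
		out = append(out, v)
	}
	sort.Strings(out)
	return out
}
```

The per-series cost here is a handful of integers and shared label strings, which is the reason this approach scales to blocks whose series would not fit in RAM.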
  5. fix nitpicks

    xjewer committed Feb 6, 2019
    ff9c776
  6. downsampling: simplify StreamedBlockWriter interface

    Reduce use of the public Flush method for finalizing the index and meta files.
    In case of an error, the caller has to remove the block directory together with
    any garbage left inside.
    
    Get rid of tmp directories and renaming; the final block is synced to disk
    before upload.
    xjewer committed Feb 6, 2019
    12ffbe3