[BUG] High CPU/Memory usage for large-payload insert queries #499

Open

zghong opened this issue Dec 25, 2024 · 2 comments · May be fixed by #500
zghong commented Dec 25, 2024

Describe the bug
When running large-payload insert queries (e.g., an offline data import at about 3GB/s), chproxy hits a severe performance bottleneck: CPU and memory utilization rise sharply, and in some cases the online chproxy service goes down.

[screenshots: chproxy CPU/memory utilization during the test]

To Reproduce
In an offline data-import scenario, large amounts of data are batch-inserted into ClickHouse via chproxy. With each row at roughly 100 bytes, an INSERT statement covering 500,000 rows comes to about 50MB. Sending this SQL to chproxy via POST (30 concurrent processes, each performing 100 insert operations) reproduces the issues above.

CREATE TABLE IF NOT EXISTS test_db.test_tb
(
    `id` UInt32,
    `partCode` UInt16,
    `shopId` UInt64,
    `processDate` Date,
    `vTagType` LowCardinality(String),
    `shop_first_order_time` SimpleAggregateFunction(anyLast, Date),
    `shop_latest_order_time` SimpleAggregateFunction(anyLast, Date),
    `pct` SimpleAggregateFunction(anyLast, Float64),
    `valid_ord_montary` SimpleAggregateFunction(anyLast, Float64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY vTagType
ORDER BY (shopId, partCode, id);

#!/bin/bash

output_file="insert_data.sql"
table_name="test_db.test_tb"
batch_size=10000
total_records=500000

echo -n "INSERT INTO $table_name VALUES" > $output_file
# generate data and write to file
for ((batch=0; batch<total_records/batch_size; batch++)); do
    values=""
    for ((i=1; i<=batch_size; i++)); do
        id=$((batch * batch_size + i))
        partCode=$((RANDOM % 65536))
        shopId=$(( ((RANDOM * 32768 + RANDOM) * 32768 + RANDOM) % 1000000000000 ))  # combine $RANDOM calls; a single $RANDOM only reaches 32767
        processDate=$(date -d "$((RANDOM % 365)) days ago" +"%Y-%m-%d")
        vTagType="type_$((RANDOM % 10))"
        shop_first_order_time=$(date -d "$((RANDOM % 365)) days ago" +"%Y-%m-%d")
        shop_latest_order_time=$(date -d "$((RANDOM % 365)) days ago" +"%Y-%m-%d")
        pct=$(awk -v min=0 -v max=100 'BEGIN{srand(); print min+rand()*(max-min)}')
        valid_ord_montary=$(awk -v min=0 -v max=1000 'BEGIN{srand(); print min+rand()*(max-min)}')

        values+="($id, $partCode, $shopId, '$processDate', '$vTagType', '$shop_first_order_time', '$shop_latest_order_time', $pct, $valid_ord_montary),"
    done

    # remove the last comma
    values=${values%,}
    # append to file
    if (( batch == (total_records/batch_size - 1) )); then
        echo -n "$values;" >> $output_file  # End the last batch with a semicolon
    else
        echo -n "$values," >> $output_file  # End other batches with a comma
    fi

    echo "Batch $((batch + 1)) of $((total_records / batch_size)) written to $output_file"
done

echo "All insert statements have been written to $output_file"

Expected behavior
In v1.15.0 and earlier versions, the same stress test did not show these problems; performance was better, with low CPU and memory utilization.
[screenshots: CPU/memory utilization under v1.15.0 for the same test]

Environment information

  • ClickHouse version 24.3.13.40: 2 shards with 2 replicas per shard, 4 nodes in total (16C48G).
  • chproxy version 1.26.4: 1 node (4C16G).

zghong commented Dec 25, 2024

The flame graph indicates that readAndRestoreRequestBody and getFullQueryFromBody are heavily involved in I/O. I believe these two functions cause unnecessary performance degradation, even though neither retries nor throughput limits are set in the configuration (#296, #299).

[screenshot: flame graph of chproxy during the test]
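
To make the concern concrete, here is a hypothetical sketch (not chproxy's actual code; bufferBody and firstKeyword are invented names) of the pattern the flame graph suggests: if the proxy reads the whole request body into memory so it can be restored later (e.g., for a retry) and then scans that buffer to extract the query, every in-flight 50MB INSERT costs at least one full in-memory copy, so 30 concurrent requests pin on the order of 1.5GB plus the CPU spent copying and scanning.

// Not chproxy's code: a hypothetical illustration of why buffering the full
// request body makes per-request CPU/memory cost scale with payload size.
package sketch

import (
	"bytes"
	"io"
	"net/http"
	"strings"
)

// bufferBody (hypothetical) reads the entire body into memory and replaces
// r.Body with a reader over that copy so the body can be re-read later.
// Memory cost: O(len(body)) per in-flight request, ~50MB in this test.
func bufferBody(r *http.Request) ([]byte, error) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		return nil, err
	}
	r.Body = io.NopCloser(bytes.NewReader(body))
	return body, nil
}

// firstKeyword (hypothetical) classifies the query by scanning the buffered
// body; for a large INSERT this walks megabytes of VALUES data (and makes
// another full copy via string conversion) just to find the leading keyword.
func firstKeyword(body []byte) string {
	fields := strings.Fields(string(body))
	if len(fields) == 0 {
		return ""
	}
	return strings.ToUpper(fields[0])
}

If the body were streamed to the upstream instead of fully buffered when retries and per-query limits are disabled, the O(payload) allocation per request would disappear.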

zghong commented Dec 25, 2024

Looks like this is a duplicate of #428

zghong linked a pull request (#500) on Dec 25, 2024 that will close this issue.