Describe the bug
When running large-payload INSERT queries (e.g., an offline data import rate of 3GB/s), chproxy hits severe performance bottlenecks: CPU and memory utilization rise sharply, and in some cases the online chproxy service goes down entirely.
To Reproduce
In an offline data import scenario, large amounts of data are batch-inserted into ClickHouse via chproxy. With entries of roughly 100 bytes each, an INSERT statement covering 500,000 rows comes to about 50MB. Sending this SQL to chproxy via POST (30 concurrent processes, each performing 100 insert operations) reproduces the issues described above; a sketch of the sending step follows the generator script below.
CREATE TABLE IF NOT EXISTS test_db.test_tb
(
`id` UInt32,
`partCode` UInt16,
`shopId` UInt64,
`processDate` Date,
`vTagType` LowCardinality(String),
`shop_first_order_time` SimpleAggregateFunction(anyLast, Date),
`shop_latest_order_time` SimpleAggregateFunction(anyLast, Date),
`pct` SimpleAggregateFunction(anyLast, Float64),
`valid_ord_montary` SimpleAggregateFunction(anyLast, Float64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY vTagType
ORDER BY (shopId, partCode, id);
#!/bin/bash
output_file="insert_data.sql"
table_name="test_db.test_tb"
batch_size=10000
total_records=500000

echo -n "INSERT INTO $table_name VALUES" > "$output_file"

# Generate data batch by batch and append it to the file.
for ((batch=0; batch<total_records/batch_size; batch++)); do
    values=""
    for ((i=1; i<=batch_size; i++)); do
        id=$((batch * batch_size + i))
        partCode=$((RANDOM % 65536))
        # $RANDOM tops out at 32767, so combine two draws for a wider range.
        shopId=$(( (RANDOM << 15 | RANDOM) % 1000000000000 ))
        processDate=$(date -d "$((RANDOM % 365)) days ago" +"%Y-%m-%d")
        vTagType="type_$((RANDOM % 10))"
        shop_first_order_time=$(date -d "$((RANDOM % 365)) days ago" +"%Y-%m-%d")
        shop_latest_order_time=$(date -d "$((RANDOM % 365)) days ago" +"%Y-%m-%d")
        # Seed awk explicitly; a bare srand() reuses the same seed within one second.
        pct=$(awk -v seed=$RANDOM -v min=0 -v max=100 'BEGIN{srand(seed); print min+rand()*(max-min)}')
        valid_ord_montary=$(awk -v seed=$RANDOM -v min=0 -v max=1000 'BEGIN{srand(seed); print min+rand()*(max-min)}')
        values+="($id, $partCode, $shopId, '$processDate', '$vTagType', '$shop_first_order_time', '$shop_latest_order_time', $pct, $valid_ord_montary),"
    done
    values=${values%,}  # remove the trailing comma
    if (( batch == (total_records/batch_size - 1) )); then
        echo -n "$values;" >> "$output_file"  # end the last batch with a semicolon
    else
        echo -n "$values," >> "$output_file"  # end other batches with a comma
    fi
    echo "Batch $((batch + 1)) of $((total_records / batch_size)) written to $output_file"
done
echo "All insert statements have been written to $output_file"
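For reference, a minimal sketch of the sending step under the assumptions above; the chproxy URL, port, and credentials are placeholders, not values from this report:

#!/bin/bash
# Hypothetical load driver: 30 concurrent workers, 100 POSTs each.
# CHPROXY_URL and the user are placeholders for the actual deployment.
CHPROXY_URL="http://chproxy-host:9090/"
for worker in $(seq 1 30); do
    (
        for n in $(seq 1 100); do
            curl -sS -o /dev/null -u default: \
                --data-binary @insert_data.sql "$CHPROXY_URL"
        done
    ) &
done
wait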
Expected behavior
In v1.15.0 and earlier, the same stress test did not exhibit these problems and showed better performance with low CPU and memory utilization.
Environment information
ClickHouse version: 24.3.13.40, 2 shards with 2 replicas per shard, 4 nodes in total (16C48G).
chproxy version: 1.26.4, 1 node (4C16G).
The flame graph shows that the functions readAndRestoreRequestBody and getFullQueryFromBody account for a large share of the I/O work. I believe these two functions cause unnecessary performance degradation, even though neither retries nor throughput limits are set in the configuration (#296, #299).
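To make the suspicion concrete, here is a minimal Go sketch of the pattern the flame graph points at: the whole request body is read into memory so the query text can be extracted and the body replayed later. This is a simplified stand-in based on my reading of the flame graph, not chproxy's actual implementation:

package sketch

import (
	"bytes"
	"io"
	"net/http"
)

// bufferBody illustrates the suspected hot path: the entire request body
// is pulled onto the heap so the query can be inspected and the body
// replayed later. For a 50MB INSERT this allocates and copies 50MB per
// request; with 30 concurrent requests that is ~1.5GB of live buffers
// plus the GC pressure of churning them.
func bufferBody(req *http.Request) ([]byte, error) {
	body, err := io.ReadAll(req.Body) // full payload lands in memory
	if err != nil {
		return nil, err
	}
	// Restore the body so the request can still be forwarded upstream.
	req.Body = io.NopCloser(bytes.NewReader(body))
	return body, nil
}

When retries and throughput limits are disabled, the body could in principle be streamed straight to the upstream instead of being buffered, which would keep per-request memory roughly constant regardless of payload size.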