"acquire conn timeout" issue #1229
Did you try increasing the value defined at clickhouse-go/clickhouse_options.go, line 136 (commit 596ddb2)?
Idle connections are not really related here. It seems your app starves the connection pool and does not recover from it. Could you please share your code snippet? |
@jkaflik Hi, before I go further in discussing the problem, I want to test one fix. I suspect it happens because batch.Abort() is not called in case of failure. |
@jkaflik I figured that adding Abort wouldn't change matters at all, because for the last 2 days I had no failed batch, but I still have this "acquire conn timeout" error in logs. Increasing DialTimeout has no value, since I think that 30s is way more than enough to get available connection. I could also decrease overall connection lifetime to seconds, but query performance may get worse because of a need to initiate connections more often. |
All I did was set MaxIdle/OpenConnections to 100 and 105, respectively. However, considering read/write patterns described above, 20/25 should've been a suitable option. |
I decided to compare the clickhouse-go and pgx poolers by writing the following program. In pgx, I set the MaxConns value to 5, and it performs well even when there are 1000, 2000, or 5000 concurrent queries, whereas clickhouse-go can't handle such numbers. |
Do you mean you get "acquire conn timeout"? |
@jkaflik exactly |
Based on your code snippet, I wrote the following test case:

```go
package issues

import (
	"context"
	"log"
	"sync"
	"testing"

	"github.com/ClickHouse/clickhouse-go/v2"
	clickhouse_tests "github.com/ClickHouse/clickhouse-go/v2/tests"
	"github.com/google/uuid"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func Test1229(t *testing.T) {
	te, err := GetIssuesTestEnvironment()
	require.NoError(t, err)
	opts := clickhouse_tests.ClientOptionsFromEnv(te, clickhouse.Settings{})
	opts.MaxIdleConns = 5
	opts.MaxOpenConns = 5
	conn, err := clickhouse_tests.GetConnectionWithOptions(&opts)
	ctx := context.Background()
	require.NoError(t, err)

	const ddl = "CREATE TABLE IF NOT EXISTS test_1223 (a String, b String, c Int) Engine = Memory"
	require.NoError(t, conn.Exec(ctx, ddl))
	defer func() {
		conn.Exec(ctx, "DROP TABLE IF EXISTS test_1223")
	}()
	require.NoError(t, conn.Ping(ctx))

	wg := new(sync.WaitGroup)
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if err := doInsert(ctx, conn); err != nil {
				log.Println("error: ", err.Error())
				return
			}
			log.Println(i, "inserted")
		}(i)
	}
	wg.Wait()

	assert.Equal(t, 0, conn.Stats().Open)
	assert.Equal(t, 5, conn.Stats().Idle)
}

func doInsert(ctx context.Context, conn clickhouse.Conn) error {
	query := "INSERT INTO test_1223 VALUES ($1, $2, $3)"
	return conn.Exec(ctx, query, uuid.New().String(), "name", 23)
}
```

It runs in ~2-3 seconds against the latest ClickHouse version running inside a virtualized container, without failure. What client version and ClickHouse server version do you use? |
@jkaflik |
Is there anything in later versions that optimizes overall pooler performance? |
@threadedstream could you post how INSERTs and SELECTs are executed in your app? |
@jkaflik sure, here's my INSERT:

```go
withTimeoutCtx, cancel := context.WithTimeout(ctx, writeTimeout)
defer cancel()

batch, err := pool.PrepareBatch(withTimeoutCtx, query)
if err != nil {
	return err
}
for i := 0; i < len(buf); i += fieldsNum {
	if err = batch.Append(buf[i : i+fieldsNum]...); err != nil {
		return err
	}
}
return batch.Send()
```

Here's a SELECT query:

```go
condition, queryArgs := s.prepareFindCondition(filter)
if len(queryArgs) == 0 {
	return nil, errors.New("no query args")
}
query := "SELECT ... WHERE" + condition

withTimeoutCtx, cancel := context.WithTimeout(ctx, readTimeout)
defer cancel()

rows, err := pool.Query(withTimeoutCtx, query, queryArgs...)
if err != nil {
	return nil, err
}
defer rows.Close()
for rows.Next() {
	// ......
}
return result, nil
```
|
Again, I don't think that not calling batch.Abort is the issue, as I've had no trouble with batches for the last 2 days, so the connection should be released, i.e. returned to the idle pool, upon batch.Send. |
@jkaflik could you please tell me how I can debug open connections? That is, find out what each connection is busy with at the moment. I assume it's not possible on the client side; maybe the server side can help. |
This issue is a little different. I'll be able to describe it, but won't be able to outline steps to reproduce it.
Describe the bug
In my app, I've set max open/idle connections to 25 and 20 respectively. In the beginning, every nth query ended up returning the error "acquire conn timeout: ...". I tried to fix it by setting the max connection lifetime to 5s and passing a context with a timeout. Nevertheless, I still see the same problem; it happens less often than it used to, fortunately, but it still exists and bothers me a little.
Expected behaviour
I expected to see no problem with settings above.
Side notes
Inserts happen in batches once in every 20s. Reads are rare.