
"acquire conn timeout" issue #1229

Closed
threadedstream opened this issue Mar 7, 2024 · 14 comments · Fixed by #1237

@threadedstream
Contributor

This issue is a little different: I can describe it, but I can't outline steps to reproduce it.

Describe the bug

In my app, I've set max open/idle connections to 25 and 20, respectively. In the beginning, every nth query returned the error "acquire conn timeout: ...". I tried to fix it by setting the max connection lifetime to 5s and passing a context with a timeout. Nevertheless, I still see the same problem; less often than before, fortunately, but it still exists and bothers me a little.

Expected behaviour

I expected to see no problems with the settings above.

Side notes

Inserts happen in batches once every 20s. Reads are rare.

@jkaflik
Contributor

jkaflik commented Mar 8, 2024

Hi @threadedstream

Did you try increasing:

DialTimeout time.Duration // default 30 seconds

Idle connections are not really related here. It seems your app starves the connection pool and does not recover from it. Could you please share your code snippet?

@jkaflik jkaflik added question and removed bug labels Mar 8, 2024
@threadedstream
Contributor Author

@jkaflik Hi, before I go further in discussing the problem, I want to test one fix. I suspect it happens because batch.Abort() is not called in case of failure.

@threadedstream
Contributor Author

@jkaflik I figured that adding Abort wouldn't change matters at all: for the last 2 days I had no failed batch, but I still see this "acquire conn timeout" error in the logs. Increasing DialTimeout has no value, since I think 30s is way more than enough to get an available connection. I could also decrease the overall connection lifetime to seconds, but query performance might get worse due to the need to initiate connections more often.

@threadedstream
Contributor Author

All I did was set MaxIdleConns/MaxOpenConns to 100 and 105, respectively. However, considering the read/write patterns described above, 20/25 should have been a suitable option.

@threadedstream
Contributor Author

threadedstream commented Mar 13, 2024

I decided to compare the clickhouse-go and pgx poolers by writing the following program. In pgx, I set MaxConns to 5, and it performs well even with 1000, 2000, or 5000 concurrent queries, whereas clickhouse-go can't handle such numbers.
That's why I think a value of 25 for MaxOpenConns is enough (more than enough, even) to serve queries given the read/write patterns above.

@jkaflik
Contributor

jkaflik commented Mar 14, 2024

@threadedstream

whereas clickhouse-go can't handle such numbers.

Do you mean you get "acquire conn timeout"?

@threadedstream
Contributor Author

@jkaflik exactly

@jkaflik
Contributor

jkaflik commented Mar 14, 2024

@threadedstream

Based on your code snippet, I wrote the following test case:

package issues

import (
	"context"
	"log"
	"sync"
	"testing"

	"github.com/ClickHouse/clickhouse-go/v2"
	clickhouse_tests "github.com/ClickHouse/clickhouse-go/v2/tests"
	"github.com/google/uuid"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func Test1229(t *testing.T) {
	te, err := GetIssuesTestEnvironment()
	require.NoError(t, err)
	opts := clickhouse_tests.ClientOptionsFromEnv(te, clickhouse.Settings{})
	opts.MaxIdleConns = 5
	opts.MaxOpenConns = 5
	conn, err := clickhouse_tests.GetConnectionWithOptions(&opts)

	ctx := context.Background()
	require.NoError(t, err)
	const ddl = "CREATE TABLE IF NOT EXISTS test_1223 (a String, b String, c Int) Engine = Memory"
	require.NoError(t, conn.Exec(ctx, ddl))
	defer func() {
		conn.Exec(ctx, "DROP TABLE IF EXISTS test_1223")
	}()

	require.NoError(t, conn.Ping(ctx))

	wg := new(sync.WaitGroup)
	for i := 0; i < 1000; i++ {
		wg.Add(1)

		go func(i int) {
			defer wg.Done()
			if err := doInsert(ctx, conn); err != nil {
				log.Println("error: ", err.Error())
				return
			}

			log.Println(i, "inserted")
		}(i)
	}

	wg.Wait()

	assert.Equal(t, 0, conn.Stats().Open)
	assert.Equal(t, 5, conn.Stats().Idle)
}

func doInsert(ctx context.Context, conn clickhouse.Conn) error {
	query := "INSERT INTO test_1223 VALUES ($1, $2, $3)"
	return conn.Exec(ctx, query, uuid.New().String(), "name", 23)
}

It runs in ~2-3 seconds, without failure, against the latest ClickHouse version running inside a virtualized container.
Could you provide more details about your implementation? Which queries do you run most: INSERTs or SELECTs?
Please be aware that ClickHouse INSERT performance depends on the block size; it's recommended to do [bulk inserts](https://clickhouse.com/docs/en/cloud/bestpractices/bulk-inserts#:~:text=Generally%2C%20we%20recommend%20inserting%20data,inserts%20(see%20asynchronous%20inserts) or use async inserts.
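The batching pattern recommended here (and already used by the reporter: buffer rows, send one large INSERT periodically) can be sketched library-free. In this sketch, `flush` is a hypothetical hook; in a real clickhouse-go app it would call PrepareBatch/Append/Send, and a time.Ticker would trigger the final periodic flush.

```go
package main

import "fmt"

// row stands in for one record to insert.
type row struct {
	a, b string
	c    int
}

// batcher accumulates rows and hands a full block to flush, matching the
// "buffer rows, send one big INSERT" pattern recommended for ClickHouse.
type batcher struct {
	buf     []row
	maxSize int
	flush   func([]row) // would do PrepareBatch/Append/Send in a real app
}

// add buffers a row and flushes automatically when the block is full.
func (b *batcher) add(r row) {
	b.buf = append(b.buf, r)
	if len(b.buf) >= b.maxSize {
		b.doFlush()
	}
}

// doFlush sends whatever is buffered (e.g. when a periodic ticker fires).
func (b *batcher) doFlush() {
	if len(b.buf) == 0 {
		return
	}
	b.flush(b.buf)
	b.buf = nil
}

func main() {
	var blocks [][]row
	b := &batcher{maxSize: 3, flush: func(rs []row) {
		blocks = append(blocks, append([]row(nil), rs...))
	}}
	for i := 0; i < 7; i++ {
		b.add(row{a: "x", b: "y", c: i})
	}
	b.doFlush() // final partial block, like the 20s ticker firing
	fmt.Println(len(blocks)) // 3 blocks: sizes 3, 3, 1
}
```

Each flush corresponds to a single INSERT block, which is the shape ClickHouse handles best.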

What client version and ClickHouse server version do you use?

@threadedstream
Contributor Author

@jkaflik
In my production code, INSERTs happen more often than SELECTs. I'm aware that ClickHouse is not optimized for small, frequent insertions; that's why I'm inserting data in batches once every 20s.
I'm using clickhouse-go version 2.15.0 and ClickHouse server version 22.8.6.71.

@threadedstream
Contributor Author

Is there anything in later versions that optimizes overall pooler performance?

@jkaflik
Contributor

jkaflik commented Mar 14, 2024

@threadedstream could you post how INSERTs and SELECTs are executed in your app?

@threadedstream
Contributor Author

@jkaflik sure, here's my INSERT

	withTimeoutCtx, cancel := context.WithTimeout(ctx, writeTimeout)
	defer cancel()

	batch, err := pool.PrepareBatch(withTimeoutCtx, query)
	if err != nil {
		return err
	}

	for i := 0; i < len(buf); i += fieldsNum {
		if err = batch.Append(buf[i : i+fieldsNum]...); err != nil {
			return err
		}
	}

	return batch.Send()

Here's a SELECT query

	condition, queryArgs := s.prepareFindCondition(filter)

	if len(queryArgs) == 0 {
		return nil, errors.New("no query args")
	}

	query := "SELECT ... WHERE" + condition

	withTimeoutCtx, cancel := context.WithTimeout(ctx, readTimeout)
	defer cancel()

	rows, err := pool.Query(withTimeoutCtx, query, queryArgs...)

	for rows.Next() {
        ......
	}

	return result, nil
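One thing worth noting about the SELECT snippet above: the error from pool.Query is never checked, and rows is never closed on early-return paths. Under the usual contract (a connection stays checked out until the result set is exhausted or closed), that shape can pin connections and eventually surface as "acquire conn timeout". A toy model with a fake pool illustrates the leak, assuming that contract; the real fix is checking the Query error and deferring rows.Close().

```go
package main

import "fmt"

// fakePool counts connections in use so the leak is visible. In real code,
// Query checks out a connection that is only returned when the result set
// is exhausted or explicitly closed.
type fakePool struct{ inUse int }

type fakeRows struct {
	p      *fakePool
	n      int
	closed bool
}

// Query checks out a connection and returns a result set over it.
func (p *fakePool) Query() *fakeRows {
	p.inUse++
	return &fakeRows{p: p, n: 2}
}

// Next yields rows; once exhausted, the connection goes back automatically.
func (r *fakeRows) Next() bool {
	if r.n == 0 {
		r.Close()
		return false
	}
	r.n--
	return true
}

// Close returns the connection to the pool (idempotent).
func (r *fakeRows) Close() {
	if !r.closed {
		r.closed = true
		r.p.inUse--
	}
}

func main() {
	p := &fakePool{}

	// Leaky shape: an early return skips iteration, rows are never closed,
	// and the connection stays checked out forever.
	_ = p.Query()
	fmt.Println("in use after leak:", p.inUse) // 1

	// Safe shape: defer rows.Close() so every path returns the connection.
	func() {
		rows := p.Query()
		defer rows.Close()
		for rows.Next() {
		}
	}()
	fmt.Println("in use after safe query:", p.inUse) // still 1: only the leaked one
}
```

With a cap of 25 connections, 25 such leaks are enough to make every later acquire time out, which matches the intermittent symptom described in this issue.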

@threadedstream
Contributor Author

Again, I don't think that not calling batch.Abort is the issue, as I had no trouble with batches for the last 2 days, so the connection should be released, i.e. returned to the idle pool, upon batch.Send.

@threadedstream
Contributor Author

@jkaflik could you please tell me how I can debug open connections? That is, find out what each connection is busy with at the moment. I assume it's not possible on the client side; maybe the server side can help.

jkaflik added a commit that referenced this issue Apr 15, 2024
jkaflik added a commit that referenced this issue Apr 15, 2024
jkaflik added a commit that referenced this issue Apr 15, 2024
* Ensure test container termination

* make #1229 more resilient

* make #1229 more resilent