
"acquire conn timeout" issue #1229

Closed
threadedstream opened this issue Mar 7, 2024 · 14 comments · Fixed by #1237

@threadedstream
Contributor

This issue is a little different: I can describe it, but I can't outline steps to reproduce it.

Describe the bug

In my app, I've set max open/idle connections to 25 and 20, respectively. In the beginning, every nth query returned the error "acquire conn timeout: ...". I tried to fix it by setting the max connection lifetime to 5s and passing a context with a timeout. Nevertheless, I still see the same problem; less often than before, fortunately, but it still exists and bothers me a little.

Expected behaviour

I expected to see no problems with the settings above.

Side notes

Inserts happen in batches once every 20s. Reads are rare.

@jkaflik
Contributor

jkaflik commented Mar 8, 2024

Hi @threadedstream

Did you try increasing:

DialTimeout time.Duration // default 30 seconds

Idle connections are not really related here. It seems your app starves the connection pool and does not recover from it. Could you please share your code snippet?

@jkaflik jkaflik added question and removed bug labels Mar 8, 2024
@threadedstream
Contributor Author

@jkaflik Hi, before I go further in discussing the problem, I want to test one fix. I suspect it happens because batch.Abort() is not called in case of failure.

@threadedstream
Contributor Author

@jkaflik I figured that adding Abort wouldn't change matters at all: for the last 2 days I had no failed batch, but I still see this "acquire conn timeout" error in the logs. Increasing DialTimeout has no value, since I think 30s is way more than enough to get an available connection. I could also decrease the overall connection lifetime to seconds, but query performance might get worse due to the need to initiate connections more often.

@threadedstream
Contributor Author

All I did was set MaxIdleConns/MaxOpenConns to 100 and 105, respectively. However, considering the read/write patterns described above, 20/25 should have been a suitable option.

@threadedstream
Contributor Author

threadedstream commented Mar 13, 2024

I decided to compare the clickhouse-go and pgx poolers by writing the following program. In pgx, I set MaxConns to 5, and it performs well even with 1000, 2000, or 5000 concurrent queries, whereas clickhouse-go can't handle such numbers.
That's why I think a value of 25 for MaxOpenConns is enough (more than enough, even) to serve queries given the read/write patterns above.

@jkaflik
Contributor

jkaflik commented Mar 14, 2024

@threadedstream

whereas clickhouse-go can't handle such numbers.

Do you mean you get "acquire conn timeout"?

@threadedstream
Contributor Author

@jkaflik exactly

@jkaflik
Contributor

jkaflik commented Mar 14, 2024

@threadedstream

Based on your code snippet, I wrote the following test case:

package issues

import (
	"context"
	"log"
	"sync"
	"testing"

	"github.com/ClickHouse/clickhouse-go/v2"
	clickhouse_tests "github.com/ClickHouse/clickhouse-go/v2/tests"
	"github.com/google/uuid"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func Test1229(t *testing.T) {
	te, err := GetIssuesTestEnvironment()
	require.NoError(t, err)
	opts := clickhouse_tests.ClientOptionsFromEnv(te, clickhouse.Settings{})
	opts.MaxIdleConns = 5
	opts.MaxOpenConns = 5
	conn, err := clickhouse_tests.GetConnectionWithOptions(&opts)

	ctx := context.Background()
	require.NoError(t, err)
	const ddl = "CREATE TABLE IF NOT EXISTS test_1223 (a String, b String, c Int) Engine = Memory"
	require.NoError(t, conn.Exec(ctx, ddl))
	defer func() {
		conn.Exec(ctx, "DROP TABLE IF EXISTS test_1223")
	}()

	require.NoError(t, conn.Ping(ctx))

	wg := new(sync.WaitGroup)
	for i := 0; i < 1000; i++ {
		wg.Add(1)

		go func(i int) {
			defer wg.Done()
			if err := doInsert(ctx, conn); err != nil {
				log.Println("error: ", err.Error())
				return
			}

			log.Println(i, "inserted")
		}(i)
	}

	wg.Wait()

	assert.Equal(t, 0, conn.Stats().Open)
	assert.Equal(t, 5, conn.Stats().Idle)
}

func doInsert(ctx context.Context, conn clickhouse.Conn) error {
	query := "INSERT INTO test_1223 VALUES ($1, $2, $3)"
	return conn.Exec(ctx, query, uuid.New().String(), "name", 23)
}

It runs in ~2-3 seconds, without failure, against the latest ClickHouse version running inside a virtualized container.
Could you provide more details about your implementation? Which queries do you run most: INSERTs or SELECTs?
Please be aware that ClickHouse INSERT performance depends on the block size; it's recommended to do [bulk inserts](https://clickhouse.com/docs/en/cloud/bestpractices/bulk-inserts#:~:text=Generally%2C%20we%20recommend%20inserting%20data,inserts%20(see%20asynchronous%20inserts) or use async inserts.
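The batching pattern recommended here (and already used by the reporter: buffer rows, send one large INSERT periodically) can be sketched library-free. In this sketch, `flush` is a hypothetical hook; in a real clickhouse-go app it would call PrepareBatch/Append/Send, and a time.Ticker would trigger the final periodic flush.

```go
package main

import "fmt"

// row stands in for one record to insert.
type row struct {
	a, b string
	c    int
}

// batcher accumulates rows and hands a full block to flush, matching the
// "buffer rows, send one big INSERT" pattern recommended for ClickHouse.
type batcher struct {
	buf     []row
	maxSize int
	flush   func([]row) // would do PrepareBatch/Append/Send in a real app
}

// add buffers a row and flushes automatically when the block is full.
func (b *batcher) add(r row) {
	b.buf = append(b.buf, r)
	if len(b.buf) >= b.maxSize {
		b.doFlush()
	}
}

// doFlush sends whatever is buffered (e.g. when a periodic ticker fires).
func (b *batcher) doFlush() {
	if len(b.buf) == 0 {
		return
	}
	b.flush(b.buf)
	b.buf = nil
}

func main() {
	var blocks [][]row
	b := &batcher{maxSize: 3, flush: func(rs []row) {
		blocks = append(blocks, append([]row(nil), rs...))
	}}
	for i := 0; i < 7; i++ {
		b.add(row{a: "x", b: "y", c: i})
	}
	b.doFlush() // final partial block, like the 20s ticker firing
	fmt.Println(len(blocks)) // 3 blocks: sizes 3, 3, 1
}
```

Each flush corresponds to a single INSERT block, which is the shape ClickHouse handles best.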

What client version and ClickHouse server version do you use?

@threadedstream
Contributor Author

@jkaflik
In my production code, INSERTs happen more often than SELECTs. I'm aware that ClickHouse is not optimized for small, frequent insertions; that's why I'm inserting data in batches once every 20s.
I'm using clickhouse-go version 2.15.0 and ClickHouse server version 22.8.6.71.

@threadedstream
Contributor Author

Is there anything in later versions that optimizes overall pooler performance?

@jkaflik
Contributor

jkaflik commented Mar 14, 2024

@threadedstream could you post how INSERTs and SELECTs are executed in your app?

@threadedstream
Contributor Author

@jkaflik sure, here's my INSERT

	withTimeoutCtx, cancel := context.WithTimeout(ctx, writeTimeout)
	defer cancel()

	batch, err := pool.PrepareBatch(withTimeoutCtx, query)
	if err != nil {
		return err
	}

	for i := 0; i < len(buf); i += fieldsNum {
		if err = batch.Append(buf[i : i+fieldsNum]...); err != nil {
			return err
		}
	}

	return batch.Send()

Here's a SELECT query

	condition, queryArgs := s.prepareFindCondition(filter)

	if len(queryArgs) == 0 {
		return nil, errors.New("no query args")
	}

	query := "SELECT ... WHERE" + condition

	withTimeoutCtx, cancel := context.WithTimeout(ctx, readTimeout)
	defer cancel()

	rows, err := pool.Query(withTimeoutCtx, query, queryArgs...)

	for rows.Next() {
        ......
	}

	return result, nil
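One thing worth noting about the SELECT snippet above: the error from pool.Query is never checked, and rows is never closed on early-return paths. Under the usual contract (a connection stays checked out until the result set is exhausted or closed), that shape can pin connections and eventually surface as "acquire conn timeout". A toy model with a fake pool illustrates the leak, assuming that contract; the real fix is checking the Query error and deferring rows.Close().

```go
package main

import "fmt"

// fakePool counts connections in use so the leak is visible. In real code,
// Query checks out a connection that is only returned when the result set
// is exhausted or explicitly closed.
type fakePool struct{ inUse int }

type fakeRows struct {
	p      *fakePool
	n      int
	closed bool
}

// Query checks out a connection and returns a result set over it.
func (p *fakePool) Query() *fakeRows {
	p.inUse++
	return &fakeRows{p: p, n: 2}
}

// Next yields rows; once exhausted, the connection goes back automatically.
func (r *fakeRows) Next() bool {
	if r.n == 0 {
		r.Close()
		return false
	}
	r.n--
	return true
}

// Close returns the connection to the pool (idempotent).
func (r *fakeRows) Close() {
	if !r.closed {
		r.closed = true
		r.p.inUse--
	}
}

func main() {
	p := &fakePool{}

	// Leaky shape: an early return skips iteration, rows are never closed,
	// and the connection stays checked out forever.
	_ = p.Query()
	fmt.Println("in use after leak:", p.inUse) // 1

	// Safe shape: defer rows.Close() so every path returns the connection.
	func() {
		rows := p.Query()
		defer rows.Close()
		for rows.Next() {
		}
	}()
	fmt.Println("in use after safe query:", p.inUse) // still 1: only the leaked one
}
```

With a cap of 25 connections, 25 such leaks are enough to make every later acquire time out, which matches the intermittent symptom described in this issue.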

@threadedstream
Contributor Author

Again, I don't think that not calling batch.Abort is the issue, as I had no trouble with batches for the last 2 days, so the connection should be released, i.e. returned to the idle pool, upon batch.Send.

@threadedstream
Contributor Author

@jkaflik could you please tell me how I can debug open connections? That is, find out what each connection is busy with at the moment. I assume it's not possible on the client side; maybe the server side can help.

jkaflik added a commit that referenced this issue Apr 15, 2024
jkaflik added a commit that referenced this issue Apr 15, 2024
jkaflik added a commit that referenced this issue Apr 15, 2024
* Ensure test container termination

* make #1229 more resilient

* make #1229 more resilent