[Proposal]: Connect to local BigQuery emulator #10

Closed
1 task done
waiq opened this issue Sep 19, 2022 · 9 comments · Fixed by #11
Assignees
Labels
enhancement New feature or request

Comments

@waiq
Contributor

waiq commented Sep 19, 2022

Contact Details

martin.olofsson@dentech.se

Summary of your proposal

To change the default connection URL, you can pass option.ClientOption values (the opts parameter) to bigquery.NewClient.

To keep the impact minimal and leave today's behavior unchanged, the option.ClientOption values can be added as a variadic parameter to the functions that end up calling bigquery.NewClient. This requires the import:
"google.golang.org/api/option"

e.g:
func NewStreamer(ctx context.Context, projectID, dataSetID, tableID string, cfg *StreamerConfig, opts ...option.ClientOption) (*Streamer, error)

They can then be passed on to the storage, batch, and insertall clients, e.g.:

client, err := storage.NewClient(
	projectID, dataSetID, tableID,
	encoder, protobufDescriptor,
	logger,
	opts...,
)

and from there passed on to the Google BigQuery client:
writer, err := managedwriter.NewClient(ctx, projectID, opts...)

To use in test code:

bqWriter, err := bqwriter.NewStreamer(
	ctx, projectId, datasetId, tableId,
	&bqwriter.StreamerConfig{
		...
	},
	option.WithEndpoint("localhost:9050"),
	option.WithoutAuthentication(),
)
if err != nil {
	panic(err)
}

Motivation for your proposal

The motivation for this change is to use https://github.com/goccy/bigquery-emulator in integration tests, both during development and in the deploy pipeline.

bigquery-emulator is a locally running BigQuery emulator that can easily run during testing. For this to work, though, connecting to a local endpoint needs to be supported.

Alternatives for your proposal

The workaround used today is a fork of bqwriter master, in which the option.ClientOption parameter is passed to the relevant functions.

Alternatively, consider whether bqwriter should have its own OptionFn pattern to handle optional parameters passed in at creation.
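
As a rough illustration of that alternative, a bqwriter-specific option pattern could look something like the sketch below. Nothing here is bqwriter's actual API: streamerSettings, StreamerOption, WithEndpoint, WithoutAuth, newStreamerSettings, and the placeholder default endpoint are all hypothetical names chosen for the example.

```go
package main

import "fmt"

// streamerSettings is a hypothetical stand-in for whatever bqwriter would
// configure internally; it is not part of the real library.
type streamerSettings struct {
	endpoint    string
	withoutAuth bool
}

// StreamerOption is a hypothetical functional option that mutates settings.
type StreamerOption func(*streamerSettings)

// WithEndpoint overrides the default endpoint, e.g. to target a local emulator.
func WithEndpoint(endpoint string) StreamerOption {
	return func(s *streamerSettings) { s.endpoint = endpoint }
}

// WithoutAuth disables authentication, which a local emulator does not need.
func WithoutAuth() StreamerOption {
	return func(s *streamerSettings) { s.withoutAuth = true }
}

// newStreamerSettings applies the options in order on top of the defaults,
// so callers that pass no options keep the existing behavior.
func newStreamerSettings(opts ...StreamerOption) streamerSettings {
	// Placeholder default, not the real BigQuery endpoint.
	s := streamerSettings{endpoint: "default-endpoint.example:443"}
	for _, opt := range opts {
		opt(&s)
	}
	return s
}

func main() {
	def := newStreamerSettings()
	local := newStreamerSettings(WithEndpoint("localhost:9050"), WithoutAuth())
	fmt.Println(def.endpoint, def.withoutAuth)
	fmt.Println(local.endpoint, local.withoutAuth)
}
```

The trade-off compared with forwarding option.ClientOption directly is that bqwriter would own the option surface, at the cost of having to translate each option to the underlying Google client.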

Version

0.4.1 (Latest)

What platform are you mostly using or planning to use our software on?

Linux

Code of Conduct

  • I agree to follow this project's Code of Conduct
@waiq waiq added the enhancement New feature or request label Sep 19, 2022
@andreas-lindfalk
Contributor

+1

@andreas-lindfalk
Contributor

andreas-lindfalk commented Oct 3, 2022

@GlenDC The code for this currently lives in a fork (https://github.com/waiq/bqwriter). Do you think you would want to make this change?

@GlenDC
Member

GlenDC commented Oct 3, 2022

@andreas-dentech I am fine with this change, please open a PR :)

Optionally, and only if your time constraints allow, it would be cool if you could also add a working example or integration test, with the latter doubling as an integration test for bqwriter.

@andreas-lindfalk
Contributor

@GlenDC I’m on it now, but I could use a tip on how to persist every single row straight away, with as little delay as possible, in my tests, so I can query the emulator and verify the data I just streamed to it. So, to the question: is there anything else I could or should do in my tests besides setting WorkerCount, MaxBatchDelay and BatchSize to "as low as possible"? In other words, make it as little eventually consistent as possible :)

@GlenDC
Member

GlenDC commented Oct 4, 2022

I do not think your test should rely too much on exact numbers. Low numbers for those do sound reasonable. But besides that I would just rely on the usual Go primitives to know when you're finished or not with something.
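
One way to read "the usual Go primitives" is a sync.WaitGroup around the producing goroutines, as in the sketch below. The rowWriter interface and countingWriter type are hypothetical stand-ins for illustration only, not bqwriter types, and the WaitGroup only tells you that every Write call has returned, not that the rows have reached BigQuery.

```go
package main

import (
	"fmt"
	"sync"
)

// rowWriter is a hypothetical stand-in for a streamer with a Write method.
type rowWriter interface {
	Write(row interface{}) error
}

// countingWriter is a fake writer used only for this sketch.
type countingWriter struct {
	mu   sync.Mutex
	rows int
}

// Write counts the row under a mutex so concurrent callers are safe.
func (w *countingWriter) Write(row interface{}) error {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.rows++
	return nil
}

// writeConcurrently sends rows from several goroutines and returns once
// every Write call has returned. It reports the first error seen, if any.
func writeConcurrently(w rowWriter, producers, rowsPerProducer int) error {
	var wg sync.WaitGroup
	errs := make(chan error, producers) // one slot per producer, never blocks
	for p := 0; p < producers; p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			for i := 0; i < rowsPerProducer; i++ {
				row := fmt.Sprintf("producer-%d-row-%d", p, i)
				if err := w.Write(row); err != nil {
					errs <- err
					return
				}
			}
		}(p)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil when no producer reported an error
}

func main() {
	w := &countingWriter{}
	err := writeConcurrently(w, 4, 25)
	fmt.Println(w.rows, err)
}
```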

The API is kept very simple on purpose and is mostly meant for stream inserts under high concurrency. For our purposes we do not care much whether a row is inserted immediately or only after a minute, as long as all requests are correctly handled and stored.

@GlenDC
Member

GlenDC commented Oct 4, 2022

If you are really stuck, do let me know though with a WIP branch with comments where you are stuck and I'll gladly help you out with comments on the draft PR from your WIP branch.

@andreas-lindfalk
Contributor

It's the integration test in our internal service I'm talking about; getting this to work in bqwriter is for later :)

In this test, the service receives event(s), which it streams to BQ with the help of bqwriter (to a testcontainer running "ghcr.io/goccy/bigquery-emulator:0.1.14"), and then I'm using a bigquery client to check that my event(s) were transformed and stored correctly in BigQuery. But before I try to read the data I obviously have to instruct bqwriter to "flush its batch"; that's why I asked about the stuff above.

My idea was to have bqwriter "write as often as possible" and then do an "awaitility" kind of thing and check the db a couple of times. But WorkerCount etc. did not make much of a difference: only marginally faster compared to the default config. So what I'm doing now instead is to simply call "Close()" on the bqwriter to flush the batch to BigQuery. This approach works, but it takes 15 seconds for the Close() func to complete... I tried a couple of different approaches with the context but have not found a way to decrease these 15 seconds yet. Any tips on how to make Close() faster?
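
For reference, the "awaitility" style of check mentioned above could be sketched in Go roughly as follows. The eventually helper and pollDemo are hypothetical names for this example; in a real test the check function would query the emulator, whereas here it just simulates a row becoming visible on the third attempt.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// eventually calls check immediately and then once per interval until it
// returns true, returns an error, or ctx is done (timeout or cancel).
func eventually(ctx context.Context, interval time.Duration, check func() (bool, error)) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ok, err := check()
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			// Next attempt.
		}
	}
}

// pollDemo simulates a row that becomes visible on the third query.
func pollDemo() (attempts int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	err = eventually(ctx, 10*time.Millisecond, func() (bool, error) {
		attempts++ // in a real test: query the emulator and compare rows
		return attempts >= 3, nil
	})
	return attempts, err
}

func main() {
	attempts, err := pollDemo()
	fmt.Println(attempts, err)
}
```

Bounding the loop with a context deadline keeps a stuck emulator from hanging the test run indefinitely.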

@andreas-lindfalk
Contributor

Seems the emulator is still a bit immature; there are issues with things freezing up (goccy/bigquery-emulator#49), so I'm gonna let this marinate for a while instead of banging my head more for now. Either way, we want to have this merged so we can continue on the adventure.

@GlenDC
Member

GlenDC commented Oct 5, 2022

Thank you for your contribution. If there is anything else, @andreas-dentech, feel free to let me know.
I'll work on a release of BQWriter tomorrow :)

3 participants