Fix: write to channel should be protected with context #91

Merged
merged 1 commit into from
Nov 11, 2024


@MetalBlueberry MetalBlueberry commented Nov 8, 2024

Summary

If a failure occurs during copy, the worker returns an error and publishes it to errChan.
In the meantime, the publisher is waiting for a worker to pick up the next batch, but this never happens because the worker has stopped processing due to the error.

The fix makes sure the publish is protected with the context: if a worker signals cancellation, the publisher skips the write to the channel and returns.

Fix #86

Before

$ timescaledb-parallel-copy --batch-size 2000 -columns timestamp,email,product_name,product_price,product_description,address -connection "postgres://tsdbadmin@dg7cvr30ww.xltbtwir0g.dev.metronome-cloud.com:30672/tsdb?sslmode=require" -file bad-250Mb.csv -log-batches --skip-header -workers 1 -table products -verbose
Skipping the first 1 lines of the input.
[BATCH] took 642.905133ms, batch size 2000, row rate 3110.878880/sec
[BATCH] took 134.037298ms, batch size 2000, row rate 14921.219913/sec
[BATCH] took 140.005033ms, batch size 2000, row rate 14285.200733/sec
[BATCH] took 132.490415ms, batch size 2000, row rate 15095.431620/sec
[BATCH] took 183.054623ms, batch size 2000, row rate 10925.700576/sec
[BATCH] took 134.58509ms, batch size 2000, row rate 14860.487146/sec
[BATCH] took 110.557455ms, batch size 2000, row rate 18090.141456/sec
[BATCH] took 115.60151ms, batch size 2000, row rate 17300.812074/sec
[BATCH] took 110.960928ms, batch size 2000, row rate 18024.362594/sec
[BATCH] took 108.822631ms, batch size 2000, row rate 18378.530106/sec
[BATCH] took 107.878214ms, batch size 2000, row rate 18539.424466/sec
[BATCH] took 113.947117ms, batch size 2000, row rate 17552.001776/sec
[BATCH] took 113.841528ms, batch size 2000, row rate 17568.281410/sec
[BATCH] took 112.475923ms, batch size 2000, row rate 17781.583353/sec
[BATCH] took 109.640744ms, batch size 2000, row rate 18241.393911/sec
[BATCH] took 109.81101ms, batch size 2000, row rate 18213.109960/sec
[BATCH] took 118.469893ms, batch size 2000, row rate 16881.926280/sec
[BATCH] took 115.099953ms, batch size 2000, row rate 17376.201709/sec
[BATCH] took 111.687116ms, batch size 2000, row rate 17907.168451/sec
[BATCH] took 112.279093ms, batch size 2000, row rate 17812.755221/sec
[BATCH] took 109.997162ms, batch size 2000, row rate 18182.287285/sec
[BATCH] took 110.56991ms, batch size 2000, row rate 18088.103716/sec
[BATCH] took 118.05362ms, batch size 2000, row rate 16941.454231/sec
[BATCH] took 114.429088ms, batch size 2000, row rate 17478.073407/sec
[BATCH] took 118.001467ms, batch size 2000, row rate 16948.941830/sec
[BATCH] took 112.955909ms, batch size 2000, row rate 17706.023684/sec
[BATCH] took 119.419659ms, batch size 2000, row rate 16747.661287/sec
[BATCH] took 112.8693ms, batch size 2000, row rate 17719.610204/sec
[BATCH] took 112.945094ms, batch size 2000, row rate 17707.719115/sec
[BATCH] took 111.475324ms, batch size 2000, row rate 17941.190285/sec
[BATCH] took 111.745182ms, batch size 2000, row rate 17897.863373/sec
[BATCH] took 225.354858ms, batch size 2000, row rate 8874.891883/sec
[BATCH] took 155.144322ms, batch size 2000, row rate 12891.222664/sec
^C

After

$ timescaledb-parallel-copy --batch-size 2000 -columns timestamp,email,product_name,product_price,product_description,address -connection "postgres://tsdbadmin@dg7cvr30ww.xltbtwir0g.dev.metronome-cloud.com:30672/tsdb?sslmode=require" -file bad-250Mb.csv -log-batches --skip-header -workers 1 -table products -verbose
Skipping the first 1 lines of the input.
[BATCH] took 702.384056ms, batch size 2000, row rate 2847.445045/sec
[BATCH] took 140.041415ms, batch size 2000, row rate 14281.489515/sec
[BATCH] took 123.926393ms, batch size 2000, row rate 16138.612216/sec
[BATCH] took 130.310722ms, batch size 2000, row rate 15347.931232/sec
[BATCH] took 122.969743ms, batch size 2000, row rate 16264.163454/sec
[BATCH] took 138.998113ms, batch size 2000, row rate 14388.684543/sec
[BATCH] took 119.888143ms, batch size 2000, row rate 16682.216856/sec
[BATCH] took 143.950427ms, batch size 2000, row rate 13893.671882/sec
[BATCH] took 175.660948ms, batch size 2000, row rate 11385.569888/sec
[BATCH] took 180.364551ms, batch size 2000, row rate 11088.653446/sec
[BATCH] took 269.932019ms, batch size 2000, row rate 7409.272925/sec
[BATCH] took 201.150499ms, batch size 2000, row rate 9942.804069/sec
[BATCH] took 281.849237ms, batch size 2000, row rate 7095.992245/sec
[BATCH] took 132.748001ms, batch size 2000, row rate 15066.140243/sec
[BATCH] took 133.793328ms, batch size 2000, row rate 14948.428520/sec
[BATCH] took 137.898498ms, batch size 2000, row rate 14503.421205/sec
[BATCH] took 119.587076ms, batch size 2000, row rate 16724.215249/sec
[BATCH] took 115.550021ms, batch size 2000, row rate 17308.521303/sec
[BATCH] took 120.206873ms, batch size 2000, row rate 16637.983753/sec
[BATCH] took 120.333245ms, batch size 2000, row rate 16620.510816/sec
[BATCH] took 136.72771ms, batch size 2000, row rate 14627.612793/sec
[BATCH] took 115.813919ms, batch size 2000, row rate 17269.081448/sec
[BATCH] took 116.057495ms, batch size 2000, row rate 17232.837914/sec
[BATCH] took 129.666911ms, batch size 2000, row rate 15424.135460/sec
[BATCH] took 116.341334ms, batch size 2000, row rate 17190.794804/sec
[BATCH] took 111.159006ms, batch size 2000, row rate 17992.244371/sec
[BATCH] took 120.338014ms, batch size 2000, row rate 16619.852144/sec
[BATCH] took 122.051788ms, batch size 2000, row rate 16386.486694/sec
[BATCH] took 121.859221ms, batch size 2000, row rate 16412.381300/sec
[BATCH] took 141.213773ms, batch size 2000, row rate 14162.924462/sec
[BATCH] took 134.007307ms, batch size 2000, row rate 14924.559300/sec
[BATCH] took 177.593516ms, batch size 2000, row rate 11261.672414/sec
[BATCH] took 129.579431ms, batch size 2000, row rate 15434.548405/sec
panic: ERROR: invalid input syntax for type timestamp with time zone: "2024-02-16T07:04:00ZXXXXX" (SQLSTATE 22007)

goroutine 7 [running]:
github.com/timescale/timescaledb-parallel-copy/pkg/csvcopy.(*Copier).processBatches(0xc0000f2000, 0xc0000143d0, 0xc0000cc8c0)
	/home/victor/Documents/Projects/timescale/timescaledb-parallel-copy/pkg/csvcopy/csvcopy.go:240 +0x873
created by github.com/timescale/timescaledb-parallel-copy/pkg/csvcopy.(*Copier).Copy in goroutine 1
	/home/victor/Documents/Projects/timescale/timescaledb-parallel-copy/pkg/csvcopy/csvcopy.go:162 +0x7f

This prevents a deadlock if the listener stops while the publisher is waiting to send a message.

CLAassistant commented Nov 8, 2024

CLA assistant check
All committers have signed the CLA.


@MetalBlueberry MetalBlueberry marked this pull request as ready for review November 8, 2024 10:10
@MetalBlueberry MetalBlueberry merged commit 3efae78 into main Nov 11, 2024
3 checks passed
@MetalBlueberry MetalBlueberry deleted the vperez/fix-#86 branch November 11, 2024 17:41
Development

Successfully merging this pull request may close these issues.

fatal error: all goroutines are asleep - deadlock!