-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up influx_inspect export
command and improve performance
#7741
Conversation
The export code was moved around a bit, particularly to ease testing export of a single TSM or WAL file. The functionality should not have changed.
Benchmark improvements with this change: benchmark old ns/op new ns/op delta BenchmarkExportTSMFloats_100s_250vps-4 23206480 10279106 -55.71% BenchmarkExportTSMInts_100s_250vps-4 17995000 5762310 -67.98% BenchmarkExportTSMBools_100s_250vps-4 17067605 4235467 -75.18% BenchmarkExportTSMStrings_100s_250vps-4 54846997 34682568 -36.76% BenchmarkExportWALFloats_100s_250vps-4 23459937 10436297 -55.51% BenchmarkExportWALInts_100s_250vps-4 18747150 6236062 -66.74% BenchmarkExportWALBools_100s_250vps-4 17988273 4814358 -73.24% BenchmarkExportWALStrings_100s_250vps-4 59700802 35815739 -40.01% benchmark old allocs new allocs delta BenchmarkExportTSMFloats_100s_250vps-4 201442 51738 -74.32% BenchmarkExportTSMInts_100s_250vps-4 201442 51728 -74.32% BenchmarkExportTSMBools_100s_250vps-4 201441 51638 -74.37% BenchmarkExportTSMStrings_100s_250vps-4 404092 201584 -50.11% BenchmarkExportWALFloats_100s_250vps-4 250322 75627 -69.79% BenchmarkExportWALInts_100s_250vps-4 250323 75617 -69.79% BenchmarkExportWALBools_100s_250vps-4 250321 75527 -69.83% BenchmarkExportWALStrings_100s_250vps-4 452868 225291 -50.25% benchmark old bytes new bytes delta BenchmarkExportTSMFloats_100s_250vps-4 5170539 2351789 -54.52% BenchmarkExportTSMInts_100s_250vps-4 5143189 2331276 -54.67% BenchmarkExportTSMBools_100s_250vps-4 3724951 2143780 -42.45% BenchmarkExportTSMStrings_100s_250vps-4 17131400 10796281 -36.98% BenchmarkExportWALFloats_100s_250vps-4 4487868 1468109 -67.29% BenchmarkExportWALInts_100s_250vps-4 4458395 1452359 -67.42% BenchmarkExportWALBools_100s_250vps-4 2838719 1258755 -55.66% BenchmarkExportWALStrings_100s_250vps-4 16787201 10010700 -40.37% Also, after improving those benchmarks, I did a time-filtered export on a 450MB TSM file to a 21GB plain text output, with and without the bufio.BufferedWriter. Without buffering, it took about 263s, and with buffering, it took about 60s, for a delta of about -77%.
benchmark old ns/op new ns/op delta BenchmarkEscapeStringField_Plain-4 167 65.3 -60.90% BenchmarkEscapeString_Quotes-4 167 165 -1.20% BenchmarkEscapeString_Backslashes-4 211 184 -12.80% BenchmarkEscapeString_QuotesAndBackslashes-4 413 397 -3.87% BenchmarkExportTSMStrings_100s_250vps-4 33833611 27381442 -19.07% BenchmarkExportWALStrings_100s_250vps-4 34977761 29222717 -16.45% benchmark old allocs new allocs delta BenchmarkEscapeStringField_Plain-4 4 1 -75.00% BenchmarkEscapeString_Quotes-4 4 3 -25.00% BenchmarkEscapeString_Backslashes-4 5 3 -40.00% BenchmarkEscapeString_QuotesAndBackslashes-4 9 5 -44.44% BenchmarkExportTSMStrings_100s_250vps-4 201605 76938 -61.84% BenchmarkExportWALStrings_100s_250vps-4 225371 100728 -55.31% benchmark old bytes new bytes delta BenchmarkEscapeStringField_Plain-4 56 16 -71.43% BenchmarkEscapeString_Quotes-4 56 48 -14.29% BenchmarkEscapeString_Backslashes-4 104 80 -23.08% BenchmarkEscapeString_QuotesAndBackslashes-4 208 160 -23.08% BenchmarkExportTSMStrings_100s_250vps-4 10872629 6062048 -44.24% BenchmarkExportWALStrings_100s_250vps-4 10094933 5269980 -47.80%
The previous implementation was wrong and double-escaped quotes and backslashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small nit, otherwise LGTM 👍
// Because calling (*os.File).Write is relatively expensive, | ||
// and we don't *need* to sync to disk on every written line of export, | ||
// use a sized buffered writer so that we only sync the file every megabyte. | ||
bw := bufio.NewWriterSize(f, 1024*1024) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this not just be w := bufio....
and then we don't need the var w = bw
below?
w needs to be typed as an IO Writer so the gzip writer can be conditionally
assigned to w later.
In fact I need to double check whether it performs better with the buffered
writer in front of the gzip writer. I suspect it will.
…On Mon, Dec 19, 2016 at 4:30 AM Edd Robinson ***@***.***> wrote:
***@***.**** approved this pull request.
Just a small nit, otherwise LGTM 👍
------------------------------
In cmd/influx_inspect/export/export.go
<#7741 (review)>
:
> if err != nil {
return err
}
- defer w.Close()
+ defer f.Close()
+
+ // Because calling (*os.File).Write is relatively expensive,
+ // and we don't *need* to sync to disk on every written line of export,
+ // use a sized buffered writer so that we only sync the file every megabyte.
+ bw := bufio.NewWriterSize(f, 1024*1024)
Can this not just be w := bufio.... and then we don't need the var w = bw
below?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#7741 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AARvV1bBrt_w4KrcOlX4z1Bt85i1rxuaks5rJnjtgaJpZM4LQC9j>
.
|
Using my earlier test case of a 450MB TSM file that, when time-filtered, exports to a 21GB plain text file or a 650MB gzipped text file, I found:
So the real time is about the same whether the buffer writer is before or after the gzip writer, but putting the gzip writer before the buffered writer seems to result in fewer file writes, which seems like the better win. |
Nice speedup! |
I cleaned up and reorganized the
influx_inspect export
command code and added tests and benchmarks, and then I made exports go quite a bit faster.(This benchmark is updated from a commit message to include the EscapeStringField optimization.)
Also, after improving those benchmarks, I did a time-filtered export on
a 450MB TSM file to a 21GB plain text output, with and without the
bufio.BufferedWriter.
Without buffering, it took about 263s, and with buffering, it took about
60s, for a delta of about -77%.