Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement basic logic of re-using buffers #382

Merged
merged 3 commits into from
Nov 16, 2021
Merged

Implement basic logic of re-using buffers #382

merged 3 commits into from
Nov 16, 2021

Conversation

akamensky
Copy link
Contributor

Closes #353

@codecov
Copy link

codecov bot commented Nov 15, 2021

Codecov Report

Merging #382 (6fe1401) into master (2b8d60f) will increase coverage by 0.52%.
The diff coverage is 92.59%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #382      +/-   ##
============================================
+ Coverage     61.84%   62.37%   +0.52%     
- Complexity     1191     1248      +57     
============================================
  Files           131      135       +4     
  Lines          6401     6631     +230     
  Branches        475      516      +41     
============================================
+ Hits           3959     4136     +177     
- Misses         2175     2219      +44     
- Partials        267      276       +9     
Impacted Files Coverage Δ
...c/statement/ClickHousePreparedInsertStatement.java 52.90% <ø> (+3.55%) ⬆️
.../github/housepower/settings/ClickHouseDefines.java 0.00% <ø> (-75.00%) ⬇️
...hub/housepower/data/ColumnWriterBufferFactory.java 70.58% <70.58%> (ø)
...rc/main/java/com/github/housepower/data/Block.java 91.17% <91.66%> (-0.06%) ⬇️
.../com/github/housepower/buffer/ByteArrayWriter.java 100.00% <100.00%> (+29.62%) ⬆️
...com/github/housepower/data/ColumnWriterBuffer.java 100.00% <100.00%> (ø)
.../com/github/housepower/serde/BinarySerializer.java 94.73% <100.00%> (+0.53%) ⬆️
...m/github/housepower/settings/ClickHouseConfig.java 77.12% <0.00%> (-0.19%) ⬇️
...om/github/housepower/data/type/DataTypeDate32.java 45.83% <0.00%> (ø)
... and 11 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2b8d60f...6fe1401. Read the comment docs.

public BinarySerializer(BuffedWriter writer, boolean enableCompress) {
this.enableCompress = enableCompress;
BuffedWriter compressWriter = null;
if (enableCompress) {
compressWriter = new CompressedBuffedWriter(ClickHouseDefines.SOCKET_SEND_BUFFER_BYTES, writer);
}
switcher = new Switcher<>(compressWriter, writer);
// max num of byte is 8 for double and long
writeBuffer = new byte[8];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@AndreyNudko
Copy link
Contributor

First of all, thanks a lot for addressing this over-allocation problem. I was contemplating a simpler change, but this one is a lot better!
I have couple of questions if that's ok:

  1. Is it necessary for SOCKET_SEND_BUFFER_BYTES, SOCKET_RECV_BUFFER_BYTES, and COLUMN_BUFFER_BYTES to be final?
    We're currently setting lower values to minimize allocation
    I appreciate this PR addresses the problem in a better way, but they seem like useful tunables (unless ClickHouseDefines is considered internal API)

  2. If there was a spike in writes and ByteArrayWriter allocated extra buffers - they will remain in freeList forever and won't be collected, right?
    It may be desirable in case of long-term change in traffic pattern, or wasting memory in case of a short burst
    Not saying this is a problem (almost certainly not for our use-cases), just trying to understand implications

  3. ColumnWriterBuffer::writeTo
    Is there any reason to use temporary buffer instead of doing something like
    serializer.writeBytes(buffer.array(), buffer.arrayOffset(), buffer.remaining());
    I understand this is coupled with ByteArrayWriter implementation allocating heap buffers - but both classes already seem to have quite close relationship; and the clue is in the name

@akamensky
Copy link
Contributor Author

@AndreyNudko Please see comment in linked ticket. I am not the right person to ask those questions and not sure the right person will have more time to work on this to address those. I am but a messenger.

@AndreyNudko
Copy link
Contributor

@akamensky Don't get me wrong - those questions are more for maintainers to consider, I'm certainly not asking for more from you guys. And I really appreciate the effort you made.

@sundy-li
Copy link
Member

@akamensky Thanks for the pr, I'll take time to review this.

@AndreyNudko

Is it necessary for SOCKET_SEND_BUFFER_BYTES, SOCKET_RECV_BUFFER_BYTES, and COLUMN_BUFFER_BYTES to be final?

I do think letting it be the way before is better, users can modify set these to other values.

If there was a spike in writes and ByteArrayWriter allocated extra buffers - they will remain in freeList forever and won't be collected, right?

Yes, that's the side effect, manually re-using the buffer or let them be controlled by JVM. But I think it may be worthy to do if we can set fixed blocksize (batchsize) for each insert( better let the value lesser than 1048576).

clickhouse-go seems to have a buffer block in each connection, see: https://github.com/ClickHouse/clickhouse-go/blob/master/clickhouse.go#L46-L61

}

public void recycleBuffer(ColumnWriterBuffer buffer) {
stack.addLast(buffer);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about reset it before addLast, making stack keeps all clean buffers.

*/
public class ColumnWriterBufferFactory {

private final ConcurrentLinkedDeque<ColumnWriterBuffer> stack = new ConcurrentLinkedDeque<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to export a method named clear to clear all buffers manually.

@akamensky
Copy link
Contributor Author

@sundy-li as I mentioned, please check the comment in linked ticket. I did not write those changes, it comes from a colleague who is a Java expert. They do not have more time to invest into this changeset. They also mentioned that the implementation is very crude, so you may be right on those comments you left. I am not the right person to follow up on those comments, but feel free to modify these changes as you see appropriate.

Meantime we are using those changes internally and it reduced CPU load dramatically.

@sundy-li sundy-li merged commit 03a3fa3 into housepower:master Nov 16, 2021
pan3793 pushed a commit that referenced this pull request Dec 3, 2021
* Implement basic logic of re-using buffers

* Add clearAllBuffers method inside ColumnWriterBufferFactory

* Tab to spaces

Co-authored-by: sundyli <543950155@qq.com>
AndreyNudko pushed a commit to AndreyNudko/ClickHouse-Native-JDBC that referenced this pull request Dec 21, 2021
…usepower#382

Instead of copying serialized data from ByteBuffer via intermediate byte[] array, just write backing array directly (this somewhat couples ColumnWriterBuffer with ByteArrayWriter internals, however as the name implies ByteArrayWriter is expected to use heap buffers).
ByteArrayWriter and CompressedBuffedWriter use fixed-size buffers internally and split bytes as required anyway
SocketBuffedWriter is updated to write in fixed-size chunks to minimize allocations in the native code; this may be dubious optimisation, but in worst case it should be harmless
@akamensky akamensky deleted the reuse-buffers branch December 31, 2021 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Very high CPU utilization when using native driver
3 participants