Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PR 2/5] Introduce unsafe API for bulk read/write ops #334

Merged
merged 6 commits into from
Jun 26, 2024

Conversation

fzhinkin
Copy link
Collaborator

This is a first portion of unsafe API described in #135.
This particular part aimed to facilitate integration with other frameworks and libraries.

Implemented API was initially described in the "Bulk API" subsection of #135 (comment)

@fzhinkin fzhinkin requested a review from shanshin May 30, 2024 10:27
@fzhinkin
Copy link
Collaborator Author

CC @e5l

@e5l e5l self-requested a review May 30, 2024 10:28
@fzhinkin
Copy link
Collaborator Author

Previously, newly introduced API was published as a snapshot build from https://github.com/Kotlin/kotlinx-io/tree/snapshot/unsafe-api branch.
Since then, some functions changed their name, and vectorized read on JVM changed its signature slightly, but otherwise everything remained the same.

@fzhinkin
Copy link
Collaborator Author

fzhinkin commented May 30, 2024

This change sits on top of #332.
I ran benchmark for this and develop branches to check if this PR affects existing API and current branch performs significantly better, compared to develop. That looks questionable, I'll revisit results later.

kotlinx-io-benchmarking-result-2024-05-29-dev.json
kotlinx-io-benchmarking-result-2024-05-29-bulk-api-part-1.json

UPD:

That looks questionable, I'll revisit results later.

The results look legit: direct access to the buffer's tail improved performance on the write path.

core/common/src/Buffer.kt Outdated Show resolved Hide resolved
@@ -36,46 +37,52 @@ import kotlin.jvm.JvmField
* `limit` and beyond. There is a single owning segment for each byte array. Positions,
* limits, prev, and next references are not shared.
*/
internal class Segment {
public class Segment {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you tell me why the Segment is public? I can't remember any usage of it.

if it still needs to be public, it would be good to annotate it with @InternalAPI and make useful extensions public as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An API to iterate over Buffer's segments (see #135 (comment)) will be implemented in subsequent PRs. That API would expose Segments to make iteration allocation free.

@fzhinkin fzhinkin force-pushed the organize-segments-as-list branch from e3f76f7 to c128d21 Compare June 6, 2024 16:10
@fzhinkin fzhinkin changed the title Introduce unsafe API for bulk read/write ops [PR 2/5] Introduce unsafe API for bulk read/write ops Jun 6, 2024
fzhinkin added 4 commits June 21, 2024 12:31
The API aimed to facilitate integration
with other frameworks and libraries.

Implemented API was initially described in the "Bulk API" subsection of
#135 (comment)
@fzhinkin fzhinkin changed the base branch from organize-segments-as-list to develop June 21, 2024 10:36
@PublishedApi
@JvmSynthetic
@Suppress("UNUSED_PARAMETER")
internal fun dataAsByteArray(readOnly: Boolean): ByteArray = data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readOnly parameter created for the future?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

@@ -295,3 +341,6 @@ internal fun Segment.indexOfBytesOutbound(bytes: ByteArray, startOffset: Int): I
}
return -1
}

@PublishedApi
internal fun Segment.isEmpty(): Boolean = size == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it not a member of the Segment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the size is a core property of a Segment that could not be easily implemented outside the Segment, isEmpty is just a convince function. So I decided to implement it as an extension.

core/common/src/unsafe/UnsafeBufferOperations.kt Outdated Show resolved Hide resolved
core/common/src/unsafe/UnsafeBufferOperations.kt Outdated Show resolved Hide resolved
core/common/src/unsafe/UnsafeBufferOperations.kt Outdated Show resolved Hide resolved
core/common/test/unsafe/UnsafeBufferOperationsWriteTest.kt Outdated Show resolved Hide resolved
core/jvm/src/unsafe/UnsafeBufferOperationsJvm.kt Outdated Show resolved Hide resolved
}

/**
* Provides read-only access to [buffer]'s data by filling provided [iovec] array with [ByteBuffer]'s representing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not entirely clear from the documentation what the size of each ByteBuffer in the array will be and, in general, on the basis of which the total number of buffers is determined, is it predictable or not, which array size is better to use

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation is indeed require some improvements.

However, the API itself was designed with the idea that iovec-array will be allocated once and cached by the caller (for instance, as a thread-local variable) to reduce excessive allocations on the write path.
The number of ByteBuffers to be created corresponds to the number of segments a buffer has. Subsequent patches provides an API allowing to find that value, but in general, it's hard to give an advice on the array size (for example, a buffer holding 8000 bytes may have either 1 segment, or multiple).

fzhinkin added 2 commits June 21, 2024 16:53
- got rid of duplicate checks
- started verifying segment's content for bulk reads
@fzhinkin fzhinkin requested a review from shanshin June 25, 2024 12:37
@fzhinkin fzhinkin merged commit 72fa4ee into develop Jun 26, 2024
1 check passed
@fzhinkin fzhinkin deleted the bulk-api-part-1 branch August 27, 2024 16:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants