Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ByteBuffer: Read and write null terminated strings #1990

Merged

Conversation

fabianfett
Copy link
Member

Motivation:

Some network protocols use null terminated strings (e.g. Postgres).
NIO should provide helper functions for reading and writing them to
make sure the everyone gets the best performance possible.

Modifications:

  • Add a setNullTerminatedString(String, at: Int) -> Int method
  • Add a writeNullTerminatedString(String) -> Int method
  • Add a getNullTerminatedString(at: Int) -> String? method
  • Add a readNullTerminatedString() -> String? method

Result:

Easier for community to use null terminated strings in ByteBuffer.

@fabianfett fabianfett added the 🆕 semver/minor Adds new public API. label Nov 22, 2021
@fabianfett fabianfett requested review from weissi and Lukasa November 22, 2021 19:35
// Substring.UTF8View does not implement withContiguousStorageIfAvailable
// and therefore has no fast access to the backing storage.
return string.withCString { ptr in
let urbp = UnsafeRawBufferPointer(start: .init(ptr), count: strlen(ptr) + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could use string.utf8.count instead of strlen(ptr) but not 100% we should rely on this...

Copy link
Member Author

@fabianfett fabianfett Nov 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had another look at our options.

        return string.utf8CString.withUnsafeBytes {
            return self.setBytes($0, at: self.writerIndex)
        }

This looks much better.

@fabianfett fabianfett force-pushed the ff-read-and-write-null-terminated-strings branch 2 times, most recently from f6d2aec to 417892e Compare November 22, 2021 20:04
@fabianfett
Copy link
Member Author

We will need other naming for this, since we would likely break the Vapor ecosystem otherwise:
https://github.com/vapor/core/blob/main/Sources/Bits/ByteBuffer+string.swift#L6

/// - index: The index for the first serialized byte.
/// - returns: The number of bytes written.
public mutating func setNullTerminatedString(_ string: String, at index: Int) -> Int {
return string.utf8CString.withUnsafeBytes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This triggers an unnecessary allocation. Given that we're going to use withUnsafeBytes anyway, withCString followed by type-erasing the pointer to UnsafeMutableRawPointer would be a better pair of function calls.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa did you check that this allocates? Because the idea with Swift 5(?)'s (then) new String was that we always store it 0 terminated so this can be free. So I'd expect that the utf8CString view is the same as the utf8 view but with the extra 0 byte visible.

Happy to be wrong but if I'm wrong we should file this, no need for Swift to alloc here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utf8CString returns ContiguousArray<CInt>. This does not match any of the internal views of the String, and so it allocates.

Copy link
Member

@weissi weissi Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG, thanks, TIL this returns a ContiguousArray!?. I think this is still worthy a Swift bug (CC @milseman) to deprecate this and replace it by a view. IMHO, properties should be O(1) and I don't see why we can't make a view here much like UTF8View that we can get a contiguous pointer on.

Copy link

@milseman milseman Nov 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not from Swift, it's from Foundation. There is a different entry point called by C pointer bridging, which did have an extra allocation inside it, not sure if that's still the case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Lukasa ah yes, that's the one. I got it confused with utf8String, which likely has a signature you'd prefer. utf8CString is an old API and I'm ok with deprecating it in favor of something else. What would that type be, given you want separate ownership?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @weissi proposed having a view type (like UTF8View that was null-terminated). To allow separate ownership we probably need the view to be wide, so that it can store a smol string with the null-terminator. That should allow zero allocations to produce.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We model shared ownership via closures today, ala withCString which will only allocate for non-native strings. Native strings are tail-allocated. When we get move only types, we can have move-only shared views and partially-escape closure hell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does UTF8View fit into that model?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UTFView doesn't unconditionally provide access to contiguous UTF-8 encoded bytes in memory. You can conditionally get those if available via wCSIA

return nil
}

return self.withUnsafeReadableBytes { pointer in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the unsafe code here: this function could easily be written on ByteBufferView instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remembered that iterating the byteBufferView was slow, but apparently @PeterAdams-A fixed this: #1394. Will change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we control BBV, and can make it fast if we find that it's slow. :D

/// Read a null terminated string off this `ByteBuffer`, decoding it as `String` using the UTF-8 encoding. Move the reader index
/// forward by the string's length and its null terminator.
///
/// - returns: A `String` value deserialized from this `ByteBuffer` or `nil` if there isn't a null terminated string readable.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be clearer: it would be better to say that this returns nil if there is not a complete null-terminated string, including null-terminator, in the readable bytes of the buffer.

Copy link
Member

@dnadoba dnadoba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice addition!

@0xTim
Copy link
Contributor

0xTim commented Nov 23, 2021

We will need other naming for this, since we would likely break the Vapor ecosystem otherwise:
https://github.com/vapor/core/blob/main/Sources/Bits/ByteBuffer+string.swift#L6

FYI that's a Vapor 3 package that relies on NIO 1 so no issue with this PR

@fabianfett fabianfett force-pushed the ff-read-and-write-null-terminated-strings branch 2 times, most recently from f4f97d9 to 6080080 Compare November 23, 2021 12:45
fabianfett added a commit to vapor/postgres-nio that referenced this pull request Nov 23, 2021
### Motivation

Because of https://bugs.swift.org/browse/SR-15517, we might run into naming conflicts with SwiftNIO, once apple/swift-nio#1990 lands.

### Changes

- Prefix all ByteBuffer utility methods

### Result

Chances of breaking code reduced.
@fabianfett fabianfett requested a review from Lukasa November 23, 2021 14:39
/// - returns: The number of bytes written.
public mutating func setNullTerminatedString(_ string: String, at index: Int) -> Int {
let length = self.setString(string, at: index)
self.setInteger(UInt8(0), at: index + length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be unchecked.

public mutating func setNullTerminatedString(_ string: String, at index: Int) -> Int {
let length = self.setString(string, at: index)
self.setInteger(UInt8(0), at: index + length)
return length + 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be unchecked too.

}

private func getNullTerminatedStringLength(at index: Int) -> Int? {
guard self.readerIndex <= index, index < self.readerIndex + self.readableBytes else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer && to , when there are no conditional bindings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

send.readerIndex + self.readableBytes is just writerIndex.

return nil
}

guard let endIndex = self.readableBytesView[index...].firstIndex(where: { $0 == 0 }) else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mild nit but we don't need to stress the optimiser with the closure: we can just do firstIndex(of: 0).

guard let endIndex = self.readableBytesView[index...].firstIndex(where: { $0 == 0 }) else {
return nil
}
return endIndex - index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be unchecked.

@fabianfett fabianfett force-pushed the ff-read-and-write-null-terminated-strings branch from 6080080 to c452287 Compare November 24, 2021 10:47
@Lukasa Lukasa merged commit 926fe9d into apple:main Nov 24, 2021
@fabianfett fabianfett deleted the ff-read-and-write-null-terminated-strings branch November 24, 2021 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🆕 semver/minor Adds new public API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants