Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MultipartKit V5 #100

Merged
merged 29 commits into from
Jan 8, 2025
Merged

MultipartKit V5 #100

merged 29 commits into from
Jan 8, 2025

Conversation

ptoffy
Copy link
Member

@ptoffy ptoffy commented Oct 7, 2024

No description provided.

@ptoffy ptoffy added the semver-major Breaking changes label Oct 7, 2024
@ptoffy ptoffy self-assigned this Oct 7, 2024
Copy link
Contributor

@adam-fowler adam-fowler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments.

Sources/MultipartKit/MultipartParser+AsyncStream.swift Outdated Show resolved Hide resolved
Sources/MultipartKit/MultipartParser.swift Outdated Show resolved Hide resolved
Sources/MultipartKit/MultipartParser+AsyncStream.swift Outdated Show resolved Hide resolved
Sources/MultipartKit/MultipartParser.swift Show resolved Hide resolved
@ptoffy ptoffy requested a review from adam-fowler October 30, 2024 12:02
@ptoffy
Copy link
Member Author

ptoffy commented Oct 30, 2024

@adam-fowler Do you want to take a look at this again and see if stuff makes more sense now? I added in some binary data (which even contains hex-CRLF 😄) to the tests so it should be able to parse anything now

@ptoffy ptoffy marked this pull request as ready for review November 18, 2024 15:54
@ptoffy ptoffy requested review from 0xTim and gwynne as code owners November 18, 2024 15:54
@ptoffy ptoffy requested review from czechboy0 and Joannis and removed request for adam-fowler November 18, 2024 20:06
@ptoffy
Copy link
Member Author

ptoffy commented Nov 18, 2024

@Joannis @simonjbeaumont @czechboy0 pulling you into this so you can take a look if you want

Package.swift Show resolved Hide resolved
Package.swift Outdated Show resolved Hide resolved
Package.swift Show resolved Hide resolved
public func decode<D: Decodable>(_ decodable: D.Type, from data: String, boundary: String) throws -> D {
try decode(D.self, from: ByteBuffer(string: data), boundary: boundary)
public func decode<D: Decodable>(_ decodable: D.Type, from string: String, boundary: String) throws -> D {
try decode(D.self, from: Array(string.utf8), boundary: boundary)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a copy from string, can't we use withContiguousMemoryIfAvailable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mhh I don't think that would work any better because we need the Collection there. We can't pass in the raw bytes and if we're copying them to an array we're still making a copy at that point right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're doing an .append on the parser you're right that you're already making a copy. Except right now you're making two copies of the same data, and each time you're also allocating space for that data.

Copy link
Member Author

@ptoffy ptoffy Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean we can't pass the raw pointer to the decode method because it expects the collection of bytes, so this can't be done

try string.utf8.withContiguousStorageIfAvailable { bytes in
    decode(D.self, from: bytes, boundary: boundary)
} ?? decode(D.self, from: Array(string.utf8), boundary: boundary)

And if we were to do something like

if let bytes = string.utf8.withContiguousStorageIfAvailable(Array.init) {
    try decode(D.self, from: bytes, boundary: boundary)
} else {
    try decode(D.self, from: Array(string.utf8), boundary: boundary)
}

we're still initialising an array with the raw bytes so I think this doesn't really save us a copy.
Unless you mean a different way of using withContiguousStorageIfAvailable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really prefer not copying unnecessarily in a library like this

@ptoffy ptoffy requested a review from adam-fowler January 1, 2025 18:58
@ptoffy
Copy link
Member Author

ptoffy commented Jan 1, 2025

I agree with the proposed changes but I moved the nextCollatedPart() to simply be the next() method of the new sequence which makes more sense to me

@ptoffy ptoffy requested a review from 0xTim January 1, 2025 21:17
Copy link
Member

@0xTim 0xTim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing blocking on my end. Once other comments have been resolved we can look at integrating this into projects to see how it actually works

Copy link
Member

@Joannis Joannis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main (only) issue is that this library makes a lot of copies out of every byte carrier (String, Array etc)

@ptoffy ptoffy requested a review from Joannis January 6, 2025 18:48
Copy link
Member

@gwynne gwynne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various nits

@ptoffy ptoffy requested a review from gwynne January 7, 2025 10:06
Copy link
Member

@Joannis Joannis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should still optimise the one copy, but I'm happy with this!

@ptoffy
Copy link
Member Author

ptoffy commented Jan 7, 2025

Cool - if @adam-fowler has no outstanding requests then this should be ready to go

@ptoffy ptoffy removed the semver-major Breaking changes label Jan 8, 2025
Copy link
Contributor

@adam-fowler adam-fowler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the proposed changes but I moved the nextCollatedPart() to simply be the next() method of the new sequence which makes more sense to me

While I can see the sense in this, I'm thinking of how this may be used. Given multipart files can come with sections of different structure, some you might want to stream and some you might want collated I think you should re-instate the nextCollatedPart().

One other thing to consider. Nobody is ever going to want to stream headers, so collating them automatically makes sense.

I am trying this out with my swift package registry example just now. Will report back. To give you some context SwiftPM packages a file into a Multipart file consisting of headers for metadata, a metadata block, a headers for the package zip, and then the zip file of the package. I don't want to stream the first three sections but will want to stream the last.

@adam-fowler
Copy link
Contributor

Yeah by making me deal with streaming all the sections it is painful. Would be great if I could do

let multipartStream = StreamingMultipartParserAsyncSequence(...)
let iterator = multipartStream.makeAsyncIterator()
while let part = try await iterator.next() {
    guard case .headerFields(let headers) else { continue }
    // is metadata
    if getParameter(headers[.contentDisposition].first, parameter: "name") == "metadata" {
        guard let metadata = iterator.nextCollatedBlock() else { throw Error() }
        parseMetadata(metadata)
    }
    // is archive
    if getParameter(headers[.contentDisposition].first, parameter: "name") == "archive" {
        while case .bodyChunk(let buffer) = try await iterator.next() {
            streamArchivePartToDisk(buffer)
        }
    }
}

@0xTim
Copy link
Member

0xTim commented Jan 8, 2025

I think that's a reasonable request for the API. If you're happy @adam-fowler I think we should merge as is and then add the improved API going forward?

@ptoffy
Copy link
Member Author

ptoffy commented Jan 8, 2025

@0xTim I think adding the change in this PR makes sense. I'll add it in this afternoon and then we can merge

@ptoffy
Copy link
Member Author

ptoffy commented Jan 8, 2025

@adam-fowler I updated the code to work with your example and I can confirm this works

let stream = makeParsingStream(for: message)
var iterator = StreamingMultipartParserAsyncSequence(boundary: boundary, buffer: stream).makeAsyncIterator()

var accumulatedBody: [UInt8] = []

while let part = try await iterator.next() {
    guard case .headerFields(let fields) = part else { continue }

    if fields.getParameter(.contentDisposition, "name") == "id" {
        guard let id = try await iterator.nextCollatedPart() else {
            throw MultipartMessageError.unexpectedEndOfFile
        }
        #expect(id == .bodyChunk(ArraySlice("123e4567-e89b-12d3-a456-426655440000".utf8)))
    } else {
        while case .bodyChunk(let chunk) = try await iterator.next() {
            accumulatedBody.append(contentsOf: chunk)
        }
    }
}

#expect(accumulatedBody == pngData)

though I haven't added the test to the branch because I'm a bit reluctant in making fields.getParameter() public. We can always do that later if needed. If you're happy with this I'll merge now

@ptoffy ptoffy merged commit beea78b into main Jan 8, 2025
10 of 11 checks passed
@ptoffy ptoffy deleted the v5 branch January 8, 2025 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants