MultipartKit V5 #100

ptoffy · 2024-10-07T18:52:36Z

No description provided.

adam-fowler

Some initial comments.

Sources/MultipartKit/MultipartParser+AsyncStream.swift

Sources/MultipartKit/MultipartParser.swift

Sources/MultipartKit/MultipartParser+AsyncStream.swift

Sources/MultipartKit/MultipartParser.swift

ptoffy · 2024-10-30T12:05:04Z

@adam-fowler Do you want to take another look and see if things make more sense now? I added some binary data (which even contains hex CRLF 😄) to the tests, so it should be able to parse anything now.
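To illustrate why binary data containing CRLF is a useful test case, here is a minimal sketch, not the parser's actual implementation. It assumes the standard multipart rule that a body ends at CRLF followed by `--` and the boundary, never at a bare CRLF. The helper `firstOffset(of:in:)` and the boundary value are hypothetical names chosen for this example.

```swift
/// Returns the offset of the first occurrence of `needle` in `haystack`, or nil.
/// Naive scan, kept simple for illustration (hypothetical helper).
func firstOffset(of needle: [UInt8], in haystack: [UInt8]) -> Int? {
    guard !needle.isEmpty, haystack.count >= needle.count else { return nil }
    for start in 0...(haystack.count - needle.count) {
        if haystack[start..<(start + needle.count)].elementsEqual(needle) {
            return start
        }
    }
    return nil
}

// Assumed boundary value, for illustration only.
let boundary = "abc123"
// A body is terminated by CRLF + "--" + boundary.
let delimiter = Array("\r\n--\(boundary)".utf8)

// Binary payload that itself contains CRLF (0x0D 0x0A): the PNG signature bytes.
let payload: [UInt8] = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]
let message = payload + delimiter + Array("--\r\n".utf8)

// Searching for the full delimiter skips the CRLF embedded in the payload.
let bodyEnd = firstOffset(of: delimiter, in: message)
```

Splitting on a bare CRLF would cut this payload after its fourth byte; matching the full delimiter keeps the body intact.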

ptoffy · 2024-11-18T20:10:28Z

@Joannis @simonjbeaumont @czechboy0 pulling you into this so you can take a look if you want

Package.swift

Sources/MultipartKit/FormDataDecoder/FormDataDecoder.SingleValueContainer.swift

Joannis · 2024-11-19T08:32:52Z

Sources/MultipartKit/FormDataDecoder/FormDataDecoder.swift

-    public func decode<D: Decodable>(_ decodable: D.Type, from data: String, boundary: String) throws -> D {
-        try decode(D.self, from: ByteBuffer(string: data), boundary: boundary)
+    public func decode<D: Decodable>(_ decodable: D.Type, from string: String, boundary: String) throws -> D {
+        try decode(D.self, from: Array(string.utf8), boundary: boundary)


This makes a copy of the string. Can't we use withContiguousStorageIfAvailable?

Mhh, I don't think that would work any better, because we need the Collection there. We can't pass in the raw bytes, and if we copy them into an array we're still making a copy at that point, right?

Since you're doing an .append on the parser you're right that you're already making a copy. Except right now you're making two copies of the same data, and each time you're also allocating space for that data.

No, I mean we can't pass the raw pointer to the decode method because it expects the collection of bytes, so this can't be done

try string.utf8.withContiguousStorageIfAvailable { bytes in
    try decode(D.self, from: bytes, boundary: boundary)
} ?? decode(D.self, from: Array(string.utf8), boundary: boundary)

And if we were to do something like

if let bytes = string.utf8.withContiguousStorageIfAvailable(Array.init) {
    try decode(D.self, from: bytes, boundary: boundary)
} else {
    try decode(D.self, from: Array(string.utf8), boundary: boundary)
}

we're still initialising an array with the raw bytes so I think this doesn't really save us a copy.
Unless you mean a different way of using withContiguousStorageIfAvailable

I would really prefer not copying unnecessarily in a library like this
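The trade-off in this thread can be sketched outside the library. The example below is only an illustration under stated assumptions: `sumBytes` is a hypothetical stand-in for any API that is generic over a collection of bytes, and `sumUTF8(of:)` is a hypothetical wrapper. It shows the closure-scoped fast path and a fallback that does not build an intermediate Array. Whether this saves a copy in practice depends on whether the callee stores the bytes, as discussed above.

```swift
/// Hypothetical stand-in for an API that is generic over any collection of bytes.
func sumBytes<Bytes: Collection>(_ bytes: Bytes) -> Int where Bytes.Element == UInt8 {
    bytes.reduce(0) { $0 + Int($1) }
}

/// Hypothetical wrapper that feeds a string's UTF-8 bytes to `sumBytes`.
func sumUTF8(of string: String) -> Int {
    // Fast path: borrow the string's contiguous UTF-8 storage. The buffer pointer
    // is only valid inside the closure, so the work must complete before it returns.
    if let sum = string.utf8.withContiguousStorageIfAvailable({ sumBytes($0) }) {
        return sum
    }
    // Fallback: String.UTF8View is itself a Collection of UInt8, so it can be
    // passed directly without first building an Array.
    return sumBytes(string.utf8)
}
```

Because the buffer pointer must not escape the closure, this pattern only helps when the callee consumes the bytes immediately rather than retaining them.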

Sources/MultipartKit/MultipartParser+parse.swift

ptoffy · 2025-01-01T18:59:56Z

I agree with the proposed changes, but I made nextCollatedPart() simply the next() method of the new sequence, which makes more sense to me.

0xTim

Nothing blocking on my end. Once other comments have been resolved we can look at integrating this into projects to see how it actually works

Sources/MultipartKit/MultipartSerializer.swift

Joannis

My main (only) issue is that this library makes a lot of copies out of every byte carrier (String, Array, etc.).

gwynne

Various nits

Sources/MultipartKit/FormDataDecoder/FormDataDecoder+SingleValueContainer.swift

Sources/MultipartKit/FormDataDecoder/FormDataDecoder+KeyedContainer.swift

Sources/MultipartKit/FormDataEncoder/FormDataEncoder+SingleValueContainer.swift

Sources/MultipartKit/StreamingMultipartParserAsyncSequence.swift

Joannis

We should still optimise the one copy, but I'm happy with this!

ptoffy · 2025-01-07T14:18:37Z

Cool - if @adam-fowler has no outstanding requests then this should be ready to go

Sources/MultipartKit/MultipartSerializer.swift

adam-fowler

> I agree with the proposed changes but I moved the nextCollatedPart() to simply be the next() method of the new sequence which makes more sense to me

While I can see the sense in this, I'm thinking about how it may be used. Multipart files can contain sections with different structures: some you might want to stream and some you might want collated. Given that, I think you should reinstate nextCollatedPart().

One other thing to consider. Nobody is ever going to want to stream headers, so collating them automatically makes sense.

I am trying this out with my Swift package registry example just now and will report back. For context, SwiftPM packages a release into a multipart file consisting of headers for the metadata, a metadata block, headers for the package zip, and then the zip file of the package. I don't want to stream the first three sections, but I do want to stream the last.
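As a rough illustration of what "collated" means here, the sketch below merges runs of consecutive body chunks into one. The `Section` enum is a hypothetical, simplified model written for this example and is not the library's actual type; the real parser works on an async stream rather than an array.

```swift
/// Hypothetical, simplified model of the sections a streaming parser emits.
enum Section: Equatable {
    case headerFields([String: String])
    case bodyChunk([UInt8])
    case boundary
}

/// Merges each run of consecutive body chunks into a single chunk.
func collated(_ sections: [Section]) -> [Section] {
    var result: [Section] = []
    for section in sections {
        if case .bodyChunk(let chunk) = section,
           case .bodyChunk(let previous)? = result.last {
            // Extend the chunk collected so far instead of emitting a new one.
            result[result.count - 1] = .bodyChunk(previous + chunk)
        } else {
            result.append(section)
        }
    }
    return result
}

// A small metadata part that arrived split across two chunks.
let streamed: [Section] = [
    .headerFields(["Content-Disposition": "form-data; name=\"metadata\""]),
    .bodyChunk(Array("{\"na".utf8)),
    .bodyChunk(Array("me\":1}".utf8)),
    .boundary,
]
let merged = collated(streamed)
```

Collating suits small parts such as metadata, while large parts such as the package archive are better consumed chunk by chunk.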

adam-fowler · 2025-01-08T11:38:39Z

Yeah, having to deal with streaming all the sections is painful. It would be great if I could do:

let multipartStream = StreamingMultipartParserAsyncSequence(...)
var iterator = multipartStream.makeAsyncIterator()  // `var` because next() mutates the iterator
while let part = try await iterator.next() {
    guard case .headerFields(let headers) = part else { continue }
    let name = getParameter(headers[.contentDisposition].first, parameter: "name")
    if name == "metadata" {
        // Metadata is small, so collate it into a single block.
        guard let metadata = try await iterator.nextCollatedPart() else {
            throw MultipartMessageError.unexpectedEndOfFile
        }
        parseMetadata(metadata)
    } else if name == "archive" {
        // The archive is large, so stream it to disk chunk by chunk.
        while case .bodyChunk(let buffer) = try await iterator.next() {
            streamArchivePartToDisk(buffer)
        }
    }
}

0xTim · 2025-01-08T12:29:36Z

I think that's a reasonable request for the API. If you're happy @adam-fowler I think we should merge as is and then add the improved API going forward?

ptoffy · 2025-01-08T12:54:20Z

@0xTim I think adding the change in this PR makes sense. I'll add it in this afternoon and then we can merge

ptoffy · 2025-01-08T14:44:05Z

@adam-fowler I updated the code to work with your example and I can confirm this works

let stream = makeParsingStream(for: message)
var iterator = StreamingMultipartParserAsyncSequence(boundary: boundary, buffer: stream).makeAsyncIterator()

var accumulatedBody: [UInt8] = []

while let part = try await iterator.next() {
    guard case .headerFields(let fields) = part else { continue }

    if fields.getParameter(.contentDisposition, "name") == "id" {
        guard let id = try await iterator.nextCollatedPart() else {
            throw MultipartMessageError.unexpectedEndOfFile
        }
        #expect(id == .bodyChunk(ArraySlice("123e4567-e89b-12d3-a456-426655440000".utf8)))
    } else {
        while case .bodyChunk(let chunk) = try await iterator.next() {
            accumulatedBody.append(contentsOf: chunk)
        }
    }
}

#expect(accumulatedBody == pngData)

I haven't added the test to the branch, though, because I'm a bit reluctant to make fields.getParameter() public. We can always do that later if needed. If you're happy with this, I'll merge now.
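For readers without access to the internal helper, a parameter lookup of this kind could look roughly like the sketch below. This is not the library's `getParameter` implementation; `parameter(named:in:)` and `trimmed(_:)` are hypothetical names, and the parsing is deliberately simplified (it does not handle semicolons inside quoted values or extended `filename*` parameters).

```swift
/// Trims leading and trailing whitespace without requiring Foundation.
func trimmed(_ text: Substring) -> Substring {
    var text = text
    while let first = text.first, first.isWhitespace { text.removeFirst() }
    while let last = text.last, last.isWhitespace { text.removeLast() }
    return text
}

/// Hypothetical helper: extracts one parameter from a header value such as
/// `form-data; name="id"; filename="a.png"`.
func parameter(named name: String, in headerValue: String) -> String? {
    // The first segment is the disposition type (e.g. "form-data"); parameters follow.
    for segment in headerValue.split(separator: ";").dropFirst() {
        let pair = segment.split(separator: "=", maxSplits: 1)
        guard pair.count == 2,
              trimmed(pair[0]).lowercased() == name.lowercased()
        else { continue }
        var value = trimmed(pair[1])
        // Strip one pair of surrounding double quotes, if present.
        if value.count >= 2, value.first == "\"", value.last == "\"" {
            value = value.dropFirst().dropLast()
        }
        return String(value)
    }
    return nil
}
```

Called with the header value `form-data; name="id"` and the name `name`, this sketch returns `id`, which is the check the test above performs before collating the part.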

Start making the parser async

fc69abc

ptoffy added the semver-major Breaking changes label Oct 7, 2024

ptoffy self-assigned this Oct 7, 2024

ptoffy added 5 commits October 21, 2024 10:28

Make header parsing work

00d854e

Add body parsing support

00bc9d4

Housekeeping

232c2a8

Add more complex example test

86414b7

Move error throwing up one level

9822834

adam-fowler reviewed Oct 23, 2024

Apply suggestions and add binary data test

122018f

ptoffy requested a review from adam-fowler October 30, 2024 12:02

ptoffy added 3 commits November 5, 2024 15:23

Add sync parsing and serialising

64a014c

Wip

d6c26c3

Make encoding work again

d9a056e

ptoffy force-pushed the v5 branch from ad06379 to d9a056e November 7, 2024 09:54

ptoffy added 2 commits November 7, 2024 10:55

Start generifying stuff

fee44de

Make encoders work with generics

e7c852a

ptoffy mentioned this pull request Nov 18, 2024

Compiler crash in Swift Testing swiftlang/swift#77674

Closed

Finish up en/decoding

93e27c9

ptoffy marked this pull request as ready for review November 18, 2024 15:54

ptoffy requested review from 0xTim and gwynne as code owners November 18, 2024 15:54

Remove NIO and add some docs

2e683dd

ptoffy requested review from czechboy0 and Joannis and removed request for adam-fowler November 18, 2024 20:06

Joannis reviewed Nov 19, 2024

Fix imports and rename some files

eabe133

ptoffy requested a review from adam-fowler January 1, 2025 18:58

Address feedback

69ab2c8

ptoffy requested a review from 0xTim January 1, 2025 21:17

0xTim approved these changes Jan 2, 2025

Joannis reviewed Jan 4, 2025

Sources/MultipartKit/MultipartSerializer.swift

Joannis requested changes Jan 4, 2025

Reduce temporary allocations

40621cf

ptoffy requested a review from Joannis January 6, 2025 18:48

gwynne requested changes Jan 6, 2025

ptoffy added 2 commits January 7, 2025 11:05

Fix oversight

460c1e6

Apply suggestions

19dd6fa

ptoffy requested a review from gwynne January 7, 2025 10:06

gwynne approved these changes Jan 7, 2025

Joannis approved these changes Jan 7, 2025

MahdiBM reviewed Jan 7, 2025

Sources/MultipartKit/MultipartSerializer.swift

ptoffy removed the semver-major Breaking changes label Jan 8, 2025

adam-fowler reviewed Jan 8, 2025

ptoffy added 2 commits January 8, 2025 15:39

Move collation parsing to the original sequence

746c3bd

Fix test

274abad

ptoffy merged commit beea78b into main Jan 8, 2025
10 of 11 checks passed

ptoffy deleted the v5 branch January 8, 2025 15:51
