Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #11310 - multipart parser dropping some relevant CR bytes in parts #11409

Merged

Conversation

joakime
Copy link
Contributor

@joakime joakime commented Feb 14, 2024

Fixing the MultiPart.Parser handling of part content that would drop occasional CR bytes if the CR occurs at the border of a network buffer.

This was fixed in PR #11388 (as the bug was first isolated during testing in that PR) but this is more fundamental bug unrelated to the goals of that PR so I created a standalone PR for this bug (also so that it shows up in the changelog for 12.0.7 appropriately)

Fixes #11310

Copy link
Contributor

@gregw gregw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lachlan-roberts can you look at this?

sbordet
sbordet previously approved these changes Feb 15, 2024
@gregw
Copy link
Contributor

gregw commented Feb 15, 2024

I've looked a little bit more at this and I think we need to never emit a content chunk that ends with a CR, as that may be the first byte of a CRLF preceding a boundary.

We will still need the if (sliceLimit > 0 && buffer.get(sliceLimit - 1) == '\r') code to look for CRs within a buffer, but when sliceLimit==0, then the crContent will have already been set.

I think we can better utilize the existing crContent boolean to remember if there is a hanging CR (skipped as the last byte of a previous chunk), then when we emit the next content,( including a partially matched boundary), we emit the CR content first if the boolean is set.

@sbordet
Copy link
Contributor

sbordet commented Feb 15, 2024

@gregw

we need to never emit a content chunk that ends with a CR, as that may be the first byte of a CRLF preceding a boundary

Not sure what you mean here, but we do emit the content chunk excluding the last CR, which we remember in the boolean until we have another chunk to parse.

Not sure I understand the rest of your comment, but I found and fixed another bug.

@sbordet sbordet requested a review from gregw February 15, 2024 18:26
…he boundary.

Added and fixed related tests.

Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@gregw
Copy link
Contributor

gregw commented Feb 16, 2024

I think I have convinced myself this is correct.... but I also think it needs either some better comments, and/or perhaps some simplifications.
I'll have a fiddle this afternoon and make some suggestions.

@gregw
Copy link
Contributor

gregw commented Feb 16, 2024

@joakime @sbordet I have pushed to this branch some simplifications to the logic and some comments that make it easier for me to understand. If you think they are at all controversial, let's revert and hangout to clarify.

@sbordet
Copy link
Contributor

sbordet commented Feb 16, 2024

Sorry @gregw, I liked the code better before your changes, as it was clearer what cases did what.
The new code is more difficult to follow for me because of fall-throughs that now force you do re-read the code above to figure out what cases are covered. I liked the if/else blocks better because you knew exactly what case it was.

Also, most of your added comments are just repetitions of what the code does on the next line, so they are unnecessary, and they will rotten away if the code changes.

Lastly, can you please be consistent and start all your comment with a capital letter and end with a period? (Sorry to be pedantic on this, but it's really confusing -- also the "conversation" style of the comments make you re-read from above otherwise the comment is out of context).
Reading many of your comments is unclear if there is a line before that was forgotten (because it starts with a lowercase), or a line after that was forgotten (because it does not end with a period).
Because in other cases you do use a capital to start and a period to end, so one wonders what's missing in the ones that are not like that.

@gregw
Copy link
Contributor

gregw commented Feb 16, 2024

@sbordet if it is just the fall throughs that you don't like, then let me put the if elses back, as I think there were some good simplifications even without that.
I'll review the comments as well, for needless repetition of the code.

If you don't like that, then I'll revert both commits and give a negative review on the bits that I found very confusing and let you have a go of either improving them or better comments.

standby...

@gregw gregw force-pushed the fix/12.0.x/multipart-parser-missing-part-CR-bytes branch from b0cd639 to 781105f Compare February 16, 2024 17:53
Copy link
Contributor

@gregw gregw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic needs cleanup and better comments

@@ -1339,7 +1339,8 @@ private boolean parseContent(Content.Chunk chunk)
}
else
{
// The boundary was partially matched.
// The boundary was partially matched, but it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is confusing with nested if structure:

if (boudnaryMatch > 0)
{
    if (boundaryMath == X)
    {
        // A
    }
    else
    {
        // B
    }
}
else
{
    // C
}

This should be:

if (boudnaryMatch > 0)
{
    // A
} 
else if (boundaryMath == X)
{
    // B
}
else
{
    // C
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not equivalent.
In your version, when boundaryMatch == X it will execute the branch where boundaryMatch > 0, which is wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops I meant:

if (boudnaryMatch == X)
{
    // A
} 
else if (boundaryMath > 0)
{
    // B
}
else
{
    // C
}

@joakime joakime requested a review from sbordet February 21, 2024 17:05
Signed-off-by: Simone Bordet <simone.bordet@gmail.com>
@sbordet sbordet requested review from gregw and removed request for sbordet February 21, 2024 18:25
Copy link
Contributor

@gregw gregw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think the logic can be cleaned up more in this class.... but let's get the CR fix in.

@sbordet sbordet merged commit 2d51170 into jetty-12.0.x Feb 21, 2024
8 checks passed
@sbordet sbordet deleted the fix/12.0.x/multipart-parser-missing-part-CR-bytes branch February 21, 2024 20:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug For general bugs on Jetty side
Projects
No open projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

Uploading big multipart files via jetty 12.0.5 with spring boot 3.2.1 cause problems
4 participants