Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix HttpPayloadParser dealing with chunked response (#4630) #4801

Closed
wants to merge 570 commits into from

Conversation

rhdxmr
Copy link
Contributor

@rhdxmr rhdxmr commented Jun 7, 2020

HttpPayloadParser waits for trailers indefinitely even if there are no trailers
at the response. This happens when only the last CRLF or the last LF are sent
via separate TCP segment.

When the connection is keep alive and if this bug occurs then users experience
response timeout. But this problem is not exposed when keep alive is disabled
because .feed_eof is called. (Instead of TimeoutError, ClientPayloadError is raised if keep alive is disabled)

What do these changes do?

Fix a bug that HttpPayloadParser waits for data indefintely that will never come.
The bug makes caller of 'await response.read()' awaits forever or for timeout.
There are a few conditions which need to be met in order to reproduce this bug.

  • keep alive enabled connection
  • response body size is large so that data is splitted into many TCP segments
  • Only the last CRLF or just LF is separated alone and is sent via the last TCP segment.
  • asyncio tcp protocol handler for the http response receives TCP payload of the last TCP segment. In order for this the last TCP payload should not be merged with the previous data at lower level such as TCP Segmentation Offload.
  • The response should not be the last response that the connection allows. The keep alive connection processes the limited number of requests, so if the response is the last thing, the connection is going to be closed by server so that .feed_eof method is called explicitly.

Are there changes in behavior for the user?

Improvement experience of users who are suffering from mysterous response timeout. There does not exist any log of which response time is slow in sever access log, but client writes log about response timeout.

Related issue number

#4630

I had a problem with intermittent response timeout so I conducted debug a lot.
And I managed to locate what caused the problem and fixed it on my own.
And then I was going to add test code that prove my modification really fix that thing. So I learned how to run test code in this project. And I ran test codes for the first time.
Surprisingly the issue 4630 just popped up with XPASS labeled. At this point I realized that the issue had been already reported 3 months ago. Sigh.. If the issue was fixed at that time, I would not spend my time for debugging this problem!

I did not mean to provide the fix on behalf of the person who reported the issue before me. I just did not know there was the issue before I started fixing the bug.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • [ ] Documentation reflects the changes => don't need that because this PR just fixes bug of HttpPayloadParser which has nothing to do with normal users.
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

@rhdxmr rhdxmr requested a review from asvetlov as a code owner June 7, 2020 11:05
@rhdxmr
Copy link
Contributor Author

rhdxmr commented Jun 7, 2020

I am going to do some extra works I found at the checklist. And then I will add that changes here.

@rhdxmr rhdxmr requested a review from webknjaz as a code owner June 8, 2020 14:50
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Jun 8, 2020
@codecov-commenter
Copy link

codecov-commenter commented Jun 8, 2020

Codecov Report

Merging #4801 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4801   +/-   ##
=======================================
  Coverage   97.60%   97.60%           
=======================================
  Files          43       43           
  Lines        8932     8938    +6     
  Branches     1406     1408    +2     
=======================================
+ Hits         8718     8724    +6     
  Misses         95       95           
  Partials      119      119           
Impacted Files Coverage Δ
aiohttp/http_parser.py 96.46% <100.00%> (+0.04%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 385b03e...3633eab. Read the comment docs.

@rhdxmr
Copy link
Contributor Author

rhdxmr commented Jun 9, 2020

Hello @asvetlov

Can you review my PR?

And I am wondering why CI fails at test_close of test_web_protocol.py
I am not sure this failure is related to my modification.

# end of stream
self.payload.feed_eof()
return True, chunk[2:]
if len(chunk) >= 2:
Copy link
Contributor

@socketpair socketpair Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much simplier:

head = chunk[:2]
if not head:
    return False, b''
if head == SEP[:1]:
    self._chunk_tail = head
    return False, b''
if head == SEP:
    # end of stream
    self.payload.feed_eof()
    return True, chunk[2:]
self._chunk = ChunkState.PARSE_TRAILERS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @socketpair

I agree with you. So I changed the code simpler as you suggested.
But I moved the `if head == SEP' at the top because it is more likely to happen.

Thanks,

If the last CRLF or only the LF are received via separate TCP segment,
HTTPPayloadParser misjudges that trailers should come after 0\r\n in the
chunked response body.

In this case, HttpPayloadParser starts waiting for trailers, but the only
remaining data to be received is CRLF. Thus, HttpPayloadParser waits trailers
indefinitely and this incurs TimeoutError in user code.

However, if the connection is keep alive disabled, this problem is not
reproduced because the server shutdown the connection explicitly after sending
all data. If the connection is closed .feed_eof is called and it helps
HttpPayloadParser finish its waiting.
@rhdxmr
Copy link
Contributor Author

rhdxmr commented Jun 15, 2020

Hello @webknjaz,
I am wondering that this PR is acceptable or not.
Can you leave a comment about that?

@rhdxmr
Copy link
Contributor Author

rhdxmr commented Jun 29, 2020

Hello @asvetlov, @webknjaz and @socketpair

I think this PR is forgotten.
I hope this PR to be merged soon because I am waiting for new version of aiohttp with this patch applied.
I make use of aiohttp in production, but TimeoutError by this bug have kept occurring for several weeks.

Please review this PR positively.

Thanks,

@socketpair
Copy link
Contributor

@rhdxmr please rebase from master main branch, and re-push. I would merge... but there are failing checks.

@rhdxmr rhdxmr changed the base branch from master to 3.7 July 1, 2020 05:12
@rhdxmr rhdxmr closed this Jul 1, 2020
@rhdxmr rhdxmr deleted the master branch July 1, 2020 05:18
@rhdxmr
Copy link
Contributor Author

rhdxmr commented Jul 1, 2020

RESEND PR: #4846

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot:chronographer:provided There is a change note present in this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.