Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When reading literal_data from mail body from IMAP, use byte count instead of string-length count. #1217

Closed
wants to merge 19 commits into from

Conversation

indridieinarsson
Copy link
Contributor

Pullrequest

This is a first attempt to fix issue #1119.
I'm not very proficient in php, and not knowledgeable about imap, so for the love of all that is good and holy, review and test this properly.

Take a look at this chunk from hm-imap-base.php:

elseif ($line[$i] == '{') {
$end = mb_strpos($line, '}');
if ($end !== false) {
$literal_size = mb_substr($line, ($i + 1), ($end - $i - 1));
}
$lit_result = $this->read_literal($literal_size, $max, $current_size, $line_length);
$chunk = $lit_result[0];

If I understand this writing correctly: https://www.rfc-editor.org/rfc/rfc3501#section-4.3, the number in curly braces that is being read into $literal_size, is the number of bytes in the email body, which is then sent over the wire right away.
Therefore, all string-size calculations in read_literal() should use the strsize rather than mb_strsize command. My theory is that the issue #1119 is caused by this: the number of bytes read are underestimated since mb_strsize is smaller than strsize for real multibyte strings. We then end up trying to read strings from the server after the server has sent the entire message, causing the code to hang.

Issues

Checklist

This touches the code that reads the email body, pretty central for an email client. This PR will change behaviour in cases when the body of the message contains multibyte characters (ÞþæÆöÖðÐáÁóÓ etc.).

Copy link
Member

@kambereBr kambereBr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Fix the pipelines, and we’ll merge. We'll keep an eye on the changes. Thanks!

@indridieinarsson
Copy link
Contributor Author

@kambereBr : I can't really figure out how to get the Selenium test to pass. Can't see that my code should have change what the test is checking. Is the test somehow broken? Or has a previous commit inadvertently broken it?

@indridieinarsson
Copy link
Contributor Author

I installed the version in my feature-branch, with bunch of emails and 3 accounts. With the exception of an already reported issue, everything works as expected. In particular, the Selenium test that is failing is not reproducible in the browser.

@indridieinarsson
Copy link
Contributor Author

Ok, I did some searching around. Seems the test was failing due to a race condition.

  • We find an element and check its text.
  • Then we refresh the message list, which also reloads the page, and in that process, the element we were looking for is nuked and rebuild.
  • Then we go on to the next line to find the element again, but the reference is still pointing to the old element at this point, since the refresh is not yet finished, so we still get a reference to the old element.
  • the old element is stale, and we get an error
    I added some code that waits for the element to be available and not stale. If the element is not available after a timeout of 10s, it should throw an error.

Someone might want to re-review this PR, as I fiddled with the tests.

@marclaporte marclaporte requested a review from kroky September 6, 2024 12:25
@marclaporte
Copy link
Member

marclaporte commented Sep 6, 2024

This touches the code that reads the email body, pretty central for an email client. This PR will change behaviour in cases when the body of the message contains multibyte characters (ÞþæÆöÖðÐáÁóÓ etc.).

Well, I think it reverts some of the recent changes so it restores code to what is was for a long time.

Here are recent modification on this file:
https://github.com/cypht-org/cypht/commits/master/modules/imap/hm-imap-base.php

@marclaporte
Copy link
Member

Here is the change: #1051

Could some of the other changes have unintendended negative consequences?

@kambereBr
Copy link
Member

Here is the change: #1051

Could some of the other changes have unintendended negative consequences?

Probably yes, but it's hard to say for sure. We'll only know when we encounter issues in specific use cases.

@marclaporte
Copy link
Member

Ok, we'll stay alert :-) #1224

@marclaporte marclaporte added the blocker Needs to be addressed before we can release the next version label Sep 8, 2024
@kroky
Copy link
Member

kroky commented Sep 9, 2024

@indridieinarsson thanks for the PR but since we have more problems than the mentioned issue with literal string lengths, I combined your code with other fixes and some unit tests to catch this issue in the future here: #1230
Note for the discussion about using 8-bit string functions vs multibyte ones: we prefer and should use the multibyte functions everywhere we deal with strings in Cypht. The only exception seems to be here in imap module where literal lengths is specified in bytes rather than characters. It is actually somewhat strange as the RFC mentions only 7-bit encoding or 8-bit ones but maybe some mail servers still use unencoded multibytes in those literals. Thus, we make sure that read_literal works on bytes and doesn't check individual bytes for valid character values.
The other problem is unfinished changes from the multibyte conversion where accessing a string via array offset is not actually working for multibyte strings - we should use mb_substr in those cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker Needs to be addressed before we can release the next version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FastCGI timeout (Not docker related)
4 participants