When reading literal_data from mail body from IMAP, use byte count instead of string-length count. #1217

indridieinarsson · 2024-09-01T00:19:56Z

Pull request

This is a first attempt to fix issue #1119.
I'm not very proficient in PHP, and not knowledgeable about IMAP, so for the love of all that is good and holy, review and test this properly.

Take a look at this chunk from hm-imap-base.php:

    elseif ($line[$i] == '{') {
        $end = mb_strpos($line, '}');
        if ($end !== false) {
            $literal_size = mb_substr($line, ($i + 1), ($end - $i - 1));
        }
        $lit_result = $this->read_literal($literal_size, $max, $current_size, $line_length);
        $chunk = $lit_result[0];

If I understand RFC 3501 correctly (https://www.rfc-editor.org/rfc/rfc3501#section-4.3), the number in curly braces that is read into $literal_size is the number of bytes (octets) in the literal, here the email body, which the server then sends over the wire right away.
Therefore, all string-length calculations in read_literal() should use strlen rather than mb_strlen. My theory is that issue #1119 is caused by this: the number of bytes read so far is underestimated, since mb_strlen returns a smaller value than strlen for strings that really contain multibyte characters. We then end up trying to read more data after the server has already sent the entire message, causing the code to hang.
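
To make the mismatch concrete, here is a minimal sketch in Python rather than Cypht's PHP. The helpers parse_literal_size and read_literal below are hypothetical illustrations, not the functions in hm-imap-base.php, and the FETCH line is a made-up example. The sketch parses the {N} octet count from a response line, reads exactly N bytes from the stream, and then compares the byte count with the character count of the same data.

```python
import io
import re


def parse_literal_size(line: bytes) -> int:
    """Return the octet count from an IMAP literal marker such as b'... {12}\r\n'."""
    match = re.search(rb"\{(\d+)\}\r?\n?$", line)
    if match is None:
        raise ValueError("no literal marker in line")
    return int(match.group(1))


def read_literal(stream, size: int, chunk_size: int = 4) -> bytes:
    """Read exactly `size` bytes, tracking progress in bytes, never in characters."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError("stream ended before the literal was complete")
        chunks.append(chunk)
        remaining -= len(chunk)  # len() of a bytes object is a byte count
    return b"".join(chunks)


# Hypothetical message body: six two-byte letters plus one space.
body = "Þþæ ÖðÐ".encode("utf-8")
line = b"* 1 FETCH (BODY[1] {%d}\r\n" % len(body)
stream = io.BytesIO(body + b")\r\n")

size = parse_literal_size(line)
literal = read_literal(stream, size)

char_count = len(literal.decode("utf-8"))  # what a multibyte-aware length reports
byte_count = len(literal)                  # what the {N} marker counts
```

If progress were tracked with the character count instead, the reader would believe that bytes are still outstanding after the whole literal has arrived and would keep waiting for data the server never sends, which is consistent with the hang described above.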

Issues

Might possibly fix FastCGI timeout (Not docker related) #1119

Checklist

This touches the code that reads the email body, pretty central for an email client. This PR will change behaviour in cases when the body of the message contains multibyte characters (ÞþæÆöÖðÐáÁóÓ etc.).

kambereBr

Looks good! Fix the pipelines, and we’ll merge. We'll keep an eye on the changes. Thanks!

indridieinarsson · 2024-09-05T23:38:21Z

@kambereBr: I can't really figure out how to get the Selenium test to pass. I can't see how my code would have changed what the test is checking. Is the test somehow broken? Or has a previous commit inadvertently broken it?

indridieinarsson · 2024-09-06T10:38:30Z

I installed the version from my feature branch, with a bunch of emails and 3 accounts. With the exception of an already reported issue, everything works as expected. In particular, the failure in the Selenium test is not reproducible in the browser.

indridieinarsson · 2024-09-06T12:04:46Z

Ok, I did some searching around. Seems the test was failing due to a race condition.

1. We find an element and check its text.
2. Then we refresh the message list, which also reloads the page, and in that process the element we were looking for is nuked and rebuilt.
3. Then we go on to the next line to find the element again, but since the refresh is not yet finished, we still get a reference to the old element.
4. The old element is stale, and we get an error.

I added some code that waits for the element to be available and not stale. If the element is not available after a timeout of 10s, it should throw an error.
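
The waiting logic can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code committed to the test suite: wait_for_fresh and StaleElementError are hypothetical names, and StaleElementError stands in for Selenium's selenium.common.exceptions.StaleElementReferenceException so the sketch runs without a browser. The idea is to repeat the whole lookup (find the element, then read it) until it succeeds or the timeout expires.

```python
import time


class StaleElementError(Exception):
    """Stand-in for selenium.common.exceptions.StaleElementReferenceException."""


def wait_for_fresh(lookup, timeout=10.0, poll=0.2, retry_on=(StaleElementError,)):
    """Call `lookup` until it returns without a retryable error or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return lookup()
        except retry_on:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"element not available after {timeout}s")
            time.sleep(poll)


# Simulated page: the element is stale for the first two polls, then usable.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise StaleElementError("element was rebuilt during reload")
    return "Everything"


text = wait_for_fresh(fake_lookup, timeout=2.0, poll=0.01)
```

In a real test, the lookup callable would re-find the element through the driver and read its text in the same call, so that a reference obtained before the reload is never reused.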

Someone might want to re-review this PR, as I fiddled with the tests.

marclaporte · 2024-09-06T13:14:54Z

> This touches the code that reads the email body, pretty central for an email client. This PR will change behaviour in cases when the body of the message contains multibyte characters (ÞþæÆöÖðÐáÁóÓ etc.).

Well, I think it reverts some of the recent changes, so it restores the code to what it was for a long time.

Here are the recent modifications to this file:
https://github.com/cypht-org/cypht/commits/master/modules/imap/hm-imap-base.php

marclaporte · 2024-09-06T14:06:45Z

Here is the change: #1051

Could some of the other changes have unintended negative consequences?

kambereBr · 2024-09-06T14:32:28Z

> Here is the change: #1051
>
> Could some of the other changes have unintended negative consequences?

Probably yes, but it's hard to say for sure. We'll only know when we encounter issues in specific use cases.

marclaporte · 2024-09-06T15:37:19Z

Ok, we'll stay alert :-) #1224

kroky · 2024-09-09T13:08:33Z

@indridieinarsson thanks for the PR but since we have more problems than the mentioned issue with literal string lengths, I combined your code with other fixes and some unit tests to catch this issue in the future here: #1230
Note for the discussion about using 8-bit string functions vs. multibyte ones: we prefer and should use the multibyte functions everywhere we deal with strings in Cypht. The only exception seems to be here in the imap module, where literal lengths are specified in bytes rather than characters. This is actually somewhat strange, as the RFC mentions only 7-bit or 8-bit encodings, but maybe some mail servers still send unencoded multibyte characters in those literals. Thus, we make sure that read_literal works on bytes and doesn't check individual bytes for valid character values.
The other problem is unfinished changes from the multibyte conversion, where accessing a string via array offset does not actually work for multibyte strings; we should use mb_substr in those cases.
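
The offset problem can be illustrated with a small sketch, again in Python as an analog rather than in Cypht's PHP. In PHP, an array offset such as $line[$i] addresses a byte, while mb_strpos returns a position counted in characters, so the two diverge as soon as a multibyte character precedes the position. The sample line below is made up for illustration.

```python
line = "Þórður {5}"          # hypothetical response fragment with multibyte letters
raw = line.encode("utf-8")   # the same data as the bytes received on the wire

char_offset = line.index("{")  # counted in characters
byte_offset = raw.index(b"{")  # counted in bytes

# Indexing the byte string with the character offset lands on the wrong byte.
wrong = raw[char_offset:char_offset + 1]
right = raw[byte_offset:byte_offset + 1]
```

Here the character offset is 7 and the byte offset is 10, so mixing the two reads the letter "u" where the opening brace was expected. Using one family of functions consistently for a given string avoids this.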

Indriði Einarsson added 2 commits August 31, 2024 23:51

Use single-byte string-length to measure length of literal data

e696471

remove commented-out code

6768bcc

marclaporte requested a review from kambereBr September 1, 2024 04:11

kambereBr approved these changes Sep 5, 2024

indridieinarsson added 4 commits September 5, 2024 11:14

Remove comment.

9862b5a

Merge branch 'cypht-org:master' into issue_1119

fd9a046

Wait_with_folder_list when reloading folders. We're reloading the page, might have to wait.

bd8b85c

Undo changes to reload_folder_test.

342ddcb

indridieinarsson added 2 commits September 6, 2024 00:08

Implicitly wait 10 seconds after reloading all messages

2de5e36

Undo changes to selenium test. Fix wasn't working, no idea why.

f21011c

indridieinarsson added 5 commits September 6, 2024 10:48

Try to kick the main_menu reference to update.

5152540

stupid bug

e464633

Fancy selenium-gymnastics to wait for element re-appearance after page reload

00e1991

Even more selenium gymnastics.

4eb0cc8

forgot to import exception types

7050f93

indridieinarsson requested a review from kambereBr September 6, 2024 12:04

Remove commented-out code.

dd66cfe

marclaporte requested a review from kroky September 6, 2024 12:25

indridieinarsson added 2 commits September 6, 2024 12:57

messing around with tests

a88e162

Update folder_list.py

f7eb260

indridieinarsson added 3 commits September 6, 2024 13:21

mess with tests

2e34d01

Comment on test changes.

c5db6ad

cleanup test, and also trigger re-test.

1407bbe

marclaporte mentioned this pull request Sep 6, 2024

Unicode Support: Replace Standard PHP String Functions with Multibyte Counterparts #1051

Merged

marclaporte mentioned this pull request Sep 6, 2024

Stay alert to other potential regressions which arrived in Cypht 2.2.0 related to Unicode Support (Replace Standard PHP String Functions with Multibyte Counterparts) #1224

Closed

marclaporte added the "blocker" label (Needs to be addressed before we can release the next version) Sep 8, 2024

kroky closed this Sep 9, 2024

This was referenced Sep 9, 2024

[FIX] Fix multibyte string handling for accurate byte and character operations #1229

Closed

fix multibyte handling in imap literals and address splitting, improve unit tests #1230

Merged
