Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multipart emails with base64 encoded plaintext and html parts causing recursive thread locking. #130

Closed
outreach-soren opened this issue Sep 7, 2018 · 13 comments · Fixed by sisimai/p5-sisimai#293

Comments

@outreach-soren
Copy link

outreach-soren commented Sep 7, 2018

Hey @azumakuniyuki, I'm attempting to debug an issue that we encountered here at Outreach a little while ago, where a multipart/report message with an immediately nested multipart/alternative block, with base64 encoded text/plain and text/html sub-sections, caused Sisimai to recursively lock threads in the vm that was trying to process the message.

I've been digging into the cause of this issue and I have a few questions for you.

  1. Is sisimai supposed to be able to handle multipart messages with separately encoded base64 sections?
  2. At the end of this block in the rfc3464.scan method, it seems like the current line of the body that is being analyzed is unconditionally appended to the diagnosis. This causes the diagnosis for the final message instance that is created to be an enourmous block of base64 encoded text, and seems to be what causes the thread locking when we later call Sisimai::Data.make(data: the_created_message, hook: nil, input: 'email'). What is the intention behind appending every line of the body to the diagnosis property? Is this necessary?

Insight would be greatly appreciated.

Thanks

@azumakuniyuki
Copy link
Member

Hello @outreach-soren, Thanks for the bug report.

  1. Is sisimai supposed to be able to handle multipart messages with separately encoded base64 sections?

Yes, Sisimai::MIME.mimedocode (called from Sisimai::Message::Email) support both Base64 and Quoted-Printable encodings.

  1. At the end of this block in the rfc3464.scan method, ...

MIME encoded message body is decoded to plain text before calling each scan method in Sisimai::Bite::* and Sisimai::RFC3464 inside of Sisimai::Message::Email.parse method. However, the value of diagnosis is encoded text, it may be a bug of Sisimai::MIME class.

And then, "diagnosis" property is for keeping an error message of the email bounce and is used for deciding the value of "reason" property inside of Sisimai::Data.make method. If "diagnosis" property keeps the entire message body, it is a bug of Sisimai::MIME class.

Would you post the entire email message including headers to this issue ? (or send us)

Thanks

@outreach-soren
Copy link
Author

outreach-soren commented Sep 10, 2018

Hey @azumakuniyuki, thanks for the quick reply.

  1. Ahh I see that now. MIME.decode looks like it might parse my problem message correctly if the parts were already split out, but the error I'm describing is happening in the Sisimai::Message::Email.parse call on line 84 of Sisimai::Message::Email, which doesn't utilize MIME.decode and seems to utilize a less robust decoding method that doesn't account for multipart messages (line 396 - 434). Thus the bodystring that's getting passed to the scanners in the block at line 469 is still just the body content with all it's separate parts still encoded.

  2. That makes sense. I think in this particular case the message body contains much more than just a simple message about why the original email bounced. Even if it was decoded, the diagnosis would end up being a large block of plain text and / or html. I get the intention behind it though.

I can't post the original message that caused the issue for confidentiality reasons but I will post the body structure that I believe is causing the problem.

@outreach-soren
Copy link
Author

Here's the entire body with the content of the actual parts omitted. If there are other aspects that would be helpful to see then let me know and I can post more.

Content-Type: multipart/report;
boundary=\"_000_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_\";
report-type=delivery-status
MIME-Version: 1.0

--_000_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_
Content-Type: multipart/alternative;
  boundary=\"_002_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_\"

--_002_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_
Content-Type: text/plain; charset=\"utf-8\"
Content-Transfer-Encoding: base64

**base64 encoded text here**

--_002_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_
Content-Type: text/html; charset=\"utf-8\"
Content-ID: <028D1740CD4784428989516A3AD5753B@namprd12.prod.outlook.com>
Content-Transfer-Encoding: base64

**base64 encoded html here**

--_002_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_--

--_000_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_
Content-Type: message/delivery-status

**non-encoded delivery-status content**

--_000_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_
Content-Type: message/rfc822
Content-Disposition: attachment;
creation-date=\"Wed, 08 Aug 2018 16:49:10 GMT\";
modification-date=\"Wed, 08 Aug 2018 16:49:10 GMT\"
Content-ID: <A8EA60C60DD315458AC891C7D35CDDD6@namprd12.prod.outlook.com>

**attached mime-content from original message with its own encoded multipart body**

--_000_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_--

@azumakuniyuki
Copy link
Member

Hello, I've created a sample email based on your post: https://gist.github.com/azumakuniyuki/cd0f093d7b72f93a22dfe523464c7d04

Email headers are copied from https://github.com/sisimai/set-of-emails/blob/master/maildir/bsd/email-office365-08.eml, delivery status field is copied from https://github.com/sisimai/set-of-emails/blob/master/maildir/bsd/rfc3464-29.eml.

Is the sample email on Gist the same structure with the bounce mail you have ?

And then, would you tell me how to get a bounce mail which have the same structure ?

  • I have an outlook.com account and can send email to anywhere.
  • Where to send a email ? Gmail ? or specific recipient ? someone who has an email address ?
  • Any attachment needed ?

Thanks.

@outreach-soren
Copy link
Author

Hey @azumakuniyuki, thanks for that the structure is roughly the same but after experimenting a bit I've realized that the issue actually does not occur without the attached mailing.

To make things easier I've created this slimmed down version of the original problem email, which I've verified does reproduce the issue of the body not being decoded. This is directly based on the original message.

https://gist.github.com/outreach-soren/ed8aa7160c18071894dbc4874072cd63

To reproduce run

mesg = Sisimai::Message.new(data: sample_message_content, field: [], hook: nil)
# mesg.ds[0]["diagnosis"] => the still encoded body of the message

The thread locking does not occur with this case as the body is much smaller than the actual message, but the root issue of the body not being decoded correctly does occur.

@azumakuniyuki
Copy link
Member

@outreach-soren Hello, Thanks for the sample email ! I've got the following results which have no decoded base64 test in "diagnosticcode". I'll try to fix code to decode MIME encoded text.

    "diagnosticcode": "boundary=\\\"_002_b0fe4287ca4b491c8dbf382185911b06DM5PR10MB1513namprd10pr_\\\" aGVyZSBpcyBzb21lIHRleHQgdGhhdCB3aWxsIGJlIGVuY29kZWQgZm9yIHRoZSBtZXNzYWdlISBoZXJlIGlzIHNvbWUgdGV4dCB0aGF0IHdpbGwgYmUgZW5jb2RlZCBmb3IgdGhlIG1lc3NhZ2UhIGhlcmUgaXMgc29tZSB0ZXh0IHRoYXQgd2lsbCBiZSBlbmNvZGVkIGZvciB0aGUgbWVzc2FnZSEgaGVyZSBpcyBzb21lIHRleHQgdGhhdCB3aWxsIGJlIGVuY29kZWQgZm9yIHRoZSBtZXNzYWdlISBoZXJlIGlzIHNvbWUgdGV4dCB0aGF0IHdpbGwgYmUgZW5jb2RlZCBmb3IgdGhlIG1lc3NhZ2UhIGhlcmUgaXMgc29tZSB0ZXh0IHRoYXQgd2lsbCBiZSBlbmNvZGVkIGZvciB0aGUgbWVzc2FnZSEgaGVyZSBpcyBzb21lIHRleHQgdGhhdCB3aWxsIGJlIGVuY29kZWQgZm9yIHRoZSBtZXNzYWdlISBoZXJlIGlzIHNvbWUgdGV4dCB0aGF0IHdpbGwgYmUgZW5jb2RlZCBmb3IgdGhlIG1lc3NhZ2UhIGhlcmUgaXMgc29tZSB0ZXh0IHRoYXQgd2lsbCBiZSBlbmNvZGVkIGZvciB0aGUgbWVzc2FnZSEg Content-ID: <028D1740CD4784428989516A3AD5753B@namprd12.prod.outlook.com> PCFET0NUWVBFIGh0bWw+CjxodG1sPgo8aGVhZD4KICA8dGl0bGU+c29tZSBodG1sIGNvbnRlbnQ8L3RpdGxlPgo8L2hlYWQ+Cjxib2R5PgogIDxkaXY+CiAgICB0aGlzIGh0bWwgY29udGVudCB3aWxsIGJlIGJhc2U2NCBlbmNvZGVkCiAgPC9kaXY+CjwvYm9keT4KPC9odG1sPg==",

Thanks.

@outreach-soren
Copy link
Author

Great, that's the same result I get. Thank you so much for taking a look!

@outreach-soren
Copy link
Author

Hey @azumakuniyuki, I've continued debugging this for a while, and in the flow I am following at least, the Sisimai::MIME.mimedocode method you mentioned is not called on the body prior to being passed to the scanner block in Sisimai::Message::Email.parse. From debugging, it seems that the body that is passed to Sisimai::Message::Email.parse is taken directly from Sisimai::Message::Email.divideup which just splits the raw mime into the headers and the body, and does not do any decoding.

Have you made any progress on this issue?

I'd also be curious as to whether you have any insight on the above. My concern is that even emails that do not cause this particular issue may not be getting decoded correctly before they are scanned for bounce data.

@azumakuniyuki
Copy link
Member

Hello, I'm trying to implement a new method for decoding MIME encoded text (to flatten multipart) in Sisimai::MIME module, to rewrite code for calling Sisimai::MIME module in Sisimai::Message::Email.parse method, but it is not progressing because my main job is busy...

Debugging summary is the following:

  • Implement Sisimai::MIME.makeflat method
    • MIME decode for text/* and message/* only
    • Remove non-text part (image/* and so on)
    • Returns decoded text to a caller (Sisimai::Message::Email.parse)
  • Fix code to call Sisimai::MIME module in Sisimai::Message::Email.parse method

@outreach-soren
Copy link
Author

Hey @azumakuniyuki, thanks for the update. I'm glad you see the same issue that I see. It sounds like my concerns will be resolved by the changes you outline.

I appreciate you taking the time to work on it. If you have more detailed action items I would be happy to help out to get this issue resolved. Thanks!

@azumakuniyuki
Copy link
Member

@outreach-soren I'm sorry for the late reply :-( Debug is in 80% progress and we will be able to commit fixed code in a couple of days.

azumakuniyuki added a commit to sisimai/p5-sisimai that referenced this issue Oct 14, 2018
@azumakuniyuki azumakuniyuki reopened this Oct 14, 2018
@azumakuniyuki
Copy link
Member

@outreach-soren I've tagged v4.23.0p1 to fixed code. Would you try to v4.23.0p1 ?

Thanks.

azumakuniyuki added a commit to sisimai/set-of-emails that referenced this issue Oct 15, 2018
@outreach-soren
Copy link
Author

Hey @azumakuniyuki! Pulled and checked out the most recent version locally and it resolves the issue for the original message, as well as every other test case I've thrown at it. Thanks so much for resolving this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants