It does not correctly extract long filename. #109

Open
rudyeeee opened this issue Dec 12, 2018 · 5 comments

Comments

@rudyeeee

rudyeeee commented Dec 12, 2018

What I did:

If the filename is too long, it is split across several numbered parameters (filename*0*, filename*1*, ...).
But when I read (*enmime.Part).FileName, it only contains the last part of the filename.

Example:

To: diix109847@daum.net
From: =?UTF-8?B?7KCA64SQ?= <journal@comtrue.com>
Subject: 11111111111
Message-ID: <dda70b89-f9bd-575c-2db0-0addfa422f53@comtrue.com>
Date: Wed, 12 Dec 2018 17:05:46 +0900
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101
 Thunderbird/60.3.2
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------BA281A40060E21A764F8C2BD"
Content-Language: ko

This is a multi-part message in MIME format.
--------------BA281A40060E21A764F8C2BD
Content-Type: text/plain; charset=euc-kr; format=flowed
Content-Transfer-Encoding: 7bit

2222222222222222222


--------------BA281A40060E21A764F8C2BD
Content-Type: application/x-zip-compressed;
 name="=?UTF-8?B?6rCc7J247KCV67O0X+qwgTMw6rCcLnppcC56aXA=?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*0*=euc-kr''%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%2E%7A%69;
 filename*1*=%70%2E%7A%69%70

UEsDBBQAAAAIAGl1h02mqssRCgAAAA4AAAAMAAAAMjIyMjIyMjIudHh0MzQEAV0jMGUIAFBL
AQIUABQAAAAIAGl1h02mqssRCgAAAA4AAAAMACQAAAAAAAAAIAAAAAAAAAAyMjIyMjIyMi50
eHQKACAAAAAAAAEAGABpkKrA743UAcTHx6AekNQBxMfHoB6Q1AFQSwUGAAAAAAEAAQBeAAAA
NAAAAAAA
--------------BA281A40060E21A764F8C2BD--

In the example above, the filename is the concatenation of filename*0* and filename*1*, but only filename*1* is read.
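The reassembly described above follows the RFC 2231 rules for parameter value continuations: segments are concatenated in numeric index order, segments whose name ends in `*` are percent-encoded, and only the first one carries a `charset'language'` prefix. Below is a minimal sketch of that step, assuming the raw parameters have already been split into a map keyed by lowercase parameter name; `joinContinuations` is a hypothetical helper, not an enmime or `mime` package API. It deliberately returns the charset and the still-encoded bytes rather than a string, because charset conversion has to happen after joining: in the longer example further down, the two bytes `%C0%CE` of one Korean character are split between filename*0* and filename*1*.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// joinContinuations reassembles an RFC 2231 continued parameter such as
// filename*0*, filename*1*, ... from params (lowercase raw parameter names
// mapped to their values). It returns the charset declared by segment 0
// (empty if none) and the concatenated bytes, still in that charset.
// Hypothetical helper for illustration only.
func joinContinuations(params map[string]string, name string) (string, []byte, error) {
	var charset string
	var raw []byte
	n := 0
	// Walk indices numerically (0, 1, ..., 9, 10, ...) until the first gap.
	for ; ; n++ {
		key := name + "*" + strconv.Itoa(n)
		if v, ok := params[key+"*"]; ok {
			// Extended segment: percent-encoded; segment 0 has charset'language'.
			if n == 0 {
				parts := strings.SplitN(v, "'", 3)
				if len(parts) != 3 {
					return "", nil, errors.New("missing charset'language' prefix")
				}
				charset = strings.ToLower(parts[0])
				v = parts[2]
			}
			dec, err := url.PathUnescape(v)
			if err != nil {
				return "", nil, fmt.Errorf("segment %d: %w", n, err)
			}
			raw = append(raw, dec...)
		} else if v, ok := params[key]; ok {
			// Plain segment: used literally.
			raw = append(raw, v...)
		} else {
			break
		}
	}
	if n == 0 {
		return "", nil, errors.New("no continuation segments for " + name)
	}
	return charset, raw, nil
}

func main() {
	// The Content-Disposition parameters from the issue, already split apart.
	params := map[string]string{
		"filename*0*": "euc-kr''%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%2E%7A%69",
		"filename*1*": "%70%2E%7A%69%70",
	}
	charset, raw, err := joinContinuations(params, "filename")
	if err != nil {
		panic(err)
	}
	// Prints the charset followed by the joined bytes in hex.
	fmt.Printf("%s % X\n", charset, raw)
}
```

Iterating by index instead of sorting the keys as strings keeps filename*10* after filename*9*, which matters for headers with ten or more segments.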

What I expected:
The concatenation of filename*0* and filename*1*.

What I got:
Only the value of filename*1*.

Release or branch I am using:
both master and release v0.4.0

With a longer filename, the header can look like this:

Content-Disposition: attachment;
 filename*0*=euc-kr''%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%B0%B3%C0;
 filename*1*=%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8;
 filename*2*=%5F%B0%A2%33%30%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30;
 filename*3*=%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%B0%B3%C0;
 filename*4*=%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8;
 filename*5*=%5F%B0%A2%33%30%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30;
 filename*6*=%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%B0%B3%C0;
 filename*7*=%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8;
 filename*8*=%5F%B0%A2%33%30%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30;
 filename*9*=%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%B0%B3%C0;
 filename*10*=%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%B0%B3%C0%CE%C1%A4%BA;
 filename*11*=%B8%5F%B0%A2%33%30%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2;
 filename*12*=%33%30%B0%B3%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3;
 filename*13*=%2E%7A%69%70
@rudyeeee changed the title "Incorrect parsing in Filename with the following structure" to "It does not correctly extract long filename." Dec 12, 2018
@jhillyerd
Owner

Thanks, I don't remember seeing that in the RFCs; I will have to look into it more.

@requaos
Collaborator

requaos commented Aug 30, 2019

This is currently implemented in the mime package: https://golang.org/src/mime/mediatype.go?s=4428:4473#L167
However, only 'us-ascii' and 'utf-8' are supported at this time: https://golang.org/src/mime/mediatype.go?s=5620:5663#L223

As is, the continuation segments are only percent-decoded ("hex percent-encoding" is undone, but no charset conversion is applied), and if the charset is not us-ascii or utf-8, the filename data on the same line as the charset declaration (filename*0*) is omitted.
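The behavior described above can be reproduced locally with the standard library alone, much like the playground check. The sketch below passes the issue's Content-Disposition value (unfolded onto one line) to `mime.ParseMediaType`: with `euc-kr`, segment 0 is discarded and only `filename*1*` survives, while the same structure declared as `utf-8` is joined in full. `filenameOf` is a hypothetical helper for illustration, and the result reflects the Go versions discussed in this thread; later releases could behave differently.

```go
package main

import (
	"fmt"
	"mime"
)

// filenameOf returns the filename parameter that mime.ParseMediaType
// reports for a Content-Disposition value.
func filenameOf(disposition string) (string, error) {
	_, params, err := mime.ParseMediaType(disposition)
	if err != nil {
		return "", err
	}
	return params["filename"], nil
}

func main() {
	// The header from the issue: the euc-kr segment 0 is silently dropped.
	eucKR := "attachment; " +
		"filename*0*=euc-kr''%B0%B3%C0%CE%C1%A4%BA%B8%5F%B0%A2%33%30%B0%B3%2E%7A%69; " +
		"filename*1*=%70%2E%7A%69%70"
	// The same continuation structure with a utf-8 charset is joined in full.
	utf8Header := "attachment; filename*0*=utf-8''%ED%95%9C%EA%B8%80; filename*1*=%2E%7A%69%70"

	for _, h := range []string{eucKR, utf8Header} {
		name, err := filenameOf(h)
		fmt.Printf("%q %v\n", name, err)
	}
}
```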

@requaos
Collaborator

requaos commented Aug 30, 2019

@jhillyerd If we want to implement this fix, we have two options: open a PR against the Go mime package, which would bring "enmime/internal/coding/charsets.go" along for the ride, or copy ParseMediaType into our codebase so that we can amend it.

Reasons to copy and amend (drawbacks of contributing upstream):

  • Code quality: I am not highly confident that our code would hold up to the scrutiny of the Go developers. I also doubt it would be acceptable to couple the mime package to the transform package in that way.
  • Level of effort to contribute to golang/go: sometimes these PRs just sit there for more than a year.

Reasons to copy and amend (benefits of having our own copy):

  • It gives us the ability to consolidate our various ParseMediaType fixes
  • It is an immediate solution
  • It provides a staging ground for bringing our copy of ParseMediaType up to the quality standards needed to contribute it to golang/go in the future
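To illustrate the copy-and-amend option, here is a minimal sketch of the step an amended copy might add: converting the joined, percent-decoded bytes through a charset table instead of accepting only us-ascii and utf-8, and reporting an unknown charset as an error rather than silently dropping text. `charsetDecoders` and `decodeCharset` are hypothetical names. The table only covers charsets that can be decoded with the standard library; covering euc-kr would need a real decoder, for example `korean.EUCKR.NewDecoder()` from golang.org/x/text, or presumably whatever enmime's own charset code provides.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// charsetDecoders maps a lowercase charset name to a converter from raw
// bytes in that charset to a UTF-8 string. Hypothetical and deliberately
// tiny: an amended ParseMediaType would plug in a much larger table.
var charsetDecoders = map[string]func([]byte) (string, error){
	"us-ascii":   decodeUTF8, // lenient: ASCII is a subset of UTF-8
	"utf-8":      decodeUTF8,
	"iso-8859-1": decodeLatin1,
}

// decodeUTF8 accepts the bytes only if they are valid UTF-8.
func decodeUTF8(b []byte) (string, error) {
	if !utf8.Valid(b) {
		return "", fmt.Errorf("invalid UTF-8 in %q", b)
	}
	return string(b), nil
}

// decodeLatin1 maps each byte to the Unicode code point with the same value.
func decodeLatin1(b []byte) (string, error) {
	var sb strings.Builder
	for _, c := range b {
		sb.WriteRune(rune(c))
	}
	return sb.String(), nil
}

// decodeCharset converts the joined, percent-decoded bytes of an RFC 2231
// parameter to UTF-8. An unknown charset is reported as an error instead
// of the text being silently dropped.
func decodeCharset(charset string, raw []byte) (string, error) {
	decode, ok := charsetDecoders[strings.ToLower(charset)]
	if !ok {
		return "", fmt.Errorf("unsupported charset %q", charset)
	}
	return decode(raw)
}

func main() {
	name, err := decodeCharset("ISO-8859-1", []byte("caf\xe9.txt"))
	fmt.Println(name, err)

	_, err = decodeCharset("euc-kr", []byte{0xB0, 0xB3})
	fmt.Println(err)
}
```

Returning an error lets the caller decide whether to fall back to the raw bytes or to another parameter such as `name`, rather than ending up with a truncated filename.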

@requaos
Collaborator

requaos commented Aug 30, 2019

Here is a playground exposing the result: https://play.golang.org/p/zOX_CQAV23l

@jhillyerd added this to the v1.0.0 milestone Dec 18, 2020
@jhillyerd
Owner

Just tested on the playground: this is still broken in the current version of Go (only us-ascii and utf-8 are supported).

I think copying ParseMediaType into our codebase is the better option. I'm not sure how frequently this comes up in the wild. If others run into this, please comment.
