Best way to decode utf8 headers #178

Open
leipert opened this issue Mar 13, 2018 · 4 comments

Comments

@leipert

leipert commented Mar 13, 2018

First of all: Thank you for a great project!

I implemented a custom sorting mechanism that iterates over every message in a mailbox and uses :fetch_field to fetch headers, which are then compared against rules. The problem is that some mails have UTF-8 encoded headers, which come back in their raw encoded form.

value = src:fetch_field(field):sub(start):lower()
print(value)

logs the following (imapfilter -v)

S (8): 100D OK SEARCH completed (Success)
C (8): 100E UID FETCH 11 BODY.PEEK[HEADER.FIELDS (Subject)]
S (8): 100E OK Success
Fetched field "Subject" of example@gmail.com@imap.gmail.com/temp_inbox[11].
=?utf-8?q?=5bslack=5d_notifications_from_the_company_workspace_for_march_=31=33th=2c_=32=30=31=38_at_=35=3a=32=36_pm?=

What would be the best way to retrieve [Slack] Notifications from the company workspace for March 13th, 2018 at 5:26 PM instead of the encoded string? Is there any helper function I could use, or could you expose one, if there isn't?

@leipert
Author

leipert commented Mar 13, 2018

Addition: I did not try options.charset = 'UTF-8', but some messages are ISO encoded and some are UTF-8 encoded.

Thank you very much!
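
For the ISO-encoded messages, the decoded bytes would still need converting before they can be compared with UTF-8 rule strings. A minimal sketch, assuming "ISO" here means ISO-8859-1 (Latin-1); `latin1_to_utf8` is a hypothetical helper name, not part of imapfilter, and other ISO charsets such as ISO-8859-15 would need their own mapping:

```lua
-- Hypothetical helper: convert an ISO-8859-1 (Latin-1) byte string to UTF-8.
-- Assumes the input really is Latin-1. Works on Lua 5.1 and later.
function latin1_to_utf8(s)
  -- Bytes 0-127 are identical in both encodings; only 128-255 need expanding.
  local out = s:gsub("[\128-\255]", function(ch)
    local b = ch:byte()
    -- A Latin-1 byte equals its code point, which fits in two UTF-8 bytes.
    return string.char(192 + math.floor(b / 64), 128 + b % 64)
  end)
  return out
end

print(latin1_to_utf8("caf\233") == "caf\195\169")
```

The function would be applied after the header payload has been Q- or base64-decoded, and only when the header names an ISO-8859-1 charset.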

@leipert
Author

leipert commented Mar 16, 2018

I wrote this helper function:

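magicQ = "=?utf-8?q?"
magicQLength = string.len(magicQ)

function qdecode(value)
  if value == nil then
    return value
  end
  if string.sub(value, 1, magicQLength):lower() == magicQ then
    -- Strip the "=?utf-8?q?" prefix and the trailing "?=" terminator.
    local decoded = value:sub(magicQLength + 1):gsub("%?=$", "")
    decoded = decoded
              :gsub("_", " ")
              :gsub(
                "=([a-fA-F0-9][a-fA-F0-9])",
                function (hexByte)
                  -- Each =XX is one raw byte of the UTF-8 sequence, not a
                  -- code point, so string.char is used rather than utf8.char.
                  return string.char(tonumber(hexByte, 16))
                end
              )
    -- Return only the string; gsub also returns a substitution count.
    return decoded
  end
  return value
end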

If there isn't any better way, feel free to close this issue :)

@SOwOphie

SOwOphie commented Jun 6, 2020

Your script didn't work for me as I encountered headers with embedded encoded sections, as well as "B" (base64) encodings, so I decided to modify it a bit:

function hdr_decode(s)
	local i, j = s:lower():find("=?utf-8?q?", 1, true);
	if i then
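		-- Search from j+1: if the payload starts with "=", the prefix's last "?" plus that "=" would otherwise match "?=".
		local k, l = s:find("?=", j+1, true);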
		local s_ = s
			:sub(j+1, k-1)
			:gsub("_", " ")
			:gsub("=([a-fA-F0-9][a-fA-F0-9])", function(c) return string.char(tonumber(c, 16)) end);
		return hdr_decode(s:sub(1, i-1) .. s_ .. s:sub(l+1))
	end

	i, j = s:lower():find("=?utf-8?b?", 1, true);
	if i then
		local k, l = s:find("?=", j, true);
		local s_ = s:sub(j+1, k-1):gsub("[%w%+/][%w%+/][%w%+/=][%w%+/=]",
			function(w)
				local digits = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
				local a = digits:find(w:sub(1, 1), 1, true);
				local b = digits:find(w:sub(2, 2), 1, true);
				local c = digits:find(w:sub(3, 3), 1, true);
				local d = digits:find(w:sub(4, 4), 1, true);
				return string.char(
					(a-1)*4 + math.floor((b-1)/16),
					(b-1)%16*16 + math.floor(((c or 1)-1)/4),
					((c or 1)-1)%4*64 + ((d or 1)-1)
				):sub(1, d and 3 or c and 2 or 1);
			end
		);
		return hdr_decode(s:sub(1, i-1) .. s_ .. s:sub(l+1));
	end

	return s;
end

print(hdr_decode("From: =?utf-8?b?SGVsbG8sIFdvcmxkIQ==?= <=?utf-8?q?hello=5Fworld=40example=2ecom?=>"));
-- > From: Hello, World! <hello_world@example.com>
os.exit(0);

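Note: The recursive approach is slow for larger strings, but it should work well enough for e-mail headers. Also, failure will not be graceful if the input is not well-formed (for example, a missing closing "?=" raises an error). Finally, I'm using string.char instead of utf8.char because I'm currently locked to Lua 5.1 (get your act together gentoo grr...). string.char is also the right choice on newer Lua versions: each decoded value is a raw byte of the UTF-8 sequence, not a code point, so utf8.char would mangle non-ASCII characters.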

Feel free to use / improve further =)
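
One possible way to avoid the recursion and fail gracefully is to let a single gsub pass find every encoded-word. This is only a sketch under a few assumptions: `decode_header` and its helpers are hypothetical names, only `utf-8` and `us-ascii` charsets are decoded (anything else, or a malformed base64 payload, is left untouched), and whitespace between adjacent encoded-words is not collapsed as RFC 2047 asks. It sticks to Lua 5.1 features.

```lua
local B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

-- Decode base64; returns nil if the payload contains invalid characters.
local function b64_decode(data)
  data = data:gsub("=+$", "")            -- drop padding
  if data:find("[^%w%+/]") then return nil end
  local out, bits, nbits = {}, 0, 0
  for ch in data:gmatch(".") do
    bits = bits * 64 + (B64:find(ch, 1, true) - 1)
    nbits = nbits + 6
    if nbits >= 8 then
      -- Emit the top 8 bits and keep the remainder for the next round.
      nbits = nbits - 8
      local byte = math.floor(bits / 2 ^ nbits)
      out[#out + 1] = string.char(byte)
      bits = bits - byte * 2 ^ nbits
    end
  end
  return table.concat(out)
end

-- Decode the "Q" encoding: "_" is a space, "=XX" is one raw byte.
local function q_decode(data)
  data = data:gsub("_", " ")
  data = data:gsub("=(%x%x)", function(h) return string.char(tonumber(h, 16)) end)
  return data
end

-- Replace every =?charset?enc?payload?= occurrence in one pass.
function decode_header(s)
  local decoded = s:gsub("=%?([^?%s]+)%?([bBqQ])%?([^?%s]*)%?=",
    function(charset, enc, payload)
      charset = charset:lower()
      if charset ~= "utf-8" and charset ~= "us-ascii" then
        return nil                       -- nil keeps the original match
      end
      if enc:lower() == "q" then return q_decode(payload) end
      return b64_decode(payload)         -- nil on malformed input
    end)
  return decoded
end

print(decode_header("From: =?utf-8?b?SGVsbG8sIFdvcmxkIQ==?= <=?utf-8?q?hello=5Fworld=40example=2ecom?=>"))
-- > From: Hello, World! <hello_world@example.com>
```

Because the callback returns nil for anything it cannot handle, an unsupported or broken encoded-word simply stays in the output as-is instead of raising an error.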

@hi-flyer

Thank you both for the helpers. I used LadyBoonami's code and it worked just fine. Thank you!
