parsing a list of addresses returns empty set if there is an invalid email #190

Open
indrora opened this issue Apr 10, 2018 · 4 comments

indrora commented Apr 10, 2018

Python 3.6.2 (v3.6.2:5fd33b5, Jul  8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from flanker.addresslib import address
INFO:flanker.addresslib._parser.parser:building mailbox parser
WARNING:flanker.addresslib._parser.parser:Symbol 'mailbox_or_url_list' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'delim' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'mailbox_or_url' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'url' is unreachable
INFO:flanker.addresslib._parser.parser:building addr_spec parser
WARNING:flanker.addresslib._parser.parser:Symbol 'mailbox_or_url_list' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'delim' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'mailbox_or_url' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'mailbox' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'url' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'angle_addr' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'name_addr' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'phrase' is unreachable
INFO:flanker.addresslib._parser.parser:building url parser
WARNING:flanker.addresslib._parser.parser:Symbol 'mailbox_or_url_list' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'delim' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'mailbox_or_url' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'mailbox' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'addr_spec' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'angle_addr' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'name_addr' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'phrase' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'local_part' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'domain' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'quoted_string' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'domain_literal' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'quoted_string_text' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'domain_literal_text' is unreachable
INFO:flanker.addresslib._parser.parser:building mailbox_or_url parser
WARNING:flanker.addresslib._parser.parser:Symbol 'mailbox_or_url_list' is unreachable
WARNING:flanker.addresslib._parser.parser:Symbol 'delim' is unreachable
INFO:flanker.addresslib._parser.parser:building mailbox_or_url_list parser
>>> address.parse_list('bob@example.net,steve@example.com')
[bob@example.net, steve@example.com]
>>> address.parse_list('bob@example.net,steve@example.com, potato')
[]
Member

horkhe commented Apr 10, 2018

Passing a comma-separated string to parse_list is an anti-pattern, because the ambiguity of the grammar does not allow reliable parsing of lists that contain garbage like this one. It is better to pass a list of strings to parse_list when possible, e.g.: address.parse_list(['bob@example.net','steve@example.com','potato'])
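
The per-item idea can be sketched without flanker installed. In this minimal sketch, `toy_parse` is a hypothetical stand-in for a single-address parser (a deliberately naive regex, not flanker's grammar), and `parse_each` is an illustrative helper name, not part of any library. The point is only that when each candidate is parsed on its own, one bad entry cannot void the others.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Deliberately naive pattern, used only to keep the sketch self-contained.
# This is NOT how flanker validates addresses.
_SIMPLE_ADDR = re.compile(r"^[^@\s,]+@[^@\s,]+\.[^@\s,]+$")


def toy_parse(candidate: str) -> Optional[str]:
    """Hypothetical single-address parser: return the address or None."""
    candidate = candidate.strip()
    return candidate if _SIMPLE_ADDR.match(candidate) else None


def parse_each(
    candidates: Iterable[str],
    parse_one: Callable[[str], Optional[str]] = toy_parse,
) -> Tuple[List[str], List[str]]:
    """Parse every candidate independently; collect successes and failures."""
    parsed: List[str] = []
    rejected: List[str] = []
    for candidate in candidates:
        result = parse_one(candidate)
        if result is None:
            rejected.append(candidate)
        else:
            parsed.append(result)
    return parsed, rejected


parsed, rejected = parse_each(["bob@example.net", "steve@example.com", "potato"])
print(parsed)    # the two well-formed addresses
print(rejected)  # the entry that failed to parse
```

In real code, `parse_one` would presumably be swapped for the library's own single-address parser.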

Author

indrora commented Apr 10, 2018

The documented behavior and the observed behavior differ. It would probably make sense to update the documentation, which currently indicates that this is totally reasonable and supported, and likewise the API reference, which indicates that this is allowed.

Member

horkhe commented Apr 10, 2018

You are right, the documentation needs to be updated. The comma-separated string parser used to work kind of OK before support for UTF-8 addresses was added. At that point the grammar became too ambiguous, because we needed to support not only RFC-valid addresses but also addresses that are technically invalid but used de facto on the Internet. That made it difficult to tell whether a comma is an address separator or part of an address.
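
The separator ambiguity can be illustrated with plain Python. This is a minimal sketch, not flanker's parser: `split_outside_quotes` is a hypothetical helper that splits on commas only outside double quotes. It recovers the intended split when the display name is quoted, but with an unquoted display name nothing in the text distinguishes a separator comma from one that belongs to the name.

```python
from typing import List


def split_outside_quotes(text: str) -> List[str]:
    """Split on commas that are not inside a double-quoted string."""
    parts: List[str] = []
    buf: List[str] = []
    in_quotes = False
    escaped = False
    for ch in text:
        if escaped:
            # Character following a backslash inside quotes is kept as-is.
            buf.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            buf.append(ch)
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
            buf.append(ch)
        elif ch == "," and not in_quotes:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return [p for p in parts if p]


quoted = '"Dümpling, Giant" <mollusc@example.com>, bob@example.net'
unquoted = 'Dümpling, Giant <mollusc@example.com>, bob@example.net'

quoted_parts = split_outside_quotes(quoted)      # two entries, name intact
unquoted_parts = split_outside_quotes(unquoted)  # three entries, name torn apart
print(quoted_parts)
print(unquoted_parts)
```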

wedi commented Apr 22, 2020

I just stumbled upon such a case with an email from a law firm, where the From header looked like this:

From: =?UTF-8?Q?Argonaut_=7c_D=c3=bcmpling=2c_Giant_=26_Partner?= <mollusc@example.com>
<==>
From: Argonaut | Dümpling, Giant & Partner <mollusc@example.com>
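
The equivalence between the two forms can be checked with Python's standard library. This is a small sketch using `email.header` (not flanker), applied only to the encoded display name: the `=2c` in the encoded word decodes to the comma that later trips up list parsing.

```python
from email.header import decode_header, make_header

# The RFC 2047 encoded word from the From header above (display name only).
encoded_name = "=?UTF-8?Q?Argonaut_=7c_D=c3=bcmpling=2c_Giant_=26_Partner?="

# decode_header yields (bytes, charset) pairs; make_header joins them into text.
display_name = str(make_header(decode_header(encoded_name)))
print(display_name)
```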

parse_list() fails due to the comma:

Python 3.7.7 (default, Mar 10 2020, 15:43:03)
[Clang 11.0.0 (clang-1100.0.33.17)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from flanker.addresslib import address
>>> address.parse_list("Argonaut | Dümpling, Giant & Partner <mollusc@example.com>")
[]
>>> address.parse_list("Argonaut | Dümpling Giant & Partner <mollusc@example.com>")
[Argonaut | Dümpling Giant & Partner <mollusc@example.com>]

Is it reasonable/possible to add a non-strict mode to parse_list() like parse() has?

>>> address.parse("Argonaut | Dümpling, Giant & Partner <mollusc@example.com>")
"Argonaut | Dümpling, Giant & Partner" <mollusc@example.com>
>>> address.parse("Argonaut | Dümpling, Giant & Partner <mollusc@example.com>", strict=True)
>>>
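
Until something like that exists, one possible fallback for headers known to carry a single mailbox (From usually does) is to peel off the trailing angle address and treat everything before it as the display name. The sketch below is an assumption-laden illustration, not flanker behavior: `split_single_mailbox` and its regex are hypothetical, and the address check is intentionally loose.

```python
import re
from typing import Optional, Tuple

# Loose pattern: anything, then a final <local@domain> at the end of the value.
_TRAILING_ANGLE_ADDR = re.compile(
    r"^(?P<name>.*?)\s*<(?P<addr>[^<>\s]+@[^<>\s]+)>\s*$"
)


def split_single_mailbox(value: str) -> Optional[Tuple[str, str]]:
    """Return (display_name, address) for a single-mailbox header value."""
    match = _TRAILING_ANGLE_ADDR.match(value.strip())
    if match is None:
        return None
    # Drop surrounding whitespace and optional double quotes from the name.
    name = match.group("name").strip().strip('"')
    return name, match.group("addr")


result = split_single_mailbox(
    "Argonaut | Dümpling, Giant & Partner <mollusc@example.com>"
)
print(result)
```

This deliberately does not attempt to handle lists; with several mailboxes in one string the comma ambiguity described earlier in the thread returns.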
