fix(msg): use python-oxmsg for MSG email parsing
`partition_msg()` previously used the `msg_parser` library for parsing
Outlook MSG email files (.msg files). The `msg_parser` library is
unmaintained and has several major shortcomings such as not being able
to parse MSG files with 8-bit encoded strings and not reliably
extracting attachments.

Use the new, permissively licensed `python-oxmsg` library instead.

For reviewability purposes, this PR temporarily places the new
`partition_msg()` implementation in `new_msg.py` and references that
implementation from `msg.py`. `new_msg.py` will be renamed to `msg.py`
in a closely following PR. This avoids a very messy interleaving of
hunks in a diff between the old and rewritten implementations.
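
Because `msg.py` still exposes `partition_msg()`, caller imports should be
unaffected by the temporary file layout. A minimal usage sketch, assuming
`unstructured` is installed with its `msg` extra; `summarize_msg` and the
example path are hypothetical names used only for illustration and are not
part of this change:

```python
from __future__ import annotations


def summarize_msg(path: str) -> list[tuple[str, str]]:
    """Return (element type name, text) pairs for an Outlook MSG file.

    Hypothetical helper for illustration; not part of this commit.
    """
    # Imported lazily so this module still loads where `unstructured` is absent.
    from unstructured.partition.msg import partition_msg

    # The public import path is the same before and after the rename of
    # `new_msg.py`, since `msg.py` references the new implementation.
    elements = partition_msg(filename=path)
    return [(type(element).__name__, element.text) for element in elements]


# Example call (placeholder path, substitute a real MSG file):
#     for kind, text in summarize_msg("example.msg"):
#         print(f"{kind}: {text[:60]}")
```

The lazy import is a convenience for the sketch only; ordinary code would
import `partition_msg` at module level.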
scanny committed Jun 4, 2024
1 parent 1dede50 commit 334ce68
Showing 9 changed files with 587 additions and 372 deletions.
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,16 @@
## 0.14.5-dev0

### Enhancements

* **Use `python-oxmsg` for `partition_msg()`.** Outlook MSG emails are now partitioned using the `python-oxmsg` package which resolves some shortcomings of the prior MSG parser.

### Features

### Fixes

* **8-bit string Outlook MSG files are parsed.** `partition_msg()` is now able to parse non-unicode Outlook MSG emails.
* **Attachments to Outlook MSG files are extracted intact.** `partition_msg()` is now able to extract attachments without corruption.

## 0.14.4

### Enhancements
1 change: 1 addition & 0 deletions pyproject.toml
@@ -31,6 +31,7 @@ lint.select = [
]
lint.ignore = [
"COM812", # -- over aggressively insists on trailing commas where not desireable --
"PT001", # -- wants empty parens on @pytest.fixture where not used (essentially always) --
"PT005", # -- flags mock fixtures with names intentionally matching private method name --
"PT011", # -- pytest.raises({exc}) too broad, use match param or more specific exception --
"PT012", # -- pytest.raises() block should contain a single simple statement --
2 changes: 1 addition & 1 deletion requirements/extra-msg.in
@@ -1,4 +1,4 @@
 -c ./deps/constraints.txt
 -c base.txt

-msg_parser
+python-oxmsg
14 changes: 11 additions & 3 deletions requirements/extra-msg.txt
@@ -4,7 +4,15 @@
 #
 # pip-compile ./extra-msg.in
 #
-msg-parser==1.2.0
-    # via -r ./extra-msg.in
+click==8.1.7
+    # via
+    #   -c ./base.txt
+    #   python-oxmsg
 olefile==0.47
-    # via msg-parser
+    # via python-oxmsg
+python-oxmsg==0.0.1
+    # via -r ./extra-msg.in
+typing-extensions==4.12.0
+    # via
+    #   -c ./base.txt
+    #   python-oxmsg