msg_obj.save_email_file() is saving eml with empty attachments #218

danieldiezmallo · 2021-07-14T09:34:27Z

Hello,

I have been experimenting with the library to load .msg files and convert them to .eml using the msg_obj.save_email_file() method. The .msg file loads normally and the conversion completes without errors.

The method correctly saves the bodies and metadata of the emails in the .eml file, but every attachment in the saved file is empty: it contains no data at all. Is this a known issue?
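One way to confirm the symptom is to parse the generated .eml with the standard library and measure each attachment's decoded payload. The following is a minimal sketch, not part of msg_parser; the helper name `attachment_sizes` is hypothetical, and it assumes the attachments carry a `Content-Disposition: attachment` header.

```python
from email import message_from_bytes


def attachment_sizes(eml_bytes):
    """Return {filename: decoded payload size in bytes} for each attachment."""
    message = message_from_bytes(eml_bytes)
    sizes = {}
    for part in message.walk():
        # Only look at parts explicitly marked as attachments.
        if part.get_content_disposition() != "attachment":
            continue
        # get_payload(decode=True) undoes base64/quoted-printable encoding;
        # it returns None for multipart containers, so fall back to b"".
        payload = part.get_payload(decode=True) or b""
        sizes[part.get_filename()] = len(payload)
    return sizes
```

Reading the saved file in binary mode and passing its contents to this helper would show a size of 0 for every attachment affected by the problem described here.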

Thanks.

danieldiezmallo · 2021-07-14T14:11:00Z

Hello,

I have found the issue: the attachments, which are read as binary files, were being corrupted when passed to the olefile library in Message._get_property_data(), in the msg_parser.py file. I have corrected the method:

def _get_property_data(self, directory_name, directory_entry, is_list=False):
        directory_entry_name = directory_entry.name
        if is_list:
            stream_name = [directory_name, directory_entry_name]
        else:
            stream_name = [directory_entry_name]

        ole_file = directory_entry.olefile
        property_details = self._get_canonical_property_name(directory_entry_name)
        if not property_details:
            return None

        property_name = property_details.get("name")
        property_type = property_details.get("data_type")
        if not property_type:
            return None

        try:
            raw_content = ole_file.openstream(stream_name).read()
        except IOError:
            raw_content = None
        property_value = self._data_model.get_value(
            raw_content, data_type=property_type
        )
        if property_value:
            
            # If the property is the attachment data, it has to be provided raw to prevent corruption
            if property_name == 'AttachDataObject':
                property_detail = {property_name: raw_content}
            # Otherwise use the olefile lib to get the value
            else:
                property_detail = {property_name: property_value}
        else:
            property_detail = None
        return property_detail

Then the EmailFormatter._process_attachments() method, in the email_builder module, should not decode the byte stream:

def _process_attachments(self, attachments):
        for attachment in attachments:
            ctype = attachment.AttachMimeTag
            data = attachment.data
            filename = attachment.Filename
            maintype, subtype = ctype.split("/", 1)
                        
            if data is None:
                continue

            # The next lines corrupt binary files and make them unreadable:
            # if isinstance(data, bytes):
            #     data = data.decode("utf-8", "ignore")
    
            if maintype == "text" or "message" in maintype:
                attach = MIMEText(data, _subtype=subtype)
            elif maintype == "image":
                attach = MIMEImage(data, _subtype=subtype)
            elif maintype == "audio":
                attach = MIMEAudio(data, _subtype=subtype)
            else:
                attach = MIMEBase(maintype, subtype)
                attach.set_payload(data)

                # Encode the payload using Base64
                encoders.encode_base64(attach)
            # Set the filename parameter
            base_filename = os.path.basename(filename)
            attach.add_header("Content-ID", "<{}>".format(base_filename))
            attach.add_header(
                "Content-Disposition", "attachment", filename=base_filename
            )
            self.message.attach(attach)
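To illustrate why decoding is harmful for binary data, here is a small standalone sketch (not taken from the library; `lossy_round_trip` is a hypothetical name). Decoding with `"ignore"` silently drops every byte that is not valid UTF-8, so the original content cannot be recovered afterwards.

```python
# First eight bytes of every PNG file; 0x89 cannot start a UTF-8 sequence.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def lossy_round_trip(data: bytes) -> bytes:
    """Decode as UTF-8 while ignoring errors, then encode back to bytes."""
    return data.decode("utf-8", "ignore").encode("utf-8")


damaged = lossy_round_trip(PNG_SIGNATURE)
print(len(PNG_SIGNATURE), len(damaged))  # prints "8 7": the 0x89 byte is gone
```

Plain ASCII text survives the same round trip unchanged, which is why the problem only shows up with binary attachments.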

Thanks.

vikramarsid · 2021-07-27T22:18:20Z

@danieldiezmallo Thank you for finding the bug. Can you open a PR for the above change?

DayDotMe · 2022-01-28T10:13:43Z

I just opened a PR with a similar bug fix, but kept the bytes decoding for the case where the MIME type is text. Tests ran fine with Python 3.10 on Windows 10. I'd be very grateful if you could merge the PR, bump the version to 1.2.1 and publish it to PyPI.

Many thanks for this project!
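The idea of decoding only text attachments can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: the function name `build_attachment_part` and its parameters are hypothetical, and only standard-library `email` calls are used.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.text import MIMEText


def build_attachment_part(data, mime_type, filename):
    """Build a MIME part, decoding bytes only for text content types."""
    maintype, subtype = mime_type.split("/", 1)
    if maintype == "text":
        # MIMEText expects str, so decoding is needed here.
        if isinstance(data, bytes):
            data = data.decode("utf-8", "ignore")
        part = MIMEText(data, _subtype=subtype)
    else:
        # Binary data is kept as raw bytes and Base64-encoded for transport.
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
    part.add_header("Content-Disposition", "attachment", filename=filename)
    return part
```

With this split, `part.get_payload(decode=True)` returns the original bytes for a binary attachment, while text attachments still go through `MIMEText`.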

BenjaminHoegh · 2022-04-09T07:16:03Z

Has this been released? :)

DayDotMe added a commit to DayDotMe/msg_parser that referenced this issue Jan 28, 2022

Fix for vikramarsid#218 - use raw attachment bytes to prevent corruption

1254f60

vikramarsid pushed a commit that referenced this issue Mar 19, 2022

Fix for #218 - use raw attachment bytes to prevent corruption (#244)

4b07d12
