msg_obj.save_email_file() is saving eml with empty attachments #218

Open
danieldiezmallo opened this issue Jul 14, 2021 · 4 comments

@danieldiezmallo

Hello,

I have been experimenting with the library to load .msg files and convert them to .eml using the msg_obj.save_email_file() method. The .msg file loads normally and the conversion completes without errors.

The method correctly saves the bodies and metadata of the emails in the .eml file, but every attachment in the saved file is empty: it contains no data at all. Is this a bug?

Thanks.
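
For anyone who wants to confirm the symptom on their own output, here is a minimal sketch that uses only the Python standard library and lists the decoded size of each attachment in an .eml. The helper name attachment_sizes and the in-memory sample message are assumptions made for illustration; they are not part of msg_parser. In practice the bytes would come from the file written by save_email_file().

```python
import email
from email import policy
from email.message import EmailMessage


def attachment_sizes(eml_bytes):
    """Return {filename: decoded payload size in bytes} for each attachment."""
    msg = email.message_from_bytes(eml_bytes, policy=policy.default)
    sizes = {}
    for part in msg.iter_attachments():
        # get_payload(decode=True) undoes base64/quoted-printable encoding.
        payload = part.get_payload(decode=True) or b""
        sizes[part.get_filename()] = len(payload)
    return sizes


# Hypothetical sample built in memory, standing in for a saved .eml file.
sample = EmailMessage()
sample["Subject"] = "demo"
sample.set_content("body text")
sample.add_attachment(
    b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png", filename="ok.png"
)
sample.add_attachment(
    b"", maintype="application", subtype="octet-stream", filename="empty.bin"
)

sizes = attachment_sizes(sample.as_bytes())
# Attachments with a size of 0 show the symptom described in this issue.
empty = [name for name, size in sizes.items() if size == 0]
```

An attachment reported with size 0 matches the behaviour described here.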

@danieldiezmallo
Author

danieldiezmallo commented Jul 14, 2021

Hello,

I have found the issue: attachments that are read as binary files were being corrupted when passed through the olefile library in the Message._get_property_data() method of the msg_parser.py file. I have corrected the method:

def _get_property_data(self, directory_name, directory_entry, is_list=False):
        directory_entry_name = directory_entry.name
        if is_list:
            stream_name = [directory_name, directory_entry_name]
        else:
            stream_name = [directory_entry_name]

        ole_file = directory_entry.olefile
        property_details = self._get_canonical_property_name(directory_entry_name)
        if not property_details:
            return None

        property_name = property_details.get("name")
        property_type = property_details.get("data_type")
        if not property_type:
            return None

        try:
            raw_content = ole_file.openstream(stream_name).read()
        except IOError:
            raw_content = None
        property_value = self._data_model.get_value(
            raw_content, data_type=property_type
        )
        if property_value:
            
            # If the property is the attachment data, it has to be passed through raw to prevent corruption
            if property_name == 'AttachDataObject':
                property_detail = {property_name: raw_content}
            # Otherwise use the value decoded by the data model
            else:
                property_detail = {property_name: property_value}
        else:
            property_detail = None
        return property_detail

Then, the EmailFormatter._process_attachments() method in the email_builder module should not decode the byte stream:

def _process_attachments(self, attachments):
        for attachment in attachments:
            ctype = attachment.AttachMimeTag
            data = attachment.data
            filename = attachment.Filename
            maintype, subtype = ctype.split("/", 1)
                        
            if data is None:
                continue

            # The next lines corrupt binary files and make them unreadable:
            # if isinstance(data, bytes):
            #     data = data.decode("utf-8", "ignore")
    
            if maintype == "text" or "message" in maintype:
                attach = MIMEText(data, _subtype=subtype)
            elif maintype == "image":
                attach = MIMEImage(data, _subtype=subtype)
            elif maintype == "audio":
                attach = MIMEAudio(data, _subtype=subtype)
            else:
                attach = MIMEBase(maintype, subtype)
                attach.set_payload(data)

                # Encode the payload using Base64
                encoders.encode_base64(attach)
            # Set the filename parameter
            base_filename = os.path.basename(filename)
            attach.add_header("Content-ID", "<{}>".format(base_filename))
            attach.add_header(
                "Content-Disposition", "attachment", filename=base_filename
            )
            self.message.attach(attach)

Thanks.
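
To illustrate why decoding is destructive for binary data, here is a small standard-library sketch. It assumes nothing about msg_parser internals; it only shows that decode("utf-8", "ignore") silently drops bytes that are not valid UTF-8, while passing the raw bytes to MIMEBase and Base64-encoding them keeps the payload intact.

```python
from email import encoders
from email.mime.base import MIMEBase

# First eight bytes of every PNG file; 0x89 is not valid UTF-8 on its own.
png_signature = b"\x89PNG\r\n\x1a\n"

# Decoding with errors="ignore" silently drops the invalid byte.
decoded = png_signature.decode("utf-8", "ignore")
round_trip = decoded.encode("utf-8")
lost = len(png_signature) - len(round_trip)

# Passing the raw bytes and Base64-encoding them keeps the payload intact.
attach = MIMEBase("application", "octet-stream")
attach.set_payload(png_signature)
encoders.encode_base64(attach)
preserved = attach.get_payload(decode=True) == png_signature
```

Here a single byte is lost from an eight-byte header; in a real image or archive, many more bytes would be dropped or altered.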

@vikramarsid
Owner

@danieldiezmallo Thank you for finding the bug. Can you open a PR for the above change?

DayDotMe added a commit to DayDotMe/msg_parser that referenced this issue Jan 28, 2022
@DayDotMe
Contributor

I just opened a PR with a similar bug fix, but kept the bytes decoding when the MIME type is text. Tests ran fine with Python 3.10 on Windows 10. I'd be very grateful if you could merge the PR, bump the version to 1.2.1 and publish it to PyPI.

Many thanks for this project!
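
As a rough illustration of the idea (not the code from the PR), decoding can be limited to text parts, since MIMEText expects a str, while every other type keeps its raw bytes. The helper name build_attachment is hypothetical, and for brevity this sketch sends all non-text types through MIMEBase rather than MIMEImage or MIMEAudio.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.text import MIMEText


def build_attachment(data, ctype, filename):
    """Hypothetical helper: decode bytes only for text/* parts."""
    maintype, subtype = ctype.split("/", 1)
    if maintype == "text":
        if isinstance(data, bytes):
            # MIMEText expects str, so decoding is still needed here.
            data = data.decode("utf-8", "ignore")
        part = MIMEText(data, _subtype=subtype)
    else:
        # Binary data is passed through untouched and Base64-encoded.
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
    part.add_header("Content-Disposition", "attachment", filename=filename)
    return part


text_part = build_attachment(b"hello", "text/plain", "note.txt")
binary_part = build_attachment(b"\x89PNG\r\n\x1a\n", "image/png", "pic.png")
```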

@BenjaminHoegh

Has this been released? :)
