Email MIME base64 parsing issue #137687

@spencerwuwu

Bug report

Bug description:

Hi CPython developers,

I was testing and comparing different email parsers and found a parsing discrepancy that looks like a bug. Consider the following message:

MIME-Version: 1.0
Content-Type: application/zip
Content-Disposition: attachment; filename=archive.zip
Content-Transfer-Encoding: base64

UEsDBBQAAAAIAA==
emVkIGZpbGUgY29udGVudA==

With Python's email get_payload(decode=True) method, the returned content stops at the first "==", which seems to be the default behavior of base64.b64decode.
Meanwhile, peer implementations (e.g. apache.commons.mail (Java), MimeKit (C#), PhpMimeMailParser (PHP)) return the whole content.
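
As an aside, the truncation can at least be detected instead of passing silently. The sketch below is only an illustration: the helper name `decodes_strictly` is hypothetical, and it assumes that stripping whitespace before a strict decode is acceptable for a MIME body. It relies on the `validate=True` option of `base64.b64decode`, which rejects input that has data after the padding.

```python
import base64
import binascii


def decodes_strictly(text):
    """Return True if `text` is a single well-formed base64 value.

    Hypothetical helper for illustration only. Whitespace (including the
    line breaks used in MIME bodies) is removed first, then the result is
    decoded with validate=True so malformed input raises binascii.Error.
    """
    compact = "".join(text.split())
    try:
        base64.b64decode(compact, validate=True)
    except binascii.Error:
        return False
    return True


# A single padded value, then the two-segment body from this report.
print(decodes_strictly("UEsDBBQAAAAIAA=="))
print(decodes_strictly("UEsDBBQAAAAIAA==\nemVkIGZpbGUgY29udGVudA=="))
```

The second call is expected to report a failure, because the body continues after the first "==" padding. This only flags the problem; it does not recover the remaining data.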

Below is a runnable example in Python.

import base64
import email

# Parse the MIME message and decode its base64 payload.
request = """MIME-Version: 1.0
Content-Type: application/zip
Content-Disposition: attachment; filename=archive.zip
Content-Transfer-Encoding: base64

UEsDBBQAAAAIAA==
emVkIGZpbGUgY29udGVudA==
"""
msg = email.message_from_string(request)
print("Part content:", repr(msg.get_payload(decode=True)))
print()

# Compare base64.b64decode on several variants of the same body.
contents = [
        "UEsDBBQAAAAIAA==\nemVkIGZpbGUgY29udGVudA==",
        "UEsDBBQAAAAIAA==emVkIGZpbGUgY29udGVudA==",
        "UEsDBBQAAAAIAA=emVkIGZpbGUgY29udGVudA==",
        "UEsDBBQAAAAIAAemVkIGZpbGUgY29udGVudA==",
        "UEsDBBQAAAAIAA==",
        "emVkIGZpbGUgY29udGVudA=="
        ]
for content in contents:
    decoded_bytes = base64.b64decode(content)
    print(repr(content), " ->")
    print("  ", decoded_bytes)

Output:

Part content: b'PK\x03\x04\x14\x00\x00\x00\x08\x00'

'UEsDBBQAAAAIAA==\nemVkIGZpbGUgY29udGVudA=='  ->
   b'PK\x03\x04\x14\x00\x00\x00\x08\x00'
'UEsDBBQAAAAIAA==emVkIGZpbGUgY29udGVudA=='  ->
   b'PK\x03\x04\x14\x00\x00\x00\x08\x00'
'UEsDBBQAAAAIAA=emVkIGZpbGUgY29udGVudA=='  ->
   b'PK\x03\x04\x14\x00\x00\x00\x08\x00\x07\xa6VB\x06f\x96\xc6R\x066\xf6\xe7FV\xe7@'
'UEsDBBQAAAAIAAemVkIGZpbGUgY29udGVudA=='  ->
   b'PK\x03\x04\x14\x00\x00\x00\x08\x00\x07\xa6VB\x06f\x96\xc6R\x066\xf6\xe7FV\xe7@'
'UEsDBBQAAAAIAA=='  ->
   b'PK\x03\x04\x14\x00\x00\x00\x08\x00'
'emVkIGZpbGUgY29udGVudA=='  ->
   b'zed file content'
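
For comparison, here is a minimal sketch of the lenient behavior the other parsers appear to implement. It is not how the email package works and not a proposed patch. The function name `decode_padded_segments` is hypothetical, and the sketch assumes that every run of base64 characters ending in padding is an independent, well-formed segment.

```python
import base64
import re

# One segment: a run of base64 alphabet characters plus optional padding.
_SEGMENT = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def decode_padded_segments(text):
    """Decode base64 text made of several padded segments back to back.

    Hypothetical helper for illustration only. Whitespace is dropped,
    the remainder is split after each padding run, and every segment is
    decoded on its own. A malformed segment still raises binascii.Error.
    """
    compact = "".join(text.split())
    segments = _SEGMENT.findall(compact)
    return b"".join(base64.b64decode(segment) for segment in segments)


body = "UEsDBBQAAAAIAA==\nemVkIGZpbGUgY29udGVudA=="
print(repr(decode_padded_segments(body)))
```

With the body from this report, the result is the concatenation of the two individually decoded values shown in the output above, rather than only the first one.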

Thank you,
Wei-Cheng

CPython versions tested on:

3.15

Operating systems tested on:

Linux

    Labels

    stdlib (Python modules in the Lib dir), topic-email, type-bug (An unexpected behavior, bug, or error)
