Description
Bug report
Bug description:
Summary
The mimetypes module in Python 3.13 is missing mappings for modern Microsoft Office OpenXML formats, causing guess_extension() and guess_all_extensions() to return None and [] respectively for common MIME types like application/vnd.openxmlformats-officedocument.wordprocessingml.document.
Environment
- Python Version: 3.13.3
- Platform: Docker containers (minimal base images)
- Module: mimetypes
Problem Description
When using mimetypes.guess_extension() with modern Office MIME types, the function returns None instead of the expected file extensions:
import mimetypes
# These should return extensions but don't:
print(mimetypes.guess_extension('application/vnd.openxmlformats-officedocument.wordprocessingml.document')) # Should be '.docx'
print(mimetypes.guess_extension('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')) # Should be '.xlsx'
print(mimetypes.guess_extension('application/vnd.openxmlformats-officedocument.presentationml.presentation')) # Should be '.pptx'
# All return: None
Missing MIME Type Mappings
The three modern Office formats below are missing, while their legacy counterparts are present:
MIME Type | Expected Extension | Status |
---|---|---|
application/vnd.openxmlformats-officedocument.wordprocessingml.document | .docx | ❌ Missing |
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | .xlsx | ❌ Missing |
application/vnd.openxmlformats-officedocument.presentationml.presentation | .pptx | ❌ Missing |
application/msword | .doc | ✅ Present |
application/vnd.ms-excel | .xls | ✅ Present |
application/vnd.ms-powerpoint | .ppt | ✅ Present |
Impact
This affects applications that:
- Process files from URLs without extensions (signed URLs, cloud storage)
- Rely on HTTP Content-Type headers to determine file types
- Run in Docker environments with minimal system MIME databases
- Need to handle modern Office documents (very common use case)
Environment Differences
- Local development: Often works due to fuller system MIME databases
- Docker containers: Fails due to minimal mimetypes database (154 mappings vs 400+ locally)
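To see why the two environments differ, it can help to check which system MIME database files actually exist on a given machine. The following is a minimal diagnostic sketch; the helper name existing_mime_files is hypothetical, and it only inspects the file paths listed in mimetypes.knownfiles (on Windows the module may also consult the registry, which this does not cover).

```python
import mimetypes
import os


def existing_mime_files():
    """Return the paths from mimetypes.knownfiles that exist on this machine."""
    return [path for path in mimetypes.knownfiles if os.path.isfile(path)]


found = existing_mime_files()
print(f"{len(found)} of {len(mimetypes.knownfiles)} known MIME files present")
for path in found:
    print(" ", path)
```

An empty result in a minimal container would be consistent with the module falling back to its built-in table only.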
Reproduction
import mimetypes
print(f"Database size: {len(mimetypes.types_map)} types")
print(f"DOCX support: {mimetypes.guess_extension('application/vnd.openxmlformats-officedocument.wordprocessingml.document')}")
print(f"Available extensions: {mimetypes.guess_all_extensions('application/vnd.openxmlformats-officedocument.wordprocessingml.document')}")
Expected: .docx extension returned
Actual: None returned
Suggested Fix
Add the missing OpenXML MIME type mappings to the mimetypes module's built-in database:
# Should be added to mimetypes default mappings:
'.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
These formats have been standard since Microsoft Office 2007 (17+ years) and are widely used in web applications.
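Until such mappings are built in, an application can probably get the same effect at startup with mimetypes.add_type(). This is a sketch of that idea rather than the proposed patch itself; OPENXML_TYPES and register_openxml_types are hypothetical names chosen for illustration.

```python
import mimetypes

# Hypothetical table of the three mappings requested in this report.
OPENXML_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
}


def register_openxml_types():
    """Register the OpenXML mappings in the module-level mimetypes database."""
    for ext, mime_type in OPENXML_TYPES.items():
        mimetypes.add_type(mime_type, ext)


register_openxml_types()
print(mimetypes.guess_type("report.docx")[0])
print(mimetypes.guess_all_extensions(OPENXML_TYPES[".xlsx"]))
```

Because add_type() changes process-wide state, calling it once during application startup is likely the least surprising place for it.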
Workaround
Currently using a manual fallback mapping:
FALLBACK_MIME_MAPPING = {
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document': '.docx',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': '.xlsx',
    'application/vnd.openxmlformats-officedocument.presentationml.presentation': '.pptx',
}
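For completeness, here is one way the fallback mapping might be consulted. It is a sketch, not the exact code from the affected application: the function name extension_for_content_type is hypothetical, and stripping parameters such as "; charset=utf-8" from the header value is an added assumption about how Content-Type values arrive.

```python
import mimetypes

FALLBACK_MIME_MAPPING = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": ".xlsx",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": ".pptx",
}


def extension_for_content_type(content_type):
    """Return a file extension for a Content-Type value, or None if unknown."""
    # Drop any parameters (for example "; charset=utf-8") and normalise case.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    # Prefer the mimetypes database; use the manual mapping only when it has no answer.
    return mimetypes.guess_extension(mime_type) or FALLBACK_MIME_MAPPING.get(mime_type)


docx = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
print(extension_for_content_type(docx))
```

Trying mimetypes first means the fallback becomes a no-op on systems, or future Python versions, where the mapping is already available.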
CPython versions tested on:
3.13
Operating systems tested on:
Linux, Windows