mimetypes module missing modern OpenXML Office formats (.docx, .xlsx, .pptx) in Python 3.13 #137946

@ivan-mihailov

Bug report

Bug description:

Summary

The mimetypes module in Python 3.13 is missing mappings for the modern Microsoft Office OpenXML formats. As a result, guess_extension() returns None and guess_all_extensions() returns [] for common MIME types such as application/vnd.openxmlformats-officedocument.wordprocessingml.document.

Environment

  • Python Version: 3.13.3
  • Platform: Docker containers (minimal base images)
  • Module: mimetypes

Problem Description

When using mimetypes.guess_extension() with modern Office MIME types, the function returns None instead of the expected file extensions:

import mimetypes

# Each call should return an extension, but returns None instead:
print(mimetypes.guess_extension('application/vnd.openxmlformats-officedocument.wordprocessingml.document'))  # expected '.docx'
print(mimetypes.guess_extension('application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'))  # expected '.xlsx'
print(mimetypes.guess_extension('application/vnd.openxmlformats-officedocument.presentationml.presentation'))  # expected '.pptx'

# Actual output: None for all three calls

Missing MIME Type Mappings

The three OpenXML formats below are missing. The legacy binary Office formats are present and are listed for comparison:

MIME Type                                                                 | Expected Extension | Status
application/vnd.openxmlformats-officedocument.wordprocessingml.document   | .docx              | ❌ Missing
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet         | .xlsx              | ❌ Missing
application/vnd.openxmlformats-officedocument.presentationml.presentation | .pptx              | ❌ Missing
application/msword                                                        | .doc               | ✅ Present
application/vnd.ms-excel                                                  | .xls               | ✅ Present
application/vnd.ms-powerpoint                                             | .ppt               | ✅ Present

Impact

This affects applications that:

  • Process files from URLs without extensions (signed URLs, cloud storage)
  • Rely on HTTP Content-Type headers to determine file types
  • Run in Docker environments with minimal system MIME databases
  • Need to handle modern Office documents (very common use case)
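
To illustrate the Content-Type case from the list above, here is a minimal sketch of how an application might derive a file extension from a header value. The helper names media_type_of and extension_for_header are hypothetical and not part of the mimetypes module, and the parameter handling is deliberately simplified (it only splits on the first ';').

```python
import mimetypes


def media_type_of(content_type_header):
    """Return the bare 'type/subtype' part of a Content-Type header value."""
    # Drop parameters such as '; charset=utf-8' and normalize case.
    return content_type_header.split(";", 1)[0].strip().lower()


def extension_for_header(content_type_header):
    """Return a file extension for the header, or None if the type is unknown."""
    return mimetypes.guess_extension(media_type_of(content_type_header))


docx_header = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
print(media_type_of("Text/HTML; charset=utf-8"))
# Prints None when the environment's MIME database lacks the OpenXML mapping.
print(extension_for_header(docx_header))
```

With this pattern, a missing mapping surfaces as None, so the downloaded file ends up without an extension.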

Environment Differences

  • Local development: Often works due to fuller system MIME databases
  • Docker containers: Fails due to minimal mimetypes database (154 mappings vs 400+ locally)
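
To check which of these situations applies, the following sketch lists the system MIME files that mimetypes can find on the current machine. It relies on mimetypes.knownfiles, the module's list of candidate file locations. The helper name existing_mime_files is hypothetical. On Windows the module may also read mappings from the registry, which this sketch does not inspect.

```python
import mimetypes
import os


def existing_mime_files():
    """Return the candidate system MIME files that exist on this machine."""
    return [path for path in mimetypes.knownfiles if os.path.isfile(path)]


# Initialize explicitly so that types_map includes any system-provided mappings.
mimetypes.init()

print("System MIME files found:", existing_mime_files())
print("Extension mappings loaded:", len(mimetypes.types_map))
```

An empty list in a container suggests that only the module's built-in defaults are in use.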

Reproduction

import mimetypes

print(f"Database size: {len(mimetypes.types_map)} types")
print(f"DOCX support: {mimetypes.guess_extension('application/vnd.openxmlformats-officedocument.wordprocessingml.document')}")
print(f"Available extensions: {mimetypes.guess_all_extensions('application/vnd.openxmlformats-officedocument.wordprocessingml.document')}")

Expected: .docx extension returned
Actual: None returned

Suggested Fix

Add the missing OpenXML MIME type mappings to the mimetypes module's built-in database:

# Should be added to mimetypes default mappings:
'.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 
'.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',

These formats have been standard since Microsoft Office 2007 (17+ years) and are widely used in web applications.

Workaround

I am currently using a manual fallback mapping:

FALLBACK_MIME_MAPPING = {
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document': '.docx',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': '.xlsx',
    'application/vnd.openxmlformats-officedocument.presentationml.presentation': '.pptx',
}
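
The mapping above could be applied in at least two ways, sketched below. This is an illustration rather than the exact code in use: the function names guess_extension_with_fallback and register_fallbacks are hypothetical, while mimetypes.guess_extension() and mimetypes.add_type() are the standard library calls.

```python
import mimetypes

# Same mapping as in the workaround, repeated so the sketch is self-contained.
FALLBACK_MIME_MAPPING = {
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document': '.docx',
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': '.xlsx',
    'application/vnd.openxmlformats-officedocument.presentationml.presentation': '.pptx',
}


def guess_extension_with_fallback(mime_type):
    """Ask mimetypes first; use the manual mapping if it returns None."""
    return mimetypes.guess_extension(mime_type) or FALLBACK_MIME_MAPPING.get(mime_type)


def register_fallbacks():
    """Alternative: register the mappings so plain guess_extension() finds them."""
    for mime_type, extension in FALLBACK_MIME_MAPPING.items():
        mimetypes.add_type(mime_type, extension)


print(guess_extension_with_fallback(
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document'))
```

The wrapper leaves the global mimetypes state untouched, whereas register_fallbacks() changes it for the whole process, which also makes guess_type() recognize the three extensions.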

CPython versions tested on:

3.13

Operating systems tested on:

Linux, Windows

Metadata

    Labels

    pending: The issue will be closed if no feedback is provided
    type-bug: An unexpected behavior, bug, or error
