Skip to content

gh-111791: delegating extraction to zipfile module's extractall() method #111824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

sepastian
Copy link

@sepastian sepastian commented Nov 7, 2023

shutil.unpack_archive: deletage extracting ZIP files to zipfile (#111791)

gh-111791: rely on `zipfile.extractall` when extraxting ZIP archives, 
           preventing false negatives and code duplication

As reported in #111791, if the path of a file inside a ZIP file contains "..", e.g. myfile..txt (probably misspelled), shutil.unpack_archive will silently skip extracting the file, because it wrongly assumes a relative path.

This is problematic for two reasons:

  1. The current implementation of shutil.unpack_archive wrongly identifies relative path components. Scanning for ".." does not tell whether a path contains relative components, or not; one must scan for "../" instead.
  2. Besides, files inside a ZIP archive described through relative paths should be extracted, see below.

Python's own zipfile module and the unzip are handling relative path components "../"and names containins ".." correctly. For reference, the man unzip page says:

For security reasons, unzip normally removes parent dir path components (../) from the names of extracted file. This safety feature (new for version 5.50) prevents unzip from accidentally writing files to sensitive areas outside the active extraction folder tree head.

Solution: delegate extracting ZIP archives to Python's own zipfile.extractall method.

Appendix

The following example shows, how extracting a ZIP archive containing paths containing relative components "../" and files with names containing ".." differs in shutil.unpack_archive, zipfile.extractall and the Linux tool unzip.

# Create some files and directories.
$ cd /tmp
$ mkdir -p a/b/c
$ touch a/a.txt a/a..txt a/b/b.txt a/b/b..txt a/b/c/c.txt a/b/c/c..txt
$ find a
a
a/b
a/b/b.txt
a/b/b..txt
a/b/c
a/b/c/c.txt
a/b/c/c..txt
a/a.txt
a/a..txt

# Create a ZIP file for testing.
$ cd /tmp/a/b
$ zip /tmp/test.zip ../a.txt /tmp/a/b/c/c.txt ../../a/b/b.txt

# Inspect the newly created ZIP file.
$ zipinfo /tmp/test.zip
zipinfo /tmp/test.zip 
Archive:  /tmp/test.zip
Zip file size: 482 bytes, number of entries: 3
-rw-rw-r--  3.0 unx        0 bx stor 23-Nov-07 16:03 ../a.txt
-rw-rw-r--  3.0 unx        0 bx stor 23-Nov-07 16:04 tmp/a/b/c/c.txt
-rw-rw-r--  3.0 unx        0 bx stor 23-Nov-07 16:03 ../../a/b/b.txt
3 files, 0 bytes uncompressed, 0 bytes compressed:  0.0%

# Prepare directories holding extraction results.
$ mkdir -p /tmp/extract/unzip /tmp/extract/zipfile /tmp/extract/shutil

# Extract using 'unzip' extracts all files 
# and warns about relative path components.
$ cd /tmp/extract/unzip
$ unzip /tmp/test.zip
Archive:  /tmp/test.zip
warning:  skipped "../" path component(s) in ../a.txt
 extracting: a.txt                   
 extracting: tmp/a/b/c/c.txt         
warning:  skipped "../" path component(s) in ../../a/b/b.txt
 extracting: a/b/b.txt               
$ find
.
./a
./a/b
./a/b/b.txt
./tmp
./tmp/a
./tmp/a/b
./tmp/a/b/c
./tmp/a/b/c/c.txt
./a.txt

# Extract using 'zipfile' (CLI).
$ cd /tmp/extract/zipfile
$ python3 -m zipfile -e /tmp/test.zip .
$ find
.
./a
./a/b
./a/b/b.txt
./tmp
./tmp/a
./tmp/a/b
./tmp/a/b/c
./tmp/a/b/c/c.txt
./a.txt

# Extract using 'shutil.unpack_archive'.
$ cd /tmp/extract/shutil
$ python3 -c 'import shutil; shutil.unpack_archive("/tmp/test.zip",".")' 
$ find
.
./tmp
./tmp/a
./tmp/a/b
./tmp/a/b/c
./tmp/a/b/c/c.txt

# As can be seen, and as can be confirmed using `diff`,
# zipfile and unzip produce the same result,
# while shutil silently skips paths containing "..",
# i.e. both names containing ".." and paths containing
# actual relative components.

# Diff.
$ cd /tmp/extract
$ diff -r zipfile/ unzip/

$ diff -r unzip/ shutil/
Only in unzip/: a
Only in unzip/: a..txt
Only in unzip/: a.txt
Only in unzip/: tmp

…to zipfile

shutil.unpack_archive fails, if file name contains '..'; zipfile handles everything correctly, i.e. in the same way than 'unzip'; let zipfile unpack archives, instead of reinventing the wheel here
@ghost
Copy link

ghost commented Nov 7, 2023

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Nov 7, 2023

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bedevere-app
Copy link

bedevere-app bot commented Nov 7, 2023

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@sepastian sepastian changed the title fixing #111791: delegating extraction to zipfile delegating extraction to zipfile, fixing #111791 Nov 7, 2023
@sepastian sepastian changed the title delegating extraction to zipfile, fixing #111791 gh-111791: delegating extraction to zipfile module's extractall() method Nov 7, 2023
@sepastian
Copy link
Author

This is a bug that needs to be fixed. Any progress on this?

@python-cla-bot
Copy link

python-cla-bot bot commented Apr 18, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants