Docker has (somewhat) recently been ported to Windows containers/Hyper-V. Layers are exported as TAR files with pax headers and, at least in the case of the base Windows layers, the tar contents are not listed/parsed by 7-Zip (but are properly parsed by Docker and by Python's tarfile module).
It appears pax headers are used to reference empty folders so that their Windows permissions can be included (under MSWINDOWS.rawsd), and 7-Zip is getting stuck on this(?). The pax header attribute path is used for the full name of a file due to the tar limitation of 100 bytes per filename (where needed, the tar filename is truncated and the path attribute is included).
I admit the possibility that these tar files are not perfectly valid, but they are in production, for better or worse, and should be unpacked properly as a result.
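For anyone who wants to reproduce this layout without pulling a Windows base layer, here is a minimal Python sketch using the standard tarfile module. It writes one directory entry whose name is longer than 100 bytes and carries a MSWINDOWS.rawsd pax attribute, then reads it back. The entry name, the helper names and the attribute value are made up for illustration (the value is a placeholder, not a real security descriptor), and I'm only assuming this resembles what Docker emits.

```python
import io
import tarfile

# Hypothetical directory name, deliberately longer than the 100-byte ustar name field.
LONG_NAME = "Files/" + "d" * 120
# Placeholder attribute value for illustration only, not a real security descriptor.
PLACEHOLDER_SD = "AQAEgA=="


def build_sample() -> bytes:
    """Write a pax-format tar holding one empty directory with an extra pax attribute."""
    info = tarfile.TarInfo(LONG_NAME)
    info.type = tarfile.DIRTYPE
    info.mtime = 1479648747
    info.pax_headers = {"MSWINDOWS.rawsd": PLACEHOLDER_SD}
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        tar.addfile(info)
    return buf.getvalue()


def read_members(raw: bytes):
    """Return (name, is_dir, pax_headers) for every member tarfile reports."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        return [(m.name, m.isdir(), dict(m.pax_headers)) for m in tar]
```

Calling read_members(build_sample()) gives a single directory member: the full name comes back from the pax path attribute, and the custom attribute is available in the member's pax_headers dictionary.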
Thanks.
Example (tar.gz) pulled from the docker log.
debug: Pulling sha256:bce2fbc256ea437a87dadac2f69aabd25bed4f56255549090056c1131fad0277 from foreign URL https://go.microsoft.com/fwlink/?linkid=837858
debug: Pulling sha256:cb1aafb7147372cc64faa070b94a893b8cd2e3de3a0e8001dc225c627d991c58 from foreign URL https://go.microsoft.com/fwlink/?linkid=867858
Example listing
Scanning the drive for archives:
1 file, 733859328 bytes (700 MiB)
Listing archive: nanoserver_10.0.14393
--
Path = nanoserver_10.0.14393
Type = tar
ERRORS:
Headers Error
WARNINGS:
There are data after the end of archive
Offset = 1024
Physical Size = 512
Tail Size = 733857792
Headers Size = 512
Code Page = UTF-8
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2016-11-20 13:32:27 D....            0            0  Files
------------------- ----- ------------ ------------  ------------------------
2016-11-20 13:32:27                  0            0  0 files, 1 folders
Warnings: 1
Errors: 1
Thanks.
There are two problems with these TARs:
1) ZEROs in the Modified Timestamp field. 7-Zip thinks that the header is incorrect and stops parsing. I'll fix that code in the next version of 7-Zip, so it will show all items in these TARs.
2) Some special (PAX ?) extension to the TAR format. It's more difficult to add support for these things. Maybe I'll return to them later.
While I haven't dug deep into these, as I understand it, each file comes prefixed with a pax extended header. (There's an IBM z/OS article on these, I've seen a Microsoft C# implementation, and there is a published spec.)
When iterating you're supposed to parse the pax header and apply its attributes to the next file, as some of them either cannot be represented in the standard tar/ustar header or extend what it can hold. I believe in this specific case the idea is that the tar header only allows an integer modification timestamp, but you may find mtime in the pax header specified as a float.
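To make the "parse the pax header and apply it to the next file" step concrete, here is a small Python sketch. As far as I understand the format, an extended header body is a sequence of records of the form "<length> <keyword>=<value>\n", where the decimal length counts the whole record including the length digits themselves. The function names are mine, and this is an illustration of the record layout rather than how any particular archiver implements it.

```python
def make_record(keyword: str, value: bytes) -> bytes:
    """Build one pax record; the length prefix counts itself."""
    base = len(keyword.encode("utf-8")) + len(value) + 3  # space, '=', newline
    total = base
    while base + len(str(total)) != total:
        total = base + len(str(total))
    return str(total).encode("ascii") + b" " + keyword.encode("utf-8") + b"=" + value + b"\n"


def parse_pax_records(data: bytes) -> dict:
    """Parse the body of an extended header into {keyword: raw value bytes}."""
    records = {}
    pos = 0
    # The body is NUL-padded to a 512-byte boundary, so stop at the first NUL.
    while pos < len(data) and data[pos] != 0:
        space = data.index(b" ", pos)
        length = int(data[pos:space].decode("ascii"))
        end = pos + length
        if length <= 0 or end > len(data) or data[end - 1:end] != b"\n":
            raise ValueError("malformed pax record")
        keyword, _, value = data[space + 1:end - 1].partition(b"=")
        records[keyword.decode("utf-8")] = value
        pos = end
    return records


def effective_mtime(ustar_mtime: int, records: dict) -> float:
    """Prefer the (possibly fractional) pax mtime over the integer ustar field."""
    if "mtime" in records:
        return float(records["mtime"].decode("ascii"))
    return float(ustar_mtime)
```

With this, a ustar header whose mtime field is zero can still yield a sensible timestamp as long as the preceding extended header carries an mtime record.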
I believe I've seen a suggestion that you're supposed to treat these as individual files/skip them if pax header processing is unavailable, so maybe, for now, presence of the header can be treated as a flag to handle invalid fields with defaults.
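A rough Python sketch of that fallback, under my own assumptions: walk the raw 512-byte headers, skip entries whose typeflag is 'x' or 'g' (the pax extended and global header types), and treat an empty or unparsable numeric field as a default instead of an error. The function names and the demo archive are hypothetical, and large values stored in non-octal encodings are not handled.

```python
import io
import tarfile

BLOCK = 512


def parse_octal(field: bytes, default: int = 0) -> int:
    """Parse a ustar octal field, falling back to a default when it is empty or invalid."""
    text = field.strip(b"\0 ")
    if not text:
        return default
    try:
        return int(text.decode("ascii"), 8)
    except ValueError:
        return default


def walk_skipping_pax(raw: bytes):
    """List (name, size, mtime) from the ustar headers only, ignoring pax headers."""
    entries = []
    pos = 0
    while pos + BLOCK <= len(raw):
        header = raw[pos:pos + BLOCK]
        if not header.strip(b"\0"):
            break  # an all-zero block marks the end of the archive
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        size = parse_octal(header[124:136])
        mtime = parse_octal(header[136:148])
        typeflag = header[156:157]
        # Advance past this header and its data, rounded up to whole blocks.
        pos += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK
        if typeflag in (b"x", b"g"):
            continue  # pax extended/global header: skipped rather than parsed
        entries.append((name, size, mtime))
    return entries


def make_demo_archive(name: str, payload: bytes, mtime: int) -> bytes:
    """Build a small pax-format archive in memory for trying out the walker."""
    info = tarfile.TarInfo(name)
    info.size = len(payload)
    info.mtime = mtime
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

The trade-off is visible right away: with a name longer than 100 bytes, this walker lists the file but only with the truncated ustar name, because the full name lives in the skipped pax path attribute.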
Thanks.
Last edit: Cats 2018-03-14
Is there any timeline for the inclusion of full POSIX.1-2001 (pax) support in 7-zip? I just tested 19.00, and indeed PAX headers are still decompressed as extra files and not parsed, and the archive is just treated as ustar.
To be clear, it isn't some "special" format, it is the portable, interoperable modern POSIX standard, used or recommended by default with many tools. The spec was published nearly a decade and a half ago, the other major comparable archivers (GNU tar, bsdtar/libarchive, star, Python tarfile, etc.) all added it over a decade ago, and there are active plans in Python 3.8 to finally switch to pax by default, with GNU tar indicating for many years now that they plan to do so as well.
Of course, I don't want to see 7-zip/p7zip left behind the rest, nor an increasing volume of your users reporting issues trying to open the rising tide of tars using the modern pax format, so do you have any updates on the progress of adding full support? Thanks!
This seemingly obscure issue is now a major problem for me, because I'm seeing it in the TAR.GZ archives for all recent versions of MediaWiki (1.34.3, 1.34.4, 1.35). MediaWiki has started recommending that Windows users AVOID 7-Zip because of this: https://www.mediawiki.org/wiki/Special:MyLanguage/Download
Just to follow up, we did switch to the modern POSIX.1-2001 (pax) TAR format as the default in Python 3.8, which is now the current stable version, so as Python 3.8+ steadily gains adoption, this is increasingly going to become an issue for any of your users who consume TARs produced by vendors (e.g. build toolchains) that rely on it. No idea if this is the case for MediaWiki, but as archivers and distributors in general continue to adopt the more modern, standardized format, this problem is only going to get more serious with time.
I was pointed to this report by the Eclipse support. All Eclipse installer archives since 2018-10 are affected by this problem.