Description
Feature or enhancement
Proposal:
tl;dr
I'd like __spec__ to be passed through by importlib.util._LazyModule.__getattribute__ without triggering the full load of the module. That way, the regular, internal import machinery doesn't accidentally trigger the full load when fishing the lazy module out of sys.modules. This can be caused by re-(lazy-)importing the module, and the result, a fully loaded module, is pretty unexpected.
Full Story
I've been trying to use importlib.util.LazyLoader lately and found a way it could be made more ergonomic.
To start off, a demonstration of what I want to work, based on the lazy import recipe in the importlib docs:
import importlib.util
import sys

def lazy_import(name):
    # Personal addition to take advantage of the module cache.
    try:
        return sys.modules[name]
    except KeyError:
        pass

    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

lazy_typing = lazy_import("typing")

# Let's import it a second time before actually using it here.
# This could even happen in another file. Ideally, *still* doesn't execute yet
# because we pull from the sys.modules cache.
lazy_typing = lazy_import("typing")

lazy_typing.TYPE_CHECKING  # Only *now* does the actual module execute.
The above recipe works, but without the sys.modules caching I added, the second import would cause the module to execute and populate, even though the user hasn't gotten an attribute from it yet.
Fair enough, it's just a recipe for the docs. It's not meant to cover all the edge cases and use cases. What about a different code snippet that tries not to manually perform every part of the import process, though?
Let's try again, but using an import hook like, say, a custom finder on the meta path to wrap the found spec's loader with LazyLoader. That way, it'll take advantage of all the thread locks importlib uses internally and can even affect normal import statements. Here's an example:
# NOTE: This is not as robust as it could be, but it serves well enough for demonstration.
import importlib.util
import sys

# threading is needed due to circular import issues from importlib.util importing it while
# LazyFinder is on the meta path. Not relevant to this issue.
import threading


class LazyFinder:
    """A module spec finder that wraps a spec's loader, if it exists, with LazyLoader."""

    @classmethod
    def find_spec(cls, fullname: str, path=None, target=None, /):
        for finder in sys.meta_path:
            if finder is not cls:
                spec = finder.find_spec(fullname, path, target)
                if spec is not None:
                    break
        else:
            raise ModuleNotFoundError(...)

        if spec.loader is not None:
            spec.loader = importlib.util.LazyLoader(spec.loader)

        return spec


class LazyFinderContext:
    """Temporarily "lazify" some types of import statements in the runtime context."""

    def __enter__(self):
        if LazyFinder not in sys.meta_path:
            sys.meta_path.insert(0, LazyFinder)

    def __exit__(self, *exc_info):
        try:
            sys.meta_path.remove(LazyFinder)
        except ValueError:
            pass


lazy_finder = LazyFinderContext()

with lazy_finder:
    import typing  # Does the same thing as the earlier snippet, but for a normal import statement.
Unfortunately, the above code has the same flaw as the original importlib recipe when used directly: the module cache isn't being taken advantage of. This time, however, the flaw can't be worked around from user code without a ton of copying.
Adding import typing again at the bottom will cause typing to get fully executed. This is demonstrable in two ways:
- Adding print statements checking the type of the module:

    ...

    with lazy_finder:
        import typing

    print(type(typing))

    # Doesn't matter if we're using the context manager again or not, the result is the same.
    # with lazy_finder:
    import typing

    print(type(typing))

  Output:

    > python scratch.py
    <class 'importlib.util._LazyModule'>
    <class 'module'>
- By putting the above code snippet in a file (e.g. scratch.py), then checking the output of python -X importtime -c "import scratch" before and after adding a second import typing statement:

  Before

    import time: self [us] | cumulative | imported package
    ...
    import time:       566 |       2035 | site
    ...
    import time:       196 |        196 | _weakrefset
    import time:       622 |       3045 | threading
    import time:        86 |         86 | typing
    import time:      1569 |       5583 | scratch

  After

    import time: self [us] | cumulative | imported package
    ...
    import time:       504 |       1868 | site
    ...
    import time:      1127 |       3662 | threading
    import time:        75 |         75 | typing
    ...
    import time:       394 |       3374 | re
    import time:        53 |         53 | _typing
    import time:      3539 |      12346 | scratch
The reason for what is, in my eyes, a lack of correspondence between the two imports is a small implementation detail:
cpython/Lib/importlib/_bootstrap.py
Lines 1348 to 1355 in 1c0a104

def _find_and_load(name, import_):
    """Find and load the module."""

    # Optimization: we avoid unneeded module locking if the module
    # already exists in sys.modules and is fully initialized.
    module = sys.modules.get(name, _NEEDS_LOADING)
    if (module is _NEEDS_LOADING or
        getattr(getattr(module, "__spec__", None), "_initializing", False)):
Because the __spec__ is requested even when checking the module cache, and importlib.util._LazyModule makes no exceptions for attribute requests, well, the original loader will always execute and the module will populate. To get around this, a user would have to copy importlib.util._LazyModule and importlib.util.LazyLoader, modify them (see suggested patch below), and use those local versions instead.
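A rough way to check this on a given interpreter is to run the same __spec__ probe by hand against a freshly created lazy module. The helper name fresh_lazy_module and the choice of colorsys are just illustrative assumptions; the result of the probe is recorded rather than assumed, since it depends on whether the installed Python already passes __spec__ through.

```python
import importlib.util
import sys
import types


def fresh_lazy_module(name):
    """Docs recipe without the sys.modules shortcut (hypothetical helper name)."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


# colorsys is an arbitrary small stdlib module that is unlikely to be imported already.
mod = fresh_lazy_module("colorsys")

# type() does not go through __getattribute__, so it is safe to inspect.
is_lazy_before = type(mod).__name__ == "_LazyModule"

# The same probe that _find_and_load performs on a cached module.
getattr(getattr(mod, "__spec__", None), "_initializing", False)
loaded_by_spec_probe = type(mod) is types.ModuleType

# Any ordinary attribute access performs the real load.
mod.rgb_to_hsv
is_loaded_after = type(mod) is types.ModuleType

print("loaded by __spec__ probe:", loaded_by_spec_probe)
```

On an interpreter with the behavior described in this issue, the printed value should be True; with the proposed special case it would be False.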
Thus, I propose adding a small special case within _LazyModule to make usage with sys.modules more ergonomic: if __spec__ is requested, just return that without loading the whole module yet. That way, instances of _LazyModule within sys.modules won't be forced to resolve by the regular import machinery; they resolve only once the module is visibly accessed by the user. The diff would be quite small:
--- current_3.14.py 2024-11-19 15:35:57.218717430 -0500
+++ modified_3.14.py 2024-11-19 15:36:21.608717512 -0500
@@ -171,6 +171,10 @@
     def __getattribute__(self, attr):
         """Trigger the load of the module and return the attribute."""
         __spec__ = object.__getattribute__(self, '__spec__')
+
+        if "__spec__" == attr:
+            return __spec__
+
         loader_state = __spec__.loader_state
         with loader_state['lock']:
             # Only the first thread to get the lock should trigger the load
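To illustrate the intended effect without patching importlib, here is a minimal sketch using a toy stand-in. ToyLazyModule, its loader_state layout, and the ANSWER attribute are all made up for the example and are not the real _LazyModule implementation (no locking, no class swap); only the __spec__ special case mirrors the diff.

```python
import types
from importlib.machinery import ModuleSpec


class ToyLazyModule(types.ModuleType):
    """Hypothetical stand-in for _LazyModule with the proposed special case."""

    def __getattribute__(self, attr):
        spec = object.__getattribute__(self, "__spec__")

        # Proposed special case: hand back the spec without loading.
        if attr == "__spec__":
            return spec

        state = spec.loader_state
        if not state["loaded"]:
            # Stand-in for executing the real module: populate the namespace once.
            state["loaded"] = True
            object.__getattribute__(self, "__dict__").update(state["namespace"])
        return object.__getattribute__(self, attr)


state = {"loaded": False, "namespace": {"ANSWER": 42}}
module = ToyLazyModule("toy")
module.__spec__ = ModuleSpec("toy", None, loader_state=state)

# The probe from _find_and_load no longer forces the load...
initializing = getattr(getattr(module, "__spec__", None), "_initializing", False)
loaded_after_probe = state["loaded"]

# ...while a user-visible attribute access still does.
answer = module.ANSWER
loaded_after_access = state["loaded"]
```

With this shape, the cache check in the import machinery leaves the module untouched, and the first real attribute access is what triggers the load.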
I hope this makes sense and isn't too long-winded.
EDIT: Added a tl;dr at the top.
EDIT2: Added an easier way to demonstrate the full load being triggered.
EDIT3: Adjusted phrasing.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response