-
-
Notifications
You must be signed in to change notification settings - Fork 3k
More efficient (fixed-format) serialization #19668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This comment has been minimized.
This comment has been minimized.
Oh, hm I completely forgot 32-bit architectures are still a thing. Probably an easy fix, not sure I will have time to take a look at it today, but it should not stop any reviewers from reviewing. |
This comment has been minimized.
This comment has been minimized.
(Not a full review.) Nice, this is a major improvement! I measured a ~35% performance improvement in |
I did a quick review, and this seems good enough to merge (or at least close enough). Let's leave this open for another day in case somebody still wants to have a look. Before making this an official feature, there are some changes that I think would be nice to have (beyond a way of distributing this), but these are better done as follow-up PRs and I won't discuss them here. I also have some smaller comments that are easier for me to address in a follow-up PR than writing detailed review comments here. I also want to remove the JSON based format eventually, maybe a few releases after we've made this the default. This may require a way of converting the binary cache to JSON, since at least at Dropbox we have some tools that process mypy cache files, and processing a binary format in tools sounds pretty painful. I have some ideas about how to achieve this without making the serialization code significantly harder to maintain or less efficient. I'll write about this in a follow-up issue. |
Added link to related issue #3456 in summary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Waiting another day doesn't bring any value, so let's merge this soon. Left one comment, otherwise looks good. We can start iterating on this once this is merged.
mypy/main.py
Outdated
incremental_group.add_argument( | ||
"--fixed-format-cache", | ||
action="store_true", | ||
help="Use experimental binary fixed format cache", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hide the flag from --help
output (help=argparse.SUPPRESS
) until we've figured out how to distribute this properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hoped we will do this before next mypy release, but sure, we can expose this easily when ready.
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
Oh, I completely forgot about that issue, and I see you even posted a proposal there. It is interesting that I wrote something very similar to what you propose, few comments:
OK, yeah, this will definitely speed things up. |
This makes deserialization ~2.5x faster compared to
orjson
. This is fully functional, but still PoC in terms of distribution logic. Some comments:MYPY_USE_MYPYC=1
, this will install the extension automatically (for now)pip install mypyc/lib-rt
Some technical notes:
try/except
import blob inmypy/cache.py
is temporary, it is needed for now to be able to run tests without installing mypy itself (only withtest-requirements.txt
).complex
and/orNone
in some cases, but not in other cases.MypyFile
, which is special). There is no convention for few classes that are not types/symbols.native_internal.Buffer
(and possible more type in future fromnative
) for better/automatic method call specializations. If this feels wrong/risky, I can convert this to a more ad-hoc logic intransform_call_expr()
Related issue: #3456