-
Notifications
You must be signed in to change notification settings - Fork 1.3k
fast-path hash() for exact PyStr #6102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
str.__hash__ can’t be overridden, so for exact PyStr instances we skip the attribute lookup and MRO walk and call the string hash directly. This reduces overhead on hot paths (e.g. dict/set key hashing) while keeping subclass semantics intact (downcast_ref_if_exact).
WalkthroughPyObject::hash now dispatches via the class MRO to a type slot hash (cls.slots.hash) and invokes it as hash(self, vm) when found; if no slot is present, it raises TypeError: "unhashable type: ''". No public signatures changed. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant PyObject
participant TypeSystem
Note over PyObject,TypeSystem: Slot-based hash dispatch via MRO
Caller->>PyObject: hash(vm)
PyObject->>TypeSystem: mro_find_map(cls => cls.slots.hash.load())
alt hash slot found
TypeSystem-->>PyObject: hash slot
PyObject->>PyObject: call hash(self, vm)
PyObject-->>Caller: int
else no hash slot
TypeSystem-->>PyObject: None
PyObject-->>Caller: TypeError (unhashable type: '<ClassName>')
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: .coderabbit.yml 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
vm/src/protocol/object.rs (1)
645-649
: Nit: clarify the rationale in the comment and add a regression test for subclass override.The comment is correct but a bit terse. Suggest expanding it to capture why the fast-path is safe and how subclass semantics are preserved. Also, it would be good to add tests that: (1) a str subclass with a custom hash still uses the override, and (2) assigning to str.hash raises TypeError.
Apply this small comment tweak:
- // __hash__ function in str cannot be overwritten + // Fast-path: for exact str objects we can bypass attribute lookup/MRO walk and + // call PyStr::hash directly. The builtin str type is a static (immutable) type, + // so str.__hash__ is not settable; restricting to exact PyStr preserves subclass + // semantics (overrides on str subclasses still apply).Example tests to consider adding:
# tests/test_str_hash_fastpath.py import unittest class MyStr(str): def __hash__(self): return 42 class TestStrHashFastPath(unittest.TestCase): def test_exact_str_fast_path_semantics(self): s = "abc" self.assertEqual(hash(s), hash("abc")) # stable equality # sanity: repeated calls equal self.assertEqual(hash(s), hash(s)) def test_subclass_override_is_respected(self): t = MyStr("abc") self.assertEqual(hash(t), 42) def test_cannot_assign_str_hash(self): with self.assertRaises(TypeError): str.__hash__ = lambda self: 0
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
vm/src/protocol/object.rs
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)
**/*.rs
: Follow the default rustfmt code style (cargo fmt
to format)
Always run clippy to lint code (cargo clippy
) before completing tasks. Fix any warnings or lints that are introduced by your changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass
,pymodule
,pyfunction
, etc.) when implementing Python functionality in Rust
Files:
vm/src/protocol/object.rs
🔇 Additional comments (1)
vm/src/protocol/object.rs (1)
645-649
: Solid fast-path for exact PyStr hashing; semantics preserved.
- Using downcast_ref_if_exact ensures only builtin str objects take the fast path, so subclasses continue to respect their own hash.
- Directly calling PyStr::hash(vm) keeps the salted hash semantics via the VM and avoids the attribute lookup + MRO walk on a very hot path.
Thank you for contributing! I agree checking |
Thank you for your feedback. I intended to reduce to a minimum the number of calls and lock acquisitions. On your note, my proposal then is to remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
vm/src/protocol/object.rs (1)
645-652
: Optional: add exact PyStr fast path and check the immediate class slot before walking the MRO.
- Exact PyStr cannot override hash, so an early downcast avoids any MRO walk or slot fetch.
- Checking the immediate class slot first avoids scanning the full MRO for the common case where the class itself defines the hash slot.
- Use
new_type_error
for consistency with the rest of this file.Apply this diff:
- if let Some(hash) = self.class().mro_find_map(|cls| cls.slots.hash.load()) { - return hash(self, vm); - } + // Fast path 1: exact built-in str — cannot override __hash__, so we can + // bypass MRO/descriptor lookups entirely. + if let Ok(s) = self.to_owned().downcast_exact::<PyStr>(vm) { + return PyStr::hash(&s, vm); + } + // Fast path 2: check the immediate class slot before walking the MRO. + let cls = self.class(); + if let Some(hash) = cls.slots.hash.load() { + return hash(self, vm); + } + // Fallback: search along the MRO (RustPython doesn't inherit slots yet). + if let Some(hash) = cls.mro_find_map(|cls| cls.slots.hash.load()) { + return hash(self, vm); + } - return Err(vm.new_exception_msg( - vm.ctx.exceptions.type_error.to_owned(), - format!("unhashable type: '{}'", self.class().name()), - )); + return Err(vm.new_type_error(format!( + "unhashable type: '{}'", + self.class().name() + )));
📜 Review details
Configuration used: .coderabbit.yml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
vm/src/protocol/object.rs
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit Inference Engine (.github/copilot-instructions.md)
**/*.rs
: Follow the default rustfmt code style (cargo fmt
to format)
Always run clippy to lint code (cargo clippy
) before completing tasks. Fix any warnings or lints that are introduced by your changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass
,pymodule
,pyfunction
, etc.) when implementing Python functionality in Rust
Files:
vm/src/protocol/object.rs
🧬 Code Graph Analysis (1)
vm/src/protocol/object.rs (6)
vm/src/types/slot.rs (1)
hash
(973-973)vm/src/builtins/str.rs (2)
hash
(589-594)hash
(1533-1535)vm/src/builtins/tuple.rs (1)
hash
(421-423)vm/src/builtins/int.rs (1)
hash
(735-737)vm/src/builtins/memory.rs (1)
hash
(1032-1042)vm/src/stdlib/builtins.rs (1)
hash
(439-441)
🔇 Additional comments (1)
vm/src/protocol/object.rs (1)
645-652
: Slot-based hash dispatch via type slots is correct and removes attribute-lookup overhead.Using the hash slot via MRO aligns with CPython’s behavior and avoids the cost of attribute lookup/descriptors on hot paths. The TypeError fallback is also correct.
Reduce calls and lock acquisitions on hot paths by bypassing get_class_attr(__hash__) and directly resolving the hash implementation with mro_find_map(|cls| cls.slots.hash.load()).
Description
str.__hash__
can’t be overwritten, so for exactPyStr
instances we skip the attribute lookup and MRO walk and call the string hash directly. This reduces overhead on hot paths (e.g., dict/set key hashing) while keeping subclass semantics unchanged.in Python 3.13.5.
Summary by CodeRabbit