-
Notifications
You must be signed in to change notification settings - Fork 1.8k
C++: Use the shared type-tracking library for virtual dispatch resolution #20249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…' and slightly reorganize the code in preparation for the next commit.
…ke it's done in Java.
…spatch local flow predicate.
…accept test changes.
d8e7390
to
02bf923
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to take several passes, I think. Let's start with the below comment. I also think I need some explanation of what is going on in d4188d5, because the diff there is pretty much unreadable.
DataFlowPrivate::DataFlowCallable defaultViableCallable(DataFlowPrivate::DataFlowCall call) { | ||
result = defaultViableCallableWithoutLambda(call) | ||
or | ||
result = DataFlowImplCommon::viableCallableLambda(call, _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this extra case needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is coming from 42fcfca
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defaultViableCallable
should also resolve function pointers.
On main
we essentially have one way of dealing with both function pointer resolution and virtual dispatch resolution. That's the line marked as // Virtual dispatch
here (which uses the old library that we're deleting in this PR). Then in #17788 we added another way of handling function pointer resolution.
So on main
we currently have one way of dealing with virtual functions, and two ways of dealing with function pointers (i.e., the // Virtual dispatch
way and the lambda flow one I added in #17788). Sadly, only one of those function pointer resolutions were exposed in defaultViableCallable
(the one marked as // Virtual dispatch
).
This PR totally removes the old library that did both function pointer resolution and virtual dispatch, and instead we now only have one library for virtual dispatch, and one library for function pointers. So in order to continue to resolve function pointers in defaultViableCallable
we now need to expose the lambda flow resolution since that's the canonical way we now resolve function pointer calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not helping me, as it basically reiterates what is written in the PR description. Might be most effective to have a brief call about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally fair. Happy to have a chat about it tomorrow (or whenever it fits you)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Some history
C/C++ has it its own virtual/function pointer dispatch resolution code for a very long time. Most of the code was introduced in #2638 as a port of the even older resolution from the old
security.TaintTracking
library.Since then, we've gotten a number of shared libraries and improvements to existing shared libraries. In particular:
In #17788 we used 2. to track flow through function pointers, but we kept the old function pointer dispatch code. This PR does 1. while also removing the old code that we left around for doing function pointer resolution even after we merged #17788.
What this PR does
This PR does what I promised in #17788:
This PR gets rid of this old library and instead performs virtual dispatch resolution much like Java: We instantiate the shared type-tracking library and track the type of an allocation (or a cast-to-base-class conversion) to a qualifier of a virtual function call.
This means future dataflow improvements will improve our virtual dispatch resolution automagically since it's no longer a separate ad-hoc library.
Additionally, it means that virtual dispatch resolution now also has flow through global variables and supports field flow - neither of which we supported before!
Of course, the allocation can flow into (and out of) function calls. So the library needs some dispatch resolution logic in order to implement the dispatch resolution logic. Java solves this by sticking the whole library into a module that's parameterized by a dispatch resolution predicate, and then running the entire thing 6 times. Thus, we can resolve stuff like:
I tried playing with various number of rounds. There are some projects for which additional iterations yield new resolutions. I found 1 project for which we didn't reach a fixed point (i.e., another round of resolution didn't give us new targets) until iteration 15, but there was only 1 such project on MRVA. So I don't think it's worth computing more than what has found to be useful.
Results
The results are surprisingly boring! We lose 6
cpp/type-confusion
results on openjdk/jdk. All of these are FPs which came because the old library just dispatched to any override of a virtual function when resolving functions. Now, we actually track the dynamic type of the object, so in many cases we can resolve it precisely to the right target.The largest slowdown is on
erlang/otp
(about 24% analysis time). Nothing seems to be wrong when looking at the tuple counts. The shared type-tracking library just does lots of iterations, and luckily none of them look particularly scary.Commit-by-commit review recommended.