Skip to content

C++: Use the shared type-tracking library for virtual dispatch resolution #20249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Aug 21, 2025

Conversation

MathiasVP
Copy link
Contributor

@MathiasVP MathiasVP commented Aug 19, 2025

Some history

C/C++ has it its own virtual/function pointer dispatch resolution code for a very long time. Most of the code was introduced in #2638 as a port of the even older resolution from the old security.TaintTracking library.

Since then, we've gotten a number of shared libraries and improvements to existing shared libraries. In particular:

  1. Shared: Add a qlpack with a parameterized module defining type-trackers. #11521 introduced a shared type-tracking library
  2. Data flow: Move C# lambda flow logic into shared library #5311 added support for tracking lambdas

In #17788 we used 2. to track flow through function pointers, but we kept the old function pointer dispatch code. This PR does 1. while also removing the old code that we left around for doing function pointer resolution even after we merged #17788.

What this PR does

This PR does what I promised in #17788:

Ideally, I would have liked to also remove the old function pointer resolution and virtual dispatch code, but since this is such a small change I think we should just get this improvement in now.

This PR gets rid of this old library and instead performs virtual dispatch resolution much like Java: We instantiate the shared type-tracking library and track the type of an allocation (or a cast-to-base-class conversion) to a qualifier of a virtual function call.

This means future dataflow improvements will improve our virtual dispatch resolution automagically since it's no longer a separate ad-hoc library.

Additionally, it means that virtual dispatch resolution now also has flow through global variables and supports field flow - neither of which we supported before!

Of course, the allocation can flow into (and out of) function calls. So the library needs some dispatch resolution logic in order to implement the dispatch resolution logic. Java solves this by sticking the whole library into a module that's parameterized by a dispatch resolution predicate, and then running the entire thing 6 times. Thus, we can resolve stuff like:

struct Base {
  Base* getNext() { /* ... */ }
};

struct Derived1 : Base {
  Base* getNext() override  { return new Derived2; }
};

struct Derived2 : Base { ... }
// etc.

...
Base* b1 = new Derived1;
Base* b2 = b1->getNext(); // this is resolved after 1 round of dispatch resolution
Base* b3 = b2->getNext(); // this is resolved after 2 round of dispatch resolution
Base* b4 = b3->getNext(); // this is resolved after 3 round of dispatch resolution
// etc.

I tried playing with various number of rounds. There are some projects for which additional iterations yield new resolutions. I found 1 project for which we didn't reach a fixed point (i.e., another round of resolution didn't give us new targets) until iteration 15, but there was only 1 such project on MRVA. So I don't think it's worth computing more than what has found to be useful.

Results

The results are surprisingly boring! We lose 6 cpp/type-confusion results on openjdk/jdk. All of these are FPs which came because the old library just dispatched to any override of a virtual function when resolving functions. Now, we actually track the dynamic type of the object, so in many cases we can resolve it precisely to the right target.

The largest slowdown is on erlang/otp (about 24% analysis time). Nothing seems to be wrong when looking at the tuple counts. The shared type-tracking library just does lots of iterations, and luckily none of them look particularly scary.

Commit-by-commit review recommended.

@MathiasVP MathiasVP force-pushed the type-tracking-for-cpp-3 branch from d8e7390 to 02bf923 Compare August 19, 2025 11:57
@MathiasVP MathiasVP marked this pull request as ready for review August 19, 2025 12:38
@MathiasVP MathiasVP requested a review from a team as a code owner August 19, 2025 12:38
@Copilot Copilot AI review requested due to automatic review settings August 19, 2025 12:38
Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to take several passes, I think. Let's start with the below comment. I also think I need some explanation of what is going on in d4188d5, because the diff there is pretty much unreadable.

DataFlowPrivate::DataFlowCallable defaultViableCallable(DataFlowPrivate::DataFlowCall call) {
result = defaultViableCallableWithoutLambda(call)
or
result = DataFlowImplCommon::viableCallableLambda(call, _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this extra case needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is coming from 42fcfca

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

defaultViableCallable should also resolve function pointers.

On main we essentially have one way of dealing with both function pointer resolution and virtual dispatch resolution. That's the line marked as // Virtual dispatch here (which uses the old library that we're deleting in this PR). Then in #17788 we added another way of handling function pointer resolution.

So on main we currently have one way of dealing with virtual functions, and two ways of dealing with function pointers (i.e., the // Virtual dispatch way and the lambda flow one I added in #17788). Sadly, only one of those function pointer resolutions were exposed in defaultViableCallable (the one marked as // Virtual dispatch).

This PR totally removes the old library that did both function pointer resolution and virtual dispatch, and instead we now only have one library for virtual dispatch, and one library for function pointers. So in order to continue to resolve function pointers in defaultViableCallable we now need to expose the lambda flow resolution since that's the canonical way we now resolve function pointer calls.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not helping me, as it basically reiterates what is written in the PR description. Might be most effective to have a brief call about this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally fair. Happy to have a chat about it tomorrow (or whenever it fits you)

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link
Contributor

@jketema jketema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MathiasVP MathiasVP merged commit dfda5a0 into github:main Aug 21, 2025
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants