Skip to content

Data flow: Move C# lambda flow logic into shared library #5311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 19, 2021

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented Mar 2, 2021

This PR adds support for resolving lambda calls to the shared data-flow library. The interface to be implemented by each language is

/** Holds if `creation` is an expression that creates a lambda for `c`. */
predicate lambdaCreation(Node creation, DataFlowCallable c) { ... }

/** Holds if `call` is a lambda call where `receiver` is the lambda expression. */
predicate lambdaCall(DataFlowCall call, Node receiver) { ... }

/** Extra data-flow steps needed for lamba flow analysis. */
predicate additionalLambdaFlowStep(Node nodeFrom, Node nodeTo, boolean preservesValue) { ... }

with the semantics that call will resolve to c if there is a data-flow path from creation to receiver.

The implementation is, for now, quite simple (ported from the existing C# implementation): read/store steps and flow-through is not taken into account. However, higher-order flow is supported, which makes the data-flow analysis non-linearly recursive. Adding support for contents and flow-through is postponed as future work.

The implementation keeps track of a one-level call context, which means that we are able to handle situations like this:

Apply(f, x) { f(x); }

Apply(x => NonSink(x), "tainted"); // GOOD

Apply(x => Sink(x), "not tainted"); // GOOD

However, since we only track one level the following example will have false-positive flow:

Apply(f, x) { f(x); }

ApplyWrapper(f, x) { Apply(f, x) }

ApplyWrapper(x => NonSink(x), "tainted"); // GOOD (FALSE POSITIVE)

ApplyWrapper(x => Sink(x), "not tainted"); // GOOD (FALSE POSITIVE)

Commit-by-commit review is strongly encouraged.

https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/914/
https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1786/
https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/1201/
https://jenkins.internal.semmle.com/job/Changes/job/Python-Differences/445/

https://jenkins.internal.semmle.com/job/Changes/job/CSharp-Differences/946/
https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1839/
https://jenkins.internal.semmle.com/job/Changes/job/Java-Differences/1227/
https://jenkins.internal.semmle.com/job/Changes/job/Python-Differences/464/

@github-actions github-actions bot added the C# label Mar 2, 2021
@hvitved hvitved marked this pull request as ready for review March 5, 2021 07:53
@hvitved hvitved requested review from a team as code owners March 5, 2021 07:53
@hvitved hvitved added the no-change-note-required This PR does not need a change note label Mar 5, 2021
@aschackmull
Copy link
Contributor

In Java, lambdas are just a special case of construction of anonymous classes. This means that the implicit assumption that a lambda object only has one method that can be called breaks for Java, so the interface needs to account for that somehow.

@jbj
Copy link
Contributor

jbj commented Mar 5, 2021

The 4% increases in analysis time for two projects in https://jenkins.internal.semmle.com/job/Changes/job/CPP-Differences/1786/ appear to be just noise: they're attributed to queries without data flow.

@hvitved
Copy link
Contributor Author

hvitved commented Mar 5, 2021

In Java, lambdas are just a special case of construction of anonymous classes. This means that the implicit assumption that a lambda object only has one method that can be called breaks for Java, so the interface needs to account for that somehow.

I did consider having an extra "call kind" column in both lambdaCreation and lambdaCall. Would you rather have this already on this PR, or once we adopt it for Java?

@aschackmull
Copy link
Contributor

did consider having an extra "call kind" column in both lambdaCreation and lambdaCall. Would you rather have this already on this PR, or once we adopt it for Java?

I'm not entirely sure. I missed this in our last round of coordination, so I haven't thought a lot about what it would look like.

@hvitved
Copy link
Contributor Author

hvitved commented Mar 16, 2021

I have rebased this PR, added a "call kind" column, and restarted difference jobs.

Comment on lines +2072 to +2082
exists(Ssa::Definition def |
LocalFlow::localSsaFlowStep(def, nodeFrom, nodeTo) and
LocalFlow::usesInstanceField(def) and
preservesValue = true
)
or
exists(LambdaConfiguration x, DelegateCreation dc |
x.hasExprPath(dc.getArgument(), nodeFrom.(ExprNode).getControlFlowNode(), dc,
nodeTo.(ExprNode).getControlFlowNode()) and
preservesValue = false
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these two steps? I can guess what the event-step below is for, but these two are somewhat opaque to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first step is for including SSA with fields, because field flow is not yet modelled in the data flow loop. The second consists of two steps that are specific to delegates.

Copy link
Contributor

@aschackmull aschackmull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shared bits LGTM. I have not reviewed the C# changes.

@hvitved
Copy link
Contributor Author

hvitved commented Mar 18, 2021

The shared bits LGTM. I have not reviewed the C# changes.

@tamasvajk : Could you take a look at the C# bits?

Copy link
Contributor

@tamasvajk tamasvajk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly checked the test result changes, they seem correct. The diff job shows a couple of new issues (cs/cleartext-storage-of-sensitive-information, cs/user-controlled-bypass) that doesn't seem to be caused by simple wobbliness. Are those true positives?

@hvitved
Copy link
Contributor Author

hvitved commented Mar 19, 2021

I mostly checked the test result changes, they seem correct. The diff job shows a couple of new issues (cs/cleartext-storage-of-sensitive-information, cs/user-controlled-bypass) that doesn't seem to be caused by simple wobbliness. Are those true positives?

This is caused by 25adcfc, which adds missing flow from parameters into SSA phi nodes.

@hvitved hvitved removed request for a team March 19, 2021 10:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C# C++ Java no-change-note-required This PR does not need a change note Python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants