Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Deduplicate dataflow query results #14151

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

MathiasVP
Copy link
Contributor

This PR fixes the problem of duplicated dataflow results we've seen since we switched to IR-based dataflow.

To see why we get such duplicated results, consider the following query:

import cpp
import semmle.code.cpp.dataflow.new.DataFlow

module C implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node node) { node.asExpr().(Call).getTarget().hasName("source") }

  predicate isSink(DataFlow::Node node) {
    node.asExpr() = any(Call call | call.getTarget().hasName("sink")).getAnArgument()
  }
}

module MyDataFlow = TaintTracking::Global<C>;

from DataFlow::Node source, DataFlow::Node sink
where MyDataFlow::flow(source, sink)
select sink, ""

When running on:

int source();
void sink(long);

void test() {
  int x = source();
  sink(x); // Important: There's an int-to-long conversion here
}

we get two identical results. This is because both the dataflow node for the unconverted x and the dataflow node for the converted x map to the x expression.

This PR fixes this issue by ensuring that only the fully converted expression has a result for asExpr(). Calling asExpr() on a dataflow node gives the unconverted expression, and calling asConvertedExpr() gives the converted expression.

Commit-by-commit strongly encouraged (You can see the effects of each commit in the updated .expected files):

  • The first commit introduces a new predicate that we'll use to convert an instruction to an expression.
  • The second predicate naively replaces the use of Instruction::getConvertedResultExpression() with this new predicate.
  • The next couple of commits fixes various predicates to implement the logic around "only the fully converted instruction should have a result for asExpr.
  • Finally, the last two commits fixes a bunch of queries to use asExpr instead of asConvertedExpr, and removes manual deduplication. These workarounds are no longer needed.

DCA shows significant result deduplication 🎉. There are a couple of reported lost results, but the only ones that are genuinely lost are the two cpp/very-likely-overrunning-write on vim. This is a bit unfortunate, but I can't see a good way of bringing them back with the current semantics of unique.

I'll create a follow-up issue for this last part. I don't think it should block the PR, though.

@MathiasVP MathiasVP requested a review from a team as a code owner September 6, 2023 12:03
@github-actions github-actions bot added the C++ label Sep 6, 2023
@MathiasVP MathiasVP added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Sep 6, 2023
@MathiasVP
Copy link
Contributor Author

Note: This PR isn't green on CI because there are some internal tests that need to be updated. There's a backlink to the internal PR (for those with access) where the PR is green 🟢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ depends on internal PR This PR should only be merged in sync with an internal Semmle PR documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant