Skip to content

Conversation

@jketema
Copy link
Contributor

@jketema jketema commented Mar 18, 2022

Rework cpp/command-line-injection to use flow states. Only thing I'm not sure about is whether the location of a dataflow node uniquely determines the node, which I depend upon:

class ExecState extends DataFlow::FlowState {
  DataFlow::Node fst;
  DataFlow::Node snd;

  ExecState() {
    this = "ExecState (" + fst.getLocation() + ", " + snd.getLocation() + ")" and
    interestingConcatenation(fst, snd)
  }
...

@jketema jketema requested a review from rdmarsh2 March 18, 2022 19:10
@jketema jketema requested a review from a team as a code owner March 18, 2022 19:10
@github-actions github-actions bot added the C++ label Mar 18, 2022
@rdmarsh2
Copy link
Contributor

whether the location of a dataflow node uniquely determines the node, which I depend upon

It doesn't - there's a lot of cases with pointer arguments where we have input and output nodes for the pointed-to values that share a location with the argument.

@jketema
Copy link
Contributor Author

jketema commented Mar 18, 2022

It doesn't - there's a lot of cases with pointer arguments where we have input and output nodes for the pointed-to values that share a location with the argument.

Good to know. Then the question is, whether it's unique enough for what we're doing here, or whether there's something that better identifies a data flow node.

@hvitved
Copy link
Contributor

hvitved commented Mar 21, 2022

It doesn't - there's a lot of cases with pointer arguments where we have input and output nodes for the pointed-to values that share a location with the argument.

Good to know. Then the question is, whether it's unique enough for what we're doing here, or whether there's something that better identifies a data flow node.

Note that in this case it will only be a problem if you have two snd nodes with the same location that are also in interestingConcatenation(_, snd), which is probably unlikely in practice? If not, appending the toString() representation to the location may be enough to disambiguate.

select sinkAsArgumentIndirection(sinkNode.getNode()), TPathNode1(sourceNode).(MergedPathNode),
TPathNode2(sinkNode).(MergedPathNode),
shellCommand(sinkAsArgumentIndirection(sinkNode.getNode()), callChain) and
concatResult = sinkNode.getState().(ExecState).getSndNode()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you only care about the snd node, there should be no need to record the fst node in the flow state as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I'm probably unnecessarily conservative there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I retract my statement. Turns out there is a case where fst does matter. See the test19 I added. When I omit fst, 3 paths instead of 2 paths are reported involving l. 220. The same happens when I move interestingConcatenation from ExecState to isAdditionalTaintStep.

@jketema
Copy link
Contributor Author

jketema commented Mar 21, 2022

If not, appending the toString() representation to the location may be enough to disambiguate.

Indeed, also because that + the location is what the user eventually will see.

@MathiasVP
Copy link
Contributor

MathiasVP commented Mar 21, 2022

whether the location of a dataflow node uniquely determines the node, which I depend upon

It doesn't - there's a lot of cases with pointer arguments where we have input and output nodes for the pointed-to values that share a location with the argument.

And more generally, PostUpdateNodes piggyback on their pre update nodes for locations as well.

If the idOf trick worked on IPA types we could use that here, but sadly that feature is still unavailable.

If not, appending the toString() representation to the location may be enough to disambiguate.

Indeed, also because that + the location is what the user eventually will see.

Yeah, that's probably good enough. A couple of notes on this:

  • Many nodes will definitely have the same toString because we try quite hard to reduce the number of unique strings for performance reasons, but when combined with locations it's probably okay.
  • We don't like having query logic depend on the implementation of the toString. We even have a ql-for-ql query for this. So it's a bit of a shame that we have to do this.

@jketema
Copy link
Contributor Author

jketema commented Mar 21, 2022

DCA Wireshark observations without any of the discussed changes around toString:

  • cpp/cleartext-storage-buffer and cpp/command-line-injection get faster. The two DCA experiments show about the same numbers here, although the data from the second experiment if very messy.
  • cpp/tainted-format-string get slower (goes from 12 to 20 seconds)

Also:

  • The false positives in vim that are due to the execution stacks not matching up on the old version of the query disappear

@jketema
Copy link
Contributor Author

jketema commented Mar 21, 2022

Many nodes will definitely have the same toString because we try quite hard to reduce the number of unique strings for performance reasons, but when combined with locations it's probably okay.

Updated accordingly.

We don't like having query logic depend on the implementation of the toString. We even have a ql-for-ql query for this. So it's a bit of a shame that we have to do this.

Is this run on PRs? I didn't explicitly write toString() and don't see a warning.

@MathiasVP
Copy link
Contributor

Is this run on PRs? I didn't explicitly write toString() and don't see a warning.

It's only at @precision medium - so it's not part of the Code Scanning suite.

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I accidentally committed my own suggestion. Sorry about that!)

In any case, this LGTM!

@MathiasVP
Copy link
Contributor

The Code Scanning errors are unrelated. So I'll just go ahead and merge this.

@MathiasVP MathiasVP merged commit 0eab54d into github:main Mar 23, 2022
@jketema jketema deleted the command-line-injection-with-flow-state branch April 8, 2022 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants