C++: New query for 'Cleartext transmission of sensitive information' #6713

geoffw0 · 2021-09-17T14:05:39Z

Add a new query 'Cleartext transmission of sensitive information' (cpp/cleartext-transmission). This fills a gap in our existing queries 'Cleartext storage of sensitive information in file' (cpp/cleartext-storage-file), 'Cleartext storage of sensitive information in buffer' (cpp/cleartext-storage-buffer) and 'Cleartext storage of sensitive information in an SQLite database' (cpp/cleartext-storage-database) - and allows us to catch the SAMATE Juliet tests cases for CWE-319.

Results on LGTM: https://lgtm.com/query/8348355846815574667/ --- decent number of results, without being noisy on any particular project. Most results look worthwhile, though it's hard to judge which if any are actual security vulnerabilities without digging deep into the design of the software in question. The alibaba/AliSQL is notable as a likely FP, as the path goes through a function encrypt_and_write_file, which presumably does what it says. The result in openwall/john also hints at encryption being present.

Results on SAMATE Juliet tests: https://jenkins.internal.semmle.com/job/Security/job/SAMATE/job/SAMATE-cpp-detailed/163/ --- we catch all of the CWE-319 tests, but unfortunately produce a large number of FPs on that same CWE.

Results on DCA: https://github.com/github/codeql-dca-main/issues/285 --- 448 results in the SAMATE Juliet test, 1 result in git. The latter looks like a reasonable result, and also shows up in the LGTM run (where it's easier to view).

I've given the query @precision medium for now, with the hopes of fixing most of the FPs in a follow-up and upgrading it to high.

…Ps in SAMATE Juliet.

MathiasVP

This looks really good! I'm pretty happy with the code as-is, but I'll give the rest of the team some time to review it as well.

MathiasVP · 2021-09-17T14:09:44Z

cpp/ql/src/Security/CWE/CWE-311/CleartextTransmission.ql

+ *
+ * note: functions such as `write` may be writing to a network source or a file. We could attempt to determine which, and sort results into `cpp/cleartext-transmission` and perhaps `cpp/cleartext-storage-file`. In practice it usually isn't very important which query reports a result as long as its reported exactly once.
+ */
+class NetworkSend extends NetworkSendRecv {


Could we replace this whole class with RemoteFlowSinkFunction?

Yes! Ish...

First, this class is a call to a RemoteFlowSinkFunction/RemoteFlowSourceFunction, not a RemoteFlowSinkFunction/RemoteFlowSourceFunction itself, so I'll still have to wrap those - unless there's a cleaner way to wire it up. But this does avoid duplicating the list of functions, which is a good thing.

Second, RemoteFlowSourceFunction doesn't have an equivalent of getSocketExpr() for ruling out reads from standard input. We could have RemoteFlowSourceFunction expose this information or even do the work to rule out these sources itself?

Third, RemoteFlowSourceFunction also includes getEnv (I'm not sure why, I thought the whole point was to be able to specify sources a bit more precisely). So ideally we'd use something finer that doesn't include that (but Recv is currently private).

Second, RemoteFlowSourceFunction doesn't have an equivalent of getSocketExpr() for ruling out reads from standard input. We could have RemoteFlowSourceFunction expose this information or even do the work to rule out these sources itself?

This seems like a thing we would want to do in RemoteFlowSourceFunction, yeah.

Third, RemoteFlowSourceFunction also includes getEnv (I'm not sure why, I thought the whole point was to be able to specify sources a bit more precisely). So ideally we'd use something finer that doesn't include that (but Recv is currently private).

I think getEnv is only included in LocalFlowSourceFunction, right? That's at least what I see here.

First, this class is a call to a RemoteFlowSinkFunction/RemoteFlowSourceFunction, not a RemoteFlowSinkFunction/RemoteFlowSourceFunction itself, so I'll still have to wrap those - unless there's a cleaner way to wire it up. But this does avoid duplicating the list of functions, which is a good thing.

We have RemoteFlowSource wrapping RemoteFlowSourceFunction in semmle.code.cpp.models.interfaces.FlowSource, but no equivalent for sinks.

Could we replace this whole class with RemoteFlowSinkFunction?
Could we replace this whole file with RemoteFlowSourceFunction?

I've updated the code to (more-or-less) do this.

The following are considered a RemoteFlowSourceFunction (# = not already modelled by NetworkRecv): recv, recvfrom, recvmsg, read, pread, readv, preadv, preadv2#, gets#, fgets#, fgetws#, getdelim#, getwdelim#, __getdelim#, fread#. Some of the new functions read from a socket, some from a FILE *, and I'm a bit concerned that we'll get overlaps with cpp/cleartext-storage-file from the latter. They're differently structured queries though so I'm not sure this will really be an issue in practice. Probably best to fix it when we see it.

The following are considered a RemoteFlowSinkFunction (all already modelled by NetworkSend): send, sendto, sendmsg, write, writev, pwritev, pwritev2. So no problems there.

We have RemoteFlowSource wrapping RemoteFlowSourceFunction in semmle.code.cpp.models.interfaces.FlowSource, but no equivalent for sinks.

This looks promising, but it's in the security.FlowSources lib which conflicts with dataflow.TaintTracking (and as you point out RemoteFlowSink would need to be added). This PR has enough in it that I won't do this change here, but if you feel strongly I can do it as a follow-up?

Second, RemoteFlowSourceFunction doesn't have an equivalent of getSocketExpr() for ruling out reads from standard input. We could have RemoteFlowSourceFunction expose this information or even do the work to rule out these sources itself?

This seems like a thing we would want to do in RemoteFlowSourceFunction, yeah.

It turns out RemoteFlowSourceFunction can't rule out stdin/stdout by itself, as it's modelling the function not the call and thus has no idea of parameter values. We could potentially do this work in an improved RemoteFlowSource class at some point. But for now I've added a hasSocketInput predicate to the two models so that users can tell which parameter is the socket (if any).

I think getEnv is only included in LocalFlowSourceFunction, right?

Correct, thanks.

TL;DR: I've updated the query to use the RemoteFlowSourceFunction and RemoteFlowSinkFunction models. I believe all points above have been addressed, apart from the possible follow-up PR improving RemoteFlowSource.

One of the reasons we generally prefer modelling Function rather than Call is that a Function model also applies directly to IR dataflow, whereas a Call model has to be translated. Of course, a Call model can have more details. The hybrid you've arrived at here LGTM.

It might be better (at least, less code) to join the NetworkSend and NetworkReceive parts at the Function level, then look at calls further down. I'll have a go at a refactor...

It doesn't make much difference to code quantity or clarity IMO. The awkward bit remains converting from a FunctionInput / FunctionOutput to an Expr or DataFlow::Node. Is there a reason we don't have predicates to do this in FunctionInputsAndOutputs.qll?

AFAIK there is no reason why we don't have this logic centralized in FunctionInputsAndOutputs.qll, no. I'll open an issue for it.

cpp/ql/src/Security/CWE/CWE-311/CleartextTransmission.ql

geoffw0 · 2021-09-24T15:51:36Z

As discussed elsewhere, the plan is to merge this after the feature freeze (Monday).

cpp/ql/src/Security/CWE/CWE-311/CleartextStorage.inc.qhelp

jbj · 2021-09-27T13:31:58Z

cpp/ql/src/Security/CWE/CWE-311/CleartextTransmission.ql

+      // a zero socket descriptor is standard input, which is not interesting for this query.
+      not exists(Zero zero |
+        DataFlow::localFlow(DataFlow::exprNode(zero),
+          DataFlow::exprNode(transmission.getSocketExpr()))
+      )


Isn't the same problem present on the transmission side? Writing to stdout and stderr would be file descriptors 1 and 2, respectively.

Yes absolutely. Addressing this is on my list of things to do in the CWE-319 ticket, but I don't think there's an existing library class to replace Zero with that will just do it. There is some configurability in Nullness.qll (that defines Zero), but it might be better to use GVN or dataflow instead.

I think it would be fine to exclude any constant-valued socket expression. The ones we care about here are always created from calling socket or similar functions, right?

Yes, I think that's sensible. Not only does it avoid a list of cases, any constant we're unaware of is likely to be something special we shouldn't make assumptions about.

jbj · 2021-09-27T13:35:43Z

cpp/ql/src/Security/CWE/CWE-311/CleartextTransmission.ql

+ *
+ * note: functions such as `write` may be writing to a network source or a file. We could attempt to determine which, and sort results into `cpp/cleartext-transmission` and perhaps `cpp/cleartext-storage-file`. In practice it usually isn't very important which query reports a result as long as its reported exactly once.
+ */
+class NetworkSend extends NetworkSendRecv {


One of the reasons we generally prefer modelling Function rather than Call is that a Function model also applies directly to IR dataflow, whereas a Call model has to be translated. Of course, a Call model can have more details. The hybrid you've arrived at here LGTM.

jbj · 2021-09-27T14:15:10Z

I really like the LGTM results for this query even though there are a few FPs. I'm guessing most projects don't realize they've vendored code that will do unencrypted auth to an HTTP server or a proxy server. Maybe that seemed fine 10 years ago.

Co-authored-by: Jonas Jensen <jbj@github.com>

geoffw0 · 2021-10-01T08:42:13Z

Fixed autoformat. There are a couple of things that came up in discussions to be addressed as follow-up, but I believe this PR itself is ready to merge.

MathiasVP

LGTM! (assuming it passes CI)

geoffw0 added 12 commits September 6, 2021 18:11

C++: Add CleartextTransmission query.

2463024

C++: Add basic test.

cd5a534

C++: Test dataflow and other slightly more complex cases.

29ad3bf

C++: Support 'send' as well.

1707d67

C++: Support various functions / variants.

3ba9e80

C++: Fix false positives involving STDIN_FILENO.

e696eaa

C++: Full dataflow version.

f58177f

C++: Upgrade to path problem.

ee7ccd7

C++: Add a test demonstrating taint.

0e8064d

C++: We get many more real world results using taint tracking.

67c6b35

C++: Assign precision and severity; medium for now, since there are F…

a3de94e

…Ps in SAMATE Juliet.

CPP: Fix QLDoc comments.

90bc138

geoffw0 added the C++ label Sep 17, 2021

geoffw0 requested a review from a team as a code owner September 17, 2021 14:05

github-actions bot added the documentation label Sep 17, 2021

C++: Change note.

5124345

MathiasVP reviewed Sep 17, 2021

View reviewed changes

geoffw0 added 2 commits September 17, 2021 16:04

Merge branch 'main' into cwe139

24668b2

C++: Accept subpaths in tests.

e7c82d7

geoffw0 force-pushed the cwe139 branch from 34e52eb to e7c82d7 Compare September 17, 2021 15:54

geoffw0 added 2 commits September 24, 2021 15:12

C++: Naive translation to use RemoteFlow*Function.

9f59bc8

C++: Add and use getRemoteSocket predicates.

6901d9d

jbj reviewed Sep 27, 2021

View reviewed changes

geoffw0 and others added 3 commits September 28, 2021 15:13

Update cpp/ql/src/Security/CWE/CWE-311/CleartextStorage.inc.qhelp

10323ac

Co-authored-by: Jonas Jensen <jbj@github.com>

C++: Correct comment.

89098f5

C++: Autoformat.

679b0f9

MathiasVP approved these changes Oct 1, 2021

View reviewed changes

MathiasVP merged commit a3cf721 into github:main Oct 1, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C++: New query for 'Cleartext transmission of sensitive information' #6713

C++: New query for 'Cleartext transmission of sensitive information' #6713

geoffw0 commented Sep 17, 2021

MathiasVP left a comment

MathiasVP Sep 17, 2021 •

edited

Loading

geoffw0 Sep 17, 2021

MathiasVP Sep 17, 2021 •

edited

Loading

rdmarsh2 Sep 17, 2021

geoffw0 Sep 24, 2021

jbj Sep 27, 2021

geoffw0 Sep 28, 2021

geoffw0 Sep 28, 2021

MathiasVP Oct 1, 2021

geoffw0 commented Sep 24, 2021

jbj Sep 27, 2021

geoffw0 Sep 28, 2021

MathiasVP Sep 28, 2021

geoffw0 Sep 28, 2021

jbj Sep 27, 2021

jbj commented Sep 27, 2021

geoffw0 commented Oct 1, 2021

MathiasVP left a comment

C++: New query for 'Cleartext transmission of sensitive information' #6713

C++: New query for 'Cleartext transmission of sensitive information' #6713

Conversation

geoffw0 commented Sep 17, 2021

MathiasVP left a comment

Choose a reason for hiding this comment

MathiasVP Sep 17, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MathiasVP Sep 17, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

geoffw0 commented Sep 24, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jbj commented Sep 27, 2021

geoffw0 commented Oct 1, 2021

MathiasVP left a comment

Choose a reason for hiding this comment

MathiasVP Sep 17, 2021 •

edited

Loading

MathiasVP Sep 17, 2021 •

edited

Loading