Skip to content

Conversation

@alexrford
Copy link
Contributor

@alexrford alexrford commented Feb 2, 2023

This mostly affects JS, but also changes some behaviour for Ruby and Python as well. This change should allow us to add the Python version of weak-cryptographic-algorithm as a (new) JS query.

The JS and Ruby/Python versions of CryptographicOperation were pretty similar, with 2 main differences:

  • getInput() in JS vs getAnInput() in Ruby/Python
  • JS lacked the BlockMode getBlockMode() predicate

I added a deprecated getInput() predicate to the shared concept for the first difference, and implemented getBlockMode() for the existing JS CryptographicOperation implementations. There were 10 implementations in CryptoLibraries.qll, but only 4 of these libraries seemed to support encryption using block ciphers.

There are a couple of other changes that affect all languages.

  • The first is the addition of a BlockMode#matchesString(string) predicate for convenience
  • The other is a pair of more significant change to make CryptographicAlgorithm#matchesName(string).
    • We now consider splitting the candidate name on underscores - e.g. asmCrypto uses "AES_OFB" as an algorithm name to specify AES encryption using the OFB block mode - previously matchesString would have just read this as "AESOFB" and not matched against AES
    • matchesName now only holds when the algorithm is the "most specific" matching algorithm - in practice this means the matching algorithm with the longest name. This is specifically for things like "SHA3_224" which has a valid match as both SHA3 and SHA3_224, but the latter is a more precise match. This is relevant for e.g. https://pycryptodome.readthedocs.io/en/latest/src/hash/sha3_224.html

@alexrford alexrford added JS Python no-change-note-required This PR does not need a change note Ruby labels Feb 2, 2023
@github-actions github-actions bot added the ATM label Feb 2, 2023
@alexrford alexrford marked this pull request as ready for review February 3, 2023 15:18
@alexrford alexrford requested a review from a team February 3, 2023 15:18
@alexrford alexrford requested review from a team as code owners February 3, 2023 15:18
@erik-krogh erik-krogh self-requested a review February 3, 2023 16:45
Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, with some minor comments.

But we need a loud change-note, and some more instructions on how to migrate to the new code (both in change-note and QLDoc).

/**
* An application of a cryptographic algorithm.
*/
abstract class CryptographicOperation extends DataFlow::Node {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is problematic.
You are changing an abstract class to a non abstract class that instead uses the ::Range pattern.

This will at minimum require a loud change-note.
CryptographicOperation is on the "global" scope in JavaScript, so we can't leave a deprecated class behind.
Anyone overridding the old CryptographicOperation class will get deprecation warnings, so I think it's good enough with a change-note.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a change note that hopefully explains the required changes to any external implementations of CryptographicOperation clearly enough.

DataFlow::Node getAnInput() { result = super.getAnInput() }

/** DEPRECATED. This predicate has been renamed to `getAnInput`. */
deprecated final DataFlow::Node getInput() { result = super.getInput() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause compile-errors for any external users that have extended this.
(Also see my former comment about CryptographicOperation and change-note).

I think we'll have to use breaking in the change-note.

I think the QLDoc should loudly explain what to do, which is both extend getAnInput instead, and make sure you extend CryptographicOperation::Range instead of CryptographicOperation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the QLDoc to mention this (the predicate itself has moved to Concepts.qll to avoid introducing a deprecated predicate to Ruby/Py as well). It's a little bit clumsy as the deprecation message has to read properly for both users and implementers of CryptographicOperation#getInput().

abstract DataFlow::Node getAnInput();

/** DEPRECATED. This predicate has been renamed to `getAnInput`. */
deprecated final DataFlow::Node getInput() { result = this.getAnInput() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CryptographicOperation::Range#getInput() has been removed - I don't think there's a need to include this as the ::Range class is new for JS.

@calumgrant calumgrant requested a review from RasmusWL February 6, 2023 10:29
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me 👍 Thanks for moving our languages closer to each other 💪

I'm wondering whether the changes in b968b59 to matchesName is important for anything, or it just seemed like a good idea? -- We've previously seen performance problems with the Python analysis when doing too much string manipulation, so that is why I'm interested in this bit in particular (and would at least like to see a performance evaluation before giving a final approval for this)

Comment on lines 46 to 47
/** DEPRECATED. This predicate has been renamed to `getAnInput`. */
deprecated final DataFlow::Node getInput() { result = super.getInput() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: We don't need to add this deprecated predicate to the shared file, it's possible to only have it part of JS like we've done for Ruby in the past (example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 it seems worth avoiding adding a deprecated predicate for all languages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this from Ruby/Py in c7aaad9. The explicit aliases feel a bit clumsy, but I'm not aware of a good way to avoid them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yes those aliases are a bit clumsy. I wonder if the new shadowing mechanism in QL would allow us to do it more nicely 🤔

* is a stream cipher rather than a block cipher.
*/
abstract BlockMode getBlockMode();
BlockMode getBlockMode() { none() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in other comments, it's possible to do this change of making it non-abstract for JS only.

My default stance on this is that it's nicer if the predicates you're supposed to implement are abstract, since that means you won't forget to do it. However, there is also part of me that is annoyed at having to implement getBlockMode for hashing cryptographic operations...

So I guess now is a good time to think about whether we want to change this to have a default of none() for all languages or not. What are your own thoughts on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been uncertain on this as the need to implement this for hashing operations is irritating. I'd briefly considered creating separate EncryptionOperation and HashingOperation (and maybe also PasswordHashingOperation) classes and only including the getBlockMode() predicate in EncryptionOperation, but I think this is maybe overcomplicating things since getAlgorithm() already works to restrict the operation to a particular kind of operation.

On reflection I think this change to have a default none() is a mistake. It's too easy to just forget to implement this predicate entirely if it's not abstract. I also don't think there's a need to make this change for JS specifically, since this is already a breaking change which will require changes to external implementations of CryptographicOperation.

My plan here is to change this back to being abstract (for all languages) and to improve the documentation on this to make it clearer when this should be implemented with a non-none() result (i.e. only when this is an encryption operation using a block cipher).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me 👍

@alexrford
Copy link
Contributor Author

alexrford commented Feb 15, 2023

I'm wondering whether the changes in b968b59 to matchesName is important for anything, or it just seemed like a good idea? -- We've previously seen performance problems with the Python analysis when doing too much string manipulation, so that is why I'm interested in this bit in particular (and would at least like to see a performance evaluation before giving a final approval for this)

There were some specific cases that I was trying to cover here. In asmCrypto (JS) we have:

asmCrypto.AES_OFB.encrypt(input, key, iv)

where the version of matchesName before this PR would try to match against the string AESOFB (as \w includes underscores) and would not match against AES as it should. I started to address this in 1435ef1, but found that it meant that Cryptodome (Py) had ambiguity in cases like:

from Cryptodome.Hash import SHA3_224

hasher = SHA3_224.new()

which would match against both SHA3 alone (split before the underscore) and SHA3224 (full string). Both are "correct", but including both means that getAlgorithm() doesn't necessarily have a unique result and it seems redundant to return SHA3 when SHA3224 is more precise, hence b968b59.

I'll start some DCA runs for all languages after a few changes are finalized here.

Copy link
Contributor

@erik-krogh erik-krogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JS 👍

@alexrford alexrford requested a review from RasmusWL February 17, 2023 11:52
@alexrford
Copy link
Contributor Author

@RasmusWL I finally got together 3 successful DCA runs - I didn't notice any performance differences, though admittedly the test suites don't have that many projects.

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Thanks for making the changes. Performance evaluation looked fine as well 👍

@alexrford alexrford merged commit 7c85448 into github:main Feb 27, 2023
@alexrford alexrford deleted the js-use-shared-cryptography branch February 27, 2023 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants