Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] Switch to PostStmt callbacks in ArrayBoundV2 #72107

Merged
merged 8 commits into from
Dec 5, 2023

Conversation

NagyDonat
Copy link
Contributor

@NagyDonat NagyDonat commented Nov 13, 2023

...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check array[index].field while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object). This improvement fixes PR #70187.

Note that after this change &array[idx] will be handled as an access to the idxth element of array, which is technically incorrect but matches the programmer intuitions. In my opinion it's more helpful if the report points to the source location where the indexing happens (instead of the location where a pointer is finally dereferenced).

This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as &arr[length]. I think this is just unimportant pedantery (because it's cleaner to write the past-the-end pointer as (arr+length) and that form isn't affected by this checker), but if it does appear in real code, then we could introduce a special case for it.

In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases.

The main change of this commit (replacing check::Location with check::PostStmt<...> callbacks) was already proposed in my change https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by steakhal. Those reviews were both abandoned, but the problems that led to abandonment were unrelated to the change that is introduced in this PR.

...instead of the currently used, more abstract Location callback. The
main advantage of this change is that after it the checker will check
`array[index].field` while the previous implementation ignored this
situation (because here the ElementRegion is wrapped in a FieldRegion
object). This improvement fixes PR llvm#70187.

Note that after this change `&array[idx]` will be handled as an access
to the `idx`th element of `array`, which is technically incorrect but
matches the programmer intuitions. In my opinion it's more helpful if
the report points to the source location where the indexing happens
(instead of the location where a pointer is finally dereferenced).

This change introduces false positives in the exceptional corner case
when the past-the-end pointer of an array is formed as `&arr[length]`. I
think this is just unimportant pedantery (because it's cleaner to write
the past-the-end pointer as `(arr+length)` and that form isn't affected
by this checker), but if it does appear in real code, then we could
introduce a special case for it.

In addition to this primary improvement, this change tweaks the message
for the tainted index/offset case (using the more concrete information
that's available now) and clarifies/improves a few testcases.

The main change of this commit (replacing `check::Location` with
`check::PostStmt<...>` callbacks) was already proposed in my change
https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by
@steakhal. Those reviews were both abandoned, but the issues were
unrelated to the change that is introduced in this PR.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html labels Nov 13, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Nov 13, 2023

@llvm/pr-subscribers-clang

Author: None (DonatNagyE)

Changes

...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check array[index].field while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object). This improvement fixes PR #70187.

Note that after this change &amp;array[idx] will be handled as an access to the idxth element of array, which is technically incorrect but matches the programmer intuitions. In my opinion it's more helpful if the report points to the source location where the indexing happens (instead of the location where a pointer is finally dereferenced).

This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as &amp;arr[length]. I think this is just unimportant pedantery (because it's cleaner to write the past-the-end pointer as (arr+length) and that form isn't affected by this checker), but if it does appear in real code, then we could introduce a special case for it.

In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases.

The main change of this commit (replacing check::Location with check::PostStmt&lt;...&gt; callbacks) was already proposed in my change https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by @steakhal. Those reviews were both abandoned, but the issues were unrelated to the change that is introduced in this PR.


Full diff: https://github.com/llvm/llvm-project/pull/72107.diff

3 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+60-36)
  • (modified) clang/test/Analysis/out-of-bounds-diagnostics.c (+67-7)
  • (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+2-2)
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index ffc7236d1e2551a..ef3ae42d57c6e88 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -34,20 +34,37 @@ using llvm::formatv;
 namespace {
 enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-class ArrayBoundCheckerV2 :
-    public Checker<check::Location> {
+struct Messages {
+  std::string Short, Full;
+};
+
+class ArrayBoundCheckerV2 : public Checker<check::PostStmt<ArraySubscriptExpr>,
+                                           check::PostStmt<UnaryOperator>,
+                                           check::PostStmt<MemberExpr>> {
   BugType BT{this, "Out-of-bound access"};
   BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
+  void performCheck(const Expr *E, CheckerContext &C) const;
+
   void reportOOB(CheckerContext &C, ProgramStateRef ErrorState, OOB_Kind Kind,
-                 NonLoc Offset, std::string RegName, std::string Msg) const;
+                 NonLoc Offset, Messages Msgs) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext &AC);
 
 public:
-  void checkLocation(SVal l, bool isLoad, const Stmt *S,
-                     CheckerContext &C) const;
+  void checkPostStmt(const ArraySubscriptExpr *E, CheckerContext &C) const {
+    performCheck(E, C);
+  }
+  void checkPostStmt(const UnaryOperator *E, CheckerContext &C) const {
+    if (E->getOpcode() == UO_Deref)
+      performCheck(E, C);
+  }
+  void checkPostStmt(const MemberExpr *E, CheckerContext &C) const {
+    if (E->isArrow())
+      performCheck(E->getBase(), C);
+  }
 };
+
 } // anonymous namespace
 
 /// For a given Location that can be represented as a symbolic expression
@@ -217,16 +234,19 @@ static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
   return formatv(ShortMsgTemplates[Kind], RegName);
 }
 
-static std::string getPrecedesMsg(std::string RegName, NonLoc Offset) {
+static Messages getPrecedesMsgs(const SubRegion *Region, NonLoc Offset) {
+  std::string RegName = getRegionName(Region);
   SmallString<128> Buf;
   llvm::raw_svector_ostream Out(Buf);
   Out << "Access of " << RegName << " at negative byte offset";
   if (auto ConcreteIdx = Offset.getAs<nonloc::ConcreteInt>())
     Out << ' ' << ConcreteIdx->getValue();
-  return std::string(Buf);
+  return {getShortMsg(OOB_Precedes, RegName), std::string(Buf)};
 }
-static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
-                                 NonLoc Offset, NonLoc Extent, SVal Location) {
+
+static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
+                               NonLoc Offset, NonLoc Extent, SVal Location) {
+  std::string RegName = getRegionName(Region);
   const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
   assert(EReg && "this checker only handles element access");
   QualType ElemType = EReg->getElementType();
@@ -273,20 +293,18 @@ static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
       Out << "s";
   }
 
-  return std::string(Buf);
-}
-static std::string getTaintMsg(std::string RegName) {
-  SmallString<128> Buf;
-  llvm::raw_svector_ostream Out(Buf);
-  Out << "Access of " << RegName
-      << " with a tainted offset that may be too large";
-  return std::string(Buf);
+  return {getShortMsg(OOB_Exceeds, RegName), std::string(Buf)};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
-                                        const Stmt *LoadS,
-                                        CheckerContext &C) const {
+static Messages getTaintMsgs(const SubRegion *Region, const char *OffsetName) {
+  std::string RegName = getRegionName(Region);
+  return {formatv("Potential out of bound access to {0} with tainted {1}",
+                  RegName, OffsetName),
+          formatv("Access of {0} with a tainted {1} that may be too large",
+                  RegName, OffsetName)};
+}
 
+void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
   // NOTE: Instead of using ProgramState::assumeInBound(), we are prototyping
   // some new logic here that reasons directly about memory region extents.
   // Once that logic is more mature, we can bring it back to assumeInBound()
@@ -297,12 +315,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
   // have some flexibility in defining the base region, we can achieve
   // various levels of conservatism in our buffer overflow checking.
 
+  const SVal Location = C.getSVal(E);
+
   // The header ctype.h (from e.g. glibc) implements the isXXXXX() macros as
   //   #define isXXXXX(arg) (LOOKUP_TABLE[arg] & BITMASK_FOR_XXXXX)
   // and incomplete analysis of these leads to false positives. As even
   // accurate reports would be confusing for the users, just disable reports
   // from these macros:
-  if (isFromCtypeMacro(LoadS, C.getASTContext()))
+  if (isFromCtypeMacro(E, C.getASTContext()))
     return;
 
   ProgramStateRef State = C.getState();
@@ -331,9 +351,8 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
 
     if (PrecedesLowerBound && !WithinLowerBound) {
       // We know that the index definitely precedes the lower bound.
-      std::string RegName = getRegionName(Reg);
-      std::string Msg = getPrecedesMsg(RegName, ByteOffset);
-      reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);
+      Messages Msgs = getPrecedesMsgs(Reg, ByteOffset);
+      reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, Msgs);
       return;
     }
 
@@ -350,17 +369,25 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
     if (ExceedsUpperBound) {
       if (!WithinUpperBound) {
         // We know that the index definitely exceeds the upper bound.
-        std::string RegName = getRegionName(Reg);
-        std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
-                                        *KnownSize, Location);
-        reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);
+        Messages Msgs = getExceedsMsgs(C.getASTContext(), Reg, ByteOffset,
+                                       *KnownSize, Location);
+        reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, Msgs);
         return;
       }
       if (isTainted(State, ByteOffset)) {
-        // Both cases are possible, but the index is tainted, so report.
+        // Both cases are possible, but the offset is tainted, so report.
         std::string RegName = getRegionName(Reg);
-        std::string Msg = getTaintMsg(RegName);
-        reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, RegName, Msg);
+
+        // Diagnostic detail: "tainted offset" is always correct, but the
+        // common case is that 'idx' is tainted in 'arr[idx]' and then it's
+        // nicer to say "tainted index".
+        const char *OffsetName = "offset";
+        if (const auto *ASE = dyn_cast<ArraySubscriptExpr>(E))
+          if (isTainted(State, ASE->getIdx(), C.getLocationContext()))
+            OffsetName = "index";
+
+        Messages Msgs = getTaintMsgs(Reg, OffsetName);
+        reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, Msgs);
         return;
       }
     }
@@ -374,17 +401,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
 
 void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
                                     ProgramStateRef ErrorState, OOB_Kind Kind,
-                                    NonLoc Offset, std::string RegName,
-                                    std::string Msg) const {
+                                    NonLoc Offset, Messages Msgs) const {
 
   ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
   if (!ErrorNode)
     return;
 
-  std::string ShortMsg = getShortMsg(Kind, RegName);
-
   auto BR = std::make_unique<PathSensitiveBugReport>(
-      Kind == OOB_Taint ? TaintBT : BT, ShortMsg, Msg, ErrorNode);
+      Kind == OOB_Taint ? TaintBT : BT, Msgs.Short, Msgs.Full, ErrorNode);
 
   // Track back the propagation of taintedness.
   if (Kind == OOB_Taint)
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 4f5d0d0bc91c54a..e549e86b09c2312 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -9,6 +9,14 @@ void arrayUnderflow(void) {
   // expected-note@-2 {{Access of 'array' at negative byte offset -12}}
 }
 
+int underflowWithDeref(void) {
+  int *p = array;
+  --p;
+  return *p;
+  // expected-warning@-1 {{Out of bound access to memory preceding 'array'}}
+  // expected-note@-2 {{Access of 'array' at negative byte offset -4}}
+}
+
 int scanf(const char *restrict fmt, ...);
 
 void taintedIndex(void) {
@@ -17,6 +25,17 @@ void taintedIndex(void) {
   // expected-note@-1 {{Taint originated here}}
   // expected-note@-2 {{Taint propagated to the 2nd argument}}
   array[index] = 5;
+  // expected-warning@-1 {{Potential out of bound access to 'array' with tainted index}}
+  // expected-note@-2 {{Access of 'array' with a tainted index that may be too large}}
+}
+
+void taintedOffset(void) {
+  int index;
+  scanf("%d", &index);
+  // expected-note@-1 {{Taint originated here}}
+  // expected-note@-2 {{Taint propagated to the 2nd argument}}
+  int *p = array + index;
+  p[0] = 5;
   // expected-warning@-1 {{Potential out of bound access to 'array' with tainted offset}}
   // expected-note@-2 {{Access of 'array' with a tainted offset that may be too large}}
 }
@@ -27,6 +46,29 @@ void arrayOverflow(void) {
   // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
 }
 
+void flippedOverflow(void) {
+  12[array] = 5;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
+}
+
+int *afterTheEndPtr(void) {
+  // FIXME: this is an unusual but standard-compliant way of writing (array + 10).
+  // I think people who rely on this confusing corner case deserve a warning,
+  // but if this is widespread, then we could introduce a special case for it
+  // in the checker.
+  return &array[10];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 10, while it holds only 10 'int' elements}}
+}
+
+int *afterAfterTheEndPtr(void) {
+  // This is UB, it's invalid to form an after-after-the-end pointer.
+  return &array[11];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 11, while it holds only 10 'int' elements}}
+}
+
 int scalar;
 int scalarOverflow(void) {
   return (&scalar)[1];
@@ -41,12 +83,6 @@ int oneElementArrayOverflow(void) {
   // expected-note@-2 {{Access of 'oneElementArray' at index 1, while it holds only a single 'int' element}}
 }
 
-short convertedArray(void) {
-  return ((short*)array)[47];
-  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
-  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
-}
-
 struct vec {
   int len;
   double elems[64];
@@ -64,6 +100,28 @@ double arrayInStructPtr(struct vec *pv) {
   // expected-note@-2 {{Access of the field 'elems' at index 64, while it holds only 64 'double' elements}}
 }
 
+struct item {
+  int a, b;
+} itemArray[20] = {0};
+
+int structOfArrays(void) {
+  return itemArray[35].a;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+  // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+int structOfArraysArrow(void) {
+  return (itemArray + 35)->b;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+  // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+short convertedArray(void) {
+  return ((short*)array)[47];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
+}
+
 struct two_bytes {
   char lo, hi;
 };
@@ -85,7 +143,9 @@ int intFromString(void) {
 }
 
 int intFromStringDivisible(void) {
-  // However, this is reported with indices/elements, because the extent happens to be a multiple of 4.
+  // However, this is reported with indices/elements, because the extent
+  // (of the string that consists of 'a', 'b', 'c' and '\0') happens to be a
+  // multiple of 4 bytes (= sizeof(int)).
   return ((const int*)"abc")[20];
   // expected-warning@-1 {{Out of bound access to memory after the end of the string literal}}
   // expected-note@-2 {{Access of the string literal at index 20, while it holds only a single 'int' element}}
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 67dc67e627b3b9c..a3fa1639bffeee5 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -29,8 +29,8 @@ int taintDiagnosticOutOfBound(void) {
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
                        // expected-note@-1 {{Taint propagated to the 2nd argument}}
-  return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted offset}}
-                       // expected-note@-1 {{Access of 'Array' with a tainted offset that may be too large}}
+  return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted index}}
+                       // expected-note@-1 {{Access of 'Array' with a tainted index that may be too large}}
 }
 
 int taintDiagnosticDivZero(int operand) {

@llvmbot
Copy link
Collaborator

llvmbot commented Nov 13, 2023

@llvm/pr-subscribers-clang-static-analyzer-1

Author: None (DonatNagyE)

Changes

...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check array[index].field while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object). This improvement fixes PR #70187.

Note that after this change &amp;array[idx] will be handled as an access to the idxth element of array, which is technically incorrect but matches the programmer intuitions. In my opinion it's more helpful if the report points to the source location where the indexing happens (instead of the location where a pointer is finally dereferenced).

This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as &amp;arr[length]. I think this is just unimportant pedantery (because it's cleaner to write the past-the-end pointer as (arr+length) and that form isn't affected by this checker), but if it does appear in real code, then we could introduce a special case for it.

In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases.

The main change of this commit (replacing check::Location with check::PostStmt&lt;...&gt; callbacks) was already proposed in my change https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by @steakhal. Those reviews were both abandoned, but the issues were unrelated to the change that is introduced in this PR.


Full diff: https://github.com/llvm/llvm-project/pull/72107.diff

3 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+60-36)
  • (modified) clang/test/Analysis/out-of-bounds-diagnostics.c (+67-7)
  • (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+2-2)
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index ffc7236d1e2551a..ef3ae42d57c6e88 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -34,20 +34,37 @@ using llvm::formatv;
 namespace {
 enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-class ArrayBoundCheckerV2 :
-    public Checker<check::Location> {
+struct Messages {
+  std::string Short, Full;
+};
+
+class ArrayBoundCheckerV2 : public Checker<check::PostStmt<ArraySubscriptExpr>,
+                                           check::PostStmt<UnaryOperator>,
+                                           check::PostStmt<MemberExpr>> {
   BugType BT{this, "Out-of-bound access"};
   BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
+  void performCheck(const Expr *E, CheckerContext &C) const;
+
   void reportOOB(CheckerContext &C, ProgramStateRef ErrorState, OOB_Kind Kind,
-                 NonLoc Offset, std::string RegName, std::string Msg) const;
+                 NonLoc Offset, Messages Msgs) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext &AC);
 
 public:
-  void checkLocation(SVal l, bool isLoad, const Stmt *S,
-                     CheckerContext &C) const;
+  void checkPostStmt(const ArraySubscriptExpr *E, CheckerContext &C) const {
+    performCheck(E, C);
+  }
+  void checkPostStmt(const UnaryOperator *E, CheckerContext &C) const {
+    if (E->getOpcode() == UO_Deref)
+      performCheck(E, C);
+  }
+  void checkPostStmt(const MemberExpr *E, CheckerContext &C) const {
+    if (E->isArrow())
+      performCheck(E->getBase(), C);
+  }
 };
+
 } // anonymous namespace
 
 /// For a given Location that can be represented as a symbolic expression
@@ -217,16 +234,19 @@ static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
   return formatv(ShortMsgTemplates[Kind], RegName);
 }
 
-static std::string getPrecedesMsg(std::string RegName, NonLoc Offset) {
+static Messages getPrecedesMsgs(const SubRegion *Region, NonLoc Offset) {
+  std::string RegName = getRegionName(Region);
   SmallString<128> Buf;
   llvm::raw_svector_ostream Out(Buf);
   Out << "Access of " << RegName << " at negative byte offset";
   if (auto ConcreteIdx = Offset.getAs<nonloc::ConcreteInt>())
     Out << ' ' << ConcreteIdx->getValue();
-  return std::string(Buf);
+  return {getShortMsg(OOB_Precedes, RegName), std::string(Buf)};
 }
-static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
-                                 NonLoc Offset, NonLoc Extent, SVal Location) {
+
+static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
+                               NonLoc Offset, NonLoc Extent, SVal Location) {
+  std::string RegName = getRegionName(Region);
   const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
   assert(EReg && "this checker only handles element access");
   QualType ElemType = EReg->getElementType();
@@ -273,20 +293,18 @@ static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
       Out << "s";
   }
 
-  return std::string(Buf);
-}
-static std::string getTaintMsg(std::string RegName) {
-  SmallString<128> Buf;
-  llvm::raw_svector_ostream Out(Buf);
-  Out << "Access of " << RegName
-      << " with a tainted offset that may be too large";
-  return std::string(Buf);
+  return {getShortMsg(OOB_Exceeds, RegName), std::string(Buf)};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
-                                        const Stmt *LoadS,
-                                        CheckerContext &C) const {
+static Messages getTaintMsgs(const SubRegion *Region, const char *OffsetName) {
+  std::string RegName = getRegionName(Region);
+  return {formatv("Potential out of bound access to {0} with tainted {1}",
+                  RegName, OffsetName),
+          formatv("Access of {0} with a tainted {1} that may be too large",
+                  RegName, OffsetName)};
+}
 
+void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
   // NOTE: Instead of using ProgramState::assumeInBound(), we are prototyping
   // some new logic here that reasons directly about memory region extents.
   // Once that logic is more mature, we can bring it back to assumeInBound()
@@ -297,12 +315,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
   // have some flexibility in defining the base region, we can achieve
   // various levels of conservatism in our buffer overflow checking.
 
+  const SVal Location = C.getSVal(E);
+
   // The header ctype.h (from e.g. glibc) implements the isXXXXX() macros as
   //   #define isXXXXX(arg) (LOOKUP_TABLE[arg] & BITMASK_FOR_XXXXX)
   // and incomplete analysis of these leads to false positives. As even
   // accurate reports would be confusing for the users, just disable reports
   // from these macros:
-  if (isFromCtypeMacro(LoadS, C.getASTContext()))
+  if (isFromCtypeMacro(E, C.getASTContext()))
     return;
 
   ProgramStateRef State = C.getState();
@@ -331,9 +351,8 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
 
     if (PrecedesLowerBound && !WithinLowerBound) {
       // We know that the index definitely precedes the lower bound.
-      std::string RegName = getRegionName(Reg);
-      std::string Msg = getPrecedesMsg(RegName, ByteOffset);
-      reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);
+      Messages Msgs = getPrecedesMsgs(Reg, ByteOffset);
+      reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, Msgs);
       return;
     }
 
@@ -350,17 +369,25 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
     if (ExceedsUpperBound) {
       if (!WithinUpperBound) {
         // We know that the index definitely exceeds the upper bound.
-        std::string RegName = getRegionName(Reg);
-        std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
-                                        *KnownSize, Location);
-        reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);
+        Messages Msgs = getExceedsMsgs(C.getASTContext(), Reg, ByteOffset,
+                                       *KnownSize, Location);
+        reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, Msgs);
         return;
       }
       if (isTainted(State, ByteOffset)) {
-        // Both cases are possible, but the index is tainted, so report.
+        // Both cases are possible, but the offset is tainted, so report.
         std::string RegName = getRegionName(Reg);
-        std::string Msg = getTaintMsg(RegName);
-        reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, RegName, Msg);
+
+        // Diagnostic detail: "tainted offset" is always correct, but the
+        // common case is that 'idx' is tainted in 'arr[idx]' and then it's
+        // nicer to say "tainted index".
+        const char *OffsetName = "offset";
+        if (const auto *ASE = dyn_cast<ArraySubscriptExpr>(E))
+          if (isTainted(State, ASE->getIdx(), C.getLocationContext()))
+            OffsetName = "index";
+
+        Messages Msgs = getTaintMsgs(Reg, OffsetName);
+        reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, Msgs);
         return;
       }
     }
@@ -374,17 +401,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
 
 void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
                                     ProgramStateRef ErrorState, OOB_Kind Kind,
-                                    NonLoc Offset, std::string RegName,
-                                    std::string Msg) const {
+                                    NonLoc Offset, Messages Msgs) const {
 
   ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
   if (!ErrorNode)
     return;
 
-  std::string ShortMsg = getShortMsg(Kind, RegName);
-
   auto BR = std::make_unique<PathSensitiveBugReport>(
-      Kind == OOB_Taint ? TaintBT : BT, ShortMsg, Msg, ErrorNode);
+      Kind == OOB_Taint ? TaintBT : BT, Msgs.Short, Msgs.Full, ErrorNode);
 
   // Track back the propagation of taintedness.
   if (Kind == OOB_Taint)
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 4f5d0d0bc91c54a..e549e86b09c2312 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -9,6 +9,14 @@ void arrayUnderflow(void) {
   // expected-note@-2 {{Access of 'array' at negative byte offset -12}}
 }
 
+int underflowWithDeref(void) {
+  int *p = array;
+  --p;
+  return *p;
+  // expected-warning@-1 {{Out of bound access to memory preceding 'array'}}
+  // expected-note@-2 {{Access of 'array' at negative byte offset -4}}
+}
+
 int scanf(const char *restrict fmt, ...);
 
 void taintedIndex(void) {
@@ -17,6 +25,17 @@ void taintedIndex(void) {
   // expected-note@-1 {{Taint originated here}}
   // expected-note@-2 {{Taint propagated to the 2nd argument}}
   array[index] = 5;
+  // expected-warning@-1 {{Potential out of bound access to 'array' with tainted index}}
+  // expected-note@-2 {{Access of 'array' with a tainted index that may be too large}}
+}
+
+void taintedOffset(void) {
+  int index;
+  scanf("%d", &index);
+  // expected-note@-1 {{Taint originated here}}
+  // expected-note@-2 {{Taint propagated to the 2nd argument}}
+  int *p = array + index;
+  p[0] = 5;
   // expected-warning@-1 {{Potential out of bound access to 'array' with tainted offset}}
   // expected-note@-2 {{Access of 'array' with a tainted offset that may be too large}}
 }
@@ -27,6 +46,29 @@ void arrayOverflow(void) {
   // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
 }
 
+void flippedOverflow(void) {
+  12[array] = 5;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
+}
+
+int *afterTheEndPtr(void) {
+  // FIXME: this is an unusual but standard-compliant way of writing (array + 10).
+  // I think people who rely on this confusing corner case deserve a warning,
+  // but if this is widespread, then we could introduce a special case for it
+  // in the checker.
+  return &array[10];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 10, while it holds only 10 'int' elements}}
+}
+
+int *afterAfterTheEndPtr(void) {
+  // This is UB, it's invalid to form an after-after-the-end pointer.
+  return &array[11];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 11, while it holds only 10 'int' elements}}
+}
+
 int scalar;
 int scalarOverflow(void) {
   return (&scalar)[1];
@@ -41,12 +83,6 @@ int oneElementArrayOverflow(void) {
   // expected-note@-2 {{Access of 'oneElementArray' at index 1, while it holds only a single 'int' element}}
 }
 
-short convertedArray(void) {
-  return ((short*)array)[47];
-  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
-  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
-}
-
 struct vec {
   int len;
   double elems[64];
@@ -64,6 +100,28 @@ double arrayInStructPtr(struct vec *pv) {
   // expected-note@-2 {{Access of the field 'elems' at index 64, while it holds only 64 'double' elements}}
 }
 
+struct item {
+  int a, b;
+} itemArray[20] = {0};
+
+int structOfArrays(void) {
+  return itemArray[35].a;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+  // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+int structOfArraysArrow(void) {
+  return (itemArray + 35)->b;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+  // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+short convertedArray(void) {
+  return ((short*)array)[47];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
+}
+
 struct two_bytes {
   char lo, hi;
 };
@@ -85,7 +143,9 @@ int intFromString(void) {
 }
 
 int intFromStringDivisible(void) {
-  // However, this is reported with indices/elements, because the extent happens to be a multiple of 4.
+  // However, this is reported with indices/elements, because the extent
+  // (of the string that consists of 'a', 'b', 'c' and '\0') happens to be a
+  // multiple of 4 bytes (= sizeof(int)).
   return ((const int*)"abc")[20];
   // expected-warning@-1 {{Out of bound access to memory after the end of the string literal}}
   // expected-note@-2 {{Access of the string literal at index 20, while it holds only a single 'int' element}}
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 67dc67e627b3b9c..a3fa1639bffeee5 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -29,8 +29,8 @@ int taintDiagnosticOutOfBound(void) {
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
                        // expected-note@-1 {{Taint propagated to the 2nd argument}}
-  return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted offset}}
-                       // expected-note@-1 {{Access of 'Array' with a tainted offset that may be too large}}
+  return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted index}}
+                       // expected-note@-1 {{Access of 'Array' with a tainted index that may be too large}}
 }
 
 int taintDiagnosticDivZero(int operand) {

@NagyDonat NagyDonat removed the clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html label Nov 13, 2023
@NagyDonat
Copy link
Contributor Author

I ran an analysis that compares this commit with its parent on many open source projects. The results revealed that this commit "converts" many alpha.security.ArrayBound (V1) results into alpha.security.ArrayBoundV2 results because (if I understand it correctly) the new PostStmt callbacks fire before the Location callback used by the V1 checker (while it seems that previously the order of the tied Location callbacks was resolved in the favor of the V1 checker).

This is a mostly irrelevant effect because I presume that the users don't want to enable both of the ArrayBound checkers at the same time (I was testing with options that enable almost all alpha checkers, but that's not a "normal" config).

For the sake of completeness I'm pasting the result links for this first run where the diff is polluted by all the "lost report for V1, new report for V2" differences; but I started a clean run with -d alpha.security.ArrayBound (I'll post the results soon) and I'll do a more detailed evaluation on that.

Project New Reports Lost Reports
memcached new reports lost reports
tmux new reports lost reports
curl new reports lost reports
twin new reports lost reports
vim new reports lost reports
openssl new reports lost reports
sqlite new reports lost reports
ffmpeg new reports lost reports
postgres new reports lost reports
tinyxml2 new reports lost reports
libwebm new reports lost reports
xerces new reports lost reports
bitcoin new reports lost reports
protobuf new reports lost reports
qtbase new reports lost reports
contour new reports lost reports
acid new reports lost reports
openrct2 new reports lost reports

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Nov 16, 2023

I evaluated a mostly [1] clean analysis run and it reveals that this change has almost no effects when CSA analyses stable open source code (this is the expected behavior, stable code doesn't contain out of bounds memory access). The only differences are apparently caused by the inherent instability of the exploded graph traversal:

Project New Reports Lost Reports Evaluation
memcached new reports lost reports no change
tmux new reports lost reports no change
curl new reports lost reports no change
twin new reports lost reports no change
vim new reports lost reports no change
openssl new reports lost reports no change
sqlite new reports lost reports no change
ffmpeg new reports lost reports one alpha.core.Conversion report lost, presumably due to graph perturbations
postgres new reports lost reports no change
tinyxml2 new reports lost reports no change
libwebm new reports lost reports no change
xerces new reports lost reports no change
bitcoin new reports lost reports no change
protobuf new reports lost reports no change
qtbase new reports lost reports two convoluted core.DivideZero reports are replaced by one that's very similar to them

[1] During the analysis of contour both the baseline (main snapshot created a few days ago) and the new revision crashed with the message fatal error: error in backend: Z3 error: out of memory (at slightly different steps of the analysis). As this crash seems to be non-deterministic, environment-dependent and not affected by the commit under review, I think that we can safely ignore it. As this interrupted the CI job, I have no results for contour, acid and openrct2, but I think the rest of the projects are already sufficient to evaluate this patch.

@Xazax-hun
Copy link
Collaborator

Note that &array[idx] is perfectly valid code when idx == number of elements. And it is relatively common to do that when one is using STL algorithms on arrays:

auto it = std::find(&array[0], &array[size], foo);

Of course, one could use the begin/end free functions, but those are only available since C++11.

Could you elaborate on alternative approaches you considered fixing the problem and why you chose this one? E.g., would trying to look at the parent regions for expressions like foo[idx].bar work? Or is the source of the problem that you'd also need the exact expression for the subscript instead of the MemberExpr?

Alternatively, would it be possible to suppress warnings on the common pattern &array[idx] by checking the parent of the subscript expression in the AST (but still emitting a warning when the pointer is dereferenced)?

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Nov 24, 2023

Note that &array[idx] is perfectly valid code when idx == number of elements. And it is relatively common to do that when one is using STL algorithms on arrays:

auto it = std::find(&array[0], &array[size], foo);

Of course, one could use the begin/end free functions, but those are only available since C++11.

Oh, dammit, this language is stupid... I couldn't imagine a reason to use &array[size] instead of the cleaner (array + size); but of course STL containers need &array[size] with their overloaded operator[] (note: this is not an ArraySubscriptExpr but an overloaded operator call, so it's not handled by this checker) and then (if you say so, I don't have experience with pre-C++11 code that uses both STL and plain arrays) some developers will use this for plain arrays as well...

Could you elaborate on alternative approaches you considered fixing the problem and why you chose this one? E.g., would trying to look at the parent regions for expressions like foo[idx].bar work? Or is the source of the problem that you'd also need the exact expression for the subscript instead of the MemberExpr?

I considered two approaches, this one and walking on the parent region layers (I don't think that there is a third distinct approach). I chose this because:

  • This way the implementation is easier to understand and more "connected" to the concrete details of the analyzed code.
  • A few months ago @steakhal realized that check::Location is not sufficient on its own and it should've been supplemented with a check::Bind callback to cover all relevant cases. The commit implementing this (https://reviews.llvm.org/D159106) was abandoned, but if we keep check::Location, we would need to reintroduce something like that as well.
  • I wouldn't say that I need the exact expression for the subscript (or other dereferencing expression, like *ptr or ptr->foo), but I like that I can get it because this puts the warning message to the right source location (e.g. in a convoluted multiline expressions with several [] and member access layers) and allows/could allow better customization of the diagnostics.

Alternatively, would it be possible to suppress warnings on the common pattern &array[idx] by checking the parent of the subscript expression in the AST (but still emitting a warning when the pointer is dereferenced)?

Yes, that was my plan for resolving this issue; and as you say that it's needed, I'll implement it. I'll squeeze the parent lookup into the if where we know that there's an overflow, so that (AFAIK expensive) operation won't run in the common case when the array access is in bounds. The "still emitting a warning when the pointer is dereferenced" will happen automatically as at the dereferencing expression we'll have the same overflowing pointer but without the & that negates the result.

I think that I won't suppress the warning in situations like

int array[10];
int *f(int arg) {
  if (arg >= 10)
    return &array[arg];
  return array;
}

where the analyzer knows that idx >= size and doesn't know that idx == size. Are there any crazy situations where this is also legitimate?

@NagyDonat
Copy link
Contributor Author

NagyDonat commented Nov 24, 2023

I updated the PR to allow creating the after-the-end pointer of an array as &array[size] . My implementation interprets this corner case narrowly to limit the amount of false negatives: for example if (idx >= size) return &array[idx]; is reported as an overflow instead of very optimistically assuming that idx must be equal to size.

if (EqualsToThreshold && !NotEqualToThreshold) {
// We are definitely in the exceptional case, so return early
// instead of reporting a bug.
C.addTransition(EqualsToThreshold);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this transition needed? This checker should not add assumptions to the state, only check for conditions and add only error transitions. EqualsToThreshold probably does not contain new information compared to State.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right that EqualsToThreshold does not contain new information compared to State, but I'm using it because I'm trying to follow the pattern that I'm always using the most recent state variable.

On the other hand, the variable State may contain new information compared to the state of the node before the start of this callback: e.g. if we start with a range set like $[-\infty, -1] \cup \{ 10\}$ for the index variable, then we first test for underflow and update the variable State when we assume that there is no underflow.

I think it's valuable to record these assumptions with a state transition, because they improve the accuracy of the modeling. (Otherwise the analyzer could produce bug reports that rely on assumptions that contradict each other.) Currently the assumptions of this checker are added silently but I'll add note tags like "Assuming index is non-negative" for them in a followup commit.

@steakhal
Copy link
Contributor

I'm in favor of this change. I'll pull the patch downstream and report back how it performed.
Coming back to the &array[size] example, actually I believe that it's well-formed in C, but UB in C++, but I'm not a language lawyer. @shafik probably knows this better 😅
However, (array + size) should be well-formed in both C and C++.

@steakhal
Copy link
Contributor

FYI I edited the PR summary so that I'm not tagged there directly because if someone is tagged in a commit message on GH, that person will be notified each time that commit is pushed in public. Consequently, the tagged person gets spammed by GH from all the public forks each time they rebase and push this commit :D
Such a great feature, right?

@NagyDonat
Copy link
Contributor Author

@steakhal thanks for the checking and sorry for the unintentional spamming.

Such a great feature, right?

Just wonderful 😄

@steakhal
Copy link
Contributor

@steakhal thanks for the checking and sorry for the unintentional spamming.

Such a great feature, right?

Just wonderful 😄

To clarify, you did not spam me. I'm worried about merging a commit directly mentioning people. That would be the point when forks start to pick up the commit (and my name e.g.) and start spamming. I just wanted to raise awareness of this being a thing, and why I did that edit.

@NagyDonat
Copy link
Contributor Author

Note that (if I understand it correctly) after this change the ArrayBoundV2 check may be triggered at a slightly earlier point of the analysis and this may impact situations where both this checker and other checkers would emit a bug report. (See e.g. #72402 (comment) where I discuss a concrete candidate for this.)

@Xazax-hun
Copy link
Collaborator

I think another question is whether warning on code like int& val = arr[i] where val is not actually used on the path where i == size is OK. I would not block this PR on a case like this, but it might be nice to add a test and potentially a comment about the desired behavior.

@NagyDonat
Copy link
Contributor Author

@Xazax-hun I added a few testcases that show the current behavior of the checker w.r.t. invalid references. I think this is reasonably correct behavior, but I'm not too interested in the legality of these gray areas and I don't have strong opinions.

By the way, with the new PostStmt callbacks, "the result is assigned to a reference" is no longer a corner case. Perhaps it had a strange behavior with the old Location callback (because it performs an implicit address-of in some sense), but after this commit ArrayBoundV2 only looks for the syntactic presence of a (non-overloaded) subscript operator.

NagyDonat added a commit to Ericsson/llvm-project that referenced this pull request Dec 1, 2023
This commit extends the class `SValBuilder` with the methods
`getMinValue()` and `getMaxValue()` to that work like
`SValBuilder::getKnownValue()` but return the minimal/maximal possible
value the `SVal` is not perfectly constrained.

This extension of the ConstraintManager API is discussed at:
https://discourse.llvm.org/t/expose-the-inferred-range-information-in-warning-messages/75192

As a simple proof-of-concept application of this new API, this commit
extends a message from `core.BitwiseShift` with some range information
that reports the assumptions of the analyzer.

My main motivation for adding these methods is that I'll also want to
use them in `ArrayBoundCheckerV2` to make the error messages less
awkward, but I'm starting with this simpler and less important usecase
because I want to avoid merge conflicts with my other commit
llvm#72107 which is currently under
review.

The testcase `too_large_right_operand_compound()` shows a situation
where querying the range information does not work (and the extra
information is not added to the error message). This also affects the
debug utility `clang_analyzer_value()`, so the problem isn't in the
fresh code. I'll do some investigations to resolve this, but I think
that this commit is a step forward even with this limitation.
Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for reporting back only after a week or so. The analysis was done in like a day, but then we had some credential issues that blocked me actually doing the diffing.
Now about the diffing: a lot of things changed because now we report earlier, so a different range gets highlighted. This caused me some problems, so I decided to not do anything fancy, but randomly pick a bunch of issues.
In short, the picked issues looked better; but effectively remained the same. This is good.

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp Outdated Show resolved Hide resolved
NagyDonat added a commit that referenced this pull request Dec 4, 2023
…74141)

This commit extends the class `SValBuilder` with the methods
`getMinValue()` and `getMaxValue()` to that work like
`SValBuilder::getKnownValue()` but return the minimal/maximal possible
value the `SVal` is not perfectly constrained.

This extension of the ConstraintManager API is discussed at:
https://discourse.llvm.org/t/expose-the-inferred-range-information-in-warning-messages/75192

As a simple proof-of-concept application of this new API, this commit
extends a message from `core.BitwiseShift` with some range information
that reports the assumptions of the analyzer.

My main motivation for adding these methods is that I'll also want to
use them in `ArrayBoundCheckerV2` to make the error messages less
awkward, but I'm starting with this simpler and less important usecase
because I want to avoid merge conflicts with my other commit
#72107 which is currently under
review.

The testcase `too_large_right_operand_compound()` shows a situation
where querying the range information does not work (and the extra
information is not added to the error message). This also affects the
debug utility `clang_analyzer_value()`, so the problem isn't in the
fresh code. I'll do some investigations to resolve this, but I think
that this commit is a step forward even with this limitation.
Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish we could clean up checkLocation to work as expected for all cases. It is more future proof in the sense that if C++ introduces new kinds of statements or expressions that could index into something (like multi-dimensional operator[]), checks continue to work without change after the analysis engine is updated.

It is a higher-level abstraction that lets checker authors focus on the semantics without having to enumerate all the patterns the check might be interested in which is often impossible in C++, let alone considering all the extensions implemented by Clang. So, these high-level abstractions supposed to deduplicate a lot of work by doing all the pattern matching once and for all in the engine, so checkers could benefit from this work without duplicating some of that pattern matching in every checker.

All that being said, I understand that it is more practical to just work these engine problems around in this check for now as we do not have a volunteer to revamp checkLocation at the moment. I know @steakhal looked into it, but it turned out to be a bigger fish than we expected. So, I'd say let's land this as a practical middle-term solution and hope that we will be able to come back and clean this up at some point using something that is more future proof.

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[clang static analyzer] false negative related to alpha.security.ArrayBoundV2
5 participants