[analyzer] Switch to PostStmt callbacks in ArrayBoundV2 #72107

NagyDonat · 2023-11-13T11:58:23Z

...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check array[index].field while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object). This improvement fixes PR #70187.

Note that after this change &array[idx] will be handled as an access to the idxth element of array, which is technically incorrect but matches the programmer intuitions. In my opinion it's more helpful if the report points to the source location where the indexing happens (instead of the location where a pointer is finally dereferenced).

This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as &arr[length]. I think this is just unimportant pedantery (because it's cleaner to write the past-the-end pointer as (arr+length) and that form isn't affected by this checker), but if it does appear in real code, then we could introduce a special case for it.

In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases.

The main change of this commit (replacing check::Location with check::PostStmt<...> callbacks) was already proposed in my change https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by steakhal. Those reviews were both abandoned, but the problems that led to abandonment were unrelated to the change that is introduced in this PR.

@steakhal

...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check `array[index].field` while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object). This improvement fixes PR llvm#70187. Note that after this change `&array[idx]` will be handled as an access to the `idx`th element of `array`, which is technically incorrect but matches the programmer intuitions. In my opinion it's more helpful if the report points to the source location where the indexing happens (instead of the location where a pointer is finally dereferenced). This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as `&arr[length]`. I think this is just unimportant pedantery (because it's cleaner to write the past-the-end pointer as `(arr+length)` and that form isn't affected by this checker), but if it does appear in real code, then we could introduce a special case for it. In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases. The main change of this commit (replacing `check::Location` with `check::PostStmt<...>` callbacks) was already proposed in my change https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by @steakhal. Those reviews were both abandoned, but the issues were unrelated to the change that is introduced in this PR.

llvmbot · 2023-11-13T11:58:57Z

@llvm/pr-subscribers-clang

Author: None (DonatNagyE)

Changes

...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check array[index].field while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object). This improvement fixes PR #70187.

Note that after this change &array[idx] will be handled as an access to the idxth element of array, which is technically incorrect but matches the programmer intuitions. In my opinion it's more helpful if the report points to the source location where the indexing happens (instead of the location where a pointer is finally dereferenced).

This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as &arr[length]. I think this is just unimportant pedantery (because it's cleaner to write the past-the-end pointer as (arr+length) and that form isn't affected by this checker), but if it does appear in real code, then we could introduce a special case for it.

In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases.

The main change of this commit (replacing check::Location with check::PostStmt<...> callbacks) was already proposed in my change https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by @steakhal. Those reviews were both abandoned, but the issues were unrelated to the change that is introduced in this PR.

Full diff: https://github.com/llvm/llvm-project/pull/72107.diff

3 Files Affected:

(modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+60-36)
(modified) clang/test/Analysis/out-of-bounds-diagnostics.c (+67-7)
(modified) clang/test/Analysis/taint-diagnostic-visitor.c (+2-2)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index ffc7236d1e2551a..ef3ae42d57c6e88 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -34,20 +34,37 @@ using llvm::formatv;
 namespace {
 enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-class ArrayBoundCheckerV2 :
-    public Checker<check::Location> {
+struct Messages {
+  std::string Short, Full;
+};
+
+class ArrayBoundCheckerV2 : public Checker<check::PostStmt<ArraySubscriptExpr>,
+                                           check::PostStmt<UnaryOperator>,
+                                           check::PostStmt<MemberExpr>> {
   BugType BT{this, "Out-of-bound access"};
   BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
+  void performCheck(const Expr *E, CheckerContext &C) const;
+
   void reportOOB(CheckerContext &C, ProgramStateRef ErrorState, OOB_Kind Kind,
-                 NonLoc Offset, std::string RegName, std::string Msg) const;
+                 NonLoc Offset, Messages Msgs) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext &AC);
 
 public:
-  void checkLocation(SVal l, bool isLoad, const Stmt *S,
-                     CheckerContext &C) const;
+  void checkPostStmt(const ArraySubscriptExpr *E, CheckerContext &C) const {
+    performCheck(E, C);
+  }
+  void checkPostStmt(const UnaryOperator *E, CheckerContext &C) const {
+    if (E->getOpcode() == UO_Deref)
+      performCheck(E, C);
+  }
+  void checkPostStmt(const MemberExpr *E, CheckerContext &C) const {
+    if (E->isArrow())
+      performCheck(E->getBase(), C);
+  }
 };
+
 } // anonymous namespace
 
 /// For a given Location that can be represented as a symbolic expression
@@ -217,16 +234,19 @@ static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
   return formatv(ShortMsgTemplates[Kind], RegName);
 }
 
-static std::string getPrecedesMsg(std::string RegName, NonLoc Offset) {
+static Messages getPrecedesMsgs(const SubRegion *Region, NonLoc Offset) {
+  std::string RegName = getRegionName(Region);
   SmallString<128> Buf;
   llvm::raw_svector_ostream Out(Buf);
   Out << "Access of " << RegName << " at negative byte offset";
   if (auto ConcreteIdx = Offset.getAs<nonloc::ConcreteInt>())
     Out << ' ' << ConcreteIdx->getValue();
-  return std::string(Buf);
+  return {getShortMsg(OOB_Precedes, RegName), std::string(Buf)};
 }
-static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
-                                 NonLoc Offset, NonLoc Extent, SVal Location) {
+
+static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
+                               NonLoc Offset, NonLoc Extent, SVal Location) {
+  std::string RegName = getRegionName(Region);
   const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
   assert(EReg && "this checker only handles element access");
   QualType ElemType = EReg->getElementType();
@@ -273,20 +293,18 @@ static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
       Out << "s";
   }
 
-  return std::string(Buf);
-}
-static std::string getTaintMsg(std::string RegName) {
-  SmallString<128> Buf;
-  llvm::raw_svector_ostream Out(Buf);
-  Out << "Access of " << RegName
-      << " with a tainted offset that may be too large";
-  return std::string(Buf);
+  return {getShortMsg(OOB_Exceeds, RegName), std::string(Buf)};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
-                                        const Stmt *LoadS,
-                                        CheckerContext &C) const {
+static Messages getTaintMsgs(const SubRegion *Region, const char *OffsetName) {
+  std::string RegName = getRegionName(Region);
+  return {formatv("Potential out of bound access to {0} with tainted {1}",
+                  RegName, OffsetName),
+          formatv("Access of {0} with a tainted {1} that may be too large",
+                  RegName, OffsetName)};
+}
 
+void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
   // NOTE: Instead of using ProgramState::assumeInBound(), we are prototyping
   // some new logic here that reasons directly about memory region extents.
   // Once that logic is more mature, we can bring it back to assumeInBound()
@@ -297,12 +315,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
   // have some flexibility in defining the base region, we can achieve
   // various levels of conservatism in our buffer overflow checking.
 
+  const SVal Location = C.getSVal(E);
+
   // The header ctype.h (from e.g. glibc) implements the isXXXXX() macros as
   //   #define isXXXXX(arg) (LOOKUP_TABLE[arg] & BITMASK_FOR_XXXXX)
   // and incomplete analysis of these leads to false positives. As even
   // accurate reports would be confusing for the users, just disable reports
   // from these macros:
-  if (isFromCtypeMacro(LoadS, C.getASTContext()))
+  if (isFromCtypeMacro(E, C.getASTContext()))
     return;
 
   ProgramStateRef State = C.getState();
@@ -331,9 +351,8 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
 
     if (PrecedesLowerBound && !WithinLowerBound) {
       // We know that the index definitely precedes the lower bound.
-      std::string RegName = getRegionName(Reg);
-      std::string Msg = getPrecedesMsg(RegName, ByteOffset);
-      reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);
+      Messages Msgs = getPrecedesMsgs(Reg, ByteOffset);
+      reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, Msgs);
       return;
     }
 
@@ -350,17 +369,25 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
     if (ExceedsUpperBound) {
       if (!WithinUpperBound) {
         // We know that the index definitely exceeds the upper bound.
-        std::string RegName = getRegionName(Reg);
-        std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
-                                        *KnownSize, Location);
-        reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);
+        Messages Msgs = getExceedsMsgs(C.getASTContext(), Reg, ByteOffset,
+                                       *KnownSize, Location);
+        reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, Msgs);
         return;
       }
       if (isTainted(State, ByteOffset)) {
-        // Both cases are possible, but the index is tainted, so report.
+        // Both cases are possible, but the offset is tainted, so report.
         std::string RegName = getRegionName(Reg);
-        std::string Msg = getTaintMsg(RegName);
-        reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, RegName, Msg);
+
+        // Diagnostic detail: "tainted offset" is always correct, but the
+        // common case is that 'idx' is tainted in 'arr[idx]' and then it's
+        // nicer to say "tainted index".
+        const char *OffsetName = "offset";
+        if (const auto *ASE = dyn_cast<ArraySubscriptExpr>(E))
+          if (isTainted(State, ASE->getIdx(), C.getLocationContext()))
+            OffsetName = "index";
+
+        Messages Msgs = getTaintMsgs(Reg, OffsetName);
+        reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, Msgs);
         return;
       }
     }
@@ -374,17 +401,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
 
 void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
                                     ProgramStateRef ErrorState, OOB_Kind Kind,
-                                    NonLoc Offset, std::string RegName,
-                                    std::string Msg) const {
+                                    NonLoc Offset, Messages Msgs) const {
 
   ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
   if (!ErrorNode)
     return;
 
-  std::string ShortMsg = getShortMsg(Kind, RegName);
-
   auto BR = std::make_unique<PathSensitiveBugReport>(
-      Kind == OOB_Taint ? TaintBT : BT, ShortMsg, Msg, ErrorNode);
+      Kind == OOB_Taint ? TaintBT : BT, Msgs.Short, Msgs.Full, ErrorNode);
 
   // Track back the propagation of taintedness.
   if (Kind == OOB_Taint)
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 4f5d0d0bc91c54a..e549e86b09c2312 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -9,6 +9,14 @@ void arrayUnderflow(void) {
   // expected-note@-2 {{Access of 'array' at negative byte offset -12}}
 }
 
+int underflowWithDeref(void) {
+  int *p = array;
+  --p;
+  return *p;
+  // expected-warning@-1 {{Out of bound access to memory preceding 'array'}}
+  // expected-note@-2 {{Access of 'array' at negative byte offset -4}}
+}
+
 int scanf(const char *restrict fmt, ...);
 
 void taintedIndex(void) {
@@ -17,6 +25,17 @@ void taintedIndex(void) {
   // expected-note@-1 {{Taint originated here}}
   // expected-note@-2 {{Taint propagated to the 2nd argument}}
   array[index] = 5;
+  // expected-warning@-1 {{Potential out of bound access to 'array' with tainted index}}
+  // expected-note@-2 {{Access of 'array' with a tainted index that may be too large}}
+}
+
+void taintedOffset(void) {
+  int index;
+  scanf("%d", &index);
+  // expected-note@-1 {{Taint originated here}}
+  // expected-note@-2 {{Taint propagated to the 2nd argument}}
+  int *p = array + index;
+  p[0] = 5;
   // expected-warning@-1 {{Potential out of bound access to 'array' with tainted offset}}
   // expected-note@-2 {{Access of 'array' with a tainted offset that may be too large}}
 }
@@ -27,6 +46,29 @@ void arrayOverflow(void) {
   // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
 }
 
+void flippedOverflow(void) {
+  12[array] = 5;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
+}
+
+int *afterTheEndPtr(void) {
+  // FIXME: this is an unusual but standard-compliant way of writing (array + 10).
+  // I think people who rely on this confusing corner case deserve a warning,
+  // but if this is widespread, then we could introduce a special case for it
+  // in the checker.
+  return &array[10];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 10, while it holds only 10 'int' elements}}
+}
+
+int *afterAfterTheEndPtr(void) {
+  // This is UB, it's invalid to form an after-after-the-end pointer.
+  return &array[11];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 11, while it holds only 10 'int' elements}}
+}
+
 int scalar;
 int scalarOverflow(void) {
   return (&scalar)[1];
@@ -41,12 +83,6 @@ int oneElementArrayOverflow(void) {
   // expected-note@-2 {{Access of 'oneElementArray' at index 1, while it holds only a single 'int' element}}
 }
 
-short convertedArray(void) {
-  return ((short*)array)[47];
-  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
-  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
-}
-
 struct vec {
   int len;
   double elems[64];
@@ -64,6 +100,28 @@ double arrayInStructPtr(struct vec *pv) {
   // expected-note@-2 {{Access of the field 'elems' at index 64, while it holds only 64 'double' elements}}
 }
 
+struct item {
+  int a, b;
+} itemArray[20] = {0};
+
+int structOfArrays(void) {
+  return itemArray[35].a;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+  // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+int structOfArraysArrow(void) {
+  return (itemArray + 35)->b;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+  // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+short convertedArray(void) {
+  return ((short*)array)[47];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
+}
+
 struct two_bytes {
   char lo, hi;
 };
@@ -85,7 +143,9 @@ int intFromString(void) {
 }
 
 int intFromStringDivisible(void) {
-  // However, this is reported with indices/elements, because the extent happens to be a multiple of 4.
+  // However, this is reported with indices/elements, because the extent
+  // (of the string that consists of 'a', 'b', 'c' and '\0') happens to be a
+  // multiple of 4 bytes (= sizeof(int)).
   return ((const int*)"abc")[20];
   // expected-warning@-1 {{Out of bound access to memory after the end of the string literal}}
   // expected-note@-2 {{Access of the string literal at index 20, while it holds only a single 'int' element}}
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 67dc67e627b3b9c..a3fa1639bffeee5 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -29,8 +29,8 @@ int taintDiagnosticOutOfBound(void) {
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
                        // expected-note@-1 {{Taint propagated to the 2nd argument}}
-  return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted offset}}
-                       // expected-note@-1 {{Access of 'Array' with a tainted offset that may be too large}}
+  return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted index}}
+                       // expected-note@-1 {{Access of 'Array' with a tainted index that may be too large}}
 }
 
 int taintDiagnosticDivZero(int operand) {

llvmbot · 2023-11-13T11:58:57Z

@llvm/pr-subscribers-clang-static-analyzer-1

Author: None (DonatNagyE)

Changes

...instead of the currently used, more abstract Location callback. The main advantage of this change is that after it the checker will check array[index].field while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object). This improvement fixes PR #70187.

Note that after this change &array[idx] will be handled as an access to the idxth element of array, which is technically incorrect but matches the programmer intuitions. In my opinion it's more helpful if the report points to the source location where the indexing happens (instead of the location where a pointer is finally dereferenced).

This change introduces false positives in the exceptional corner case when the past-the-end pointer of an array is formed as &arr[length]. I think this is just unimportant pedantery (because it's cleaner to write the past-the-end pointer as (arr+length) and that form isn't affected by this checker), but if it does appear in real code, then we could introduce a special case for it.

In addition to this primary improvement, this change tweaks the message for the tainted index/offset case (using the more concrete information that's available now) and clarifies/improves a few testcases.

The main change of this commit (replacing check::Location with check::PostStmt<...> callbacks) was already proposed in my change https://reviews.llvm.org/D150446 and https://reviews.llvm.org/D159107 by @steakhal. Those reviews were both abandoned, but the issues were unrelated to the change that is introduced in this PR.

Full diff: https://github.com/llvm/llvm-project/pull/72107.diff

3 Files Affected:

(modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+60-36)
(modified) clang/test/Analysis/out-of-bounds-diagnostics.c (+67-7)
(modified) clang/test/Analysis/taint-diagnostic-visitor.c (+2-2)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index ffc7236d1e2551a..ef3ae42d57c6e88 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -34,20 +34,37 @@ using llvm::formatv;
 namespace {
 enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
 
-class ArrayBoundCheckerV2 :
-    public Checker<check::Location> {
+struct Messages {
+  std::string Short, Full;
+};
+
+class ArrayBoundCheckerV2 : public Checker<check::PostStmt<ArraySubscriptExpr>,
+                                           check::PostStmt<UnaryOperator>,
+                                           check::PostStmt<MemberExpr>> {
   BugType BT{this, "Out-of-bound access"};
   BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
+  void performCheck(const Expr *E, CheckerContext &C) const;
+
   void reportOOB(CheckerContext &C, ProgramStateRef ErrorState, OOB_Kind Kind,
-                 NonLoc Offset, std::string RegName, std::string Msg) const;
+                 NonLoc Offset, Messages Msgs) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext &AC);
 
 public:
-  void checkLocation(SVal l, bool isLoad, const Stmt *S,
-                     CheckerContext &C) const;
+  void checkPostStmt(const ArraySubscriptExpr *E, CheckerContext &C) const {
+    performCheck(E, C);
+  }
+  void checkPostStmt(const UnaryOperator *E, CheckerContext &C) const {
+    if (E->getOpcode() == UO_Deref)
+      performCheck(E, C);
+  }
+  void checkPostStmt(const MemberExpr *E, CheckerContext &C) const {
+    if (E->isArrow())
+      performCheck(E->getBase(), C);
+  }
 };
+
 } // anonymous namespace
 
 /// For a given Location that can be represented as a symbolic expression
@@ -217,16 +234,19 @@ static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
   return formatv(ShortMsgTemplates[Kind], RegName);
 }
 
-static std::string getPrecedesMsg(std::string RegName, NonLoc Offset) {
+static Messages getPrecedesMsgs(const SubRegion *Region, NonLoc Offset) {
+  std::string RegName = getRegionName(Region);
   SmallString<128> Buf;
   llvm::raw_svector_ostream Out(Buf);
   Out << "Access of " << RegName << " at negative byte offset";
   if (auto ConcreteIdx = Offset.getAs<nonloc::ConcreteInt>())
     Out << ' ' << ConcreteIdx->getValue();
-  return std::string(Buf);
+  return {getShortMsg(OOB_Precedes, RegName), std::string(Buf)};
 }
-static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
-                                 NonLoc Offset, NonLoc Extent, SVal Location) {
+
+static Messages getExceedsMsgs(ASTContext &ACtx, const SubRegion *Region,
+                               NonLoc Offset, NonLoc Extent, SVal Location) {
+  std::string RegName = getRegionName(Region);
   const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
   assert(EReg && "this checker only handles element access");
   QualType ElemType = EReg->getElementType();
@@ -273,20 +293,18 @@ static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
       Out << "s";
   }
 
-  return std::string(Buf);
-}
-static std::string getTaintMsg(std::string RegName) {
-  SmallString<128> Buf;
-  llvm::raw_svector_ostream Out(Buf);
-  Out << "Access of " << RegName
-      << " with a tainted offset that may be too large";
-  return std::string(Buf);
+  return {getShortMsg(OOB_Exceeds, RegName), std::string(Buf)};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
-                                        const Stmt *LoadS,
-                                        CheckerContext &C) const {
+static Messages getTaintMsgs(const SubRegion *Region, const char *OffsetName) {
+  std::string RegName = getRegionName(Region);
+  return {formatv("Potential out of bound access to {0} with tainted {1}",
+                  RegName, OffsetName),
+          formatv("Access of {0} with a tainted {1} that may be too large",
+                  RegName, OffsetName)};
+}
 
+void ArrayBoundCheckerV2::performCheck(const Expr *E, CheckerContext &C) const {
   // NOTE: Instead of using ProgramState::assumeInBound(), we are prototyping
   // some new logic here that reasons directly about memory region extents.
   // Once that logic is more mature, we can bring it back to assumeInBound()
@@ -297,12 +315,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
   // have some flexibility in defining the base region, we can achieve
   // various levels of conservatism in our buffer overflow checking.
 
+  const SVal Location = C.getSVal(E);
+
   // The header ctype.h (from e.g. glibc) implements the isXXXXX() macros as
   //   #define isXXXXX(arg) (LOOKUP_TABLE[arg] & BITMASK_FOR_XXXXX)
   // and incomplete analysis of these leads to false positives. As even
   // accurate reports would be confusing for the users, just disable reports
   // from these macros:
-  if (isFromCtypeMacro(LoadS, C.getASTContext()))
+  if (isFromCtypeMacro(E, C.getASTContext()))
     return;
 
   ProgramStateRef State = C.getState();
@@ -331,9 +351,8 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
 
     if (PrecedesLowerBound && !WithinLowerBound) {
       // We know that the index definitely precedes the lower bound.
-      std::string RegName = getRegionName(Reg);
-      std::string Msg = getPrecedesMsg(RegName, ByteOffset);
-      reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);
+      Messages Msgs = getPrecedesMsgs(Reg, ByteOffset);
+      reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, Msgs);
       return;
     }
 
@@ -350,17 +369,25 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
     if (ExceedsUpperBound) {
       if (!WithinUpperBound) {
         // We know that the index definitely exceeds the upper bound.
-        std::string RegName = getRegionName(Reg);
-        std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
-                                        *KnownSize, Location);
-        reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);
+        Messages Msgs = getExceedsMsgs(C.getASTContext(), Reg, ByteOffset,
+                                       *KnownSize, Location);
+        reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, Msgs);
         return;
       }
       if (isTainted(State, ByteOffset)) {
-        // Both cases are possible, but the index is tainted, so report.
+        // Both cases are possible, but the offset is tainted, so report.
         std::string RegName = getRegionName(Reg);
-        std::string Msg = getTaintMsg(RegName);
-        reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, RegName, Msg);
+
+        // Diagnostic detail: "tainted offset" is always correct, but the
+        // common case is that 'idx' is tainted in 'arr[idx]' and then it's
+        // nicer to say "tainted index".
+        const char *OffsetName = "offset";
+        if (const auto *ASE = dyn_cast<ArraySubscriptExpr>(E))
+          if (isTainted(State, ASE->getIdx(), C.getLocationContext()))
+            OffsetName = "index";
+
+        Messages Msgs = getTaintMsgs(Reg, OffsetName);
+        reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, Msgs);
         return;
       }
     }
@@ -374,17 +401,14 @@ void ArrayBoundCheckerV2::checkLocation(SVal Location, bool IsLoad,
 
 void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
                                     ProgramStateRef ErrorState, OOB_Kind Kind,
-                                    NonLoc Offset, std::string RegName,
-                                    std::string Msg) const {
+                                    NonLoc Offset, Messages Msgs) const {
 
   ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
   if (!ErrorNode)
     return;
 
-  std::string ShortMsg = getShortMsg(Kind, RegName);
-
   auto BR = std::make_unique<PathSensitiveBugReport>(
-      Kind == OOB_Taint ? TaintBT : BT, ShortMsg, Msg, ErrorNode);
+      Kind == OOB_Taint ? TaintBT : BT, Msgs.Short, Msgs.Full, ErrorNode);
 
   // Track back the propagation of taintedness.
   if (Kind == OOB_Taint)
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
index 4f5d0d0bc91c54a..e549e86b09c2312 100644
--- a/clang/test/Analysis/out-of-bounds-diagnostics.c
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -9,6 +9,14 @@ void arrayUnderflow(void) {
   // expected-note@-2 {{Access of 'array' at negative byte offset -12}}
 }
 
+int underflowWithDeref(void) {
+  int *p = array;
+  --p;
+  return *p;
+  // expected-warning@-1 {{Out of bound access to memory preceding 'array'}}
+  // expected-note@-2 {{Access of 'array' at negative byte offset -4}}
+}
+
 int scanf(const char *restrict fmt, ...);
 
 void taintedIndex(void) {
@@ -17,6 +25,17 @@ void taintedIndex(void) {
   // expected-note@-1 {{Taint originated here}}
   // expected-note@-2 {{Taint propagated to the 2nd argument}}
   array[index] = 5;
+  // expected-warning@-1 {{Potential out of bound access to 'array' with tainted index}}
+  // expected-note@-2 {{Access of 'array' with a tainted index that may be too large}}
+}
+
+void taintedOffset(void) {
+  int index;
+  scanf("%d", &index);
+  // expected-note@-1 {{Taint originated here}}
+  // expected-note@-2 {{Taint propagated to the 2nd argument}}
+  int *p = array + index;
+  p[0] = 5;
   // expected-warning@-1 {{Potential out of bound access to 'array' with tainted offset}}
   // expected-note@-2 {{Access of 'array' with a tainted offset that may be too large}}
 }
@@ -27,6 +46,29 @@ void arrayOverflow(void) {
   // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
 }
 
+void flippedOverflow(void) {
+  12[array] = 5;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
+}
+
+int *afterTheEndPtr(void) {
+  // FIXME: this is an unusual but standard-compliant way of writing (array + 10).
+  // I think people who rely on this confusing corner case deserve a warning,
+  // but if this is widespread, then we could introduce a special case for it
+  // in the checker.
+  return &array[10];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 10, while it holds only 10 'int' elements}}
+}
+
+int *afterAfterTheEndPtr(void) {
+  // This is UB, it's invalid to form an after-after-the-end pointer.
+  return &array[11];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 11, while it holds only 10 'int' elements}}
+}
+
 int scalar;
 int scalarOverflow(void) {
   return (&scalar)[1];
@@ -41,12 +83,6 @@ int oneElementArrayOverflow(void) {
   // expected-note@-2 {{Access of 'oneElementArray' at index 1, while it holds only a single 'int' element}}
 }
 
-short convertedArray(void) {
-  return ((short*)array)[47];
-  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
-  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
-}
-
 struct vec {
   int len;
   double elems[64];
@@ -64,6 +100,28 @@ double arrayInStructPtr(struct vec *pv) {
   // expected-note@-2 {{Access of the field 'elems' at index 64, while it holds only 64 'double' elements}}
 }
 
+struct item {
+  int a, b;
+} itemArray[20] = {0};
+
+int structOfArrays(void) {
+  return itemArray[35].a;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+  // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+int structOfArraysArrow(void) {
+  return (itemArray + 35)->b;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'itemArray'}}
+  // expected-note@-2 {{Access of 'itemArray' at index 35, while it holds only 20 'struct item' elements}}
+}
+
+short convertedArray(void) {
+  return ((short*)array)[47];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
+}
+
 struct two_bytes {
   char lo, hi;
 };
@@ -85,7 +143,9 @@ int intFromString(void) {
 }
 
 int intFromStringDivisible(void) {
-  // However, this is reported with indices/elements, because the extent happens to be a multiple of 4.
+  // However, this is reported with indices/elements, because the extent
+  // (of the string that consists of 'a', 'b', 'c' and '\0') happens to be a
+  // multiple of 4 bytes (= sizeof(int)).
   return ((const int*)"abc")[20];
   // expected-warning@-1 {{Out of bound access to memory after the end of the string literal}}
   // expected-note@-2 {{Access of the string literal at index 20, while it holds only a single 'int' element}}
diff --git a/clang/test/Analysis/taint-diagnostic-visitor.c b/clang/test/Analysis/taint-diagnostic-visitor.c
index 67dc67e627b3b9c..a3fa1639bffeee5 100644
--- a/clang/test/Analysis/taint-diagnostic-visitor.c
+++ b/clang/test/Analysis/taint-diagnostic-visitor.c
@@ -29,8 +29,8 @@ int taintDiagnosticOutOfBound(void) {
   int Array[] = {1, 2, 3, 4, 5};
   scanf("%d", &index); // expected-note {{Taint originated here}}
                        // expected-note@-1 {{Taint propagated to the 2nd argument}}
-  return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted offset}}
-                       // expected-note@-1 {{Access of 'Array' with a tainted offset that may be too large}}
+  return Array[index]; // expected-warning {{Potential out of bound access to 'Array' with tainted index}}
+                       // expected-note@-1 {{Access of 'Array' with a tainted index that may be too large}}
 }
 
 int taintDiagnosticDivZero(int operand) {

NagyDonat · 2023-11-14T00:22:41Z

I ran an analysis that compares this commit with its parent on many open source projects. The results revealed that this commit "converts" many alpha.security.ArrayBound (V1) results into alpha.security.ArrayBoundV2 results because (if I understand it correctly) the new PostStmt callbacks fire before the Location callback used by the V1 checker (while it seems that previously the order of the tied Location callbacks was resolved in the favor of the V1 checker).

This is a mostly irrelevant effect because I presume that the users don't want to enable both of the ArrayBound checkers at the same time (I was testing with options that enable almost all alpha checkers, but that's not a "normal" config).

For the sake of completeness I'm pasting the result links for this first run where the diff is polluted by all the "lost report for V1, new report for V2" differences; but I started a clean run with -d alpha.security.ArrayBound (I'll post the results soon) and I'll do a more detailed evaluation on that.

Project	New Reports	Lost Reports
memcached	new reports	lost reports
tmux	new reports	lost reports
curl	new reports	lost reports
twin	new reports	lost reports
vim	new reports	lost reports
openssl	new reports	lost reports
sqlite	new reports	lost reports
ffmpeg	new reports	lost reports
postgres	new reports	lost reports
tinyxml2	new reports	lost reports
libwebm	new reports	lost reports
xerces	new reports	lost reports
bitcoin	new reports	lost reports
protobuf	new reports	lost reports
qtbase	new reports	lost reports
contour	new reports	lost reports
acid	new reports	lost reports
openrct2	new reports	lost reports

NagyDonat · 2023-11-16T15:22:51Z

I evaluated a mostly [1] clean analysis run and it reveals that this change has almost no effects when CSA analyses stable open source code (this is the expected behavior, stable code doesn't contain out of bounds memory access). The only differences are apparently caused by the inherent instability of the exploded graph traversal:

Project	New Reports	Lost Reports	Evaluation
memcached	new reports	lost reports	no change
tmux	new reports	lost reports	no change
curl	new reports	lost reports	no change
twin	new reports	lost reports	no change
vim	new reports	lost reports	no change
openssl	new reports	lost reports	no change
sqlite	new reports	lost reports	no change
ffmpeg	new reports	lost reports	one alpha.core.Conversion report lost, presumably due to graph perturbations
postgres	new reports	lost reports	no change
tinyxml2	new reports	lost reports	no change
libwebm	new reports	lost reports	no change
xerces	new reports	lost reports	no change
bitcoin	new reports	lost reports	no change
protobuf	new reports	lost reports	no change
qtbase	new reports	lost reports	two convoluted core.DivideZero reports are replaced by one that's very similar to them

[1] During the analysis of contour both the baseline (main snapshot created a few days ago) and the new revision crashed with the message fatal error: error in backend: Z3 error: out of memory (at slightly different steps of the analysis). As this crash seems to be non-deterministic, environment-dependent and not affected by the commit under review, I think that we can safely ignore it. As this interrupted the CI job, I have no results for contour, acid and openrct2, but I think the rest of the projects are already sufficient to evaluate this patch.

clang/test/Analysis/out-of-bounds-diagnostics.c

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp

Xazax-hun · 2023-11-24T00:29:04Z

Note that &array[idx] is perfectly valid code when idx == number of elements. And it is relatively common to do that when one is using STL algorithms on arrays:

auto it = std::find(&array[0], &array[size], foo);

Of course, one could use the begin/end free functions, but those are only available since C++11.

Could you elaborate on alternative approaches you considered fixing the problem and why you chose this one? E.g., would trying to look at the parent regions for expressions like foo[idx].bar work? Or is the source of the problem that you'd also need the exact expression for the subscript instead of the MemberExpr?

Alternatively, would it be possible to suppress warnings on the common pattern &array[idx] by checking the parent of the subscript expression in the AST (but still emitting a warning when the pointer is dereferenced)?

NagyDonat · 2023-11-24T10:23:26Z

Note that &array[idx] is perfectly valid code when idx == number of elements. And it is relatively common to do that when one is using STL algorithms on arrays:
auto it = std::find(&array[0], &array[size], foo);
Of course, one could use the begin/end free functions, but those are only available since C++11.

Oh, dammit, this language is stupid... I couldn't imagine a reason to use &array[size] instead of the cleaner (array + size); but of course STL containers need &array[size] with their overloaded operator[] (note: this is not an ArraySubscriptExpr but an overloaded operator call, so it's not handled by this checker) and then (if you say so, I don't have experience with pre-C++11 code that uses both STL and plain arrays) some developers will use this for plain arrays as well...

Could you elaborate on alternative approaches you considered fixing the problem and why you chose this one? E.g., would trying to look at the parent regions for expressions like foo[idx].bar work? Or is the source of the problem that you'd also need the exact expression for the subscript instead of the MemberExpr?

I considered two approaches, this one and walking on the parent region layers (I don't think that there is a third distinct approach). I chose this because:

This way the implementation is easier to understand and more "connected" to the concrete details of the analyzed code.
A few months ago @steakhal realized that check::Location is not sufficient on its own and it should've been supplemented with a check::Bind callback to cover all relevant cases. The commit implementing this (https://reviews.llvm.org/D159106) was abandoned, but if we keep check::Location, we would need to reintroduce something like that as well.
I wouldn't say that I need the exact expression for the subscript (or other dereferencing expression, like *ptr or ptr->foo), but I like that I can get it because this puts the warning message to the right source location (e.g. in a convoluted multiline expressions with several [] and member access layers) and allows/could allow better customization of the diagnostics.

Alternatively, would it be possible to suppress warnings on the common pattern &array[idx] by checking the parent of the subscript expression in the AST (but still emitting a warning when the pointer is dereferenced)?

Yes, that was my plan for resolving this issue; and as you say that it's needed, I'll implement it. I'll squeeze the parent lookup into the if where we know that there's an overflow, so that (AFAIK expensive) operation won't run in the common case when the array access is in bounds. The "still emitting a warning when the pointer is dereferenced" will happen automatically as at the dereferencing expression we'll have the same overflowing pointer but without the & that negates the result.

I think that I won't suppress the warning in situations like

int array[10];
int *f(int arg) {
  if (arg >= 10)
    return &array[arg];
  return array;
}

where the analyzer knows that idx >= size and doesn't know that idx == size. Are there any crazy situations where this is also legitimate?

NagyDonat · 2023-11-24T18:11:11Z

I updated the PR to allow creating the after-the-end pointer of an array as &array[size] . My implementation interprets this corner case narrowly to limit the amount of false negatives: for example if (idx >= size) return &array[idx]; is reported as an overflow instead of very optimistically assuming that idx must be equal to size.

balazske · 2023-11-27T17:15:29Z

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp

+          if (EqualsToThreshold && !NotEqualToThreshold) {
+            // We are definitely in the exceptional case, so return early
+            // instead of reporting a bug.
+            C.addTransition(EqualsToThreshold);


Is this transition needed? This checker should not add assumptions to the state, only check for conditions and add only error transitions. EqualsToThreshold probably does not contain new information compared to State.

You're right that EqualsToThreshold does not contain new information compared to State, but I'm using it because I'm trying to follow the pattern that I'm always using the most recent state variable.

On the other hand, the variable State may contain new information compared to the state of the node before the start of this callback: e.g. if we start with a range set like $[-\infty, -1] \cup \{ 10\}$ for the index variable, then we first test for underflow and update the variable State when we assume that there is no underflow.

I think it's valuable to record these assumptions with a state transition, because they improve the accuracy of the modeling. (Otherwise the analyzer could produce bug reports that rely on assumptions that contradict each other.) Currently the assumptions of this checker are added silently but I'll add note tags like "Assuming index is non-negative" for them in a followup commit.

steakhal · 2023-11-28T08:36:28Z

I'm in favor of this change. I'll pull the patch downstream and report back how it performed.
Coming back to the &array[size] example, actually I believe that it's well-formed in C, but UB in C++, but I'm not a language lawyer. @shafik probably knows this better 😅
However, (array + size) should be well-formed in both C and C++.

steakhal · 2023-11-28T09:54:22Z

FYI I edited the PR summary so that I'm not tagged there directly because if someone is tagged in a commit message on GH, that person will be notified each time that commit is pushed in public. Consequently, the tagged person gets spammed by GH from all the public forks each time they rebase and push this commit :D
Such a great feature, right?

NagyDonat · 2023-11-28T10:28:40Z

@steakhal thanks for the checking and sorry for the unintentional spamming.

Such a great feature, right?

Just wonderful 😄

steakhal · 2023-11-28T11:12:56Z

@steakhal thanks for the checking and sorry for the unintentional spamming.

Such a great feature, right?

Just wonderful 😄

To clarify, you did not spam me. I'm worried about merging a commit directly mentioning people. That would be the point when forks start to pick up the commit (and my name e.g.) and start spamming. I just wanted to raise awareness of this being a thing, and why I did that edit.

NagyDonat · 2023-11-28T15:56:46Z

Note that (if I understand it correctly) after this change the ArrayBoundV2 check may be triggered at a slightly earlier point of the analysis and this may impact situations where both this checker and other checkers would emit a bug report. (See e.g. #72402 (comment) where I discuss a concrete candidate for this.)

Xazax-hun · 2023-11-28T20:03:46Z

I think another question is whether warning on code like int& val = arr[i] where val is not actually used on the path where i == size is OK. I would not block this PR on a case like this, but it might be nice to add a test and potentially a comment about the desired behavior.

NagyDonat · 2023-11-30T13:39:22Z

@Xazax-hun I added a few testcases that show the current behavior of the checker w.r.t. invalid references. I think this is reasonably correct behavior, but I'm not too interested in the legality of these gray areas and I don't have strong opinions.

By the way, with the new PostStmt callbacks, "the result is assigned to a reference" is no longer a corner case. Perhaps it had a strange behavior with the old Location callback (because it performs an implicit address-of in some sense), but after this commit ArrayBoundV2 only looks for the syntactic presence of a (non-overloaded) subscript operator.

This commit extends the class `SValBuilder` with the methods `getMinValue()` and `getMaxValue()` to that work like `SValBuilder::getKnownValue()` but return the minimal/maximal possible value the `SVal` is not perfectly constrained. This extension of the ConstraintManager API is discussed at: https://discourse.llvm.org/t/expose-the-inferred-range-information-in-warning-messages/75192 As a simple proof-of-concept application of this new API, this commit extends a message from `core.BitwiseShift` with some range information that reports the assumptions of the analyzer. My main motivation for adding these methods is that I'll also want to use them in `ArrayBoundCheckerV2` to make the error messages less awkward, but I'm starting with this simpler and less important usecase because I want to avoid merge conflicts with my other commit llvm#72107 which is currently under review. The testcase `too_large_right_operand_compound()` shows a situation where querying the range information does not work (and the extra information is not added to the error message). This also affects the debug utility `clang_analyzer_value()`, so the problem isn't in the fresh code. I'll do some investigations to resolve this, but I think that this commit is a step forward even with this limitation.

steakhal

Sorry for reporting back only after a week or so. The analysis was done in like a day, but then we had some credential issues that blocked me actually doing the diffing.
Now about the diffing: a lot of things changed because now we report earlier, so a different range gets highlighted. This caused me some problems, so I decided to not do anything fancy, but randomly pick a bunch of issues.
In short, the picked issues looked better; but effectively remained the same. This is good.

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp

…74141) This commit extends the class `SValBuilder` with the methods `getMinValue()` and `getMaxValue()` to that work like `SValBuilder::getKnownValue()` but return the minimal/maximal possible value the `SVal` is not perfectly constrained. This extension of the ConstraintManager API is discussed at: https://discourse.llvm.org/t/expose-the-inferred-range-information-in-warning-messages/75192 As a simple proof-of-concept application of this new API, this commit extends a message from `core.BitwiseShift` with some range information that reports the assumptions of the analyzer. My main motivation for adding these methods is that I'll also want to use them in `ArrayBoundCheckerV2` to make the error messages less awkward, but I'm starting with this simpler and less important usecase because I want to avoid merge conflicts with my other commit #72107 which is currently under review. The testcase `too_large_right_operand_compound()` shows a situation where querying the range information does not work (and the extra information is not added to the error message). This also affects the debug utility `clang_analyzer_value()`, so the problem isn't in the fresh code. I'll do some investigations to resolve this, but I think that this commit is a step forward even with this limitation.

Xazax-hun

I wish we could clean up checkLocation to work as expected for all cases. It is more future proof in the sense that if C++ introduces new kinds of statements or expressions that could index into something (like multi-dimensional operator[]), checks continue to work without change after the analysis engine is updated.

It is a higher-level abstraction that lets checker authors focus on the semantics without having to enumerate all the patterns the check might be interested in which is often impossible in C++, let alone considering all the extensions implemented by Clang. So, these high-level abstractions supposed to deduplicate a lot of work by doing all the pattern matching once and for all in the engine, so checkers could benefit from this work without duplicating some of that pattern matching in every checker.

All that being said, I understand that it is more practical to just work these engine problems around in this check for now as we do not have a volunteer to revamp checkLocation at the moment. I know @steakhal looked into it, but it turned out to be a bigger fish than we expected. So, I'd say let's land this as a practical middle-term solution and hope that we will be able to come back and clean this up at some point using something that is more future proof.

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>

llvmbot added clang Clang issues not falling into any other category clang:static analyzer clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html labels Nov 13, 2023

NagyDonat requested review from haoNoQ, gamesh411 and spaits November 13, 2023 11:58

NagyDonat requested a review from steakhal November 13, 2023 11:59

NagyDonat linked an issue Nov 13, 2023 that may be closed by this pull request

[clang static analyzer] false negative related to alpha.security.ArrayBoundV2 #70187

Closed

NagyDonat removed the clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html label Nov 13, 2023

NagyDonat requested a review from balazske November 16, 2023 09:56

NagyDonat requested a review from Xazax-hun November 21, 2023 09:51

balazske reviewed Nov 23, 2023

View reviewed changes

clang/test/Analysis/out-of-bounds-diagnostics.c Outdated Show resolved Hide resolved

Xazax-hun reviewed Nov 24, 2023

View reviewed changes

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp Show resolved Hide resolved

NagyDonat added 4 commits November 24, 2023 15:18

Explain the use of PostStmt callbacks

819a21b

Fix an inaccurate testcase name

5ae2f66

Add special case for &array[size]

4044f86

Improve comment that explains the use of PostStmt callbacks

4fb7355

balazske reviewed Nov 27, 2023

View reviewed changes

NagyDonat mentioned this pull request Nov 28, 2023

[analyzer] Use AllocaRegion in MallocChecker #72402

Merged

NagyDonat added 2 commits November 30, 2023 14:26

Add two tests for references

9c85c7e

Add a third reference-related test

017b1d6

NagyDonat mentioned this pull request Dec 1, 2023

[analyzer] Let the checkers query upper and lower bounds on symbols #74141

Merged

steakhal approved these changes Dec 4, 2023

View reviewed changes

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp Outdated Show resolved Hide resolved

Xazax-hun approved these changes Dec 4, 2023

View reviewed changes

Simplify isInAddressOf()

a422b32

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>

NagyDonat merged commit dfdedaf into llvm:main Dec 5, 2023
3 of 4 checks passed

NagyDonat deleted the arraybound_change_callback branch December 5, 2023 15:17

steakhal mentioned this pull request Mar 23, 2024

[analyzer] Copy-constructor should trigger a load for check::Location #86403

Open

steakhal mentioned this pull request Apr 17, 2024

[analyzer] Serious slowdown on FFmpeg sheervideo.c when using ArrayBoundV2 #89045

Closed

NagyDonat mentioned this pull request Apr 23, 2024

Changing ArrayBoundV2 callback type increased symbol complexity #89720

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[analyzer] Switch to PostStmt callbacks in ArrayBoundV2 #72107

[analyzer] Switch to PostStmt callbacks in ArrayBoundV2 #72107

NagyDonat commented Nov 13, 2023 •

edited by steakhal

Loading

llvmbot commented Nov 13, 2023

llvmbot commented Nov 13, 2023

NagyDonat commented Nov 14, 2023

NagyDonat commented Nov 16, 2023 •

edited

Loading

Xazax-hun commented Nov 24, 2023

NagyDonat commented Nov 24, 2023 •

edited

Loading

NagyDonat commented Nov 24, 2023 •

edited

Loading

balazske Nov 27, 2023

NagyDonat Nov 28, 2023

steakhal commented Nov 28, 2023

steakhal commented Nov 28, 2023

NagyDonat commented Nov 28, 2023

steakhal commented Nov 28, 2023

NagyDonat commented Nov 28, 2023

Xazax-hun commented Nov 28, 2023

NagyDonat commented Nov 30, 2023

steakhal left a comment

Xazax-hun left a comment •

edited

Loading

[analyzer] Switch to PostStmt callbacks in ArrayBoundV2 #72107

[analyzer] Switch to PostStmt callbacks in ArrayBoundV2 #72107

Conversation

NagyDonat commented Nov 13, 2023 • edited by steakhal Loading

llvmbot commented Nov 13, 2023

llvmbot commented Nov 13, 2023

NagyDonat commented Nov 14, 2023

NagyDonat commented Nov 16, 2023 • edited Loading

Xazax-hun commented Nov 24, 2023

NagyDonat commented Nov 24, 2023 • edited Loading

NagyDonat commented Nov 24, 2023 • edited Loading

balazske Nov 27, 2023

Choose a reason for hiding this comment

NagyDonat Nov 28, 2023

Choose a reason for hiding this comment

steakhal commented Nov 28, 2023

steakhal commented Nov 28, 2023

NagyDonat commented Nov 28, 2023

steakhal commented Nov 28, 2023

NagyDonat commented Nov 28, 2023

Xazax-hun commented Nov 28, 2023

NagyDonat commented Nov 30, 2023

steakhal left a comment

Choose a reason for hiding this comment

Xazax-hun left a comment • edited Loading

Choose a reason for hiding this comment

NagyDonat commented Nov 13, 2023 •

edited by steakhal

Loading

NagyDonat commented Nov 16, 2023 •

edited

Loading

NagyDonat commented Nov 24, 2023 •

edited

Loading

NagyDonat commented Nov 24, 2023 •

edited

Loading

Xazax-hun left a comment •

edited

Loading