[compiler-rt][ASan] Add function copying annotations #91702

Open · wants to merge 17 commits into main from short-string-annotations-cleaning-traits
Conversation

@AdvenamTacet AdvenamTacet (Member) commented May 10, 2024

This PR adds a __sanitizer_copy_contiguous_container_annotations function, which copies annotations from one memory area to another. The new area is annotated in the same way the old region was at the start of the call (within the limitations of ASan).

Overlapping case: the function supports overlapping containers, but the only guarantee is the absence of false positives in the new buffer area. (It does not modify the old container's annotations where that is unnecessary, so false negatives may happen in the edge granules of the new container area.) I don't expect this function to be used with overlapping buffers, but it's designed to work with them and not produce incorrect ASan errors (false positives).

If the buffers are a granularity-aligned distance apart (old_beg % granularity == new_beg % granularity), the copying algorithm works faster. If the distance is not granularity-aligned, annotations are copied byte by byte.
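To make that dispatch concrete, here is a minimal sketch of the alignment check (illustrative only, not the PR's exact code; kGranularity stands in for ASAN_SHADOW_GRANULARITY):

```cpp
#include <cstdint>

constexpr uintptr_t kGranularity = 8; // stand-in for ASAN_SHADOW_GRANULARITY

// Shadow bytes map one-to-one between the two buffers only when both begin
// at the same offset within a granule; otherwise every annotation has to be
// recomputed byte by byte.
bool SameGranuleOffset(uintptr_t old_beg, uintptr_t new_beg) {
  return old_beg % kGranularity == new_beg % kGranularity;
}
```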

void __sanitizer_copy_contiguous_container_annotations(
    const void *old_storage_beg_p, const void *old_storage_end_p,
    const void *new_storage_beg_p, const void *new_storage_end_p) {
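For context, here is a sketch of an intended call site; RelocateAnnotations and its buffer handling are hypothetical, and only the sanitizer call comes from this PR:

```cpp
#include <sanitizer/common_interface_defs.h>
#include <cstddef>

// Hypothetical helper: after allocating a new buffer of the same size, give
// it the same poisoned/unpoisoned pattern as the old one.
void RelocateAnnotations(const char *old_beg, char *new_beg, size_t size) {
  __sanitizer_copy_contiguous_container_annotations(old_beg, old_beg + size,
                                                    new_beg, new_beg + size);
  // The bytes themselves still have to be moved separately; an instrumented
  // memcpy would trip on poisoned source bytes, which is what the future
  // __memcpy_with_asan / __asan_unsafe_memmove helpers discussed below are
  // for.
}
```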

This function aims to help with short string annotations and similar container annotations. Right now we change the trait types of std::basic_string when compiling with ASan, and this function's purpose is to let us revert that change as soon as possible.

#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
// When compiling with AddressSanitizer (ASan), basic_string cannot be trivially
// relocatable, because the object's memory might be poisoned when its content
// is kept inside the object's memory (short string optimization) instead of in
// allocated external memory. In such cases, the destructor is responsible for
// unpoisoning the memory to avoid triggering false positives.
// Therefore it's crucial to ensure the destructor is called.
using __trivially_relocatable = void;
#else
using __trivially_relocatable = __conditional_t<
    __libcpp_is_trivially_relocatable<allocator_type>::value &&
        __libcpp_is_trivially_relocatable<pointer>::value,
    basic_string,
    void>;
#endif
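To see why the destructor (and hence non-trivial relocation) matters, here is a small self-contained demonstration; it hand-poisons a fake SSO spare-capacity region with the public ASan API instead of relying on libc++ internals:

```cpp
// Build with: clang++ -fsanitize=address demo.cpp
#include <sanitizer/asan_interface.h>
#include <cstring>

struct alignas(16) FakeShortString {
  char buf[16]; // first 8 bytes "used", last 8 bytes spare capacity
};

int main() {
  FakeShortString a{};
  std::memcpy(a.buf, "hello", 6);
  // Simulate libc++'s short-string annotation of the unused capacity.
  __asan_poison_memory_region(a.buf + 8, 8);

  FakeShortString b;
  // Relocating the object with a raw, instrumented copy reads the poisoned
  // tail and aborts with a use-after-poison report, so the object cannot be
  // treated as trivially relocatable while the annotation is in place.
  std::memcpy(&b, &a, sizeof a); // ASan reports an error here

  __asan_unpoison_memory_region(a.buf + 8, 8); // what the destructor would do
}
```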

The goal is to not change __trivially_relocatable when compiling with ASan. If this function is accepted and upstreamed, the next step is creating a function like __memcpy_with_asan that moves memory under ASan, and then using that function instead of __builtin_memcpy when moving trivially relocatable objects.

} else {
  __builtin_memcpy(const_cast<__remove_const_t<_Tp>*>(__result), __first,
                   sizeof(_Tp) * (__last - __first));
}


I'm wondering whether there is a good way to address the fact that in a container the new buffer is usually bigger than the previous one. We may add two more arguments to the function to address it (the beginning and the end of the whole buffer).

Another potential change is removing new_storage_end_p, as it's redundant: we require both containers to have the same size.

Potential future work is creating a function __asan_unsafe_memmove, which would basically be memmove with instrumentation turned off (so it could copy data from a poisoned area).
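A rough sketch of how these two helpers could fit together; neither __asan_unsafe_memmove nor __memcpy_with_asan exists yet, and the no_sanitize-based body below is only one possible implementation:

```cpp
#include <sanitizer/common_interface_defs.h>
#include <cstddef>

// Hypothetical: a memmove whose body is not instrumented, so it may read from
// and write to poisoned memory without triggering ASan reports.
__attribute__((no_sanitize("address")))
void asan_unsafe_memmove(void *dst, const void *src, size_t n) {
  char *d = static_cast<char *>(dst);
  const char *s = static_cast<const char *>(src);
  if (d < s)
    for (size_t i = 0; i < n; ++i) d[i] = s[i];
  else
    for (size_t i = n; i > 0; --i) d[i - 1] = s[i - 1];
}

// Hypothetical: move bytes and annotations together, which is roughly what a
// __memcpy_with_asan-style helper would do for trivially relocatable objects.
void memmove_with_asan(void *dst, const void *src, size_t n) {
  asan_unsafe_memmove(dst, src, n);
  __sanitizer_copy_contiguous_container_annotations(
      src, static_cast<const char *>(src) + n,
      dst, static_cast<char *>(dst) + n);
}
```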

github-actions bot commented May 10, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch 2 times, most recently from cab5bc6 to 8ff3017 Compare May 10, 2024 07:37
@philnik777 (Contributor)

Trivial relocation is also interesting for erasing objects from a vector, so the builtins should probably be more akin to a memmove than a memcpy.
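To make the overlap concrete, a sketch (illustrative only): erasing element i relocates the tail [i+1, n) down onto [i, n-1), and those ranges overlap whenever the tail holds more than one element.

```cpp
#include <cstring>
#include <cstddef>

// Illustrative: shift the tail left after data[i] has been destroyed. The
// source and destination ranges overlap, so the copy needs memmove semantics.
template <class T> // assumes T is trivially relocatable
void erase_by_relocation(T *data, size_t n, size_t i) {
  std::memmove(data + i, data + i + 1, (n - i - 1) * sizeof(T));
}
```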

@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch from 8ff3017 to 7d3bc1a Compare May 17, 2024 04:56
@vitalybuka (Collaborator)

LGTM for ASan; however, the HWASan story is not clear.
Let's make sure we can use this for HWASan as well.

It's probably fine, but I'd like to think about this more.

@vitalybuka vitalybuka requested a review from fmayer May 20, 2024 16:26
@AdvenamTacet AdvenamTacet (Member, Author) commented May 22, 2024

I still have to fix the failing CI. I can sit down with it on Monday and, if necessary, spend enough time next week to fix it (Edit: testing actually takes a prohibitive amount of time, but I'm committed to making this PR work). I expect the problem to be a small implementation bug. Let me know if you have any high-level requests for this code; I'm happy to implement them.
I will also create a separate PR cleaning up the compiler-rt ASan code, as I noticed a bit of outdated code while writing this one.

Trivial relocation is also interesting for erasing objects from a vector, so the builtins should probably be more akin to a memmove than a memcpy.

Should we change __builtin_memcpy to __builtin_memmove? Then I would have to add support for overlapping areas to this PR. (It complicates the implementation a little bit.)

It's probably fine, but I'd like to think about this more.

Sure, let me know what your conclusions are!

@philnik777 (Contributor)

Trivial relocation is also interesting for erasing objects from a vector, so the builtins should probably be more akin to a memmove than a memcpy.

Should we change __builtin_memcpy to __builtin_memmove? Then I would have to add support for overlapping areas to this PR. (It complicates the implementation a little bit.)

Yes. My plan is to change the __builtin_memcpy to __builtin_memmove in __uninitialized_allocator_relocate to implement more optimizations in vector.

@AdvenamTacet (Member, Author)

Alright, I will update that PR accordingly in the coming week.

@vitalybuka vitalybuka (Collaborator) left a comment

I assume work in progress.

@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch 2 times, most recently from 7725910 to 0ab9216 Compare August 26, 2024 09:31
@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch 2 times, most recently from 82a7c7e to 93ab376 Compare August 28, 2024 09:23
@AdvenamTacet AdvenamTacet marked this pull request as ready for review August 28, 2024 09:58
@llvmbot (Collaborator)

llvmbot commented Aug 28, 2024

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: Tacet (AdvenamTacet)

Changes

This PR adds a __sanitizer_move_contiguous_container_annotations function, which moves annotations from one memory area to another. The old area is unpoisoned at the end. The new area is annotated in the same way the old region was at the start of the call (within the limitations of ASan).

void __sanitizer_move_contiguous_container_annotations(
    const void *old_storage_beg_p, const void *old_storage_end_p,
    const void *new_storage_beg_p, const void *new_storage_end_p) {

This function aims to help with short string annotations and similar container annotations. Right now we change the trait types of std::basic_string when compiling with ASan.

#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
// When compiling with AddressSanitizer (ASan), basic_string cannot be trivially
// relocatable, because the object's memory might be poisoned when its content
// is kept inside the object's memory (short string optimization) instead of in
// allocated external memory. In such cases, the destructor is responsible for
// unpoisoning the memory to avoid triggering false positives.
// Therefore it's crucial to ensure the destructor is called.
using __trivially_relocatable = void;
#else
using __trivially_relocatable = __conditional_t<
    __libcpp_is_trivially_relocatable<allocator_type>::value &&
        __libcpp_is_trivially_relocatable<pointer>::value,
    basic_string,
    void>;
#endif

The goal is to not change __trivially_relocatable when compiling with ASan. If this function is accepted and upstreamed, the next step is creating a function like __memcpy_with_asan that moves memory under ASan, and then using that function instead of __builtin_memcpy when moving trivially relocatable objects.

} else {
  __builtin_memcpy(const_cast<__remove_const_t<_Tp>*>(__result), __first,
                   sizeof(_Tp) * (__last - __first));
}


I'm wondering whether there is a good way to address the fact that in a container the new buffer is usually bigger than the previous one. We may add two more arguments to the function to address it (the beginning and the end of the whole buffer).

Another potential change is removing new_storage_end_p, as it's redundant: we require both containers to have the same size.


Patch is 27.14 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91702.diff

9 Files Affected:

  • (modified) compiler-rt/include/sanitizer/common_interface_defs.h (+24)
  • (modified) compiler-rt/lib/asan/asan_errors.cpp (+14)
  • (modified) compiler-rt/lib/asan/asan_errors.h (+19)
  • (modified) compiler-rt/lib/asan/asan_poisoning.cpp (+234)
  • (modified) compiler-rt/lib/asan/asan_report.cpp (+10)
  • (modified) compiler-rt/lib/asan/asan_report.h (+3)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_common_interface.inc (+1)
  • (modified) compiler-rt/lib/sanitizer_common/sanitizer_interface_internal.h (+4)
  • (added) compiler-rt/test/asan/TestCases/move_container_annotations.cpp (+232)
diff --git a/compiler-rt/include/sanitizer/common_interface_defs.h b/compiler-rt/include/sanitizer/common_interface_defs.h
index f9fce595b37bb8..e9101e39d72ddd 100644
--- a/compiler-rt/include/sanitizer/common_interface_defs.h
+++ b/compiler-rt/include/sanitizer/common_interface_defs.h
@@ -193,6 +193,30 @@ void SANITIZER_CDECL __sanitizer_annotate_double_ended_contiguous_container(
     const void *old_container_beg, const void *old_container_end,
     const void *new_container_beg, const void *new_container_end);
 
+/// Copies annotations from one storage region to another.
+/// At the end, the new buffer is annotated in the same way as the old buffer
+/// was at the very beginning. The old buffer's annotations are not removed.
+/// The main purpose of this function is for use while moving trivially
+/// relocatable objects whose memory may be poisoned (and therefore cannot be
+/// moved trivially under ASan).
+///
+/// A contiguous container is a container that keeps all of its elements
+/// in a contiguous region of memory. The container owns the regions of memory
+/// <c>[old_storage_beg, old_storage_end)</c> and
+/// <c>[new_storage_beg, new_storage_end)</c>.
+/// There is no requirement on where objects are kept.
+/// Poisoned and non-poisoned memory areas can alternate;
+/// there are no shadow memory restrictions.
+///
+/// Argument requirements:
+/// The new container has to have the same size as the old container.
+/// \param old_storage_beg Beginning of the old container region.
+/// \param old_storage_end End of the old container region.
+/// \param new_storage_beg Beginning of the new container region.
+/// \param new_storage_end End of the new container region.
+void SANITIZER_CDECL __sanitizer_copy_contiguous_container_annotations(
+    const void *old_storage_beg, const void *old_storage_end,
+    const void *new_storage_beg, const void *new_storage_end);
+
 /// Returns true if the contiguous container <c>[beg, end)</c> is properly
 /// poisoned.
 ///
diff --git a/compiler-rt/lib/asan/asan_errors.cpp b/compiler-rt/lib/asan/asan_errors.cpp
index 6f2fd28bfdf11a..8b712c70029f76 100644
--- a/compiler-rt/lib/asan/asan_errors.cpp
+++ b/compiler-rt/lib/asan/asan_errors.cpp
@@ -348,6 +348,20 @@ void ErrorBadParamsToAnnotateDoubleEndedContiguousContainer::Print() {
   ReportErrorSummary(scariness.GetDescription(), stack);
 }
 
+void ErrorBadParamsToCopyContiguousContainerAnnotations::Print() {
+  Report(
+      "ERROR: AddressSanitizer: bad parameters to "
+      "__sanitizer_copy_contiguous_container_annotations:\n"
+      "      old_storage_beg : %p\n"
+      "      old_storage_end : %p\n"
+      "      new_storage_beg : %p\n"
+      "      new_storage_end : %p\n",
+      (void *)old_storage_beg, (void *)old_storage_end, (void *)new_storage_beg,
+      (void *)new_storage_end);
+  stack->Print();
+  ReportErrorSummary(scariness.GetDescription(), stack);
+}
+
 void ErrorODRViolation::Print() {
   Decorator d;
   Printf("%s", d.Error());
diff --git a/compiler-rt/lib/asan/asan_errors.h b/compiler-rt/lib/asan/asan_errors.h
index 634f6da5443552..b3af655e66639a 100644
--- a/compiler-rt/lib/asan/asan_errors.h
+++ b/compiler-rt/lib/asan/asan_errors.h
@@ -353,6 +353,24 @@ struct ErrorBadParamsToAnnotateDoubleEndedContiguousContainer : ErrorBase {
   void Print();
 };
 
+struct ErrorBadParamsToCopyContiguousContainerAnnotations : ErrorBase {
+  const BufferedStackTrace *stack;
+  uptr old_storage_beg, old_storage_end, new_storage_beg, new_storage_end;
+
+  ErrorBadParamsToCopyContiguousContainerAnnotations() = default;  // (*)
+  ErrorBadParamsToCopyContiguousContainerAnnotations(
+      u32 tid, BufferedStackTrace *stack_, uptr old_storage_beg_,
+      uptr old_storage_end_, uptr new_storage_beg_, uptr new_storage_end_)
+      : ErrorBase(tid, 10,
+                  "bad-__sanitizer_copy_contiguous_container_annotations"),
+        stack(stack_),
+        old_storage_beg(old_storage_beg_),
+        old_storage_end(old_storage_end_),
+        new_storage_beg(new_storage_beg_),
+        new_storage_end(new_storage_end_) {}
+  void Print();
+};
+
 struct ErrorODRViolation : ErrorBase {
   __asan_global global1, global2;
   u32 stack_id1, stack_id2;
@@ -421,6 +439,7 @@ struct ErrorGeneric : ErrorBase {
   macro(StringFunctionSizeOverflow)                        \
   macro(BadParamsToAnnotateContiguousContainer)            \
   macro(BadParamsToAnnotateDoubleEndedContiguousContainer) \
+  macro(BadParamsToCopyContiguousContainerAnnotations)     \
   macro(ODRViolation)                                      \
   macro(InvalidPointerPair)                                \
   macro(Generic)
diff --git a/compiler-rt/lib/asan/asan_poisoning.cpp b/compiler-rt/lib/asan/asan_poisoning.cpp
index d600b1a0c241fa..fd83b918ac02aa 100644
--- a/compiler-rt/lib/asan/asan_poisoning.cpp
+++ b/compiler-rt/lib/asan/asan_poisoning.cpp
@@ -576,6 +576,240 @@ void __sanitizer_annotate_double_ended_contiguous_container(
   }
 }
 
+// Checks if two pointers fall within the same memory granule.
+static bool WithinOneGranule(uptr p, uptr q) {
+  if (p == q)
+    return true;
+  return RoundDownTo(p, ASAN_SHADOW_GRANULARITY) ==
+         RoundDownTo(q - 1, ASAN_SHADOW_GRANULARITY);
+}
+
+// Copies ASan memory annotation (a shadow memory value)
+// from one granule to another.
+static void CopyGranuleAnnotation(uptr dst, uptr src) {
+  *(u8 *)MemToShadow(dst) = *(u8 *)MemToShadow(src);
+}
+
+// Marks the specified number of bytes in a granule as accessible or
+// poisons the whole granule with the kAsanContiguousContainerOOBMagic value.
+static void AnnotateContainerGranuleAccessibleBytes(uptr ptr, u8 n) {
+  constexpr uptr granularity = ASAN_SHADOW_GRANULARITY;
+  if (n == granularity) {
+    *(u8 *)MemToShadow(ptr) = 0;
+  } else if (n == 0) {
+    *(u8 *)MemToShadow(ptr) = static_cast<u8>(kAsanContiguousContainerOOBMagic);
+  } else {
+    *(u8 *)MemToShadow(ptr) = n;
+  }
+}
+
+// Performs a byte-by-byte copy of ASan annotations (shadow memory values).
+// The result may differ due to ASan limitations, but it cannot lead
+// to false positives (more memory than requested may get unpoisoned).
+static void SlowCopyContainerAnnotations(uptr old_storage_beg,
+                                         uptr old_storage_end,
+                                         uptr new_storage_beg,
+                                         uptr new_storage_end) {
+  constexpr uptr granularity = ASAN_SHADOW_GRANULARITY;
+  uptr new_internal_end = RoundDownTo(new_storage_end, granularity);
+  uptr old_ptr = old_storage_beg;
+  uptr new_ptr = new_storage_beg;
+
+  while (new_ptr < new_storage_end) {
+    uptr next_new = RoundUpTo(new_ptr + 1, granularity);
+    uptr granule_begin = next_new - granularity;
+    uptr unpoisoned_bytes = 0;
+
+    for (; new_ptr != next_new && new_ptr != new_storage_end;
+         ++new_ptr, ++old_ptr) {
+      if (!AddressIsPoisoned(old_ptr)) {
+        unpoisoned_bytes = new_ptr - granule_begin + 1;
+      }
+    }
+    if (new_ptr < new_storage_end || new_ptr == new_internal_end ||
+        AddressIsPoisoned(new_storage_end)) {
+      if (unpoisoned_bytes != 0 || granule_begin >= new_storage_beg) {
+        AnnotateContainerGranuleAccessibleBytes(granule_begin,
+                                                unpoisoned_bytes);
+      } else if (!AddressIsPoisoned(new_storage_beg)) {
+        AnnotateContainerGranuleAccessibleBytes(
+            granule_begin, new_storage_beg - granule_begin);
+      }
+    }
+  }
+}
+
+// This function is basically the same as SlowCopyContainerAnnotations,
+// but goes through elements in reverse order.
+static void SlowRCopyContainerAnnotations(uptr old_storage_beg,
+                                          uptr old_storage_end,
+                                          uptr new_storage_beg,
+                                          uptr new_storage_end) {
+  constexpr uptr granularity = ASAN_SHADOW_GRANULARITY;
+  uptr new_internal_beg = RoundDownTo(new_storage_beg, granularity);
+  uptr new_internal_end = RoundDownTo(new_storage_end, granularity);
+  uptr old_ptr = old_storage_end;
+  uptr new_ptr = new_storage_end;
+
+  while (new_ptr > new_storage_beg) {
+    uptr granule_begin = RoundDownTo(new_ptr - 1, granularity);
+    uptr unpoisoned_bytes = 0;
+
+    for (; new_ptr != granule_begin && new_ptr != new_storage_beg;
+         --new_ptr, --old_ptr) {
+      if (unpoisoned_bytes == 0 && !AddressIsPoisoned(old_ptr - 1)) {
+        unpoisoned_bytes = new_ptr - granule_begin;
+      }
+    }
+
+    if (new_ptr >= new_internal_end && !AddressIsPoisoned(new_storage_end)) {
+      continue;
+    }
+
+    if (granule_begin == new_ptr || unpoisoned_bytes != 0) {
+      AnnotateContainerGranuleAccessibleBytes(granule_begin, unpoisoned_bytes);
+    } else if (!AddressIsPoisoned(new_storage_beg)) {
+      AnnotateContainerGranuleAccessibleBytes(granule_begin,
+                                              new_storage_beg - granule_begin);
+    }
+  }
+}
+
+// This function copies ASan memory annotations (poisoned/unpoisoned states)
+// from one buffer to another.
+// Its main purpose is to help with relocating trivially relocatable objects
+// whose memory may be poisoned, without calling the copy constructor.
+// However, it does not move the memory content itself, only the annotations.
+// If the buffers aren't aligned (i.e. the distance between them isn't
+// granule-aligned:
+//     old_storage_beg % granularity != new_storage_beg % granularity),
+// the function handles this by going byte by byte, which slows it down.
+// The old buffer's annotations are not removed. If necessary, the user can
+// unpoison the old buffer with __asan_unpoison_memory_region.
+void __sanitizer_copy_contiguous_container_annotations(
+    const void *old_storage_beg_p, const void *old_storage_end_p,
+    const void *new_storage_beg_p, const void *new_storage_end_p) {
+  if (!flags()->detect_container_overflow)
+    return;
+
+  VPrintf(2, "contiguous_container_old: %p %p\n", old_storage_beg_p,
+          old_storage_end_p);
+  VPrintf(2, "contiguous_container_new: %p %p\n", new_storage_beg_p,
+          new_storage_end_p);
+
+  uptr old_storage_beg = reinterpret_cast<uptr>(old_storage_beg_p);
+  uptr old_storage_end = reinterpret_cast<uptr>(old_storage_end_p);
+  uptr new_storage_beg = reinterpret_cast<uptr>(new_storage_beg_p);
+  uptr new_storage_end = reinterpret_cast<uptr>(new_storage_end_p);
+
+  constexpr uptr granularity = ASAN_SHADOW_GRANULARITY;
+
+  if (!(old_storage_beg <= old_storage_end) ||
+      !(new_storage_beg <= new_storage_end) ||
+      (old_storage_end - old_storage_beg) !=
+          (new_storage_end - new_storage_beg)) {
+    GET_STACK_TRACE_FATAL_HERE;
+    ReportBadParamsToCopyContiguousContainerAnnotations(
+        old_storage_beg, old_storage_end, new_storage_beg, new_storage_end,
+        &stack);
+  }
+
+  if (old_storage_beg == old_storage_end || old_storage_beg == new_storage_beg)
+    return;
+  // The only edge cases involve edge granules when the container starts or
+  // ends within a granule. We already know that the container's start and end
+  // points lie in different granules.
+  uptr old_external_end = RoundUpTo(old_storage_end, granularity);
+  if (old_storage_beg < new_storage_beg &&
+      new_storage_beg <= old_external_end) {
+    // In this case, we have to copy elements in reversed order, because
+    // destination buffer starts in the middle of the source buffer (or shares
+    // first granule with it).
+    // It still may be possible to optimize, but reversed order has to be kept.
+    if (old_storage_beg % granularity != new_storage_beg % granularity ||
+        WithinOneGranule(new_storage_beg, new_storage_end)) {
+      SlowRCopyContainerAnnotations(old_storage_beg, old_storage_end,
+                                    new_storage_beg, new_storage_end);
+      return;
+    }
+
+    uptr new_internal_end = RoundDownTo(new_storage_end, granularity);
+    if (new_internal_end != new_storage_end &&
+        AddressIsPoisoned(new_storage_end)) {
+      // Last granule
+      uptr old_internal_end = RoundDownTo(old_storage_end, granularity);
+      if (AddressIsPoisoned(old_storage_end)) {
+        CopyGranuleAnnotation(new_internal_end, old_internal_end);
+      } else {
+        AnnotateContainerGranuleAccessibleBytes(
+            new_internal_end, old_storage_end - old_internal_end);
+      }
+    }
+
+    uptr new_internal_beg = RoundUpTo(new_storage_beg, granularity);
+    if (new_internal_end > new_internal_beg) {
+      uptr old_internal_beg = RoundUpTo(old_storage_beg, granularity);
+      __builtin_memmove((u8 *)MemToShadow(new_internal_beg),
+                        (u8 *)MemToShadow(old_internal_beg),
+                        (new_internal_end - new_internal_beg) / granularity);
+    }
+
+    if (new_internal_beg != new_storage_beg) {
+      // First granule
+      uptr new_external_beg = RoundDownTo(new_storage_beg, granularity);
+      uptr old_external_beg = RoundDownTo(old_storage_beg, granularity);
+      if (!AddressIsPoisoned(old_storage_beg)) {
+        CopyGranuleAnnotation(new_external_beg, old_external_beg);
+      } else if (!AddressIsPoisoned(new_storage_beg)) {
+        AnnotateContainerGranuleAccessibleBytes(
+            new_external_beg, new_storage_beg - new_external_beg);
+      }
+    }
+    return;
+  }
+
+  // Simple copy of annotations of all internal granules.
+  if (old_storage_beg % granularity != new_storage_beg % granularity ||
+      WithinOneGranule(new_storage_beg, new_storage_end)) {
+    SlowCopyContainerAnnotations(old_storage_beg, old_storage_end,
+                                 new_storage_beg, new_storage_end);
+    return;
+  }
+
+  uptr new_internal_beg = RoundUpTo(new_storage_beg, granularity);
+  if (new_internal_beg != new_storage_beg) {
+    // First granule
+    uptr new_external_beg = RoundDownTo(new_storage_beg, granularity);
+    uptr old_external_beg = RoundDownTo(old_storage_beg, granularity);
+    if (!AddressIsPoisoned(old_storage_beg)) {
+      CopyGranuleAnnotation(new_external_beg, old_external_beg);
+    } else if (!AddressIsPoisoned(new_storage_beg)) {
+      AnnotateContainerGranuleAccessibleBytes(
+          new_external_beg, new_storage_beg - new_external_beg);
+    }
+  }
+
+  uptr new_internal_end = RoundDownTo(new_storage_end, granularity);
+  if (new_internal_end > new_internal_beg) {
+    uptr old_internal_beg = RoundUpTo(old_storage_beg, granularity);
+    __builtin_memmove((u8 *)MemToShadow(new_internal_beg),
+                      (u8 *)MemToShadow(old_internal_beg),
+                      (new_internal_end - new_internal_beg) / granularity);
+  }
+
+  if (new_internal_end != new_storage_end &&
+      AddressIsPoisoned(new_storage_end)) {
+    // Last granule
+    uptr old_internal_end = RoundDownTo(old_storage_end, granularity);
+    if (AddressIsPoisoned(old_storage_end)) {
+      CopyGranuleAnnotation(new_internal_end, old_internal_end);
+    } else {
+      AnnotateContainerGranuleAccessibleBytes(
+          new_internal_end, old_storage_end - old_internal_end);
+    }
+  }
+}
+
 static const void *FindBadAddress(uptr begin, uptr end, bool poisoned) {
   CHECK_LE(begin, end);
   constexpr uptr kMaxRangeToCheck = 32;
diff --git a/compiler-rt/lib/asan/asan_report.cpp b/compiler-rt/lib/asan/asan_report.cpp
index fd590e401f67fe..45aa607dcda07f 100644
--- a/compiler-rt/lib/asan/asan_report.cpp
+++ b/compiler-rt/lib/asan/asan_report.cpp
@@ -367,6 +367,16 @@ void ReportBadParamsToAnnotateDoubleEndedContiguousContainer(
   in_report.ReportError(error);
 }
 
+void ReportBadParamsToCopyContiguousContainerAnnotations(
+    uptr old_storage_beg, uptr old_storage_end, uptr new_storage_beg,
+    uptr new_storage_end, BufferedStackTrace *stack) {
+  ScopedInErrorReport in_report;
+  ErrorBadParamsToCopyContiguousContainerAnnotations error(
+      GetCurrentTidOrInvalid(), stack, old_storage_beg, old_storage_end,
+      new_storage_beg, new_storage_end);
+  in_report.ReportError(error);
+}
+
 void ReportODRViolation(const __asan_global *g1, u32 stack_id1,
                         const __asan_global *g2, u32 stack_id2) {
   ScopedInErrorReport in_report;
diff --git a/compiler-rt/lib/asan/asan_report.h b/compiler-rt/lib/asan/asan_report.h
index 3540b3b4b1bfe0..3143d83abe390d 100644
--- a/compiler-rt/lib/asan/asan_report.h
+++ b/compiler-rt/lib/asan/asan_report.h
@@ -88,6 +88,9 @@ void ReportBadParamsToAnnotateDoubleEndedContiguousContainer(
     uptr storage_beg, uptr storage_end, uptr old_container_beg,
     uptr old_container_end, uptr new_container_beg, uptr new_container_end,
     BufferedStackTrace *stack);
+void ReportBadParamsToCopyContiguousContainerAnnotations(
+    uptr old_storage_beg, uptr old_storage_end, uptr new_storage_beg,
+    uptr new_storage_end, BufferedStackTrace *stack);
 
 void ReportODRViolation(const __asan_global *g1, u32 stack_id1,
                         const __asan_global *g2, u32 stack_id2);
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_common_interface.inc b/compiler-rt/lib/sanitizer_common/sanitizer_common_interface.inc
index d5981a38ff2921..d6b0662d80b940 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_common_interface.inc
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_common_interface.inc
@@ -10,6 +10,7 @@
 INTERFACE_FUNCTION(__sanitizer_acquire_crash_state)
 INTERFACE_FUNCTION(__sanitizer_annotate_contiguous_container)
 INTERFACE_FUNCTION(__sanitizer_annotate_double_ended_contiguous_container)
+INTERFACE_FUNCTION(__sanitizer_copy_contiguous_container_annotations)
 INTERFACE_FUNCTION(__sanitizer_contiguous_container_find_bad_address)
 INTERFACE_FUNCTION(
     __sanitizer_double_ended_contiguous_container_find_bad_address)
diff --git a/compiler-rt/lib/sanitizer_common/sanitizer_interface_internal.h b/compiler-rt/lib/sanitizer_common/sanitizer_interface_internal.h
index cd0d45e2f3fab1..07d6b238301c27 100644
--- a/compiler-rt/lib/sanitizer_common/sanitizer_interface_internal.h
+++ b/compiler-rt/lib/sanitizer_common/sanitizer_interface_internal.h
@@ -71,6 +71,10 @@ void __sanitizer_annotate_double_ended_contiguous_container(
     const void *old_container_beg, const void *old_container_end,
     const void *new_container_beg, const void *new_container_end);
 SANITIZER_INTERFACE_ATTRIBUTE
+void __sanitizer_copy_contiguous_container_annotations(
+    const void *old_storage_beg, const void *old_storage_end,
+    const void *new_storage_beg, const void *new_storage_end);
+SANITIZER_INTERFACE_ATTRIBUTE
 int __sanitizer_verify_contiguous_container(const void *beg, const void *mid,
                                             const void *end);
 SANITIZER_INTERFACE_ATTRIBUTE
diff --git a/compiler-rt/test/asan/TestCases/move_container_annotations.cpp b/compiler-rt/test/asan/TestCases/move_container_annotations.cpp
new file mode 100644
index 00000000000000..db3b3e96968cec
--- /dev/null
+++ b/compiler-rt/test/asan/TestCases/move_container_annotations.cpp
@@ -0,0 +1,232 @@
+// RUN: %clangxx_asan -fexceptions -O %s -o %t && %env_asan_opts=detect_stack_use_after_return=0 %run %t
+//
+// Test __sanitizer_copy_contiguous_container_annotations.
+
+#include <algorithm>
+#include <deque>
+#include <numeric>
+
+#include <assert.h>
+#include <sanitizer/asan_interface.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+static constexpr size_t kGranularity = 8;
+
+template <class T> static constexpr T RoundDown(T x) {
+  return reinterpret_cast<T>(reinterpret_cast<uintptr_t>(x) &
+                             ~(kGranularity - 1));
+}
+template <class T> static constexpr T RoundUp(T x) {
+  return (x == RoundDown(x))
+             ? x
+             : reinterpret_cast<T>(reinterpret_cast<uintptr_t>(RoundDown(x)) +
+                                   kGranularity);
+}
+
+static std::deque<int> GetPoisonedState(char *begin, char *end) {
+  std::deque<int> result;
+  for (char *ptr = begin; ptr != end; ++ptr) {
+    result.push_back(__asan_address_is_poisoned(ptr));
+  }
+  return result;
+}
+
+static void RandomPoison(char *beg, char *end) {
+  if (beg != RoundDown(beg) && RoundDown(beg) != RoundDown(end) &&
+      rand() % 2 == 1) {
+    __asan_poison_memory_region(beg, RoundUp(beg) - beg);
+    __asan_unpoison_memory_region(beg, rand() % (RoundUp(beg) - beg + 1));
+  }
+  for (beg = RoundUp(beg); beg + kGranularity <= end; beg += kGranularity) {
+    __asan_poison_memory_region(beg, kGranularity);
+    __a...
[truncated]

@AdvenamTacet AdvenamTacet changed the title [compiler-rt][ASan] Add function moving annotations [compiler-rt][ASan] Add function copying annotations Aug 28, 2024
@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch from 8790c6f to d6b5542 Compare August 30, 2024 16:24
@AdvenamTacet (Member, Author)

This PR is ready for review. I see nothing more that has to be done in it.

The next step (though it may be in a separate PR) is adding a function like asan_unsafe_memmove, which would be a memmove with instrumentation turned off (so it does not trigger ASan errors). I'm not sure which file we should add the function to, and I'm open to suggestions. Optionally, we can create asan_unsafe_memmove_with_annotations (to copy content and annotations in one call).

@vitalybuka vitalybuka (Collaborator) left a comment

I need some time to reasonably review this PR.

size_t off_new, int poison_buffers) {
size_t old_buffer_size = capacity + off_old + kGranularity * 2;
size_t new_buffer_size = capacity + off_new + kGranularity * 2;
char *old_buffer = new char[old_buffer_size];
@vitalybuka (Collaborator):

unique_ptr

@AdvenamTacet (Member, Author):

Does it look good now? Or should I change more?

compiler-rt/lib/asan/asan_poisoning.cpp (review thread: outdated, resolved)
uptr src_storage_begin = reinterpret_cast<uptr>(src_begin_p);
uptr src_storage_end = reinterpret_cast<uptr>(src_end_p);
uptr dst_storage_begin = reinterpret_cast<uptr>(dst_begin_p);
uptr dst_storage_end =
@vitalybuka (Collaborator):

Maybe it's safer to pass dst_storage_end as a parameter and check the assumption here.

@AdvenamTacet (Member, Author):

That was my first implementation and original reasoning, but later I changed it. My motivation for removing that argument was the intended use of this function in places where (without ASan) memmove or memcpy is used, and those functions assume the same size for both containers. But I'm happy with either; I've reverted that change for now.

Also, should we keep this order of arguments (src, dst), or should we change it so dst comes before src? I based the order on the old/new container annotation functions, like:

void __sanitizer_annotate_double_ended_contiguous_container(
    const void *storage_beg_p, const void *storage_end_p,
    const void *old_container_beg_p, const void *old_container_end_p,
    const void *new_container_beg_p, const void *new_container_end_p) {

But maybe it's better to mimic memmove/memcpy with destination being first?

@vitalybuka vitalybuka self-requested a review September 6, 2024 19:44
@AdvenamTacet AdvenamTacet (Member, Author) left a comment

I need some time to reasonably review this PR.

If I can help in any way, let me know!


Advenam Tacet and others added 8 commits October 3, 2024 05:28
This PR adds a `__sanitizer_move_contiguous_container_annotations` function, which moves annotations from one memory area to another.
The old area is unpoisoned at the end. The new area is annotated in the same way the old region was at the start of the call (within the limitations of ASan).

```cpp
void __sanitizer_move_contiguous_container_annotations(
    const void *old_storage_beg_p, const void *old_storage_end_p,
    const void *new_storage_beg_p, const void *new_storage_end_p) {
```

This function aims to help with short string annotations and similar container annotations.
Right now we change the trait types of `std::basic_string` when compiling with ASan.

https://github.com/llvm/llvm-project/blob/87f3407856e61a73798af4e41b28bc33b5bf4ce6/libcxx/include/string#L738-L751

The goal is to not change `__trivially_relocatable` when compiling with ASan.
If this function is accepted and upstreamed, the next step is creating a function like `__memcpy_with_asan` that moves memory under ASan,
and then using that function instead of `__builtin_memcpy` while moving trivially relocatable objects.

NOTICE: I did not test it yet, so it probably doesn't compile, but I don't expect big changes. PR is WIP until I test it.

---

I'm wondering whether there is a good way to address the fact that in a container the new buffer is usually bigger than the previous one.
We may add two more arguments to the function to address it (the beginning and the end of the whole buffer).

Another potential change is removing `new_storage_end_p`, as it's redundant because we require the same size.
Implemented correct container annotations copying function (non-overlapping containers only)

Refactored the initial PoC to a fully functional implementation for copying container annotations without moving/copying the memory content itself. This function currently supports cases where the source and destination containers do not overlap. Handling overlapping containers would add complexity.

The function is designed to work irrespective of whether the buffers are granule-aligned or the distance between them is granule-aligned. However, such scenarios may have an impact on performance.

A test case has been included to verify the correctness of the implementation.

Removed unpoisoning of the original buffer at the end; users can do it themselves if they want to.
Added helper functions, removed unnecessary branches, simplified code.
Adds support for overlapping containers.
Adds test case for that.
Update comments, simplify flow, improve readability.
- VPrintf level changed to 3.
- unique_ptr used in the test example.
For safety.
Fix code formatting.
@AdvenamTacet AdvenamTacet force-pushed the short-string-annotations-cleaning-traits branch from 496d6a6 to a2e5f58 Compare October 3, 2024 03:30
@vitalybuka vitalybuka (Collaborator) left a comment

Please pull the rest of #110941 and LGTM

I guess a future TODO is to optimize the SLOW routines, which are 15x slower than the fast path.

I think a reasonably readable and efficient approach is to create an iterator that walks the src shadow byte by byte and converts it into bitsets of poisoned bytes, with the required shifts by the offset, and another iterator that stores those bitsets into the dst shadow.
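One way that iterator pair could look, sketched on plain byte arrays standing in for shadow memory (illustrative only; the encoding mirrors ASan's container convention, where 0 means the granule is fully addressable, 1-7 means that many leading bytes are addressable, and anything else means fully poisoned):

```cpp
#include <cstdint>

constexpr uint8_t kPoisonedMagic = 0xfc; // kAsanContiguousContainerOOBMagic

// Decode one shadow byte into a bitset; bit i set means byte i is poisoned.
uint8_t ShadowToPoisonMask(uint8_t s) {
  if (s == 0) return 0x00;                            // fully addressable
  if (s < 8) return static_cast<uint8_t>(0xFFu << s); // first s bytes OK
  return 0xFF;                                        // fully poisoned
}

// Encode a bitset back into a container-style shadow byte. Masks that are not
// a clean "unpoisoned prefix" are rounded toward unpoisoning (up to the
// highest accessible byte), so the result can never cause a false positive.
uint8_t PoisonMaskToShadow(uint8_t mask) {
  if (mask == 0xFF) return kPoisonedMagic;
  int highest_ok = 7;
  while (mask & (1u << highest_ok)) --highest_ok; // highest accessible byte
  return highest_ok == 7 ? 0 : static_cast<uint8_t>(highest_ok + 1);
}

// Copy n_granules of annotations where dst begins `shift` bytes further into
// its granule than src (0 < shift < 8): stream the src masks through a 16-bit
// window and emit one dst shadow byte per step. Edge granules shared with
// other objects still need the special-casing done in the PR.
void CopyShadowWithShift(const uint8_t *src, uint8_t *dst, int n_granules,
                         unsigned shift) {
  uint16_t window = 0xFFFF; // bytes before the buffer count as poisoned
  for (int g = 0; g < n_granules; ++g) {
    window = static_cast<uint16_t>(window >> 8) |
             static_cast<uint16_t>(ShadowToPoisonMask(src[g]) << 8);
    dst[g] = PoisonMaskToShadow(static_cast<uint8_t>(window >> (8 - shift)));
  }
}
```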
