RegEx with ASAN crashes the application

Summary

A regular expression pattern crashes a C++ application that is built with AddressSanitizer (ASAN). The pattern L"primary key\(*" is meant to match the string “primary key” and the text that follows it, but as written it is not a valid regular expression, and the application crashes at runtime when the std::wregex is constructed.

Root Cause

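The root cause is a quantifier (*) that has no atom to repeat. In a C++ string literal, \( is not a recognized escape sequence; compilers typically emit a warning and drop the backslash, so the regex engine receives primary key(* instead of primary key\(*. In that text the ( opens a group and the * that follows has nothing to repeat, which makes the pattern invalid. The key points are:

  • The pattern L"primary key\(*" is invalid because, once the compiler has consumed the backslash, no atom precedes the quantifier.
  • The pattern is not checked at compile time. The std::wregex constructor parses it at runtime and throws std::regex_error for an invalid pattern; if nothing catches the exception, the program terminates.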
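
The runtime check described above can be observed directly. The following is a minimal sketch, not the original application's code: is_valid_pattern, broken_pattern, and escaped_pattern are hypothetical names introduced for illustration, and broken_pattern assumes the compiler dropped the backslash from the original literal.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles as a std::wregex.
bool is_valid_pattern(const std::wstring& pattern) {
    try {
        std::wregex re(pattern, std::regex_constants::icase);
        (void)re;  // only construction matters here
        return true;
    } catch (const std::regex_error&) {
        // The constructor rejects malformed patterns by throwing.
        return false;
    }
}

// What the regex engine receives once the compiler has dropped the backslash:
// "(" opens a group and "*" has nothing to repeat.
const std::wstring broken_pattern = L"primary key(*";

// With the backslash doubled, the engine sees \( (a literal parenthesis)
// followed by *, which is a valid atom plus quantifier.
const std::wstring escaped_pattern = L"primary key\\(*";
```

Calling is_valid_pattern(broken_pattern) should return false and is_valid_pattern(escaped_pattern) should return true. Catching the exception this way turns an abrupt termination into an error the caller can report.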

Why This Happens in Real Systems

This issue occurs in real systems due to the following reasons:

  • Insufficient validation of regular expression patterns before using them.
  • Confusion between the two escaping layers (C++ string literals and regex syntax), so the engine receives a different pattern than the one that was written.
  • Inadequate testing of regular expression patterns with different inputs.

Real-World Impact

The impact of this issue in real-world systems can be significant, including:

  • Application crashes when an invalid pattern is constructed and the resulting exception is not handled.
  • Data corruption or loss due to incorrect handling of regular expressions.
  • Security vulnerabilities if the regular expression is used to validate user input.

Example or Code

std::wregex pattern(L"primary key\\(.*", std::regex_constants::icase);

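Note that the corrected pattern puts a valid atom before the quantifier: .* is the wildcard . (any character) repeated zero or more times, not a character class. The \\ in the C++ literal produces a single backslash, so the regex engine sees \( and treats it as a literal opening parenthesis. The pattern therefore matches “primary key(” followed by any characters, ignoring case.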
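
A raw string literal is another way to write the corrected pattern without doubling the backslash. The sketch below assumes C++11 or later; escaped_literal, raw_literal, and mentions_primary_key are hypothetical names used only for illustration.

```cpp
#include <regex>
#include <string>

// Two spellings of the same pattern text: primary key\(.*
const std::wstring escaped_literal = L"primary key\\(.*";    // backslash doubled
const std::wstring raw_literal     = LR"(primary key\(.*)";  // raw string, no doubling

// Hypothetical helper: case-insensitive search for "primary key(" in `text`.
bool mentions_primary_key(const std::wstring& text) {
    // Compiled once on first use; the raw literal keeps the regex readable.
    static const std::wregex pattern(LR"(primary key\(.*)",
                                     std::regex_constants::icase);
    return std::regex_search(text, pattern);
}
```

Both literals hold identical text, so either can be passed to std::wregex. The raw form shows the regex exactly as the engine will see it, which makes a missing or extra backslash easier to spot.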

How Senior Engineers Fix It

Senior engineers fix this issue by:

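  • Validating the regular expression pattern before using it, for example by catching std::regex_error around the std::wregex constructor.
  • Making sure every quantifier follows an atom (a literal, an escaped character, a character class, or a group), and writing regex backslashes as \\ or inside a raw string literal.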
  • Testing the regular expression pattern with different inputs to ensure its correctness.
  • Enabling ASAN to detect invalid memory access and other issues.
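
Testing a pattern against several inputs can be as simple as a table of cases. The following is a sketch under stated assumptions: PatternCase, count_failures, and primary_key_cases are hypothetical names, and the sample inputs are illustrative rather than taken from the original application.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One input string and whether the pattern is expected to be found in it.
struct PatternCase {
    std::wstring input;
    bool expected;
};

// Returns the number of cases whose search result differs from the expectation.
// An invalid pattern (std::regex_error) is counted as failing every case.
std::size_t count_failures(const std::wstring& pattern_text,
                           const std::vector<PatternCase>& cases) {
    try {
        const std::wregex pattern(pattern_text, std::regex_constants::icase);
        std::size_t failures = 0;
        for (const PatternCase& c : cases) {
            if (std::regex_search(c.input, pattern) != c.expected) {
                ++failures;
            }
        }
        return failures;
    } catch (const std::regex_error&) {
        return cases.size();
    }
}

// Illustrative inputs covering matches, a case difference, and non-matches.
std::vector<PatternCase> primary_key_cases() {
    return {
        {L"primary key(id)", true},
        {L"PRIMARY KEY(id, name)", true},
        {L"primary key", false},
        {L"foreign key(id)", false},
    };
}
```

With the corrected pattern L"primary key\\(.*" the helper should report zero failures, while the broken text L"primary key(*" fails every case because the pattern never compiles.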

Why Juniors Miss It

Junior engineers may miss this issue due to:

  • Lack of experience with regular expressions and their syntax.
  • Insufficient understanding of the implications of using invalid regular expressions.
  • Inadequate testing of regular expression patterns, leading to crashes or other issues at runtime.
  • Not using tools like ASAN to detect and diagnose issues related to memory access and other low-level details.