
Guard against slow regular expressions: preventing ReDoS attacks

Rule
Guard against slow regular expressions.
Regular expressions with nested quantifiers or ambiguous patterns can cause catastrophic backtracking and performance issues.
Supported languages: 45+

Introduction

Regular expressions can freeze your application for seconds or minutes with the right input. Catastrophic backtracking occurs when a regex engine explores an exponentially growing number of matching paths while trying to match a pattern. A regex like (a+)+b takes microseconds to match valid input but can take hours to reject a string of a's without a trailing b. Attackers exploit this through Regular Expression Denial of Service (ReDoS) attacks, sending crafted input that makes your regex engine consume 100% CPU until requests time out or the process crashes.
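The blowup is easy to demonstrate (a small sketch; input sizes are kept modest so it finishes quickly):

```javascript
// Demonstrates catastrophic backtracking: each extra 'a' roughly doubles
// the work, because the engine tries every way of splitting the run of
// a's between the inner and outer '+' before it can report failure.
const vulnerable = /^(a+)+b$/;

for (const n of [10, 14, 18]) {
    const input = 'a'.repeat(n) + '!'; // no trailing 'b', so the match must fail
    const start = Date.now();
    vulnerable.test(input);
    console.log(`n=${n}: ${Date.now() - start}ms`);
}
```

Extending the loop by even a few more characters quickly pushes match times from milliseconds into seconds.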

Why it matters

Security implications (ReDoS attacks): An attacker can paralyze your application with a single request containing crafted input. Email validation and URL parsing patterns are common targets. Unlike traditional DoS attacks requiring bandwidth, ReDoS needs only tiny payloads.

Performance degradation: Normal user input can trigger catastrophic backtracking, causing response times to spike from milliseconds to seconds. This creates unpredictable latency that's difficult to debug because it only manifests with specific input patterns.

Production incidents: A vulnerable regex blocks the event loop in Node.js or ties up thread pool resources in other runtimes. As requests pile up, memory grows and the system becomes unresponsive. In a microservices architecture, one vulnerable regex can cascade failures to dependent services.

Difficulty in detection: Patterns that work fine in testing with short inputs become exponentially slow with longer inputs. The vulnerability often goes unnoticed until production, requiring emergency deployment during an active incident.

Code examples

❌ Non-compliant:

function validateEmail(email) {
    const regex = /^([a-zA-Z0-9_\-\.]+)+@([a-zA-Z0-9_\-\.]+)+\.([a-zA-Z]{2,5})$/;
    return regex.test(email);
}

function extractURLs(text) {
    const regex = /(https?:\/\/)?([\w\-])+\.(\w+)+([\w\-\.,@?^=%&:/~\+#]*)+/g;
    return text.match(regex);
}

Why it's unsafe: The nested quantifier ([a-zA-Z0-9_\-\.]+)+ creates exponential backtracking. For an input like aaaaaaaaaaaaaaaaaaaaaaaaa!, the regex engine tries every way of splitting the run of characters between the inner and outer quantifier before failing. The URL regex compounds the problem with several nested quantifiers, making it trivially exploitable with a long run of valid characters that lacks the expected structure.

✅ Compliant:

function validateEmail(email) {
    const regex = /^[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}$/;
    return regex.test(email);
}

function extractURLs(text) {
    const regex = /https?:\/\/[\w\-]+\.[\w\-]+(?:[\w\-\.,@?^=%&:/~\+#]*)?/g;
    return text.match(regex);
}

Why it's safe: Removing nested quantifiers eliminates catastrophic backtracking. Single quantifiers like [a-zA-Z0-9_\-\.]+ execute in linear time. The URL pattern uses non-capturing groups with optional suffix (?:...)? instead of nested repetition, ensuring predictable performance regardless of input length or content.
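The fix can be sanity-checked directly: the same style of pathological input that stalls the nested pattern is rejected almost instantly by the flattened one (a small sketch; the probe length is arbitrary):

```javascript
const safeEmail = /^[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}$/;

// Classic ReDoS probe: a long run of valid characters with no matching
// structure. With single quantifiers the engine fails in linear time.
const probe = 'a'.repeat(100000) + '!';
const start = Date.now();
console.log(safeEmail.test(probe), `${Date.now() - start}ms`); // false, in milliseconds
```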

Conclusion

Regular expression performance is a security concern, not just an optimization. Review all regex patterns for nested quantifiers, overlapping character classes in repetition groups, and ambiguous alternatives. Test regex patterns with pathological inputs (long strings of valid characters followed by invalid endings) to identify catastrophic backtracking before deployment. When possible, replace complex regex with string parsing functions that have predictable performance characteristics.

FAQs


What patterns cause catastrophic backtracking?

Common culprits include nested quantifiers like (a+)+, (a*)*, or (a+)*b. Alternation with overlapping patterns like (a|a)* or (a|ab)*. Repetition with optional components like (a?)+. Any pattern where the regex engine can match the same substring in multiple ways creates exponential search space. Watch for quantifiers (+, *, {n,m}) inside groups that are themselves quantified.

How do I test if my regex is vulnerable to ReDoS?

Use online tools like regex101.com which show execution steps and warn about catastrophic backtracking. Create test inputs with long strings of valid characters followed by characters that force backtracking. For pattern /^(a+)+b$/, test with "aaaaaaaaaaaaaaa!" (30+ a's, no b). If execution takes more than milliseconds, the regex is vulnerable. Implement timeouts in production regex operations as defense in depth.
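That testing advice can be wrapped in a small harness (a sketch; the input sizes and 200ms threshold are illustrative choices, not standard values):

```javascript
// Rough ReDoS probe: run the regex against growing pathological inputs
// and flag the pattern if a single match exceeds the time limit.
function probeRedos(regex, makeInput, sizes = [16, 20, 24], limitMs = 200) {
    for (const n of sizes) {
        const input = makeInput(n);
        const start = Date.now();
        regex.test(input);
        if (Date.now() - start > limitMs) {
            return { vulnerable: true, atLength: n };
        }
    }
    return { vulnerable: false };
}

// Probe the vulnerable pattern from the FAQ: a's with a non-matching tail.
console.log(probeRedos(/^(a+)+b$/, n => 'a'.repeat(n) + '!'));
```

Exact timings vary by engine and machine, so treat the result as a red flag to investigate, not a definitive verdict.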

What's the difference between catastrophic and linear backtracking?

Linear backtracking occurs when the regex tries alternatives in sequence but doesn't re-evaluate previous choices. The work grows linearly with input size. Catastrophic backtracking happens when nested quantifiers force the engine to try exponentially many combinations. For input of length n, execution time can be O(2^n) or worse. The difference is between milliseconds and minutes for modest input sizes.

Can I use lookaheads and lookbehinds safely?

Lookaheads (?=...) and lookbehinds (?<=...) themselves don't cause catastrophic backtracking, but they can hide vulnerable patterns. A lookahead containing (a+)+ is still vulnerable. Use lookarounds for their intended purpose (assertions without consuming characters), not as a workaround for complex matching. Keep the patterns inside lookarounds simple and test them thoroughly.

Are there regex engines that prevent catastrophic backtracking?

RE2 (developed at Google) guarantees linear-time execution by forbidding backtracking entirely. It doesn't support all features (backreferences, lookarounds) but prevents ReDoS completely. For critical security checks, consider using RE2 bindings or a similar engine. In JavaScript, there is no built-in alternative, so pattern design and timeouts are your primary defenses.

Should I add timeouts to all regex operations?

For untrusted input (user-provided data, external API responses), yes. Set reasonable timeouts like 100-500ms depending on expected complexity. In Node.js, you can't directly timeout regex.test(), but you can validate input length first or run regex in a worker thread with timeout. Reject inputs exceeding reasonable length limits before attempting regex matching.
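A length guard is the simplest version of this defense (a minimal sketch; the 254-character cap is a common practical limit for email addresses, adjust it for your data):

```javascript
const EMAIL_MAX_LENGTH = 254; // common practical upper bound for an email address

function validateEmailSafely(email) {
    // Bounding input length caps the worst-case work before the regex
    // runs. Pair this with a backtracking-safe pattern: an exponential
    // pattern can still stall on inputs well under this limit.
    if (typeof email !== 'string' || email.length > EMAIL_MAX_LENGTH) {
        return false;
    }
    return /^[a-zA-Z0-9_\-\.]+@[a-zA-Z0-9_\-\.]+\.[a-zA-Z]{2,5}$/.test(email);
}
```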

How do I fix an existing vulnerable regex pattern?

First, determine if you need regex at all. Many validation tasks are simpler with string methods like includes(), startsWith(), or split(). If regex is necessary, eliminate nested quantifiers by flattening the pattern. Replace (a+)+ with a+. Use atomic groups or possessive quantifiers if your regex engine supports them. For complex patterns, consider parsing the input in multiple passes with simpler regex or string operations.
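As an example of the first option, the email check can be rebuilt on string methods (a sketch; isPlausibleEmail is a hypothetical helper enforcing roughly the same shape as the regex above):

```javascript
// Regex-free email shape check: exactly one '@', a non-empty local part,
// and a domain with a dot followed by a 2-5 letter TLD. Every step is a
// linear string operation, so there is nothing to backtrack.
function isPlausibleEmail(email) {
    const at = email.indexOf('@');
    if (at <= 0 || at !== email.lastIndexOf('@')) return false;

    const domain = email.slice(at + 1);
    const lastDot = domain.lastIndexOf('.');
    if (lastDot <= 0) return false;

    // The only regex left is trivial: one bounded quantifier, no nesting.
    return /^[a-zA-Z]{2,5}$/.test(domain.slice(lastDot + 1));
}
```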
