Regex Testing Tool
Test and debug regular expressions with real-time match highlighting and code generation
What is a Regular Expression?
The Regex tool helps write, test, and understand regular expressions. A regular expression describes text patterns for search, extraction, validation, and replacement, such as email-like strings, log fields, URLs, IDs, dates, or repeated formatting. A tester makes matches, capture groups, flags, and edge cases visible before a pattern is used in code, a data pipeline, or form validation. Regex is powerful but easy to overtrust: a pattern may match too much or too little, become slow on large inputs, or handle Unicode and locale rules poorly. For complex parsers, HTML structures, or security-critical validation, a dedicated parser or library is often safer.
How to Use
- Enter your regex pattern in the left input box
- Select needed flags (e.g., g for global, i for case-insensitive)
- Enter test text on the right side
- View highlighted matches and details
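The steps above correspond roughly to the following sketch of what a tester does internally: compile the pattern with the chosen flags, then collect each match with its position and groups. The helper name listMatches is hypothetical and is not taken from the tool's source.

```javascript
// Hypothetical helper: list every match of `pattern` in `text`,
// roughly the data a tester needs in order to highlight matches.
function listMatches(pattern, flags, text) {
  // String.prototype.matchAll requires the g flag, so add it if missing.
  const fullFlags = flags.includes("g") ? flags : flags + "g";
  const re = new RegExp(pattern, fullFlags);
  return Array.from(text.matchAll(re), (m) => ({
    match: m[0],                               // full matched text
    index: m.index,                            // start offset in `text`
    groups: m.groups ? { ...m.groups } : {},   // named captures, if any
  }));
}

const dates = listMatches(
  "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})",
  "g",
  "Sprint window: 2026-09-01 to 2026-09-15"
);
console.log(dates);
```

Note that backslashes are doubled here only because the pattern is written as a JavaScript string literal; in the tool's input box the pattern is typed with single backslashes.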
Regex Tips
- Test patterns with both matching and non-matching examples so overly broad expressions are easier to catch.
- Be careful with nested quantifiers on long text; some patterns can become slow because of excessive backtracking.
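As a minimal sketch of the first tip, the snippet below runs a pattern against samples that should match and samples that should not, and reports any disagreement. The helper name checkSamples and the sample strings are assumptions chosen for illustration.

```javascript
// Hypothetical helper: report samples where the pattern disagrees with expectations.
function checkSamples(re, shouldMatch, shouldNotMatch) {
  // Probe with a copy that drops g and y so test() never depends on lastIndex.
  const probe = new RegExp(re.source, re.flags.replace(/[gy]/g, ""));
  return {
    missed: shouldMatch.filter((s) => !probe.test(s)),
    falsePositives: shouldNotMatch.filter((s) => probe.test(s)),
  };
}

// This pattern checks only the shape of a date, not its calendar validity.
const isoDate = /^\d{4}-\d{2}-\d{2}$/;
const report = checkSamples(
  isoDate,
  ["2026-09-01", "1999-12-31"],
  ["2026-9-1", "2026-09-011", "date: 2026-09-01", "2026-13-45"]
);
console.log(report);
```

Here the report flags "2026-13-45" as a false positive, which is the kind of overly broad match that only shows up when non-matching examples are tested too.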
Use Cases
Technical Principle
The tester runs on the ECMAScript RegExp engine defined in the language specification. Browser engines implement it as a backtracking matcher: the engine tries each alternative in left-to-right order and rewinds whenever a later part of the pattern fails, which is what makes lookarounds and backreferences possible but also what produces catastrophic backtracking on adversarial inputs. The core flags are g (global), i (case-insensitive), m (multiline, anchors per line), s (dotAll, '.' matches newline), u (Unicode mode), and y (sticky, anchored to lastIndex); newer engines also accept d (hasIndices).
Matching semantics shift sharply between flags. Under /u, the engine treats the pattern as a sequence of Unicode code points instead of UTF-16 code units, so surrogate pairs match as one character and Unicode property classes like \p{L} (any letter), \p{Sc} (currency symbol), and \p{Script=Han} become available. Named capture groups (?<name>...) and lookbehind (?<=...) are part of ES2018 and are both available from Chromium 64, Firefox 78, and Safari 16.4 onward, but atomic groups (?>...) and possessive quantifiers (a++), which PCRE and some other engines provide, are still not in the spec.
The single largest production hazard is ReDoS: a pattern like (a+)+b on the input 'aaaaaaaaaaaaaaaa!' explores an exponential number of paths because the two nested quantifiers can split the same character run in many ways. Mitigations are well known: avoid nested quantifiers over the same character class, anchor with ^ and $ where possible, prefer atomic groups when the engine supports them, or switch to a linear-time engine such as RE2 or Go's regexp package (which give up backreferences and lookarounds in exchange for guaranteed linear-time matching). For untrusted input destined for server-side validation, a linear-time engine or a hand-written parser is almost always the safer call.
- Engine: ECMAScript RegExp is a backtracking NFA, not a DFA; that is what enables lookarounds and backreferences but also enables catastrophic backtracking.
- Flags: g (global), i (case-insensitive), m (^/$ per line), s (dotAll, '.' matches \n), u (Unicode mode, surrogate-aware), y (sticky, anchored to lastIndex).
- Unicode mode: /u unlocks code-point matching plus \p{L} (letter), \p{Sc} (currency), \p{Script=Han}; [a-zA-Z]+ covers only ASCII letters with or without /u, so use \p{L}+ under /u for non-Latin words.
- Modern syntax availability: named captures (?<n>...) and lookbehind (?<=...) ship in Chromium 64+, Firefox 78+, Safari 16.4+; atomic groups (?>...) and possessive quantifiers (a++) are not in ECMAScript.
- ReDoS: nested quantifiers over the same class such as (a+)+b explode exponentially; flatten the nesting, tighten with anchors, use atomic groups in engines that have them, or migrate to RE2 / re2-wasm for untrusted input.
- Complexity floor: literal anchored patterns over a fixed alphabet run in O(n); backreferences and unbounded lookarounds push worst-case to exponential, so benchmark on adversarial input before shipping.
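The ReDoS point can be illustrated with a small sketch that flattens the nested quantifier. It assumes anchored variants of the (a+)+b pattern discussed above; the sample strings and the hostile input length are arbitrary choices. The nested form is deliberately never run on the long input.

```javascript
// Vulnerable shape: two nested quantifiers over the same run of "a".
const nested = /^(a+)+b$/;
// Same accepted language without nesting: one or more "a", then "b".
const flat = /^a+b$/;

// On short inputs both forms accept and reject the same strings.
const samples = ["ab", "aaab", "b", "aaa", "aab!", ""];
const agree = samples.every((s) => nested.test(s) === flat.test(s));

// Only the flat form is run on a long non-matching input.
// Do NOT call nested.test(hostile): a backtracking engine may not finish.
const hostile = "a".repeat(50000) + "!";
const started = Date.now();
const matched = flat.test(hostile);
const elapsedMs = Date.now() - started;

console.log({ agree, matched, elapsedMs });
```

The flat pattern has only one way to split the run of "a" characters, so a failed match costs about one backtracking step per character instead of an exponential search.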
Examples
Match an email address
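Pattern: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
Flags: gi
Input: contact: alice@example.com, bob_2024@mail.co.uk, invalid@.com
Matches: alice@example.com, bob_2024@mail.co.uk (2 hits)
To validate a single field rather than search within text, anchor the same pattern with ^ and $ instead of \b.
Extract IPv4 addresses from a log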
Pattern: \b(?:\d{1,3}\.){3}\d{1,3}\b
Flags: g
Input: 2026-06-15 ERROR client 192.168.1.42 connect failed (peer 10.0.0.1)
Matches: 192.168.1.42, 10.0.0.1
Capture date parts with named groups
Pattern: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Flags: g
Input: Sprint window: 2026-09-01 to 2026-09-15
Groups: {year:'2026', month:'09', day:'01'}, {year:'2026', month:'09', day:'15'}
Greedy vs lazy quantifier
Input: <b>hello</b> <i>world</i>
<\w>.*</\w> (greedy) -> matches the whole line: <b>hello</b> <i>world</i>
<\w>.*?</\w> (lazy) -> first match: <b>hello</b>
Lazy is what you usually want when scraping between tags.
Unicode-aware letter class
Pattern: \p{L}+
Flags: gu
Input: Hello, 北京! Café Москва
Matches: Hello, 北京, Café, Москва
\p{L} requires the u flag. By comparison, [a-zA-Z]+ would find only Hello and Caf, missing 北京 and Москва and cutting Café short.
FAQ
Which regex flavor does the tester use?
JavaScript's regex engine (ECMAScript). It is similar to PCRE for common features but differs in edge cases: no possessive quantifiers, no recursion, lookbehind support requires a recent browser, named groups use (?<name>...). For Python (re/regex), Java, .NET, or PCRE, behavior on the same pattern can differ - test in the actual environment too.
What flags are supported?
g (global, find all matches), i (case-insensitive), m (multiline, ^ and $ match line boundaries), s (dotall, . matches newline), u (Unicode), y (sticky), d (hasIndices). Pick them with the toggle row; the page shows what each one changes in your test text.
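The two less familiar flags, y and d, can be sketched as follows. The tokenize helper and its tiny arithmetic grammar are hypothetical, and the d flag assumes an engine with hasIndices support (ES2022).

```javascript
// y (sticky): a match must start exactly at lastIndex, which suits simple tokenizers.
function tokenize(text) {
  const token = /\s*(\d+|[-+*\/()])/y;
  const input = text.trimEnd();
  const out = [];
  let pos = 0;
  while (pos < input.length) {
    token.lastIndex = pos;
    const m = token.exec(input);
    // A failed sticky match resets lastIndex to 0, so report our own position.
    if (m === null) throw new SyntaxError(`Unexpected input at index ${pos}`);
    out.push(m[1]);
    pos = token.lastIndex;
  }
  return out;
}

const tokens = tokenize("12 + 3*(4 - 5)");
console.log(tokens);

// d (hasIndices): the result gains an `indices` array of [start, end) offsets.
const withIndices = /(?<key>\w+)=(?<value>\w+)/d.exec("mode=fast");
const valueSpan = withIndices.indices.groups.value;
console.log(valueSpan); // [5, 9]
```

A highlighter can use the offsets from d to mark each capture group separately instead of only the whole match.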
Why does my pattern match fewer results than expected?
Without the g flag, exec and match return only the first match. Without m, ^ and $ only match the very start/end of input. Without u, surrogate-pair characters (emoji) match as two halves. Most 'why doesn't this work' issues are missing flags.
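A short sketch of the three cases in this answer, using made-up input strings, shows how each missing flag changes the result:

```javascript
const text = "id=1\nid=2\nid=3";

// Without g, match() stops at the first hit; with g it returns every hit.
const first = text.match(/id=\d/);
const all = text.match(/id=\d/g);

// Without m, ^ matches only at the start of the whole input.
const lineStartsNoM = text.match(/^id/g).length;
const lineStartsM = text.match(/^id/gm).length;

// Without u, "." consumes one UTF-16 code unit, so a single emoji
// (U+1F600, stored as a surrogate pair) does not fit one dot.
const emoji = "\u{1F600}";
const dotNoU = /^.$/.test(emoji);
const dotU = /^.$/u.test(emoji);

console.log(first[0], all, lineStartsNoM, lineStartsM, dotNoU, dotU);
// id=1 [ 'id=1', 'id=2', 'id=3' ] 1 3 false true
```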
How do I match emoji and CJK characters?
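Use the u flag to enable proper Unicode handling, then use \p{...} character classes (\p{Letter}, \p{Script=Han}, \p{Emoji}) to match by Unicode property. Note that \p{Emoji} also matches digits, # and *, so \p{Extended_Pictographic} is usually the better fit for pictographic emoji. [a-zA-Z] never matches accented characters, with or without u; either expand to [a-zA-ZÀ-ÿ] (which covers only Latin-1 letters and also lets × and ÷ through) or use \p{L} with the u flag.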
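The property classes mentioned here can be tried with a sketch like the one below. The sample string is the one from the Examples section, with é written as an escape so it is a single precomposed character.

```javascript
const sample = "Hello, 北京! Caf\u00e9 Москва";

// \p{...} classes are only valid together with the u flag.
const letters = sample.match(/\p{L}+/gu);        // words in any script
const han = sample.match(/\p{Script=Han}+/gu);   // Han ideographs only
const asciiOnly = sample.match(/[a-zA-Z]+/g);    // ["Hello", "Caf"]

// \p{Emoji} is broader than it sounds: ASCII digits carry the Emoji property.
const digitIsEmoji = /\p{Emoji}/u.test("7");                          // true
const digitIsPictographic = /\p{Extended_Pictographic}/u.test("7");   // false
const faceIsPictographic = /\p{Extended_Pictographic}/u.test("\u{1F600}"); // true

console.log(letters, han, asciiOnly);
console.log(digitIsEmoji, digitIsPictographic, faceIsPictographic);
```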
Can I generate code from a pattern?
Most builds export the pattern as JavaScript, Python, PHP, Ruby, Go, or Java syntax with the appropriate escaping. Be aware: the same pattern may behave differently across engines (greediness defaults, character class behaviour, lookaround support). Always test in the target language.
What's the difference between greedy and lazy quantifiers?
Greedy (* + ?) match as much as possible, then backtrack if needed. Lazy (*? +? ??) match as little as possible. For 'extract everything between A and B', greedy .* will overshoot if multiple Bs exist; lazy .*? stops at the first B. Pick lazy by default for inner-content extraction.
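The difference is easy to see on a short HTML-like string. The sketch below uses illustrative patterns for single-letter tags and adds a third option, a negated character class, which avoids the question of backtracking direction altogether.

```javascript
const html = "<b>hello</b> <i>world</i>";

// Greedy: .* runs to the end of the line, then backs up to the LAST closing tag.
const greedy = html.match(/<\w>.*<\/\w>/)[0];

// Lazy: .*? grows one character at a time and stops at the FIRST closing tag.
const lazy = html.match(/<\w>.*?<\/\w>/)[0];
const lazyAll = html.match(/<\w>.*?<\/\w>/g);

// Negated class: [^<]* can never cross a "<", so it needs no backtracking choice.
const inner = Array.from(html.matchAll(/<\w>([^<]*)<\/\w>/g), (m) => m[1]);

console.log(greedy);  // <b>hello</b> <i>world</i>
console.log(lazy);    // <b>hello</b>
console.log(lazyAll); // [ '<b>hello</b>', '<i>world</i>' ]
console.log(inner);   // [ 'hello', 'world' ]
```

These patterns are for illustration on flat snippets only; nested or malformed markup is better handled by an HTML parser.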
Is my regex or test text uploaded?
No. Matching runs in your browser using JavaScript's native regex engine. Patterns and inputs are not transmitted.