ToolAct

Regex Testing Tool

Test and debug regular expressions with real-time match highlighting and code generation

What is a Regular Expression?

The Regex tool helps you write, test, and understand regular expressions. A regular expression describes a text pattern used for search, extraction, validation, and replacement: email-like strings, log fields, URLs, IDs, dates, or repeated formatting. A tester makes matches, capture groups, flags, and edge cases visible before a pattern is used in code, a data pipeline, or form validation. Regex is powerful but easy to overtrust: a pattern may match too much or too little, become slow on large inputs, or handle Unicode and locale rules poorly. For complex grammars, HTML structures, or security-critical validation, a dedicated parser or library is often safer.
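The following minimal TypeScript sketch shows those four jobs with the built-in RegExp API. The date pattern and the sample sentence are illustrative assumptions, not part of the tool.

```typescript
// Illustrative only: one ISO-style date pattern used four ways.
const date = /(?<y>\d{4})-(?<m>\d{2})-(?<d>\d{2})/;
const text = "Released 2026-06-15, patched 2026-07-01.";

// Validation: does the whole string have the shape? The anchors stop partial matches.
// This checks shape only, so "9999-99-99" would also pass.
const isDate = (s: string): boolean => new RegExp(`^${date.source}$`).test(s);

// Search: offset of the first match, or -1 if there is none.
const firstAt = text.search(date);

// Extraction: every match, using a global copy of the pattern.
const all = Array.from(text.matchAll(new RegExp(date.source, "g")), (m) => m[0]);

// Replacement: reorder the named groups into DD/MM/YYYY.
const reformatted = text.replace(new RegExp(date.source, "g"), "$<d>/$<m>/$<y>");

console.log(isDate("2026-06-15"), firstAt, all, reformatted);
```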

How to Use

  1. Enter your regex pattern in the left input box
  2. Select the flags you need (e.g., g for global, i for case-insensitive)
  3. Enter test text on the right side
  4. View highlighted matches and details
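Step 4's highlighting can be approximated in a few lines of TypeScript, because String.prototype.replace with a callback visits every match when the regex is global. In this sketch, `highlight` is a hypothetical helper, and square brackets stand in for the page's colored marks.

```typescript
// Hypothetical helper: wrap every match in the test text with [ ].
function highlight(pattern: string, flags: string, input: string): string {
  // replace() only rewrites every match when the regex is global, so add g if missing.
  const re = new RegExp(pattern, flags.includes("g") ? flags : flags + "g");
  // replace() also advances past empty matches itself, so x* cannot loop forever.
  return input.replace(re, (match) => `[${match}]`);
}

console.log(highlight("\\d+", "", "a1b22"));
```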

Regex Tips

  • Test patterns with both matching and non-matching examples so overly broad expressions are easier to catch.
  • Be careful with nested quantifiers on long text; some patterns can become slow because of excessive backtracking.
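Following the first tip, a small table of must-match and must-reject strings catches an overly broad pattern early. This is a sketch that assumes the pattern should accept a whole value, hence the ^...$ anchors. The hex-color pattern, the cases, and the `checkPattern` name are made up for illustration.

```typescript
interface Cases {
  shouldMatch: string[];
  shouldReject: string[];
}

// Returns human-readable failures; an empty array means every case behaved.
function checkPattern(re: RegExp, cases: Cases): string[] {
  const failures: string[] = [];
  for (const s of cases.shouldMatch) {
    re.lastIndex = 0; // g or y makes test() stateful, so reset between calls
    if (!re.test(s)) failures.push(`expected match: ${JSON.stringify(s)}`);
  }
  for (const s of cases.shouldReject) {
    re.lastIndex = 0;
    if (re.test(s)) failures.push(`unexpected match: ${JSON.stringify(s)}`);
  }
  return failures;
}

const hexColor = /^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/i;
const failures = checkPattern(hexColor, {
  shouldMatch: ["#fff", "#1A2b3C"],
  shouldReject: ["fff", "#ffff", "#12345g", "#fff "],
});
console.log(failures.length === 0 ? "all cases pass" : failures);
```

The reject list does the real work. An unanchored /#[0-9a-f]+/i would pass every shouldMatch case and still fail on "#ffff".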

Use Cases

Test JavaScript regular expressions live: enter a pattern, choose any of the flags g, i, m, s, and u, and test it against sample text with highlighted matches. Invalid regex syntax is reported immediately, and zero-length global matches are handled without trapping the match loop in an infinite cycle. Real-time feedback makes iterative tweaking faster than rerunning a script in a console or restarting an editor every time the pattern changes.
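The zero-length case is easy to get wrong in a hand-written loop. After an empty global match, exec() sets lastIndex to the end of the match, which is where the match started, so a naive `while` loop repeats forever.

The minimal sketch below shows a tester loop. It assumes the pattern and flags arrive as strings, and `testRegex`, `MatchInfo`, and `advanceIndex` are hypothetical names, not the page's code. String.prototype.matchAll performs the same advance internally.

```typescript
interface MatchInfo {
  text: string;
  start: number; // UTF-16 offset of the match
  end: number; // UTF-16 offset just past the match
  groups?: Record<string, string | undefined>;
}

type TestResult = { ok: true; matches: MatchInfo[] } | { ok: false; error: string };

// Step past an empty match; under /u, step over a whole surrogate pair.
function advanceIndex(s: string, i: number, unicode: boolean): number {
  if (!unicode || i + 1 >= s.length) return i + 1;
  const cp = s.codePointAt(i);
  return cp !== undefined && cp > 0xffff ? i + 2 : i + 1;
}

function testRegex(pattern: string, flags: string, input: string): TestResult {
  let re: RegExp;
  try {
    re = new RegExp(pattern, flags); // throws SyntaxError on a bad pattern or flag
  } catch (e) {
    return { ok: false, error: (e as Error).message };
  }
  const matches: MatchInfo[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(input)) !== null) {
    matches.push({ text: m[0], start: m.index, end: m.index + m[0].length, groups: m.groups });
    if (!re.global) break; // without g, report a single match
    if (m[0] === "") {
      // An empty match leaves lastIndex in place; without this the loop never ends.
      re.lastIndex = advanceIndex(input, re.lastIndex, re.unicode);
    }
  }
  return { ok: true, matches };
}

console.log(testRegex("\\d+", "g", "a1 b22"));
```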
Inspect match positions and named groups: every match lists its text, start and end offsets, and named capture groups when present. This is useful when refining extraction patterns for logs, form validation, import rules, or data-cleaning scripts. Because the offsets are visible, the same pattern can be plugged into a wider pipeline that already tracks character positions in the source file. JavaScript reports offsets in UTF-16 code units, so convert them if that pipeline counts bytes or code points.
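JavaScript's d flag (hasIndices, ES2022) exposes start and end offsets for every capture group as well as for the whole match. That is handy when extraction results feed a position-tracking pipeline. This is a sketch that assumes an ES2022-capable runtime such as a current browser or Node.js. `namedGroupSpans` and `GroupSpan` are hypothetical names.

```typescript
interface GroupSpan {
  value: string;
  start: number; // UTF-16 offset of the group's first code unit
  end: number; // UTF-16 offset just past the group
}

function namedGroupSpans(re: RegExp, input: string): Array<Record<string, GroupSpan>> {
  // Copy the pattern with g (required by matchAll) and d (adds match.indices).
  const flags = new Set(re.flags.split(""));
  flags.add("g");
  flags.add("d");
  const withIndices = new RegExp(re.source, Array.from(flags).join(""));

  const results: Array<Record<string, GroupSpan>> = [];
  for (const m of input.matchAll(withIndices)) {
    const values: Record<string, string | undefined> = m.groups ?? {};
    const offsets: Record<string, [number, number] | undefined> = m.indices?.groups ?? {};
    const spans: Record<string, GroupSpan> = {};
    for (const [name, value] of Object.entries(values)) {
      const span = offsets[name];
      // A group that did not participate has both value and span undefined.
      if (value !== undefined && span !== undefined) {
        spans[name] = { value, start: span[0], end: span[1] };
      }
    }
    results.push(spans);
  }
  return results;
}

const spans = namedGroupSpans(/(?<year>\d{4})-(?<month>\d{2})/, "due 2026-09");
console.log(spans[0]); // year spans [4, 8), month spans [9, 11)
```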
Turn a working pattern into code snippets: apply built-in examples for emails, URLs, IPs, dates, colors, Chinese text, numbers, and whitespace, then generate copyable snippets for JavaScript, Python, Java, PHP, or Go, carrying over the selected flags where each language supports them. Each snippet includes the literal pattern and the matching flag set, so it starts from exactly what was tested on the page. Engine differences can still change behavior (Go, for example, has no lookbehind), so run the snippet once in the target language.
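To make the flag mapping concrete, here is a hypothetical TypeScript generator for a Go snippet. `toGoSnippet` is an illustrative name, not the page's code. It assumes Go's regexp package:

- i, m, and s become an inline (?ims) prefix.
- g selects FindAllString instead of FindString.
- u is dropped because Go matches UTF-8 text natively.
- y and d are simply ignored in this sketch.

```typescript
function toGoSnippet(pattern: string, flags: string): string {
  // Go's regexp has inline equivalents for i, m and s only.
  const inline = flags.split("").filter((f) => "ims".includes(f)).join("");
  const source = (inline ? `(?${inline})` : "") + pattern;
  // Prefer a Go raw string (no escaping needed). If the pattern contains a
  // backtick, fall back to a quoted literal; JSON string escaping is compatible
  // with Go's interpreted string literals for ordinary patterns.
  const literal = source.includes("`") ? JSON.stringify(source) : "`" + source + "`";
  // JS g (find every match) maps to Go's FindAll* family with n = -1.
  const call = flags.includes("g") ? "re.FindAllString(input, -1)" : "re.FindString(input)";
  return `re := regexp.MustCompile(${literal})\nresult := ${call}`;
}

console.log(toGoSnippet("\\d+", "gi"));
```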
Debug catastrophic backtracking and runaway matches: paste a long or adversarial string into the tester to expose patterns such as (a+)+, whose nested quantifiers can take exponential time, or .*foo.*, whose overlapping wildcards rescan the input repeatedly. Watch the match count and per-match offsets. If matching stalls, remove the nesting or tighten the anchors. In engines that support them, such as PCRE or Java, atomic groups or possessive quantifiers also help; JavaScript has neither. A good pattern survives the worst-case input without exponential blowup.
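The effect is easy to reproduce outside the page. This TypeScript sketch times a nested-quantifier pattern against an equivalent flat one on inputs that almost match. On a backtracking engine such as V8's, the nested version's time typically grows roughly exponentially with the run length, while the flat one stays near zero. Absolute timings depend on the engine and machine, so treat them as relative.

```typescript
// Both patterns accept exactly the same strings: one or more "a" followed by "b".
const nested = /^(a+)+b$/; // nested quantifiers over the same character
const flat = /^a+b$/; // a single quantifier: only one way to split the run

function elapsedMs(re: RegExp, input: string): number {
  const t0 = Date.now();
  re.test(input);
  return Date.now() - t0;
}

// Adversarial inputs: a run of "a" ending in "!" so every attempt must fail.
for (const n of [16, 18, 20, 22]) {
  const input = "a".repeat(n) + "!";
  console.log(`n=${n} nested=${elapsedMs(nested, input)}ms flat=${elapsedMs(flat, input)}ms`);
}
```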
Compare greedy vs lazy quantifiers, lookbehind, and \p{L}: use the tester to see how greedy (.*) and lazy (.*?) quantifiers select different spans of the same text. Then check lookbehind (?<=...), which is available in modern JavaScript and PCRE but not in every older engine. Switch on the u flag to use Unicode property classes such as \p{L} (any letter) and \p{Sc} (currency symbol), so letter-based patterns stay correct across Latin, Cyrillic, and Han scripts.
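A short TypeScript sketch covers both comparisons. It uses the tag string from the Examples section and a made-up price line. The lookbehind needs an ES2018-capable runtime, and \p{Sc} needs the u flag.

```typescript
const html = "<b>hello</b> <i>world</i>";
// Greedy .* runs to the end of the line, then backs off to the LAST closing tag.
const greedy = html.match(/<\w>.*<\/\w>/g); // → ["<b>hello</b> <i>world</i>"]
// Lazy .*? stops at the FIRST closing tag, so each element is its own match.
const lazy = html.match(/<\w>.*?<\/\w>/g); // → ["<b>hello</b>", "<i>world</i>"]

// Lookbehind: digits preceded by any currency symbol; the symbol is not captured.
const line = "Total: $12.50, €8, ¥300 and 42 items";
const amounts = line.match(/(?<=\p{Sc})\d+(?:\.\d+)?/gu); // → ["12.50", "8", "300"]

console.log(greedy, lazy, amounts);
```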

Technical Principle

The tester is driven by the ECMAScript RegExp engine defined in the language specification. ECMAScript regex is a backtracking NFA implementation: the engine tries alternatives from left to right and backtracks whenever a later part of the pattern fails. That is what makes lookarounds and backreferences possible, but it is also what produces catastrophic backtracking on adversarial inputs.

The flags it recognises are g (global), i (case-insensitive), m (multiline, anchors per line), s (dotAll, '.' matches newline), u (Unicode mode), and y (sticky, anchor to lastIndex). ES2022 added d (hasIndices) and ES2024 added v (unicodeSets). Matching semantics shift sharply between flags. Under /u, the engine treats the pattern and input as sequences of Unicode code points instead of UTF-16 code units. Surrogate pairs then match as one character, and Unicode property classes like \p{L} (any letter), \p{Sc} (currency symbol), and \p{Script=Han} become available. Named capture groups (?<name>...) and lookbehind (?<=...) are part of ES2018 and ship in Chromium 64+, Firefox 78+, and Safari 16.4+. Features found in PCRE and other engines, such as atomic groups (?>...) and possessive quantifiers (a++), are still not in the spec.

A major production hazard is ReDoS. A pattern like (a+)+b on the input 'aaaaaaaaaaaaaaaa!' explores an exponential number of paths, because two nested quantifiers can split the same run of characters in many ways before the match finally fails. The mitigations are well known:

- Avoid nested quantifiers over the same character class.
- Anchor with ^ and $ where possible.
- Use atomic groups in engines that support them.
- Switch to a linear-time engine such as RE2 (the design Go's regexp package follows), which gives up backreferences and lookaround in exchange for matching time linear in the input length.

For untrusted input destined for server-side validation, RE2 or a hand-written parser is almost always the safer call.

  • Engine: ECMAScript RegExp is a backtracking NFA, not a DFA; that design supports lookarounds and backreferences but also allows catastrophic backtracking.
  • Flags: g (global), i (case-insensitive), m (^/$ per line), s (dotAll, '.' matches \n), u (Unicode mode, surrogate-aware), y (sticky, anchored to lastIndex), plus d (hasIndices, ES2022) and v (unicodeSets, ES2024).
  • Unicode mode: /u unlocks code-point matching plus \p{L} (letter), \p{Sc} (currency), and \p{Script=Han}; without /u those classes are unavailable, and the ASCII fallback [a-zA-Z]+ misses every non-Latin word.
  • Modern syntax availability: named captures (?<n>...) and lookbehind (?<=...) ship in Chromium 64+, Firefox 78+, Safari 16.4+; atomic groups (?>...) and possessive quantifiers (a++) are not in ECMAScript.
  • ReDoS: nested quantifiers over the same class, such as (a+)+b, explode exponentially; remove the nesting, add anchors, use atomic groups where the engine has them (ECMAScript does not), or migrate to RE2 / re2-wasm for untrusted input.
  • Worst-case complexity: patterns without nested or overlapping quantifiers usually run in roughly linear time, but a backtracking engine gives no guarantee; nested quantifiers can go exponential, and matching with backreferences is NP-hard in general, so benchmark on adversarial input before shipping.
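Two of the flag behaviours listed above are easy to check in a few lines of TypeScript. The first is how /u changes what '.' sees in a surrogate pair. The second is how y differs from an ordinary forward search. The sample strings are arbitrary.

```typescript
const smile = "😀"; // U+1F600, stored as two UTF-16 code units
const units = smile.length; // 2
const dotNoU = /^.$/.test(smile); // false: "." matches one code unit, half the pair
const dotU = /^.$/u.test(smile); // true: under /u the pair is one code point

// y (sticky) tries only at lastIndex; it never scans forward.
const sticky = /\d+/y;
sticky.lastIndex = 3;
const atThree = sticky.exec("ab 42")?.[0]; // "42"
sticky.lastIndex = 0;
const atZero = sticky.exec("ab 42"); // null: index 0 is "a"
const scanning = /\d+/.exec("ab 42")?.[0]; // "42": without y the engine searches ahead

console.log(units, dotNoU, dotU, atThree, atZero, scanning);
```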

Examples

Match an email address

Pattern: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
Flags:   gi

Input:   contact: alice@example.com, bob_2024@mail.co.uk, invalid@.com
Matches: alice@example.com, bob_2024@mail.co.uk  (2 hits)

Extract IPv4 addresses from a log

Pattern: \b(?:\d{1,3}\.){3}\d{1,3}\b
Flags:   g

Input:   2026-06-15 ERROR client 192.168.1.42 connect failed (peer 10.0.0.1)
Matches: 192.168.1.42, 10.0.0.1
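This pattern checks only the shape of an address, so it also accepts out-of-range values such as 999.1.1.1. A common follow-up keeps the regex simple and rejects bad octets in code. The TypeScript sketch below does that; `extractIPv4` is a hypothetical helper name.

```typescript
const ipv4Shape = /\b(?:\d{1,3}\.){3}\d{1,3}\b/g;

// Keep only candidates whose four octets are all in 0..255.
function extractIPv4(line: string): string[] {
  const candidates: string[] = line.match(ipv4Shape) ?? [];
  return candidates.filter((ip) => ip.split(".").every((octet) => Number(octet) <= 255));
}

console.log(extractIPv4("2026-06-15 ERROR client 192.168.1.42 connect failed (peer 10.0.0.1)"));
```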

Capture date parts with named groups

Pattern: (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
Flags:   g

Input:   Sprint window: 2026-09-01 to 2026-09-15
Groups:  {year:'2026', month:'09', day:'01'}, {year:'2026', month:'09', day:'15'}

Greedy vs lazy quantifier

Input:   <b>hello</b> <i>world</i>

Greedy:  <\w>.*</\w>  -> one match, the whole line: <b>hello</b> <i>world</i>
Lazy:    <\w>.*?</\w> -> first match <b>hello</b>; with g, a second match <i>world</i>

Lazy is what you usually want when scraping between tags.

Unicode-aware letter class

Pattern: \p{L}+
Flags:   gu

Input:   Hello, 北京! Café Москва
Matches: Hello, 北京, Café, Москва

Without the u flag, \p{L} is not a property class; it is read as the literal text p{L}. The ASCII fallback [a-zA-Z]+ misses every non-Latin word and truncates Café to Caf.
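The same input can be checked in TypeScript. The non-u regex is built from a string so that the compiler does not reject the pattern syntax. This assumes the é in Café is the precomposed character U+00E9.

```typescript
const input = "Hello, 北京! Café Москва";

const unicodeWords = input.match(/\p{L}+/gu); // → ["Hello", "北京", "Café", "Москва"]
const asciiWords = input.match(/[a-zA-Z]+/g); // → ["Hello", "Caf"]

// Without u, \p is a plain "p", so this pattern means: p { L followed by one or more }.
const literalP = new RegExp("\\p{L}+");
const noUFlag = literalP.test(input); // false
const matchesLiteral = literalP.test("p{L}"); // true

console.log(unicodeWords, asciiWords, noUFlag, matchesLiteral);
```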

FAQ

Which regex flavor does the tester use?

JavaScript's regex engine (ECMAScript). It is similar to PCRE for common features but differs in the details:

- no atomic groups or possessive quantifiers
- no recursion
- lookbehind requires a recent browser
- named groups use only the (?<name>...) syntax, not Python's (?P<name>...)

Python (re, or the third-party regex module), Java, .NET, and PCRE can each treat the same pattern differently, so test in the actual environment too.

What flags are supported?

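The toggle row offers g (global, find all matches), i (case-insensitive), m (multiline, ^ and $ match at line boundaries), s (dotAll, . matches newline), and u (Unicode). Toggle one to see how the matches in your test text change. ECMAScript also defines y (sticky, match only at lastIndex) and d (hasIndices, report capture-group offsets), which are worth knowing when you move the pattern into code.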

Why does my pattern match fewer results than expected?

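Without the g flag, String.prototype.match and replace stop at the first match, and matchAll throws a TypeError. Without m, ^ and $ match only at the very start and end of the input. Without u, characters outside the Basic Multilingual Plane, such as most emoji, are treated as two UTF-16 surrogate halves. Most 'why doesn't this work' issues come down to a missing flag.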
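Here is a quick TypeScript check of the first two causes, using made-up sample text.

```typescript
const text = "a1 b2\nc3";

// Without g, match() returns only the first hit (plus index/input details).
const first = text.match(/\d/); // first?.[0] is "1"
const every = text.match(/\d/g); // → ["1", "2", "3"]

// matchAll() refuses a non-global regex instead of silently returning one hit.
let threw = false;
try {
  text.matchAll(/\d/);
} catch (e) {
  threw = e instanceof TypeError;
}

// Without m, ^ anchors only at the start of the whole input.
const noM = text.match(/^\w\d/g); // → ["a1"]
const withM = text.match(/^\w\d/gm); // → ["a1", "c3"]

console.log(first?.[0], every, threw, noM, withM);
```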

How do I match emoji and CJK characters?

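Turn on the u flag for proper Unicode handling, then use \p{...} property classes (\p{Letter}, \p{Script=Han}, \p{Emoji}) to match by Unicode property. Note that \p{Emoji} also matches the digits 0-9, # and *, so \p{Extended_Pictographic} is often closer to what people mean by emoji. [a-zA-Z] never matches accented characters, with or without u. Either widen it to [a-zA-ZÀ-ÿ], which covers Latin-1 letters but also includes × and ÷, or use \p{L}.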
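The following TypeScript sketch illustrates those property classes on a made-up mixed string. It needs a runtime with Unicode property escapes (ES2018 or later).

```typescript
const mixed = "Order 42 shipped to 北京 🚚 ✅";

const han = mixed.match(/\p{Script=Han}+/gu); // → ["北京"]
// \p{Emoji} includes the keycap bases, so the digits 4 and 2 match too.
const emojiProp = mixed.match(/\p{Emoji}/gu); // → ["4", "2", "🚚", "✅"]
const pictographs = mixed.match(/\p{Extended_Pictographic}/gu); // → ["🚚", "✅"]

console.log(han, emojiProp, pictographs);
```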

Can I generate code from a pattern?

The page generates copyable snippets for JavaScript, Python, Java, PHP, and Go with the appropriate escaping. Be aware that the same pattern may behave differently across engines (Unicode handling, character class behaviour, lookaround support). Always test in the target language.

What's the difference between greedy and lazy quantifiers?

Greedy quantifiers (*, +, ?) match as much as possible, then backtrack if needed. Lazy quantifiers (*?, +?, ??) match as little as possible. For 'extract everything between A and B', greedy .* will overshoot if multiple Bs exist; lazy .*? stops at the first B. Pick lazy by default for inner-content extraction.

Is my regex or test text uploaded?

No. Matching runs in your browser using JavaScript's native regex engine. Patterns and inputs are not transmitted.