# regex — Understand and Apply Regular Expressions

> Regular expression syntax reference — character classes, anchors, quantifiers, groups and lookarounds. The regex building blocks explained concisely.

Source: https://www.jpkc.com/db/en/cheatsheets/files-text/regex/

<!-- PROSE:intro -->
Regular expressions (regex) are a compact language for searching, validating and replacing patterns in text. Instead of matching fixed strings, you describe whole classes of matches with wildcards, repetitions and anchors – think "four digits in a row" or "an email address". You meet them everywhere: in `grep`, `sed` and `awk` on the command line, in editors and in practically every programming language. Keep in mind that several dialects exist – BRE and ERE in the classic Unix tools, Perl-compatible syntax (PCRE and its relatives) in PHP and many modern languages – so details around escaping and extensions can differ. This reference shows you the building blocks you reach for most in practice.
<!-- PROSE:intro:end -->

## Character Classes

`.` — Match any single character (except newline by default).

```bash
h.t matches hat, hit, hot
```

`\d` — Match any digit (0-9). Equivalent to [0-9].

```bash
\d{3} matches 123, 456, 789
```

`\D` — Match any non-digit character.

```bash
\D+ matches abc, hello
```

`\w` — Match any word character (letter, digit, underscore). Equivalent to [a-zA-Z0-9_].

```bash
\w+ matches hello_world, var123
```

`\W` — Match any non-word character.

```bash
\W matches !, @, spaces
```

`\s` — Match any whitespace (space, tab, newline).

```bash
hello\sworld matches 'hello world'
```

`\S` — Match any non-whitespace character.

```bash
\S+ matches any non-space word
```

`[abc]` — Match any one character in the set.

```bash
[aeiou] matches any vowel
```

`[^abc]` — Match any character NOT in the set.

```bash
[^0-9] matches any non-digit
```

`[a-z]` — Match any character in the range.

```bash
[a-zA-Z] matches any letter
```
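
To see the classes above side by side, here is a minimal sketch using Python's standard `re` module. The sample string and variable names are illustrative assumptions. Python's `str` patterns are Unicode-aware by default, so `re.ASCII` is passed to get the `[0-9]` and `[a-zA-Z0-9_]` equivalences described above.

```python
import re

# Illustrative sample text (not from the reference itself).
text = "Order 42: hat, hit, hot_dog!"

# '.' matches any single character except a newline.
dot_matches = re.findall(r"h.t", text)

# \d+ matches runs of digits; re.ASCII limits \d to 0-9.
digits = re.findall(r"\d+", text, flags=re.ASCII)

# \w+ matches runs of letters, digits and underscores.
words = re.findall(r"\w+", text, flags=re.ASCII)

# Negated set: characters that are neither word characters nor whitespace.
punctuation = re.findall(r"[^\w\s]", text, flags=re.ASCII)

print(dot_matches)   # ['hat', 'hit', 'hot']
print(digits)        # ['42']
print(words)         # ['Order', '42', 'hat', 'hit', 'hot_dog']
print(punctuation)   # [':', ',', ',', '!']
```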

## Quantifiers

`*` — Match 0 or more of the preceding element (greedy).

```bash
ab*c matches ac, abc, abbc, abbbc
```

`+` — Match 1 or more of the preceding element (greedy).

```bash
ab+c matches abc, abbc but NOT ac
```

`?` — Match 0 or 1 of the preceding element (optional).

```bash
colou?r matches color and colour
```

`{n}` — Match exactly n occurrences.

```bash
\d{4} matches exactly 4 digits: 2026
```

`{n,}` — Match n or more occurrences.

```bash
\d{2,} matches 2 or more digits
```

`{n,m}` — Match between n and m occurrences.

```bash
\d{2,4} matches 12, 123, or 1234
```

`*? +? ??` — Lazy (non-greedy) versions: match as few as possible.

```bash
<.*?> matches <b> in '<b>text</b>' (not the whole string)
```
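
The difference between greedy and lazy matching is easiest to see by running both on the same input. The following is a small sketch with Python's `re` module; the sample strings are assumptions chosen for illustration.

```python
import re

html = "<b>text</b>"

# Greedy: .* expands as far as possible, so one match spans the whole string.
greedy = re.findall(r"<.*>", html)

# Lazy: .*? stops at the first '>' it can, so each tag is matched separately.
lazy = re.findall(r"<.*?>", html)

# {n,m} is greedy as well: it takes up to four digits when they are available.
bounded = re.findall(r"\d{2,4}", "1 12 123 12345")

# '?' makes the preceding element optional.
optional = re.findall(r"colou?r", "color colour colouur")

print(greedy)    # ['<b>text</b>']
print(lazy)      # ['<b>', '</b>']
print(bounded)   # ['12', '123', '1234']
print(optional)  # ['color', 'colour']
```

Note that `12345` yields `1234` and leaves a lone `5`, which is too short to match on its own.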

## Anchors & Boundaries

`^` — Match the start of a line/string.

```bash
^Hello matches 'Hello World' but not 'Say Hello'
```

`$` — Match the end of a line/string.

```bash
world$ matches 'hello world' but not 'world hello'
```

`\b` — Match a word boundary (between \w and \W).

```bash
\bcat\b matches 'cat' but not 'category'
```

`\B` — Match a non-word boundary.

```bash
\Bcat\B matches the 'cat' inside 'concatenate' but not the standalone word 'cat'
```

`^...$` — Match the entire string (combined anchors).

```bash
^\d{5}$ matches only exactly 5 digits
```
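
Anchors and boundaries match positions rather than characters, which the sketch below tries to make visible by printing match offsets. It uses Python's `re` module with illustrative sample strings. One Python-specific detail worth knowing: `$` also matches just before a trailing newline, so `re.fullmatch` is the stricter choice when validating a whole string.

```python
import re

# ^ anchors at the start: only the first string begins with 'Hello'.
starts_with_hello = [bool(re.search(r"^Hello", s)) for s in ("Hello World", "Say Hello")]

sentence = "cat category concatenate"

# \b...\b: 'cat' as a whole word (offset 0 only).
whole_word = [m.start() for m in re.finditer(r"\bcat\b", sentence)]

# \B...\B: 'cat' surrounded by word characters (inside 'concatenate').
inner = [m.start() for m in re.finditer(r"\Bcat\B", sentence)]

# In Python, $ tolerates one trailing newline; fullmatch does not.
loose = bool(re.search(r"^\d{5}$", "12345\n"))
strict = bool(re.fullmatch(r"\d{5}", "12345\n"))

print(starts_with_hello)  # [True, False]
print(whole_word)         # [0]
print(inner)              # [16]
print(loose, strict)      # True False
```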

## Groups & Alternation

`(abc)` — Capturing group: group and capture for back-references.

```bash
(\d{3})-(\d{4}) captures '555' and '1234' from '555-1234' as two separate groups
```

`(?:abc)` — Non-capturing group: group without capturing.

```bash
(?:https?://)? optionally matches http:// or https://
```

`a|b` — Alternation: match a OR b.

```bash
cat|dog matches 'cat' or 'dog'
```

`\1 \2` — Back-reference: match the same text as a previous group.

```bash
(\w+)\s\1 matches repeated words like 'the the'
```

`(?<name>abc)` — Named capturing group.

```bash
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
```
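
Here is a short sketch of how the group types above behave in Python's `re` module. The sample inputs are assumptions for illustration. Be aware of one dialect difference: Python's `re` spells named groups `(?P<name>...)` rather than `(?<name>...)`, so the date pattern is adapted accordingly.

```python
import re

# Capturing groups: each parenthesised part is available separately.
phone = re.search(r"(\d{3})-(\d{4})", "Call 555-1234 now")
phone_parts = phone.groups()

# Non-capturing group: the optional scheme is grouped but not captured,
# so the host is group 1 in both cases.
hosts = [
    re.match(r"(?:https?://)?([\w.-]+)", url).group(1)
    for url in ("https://example.com", "example.com")
]

# Back-reference: \1 must repeat exactly what group 1 matched.
repeated = re.search(r"\b(\w+)\s\1\b", "it is the the best").group(0)

# Named groups (Python syntax: ?P<name>).
date_match = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", "2026-03-19")
date_parts = date_match.groupdict()

print(phone_parts)  # ('555', '1234')
print(hosts)        # ['example.com', 'example.com']
print(repeated)     # the the
print(date_parts)   # {'year': '2026', 'month': '03', 'day': '19'}
```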

## Lookahead & Lookbehind

`(?=abc)` — Positive lookahead: match if followed by abc (without consuming).

```bash
\d+(?= USD) matches '100' in '100 USD'
```

`(?!abc)` — Negative lookahead: match if NOT followed by abc.

```bash
\d+\b(?! USD) matches '100' in '100 EUR' but nothing in '100 USD' (without \b, backtracking would still match '10')
```

`(?<=abc)` — Positive lookbehind: match if preceded by abc.

```bash
(?<=\$)\d+ matches '50' in '$50'
```

`(?<!abc)` — Negative lookbehind: match if NOT preceded by abc.

```bash
(?<!\$)\b\d+ matches '50' in 'EUR 50' but nothing in '$50' (without \b, the '0' would still match)
```
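
Negative lookarounds combined with `\d+` have a well-known pitfall: the engine may backtrack to a shorter number, or start in the middle of one, and still report a match. The sketch below, using Python's `re` module with illustrative strings, contrasts patterns with and without a `\b` guard.

```python
import re

prices = "100 USD, 200 EUR"

# Positive lookahead: digits that are followed by ' USD'.
usd = re.findall(r"\d+(?= USD)", prices)

# Unguarded negative lookahead: \d+ gives back a digit, so '10' matches.
unguarded_ahead = re.findall(r"\d+(?! USD)", "100 USD")

# Guarded with \b: the number must end at a word boundary first.
not_usd = re.findall(r"\d+\b(?! USD)", prices)

amounts = "$50 and EUR 75"

# Positive lookbehind: digits directly after a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", amounts)

# Unguarded negative lookbehind: the match simply starts one digit later.
unguarded_behind = re.findall(r"(?<!\$)\d+", "$50")

# Guarded with \b: the number must start at a word boundary.
not_dollars = re.findall(r"(?<!\$)\b\d+", amounts)

print(usd)               # ['100']
print(unguarded_ahead)   # ['10']
print(not_usd)           # ['200']
print(dollars)           # ['50']
print(unguarded_behind)  # ['0']
print(not_dollars)       # ['75']
```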

## Flags / Modifiers

`i` — Case-insensitive matching.

```bash
/hello/i matches Hello, HELLO, hello
```

`g` — Global: find all matches, not just the first.

```bash
/\d+/g finds all numbers in a string
```

`m` — Multiline: ^ and $ match start/end of each line.

```bash
/^start/m matches 'start' at beginning of any line
```

`s` — Dotall: make . also match newline characters.

```bash
/start.*end/s matches across multiple lines
```
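
The `/pattern/flags` notation above is the Perl and JavaScript style. As a rough translation, the sketch below shows the same four behaviours in Python's `re` module, which takes flags as arguments and has no `g` flag; `re.findall` (or `re.finditer`) plays that role. The sample strings are illustrative.

```python
import re

# i: case-insensitive matching.
any_case = re.findall(r"hello", "Hello HELLO hello", flags=re.IGNORECASE)

# g: findall returns every match instead of only the first.
numbers = re.findall(r"\d+", "3 apples, 12 pears")

# m: without MULTILINE, ^ only matches at the very start of the string.
lines = "intro\nstart here\nstart again"
single_line = re.findall(r"^start", lines)
multi_line = re.findall(r"^start", lines, flags=re.MULTILINE)

# s: without DOTALL, '.' stops at newlines.
block = "start\nmiddle\nend"
without_dotall = re.search(r"start.*end", block)
with_dotall = re.search(r"start.*end", block, flags=re.DOTALL)

print(any_case)                      # ['Hello', 'HELLO', 'hello']
print(numbers)                       # ['3', '12']
print(single_line, multi_line)       # [] ['start', 'start']
print(without_dotall is None)        # True
print(with_dotall.group(0) == block) # True
```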

## Common Patterns

`^[\w.-]+@[\w.-]+\.[a-zA-Z]{2,}$` — Email address (basic validation).

```bash
user@example.com, name.tag@sub.domain.org
```

`^https?://[\w.-]+(?:/[\w./?%&=-]*)?$` — URL (HTTP/HTTPS).

```bash
https://example.com/path?q=search
```

`^\d{1,3}(?:\.\d{1,3}){3}$` — IPv4 address (basic format check).

```bash
192.168.1.1, 10.0.0.255 (also accepts out-of-range octets such as 999.1.1.1)
```

`^#?([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$` — Hex color code.

```bash
#fff, #1a2b3c, 00ff00
```

`^\d{4}-\d{2}-\d{2}$` — Date in ISO 8601 format (YYYY-MM-DD).

```bash
2026-03-19
```

`^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$` — Password: min 8 chars, uppercase, lowercase, digit.

```bash
Passw0rd, MyS3cret!
```
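
Several of these patterns only check the shape of the input, not its meaning: the IPv4 pattern accepts `999.1.1.1` and the date pattern accepts `2026-02-30`. One possible way to close that gap is to pair the regex with a small semantic check, sketched here in Python. The helper names `is_ipv4`, `is_iso_date` and `is_strong_password` are hypothetical, and `re.ASCII` plus `fullmatch` are choices made for this sketch to keep `\d` at 0-9 and to reject trailing newlines.

```python
import re
from datetime import date

# Patterns taken from the list above; the helper functions are illustrative.
IPV4 = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$", re.ASCII)
ISO_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$", re.ASCII)
PASSWORD = re.compile(r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d).{8,}$")


def is_ipv4(value: str) -> bool:
    """Shape check via regex, then range check on each octet."""
    if not IPV4.fullmatch(value):
        return False
    return all(int(octet) <= 255 for octet in value.split("."))


def is_iso_date(value: str) -> bool:
    """Shape check via regex, then calendar check via datetime.date."""
    match = ISO_DATE.fullmatch(value)
    if not match:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True


def is_strong_password(value: str) -> bool:
    """Each lookahead asserts one requirement without consuming input."""
    return PASSWORD.fullmatch(value) is not None


print(is_ipv4("192.168.1.1"), is_ipv4("999.1.1.1"))            # True False
print(is_iso_date("2026-03-19"), is_iso_date("2026-02-30"))    # True False
print(is_strong_password("Passw0rd"), is_strong_password("password"))  # True False
```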

<!-- PROSE:outro -->
## Conclusion

Regular expressions are powerful, but they deserve a careful hand. Mind the dialect first: BRE, ERE and PCRE differ in which characters you have to escape and which extensions (such as lookbehind or named groups) are available at all – a pattern that runs in PHP may fail under `grep`. Also keep the difference between greedy and lazy quantifiers in mind: `.*` grabs as much as possible by default, which often matches more than intended – `.*?` fixes that. And beware complex, nested patterns with overlapping quantifiers: they can trigger "catastrophic backtracking" (ReDoS) and bring an engine to a near halt on certain inputs. Keep patterns as simple as you can and test them against real data.
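
As a small illustration of the backtracking advice, the sketch below compares a pattern with nested quantifiers against a flat rewrite that accepts the same strings. It uses Python's `re` module, which is a backtracking engine; the pattern names and sample inputs are assumptions, and the inputs are kept short on purpose.

```python
import re

# Nested quantifiers: a run of a's can be split between the inner and outer
# '+' in many ways, and a backtracking engine may try them all before
# giving up on a non-matching input.
NESTED = re.compile(r"^(a+)+$")

# Flat rewrite: accepts exactly the same strings with only one way to match.
FLAT = re.compile(r"^a+$")

samples = ["a", "aaaa", "aaab", ""]

nested_verdicts = [bool(NESTED.match(s)) for s in samples]
flat_verdicts = [bool(FLAT.match(s)) for s in samples]

print(nested_verdicts)  # [True, True, False, False]
print(flat_verdicts)    # [True, True, False, False]
```

Both patterns agree on every sample, but on a long non-matching input (many `a` characters followed by a `b`) the nested form can take dramatically longer, so the flat form is the safer choice.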

## Further Reading

- [Wikipedia: Regular expression](https://en.wikipedia.org/wiki/Regular_expression) – introduction to the theory and syntax
- [regex101](https://regex101.com/) – interactive online tester with explanations
- [MDN: Regular expressions](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_expressions) – in-depth guide (JavaScript dialect)
<!-- PROSE:outro:end -->

## Related Commands

- [grep](https://www.jpkc.com/db/en/cheatsheets/files-text/grep/) – search text line by line using regular expressions
- [sed](https://www.jpkc.com/db/en/cheatsheets/files-text/sed/) – stream editor for search-and-replace with regex
- [awk](https://www.jpkc.com/db/en/cheatsheets/files-text/awk/) – pattern-driven text processing and reporting

