# uniq — Filter and Count Duplicate Lines

> Practical guide to uniq: remove adjacent duplicate lines, count occurrences and find duplicates or unique entries — usually paired with sort.

Source: https://www.jpkc.com/db/en/cheatsheets/files-text/uniq/

<!-- PROSE:intro -->
uniq filters duplicate lines out of text – but only when they sit directly next to each other. This is the single most common pitfall: uniq compares adjacent lines only, so the input almost always has to pass through `sort` first. Once that is done, uniq counts occurrences (`-c`), shows only duplicates (`-d`) or only one-offs (`-u`), and can skip fields and characters before comparing. In pipelines with `sort`, uniq is the go-to tool for frequency analysis of logs and lists. This guide walks you through the most important options.
<!-- PROSE:intro:end -->
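
The adjacency rule is easiest to see on a tiny input. The following is a minimal sketch; the sample lines and variable names are made up for illustration:

```bash
# Hypothetical sample: "apple" occurs twice, but never on adjacent lines.
input=$(printf 'apple\nbanana\napple\n')

# uniq alone: no two equal lines touch, so nothing is removed.
unsorted=$(printf '%s\n' "$input" | uniq)

# sort | uniq: sorting makes the two "apple" lines adjacent, so one is dropped.
sorted=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n' "$unsorted" "$sorted"
```

The first result still contains three lines, the second only two.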

## Basic Usage

`sort <file> | uniq` — Remove duplicate lines (input must be sorted first).

```bash
sort names.txt | uniq
```

`uniq <file>` — Remove adjacent duplicate lines only (without sorting).

```bash
uniq log.txt
```

`sort <file> | uniq > <output>` — Deduplicate and save to a new file.

```bash
sort emails.txt | uniq > unique-emails.txt
```

`sort -u <file>` — Sort and deduplicate in one command (sort has a built-in unique flag).

```bash
sort -u names.txt
```

## Counting & Statistics

`sort <file> | uniq -c` — Prefix each line with the number of occurrences.

```bash
sort access.log | uniq -c
```

`sort <file> | uniq -c | sort -rn` — Count occurrences and sort by frequency (most common first).

The example applies the same pattern to the first space-separated field of each line (extracted with `cut`) and keeps only the 20 most frequent values.

```bash
cut -f 1 -d ' ' access.log | sort | uniq -c | sort -rn | head -20
```

`sort <file> | uniq -c | sort -n` — Count occurrences and sort by frequency (least common first).

```bash
sort errors.log | uniq -c | sort -n
```
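
As a small worked example of the counting pipeline, the sketch below runs it on made-up status words. The `awk` step is an addition for readability: it strips the leading padding that `uniq -c` puts in front of each count, whose width can differ between implementations.

```bash
# Hypothetical input: one status word per line.
counts=$(printf 'ok\nerror\nok\nok\nwarn\nerror\n' |
  sort | uniq -c | sort -rn |
  awk '{print $1, $2}')   # normalize the padding in front of each count

printf '%s\n' "$counts"
```

With this input the most frequent word (`ok`, three times) comes first and the one-off `warn` comes last.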

## Filtering Duplicates

`sort <file> | uniq -d` — Show only lines that appear more than once (duplicates only).

```bash
sort emails.txt | uniq -d
```

`sort <file> | uniq -D` — Show all duplicate lines (not just one per group).

`-D` is not specified by POSIX, so it may be missing outside GNU coreutils.

```bash
sort data.txt | uniq -D
```

`sort <file> | uniq -u` — Show only lines that appear exactly once (unique lines only).

```bash
sort entries.txt | uniq -u
```

`sort <file> | uniq -cd` — Show duplicates with their count.

```bash
sort urls.txt | uniq -cd | sort -rn
```
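
To compare the three filters side by side, here is a minimal sketch on an illustrative input in which `a` occurs three times, `b` twice and `c` once. The variable names are assumptions, and `-D` presumes a uniq that supports it, such as GNU uniq:

```bash
# Hypothetical input: a x3, b x2, c x1 (in mixed order).
input=$(printf 'a\nb\na\nc\nb\na\n')

dups=$(printf '%s\n' "$input" | sort | uniq -d)      # one line per duplicated value
all_dups=$(printf '%s\n' "$input" | sort | uniq -D)  # every copy of duplicated values
singles=$(printf '%s\n' "$input" | sort | uniq -u)   # values that occur exactly once

printf -- '-d:\n%s\n-D:\n%s\n-u:\n%s\n' "$dups" "$all_dups" "$singles"
```

Here `-d` reports `a` and `b` once each, `-D` reports all five duplicated lines, and `-u` reports only `c`.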

## Field & Character Skipping

`uniq -f <n>` — Skip the first n fields when comparing lines (fields are whitespace-separated).

```bash
sort -k2 data.txt | uniq -f 1
```

`uniq -s <n>` — Skip the first n characters when comparing lines.

```bash
uniq -s 10 timestamped.log
```

`uniq -w <n>` — Compare only the first n characters of each line.

`-w` is not specified by POSIX, so it may be missing outside GNU coreutils.

```bash
sort file.txt | uniq -w 20
```

`uniq -f <n> -s <m> -w <o>` — Combine: skip n fields, then skip m chars, compare only o chars.

```bash
uniq -f 2 -w 15 data.txt
```
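
The following sketch shows what the skipping options do on a made-up log. The timestamps, event names and error codes are illustrative assumptions. The input is deliberately not sorted, so uniq collapses consecutive runs while keeping the original order:

```bash
# Hypothetical log: a timestamp field followed by an event name.
log=$(printf '10:01 login\n10:02 login\n10:03 logout\n10:04 logout\n10:05 login\n')

# -f 1 ignores the timestamp field; adjacent lines with the same event collapse.
by_field=$(printf '%s\n' "$log" | uniq -f 1)

# -s 6 skips the first six characters ("10:01 "), which has the same effect here.
by_chars=$(printf '%s\n' "$log" | uniq -s 6)

# -w 5 compares only the first five characters (assumes a uniq with -w, e.g. GNU).
codes=$(printf 'ERR01 disk full\nERR01 disk nearly full\nERR02 link down\n' | uniq -w 5)

printf '%s\n---\n%s\n' "$by_field" "$codes"
```

In each run uniq keeps the first line, so the surviving timestamps are `10:01`, `10:03` and `10:05`. The final `login` is kept because it is not adjacent to the earlier ones.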

## Case Sensitivity

`sort -f <file> | uniq -i` — Case-insensitive deduplication (sort must also be case-insensitive).

```bash
sort -f names.txt | uniq -i
```

`sort -f <file> | uniq -ic` — Case-insensitive count of occurrences.

```bash
sort -f words.txt | uniq -ic | sort -rn
```

`sort -f <file> | uniq -id` — Show case-insensitive duplicates.

```bash
sort -f emails.txt | uniq -id
```
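
Why the sort has to be case-insensitive as well can be shown with a short sketch. The names are made up, and `LC_ALL=C` is used here only to force a byte-order sort so the failure case is reproducible regardless of the system locale:

```bash
# Hypothetical name list with mixed capitalisation.
names=$(printf 'Alice\nbob\nalice\nBOB\ncarol\n')

# Case-insensitive sort puts "Alice"/"alice" and "bob"/"BOB" next to each other,
# so uniq -i can merge them. Which spelling survives depends on the sort order.
merged=$(printf '%s\n' "$names" | sort -f | uniq -i)

# Byte-order sort places the uppercase-initial lines first, so no case
# variants are adjacent and uniq -i removes nothing.
missed=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq -i)

printf 'sort -f | uniq -i: %s lines\n' "$(printf '%s\n' "$merged" | grep -c '')"
printf 'C sort  | uniq -i: %s lines\n' "$(printf '%s\n' "$missed" | grep -c '')"
```

The first pipeline leaves three lines, the second still has all five.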

## Common Patterns

`awk '{print $1}' <log> | sort | uniq -c | sort -rn | head -10` — Top 10 most frequent IPs in an access log.

```bash
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
```

`cut -f <n> -d '<d>' <file> | sort | uniq -c | sort -rn` — Frequency analysis of a specific column.

```bash
cut -f 3 -d ',' sales.csv | sort | uniq -c | sort -rn
```

`sort <file1> <file2> | uniq -d` — Find lines that appear in both files (intersection).

This is only the true intersection when neither file contains internal duplicates; a line repeated within a single file is reported as well.

```bash
sort file1.txt file2.txt | uniq -d
```

`sort <file1> <file2> | uniq -u` — Find lines unique to either file (symmetric difference).

A line that occurs several times in just one of the files is dropped from the result, so deduplicate each file first if that can happen.

```bash
sort old-list.txt new-list.txt | uniq -u
```

`sort <file1> <file1> <file2> | uniq -u` — Find lines only in file2 but not in file1.

Listing file1 twice guarantees that none of its lines can count as unique. Lines repeated within file2 are filtered out too, so run file2 through `sort -u` beforehand if it may contain duplicates.

```bash
sort old.txt old.txt new.txt | uniq -u
```
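
The set-style recipes above treat any repeated line as "present in both files", which goes wrong when a file repeats a line internally. One possible way to guard against that is to run each file through `sort -u` before combining them. The sketch below uses temporary files with made-up contents; the file names and variable names are assumptions:

```bash
tmp=$(mktemp -d)
printf 'a\na\nb\nc\n' > "$tmp/old.txt"   # "a" is repeated inside old.txt
printf 'b\nc\nd\n'    > "$tmp/new.txt"

# Naive intersection: the repeated "a" is wrongly reported as common.
naive=$(sort "$tmp/old.txt" "$tmp/new.txt" | uniq -d)

# Deduplicate each file first, then combine.
both=$({ sort -u "$tmp/old.txt"; sort -u "$tmp/new.txt"; } | sort | uniq -d)
either=$({ sort -u "$tmp/old.txt"; sort -u "$tmp/new.txt"; } | sort | uniq -u)
only_new=$({ sort -u "$tmp/old.txt"; sort -u "$tmp/old.txt"; sort -u "$tmp/new.txt"; } | sort | uniq -u)

rm -rf "$tmp"

# Unquoted echo on purpose: joins the result lines with spaces for display.
printf 'naive: %s\nboth: %s\neither: %s\nonly new: %s\n' \
  "$(echo $naive)" "$(echo $both)" "$(echo $either)" "$(echo $only_new)"
```

With these inputs the naive intersection yields `a b c`, while the deduplicated version yields `b c`. For sorted inputs, `comm` is a common alternative for the same comparisons.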

`history | awk '{print $2}' | sort | uniq -c | sort -rn | head -10` — Show the 10 most frequently used shell commands.

```bash
history | awk '{print $2}' | sort | uniq -c | sort -rn | head -10
```

`wc -l <file> && sort -u <file> | wc -l` — Compare total lines vs unique lines to gauge duplication.

```bash
echo "Total: $(wc -l < data.txt)  Unique: $(sort -u data.txt | wc -l)"
```

<!-- PROSE:outro -->
## Conclusion

Remember the golden rule: uniq without a preceding `sort` only removes immediately adjacent duplicates and misses anything further apart, so you almost always want `sort | uniq` or simply `sort -u`. For plain deduplication, `sort -u` is shorter. As soon as you need to count (`-c`) or show only duplicates (`-d`) or only unique lines (`-u`), uniq is the right tool. When comparing with `-f`, `-s` or `-w`, the part of the line that `sort` orders by must match the part that uniq compares (for example `sort -k2` with `uniq -f 1`); otherwise equal entries do not end up adjacent and the duplicates go undetected.
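
A mismatch between sort key and comparison key can be reproduced with a few made-up records (an ID field followed by a value field; names and data are illustrative assumptions):

```bash
# Hypothetical records: an ID field followed by a value field.
rows=$(printf '1 x\n2 y\n3 x\n')

# Whole-line sort orders by ID, so the two "x" rows are not adjacent
# and uniq -f 1 keeps all three lines.
misaligned=$(printf '%s\n' "$rows" | sort | uniq -f 1)

# Sorting on field 2 groups equal values, so uniq -f 1 can collapse them.
aligned=$(printf '%s\n' "$rows" | sort -k2 | uniq -f 1)

printf 'sort | uniq -f 1:\n%s\n\nsort -k2 | uniq -f 1:\n%s\n' "$misaligned" "$aligned"
```

Only the second pipeline reduces the input to one line per value.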

## Further Reading

- [GNU Coreutils manual: uniq](https://www.gnu.org/software/coreutils/manual/html_node/uniq-invocation.html) – complete reference for every option
- [man7.org: uniq(1)](https://man7.org/linux/man-pages/man1/uniq.1.html) – the Linux manual page
<!-- PROSE:outro:end -->

## Related Commands

- [sort](https://www.jpkc.com/db/en/cheatsheets/files-text/sort/) – sort lines (a prerequisite for uniq)
- [wc](https://www.jpkc.com/db/en/cheatsheets/files-text/wc/) – count lines, words and bytes
- [cut](https://www.jpkc.com/db/en/cheatsheets/files-text/cut/) – extract columns and fields from lines

