uniq — Filter and Count Duplicate Lines

Practical guide to uniq: remove adjacent duplicate lines, count occurrences and find duplicates or unique entries — usually paired with sort.

uniq filters duplicate lines out of text – but only when they sit directly next to each other. This is the single most common pitfall: uniq compares adjacent lines only, so the input almost always has to pass through sort first. Once that is done, uniq counts occurrences (-c), shows only duplicates (-d) or only one-offs (-u), and can skip fields and characters before comparing. In pipelines with sort, uniq is the go-to tool for frequency analysis of logs and lists. This guide walks you through the most important options.
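
To make the adjacency rule concrete, here is a minimal sketch. The fruit names are made-up sample data and the variable names are illustrative only.

```shell
# Hypothetical sample input: "apple" occurs twice, but never on adjacent lines.
input=$(printf '%s\n' apple banana apple banana banana)

# Without sort, only the adjacent "banana" pair at the end collapses.
unsorted=$(printf '%s\n' "$input" | uniq)

# With sort, equal lines end up next to each other, so uniq catches all of them.
sorted=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n' "$unsorted" "$sorted"
```

With this input, uniq alone still prints four lines (apple, banana, apple, banana), while sort | uniq prints two.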

Basic Usage

sort <file> | uniq — Remove duplicate lines (input must be sorted first).

sort names.txt | uniq

uniq <file> — Remove adjacent duplicate lines only (without sorting).

uniq log.txt

sort <file> | uniq > <output> — Deduplicate and save to a new file.

sort emails.txt | uniq > unique-emails.txt

sort -u <file> — Sort and deduplicate in one command (sort has a built-in unique flag).

sort -u names.txt

Counting & Statistics

sort <file> | uniq -c — Prefix each line with the number of occurrences.

sort access.log | uniq -c

sort <file> | uniq -c | sort -rn — Count occurrences and sort by frequency (most common first).

cut -f 1 -d ' ' access.log | sort | uniq -c | sort -rn | head -20

sort <file> | uniq -c | sort -n — Count occurrences and sort by frequency (least common first).

sort errors.log | uniq -c | sort -n
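
As a small worked example of the counting pipeline, the sketch below ranks a handful of made-up status codes. The trailing awk step is an addition for readability: it strips the padding that uniq -c puts in front of each count, which differs between implementations.

```shell
# Hypothetical status codes, one per line.
codes=$(printf '%s\n' 200 404 200 500 200 404)

# Count each distinct value, rank by count (highest first),
# then normalize the count padding to a single space.
ranked=$(printf '%s\n' "$codes" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

printf '%s\n' "$ranked"
```

For this input the result is "3 200", "2 404" and "1 500", in that order. When two values have the same count, their relative order depends on how sort breaks ties.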

Filtering Duplicates

sort <file> | uniq -d — Show only lines that appear more than once (duplicates only).

sort emails.txt | uniq -d

sort <file> | uniq -D — Show all duplicate lines, every copy rather than one per group (-D is not part of POSIX; GNU uniq supports it).

sort data.txt | uniq -D

sort <file> | uniq -u — Show only lines that appear exactly once (unique lines only).

sort entries.txt | uniq -u

sort <file> | uniq -cd — Show duplicates with their count.

sort urls.txt | uniq -cd | sort -rn
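
The difference between -d and -u is easiest to see side by side. The sketch below uses a made-up address list; LC_ALL=C is set only to make the sort order predictable.

```shell
# Hypothetical address list: a@ occurs twice, b@ once, c@ three times.
emails=$(printf '%s\n' \
  a@example.com c@example.com b@example.com \
  c@example.com a@example.com c@example.com)

# -d prints one copy of every line that occurs more than once.
dupes=$(printf '%s\n' "$emails" | LC_ALL=C sort | uniq -d)

# -u prints only the lines that occur exactly once.
singles=$(printf '%s\n' "$emails" | LC_ALL=C sort | uniq -u)

printf 'duplicated:\n%s\n\nseen once:\n%s\n' "$dupes" "$singles"
```

Here -d reports a@example.com and c@example.com, and -u reports only b@example.com. Together the two outputs cover every distinct line exactly once.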

Field & Character Skipping

uniq -f <n> — Skip the first n fields when comparing lines (fields are whitespace-separated).

sort -k2 data.txt | uniq -f 1

uniq -s <n> — Skip the first n characters when comparing lines.

uniq -s 10 timestamped.log

uniq -w <n> — Compare only the first n characters of each line (-w is not part of POSIX; GNU uniq supports it).

sort file.txt | uniq -w 20

uniq -f <n> -s <m> -w <o> — Combine: skip n fields, then skip m chars, compare only o chars.

uniq -f 2 -w 15 data.txt
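
Field skipping only helps if sort groups lines by the same part that uniq compares. The following sketch shows both cases on a made-up log whose first field is a timestamp; the log lines and variable names are assumptions for illustration.

```shell
# Hypothetical log: field 1 is a timestamp, the rest is the message.
log=$(printf '%s\n' \
  '09:00:01 disk full' \
  '09:00:02 disk full' \
  '09:00:03 login ok' \
  '09:00:04 disk full')

# Misaligned: sort orders by the whole line (timestamp first), so the
# last "disk full" stays separated from the first two.
misaligned=$(printf '%s\n' "$log" | LC_ALL=C sort | uniq -f 1)

# Aligned: sort from field 2 onward, then let uniq ignore field 1.
aligned=$(printf '%s\n' "$log" | LC_ALL=C sort -k2 | uniq -f 1)

printf 'sort | uniq -f 1:\n%s\n\nsort -k2 | uniq -f 1:\n%s\n' "$misaligned" "$aligned"
```

The misaligned pipeline still prints three lines, with "disk full" appearing twice. The aligned one prints two lines, one per distinct message.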

Case Sensitivity

sort -f <file> | uniq -i — Case-insensitive deduplication (sort must also be case-insensitive).

sort -f names.txt | uniq -i

sort -f <file> | uniq -ic — Case-insensitive count of occurrences.

sort -f words.txt | uniq -ic | sort -rn

sort -f <file> | uniq -id — Show case-insensitive duplicates.

sort -f emails.txt | uniq -id
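
Why sort needs -f as well can be shown with a short sketch. The word list is made up, and LC_ALL=C is used so that the plain sort orders uppercase before lowercase, which makes the effect visible.

```shell
# Hypothetical word list with mixed capitalization.
words=$(printf '%s\n' Apple banana apple BANANA Cherry APPLE)

# Plain byte-order sort puts "apple" and "banana" after "Cherry",
# so uniq -i never sees them next to their uppercase variants.
naive=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -i)

# sort -f folds case while sorting, so all variants become adjacent.
folded=$(printf '%s\n' "$words" | LC_ALL=C sort -f | uniq -i)

printf 'sort | uniq -i:    %s lines\n' "$(printf '%s\n' "$naive" | wc -l | tr -d ' ')"
printf 'sort -f | uniq -i: %s lines\n' "$(printf '%s\n' "$folded" | wc -l | tr -d ' ')"
```

The first pipeline leaves five lines, the second three. Which spelling of each word survives depends on how sort orders lines that differ only in case, since uniq keeps the first line of each group.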

Common Patterns

awk '{print $1}' <log> | sort | uniq -c | sort -rn | head -10 — Top 10 most frequent IPs in an access log.

awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

cut -f <n> -d '<d>' <file> | sort | uniq -c | sort -rn — Frequency analysis of a specific column.

cut -f 3 -d ',' sales.csv | sort | uniq -c | sort -rn

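sort <file1> <file2> | uniq -d — Find lines that appear in both files (intersection). This assumes neither file contains duplicates of its own; otherwise a line repeated within one file is also reported.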

sort file1.txt file2.txt | uniq -d

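sort <file1> <file2> | uniq -u — Find lines that appear in only one of the two files (symmetric difference). This assumes neither file contains duplicates of its own; otherwise a line repeated within one file is dropped.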

sort old-list.txt new-list.txt | uniq -u

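sort <file1> <file1> <file2> | uniq -u — Find lines that are in file2 but not in file1. Listing file1 twice guarantees its lines are never unique; file2 must not contain duplicates of its own.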

sort old.txt old.txt new.txt | uniq -u
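
The three set patterns above count a repeat inside one file as if it were a match across files. One way to guard against that, sketched here under the assumption that deduplicating each input first is acceptable, is to run every file through sort -u beforehand. The file names and contents are made up, and mktemp -d is assumed to be available.

```shell
# Hypothetical lists; old.txt deliberately contains "alice" twice.
tmp=$(mktemp -d)
printf '%s\n' alice bob alice carol > "$tmp/old.txt"
printf '%s\n' bob carol dave        > "$tmp/new.txt"

# Deduplicate each file first so a repeat inside one file is not
# mistaken for a match across files.
sort -u "$tmp/old.txt" > "$tmp/old.u"
sort -u "$tmp/new.txt" > "$tmp/new.u"

both=$(sort "$tmp/old.u" "$tmp/new.u" | uniq -d)                # in both lists
either=$(sort "$tmp/old.u" "$tmp/new.u" | uniq -u)              # in exactly one list
added=$(sort "$tmp/old.u" "$tmp/old.u" "$tmp/new.u" | uniq -u)  # only in new.txt

# For comparison: the same intersection without the sort -u step.
raw=$(sort "$tmp/old.txt" "$tmp/new.txt" | uniq -d)

printf 'both:\n%s\n\neither:\n%s\n\nadded:\n%s\n\nwithout sort -u:\n%s\n' \
  "$both" "$either" "$added" "$raw"
rm -rf "$tmp"
```

With these files, both is bob and carol, either is alice and dave, and added is dave. Without the sort -u step, the intersection also lists alice, although alice never appears in new.txt.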

history | awk '{print $2}' | sort | uniq -c | sort -rn | head -10 — Show the 10 most frequently used shell commands.

history | awk '{print $2}' | sort | uniq -c | sort -rn | head -10

wc -l <file> && sort -u <file> | wc -l — Compare total lines vs unique lines to gauge duplication.

echo "Total: $(wc -l < data.txt)  Unique: $(sort -u data.txt | wc -l)"

Conclusion

Remember the golden rule: uniq without a preceding sort only removes immediately adjacent duplicates and misses anything further apart, so you almost always want sort | uniq or simply sort -u. For plain deduplication, sort -u is shorter; as soon as you need to count (-c), show only duplicates (-d) or show only unique lines (-u), uniq is the tool to reach for. When comparing with -f, -s or -w, sort on the same part of the line that uniq compares (for example sort -k2 together with uniq -f 1); otherwise lines that uniq would treat as equal do not end up adjacent, and the duplicates go undetected.

Further Reading

  • sort – sort lines (a prerequisite for uniq)
  • wc – count lines, words and bytes
  • cut – extract columns and fields from lines