uniq — Filter and Count Duplicate Lines
Practical guide to uniq: remove adjacent duplicate lines, count occurrences and find duplicates or unique entries — usually paired with sort.
uniq filters duplicate lines out of text – but only when they sit directly next to each other. This is the single most common pitfall: uniq compares adjacent lines only, so the input almost always has to pass through sort first. Once that is done, uniq counts occurrences (-c), shows only duplicates (-d) or only one-offs (-u), and can skip fields and characters before comparing. In pipelines with sort, uniq is the go-to tool for frequency analysis of logs and lists. This guide walks you through the most important options.
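A minimal sketch of the adjacency rule described above. The three-line input is made up for illustration and does not come from any real file.

```shell
# Hypothetical sample input: "apple" appears twice, but not on adjacent lines.
input='apple
banana
apple'

# uniq alone sees no adjacent repeats, so all three lines survive.
unsorted_result=$(printf '%s\n' "$input" | uniq)

# After sorting, the two "apple" lines are neighbours and collapse into one.
sorted_result=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq only:\n%s\n\nsort | uniq:\n%s\n' "$unsorted_result" "$sorted_result"
```

The first result still contains both "apple" lines; the second contains "apple" and "banana" once each.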
Basic Usage
sort <file> | uniq – Remove duplicate lines (input must be sorted first).
    sort names.txt | uniq
uniq <file> – Remove adjacent duplicate lines only (without sorting).
    uniq log.txt
sort <file> | uniq > <output> – Deduplicate and save to a new file.
    sort emails.txt | uniq > unique-emails.txt
sort -u <file> – Sort and deduplicate in one command (sort has a built-in unique flag).
    sort -u names.txt
Counting & Statistics
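As a small worked example for this section, the sketch below counts a made-up list of HTTP methods (hypothetical data). The padding that uniq -c places in front of the count differs between implementations, so awk is used here to normalise it; that step is an addition for readable output, not something uniq requires.

```shell
# Hypothetical sample data: one HTTP method per line.
methods='GET
POST
GET
GET
PUT
POST'

# Count each distinct line and list the most frequent first.
# awk re-prints "count value" with a single space between them.
counts=$(printf '%s\n' "$methods" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

printf '%s\n' "$counts"
```

With this input the result is "3 GET", "2 POST" and "1 PUT", one per line.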
sort <file> | uniq -c – Prefix each line with the number of occurrences.
    sort access.log | uniq -c
sort <file> | uniq -c | sort -rn – Count occurrences and sort by frequency (most common first).
    cut -f 1 -d ' ' access.log | sort | uniq -c | sort -rn | head -20
sort <file> | uniq -c | sort -n – Count occurrences and sort by frequency (least common first).
    sort errors.log | uniq -c | sort -n
Filtering Duplicates
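To make the difference between the two main filters in this section concrete, here is a short sketch on made-up data (the names are hypothetical): one name occurs once, one twice and one three times.

```shell
# Hypothetical sample data: alice once, bob twice, carol three times.
names='carol
alice
bob
carol
bob
carol'

sorted=$(printf '%s\n' "$names" | sort)

# -d prints one copy of every line that occurs more than once.
repeated=$(printf '%s\n' "$sorted" | uniq -d)

# -u prints only the lines that occur exactly once.
singles=$(printf '%s\n' "$sorted" | uniq -u)

printf 'repeated:\n%s\n\nsingles:\n%s\n' "$repeated" "$singles"
```

Here -d yields "bob" and "carol", while -u yields only "alice". Together the two outputs cover every distinct line exactly once.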
sort <file> | uniq -d – Show only lines that appear more than once, printing one copy per group.
    sort emails.txt | uniq -d
sort <file> | uniq -D – Show every copy of each duplicated line, not just one per group (-D is not part of POSIX; GNU uniq supports it).
    sort data.txt | uniq -D
sort <file> | uniq -u – Show only lines that appear exactly once (unique lines only).
    sort entries.txt | uniq -u
sort <file> | uniq -cd – Show duplicates with their count.
    sort urls.txt | uniq -cd | sort -rn
Field & Character Skipping
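Field skipping is easiest to see on timestamped lines. The sketch below uses made-up log lines (hypothetical data) where the first field is a timestamp and the rest is the message. It sorts on the message so that equal messages become adjacent, then tells uniq to ignore the timestamp.

```shell
# Hypothetical log lines: a timestamp field followed by a message.
log='09:07 disk full
09:01 disk full
09:05 login failed
09:02 disk full'

# Sort on everything from field 2 onwards so equal messages become adjacent,
# then let uniq ignore field 1 (the timestamp) when comparing.
# LC_ALL=C keeps the sort order independent of the local collation rules.
deduped=$(printf '%s\n' "$log" | LC_ALL=C sort -k2 | uniq -f 1)

printf '%s\n' "$deduped"
```

uniq keeps the first line of each group, so the output is "09:01 disk full" followed by "09:05 login failed".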
uniq -f <n> – Skip the first n fields when comparing lines (fields are whitespace-separated).
    sort -k2 data.txt | uniq -f 1
uniq -s <n> – Skip the first n characters when comparing lines.
    uniq -s 10 timestamped.log
uniq -w <n> – Compare only the first n characters of each line (-w is not part of POSIX; GNU uniq supports it).
    sort file.txt | uniq -w 20
uniq -f <n> -s <m> -w <o> – Combine: skip n fields, then skip m characters, then compare at most o characters.
    uniq -f 2 -w 15 data.txt
Case Sensitivity
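The sketch below illustrates why the sort in this section has to fold case as well. It uses made-up names (hypothetical data) and compares a plain byte-order sort with sort -f, counting how many lines survive uniq -i in each case.

```shell
# Hypothetical sample data with mixed capitalisation.
names='alice
BOB
Alice
Carol
bob'

# Plain byte-order sort puts all uppercase-initial lines first, so "Alice"
# and "alice" are not adjacent and uniq -i cannot merge them.
mismatched=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq -i | wc -l | tr -d ' ')

# sort -f folds case while ordering, which makes the variants neighbours.
matched=$(printf '%s\n' "$names" | LC_ALL=C sort -f | uniq -i | wc -l | tr -d ' ')

printf 'plain sort: %s lines, sort -f: %s lines\n' "$mismatched" "$matched"
```

With this input the plain sort leaves all 5 lines in place, while sort -f reduces them to 3.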
sort -f <file> | uniq -i – Case-insensitive deduplication (sort must also be case-insensitive).
    sort -f names.txt | uniq -i
sort -f <file> | uniq -ic – Case-insensitive count of occurrences.
    sort -f words.txt | uniq -ic | sort -rn
sort -f <file> | uniq -id – Show case-insensitive duplicates.
    sort -f emails.txt | uniq -id
Common Patterns
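Several patterns in this section treat two files as sets. The sketch below tries three of them on two made-up lists written to a temporary directory; the file names and contents are hypothetical. It assumes neither file contains internal duplicates, which these set-style patterns rely on.

```shell
# Hypothetical sample lists; neither contains internal duplicates.
workdir=$(mktemp -d)
printf '%s\n' alpha bravo charlie > "$workdir/old.txt"
printf '%s\n' bravo charlie delta > "$workdir/new.txt"

# Lines present in both files.
in_both=$(sort "$workdir/old.txt" "$workdir/new.txt" | uniq -d)

# Lines present in exactly one of the two files.
in_one=$(sort "$workdir/old.txt" "$workdir/new.txt" | uniq -u)

# Lines present only in new.txt: listing old.txt twice guarantees that
# every one of its lines occurs at least twice and is dropped by -u.
only_new=$(sort "$workdir/old.txt" "$workdir/old.txt" "$workdir/new.txt" | uniq -u)

printf 'in both:\n%s\n\nin one only:\n%s\n\nonly in new:\n%s\n' \
    "$in_both" "$in_one" "$only_new"

rm -r "$workdir"
```

If a file may contain repeated lines, running it through sort -u first restores the assumption.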
awk '{print $1}' <log> | sort | uniq -c | sort -rn | head -10 – Top 10 most frequent IPs in an access log.
    awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
cut -f <n> -d '<d>' <file> | sort | uniq -c | sort -rn – Frequency analysis of a specific column.
    cut -f 3 -d ',' sales.csv | sort | uniq -c | sort -rn
sort <file1> <file2> | uniq -d – Find lines that appear in both files (intersection). This assumes neither file contains internal duplicates.
    sort file1.txt file2.txt | uniq -d
sort <file1> <file2> | uniq -u – Find lines that appear in only one of the two files (symmetric difference). This also assumes neither file contains internal duplicates.
    sort old-list.txt new-list.txt | uniq -u
sort <file1> <file1> <file2> | uniq -u – Find lines that are in file2 but not in file1. This assumes file2 contains no internal duplicates.
    sort old.txt old.txt new.txt | uniq -u
history | awk '{print $2}' | sort | uniq -c | sort -rn | head -10 – Show the 10 most frequently used shell commands.
    history | awk '{print $2}' | sort | uniq -c | sort -rn | head -10
wc -l <file> && sort -u <file> | wc -l – Compare total lines vs unique lines to gauge duplication.
    echo "Total: $(wc -l < data.txt) Unique: $(sort -u data.txt | wc -l)"
Conclusion
Remember the golden rule: uniq without a preceding sort only removes immediately adjacent duplicates and misses anything further apart – so you almost always want sort | uniq or simply sort -u. For plain deduplication sort -u is shorter; but as soon as you need to count (-c), show only duplicates (-d) or only unique lines (-u), there is no way around uniq. When comparing with -f, -s or -w, the sort key (sort) and the comparison key (uniq) must line up, otherwise supposed duplicates go undetected.
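The last point, that the sort key and the comparison key must line up, can be sketched as follows. The records are made up (a hypothetical id in field 1 and a name in field 2), and the example counts how many lines survive uniq -f 1 under two different sorts.

```shell
# Hypothetical records: a unique id in field 1, a name in field 2.
records='1 alice
2 bob
3 alice'

# Mismatch: sorted by the whole line (so effectively by id), compared by name.
# The two "alice" records stay apart and uniq -f 1 keeps all of them.
misaligned=$(printf '%s\n' "$records" | LC_ALL=C sort | uniq -f 1 | wc -l | tr -d ' ')

# Aligned: sorted from field 2 onwards, compared from field 2 onwards.
aligned=$(printf '%s\n' "$records" | LC_ALL=C sort -k2 | uniq -f 1 | wc -l | tr -d ' ')

printf 'whole-line sort: %s lines, sort -k2: %s lines\n' "$misaligned" "$aligned"
```

The whole-line sort leaves 3 lines because the duplicate name is never adjacent; sorting with -k2 brings the two "alice" records together and leaves 2.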
Further Reading
- GNU Coreutils manual: uniq – complete reference for every option
- man7.org: uniq(1) – the Linux manual page