sort — Sort Lines of Text Files

Practical guide to sort: order lines alphabetically, numerically or by version — with keys, fields, locale control and tuning for large files.

sort orders the lines of a file or a data stream – by default according to the current locale's collation rules (roughly alphabetical), but on demand also numerically, by version, by month name or by human-readable sizes such as 2K or 1G. With keys (-k) and a freely chosen field separator (-t) you sort by specific columns, for example in CSV files or logs. In pipelines, sort teamed up with uniq is the classic duo for frequency analysis. For huge files you can control the memory buffer and the temp directory, and GNU sort can even sort in parallel. This guide walks you through the options you actually reach for day to day.

Basic Usage

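sort <file> — Sort lines in ascending order. The exact ordering, including how upper and lower case interleave, depends on the locale; with LC_ALL=C, uppercase letters sort before lowercase.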

sort names.txt

sort -r <file> — Sort in reverse (descending) order.

sort -r names.txt

sort -o <output> <file> — Write the result to a file. It is safe to use the same file as input and output.

sort -o sorted.txt data.txt
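A quick way to see why -o matters is to try both variants on a scratch file. This is a sketch that assumes a POSIX shell with mktemp; the temp directory and file contents are made up for illustration.

```shell
# Work in a throwaway directory so no real file is touched.
tmp=$(mktemp -d)
printf 'banana\napple\ncherry\n' > "$tmp/data.txt"

# WRONG: the shell truncates data.txt before sort opens it,
# so sort reads an empty file and the data is lost.
sort "$tmp/data.txt" > "$tmp/data.txt"
if [ -s "$tmp/data.txt" ]; then lost=no; else lost=yes; fi

# RIGHT: with -o, sort reads all input first and only then writes the output file.
printf 'banana\napple\ncherry\n' > "$tmp/data.txt"
sort -o "$tmp/data.txt" "$tmp/data.txt"
kept=$(cat "$tmp/data.txt")

echo "data lost with >: $lost"
echo "$kept"
rm -r "$tmp"
```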

sort -u <file> — Sort and remove duplicate lines (unique).

sort -u emails.txt

sort -c <file> — Check whether a file is already sorted. Reports the first out-of-order line on stderr and exits with status 1.

sort -c data.txt
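In scripts the exit status is usually all you need. The sketch below uses -C, the quiet variant of -c (POSIX; GNU also spells it --check=quiet), which prints nothing and only sets the status. The sample file is hypothetical.

```shell
tmp=$(mktemp -d)
printf 'a\nc\nb\n' > "$tmp/data.txt"   # hypothetical, deliberately unsorted

# -C checks silently; the exit status alone reports the result (0 = sorted, 1 = not sorted).
if sort -C "$tmp/data.txt"; then
  status=sorted
else
  status=unsorted
fi
echo "$status"
rm -r "$tmp"
```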

Sorting Modes

sort -n <file> — Sort numerically instead of alphabetically.

sort -n scores.txt
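The difference between text and numeric comparison shows up as soon as numbers have different lengths. In this sketch LC_ALL=C is set only to make the text ordering predictable; the input is made up.

```shell
# Text comparison goes character by character: "100" < "9" because "1" < "9".
lexical=$(printf '10\n9\n100\n' | LC_ALL=C sort)

# -n compares the leading number of each line by value.
numeric=$(printf '10\n9\n100\n' | LC_ALL=C sort -n)

echo "$lexical"
echo "$numeric"
```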

sort -h <file> — Sort by human-readable numbers (e.g. 2K, 1G, 3M).

du -sh * | sort -h
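A minimal illustration with made-up sizes, assuming GNU sort: -h ranks by unit suffix first (none, then K, M, G, ...) and only then by the number, so it expects consistently scaled values such as those du -h prints.

```shell
# No suffix sorts below K, K below M, M below G.
by_size=$(printf '1G\n2K\n3M\n500\n' | LC_ALL=C sort -h)
echo "$by_size"
```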

sort -V <file> — Sort version numbers naturally (e.g. 1.2 < 1.10).

sort -V versions.txt
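Comparing plain text ordering with -V on a few made-up version strings shows what "naturally" means here (GNU sort assumed):

```shell
# Text order: "1.10" < "1.2" because the character "1" < "2".
as_text=$(printf '1.10\n1.2\n1.9\n' | LC_ALL=C sort)

# Version order: the numeric parts are compared as numbers, so 9 < 10.
as_version=$(printf '1.10\n1.2\n1.9\n' | LC_ALL=C sort -V)

echo "$as_text"
echo "$as_version"
```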

sort -M <file> — Sort by month name (Jan < Feb < Mar ...).

sort -M months.txt

sort -R <file> — Sort in random order by hashing each key (GNU sort). Identical lines still end up next to each other; use shuf for a true shuffle.

sort -R playlist.txt
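The hash-based behaviour of -R can be checked directly: identical lines receive the same hash, so they always land next to each other. The sketch assumes GNU coreutils (shuf is part of it); the sample lines are made up.

```shell
# However the order comes out, the two "a" lines and the two "b" lines stay adjacent,
# so uniq collapses the result to exactly two runs.
shuffled=$(printf 'a\nb\na\nb\n' | sort -R)
runs=$(printf '%s\n' "$shuffled" | uniq | wc -l | tr -d ' ')
echo "runs of identical lines: $runs"

# shuf produces a true random permutation, so duplicates may be split apart.
printf 'a\nb\na\nb\n' | shuf
```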

sort -g <file> — Sort by general numeric value. Supports scientific notation (e.g. 1.5e3).

sort -g measurements.txt
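The two numeric modes differ as soon as exponents appear. With -n the number ends at the "e", so 1.5e3 is read as 1.5; -g parses the full floating-point value. GNU's manual describes -g as much slower than -n, so it is worth reserving for inputs that need it. The sample values are made up; LC_ALL=C pins the decimal point to ".".

```shell
# -n: "1.5e3" is compared as 1.5, so it sorts before 200.
by_n=$(printf '200\n1.5e3\n' | LC_ALL=C sort -n)

# -g: "1.5e3" is compared as 1500, so it sorts after 200.
by_g=$(printf '200\n1.5e3\n' | LC_ALL=C sort -g)

echo "$by_n"
echo "$by_g"
```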

Key & Field Selection

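sort -k <field> <file> — Sort by a key that starts at the given field (1-based) and runs to the end of the line. By default fields are separated by runs of blanks, and each field includes its leading blanks.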

sort -k 2 data.txt

sort -k <start>,<end> <file> — Sort by field range from start to end (inclusive).

sort -k 2,2 data.txt
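The difference between -k 2 and -k 2,2 is easy to miss. In the two made-up lines below the second field is identical, so the result depends on whether the key stops there. When all keys tie, GNU sort falls back to comparing the whole lines.

```shell
input=$(printf 'x b 2\ny b 1')

# -k 2: the key runs to the end of the line (" b 2" vs " b 1"), so "y b 1" comes first.
open_key=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)

# -k 2,2: the key is field 2 only (" b" vs " b"); it ties, so the whole lines decide
# and "x b 2" comes first.
closed_key=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)

echo "$open_key"
echo "$closed_key"
```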

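sort -t '<sep>' -k <field>,<field> <file> — Set the field separator and sort by a single field. sort has no notion of CSV quoting, so a quoted value containing the separator is split.

sort -t ',' -k 3,3 data.csv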

sort -t ':' -k 3 -n <file> — Sort by numeric value of a specific field with custom delimiter.

sort -t ':' -k 3 -n /etc/passwd

sort -k <field>n <file> — Sort a specific field numerically. The modifier is appended to the key; a key that carries its own modifiers does not inherit global options such as -r.

sort -k 2n scores.txt

sort -k <f1>,<f1> -k <f2>,<f2>n <file> — Sort by multiple keys. First key is primary, second is secondary.

sort -k 1,1 -k 2,2n students.txt
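Closing each key with ",N" matters when combining keys. In the sketch below (made-up data), -k 1 without an end makes the first key span the entire line, so the numeric second key is never consulted and "bob 10" sorts before "bob 9" as text.

```shell
students=$(printf 'bob 9\nalice 85\nbob 10')

# Primary key: field 1 only; secondary key: field 2 as a number.
closed=$(printf '%s\n' "$students" | LC_ALL=C sort -k 1,1 -k 2,2n)

# Primary key runs to the end of the line, so it already decides every comparison.
open=$(printf '%s\n' "$students" | LC_ALL=C sort -k 1 -k 2,2n)

echo "$closed"
echo "$open"
```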

Case & Locale

sort -f <file> — Fold lower case to upper case (case-insensitive sorting).

sort -f mixed-case.txt
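With -f, keys that differ only in case compare equal; GNU sort then breaks the tie by comparing the whole lines without -f, so in the C locale the uppercase variant comes first. A sketch with made-up input:

```shell
# Keys fold to B, A, A, B; ties are broken by raw byte order ("A" < "a").
folded=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort -f)
echo "$folded"
```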

sort -d <file> — Dictionary order. Consider only blanks and alphanumeric characters.

sort -d words.txt

sort -i <file> — Ignore non-printable characters when sorting.

sort -i data.txt

LC_ALL=C sort <file> — Sort using byte values (C locale). Faster and consistent across systems.

LC_ALL=C sort large-file.txt
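In the C locale, lines are compared byte by byte, which puts all uppercase ASCII letters before all lowercase ones. The sketch pins that result; what the plain sort call prints depends on the locales installed on your system (en_US.UTF-8 typically interleaves the cases), so no output is assumed for it.

```shell
# Byte order: uppercase ASCII (0x41-0x5A) precedes lowercase (0x61-0x7A).
c_order=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
echo "$c_order"

# Same input under the current locale; the result is system-dependent.
printf 'b\nA\na\nB\n' | sort
```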

Whitespace & Stability

sort -b <file> — Ignore leading blanks when determining sort keys.

sort -b indented.txt

sort -s <file> — Stable sort. Lines with equal keys keep their original order instead of being compared as whole lines as a last resort.

sort -s -k 1,1 data.txt
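The effect of -s shows up when the key covers only part of the line. In this sketch (made-up input, GNU sort), both "b" lines tie on field 1: with -s they stay in input order, without it the last-resort whole-line comparison reorders them.

```shell
input=$(printf 'b 2\na 1\nb 1')

# Stable: "b 2" stays ahead of "b 1" because that is the input order.
stable=$(printf '%s\n' "$input" | LC_ALL=C sort -s -k 1,1)

# Default: the tie on field 1 is broken by comparing whole lines.
default=$(printf '%s\n' "$input" | LC_ALL=C sort -k 1,1)

echo "$stable"
echo "$default"
```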

sort -z <file> — Use NUL as line delimiter instead of newline. Useful with find -print0.

find . -print0 | sort -z
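In practice the NUL-separated list is handed to a command that also expects NULs, such as xargs -0. The sketch creates two hypothetical files with spaces in their names in a temp directory to show that the names survive the pipeline intact.

```shell
tmp=$(mktemp -d)
touch "$tmp/b file.txt" "$tmp/a file.txt"

# Every stage speaks NUL-terminated records, so spaces (or even newlines) in names are harmless.
names=$(find "$tmp" -type f -print0 | LC_ALL=C sort -z | xargs -0 -n 1 basename)
echo "$names"
rm -r "$tmp"
```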

Merging & Large Files

sort -m <file1> <file2> — Merge already sorted files without re-sorting.

sort -m sorted1.txt sorted2.txt
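A sketch of merging two pre-sorted files (hypothetical contents in a temp directory). Both inputs should already be sorted with the same options and locale as the merge; sort -m only merges and does not re-sort, so unsorted input yields unsorted output.

```shell
tmp=$(mktemp -d)
printf 'apple\ncherry\n' > "$tmp/sorted1.txt"
printf 'banana\ndate\n' > "$tmp/sorted2.txt"

# Interleave the two sorted streams into one sorted stream.
merged=$(LC_ALL=C sort -m "$tmp/sorted1.txt" "$tmp/sorted2.txt")
echo "$merged"
rm -r "$tmp"
```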

sort -S <size> <file> — Set the size of the main memory buffer (e.g. 2G, or a percentage of physical memory such as 50%).

sort -S 2G huge-file.txt

sort -T <dir> <file> — Use the given directory for temporary files instead of $TMPDIR (or /tmp if it is unset).

sort -T /data/tmp huge-file.txt

sort --parallel=<n> <file> — Run up to N sorts concurrently (GNU sort).

sort --parallel=4 huge-file.txt
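The tuning options can be combined in a single call. The sketch below assumes GNU sort and scales everything down (a 1 MiB buffer, two threads, a generated 1000-line file in a temp directory) so it runs anywhere; for a real job the same call would look like LC_ALL=C sort -S 2G -T /data/tmp --parallel=4 -o sorted.txt huge-file.txt.

```shell
tmp=$(mktemp -d)
# Generate a small shuffled sample; in practice this is the large input file.
seq 1 1000 | sort -R > "$tmp/huge-file.txt"

# Byte-order locale, 1 MiB buffer, temp files in $tmp, two threads, result via -o.
LC_ALL=C sort -n -S 1M -T "$tmp" --parallel=2 -o "$tmp/sorted.txt" "$tmp/huge-file.txt"

first=$(head -n 1 "$tmp/sorted.txt")
last=$(tail -n 1 "$tmp/sorted.txt")
echo "$first ... $last"
rm -r "$tmp"
```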

Pipelines

<command> | sort — Sort the output of any command.

ls -1 | sort

<command> | sort -n — Sort command output numerically.

wc -l *.txt | sort -n

<command> | sort | uniq -c | sort -rn — Count occurrences and show most frequent first.

awk '{print $1}' access.log | sort | uniq -c | sort -rn
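Spelled out on a tiny made-up log, the pipeline works in two passes: the first sort groups identical IPs so uniq -c can count them, and the second sort -rn ranks the counts.

```shell
log=$(printf '10.0.0.1 GET /\n10.0.0.2 GET /\n10.0.0.1 GET /a')

# Extract IPs, group, count, rank by count, keep the top entry,
# and print it as "<ip> <count>".
top=$(printf '%s\n' "$log" | awk '{print $1}' | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2, $1}')
echo "$top"
```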

<command> | sort -u — Sort and deduplicate output in one step.

cat file1.txt file2.txt | sort -u

<command> | sort -rn | head -n <count> — Get the top N entries by their leading numeric value (here: the ten largest directories).

du -s */ | sort -rn | head -n 10

Common Patterns

sort -t ',' -k 2,2 -k 3,3n <file> — Sort CSV by column 2 (alphabetic), then column 3 (numeric).

sort -t ',' -k 2,2 -k 3,3n employees.csv

sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n <file> — Sort IP addresses numerically by each octet.

sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n ips.txt
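A quick check with made-up addresses shows why per-octet numeric keys are needed: plain text sorting puts 10.0.0.10 before 10.0.0.2 and 9.1.1.1 last.

```shell
ips=$(printf '10.0.0.10\n9.1.1.1\n10.0.0.2')

# Text order, for comparison.
as_text=$(printf '%s\n' "$ips" | LC_ALL=C sort)

# Each octet is its own numeric key.
by_octet=$(printf '%s\n' "$ips" | LC_ALL=C sort -t '.' -k 1,1n -k 2,2n -k 3,3n -k 4,4n)

echo "$as_text"
echo "$by_octet"
```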

sort -rn <file> | head -n 1 — Find the largest numeric value in a file.

sort -rn numbers.txt | head -n 1

sort <file1> <file2> | uniq -d — Find lines common to both files. Assumes neither file contains duplicates of its own; otherwise run each file through sort -u first.

sort users1.txt users2.txt | uniq -d

sort <file1> <file2> | uniq -u — Find lines that appear in only one of the two files. Assumes neither file contains duplicates of its own.

sort old-list.txt new-list.txt | uniq -u
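The uniq -d / uniq -u trick breaks when a file repeats a line internally. The sketch (hypothetical files in a temp directory) shows the false positive and the fix of deduplicating each file with sort -u first.

```shell
tmp=$(mktemp -d)
printf 'x\nx\ny\n' > "$tmp/users1.txt"   # "x" appears twice in the same file
printf 'y\nz\n' > "$tmp/users2.txt"

# Naive: "x" is reported as common only because users1.txt repeats it.
naive=$(sort "$tmp/users1.txt" "$tmp/users2.txt" | uniq -d)

# Deduplicate each file first, then compare.
common=$( { sort -u "$tmp/users1.txt"; sort -u "$tmp/users2.txt"; } | sort | uniq -d)
only_one=$( { sort -u "$tmp/users1.txt"; sort -u "$tmp/users2.txt"; } | sort | uniq -u)

echo "$naive"
echo "$common"
echo "$only_one"
rm -r "$tmp"
```

For files that are already sorted and deduplicated, comm -12 file1 file2 prints the lines common to both as well.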

tail -n +2 <file> | sort -t ',' -k <field>n — Sort a CSV file by a numeric column, skipping the header row.

tail -n +2 sales.csv | sort -t ',' -k 3n
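If the header should stay on top rather than be dropped, print it separately and sort only the remaining rows. The CSV below is made up; the sketch assumes no quoted commas in the data.

```shell
tmp=$(mktemp -d)
printf 'region,rep,amount\nnorth,ann,300\nsouth,bo,25\neast,cy,1200\n' > "$tmp/sales.csv"

# Header first, untouched; then the data rows sorted by column 3 as numbers.
sorted=$( { head -n 1 "$tmp/sales.csv"; tail -n +2 "$tmp/sales.csv" | sort -t ',' -k 3,3n; } )
echo "$sorted"
rm -r "$tmp"
```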

Conclusion

sort is one of the most versatile text tools in the shell and combines seamlessly with uniq, head and cut. Mind the locale: in a UTF-8 environment sort follows the locale's collation rules, while LC_ALL=C compares raw bytes. The two can order the same lines differently and differ noticeably in speed, so for reproducible, fast results prefix the command with LC_ALL=C. With -o you can safely write back to the same file; a plain > would instead truncate the input before sort reads it. And remember: a downstream uniq only removes immediately adjacent duplicates, which is why a sort almost always belongs in front of it.

Further Reading

  • uniq – filter and count adjacent duplicate lines
  • wc – count lines, words and bytes
  • cut – extract columns and fields from lines