split — Break Files into Smaller Pieces

Practical guide to split — break large files into pieces by line count, byte size or number of chunks. Useful for processing, transfer and parallel jobs.

split breaks a file into several smaller pieces: by line count (-l), byte size (-b) or a fixed number of chunks (-n). That is handy for making huge log files manageable, chopping large archives for transfer, or splitting data for parallel processing. By default the pieces are named xaa, xab, … (the prefix x followed by a two-letter alphabetic suffix); you can set your own prefix, switch to numeric suffixes and add a file extension. You reassemble the pieces with a simple cat, as long as you preserve their order.

Split by Lines

split -l <n> <file> — Split file into pieces of n lines each.

split -l 1000 largefile.txt

split -l <n> <file> <prefix> — Split with a custom output prefix.

split -l 500 data.csv part_

split -l 1 <file> — Split each line into its own file.

split -l 1 urls.txt url_
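To see how the line-based mode names and fills its pieces, here is a minimal sketch. It assumes GNU coreutils split; the input file is a hypothetical stand-in generated with seq, and everything happens in a throwaway directory.

```shell
# Work in a temporary directory so the pieces do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 2500 numbered lines.
seq 1 2500 > largefile.txt

# 1000 lines per piece: xaa and xab get 1000 lines each, xac gets the remaining 500.
split -l 1000 largefile.txt

pieces=$(ls x?? | wc -l | tr -d ' ')
last_lines=$(wc -l < xac | tr -d ' ')
echo "pieces: $pieces, lines in last piece: $last_lines"
```

The last piece simply holds whatever is left over, so it is usually shorter than the others.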

Split by Size

split -b <size> <file> — Split into pieces of specified byte size (K, M, G suffixes).

split -b 10M largefile.tar.gz chunk_

split -C <size> <file> — Split at line boundaries, packing as many complete lines as fit into each piece of at most <size> bytes.

split -C 1M logfile.txt log_

split -b 100K <file> <prefix> — Split into 100 KiB (102,400-byte) chunks with a custom prefix; write 100KB instead for 100,000-byte chunks.

split -b 100K backup.sql sql_
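The difference between -b and -C is easiest to see side by side. The following sketch assumes GNU coreutils split and uses a hypothetical log file of 1000 lines with 19 bytes each (generated with awk); the file names and the 4000-byte limit are chosen only for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical log: 1000 lines, each 18 characters plus a newline (19 bytes).
awk 'BEGIN { for (i = 1; i <= 1000; i++) printf "entry %04d padding\n", i }' > logfile.txt

# -b cuts after exactly 4000 bytes, even in the middle of a line.
split -b 4000 logfile.txt byte_

# -C packs only complete lines, so each piece stays at or below 4000 bytes.
split -C 4000 logfile.txt line_

byte_size=$(wc -c < byte_aa | tr -d ' ')
line_size=$(wc -c < line_aa | tr -d ' ')

# Command substitution strips a trailing newline, so an empty result
# means the piece ends with a newline (a complete line).
byte_tail=$(tail -c 1 byte_aa)
line_tail=$(tail -c 1 line_aa)

echo "-b first piece: $byte_size bytes, ends mid-line: ${byte_tail:+yes}"
echo "-C first piece: $line_size bytes, ends mid-line: ${line_tail:+yes}"
```

Here 210 complete lines (3990 bytes) fit under the limit, so the -C pieces are slightly smaller than the -b pieces but never contain a partial line.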

Split by Count

split -n <n> <file> — Split into exactly n files of roughly equal byte size; lines may be cut at the chunk boundaries.

split -n 5 largefile.txt part_

split -n l/<n> <file> — Split into n files without breaking lines.

split -n l/4 data.csv quarter_

split -n r/<n> <file> — Distribute lines round-robin across n files.

split -n r/3 tasks.txt worker_
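The chunk modes differ in how they hand out lines. This sketch, assuming GNU coreutils split and a hypothetical nine-line tasks.txt, shows the round-robin distribution and checks that the l/ mode still reassembles to the original.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical task list: the numbers 1 to 9, one per line.
seq 1 9 > tasks.txt

# Round-robin: line 1 goes to the first file, line 2 to the second, and so on.
split -n r/3 tasks.txt worker_
first_worker=$(paste -s -d ' ' worker_aa)
second_worker=$(paste -s -d ' ' worker_ab)
echo "worker_aa: $first_worker"
echo "worker_ab: $second_worker"

# l/3: three contiguous chunks made of whole lines only.
split -n l/3 tasks.txt part_
if cat part_?? | cmp -s - tasks.txt; then intact=yes; else intact=no; fi
echo "l/3 pieces reassemble to the original: $intact"
```

Note that round-robin pieces interleave the input, so a plain cat of the worker_ files does not restore the original line order; only the contiguous modes do.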

Output Options

split -d <file> — Use numeric suffixes (00, 01, 02...) instead of alphabetic (aa, ab, ac...).

split -d -l 1000 data.csv part_

split -a <n> <file> — Set the suffix length (default is 2).

split -a 4 -l 100 huge.txt piece_

split --additional-suffix='.txt' <file> — Add a file extension to output files.

split -l 500 --additional-suffix='.csv' data.csv part_

split --verbose <file> — Print a message for each output file created.

split --verbose -l 1000 data.txt chunk_

split --filter='<cmd>' <file> — Pipe each piece through a command instead of writing to files.

split -l 1000 --filter='gzip > $FILE.gz' data.txt part_
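The output options combine freely. As a sketch, assuming GNU coreutils split and gzip are available, the following uses a hypothetical 250-line data.csv to show the resulting file names and a compressing filter.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 250 lines.
seq 1 250 > data.csv

# Numeric suffixes, three characters wide, with a .csv extension.
split -d -a 3 -l 100 --additional-suffix=.csv data.csv part_
names=$(ls part_* | paste -s -d ' ' -)
echo "$names"

# --filter sets $FILE to the name split would have written; the single
# quotes keep the calling shell from expanding it too early.
split -l 100 --filter='gzip > "$FILE.gz"' data.csv zipped_
restored_lines=$(gzip -dc zipped_*.gz | wc -l | tr -d ' ')
echo "lines recovered from compressed pieces: $restored_lines"
```

With the filter in place, no uncompressed zipped_aa, zipped_ab, … files are written; only the .gz files produced by the command exist.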

Common Patterns

split -b 25M file.tar.gz part_ && cat part_* > file_restored.tar.gz — Split a large file for transfer, then reassemble it under a new name so the original is not overwritten.

split -b 25M backup.tar.gz upload_ && cat upload_* > backup_restored.tar.gz
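A round trip with checksum verification can be sketched as follows. It assumes GNU coreutils (split, sha256sum); backup.bin is a hypothetical stand-in for a real archive, filled with 1 MiB of random bytes.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for a large archive: 1 MiB of random data.
head -c 1048576 /dev/urandom > backup.bin

# Cut into 300 KiB pieces, then reassemble in glob (sorted) order.
split -b 300K backup.bin upload_
cat upload_* > backup_restored.bin

# Reading from stdin keeps the file name out of the checksum line,
# so the two results can be compared as plain strings.
orig_sum=$(sha256sum < backup.bin)
new_sum=$(sha256sum < backup_restored.bin)

if [ "$orig_sum" = "$new_sum" ]; then
    echo "checksums match"
else
    echo "checksum mismatch" >&2
fi
```

In a real transfer the checksum of the original would be computed on the sending side and compared on the receiving side after cat.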

split -l 1000 data.csv batch_ && for f in batch_*; do process "$f"; done — Split data and process each batch.

split -l 1000 users.csv batch_ && for f in batch_*; do ./import.sh "$f"; done

wc -l <file> && split -n l/<n> <file> — Check line count, then split evenly for parallel processing.

wc -l data.csv && split -n l/$(nproc) data.csv worker_
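The parallel pattern can be sketched with plain background jobs. This assumes GNU coreutils split; the per-piece job (counting lines) is a hypothetical placeholder for real work, and a fixed count of 4 pieces is used instead of $(nproc) to keep the result predictable.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 10000 lines.
seq 1 10000 > data.csv

# Four pieces made of whole lines only.
split -n l/4 data.csv worker_

# Start one background job per piece; each writes its result next to the piece.
for f in worker_??; do
    wc -l < "$f" > "$f.count" &
done
wait    # block until every background job has finished

# Combine the partial results.
total=$(cat worker_??.count | awk '{ s += $1 } END { print s }')
echo "total lines processed: $total"
```

Because l/ never cuts a line, the per-piece counts add up to exactly the line count of the input.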

Conclusion

split is the go-to when a file is too big for a tool, an upload or a chunk of memory. Remember the difference between the modes: -b and plain -n cut strictly by bytes (and may break mid-line), -C and -n l/… respect line boundaries, and -n r/… distributes lines round-robin. Reassembly comes down to order alone: cat prefix_* works because the shell expands the glob in sorted order and suffixes of equal length sort in the order split created them, so choose a suffix length (-a) large enough for the number of pieces you expect. After reassembling large binary files, it is wise to verify the checksum.

Further Reading

  • cut – extract fields, characters or byte ranges from lines
  • head – show the first lines or bytes of a file
  • tail – show the last lines of a file or follow it live