awk — Field-Based Text Processing and Reporting

Process columnar text with awk — extract fields, filter by pattern, aggregate with arrays and format output. A compact language for data and logs.

awk is a small but complete programming language built for column-based text. It reads input line by line, splits each line into fields ($1, $2, … $NF) and runs your program against every record – making it the natural choice for log files, CSV/TSV data and tabular command output. Where grep finds lines and sed rewrites them, awk lets you select by field, compute sums and averages, build frequency tables with associative arrays and print neatly formatted reports. This guide covers the essentials: fields and separators, pattern matching, conditions, BEGIN/END blocks, arithmetic, string functions and the recipes you reach for again and again.

Basic Usage

awk '{print}' <file> — Print every line of the file (like cat).

awk '{print}' data.txt

awk '{print $0}' <file> — Print each entire line. $0 represents the whole line.

awk '{print $0}' data.txt

awk '{print $1}' <file> — Print the first field (column) of each line. Fields are split by whitespace.

awk '{print $1}' access.log

awk '{print $1, $3}' <file> — Print specific fields. The comma inserts the output field separator (OFS), a single space by default.

awk '{print $1, $3}' data.txt

awk '{print $NF}' <file> — Print the last field of each line. NF is the number of fields.

awk '{print $NF}' access.log

awk '{print $(NF-1)}' <file> — Print the second-to-last field of each line.

awk '{print $(NF-1)}' data.txt

awk 'NR==<n>' <file> — Print only a specific line number. NR is the current record (line) number.

awk 'NR==5' data.txt

awk 'NR>=<n> && NR<=<m>' <file> — Print a range of lines.

awk 'NR>=10 && NR<=20' data.txt

Field Separator

awk -F'<sep>' '{print $1}' <file> — Set a custom input field separator.

awk -F',' '{print $1}' data.csv

awk -F':' '{print $1, $3}' <file> — Use colon as field separator (useful for /etc/passwd).

awk -F':' '{print $1, $3}' /etc/passwd

awk -F'\t' '{print $2}' <file> — Use tab as field separator for TSV files.

awk -F'\t' '{print $2}' data.tsv

awk -F'[,;:]' '{print $1}' <file> — Use a regex as field separator (match any of the characters).

awk -F'[,;:]' '{print $1}' mixed.txt

awk 'BEGIN{FS=","; OFS="\t"} {print $1, $2}' <file> — Set input field separator (FS) and output field separator (OFS).

awk 'BEGIN{FS=","; OFS="\t"} {print $1, $2}' data.csv

awk -v OFS=',' '{$1=$1; print}' <file> — Change the output separator. The $1=$1 trick forces awk to rebuild the line.

awk -v OFS=',' '{$1=$1; print}' whitespace.txt
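Why the $1=$1 assignment matters can be seen by running the same program with and without it. The following is a minimal sketch on made-up input; the variable names (input, unchanged, rebuilt) are illustrative only.

```shell
# Made-up sample input with irregular whitespace.
input='alice   30    london
bob 25  paris'

# No field is modified, so print emits $0 as read and OFS is never applied.
unchanged=$(printf '%s\n' "$input" | awk -v OFS=',' '{print}')

# Assigning to any field (even to itself) makes awk rebuild $0 with OFS.
rebuilt=$(printf '%s\n' "$input" | awk -v OFS=',' '{$1=$1; print}')

printf '%s\n' "$unchanged" "$rebuilt"
```

The first two output lines keep their original spacing; the last two are comma-joined.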

Pattern Matching

awk '/<pattern>/' <file> — Print lines matching a regex pattern (like grep).

awk '/ERROR/' log.txt

awk '!/<pattern>/' <file> — Print lines NOT matching a pattern (like grep -v).

awk '!/^#/' config.ini

awk '$<n> ~ /<pattern>/' <file> — Match a pattern against a specific field.

awk '$3 ~ /error/' log.txt

awk '$<n> !~ /<pattern>/' <file> — Print lines where a specific field does NOT match a pattern.

awk '$1 !~ /^192\.168/' access.log

awk '/<start>/,/<end>/' <file> — Print lines between two patterns (inclusive range).

awk '/BEGIN/,/END/' config.txt

awk '/<pattern>/ {print $<n>}' <file> — Print a specific field only from lines matching a pattern.

awk '/GET/ {print $7}' access.log
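The range pattern prints the marker lines themselves. When only the lines between the markers are wanted, a flag variable is a common alternative. A minimal sketch on made-up input, assuming the markers START and STOP each sit on their own line:

```shell
# Made-up sample input; START and STOP are hypothetical marker lines.
log='header
START
line one
line two
STOP
footer'

# Range pattern: includes the START and STOP lines.
inclusive=$(printf '%s\n' "$log" | awk '/^START$/,/^STOP$/')

# Flag variant: set the flag after START, clear it at STOP, print while set.
inner=$(printf '%s\n' "$log" | awk '/^START$/ {f=1; next} /^STOP$/ {f=0} f')

printf '%s\n' "$inclusive" "---" "$inner"
```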

Conditions & Comparisons

awk '$<n> == "<value>"' <file> — Print lines where a field equals a specific string.

awk '$3 == "ERROR"' log.txt

awk '$<n> != "<value>"' <file> — Print lines where a field does not equal a value.

awk '$1 != "localhost"' hosts.txt

awk '$<n> > <value>' <file> — Print lines where a numeric field exceeds a threshold.

awk '$5 > 1000' data.txt

awk '$<n> > <value> && $<m> < <value>' <file> — Combine multiple conditions with && (AND).

awk '$3 > 100 && $4 < 500' sales.txt

awk '$<n> > <value> || $<m> == "<value>"' <file> — Combine conditions with || (OR).

awk '$5 > 1000 || $3 == "CRITICAL"' log.txt

awk 'NF > 0' <file> — Print only non-empty lines (lines with at least one field).

awk 'NF > 0' messy.txt

awk 'NF == <n>' <file> — Print lines with exactly N fields.

awk 'NF == 4' data.txt

awk 'length > <n>' <file> — Print lines longer than N characters.

awk 'length > 80' source.py

BEGIN & END Blocks

awk 'BEGIN {print "Header"} {print} END {print "Footer"}' <file> — Execute code before processing (BEGIN) and after all lines (END).

awk 'BEGIN {print "Name\tScore"} {print $1"\t"$2} END {print "---"}' results.txt

awk 'BEGIN {<init>} {<body>} END {<summary>}' <file> — Classic awk structure: initialize, process each line, then summarize.

awk 'BEGIN {sum=0} {sum+=$1} END {print "Total:", sum}' numbers.txt

awk 'END {print NR}' <file> — Print the total number of lines in a file.

awk 'END {print NR}' data.txt

awk 'END {print NR, "lines,", NF, "fields in last line"}' <file> — Print summary statistics after processing all lines.

awk 'END {print NR, "lines,", NF, "fields in last line"}' access.log

Arithmetic & Aggregation

awk '{sum += $<n>} END {print sum}' <file> — Sum all values in a specific column.

awk '{sum += $3} END {print sum}' sales.txt

awk '{sum += $<n>} END {print sum/NR}' <file> — Calculate the average of a column.

awk '{sum += $2} END {print "Average:", sum/NR}' scores.txt
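Dividing by NR counts blank lines as records and fails with a division by zero on empty input. A hedged variant, shown on made-up scores, counts only records that have at least one field and guards the division. The names prog, avg and empty are illustrative.

```shell
# Made-up sample input with a blank line in the middle.
scores='alice 90

bob 70
carol 80'

# NF is 0 on blank lines, so they are skipped; n counts the rows actually summed.
prog='NF {sum += $2; n++} END {if (n) printf "%.2f\n", sum/n; else print "no data"}'

avg=$(printf '%s\n' "$scores" | awk "$prog")
empty=$(printf '' | awk "$prog")

printf '%s\n' "$avg" "$empty"
```

With sum/NR the same input would report 60, because the blank line counts as a fourth record.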

awk 'NR==1 || $<n> > max {max=$<n>} END {print max}' <file> — Find the maximum value in a column. Seeding max from the first record keeps the result correct when every value is negative.

awk 'NR==1 || $3 > max {max=$3} END {print "Max:", max}' data.txt

awk 'NR==1 || $<n> < min {min=$<n>} END {print min}' <file> — Find the minimum value in a column.

awk 'NR==1 || $2 < min {min=$2} END {print "Min:", min}' data.txt

awk '{count[$<n>]++} END {for (k in count) print k, count[k]}' <file> — Count occurrences of each unique value in a column (frequency table).

awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' access.log

awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}' <file> — Sum values grouped by a key column (like SQL GROUP BY).

awk '{sum[$1] += $3} END {for (dept in sum) print dept, sum[dept]}' expenses.txt
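The order in which for (k in array) visits keys is unspecified, so grouped output can come out in any order. Piping the result through sort gives a stable report. A minimal sketch on made-up expense rows (department, month, amount):

```shell
# Made-up sample input: department, month, amount.
expenses='ops 2024-01 120
dev 2024-01 300
ops 2024-02 80
dev 2024-02 50'

# Sum column 3 per department, then sort so the order does not depend on awk.
totals=$(printf '%s\n' "$expenses" |
  awk '{sum[$1] += $3} END {for (dept in sum) print dept, sum[dept]}' |
  sort)

printf '%s\n' "$totals"
```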

String Functions

awk '{print length($0)}' <file> — Print the length of each line.

awk '{print length($0), $0}' data.txt

awk '{print toupper($0)}' <file> — Convert each line to uppercase.

awk '{print toupper($0)}' input.txt

awk '{print tolower($0)}' <file> — Convert each line to lowercase.

awk '{print tolower($0)}' input.txt

awk '{gsub(/<pattern>/, "<replacement>"); print}' <file> — Global substitution on each line (like sed s///g).

awk '{gsub(/foo/, "bar"); print}' input.txt

awk '{sub(/<pattern>/, "<replacement>"); print}' <file> — Replace only the first occurrence on each line.

awk '{sub(/^[ \t]+/, ""); print}' messy.txt

awk '{print substr($0, <start>, <length>)}' <file> — Extract a substring from each line (1-indexed start position).

awk '{print substr($0, 1, 10)}' data.txt

awk '{n=split($0, arr, "<sep>"); print arr[1]}' <file> — Split a string into an array by a separator. Returns the number of elements.

awk '{n=split($0, parts, ","); print parts[2]}' data.csv

awk 'match($0, /<pattern>/) {print substr($0, RSTART, RLENGTH)}' <file> — Extract the matched portion of a regex. Sets RSTART and RLENGTH.

awk 'match($0, /[0-9]+\.[0-9]+/) {print substr($0, RSTART, RLENGTH)}' data.txt

Formatted Output

awk '{printf "%-20s %s\n", $1, $2}' <file> — Print with formatted, aligned columns using printf.

awk '{printf "%-20s %10s\n", $1, $2}' data.txt

awk '{printf "%05d %s\n", NR, $0}' <file> — Print line numbers zero-padded to 5 digits.

awk '{printf "%05d %s\n", NR, $0}' script.sh

awk '{printf "%.2f\n", $1}' <file> — Format numbers with 2 decimal places.

awk '{printf "$%.2f\n", $3}' prices.txt
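Width and precision can be combined in one format string. The sketch below, on a made-up price list, left-aligns the name in 8 characters and right-aligns the amount in 6 characters with 2 decimals; the | character is only there to make the column boundary visible.

```shell
# Made-up sample input: item name and price.
prices='widget 12.5
bolt 3'

# %-8s pads the name on the right; %6.2f pads the number on the left.
report=$(printf '%s\n' "$prices" | awk '{printf "%-8s|%6.2f\n", $1, $2}')

printf '%s\n' "$report"
```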

awk '{printf "%s,%s,%s\n", $1, $2, $3}' <file> — Convert whitespace-separated data to CSV (first three fields only; values are not quoted).

awk '{printf "%s,%s,%s\n", $1, $2, $3}' data.txt

awk -v OFS='\t' '{$1=$1; print}' <file> — Convert any whitespace separation to tab-separated output.

awk -v OFS='\t' '{$1=$1; print}' data.txt

Variables & Assignment

awk -v <var>=<value> '{print <var>, $1}' <file> — Pass an external variable into the awk program.

awk -v threshold=100 '$3 > threshold {print}' data.txt

awk -v var="$SHELL_VAR" '{print var, $0}' <file> — Pass a shell variable into awk.

awk -v user="$USER" '{print user, $0}' log.txt

awk '{$<n> = "<value>"; print}' <file> — Replace a specific field value and print the modified line.

awk '{$2 = "REDACTED"; print}' users.txt

awk '{$(NF+1) = "<value>"; print}' <file> — Append a new field to the end of each line.

awk -F',' -v OFS=',' '{$(NF+1) = "new_col"; print}' data.csv

Arrays & Deduplication

awk '!seen[$0]++' <file> — Remove duplicate lines while preserving order (like sort -u but without sorting).

awk '!seen[$0]++' list.txt

awk '!seen[$<n>]++' <file> — Remove duplicates based on a specific field.

awk '!seen[$1]++' data.txt
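The !seen[$0]++ idiom is compact enough to be opaque. It works because seen[$0] is 0 the first time a line appears, so the negation is true and the line prints; the post-increment then marks it as seen. A sketch on made-up input, with the one-liner and a spelled-out equivalent side by side:

```shell
# Made-up sample input with repeated lines.
list='apple
banana
apple
cherry
banana'

# Idiomatic form: the pattern is true only on a line's first occurrence.
unique=$(printf '%s\n' "$list" | awk '!seen[$0]++')

# Spelled-out equivalent of the same logic.
verbose=$(printf '%s\n' "$list" | awk '{ if (seen[$0] == 0) print; seen[$0]++ }')

printf '%s\n' "$unique"
```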

awk '{a[$1]+=$2} END {for (k in a) print k, a[k]}' <file> — Aggregate values by key using an associative array.

awk -F',' '{a[$1]+=$2} END {for (k in a) print k, a[k]}' sales.csv

awk '{a[$1]++} END {for (k in a) if (a[k]>1) print k, a[k]}' <file> — Find and print only duplicate entries.

awk '{a[$1]++} END {for (k in a) if (a[k]>1) print k, a[k]}' access.log

awk '{a[NR]=$0} END {for (i=NR; i>=1; i--) print a[i]}' <file> — Reverse the order of lines (like tac).

awk '{a[NR]=$0} END {for (i=NR; i>=1; i--) print a[i]}' data.txt

Multi-File & Pipelines

<command> | awk '{print $<n>}' — Extract a specific column from command output.

ps aux | awk '{print $1, $11}'

<command> | awk 'NR>1 {print $<n>}' — Extract a column from command output, skipping the header line.

df -h | awk 'NR>1 {print $1, $5}'

awk 'FNR==1 {print "--- " FILENAME " ---"} {print}' <file1> <file2> — Process multiple files with a separator. FNR resets per file, FILENAME holds the name.

awk 'FNR==1 {print "--- " FILENAME " ---"} {print}' *.log

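awk '{print > ("output_" $<n> ".txt")}' <file> — Split a file into multiple files based on a field value. Parenthesize the file name: an unparenthesized concatenation after > is parsed differently across awk implementations.

awk -F',' '{print > ("dept_" $1 ".csv")}' employees.csv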
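When the split field has many distinct values, some awk implementations run into the limit on open files. One way around this, sketched here on made-up data in a temporary directory, is to append to each file and close it straight away. The names tmpdir, dir and out are illustrative, and appending assumes the target files do not already exist from an earlier run.

```shell
# Made-up sample input written to a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'sales,ann' 'eng,bo' 'sales,cy' > "$tmpdir/employees.csv"

awk -F',' -v dir="$tmpdir" '{
  out = dir "/dept_" $1 ".csv"   # build the target name once per record
  print >> out                    # append, so reopening does not truncate
  close(out)                      # release the file before the next record
}' "$tmpdir/employees.csv"

sales_rows=$(cat "$tmpdir/dept_sales.csv")
eng_rows=$(cat "$tmpdir/dept_eng.csv")
printf '%s\n' "$sales_rows" "$eng_rows"
rm -rf "$tmpdir"
```

Closing after every record is slower than keeping files open, so this trade-off mainly pays off when the number of output files is large.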

awk '{print | "sort"}' <file> — Pipe awk output to an external command.

awk '{print $1 | "sort | uniq -c | sort -rn"}' access.log

Common Recipes

awk -F',' 'NR==1 {for (i=1;i<=NF;i++) header[i]=$i} NR>1 {for (i=1;i<=NF;i++) print header[i]": "$i; print ""}' <file> — Display CSV data vertically with column headers as labels.

awk -F',' 'NR==1 {for (i=1;i<=NF;i++) h[i]=$i} NR>1 {for (i=1;i<=NF;i++) print h[i]": "$i; print ""}' users.csv

awk '{for (i=NF; i>0; i--) printf "%s ", $i; printf "\n"}' <file> — Reverse the field order on each line.

awk '{for (i=NF; i>0; i--) printf "%s ", $i; printf "\n"}' data.txt

awk 'NR==FNR {a[$1]; next} $1 in a' <file1> <file2> — Print lines from file2 where the first field exists in file1 (like a join/lookup).

awk 'NR==FNR {a[$1]; next} $1 in a' ids.txt data.txt
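How the two-file lookup behaves is easiest to see on concrete files. A minimal sketch with made-up contents for ids.txt and data.txt, written to a temporary directory:

```shell
# Made-up sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 101 103 > "$tmpdir/ids.txt"
printf '%s\n' '101 alice' '102 bob' '103 carol' > "$tmpdir/data.txt"

# While reading the first file NR equals FNR: record each id as an array key.
# For the second file, print only lines whose first field is a known key.
matched=$(awk 'NR==FNR {a[$1]; next} $1 in a' "$tmpdir/ids.txt" "$tmpdir/data.txt")

printf '%s\n' "$matched"
rm -rf "$tmpdir"
```

One caveat: if the first file is empty, NR==FNR also holds for the second file, so nothing is printed and every line is treated as a key instead.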

awk '{sum=0; for(i=1;i<=NF;i++) sum+=$i; print sum}' <file> — Sum all fields on each row (row totals).

awk '{sum=0; for(i=1;i<=NF;i++) sum+=$i; print $0, sum}' matrix.txt

awk 'NR%<n>==0' <file> — Print every Nth line.

awk 'NR%5==0' data.txt

awk '{$1=""; print substr($0,2)}' <file> — Remove the first field and print the rest of the line.

awk '{$1=""; print substr($0,2)}' data.txt

Conclusion

awk shines the moment your data has columns: awk '{print $2}', a quick sum += $3 with the total printed in an END block, or a one-line frequency table with count[$1]++ often replaces a whole script. Start with -F to set the field separator, lean on NR/NF for line and field counts, and remember that awk reads input without modifying your files – its output goes to stdout, so redirect deliberately when you write results back. A common gotcha: awk regex literals have no /i flag, so for case-insensitive matching use gawk's IGNORECASE=1 or tolower(). For anything beyond a few lines, GNU awk (gawk) is the most feature-rich implementation, but it is not the default awk on every system (some ship mawk, BusyBox awk or a BSD awk), so stick to POSIX features when portability matters.
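The portable tolower() approach to case-insensitive matching can be sketched as follows, on made-up log lines. It lowercases a copy of the line for the test only, so the printed line keeps its original case.

```shell
# Made-up sample log lines with mixed-case severities.
log='ERROR disk full
info started
Error retrying
warning low memory'

# Compare a lowercased copy against a lowercase regex; print the original line.
hits=$(printf '%s\n' "$log" | awk 'tolower($0) ~ /error/')

printf '%s\n' "$hits"
```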

Further Reading

  • sed – stream editor for substitution and line edits
  • grep – fast pattern search to pre-filter lines for awk
  • cut – lightweight field extraction for simple column jobs