awk — Field-Based Text Processing and Reporting
Process columnar text with awk — extract fields, filter by pattern, aggregate with arrays and format output. A compact language for data and logs.
awk is a small but complete programming language built for column-based text. It reads input line by line, splits each line into fields ($1, $2, … $NF) and runs your program against every record – making it the natural choice for log files, CSV/TSV data and tabular command output. Where grep finds lines and sed rewrites them, awk lets you select by field, compute sums and averages, build frequency tables with associative arrays and print neatly formatted reports. This guide covers the essentials: fields and separators, pattern matching, conditions, BEGIN/END blocks, arithmetic, string functions and the recipes you reach for again and again.
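To make the record and field model concrete, here is a minimal sketch in POSIX sh with a made-up two-line input. The helper name show_fields is hypothetical. For each line it prints NR, NF, the first field and the last field.

```shell
#!/bin/sh
# show_fields: hypothetical helper that prints NR (record number),
# NF (field count), the first field and the last field of each line.
show_fields() {
    awk '{ print NR, NF, $1, $NF }'
}

# With the default separator, runs of spaces and tabs split fields and
# leading blanks are ignored, so the indented second line still has
# 10.0.0.2 as $1 and four fields in total.
printf '%s\n' '10.0.0.1 GET /index.html 200' '  10.0.0.2   POST /login 302' | show_fields
# 1 4 10.0.0.1 200
# 2 4 10.0.0.2 302
```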
Basic Usage
awk '{print}' <file> — Print every line of the file (like cat).
awk '{print}' data.txt
awk '{print $0}' <file> — Print each entire line. $0 represents the whole line.
awk '{print $0}' data.txt
awk '{print $1}' <file> — Print the first field (column) of each line. By default, fields are separated by runs of spaces and tabs, and leading and trailing blanks are ignored.
awk '{print $1}' access.log
awk '{print $1, $3}' <file> — Print specific fields. The comma between them inserts the output field separator (OFS), a single space by default.
awk '{print $1, $3}' data.txt
awk '{print $NF}' <file> — Print the last field of each line. NF is the number of fields.
awk '{print $NF}' access.log
awk '{print $(NF-1)}' <file> — Print the second-to-last field of each line. The parentheses are required: $NF-1 would subtract 1 from the last field.
awk '{print $(NF-1)}' data.txt
awk 'NR==<n>' <file> — Print only a specific line number. NR is the current record (line) number.
awk 'NR==5' data.txt
awk 'NR>=<n> && NR<=<m>' <file> — Print a range of lines (inclusive).
awk 'NR>=10 && NR<=20' data.txt
Field Separator
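A minimal sketch of how the input separator and OFS interact. It assumes a POSIX awk and simple comma-separated data without quoted fields, since a plain -F',' does not understand CSV quoting (gawk's FPAT is one way to handle that). The function names are hypothetical.

```shell
#!/bin/sh
# csv_to_tsv: hypothetical helper converting simple comma-separated input
# (no quoted fields containing commas) to tab-separated output.
csv_to_tsv() {
    # -F',' sets the input separator; OFS is only used once $0 is rebuilt,
    # which the no-op assignment $1=$1 forces.
    awk -F',' -v OFS='\t' '{ $1 = $1; print }'
}

# Without a field assignment, print emits the original line unchanged.
unchanged() {
    awk -F',' -v OFS='\t' '{ print }'
}

printf '%s\n' 'name,dept,salary' 'ana,ops,4200' | csv_to_tsv
# name<TAB>dept<TAB>salary
# ana<TAB>ops<TAB>4200
```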
awk -F'<sep>' '{print $1}' <file> — Set a custom input field separator.
awk -F',' '{print $1}' data.csv
awk -F':' '{print $1, $3}' <file> — Use colon as field separator (useful for /etc/passwd).
awk -F':' '{print $1, $3}' /etc/passwd
awk -F'\t' '{print $2}' <file> — Use tab as field separator for TSV files.
awk -F'\t' '{print $2}' data.tsv
awk -F'[,;:]' '{print $1}' <file> — Use a regex as field separator (here, any one of the characters , ; or :).
awk -F'[,;:]' '{print $1}' mixed.txt
awk 'BEGIN{FS=","; OFS="\t"} {print $1, $2}' <file> — Set the input field separator (FS) and output field separator (OFS).
awk 'BEGIN{FS=","; OFS="\t"} {print $1, $2}' data.csv
awk -v OFS=',' '{$1=$1; print}' <file> — Change the output separator. Assigning to any field (the $1=$1 trick) makes awk rebuild $0 with OFS; without it, print outputs the original line unchanged.
awk -v OFS=',' '{$1=$1; print}' whitespace.txt
Pattern Matching
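Matching the whole line and matching a single field are easy to blur, so here is a small sketch on invented log lines, plus a range pattern over an INI-style snippet. Function names and data are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical log lines: date, level, message.
log='2024-05-01 INFO service started
2024-05-01 ERROR disk full
2024-05-02 WARN ERROR budget low'

# Whole-line match: any line containing ERROR anywhere (lines 2 and 3).
any_error()   { awk '/ERROR/'; }

# Field match: only lines whose second field is exactly ERROR (line 2).
# The ^ and $ anchors stop values such as ERRORS from matching.
level_error() { awk '$2 ~ /^ERROR$/'; }

# Range pattern: from a [db] header through the next blank line, inclusive.
db_section()  { awk '/^\[db\]/,/^$/'; }

printf '%s\n' "$log" | any_error
printf '%s\n' "$log" | level_error
```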
awk '/<pattern>/' <file> — Print lines matching a regex pattern (like grep).
awk '/ERROR/' log.txt
awk '!/<pattern>/' <file> — Print lines NOT matching a pattern (like grep -v).
awk '!/^#/' config.ini
awk '$<n> ~ /<pattern>/' <file> — Match a pattern against a specific field.
awk '$3 ~ /error/' log.txt
awk '$<n> !~ /<pattern>/' <file> — Print lines where a specific field does NOT match a pattern.
awk '$1 !~ /^192\.168/' access.log
awk '/<start>/,/<end>/' <file> — Print lines between two patterns (inclusive range).
awk '/BEGIN/,/END/' config.txt
awk '/<pattern>/ {print $<n>}' <file> — Print a specific field only from lines matching a pattern.
awk '/GET/ {print $7}' access.log
Conditions & Comparisons
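One subtlety behind these comparisons: a field that looks like a number is compared numerically against a numeric constant, but a quoted constant forces a string comparison. A short sketch with made-up data shows the difference; the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical host/load pairs.
data='web1 10
web2 9
web3 100'

# Numeric comparison: $2 looks like a number and 9 is a numeric constant,
# so web1 (10) and web3 (100) are selected.
numeric() { printf '%s\n' "$data" | awk '$2 > 9 { print $1 }'; }

# String comparison: the quoted "9" forces a lexical comparison, and
# "10" and "100" sort before "9", so nothing is selected.
lexical() { printf '%s\n' "$data" | awk '$2 > "9" { print $1 }'; }

numeric
lexical
```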
awk '$<n> == "<value>"' <file> — Print lines where a field equals a specific string.
awk '$3 == "ERROR"' log.txt
awk '$<n> != "<value>"' <file> — Print lines where a field does not equal a value.
awk '$1 != "localhost"' hosts.txt
awk '$<n> > <value>' <file> — Print lines where a numeric field exceeds a threshold.
awk '$5 > 1000' data.txt
awk '$<n> > <value> && $<m> < <value>' <file> — Combine multiple conditions with && (AND).
awk '$3 > 100 && $4 < 500' sales.txt
awk '$<n> > <value> || $<m> == "<value>"' <file> — Combine conditions with || (OR).
awk '$5 > 1000 || $3 == "CRITICAL"' log.txt
awk 'NF > 0' <file> — Print only non-empty lines (lines with at least one field, so whitespace-only lines are dropped too).
awk 'NF > 0' messy.txt
awk 'NF == <n>' <file> — Print lines with exactly N fields.
awk 'NF == 4' data.txt
awk 'length > <n>' <file> — Print lines longer than N characters.
awk 'length > 80' source.py
BEGIN & END Blocks
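A minimal sketch of the BEGIN, body, END lifecycle using a hypothetical report helper. It also shows that both blocks run when the input is empty, in which case NR is 0 in END.

```shell
#!/bin/sh
# report: hypothetical helper that wraps its input in a header and footer.
# BEGIN runs before the first record is read and END after the last one,
# so both run even when there is no input at all.
report() {
    awk 'BEGIN { print "== start ==" }
         { print NR ": " $0 }
         END   { print "== " NR " records ==" }'
}

printf '%s\n' alpha beta | report
# == start ==
# 1: alpha
# 2: beta
# == 2 records ==
```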
awk 'BEGIN {print "Header"} {print} END {print "Footer"}' <file> — Execute code before processing (BEGIN) and after all lines (END).
awk 'BEGIN {print "Name\tScore"} {print $1"\t"$2} END {print "---"}' results.txt
awk 'BEGIN {<init>} {<body>} END {<summary>}' <file> — Classic awk structure: initialize, process each line, then summarize. Uninitialized variables already start as 0 (or the empty string), so an explicit sum=0 is optional.
awk 'BEGIN {sum=0} {sum+=$1} END {print "Total:", sum}' numbers.txt
awk 'END {print NR}' <file> — Print the total number of lines in a file.
awk 'END {print NR}' data.txt
awk 'END {print NR, "lines,", NF, "fields in last line"}' <file> — Print summary statistics after processing all lines. POSIX requires NF and $0 to keep their last-record values in END, but some older awks do not.
awk 'END {print NR, "lines processed"}' access.log
Arithmetic & Aggregation
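The aggregation idioms combine naturally into a single pass. This sketch, with a hypothetical stats helper and invented data, seeds min and max from the first record and guards the average against empty input.

```shell
#!/bin/sh
# stats: hypothetical helper that reports count, sum, mean, min and max of
# column 2 in a single pass.
stats() {
    awk '{
        v = $2 + 0                      # force a numeric value
        sum += v
        # Seed min and max from the first record instead of 0, so the
        # result is right even when every value is negative.
        if (NR == 1 || v < min) min = v
        if (NR == 1 || v > max) max = v
    }
    END {
        if (NR == 0) { print "no data"; exit }   # avoid dividing by zero
        printf "n=%d sum=%g mean=%.2f min=%g max=%g\n", NR, sum, sum / NR, min, max
    }'
}

printf '%s\n' 'a -5' 'b -2' 'c -11' | stats
# n=3 sum=-18 mean=-6.00 min=-11 max=-2
```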
awk '{sum += $<n>} END {print sum}' <file> — Sum all values in a specific column.
awk '{sum += $3} END {print sum}' sales.txt
awk '{sum += $<n>} END {print sum/NR}' <file> — Calculate the average of a column. On empty input NR is 0 and the division is by zero (a fatal error in gawk), so guard it with if (NR) when that can happen.
awk '{sum += $2} END {if (NR) print "Average:", sum/NR}' scores.txt
awk 'NR==1 || $<n> > max {max=$<n>} END {print max}' <file> — Find the maximum value in a column. Seeding from the first line (NR==1) keeps the result correct when all values are negative; starting from max=0 would not.
awk 'NR==1 || $3 > max {max=$3} END {print "Max:", max}' data.txt
awk 'NR==1 || $<n> < min {min=$<n>} END {print min}' <file> — Find the minimum value in a column.
awk 'NR==1 || $2 < min {min=$2} END {print "Min:", min}' data.txt
awk '{count[$<n>]++} END {for (k in count) print k, count[k]}' <file> — Count occurrences of each unique value in a column (frequency table). The for (k in ...) loop visits keys in no particular order, so pipe through sort if order matters.
awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' access.log
awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}' <file> — Sum values grouped by a key column (like SQL GROUP BY).
awk '{sum[$1] += $3} END {for (dept in sum) print dept, sum[dept]}' expenses.txt
String Functions
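A sketch exercising the return values of gsub, substr, split and match on one invented line. It assumes a POSIX awk; strings_demo is a hypothetical name.

```shell
#!/bin/sh
# strings_demo: hypothetical helper showing what the string functions return.
strings_demo() {
    awk '{
        line = $0
        n = gsub(/o/, "0", line)          # gsub edits line in place and returns the count
        print n, line
        print substr($0, 1, 5)            # positions are 1-based
        k = split("a,b,,c", parts, ",")   # empty fields between separators are kept
        print k, parts[4]
        if (match($0, /[0-9]+/))          # RSTART and RLENGTH describe the match
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    }'
}

printf '%s\n' 'hello world 2024' | strings_demo
# 2 hell0 w0rld 2024
# hello
# 4 c
# 13 4 2024
```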
awk '{print length($0)}' <file> — Print the length of each line.
awk '{print length($0), $0}' data.txt
awk '{print toupper($0)}' <file> — Convert each line to uppercase.
awk '{print toupper($0)}' input.txt
awk '{print tolower($0)}' <file> — Convert each line to lowercase.
awk '{print tolower($0)}' input.txt
awk '{gsub(/<pattern>/, "<replacement>"); print}' <file> — Global substitution on each line (like sed s///g). gsub returns the number of replacements, and an & in the replacement inserts the matched text.
awk '{gsub(/foo/, "bar"); print}' input.txt
awk '{sub(/<pattern>/, "<replacement>"); print}' <file> — Replace only the first occurrence on each line.
awk '{sub(/^[ \t]+/, ""); print}' messy.txt
awk '{print substr($0, <start>, <length>)}' <file> — Extract a substring from each line (1-indexed start position).
awk '{print substr($0, 1, 10)}' data.txt
awk '{n=split($0, arr, "<sep>"); print arr[1]}' <file> — Split a string into an array by a separator. Returns the number of elements.
awk '{n=split($0, parts, ","); print parts[2]}' data.csv
awk 'match($0, /<pattern>/) {print substr($0, RSTART, RLENGTH)}' <file> — Extract the matched portion of a regex. match() sets RSTART (start position, 0 if there is no match) and RLENGTH (match length, -1 if there is no match).
awk 'match($0, /[0-9]+\.[0-9]+/) {print substr($0, RSTART, RLENGTH)}' data.txt
Formatted Output
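A minimal sketch of a printf-formatted report with a header row, assuming two-column input of item and price (the helper name and data are made up). Unlike print, printf adds no newline, so each format string ends in \n.

```shell
#!/bin/sh
# price_table: hypothetical helper printing a left-aligned name column
# (width 10) and a right-aligned price column (width 8, 2 decimals).
price_table() {
    awk 'BEGIN { printf "%-10s %8s\n", "ITEM", "PRICE" }
         { printf "%-10s %8.2f\n", $1, $2 }'
}

printf '%s\n' 'coffee 3.5' 'sandwich 7.25' | price_table
# ITEM          PRICE
# coffee         3.50
# sandwich       7.25
```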
awk '{printf "%-20s %s\n", $1, $2}' <file> — Print with formatted, aligned columns using printf.
awk '{printf "%-20s %10s\n", $1, $2}' data.txt
awk '{printf "%05d %s\n", NR, $0}' <file> — Print line numbers zero-padded to 5 digits.
awk '{printf "%05d %s\n", NR, $0}' script.sh
awk '{printf "%.2f\n", $1}' <file> — Format numbers with 2 decimal places.
awk '{printf "$%.2f\n", $3}' prices.txt
awk '{printf "%s,%s,%s\n", $1, $2, $3}' <file> — Convert the first three whitespace-separated fields to CSV (to convert every field, use -v OFS=',' '{$1=$1; print}').
awk '{printf "%s,%s,%s\n", $1, $2, $3}' data.txt
awk -v OFS='\t' '{$1=$1; print}' <file> — Convert any whitespace separation to tab-separated output.
awk -v OFS='\t' '{$1=$1; print}' data.txt
Variables & Assignment
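Passing values in with -v keeps shell data out of the awk program text. The sketch below (hypothetical helper names, invented data) also shows a gotcha: -v interprets backslash escapes in the value, whereas reading it from ENVIRON passes it through unchanged.

```shell
#!/bin/sh
# over_limit: hypothetical helper; the shell argument becomes the awk
# variable "limit" via -v instead of being spliced into the program.
over_limit() {
    awk -v limit="$1" '$2 > limit + 0 { print $1 }'
}

# -v processes escapes such as \t; ENVIRON returns the raw string.
via_v()       { awk -v s="$1" 'BEGIN { print s }'; }
via_environ() { S="$1" awk 'BEGIN { print ENVIRON["S"] }'; }

printf '%s\n' 'disk1 75' 'disk2 92' 'disk3 88' | over_limit 85
# disk2
# disk3
```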
awk -v <var>=<value> '{print <var>, $1}' <file> — Pass an external variable into the awk program.
awk -v threshold=100 '$3 > threshold {print}' data.txt
awk -v var="$SHELL_VAR" '{print var, $0}' <file> — Pass a shell variable into awk.
awk -v user="$USER" '{print user, $0}' log.txt
awk '{$<n> = "<value>"; print}' <file> — Replace a specific field value and print the modified line (the line is rebuilt with OFS).
awk '{$2 = "REDACTED"; print}' users.txt
awk '{$(NF+1) = "<value>"; print}' <file> — Append a new field to the end of each line.
awk -F',' -v OFS=',' '{$(NF+1) = "new_col"; print}' data.csv
Arrays & Deduplication
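A sketch of two array idioms on invented data: a frequency table and order-preserving deduplication. The table is piped through sort for a stable, most-frequent-first listing; helper names are hypothetical.

```shell
#!/bin/sh
# top_values: hypothetical helper counting each distinct value of column 1,
# then sorting by count (descending) and value (ascending).
top_values() {
    awk '{ count[$1]++ } END { for (k in count) print count[k], k }' |
        sort -k1,1nr -k2,2
}

# dedupe: seen[$0]++ evaluates to 0 (false) the first time a line appears,
# so !seen[$0]++ is true exactly once per distinct line.
dedupe() {
    awk '!seen[$0]++'
}

printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.3 10.0.0.1 10.0.0.2 | top_values
# 3 10.0.0.1
# 2 10.0.0.2
# 1 10.0.0.3
```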
awk '!seen[$0]++' <file> — Remove duplicate lines while preserving order (like sort -u but without sorting).
awk '!seen[$0]++' list.txt
awk '!seen[$<n>]++' <file> — Remove duplicates based on a specific field, keeping the first line for each value.
awk '!seen[$1]++' data.txt
awk '{a[$1]+=$2} END {for (k in a) print k, a[k]}' <file> — Aggregate values by key using an associative array.
awk -F',' '{a[$1]+=$2} END {for (k in a) print k, a[k]}' sales.csv
awk '{a[$1]++} END {for (k in a) if (a[k]>1) print k, a[k]}' <file> — Print only values that occur more than once, with their counts.
awk '{a[$1]++} END {for (k in a) if (a[k]>1) print k, a[k]}' access.log
awk '{a[NR]=$0} END {for (i=NR; i>=1; i--) print a[i]}' <file> — Reverse the order of lines (like tac). The whole input is held in memory.
awk '{a[NR]=$0} END {for (i=NR; i>=1; i--) print a[i]}' data.txt
Multi-File & Pipelines
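A minimal sketch of how NR, FNR and FILENAME behave across two input files. It assumes mktemp -d is available (common on Linux, the BSDs and macOS, though not required by POSIX); the file names and helper are made up.

```shell
#!/bin/sh
# Create two small hypothetical input files in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' a1 a2 > "$dir/a.txt"
printf '%s\n' b1 > "$dir/b.txt"

# counters: NR keeps counting across files; FNR restarts at 1 per file.
# The directory part of FILENAME is stripped up to the last slash.
counters() {
    awk '{ f = FILENAME; sub(/.*\//, "", f); print f, NR, FNR, $0 }' "$@"
}

counters "$dir/a.txt" "$dir/b.txt"
# a.txt 1 1 a1
# a.txt 2 2 a2
# b.txt 3 1 b1
rm -r "$dir"
```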
<command> | awk '{print $<n>}' — Extract a specific column from command output.
ps aux | awk '{print $1, $11}'
<command> | awk 'NR>1 {print $<n>}' — Extract a column from command output, skipping the header line.
df -h | awk 'NR>1 {print $1, $5}'
awk 'FNR==1 {print "--- " FILENAME " ---"} {print}' <file1> <file2> — Process multiple files with a separator. FNR resets per file, FILENAME holds the name.
awk 'FNR==1 {print "--- " FILENAME " ---"} {print}' *.log
awk '{print > ("output_" $<n> ".txt")}' <file> — Split a file into multiple files based on a field value. Parenthesize the file name expression: an unparenthesized concatenation after > is parsed differently, or rejected, by some awks.
awk -F',' '{print > ("dept_" $1 ".csv")}' employees.csv
awk '{print | "sort"}' <file> — Pipe awk output to an external command. All output sent to the same command string goes through a single pipe.
awk '{print $1 | "sort | uniq -c | sort -rn"}' access.log
Common Recipes
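An extended form of the NR==FNR lookup recipe that carries a value over from the first file instead of only testing membership. The file layout (id and name, then id and amount) and the helper name join_names are assumptions for illustration; mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical lookup files: names.txt maps id to name, orders.txt has id and amount.
dir=$(mktemp -d)
printf '%s\n' '1 alice' '2 bob' > "$dir/names.txt"
printf '%s\n' '2 30' '1 12' '3 99' > "$dir/orders.txt"

# While reading the first file NR == FNR holds: store each name and skip
# to the next record. In the second file, print only ids found in the map.
join_names() {
    awk 'NR == FNR { name[$1] = $2; next }
         $1 in name { print $1, name[$1], $2 }' "$1" "$2"
}

join_names "$dir/names.txt" "$dir/orders.txt"
# 2 bob 30
# 1 alice 12
rm -r "$dir"
```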
awk -F',' 'NR==1 {for (i=1;i<=NF;i++) header[i]=$i} NR>1 {for (i=1;i<=NF;i++) print header[i]": "$i; print ""}' <file> — Display CSV data vertically with column headers as labels.
awk -F',' 'NR==1 {for (i=1;i<=NF;i++) h[i]=$i} NR>1 {for (i=1;i<=NF;i++) print h[i]": "$i; print ""}' users.csv
awk '{for (i=NF; i>0; i--) printf "%s ", $i; printf "\n"}' <file> — Reverse the field order on each line (every field, including the last, is followed by a space).
awk '{for (i=NF; i>0; i--) printf "%s ", $i; printf "\n"}' data.txt
awk 'NR==FNR {a[$1]; next} $1 in a' <file1> <file2> — Print lines from file2 where the first field exists in file1 (like a join/lookup). NR==FNR is true only while reading the first file, unless that file is empty.
awk 'NR==FNR {a[$1]; next} $1 in a' ids.txt data.txt
awk '{sum=0; for(i=1;i<=NF;i++) sum+=$i; print sum}' <file> — Sum all fields on each row (row totals).
awk '{sum=0; for(i=1;i<=NF;i++) sum+=$i; print $0, sum}' matrix.txt
awk 'NR%<n>==0' <file> — Print every Nth line.
awk 'NR%5==0' data.txt
awk '{$1=""; print substr($0,2)}' <file> — Remove the first field and print the rest of the line. Because $0 is rebuilt, runs of whitespace between the remaining fields collapse to single spaces.
awk '{$1=""; print substr($0,2)}' data.txt
Conclusion
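awk shines the moment your data has columns: awk '{print $2}', a running sum += $3 printed in an END block, or a one-line frequency table with count[$1]++ often replaces a whole script. Start with -F to set the field separator, lean on NR/NF for line and field counts, and remember that awk does not modify your files by default: its output goes to stdout, so redirect deliberately when you write results back. Never redirect to the file you are reading (in awk '...' f > f the shell truncates f before awk reads it); write to a temporary file and move it into place instead. A common gotcha: awk regex literals have no /i flag, so for case-insensitive matching use gawk's IGNORECASE=1 or tolower(). For anything beyond a few lines, GNU awk (gawk) is the most feature-rich implementation, but it is not always the default awk (Debian and Ubuntu typically ship mawk, macOS ships BWK awk), so check which one you are running before relying on gawk-only features such as IGNORECASE.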
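A portable sketch of case-insensitive matching that avoids gawk-only IGNORECASE by lowercasing both the line and the pattern. The helper name ci_grep is hypothetical, and the pattern is used as a dynamic regular expression.

```shell
#!/bin/sh
# ci_grep: hypothetical helper for case-insensitive matching in any
# POSIX awk. Both the input line and the pattern are lowercased first.
ci_grep() {
    awk -v pat="$1" 'tolower($0) ~ tolower(pat)'
}

printf '%s\n' 'Error: disk' 'ok' 'FATAL ERROR' | ci_grep error
# Error: disk
# FATAL ERROR
```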