# awk — Field-Based Text Processing and Reporting

> Process columnar text with awk — extract fields, filter by pattern, aggregate with arrays and format output. A compact language for data and logs.

Source: https://www.jpkc.com/db/en/cheatsheets/files-text/awk/

awk is a small but complete programming language built for column-based text. It reads input line by line, splits each line into fields (`$1`, `$2`, … `$NF`) and runs your program against every record – making it the natural choice for log files, CSV/TSV data and tabular command output. Where grep finds lines and sed rewrites them, awk lets you select by field, compute sums and averages, build frequency tables with associative arrays and print neatly formatted reports. This guide covers the essentials: fields and separators, pattern matching, conditions, `BEGIN`/`END` blocks, arithmetic, string functions and the recipes you reach for again and again.

## Basic Usage

`awk '{print}' <file>` — Print every line of the file (like cat).

```bash
awk '{print}' data.txt
```

`awk '{print $0}' <file>` — Print each entire line. `$0` represents the whole line.

```bash
awk '{print $0}' data.txt
```

`awk '{print $1}' <file>` — Print the first field (column) of each line. By default, fields are split on runs of whitespace.

```bash
awk '{print $1}' access.log
```

`awk '{print $1, $3}' <file>` — Print specific fields. The comma inserts the output field separator (OFS, a single space by default).

```bash
awk '{print $1, $3}' data.txt
```

`awk '{print $NF}' <file>` — Print the last field of each line. NF is the number of fields.

```bash
awk '{print $NF}' access.log
```

`awk '{print $(NF-1)}' <file>` — Print the second-to-last field of each line.

```bash
awk '{print $(NF-1)}' data.txt
```

`awk 'NR==<n>' <file>` — Print only a specific line number. NR is the current record (line) number.

```bash
awk 'NR==5' data.txt
```

`awk 'NR>=<start> && NR<=<end>' <file>` — Print a range of lines.

```bash
awk 'NR>=10 && NR<=20' data.txt
```

## Field Separator

`awk -F'<sep>' '{print $1}' <file>` — Set a custom input field separator.

```bash
awk -F',' '{print $1}' data.csv
```

`awk -F':' '{print $1, $3}' <file>` — Use colon as field separator (useful for /etc/passwd).

```bash
awk -F':' '{print $1, $3}' /etc/passwd
```

`awk -F'\t' '{print $2}' <file>` — Use tab as field separator for TSV files.

```bash
awk -F'\t' '{print $2}' data.tsv
```

`awk -F'[,;:]' '{print $1}' <file>` — Use a regex as field separator (match any of the characters).

```bash
awk -F'[,;:]' '{print $1}' mixed.txt
```

`awk 'BEGIN{FS=","; OFS="\t"} {print $1, $2}' <file>` — Set input field separator (FS) and output field separator (OFS).

```bash
awk 'BEGIN{FS=","; OFS="\t"} {print $1, $2}' data.csv
```

`awk -v OFS=',' '{$1=$1; print}' <file>` — Change the output separator. Assigning `$1=$1` forces awk to rebuild the line using the new OFS.

```bash
awk -v OFS=',' '{$1=$1; print}' whitespace.txt
```

## Pattern Matching

`awk '/<pattern>/' <file>` — Print lines matching a regex pattern (like grep).

```bash
awk '/ERROR/' log.txt
```

`awk '!/<pattern>/' <file>` — Print lines NOT matching a pattern (like grep -v).

```bash
awk '!/^#/' config.ini
```

`awk '$<n> ~ /<pattern>/' <file>` — Match a pattern against a specific field.

```bash
awk '$3 ~ /error/' log.txt
```

`awk '$<n> !~ /<pattern>/' <file>` — Print lines where a specific field does NOT match a pattern.

```bash
awk '$1 !~ /^192\.168/' access.log
```

`awk '/<start>/,/<end>/' <file>` — Print lines between two patterns (inclusive range).

```bash
awk '/BEGIN/,/END/' config.txt
```

`awk '/pattern/ {print $2}' <file>` — Print a specific field only from lines matching a pattern.

```bash
awk '/GET/ {print $7}' access.log
```

## Conditions & Comparisons

`awk '$<n> == "<value>"' <file>` — Print lines where a field equals a specific string.

```bash
awk '$3 == "ERROR"' log.txt
```

`awk '$<n> != "<value>"' <file>` — Print lines where a field does not equal a value.

```bash
awk '$1 != "localhost"' hosts.txt
```

`awk '$<n> > <value>' <file>` — Print lines where a numeric field exceeds a threshold.

```bash
awk '$5 > 1000' data.txt
```

`awk '$<n> > <min> && $<m> < <max>' <file>` — Combine multiple conditions with && (AND).

```bash
awk '$3 > 100 && $4 < 500' sales.txt
```

`awk '$<n> > <value> || $<m> == "<string>"' <file>` — Combine conditions with || (OR).

```bash
awk '$5 > 1000 || $3 == "CRITICAL"' log.txt
```

`awk 'NF > 0' <file>` — Print only lines with at least one field (drops empty and whitespace-only lines).

```bash
awk 'NF > 0' messy.txt
```

`awk 'NF == <n>' <file>` — Print lines with exactly N fields.

```bash
awk 'NF == 4' data.txt
```

`awk 'length > <n>' <file>` — Print lines longer than N characters.

```bash
awk 'length > 80' source.py
```

## BEGIN & END Blocks

`awk 'BEGIN {print "Header"} {print} END {print "Footer"}' <file>` — Execute code before processing (BEGIN) and after all lines (END).

```bash
awk 'BEGIN {print "Name\tScore"} {print $1"\t"$2} END {print "---"}' results.txt
```

`awk 'BEGIN {<init>} {<process>} END {<summarize>}' <file>` — Classic awk structure: initialize, process each line, then summarize.

```bash
awk 'BEGIN {sum=0} {sum+=$1} END {print "Total:", sum}' numbers.txt
```

`awk 'END {print NR}' <file>` — Print the total number of lines in a file.

```bash
awk 'END {print NR}' data.txt
```

`awk 'END {print NR, "lines,", NF, "fields in last line"}' <file>` — Print summary statistics after processing all lines.

```bash
awk 'END {print NR, "lines processed"}' access.log
```

## Arithmetic & Aggregation

`awk '{sum += $<n>} END {print sum}' <file>` — Sum all values in a specific column.

```bash
awk '{sum += $3} END {print sum}' sales.txt
```

`awk '{sum += $<n>} END {if (NR > 0) print sum/NR}' <file>` — Calculate the average of a column. The `NR > 0` guard avoids a division by zero on empty input; note that every line counts, including blank lines and headers.

```bash
awk '{sum += $2} END {if (NR > 0) print "Average:", sum/NR}' scores.txt
```

`awk 'NR==1 || $<n> > max {max=$<n>} END {print max}' <file>` — Find the maximum value in a column. Seeding `max` from the first record keeps the result correct when every value is negative.

```bash
awk 'NR==1 || $3 > max {max=$3} END {print "Max:", max}' data.txt
```

`awk 'NR==1 || $<n> < min {min=$<n>} END {print min}' <file>` — Find the minimum value in a column.

```bash
awk 'NR==1 || $2 < min {min=$2} END {print "Min:", min}' data.txt
```

`awk '{count[$<n>]++} END {for (k in count) print k, count[k]}' <file>` — Count occurrences of each unique value in a column (frequency table). The `for (k in count)` loop visits keys in an unspecified order, so pipe the result to sort if order matters.

```bash
awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' access.log
```

`awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}' <file>` — Sum values grouped by a key column (like SQL GROUP BY).

```bash
awk '{sum[$1] += $3} END {for (dept in sum) print dept, sum[dept]}' expenses.txt
```

## String Functions

`awk '{print length($0)}' <file>` — Print the length of each line.

```bash
awk '{print length($0), $0}' data.txt
```

`awk '{print toupper($0)}' <file>` — Convert each line to uppercase.

```bash
awk '{print toupper($0)}' input.txt
```

`awk '{print tolower($0)}' <file>` — Convert each line to lowercase.

```bash
awk '{print tolower($0)}' input.txt
```

`awk '{gsub(/<pattern>/, "<replacement>"); print}' <file>` — Global substitution on each line (like sed s///g).

```bash
awk '{gsub(/foo/, "bar"); print}' input.txt
```

`awk '{sub(/<pattern>/, "<replacement>"); print}' <file>` — Replace only the first occurrence on each line.

```bash
awk '{sub(/^[ \t]+/, ""); print}' messy.txt
```

`awk '{print substr($0, <start>, <length>)}' <file>` — Extract a substring from each line (1-indexed start position).

```bash
awk '{print substr($0, 1, 10)}' data.txt
```

`awk '{n=split($0, arr, "<sep>"); print arr[1]}' <file>` — Split a string into an array by a separator. Returns the number of elements.

```bash
awk '{n=split($0, parts, ","); print parts[2]}' data.csv
```

`awk 'match($0, /<pattern>/) {print substr($0, RSTART, RLENGTH)}' <file>` — Extract the matched portion of a regex. `match()` sets RSTART and RLENGTH.

```bash
awk 'match($0, /[0-9]+\.[0-9]+/) {print substr($0, RSTART, RLENGTH)}' data.txt
```

## Formatted Output

`awk '{printf "%-20s %s\n", $1, $2}' <file>` — Print with formatted, aligned columns using printf.

```bash
awk '{printf "%-20s %10s\n", $1, $2}' data.txt
```

`awk '{printf "%05d %s\n", NR, $0}' <file>` — Print line numbers zero-padded to 5 digits.

```bash
awk '{printf "%05d %s\n", NR, $0}' script.sh
```

`awk '{printf "%.2f\n", $1}' <file>` — Format numbers with 2 decimal places.

```bash
awk '{printf "$%.2f\n", $3}' prices.txt
```

`awk '{printf "%s,%s,%s\n", $1, $2, $3}' <file>` — Convert the first three whitespace-separated fields to CSV.

```bash
awk '{printf "%s,%s,%s\n", $1, $2, $3}' data.txt
```

`awk -v OFS='\t' '{$1=$1; print}' <file>` — Convert any whitespace separation to tab-separated output.

```bash
awk -v OFS='\t' '{$1=$1; print}' data.txt
```

## Variables & Assignment

`awk -v <name>=<value> '{print <name>, $1}' <file>` — Pass an external variable into the awk program.

```bash
awk -v threshold=100 '$3 > threshold {print}' data.txt
```

`awk -v var="$SHELL_VAR" '{print var, $0}' <file>` — Pass a shell variable into awk.

```bash
awk -v user="$USER" '{print user, $0}' log.txt
```

`awk '{$<n> = "<value>"; print}' <file>` — Replace a specific field value and print the modified line.

```bash
awk '{$2 = "REDACTED"; print}' users.txt
```

`awk '{$(NF+1) = "<value>"; print}' <file>` — Append a new field to the end of each line. For delimited files, set the input and output separators to the same character so the existing fields are left intact.

```bash
awk -F',' -v OFS=',' '{$(NF+1) = "new_col"; print}' data.csv
```

## Arrays & Deduplication

`awk '!seen[$0]++' <file>` — Remove duplicate lines while preserving order (like sort -u but without sorting).

```bash
awk '!seen[$0]++' list.txt
```

`awk '!seen[$<n>]++' <file>` — Remove duplicates based on a specific field.

```bash
awk '!seen[$1]++' data.txt
```

`awk '{a[$1]+=$2} END {for (k in a) print k, a[k]}' <file>` — Aggregate values by key using an associative array.

```bash
awk -F',' '{a[$1]+=$2} END {for (k in a) print k, a[k]}' sales.csv
```

`awk '{a[$1]++} END {for (k in a) if (a[k]>1) print k, a[k]}' <file>` — Find and print only duplicate entries.

```bash
awk '{a[$1]++} END {for (k in a) if (a[k]>1) print k, a[k]}' access.log
```

`awk '{a[NR]=$0} END {for (i=NR; i>=1; i--) print a[i]}' <file>` — Reverse the order of lines (like tac).

```bash
awk '{a[NR]=$0} END {for (i=NR; i>=1; i--) print a[i]}' data.txt
```

## Multi-File & Pipelines

`<command> | awk '{print $<n>}'` — Extract a specific column from command output.

```bash
ps aux | awk '{print $1, $11}'
```

`<command> | awk 'NR>1 {print $<n>}'` — Extract a column from command output, skipping the header line.

```bash
df -h | awk 'NR>1 {print $1, $5}'
```

`awk 'FNR==1 {print "--- " FILENAME " ---"} {print}' <files>` — Process multiple files with a separator. FNR resets per file, FILENAME holds the name.

```bash
awk 'FNR==1 {print "--- " FILENAME " ---"} {print}' *.log
```

`awk '{print > ("output_" $<n> ".txt")}' <file>` — Split a file into multiple files based on a field value. Parenthesize the file name expression: an unparenthesized concatenation after `>` is parsed differently by some awk implementations.

```bash
awk -F',' '{print > ("dept_" $1 ".csv")}' employees.csv
```

`awk '{print | "sort"}' <file>` — Pipe awk output to an external command.

```bash
awk '{print $1 | "sort | uniq -c | sort -rn"}' access.log
```

## Common Recipes

`awk -F',' 'NR==1 {for (i=1;i<=NF;i++) header[i]=$i} NR>1 {for (i=1;i<=NF;i++) print header[i]": "$i; print ""}' <file>` — Display CSV data vertically with column headers as labels.

```bash
awk -F',' 'NR==1 {for (i=1;i<=NF;i++) h[i]=$i} NR>1 {for (i=1;i<=NF;i++) print h[i]": "$i; print ""}' users.csv
```

`awk '{for (i=NF; i>0; i--) printf "%s ", $i; printf "\n"}' <file>` — Reverse the field order on each line.

```bash
awk '{for (i=NF; i>0; i--) printf "%s ", $i; printf "\n"}' data.txt
```

`awk 'NR==FNR {a[$1]; next} $1 in a' <file1> <file2>` — Print lines from file2 where the first field exists in file1 (like a join/lookup).

```bash
awk 'NR==FNR {a[$1]; next} $1 in a' ids.txt data.txt
```

`awk '{sum=0; for(i=1;i<=NF;i++) sum+=$i; print sum}' <file>` — Sum all fields on each row (row totals).

```bash
awk '{sum=0; for(i=1;i<=NF;i++) sum+=$i; print $0, sum}' matrix.txt
```

`awk 'NR%<n>==0' <file>` — Print every Nth line.

```bash
awk 'NR%5==0' data.txt
```

`awk '{$1=""; print substr($0,2)}' <file>` — Remove the first field and print the rest of the line.

```bash
awk '{$1=""; print substr($0,2)}' data.txt
```

## Conclusion

awk shines the moment your data has columns: `awk '{print $2}'`, a quick `sum += $3` in an `END` block, or a one-line frequency table with `count[$1]++` often replaces a whole script. Start with `-F` to set the field separator, lean on `NR`/`NF` for line and field counts, and remember that awk reads input without modifying your files – its output goes to stdout, so redirect deliberately when you write results back. A common gotcha: awk regex literals have no `/i` flag, so for case-insensitive matching use gawk's `IGNORECASE=1` or `tolower()`. For anything beyond a few lines, GNU awk (gawk) is the most capable and widely available implementation, though the default `awk` on some systems is mawk or BSD awk, which lack gawk-only extensions such as `IGNORECASE`.

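The case-insensitivity gotcha can be sketched with the portable `tolower()` approach, which should behave the same in gawk, mawk and BSD awk. The sample log lines and the `log` and `matches` variable names are made up for illustration; they are not from the original cheatsheet.

```bash
# Hypothetical sample data: a temporary log with mixed-case "error" entries.
log=$(mktemp)
printf '%s\n' 'ok start' 'Error disk full' 'ERROR net down' 'warning slow' > "$log"

# Lowercase the whole line, then match it against a lowercase regex.
# The original line is printed unchanged because only the comparison is lowercased.
matches=$(awk 'tolower($0) ~ /error/' "$log")
printf '%s\n' "$matches"

# Clean up the temporary file.
rm -f "$log"
```

To restrict the match to one column, apply `tolower()` to that field instead, for example `tolower($3) ~ /error/`.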
## Further Reading

- [GNU awk (gawk) manual](https://www.gnu.org/software/gawk/manual/gawk.html) – the comprehensive reference for the language
- [The GNU Awk User's Guide – One-liners](https://www.gnu.org/software/gawk/manual/gawk.html#One_002dliners) – practical, ready-to-use awk snippets

## Related Commands

- [sed](https://www.jpkc.com/db/en/cheatsheets/files-text/sed/) – stream editor for substitution and line edits
- [grep](https://www.jpkc.com/db/en/cheatsheets/files-text/grep/) – fast pattern search to pre-filter lines for awk
- [cut](https://www.jpkc.com/db/en/cheatsheets/files-text/cut/) – lightweight field extraction for simple column jobs