# awk — Field-Based Text Processing and Reporting

> Process columnar text with awk — extract fields, filter by pattern, aggregate with arrays and format output. A compact language for data and logs.

Source: https://www.jpkc.com/db/en/cheatsheets/files-text/awk/

<!-- PROSE:intro -->
awk is a small but complete programming language built for column-based text. It reads input line by line, splits each line into fields (`$1`, `$2`, … `$NF`) and runs your program against every record – making it the natural choice for log files, CSV/TSV data and tabular command output. Where grep finds lines and sed rewrites them, awk lets you select by field, compute sums and averages, build frequency tables with associative arrays and print neatly formatted reports. This guide covers the essentials: fields and separators, pattern matching, conditions, `BEGIN`/`END` blocks, arithmetic, string functions and the recipes you reach for again and again.
<!-- PROSE:intro:end -->

## Basic Usage

`awk '{print}' <file>` — Print every line of the file (like cat).

```bash
awk '{print}' data.txt
```

`awk '{print $0}' <file>` — Print each entire line. $0 represents the whole line.

```bash
awk '{print $0}' data.txt
```

`awk '{print $1}' <file>` — Print the first field (column) of each line. Fields are split by whitespace.

```bash
awk '{print $1}' access.log
```

`awk '{print $1, $3}' <file>` — Print specific fields. The comma inserts the output field separator (a space by default).

```bash
awk '{print $1, $3}' data.txt
```

`awk '{print $NF}' <file>` — Print the last field of each line. NF is the number of fields.

```bash
awk '{print $NF}' access.log
```

`awk '{print $(NF-1)}' <file>` — Print the second-to-last field of each line.

```bash
awk '{print $(NF-1)}' data.txt
```

`awk 'NR==<n>' <file>` — Print only a specific line number. NR is the current record (line) number.

```bash
awk 'NR==5' data.txt
```

`awk 'NR>=<n> && NR<=<m>' <file>` — Print a range of lines.

```bash
awk 'NR>=10 && NR<=20' data.txt
```
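
To see how `$1`, `$NF` and `NR` relate, the following sketch runs two of the one-liners above against three made-up records. The sample data and the variable names `sample`, `fields` and `second` are invented for illustration.

```bash
# Hypothetical sample input: three whitespace-separated records.
sample='alice 30 berlin
bob 25 paris
carol 41 rome'

# Record number, first field and last field of every record.
fields=$(printf '%s\n' "$sample" | awk '{print NR, $1, $NF}')
echo "$fields"

# Only the second record (a pattern without an action prints the line).
second=$(printf '%s\n' "$sample" | awk 'NR==2')
echo "$second"
```

Feeding the data through `printf` keeps the example self-contained; in practice the file name would simply follow the awk program.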

## Field Separator

`awk -F'<sep>' '{print $1}' <file>` — Set a custom input field separator.

```bash
awk -F',' '{print $1}' data.csv
```

`awk -F':' '{print $1, $3}' <file>` — Use colon as field separator (useful for /etc/passwd).

```bash
awk -F':' '{print $1, $3}' /etc/passwd
```

`awk -F'\t' '{print $2}' <file>` — Use tab as field separator for TSV files.

```bash
awk -F'\t' '{print $2}' data.tsv
```

`awk -F'[,;:]' '{print $1}' <file>` — Use a regex as field separator (match any of the characters).

```bash
awk -F'[,;:]' '{print $1}' mixed.txt
```

`awk 'BEGIN{FS=","; OFS="\t"} {print $1, $2}' <file>` — Set input field separator (FS) and output field separator (OFS).

```bash
awk 'BEGIN{FS=","; OFS="\t"} {print $1, $2}' data.csv
```

`awk -v OFS=',' '{$1=$1; print}' <file>` — Change the output separator. The $1=$1 trick forces awk to rebuild the line.

```bash
awk -v OFS=',' '{$1=$1; print}' whitespace.txt
```
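
The `$1=$1` idiom is easier to trust once you see what happens without it. A minimal sketch, assuming a made-up input line with irregular spacing:

```bash
# Hypothetical input with irregular spacing between fields.
line='a   b     c'

# Without a field assignment, $0 is printed unchanged and OFS is ignored.
untouched=$(printf '%s\n' "$line" | awk -v OFS=',' '{print}')

# Assigning to a field (even to itself) makes awk rejoin the fields with OFS.
rebuilt=$(printf '%s\n' "$line" | awk -v OFS=',' '{$1=$1; print}')

echo "$untouched"
echo "$rebuilt"
```

The first command echoes the line as it came in; the second prints the three fields joined by commas.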

## Pattern Matching

`awk '/<pattern>/' <file>` — Print lines matching a regex pattern (like grep).

```bash
awk '/ERROR/' log.txt
```

`awk '!/<pattern>/' <file>` — Print lines NOT matching a pattern (like grep -v).

```bash
awk '!/^#/' config.ini
```

`awk '$<n> ~ /<pattern>/' <file>` — Match a pattern against a specific field.

```bash
awk '$3 ~ /error/' log.txt
```

`awk '$<n> !~ /<pattern>/' <file>` — Print lines where a specific field does NOT match a pattern.

```bash
awk '$1 !~ /^192\.168/' access.log
```

`awk '/<start>/,/<end>/' <file>` — Print lines between two patterns (inclusive range).

```bash
awk '/BEGIN/,/END/' config.txt
```

`awk '/<pattern>/ {print $<n>}' <file>` — Print a specific field only from lines matching a pattern.

```bash
awk '/GET/ {print $7}' access.log
```
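
Range patterns include both boundary lines. When only the lines in between are wanted, a flag variable is a common alternative. The sketch below compares the two on invented data; the markers `START` and `STOP` and the variable `on` are illustrative names, not anything awk defines.

```bash
# Hypothetical input with a delimited block.
config='before
START
inside 1
inside 2
STOP
after'

# The range is inclusive: both marker lines are printed.
with_markers=$(printf '%s\n' "$config" | awk '/START/,/STOP/')

# A flag switched off before printing and on after it skips the markers.
inner=$(printf '%s\n' "$config" | awk '/STOP/ {on=0} on; /START/ {on=1}')

echo "$with_markers"
echo "$inner"
```

The order of the three rules matters: the flag is cleared before the print rule sees the closing marker and set only after the print rule has seen the opening one.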

## Conditions & Comparisons

`awk '$<n> == "<value>"' <file>` — Print lines where a field equals a specific string.

```bash
awk '$3 == "ERROR"' log.txt
```

`awk '$<n> != "<value>"' <file>` — Print lines where a field does not equal a value.

```bash
awk '$1 != "localhost"' hosts.txt
```

`awk '$<n> > <value>' <file>` — Print lines where a numeric field exceeds a threshold.

```bash
awk '$5 > 1000' data.txt
```

`awk '$<n> > <value> && $<m> < <value>' <file>` — Combine multiple conditions with && (AND).

```bash
awk '$3 > 100 && $4 < 500' sales.txt
```

`awk '$<n> > <value> || $<m> == "<value>"' <file>` — Combine conditions with || (OR).

```bash
awk '$5 > 1000 || $3 == "CRITICAL"' log.txt
```

`awk 'NF > 0' <file>` — Print only lines with at least one field, skipping blank and whitespace-only lines.

```bash
awk 'NF > 0' messy.txt
```

`awk 'NF == <n>' <file>` — Print lines with exactly N fields.

```bash
awk 'NF == 4' data.txt
```

`awk 'length > <n>' <file>` — Print lines longer than N characters.

```bash
awk 'length > 80' source.py
```
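
Whether a comparison is numeric or textual depends on what the field is compared with: a numeric constant gives a numeric comparison for number-like fields, while a quoted constant forces a string comparison. A small sketch on made-up numbers shows how far the results can diverge:

```bash
# Hypothetical input: one number per line.
nums='9
10
100'

# Numeric constant: the field is compared as a number.
numeric=$(printf '%s\n' "$nums" | awk '$1 > 50')

# String constant: compared character by character, so "9" sorts after "50".
as_string=$(printf '%s\n' "$nums" | awk '$1 > "50"')

echo "numeric: $numeric"
echo "string:  $as_string"
```

Leave the quotes off thresholds unless a textual comparison is really intended.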

## BEGIN & END Blocks

`awk 'BEGIN {print "Header"} {print} END {print "Footer"}' <file>` — Execute code before processing (BEGIN) and after all lines (END).

```bash
awk 'BEGIN {print "Name\tScore"} {print $1"\t"$2} END {print "---"}' results.txt
```

`awk 'BEGIN {<init>} {<body>} END {<summary>}' <file>` — Classic awk structure: initialize, process each line, then summarize.

```bash
awk 'BEGIN {sum=0} {sum+=$1} END {print "Total:", sum}' numbers.txt
```

`awk 'END {print NR}' <file>` — Print the total number of lines in a file.

```bash
awk 'END {print NR}' data.txt
```

`awk 'END {print NR, "lines,", NF, "fields in last line"}' <file>` — Print summary statistics after processing all lines.

```bash
awk 'END {print NR, "lines,", NF, "fields in last line"}' access.log
```
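
Put together, the three block types make a small report. The sketch below uses invented sales figures and relies on the fact that an uninitialized awk variable starts out as zero, so the explicit `BEGIN {sum=0}` is optional.

```bash
# Hypothetical sales data: item and amount.
sales='widget 10
gadget 25
gizmo 7'

report=$(printf '%s\n' "$sales" | awk '
  BEGIN { print "ITEM AMOUNT" }          # runs once, before any input is read
        { print $1, $2; total += $2 }    # runs for every record
  END   { print "TOTAL", total }         # runs once, after the last record
')
echo "$report"
```

Splitting the program across lines inside the single quotes keeps longer awk programs readable without needing a separate script file.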

## Arithmetic & Aggregation

`awk '{sum += $<n>} END {print sum}' <file>` — Sum all values in a specific column.

```bash
awk '{sum += $3} END {print sum}' sales.txt
```

`awk '{sum += $<n>} END {if (NR > 0) print sum/NR}' <file>` — Calculate the average of a column. The `NR > 0` guard avoids a division by zero on empty input.

```bash
awk '{sum += $2} END {if (NR > 0) print "Average:", sum/NR}' scores.txt
```

`awk 'NR==1 || $<n> > max {max=$<n>} END {print max}' <file>` — Find the maximum value in a column. Seeding `max` from the first line keeps the result correct when every value is negative.

```bash
awk 'NR==1 || $3 > max {max=$3} END {print "Max:", max}' data.txt
```

`awk 'NR==1 || $<n> < min {min=$<n>} END {print min}' <file>` — Find the minimum value in a column.

```bash
awk 'NR==1 || $2 < min {min=$2} END {print "Min:", min}' data.txt
```

`awk '{count[$<n>]++} END {for (k in count) print k, count[k]}' <file>` — Count occurrences of each unique value in a column (frequency table).

```bash
awk '{count[$1]++} END {for (ip in count) print ip, count[ip]}' access.log
```

`awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}' <file>` — Sum values grouped by a key column (like SQL GROUP BY).

```bash
awk '{sum[$1] += $3} END {for (dept in sum) print dept, sum[dept]}' expenses.txt
```
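
The order in which `for (k in array)` visits keys is unspecified, so frequency tables usually go through `sort` before anyone reads them. A minimal sketch, assuming a made-up log excerpt with the client address in the first field:

```bash
# Hypothetical access log excerpt: client IP in the first field.
log='10.0.0.1 GET /
10.0.0.2 GET /a
10.0.0.1 GET /b
10.0.0.3 GET /
10.0.0.1 GET /c
10.0.0.2 GET /d'

# Count per IP, then order by count, highest first.
# The second sort key breaks ties so the result is deterministic.
top=$(printf '%s\n' "$log" |
  awk '{count[$1]++} END {for (ip in count) print count[ip], ip}' |
  sort -k1,1nr -k2,2)
echo "$top"
```

Printing the count first keeps the `sort` invocation simple, because the numeric key is then always in column one.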

## String Functions

`awk '{print length($0)}' <file>` — Print the length of each line.

```bash
awk '{print length($0), $0}' data.txt
```

`awk '{print toupper($0)}' <file>` — Convert each line to uppercase.

```bash
awk '{print toupper($0)}' input.txt
```

`awk '{print tolower($0)}' <file>` — Convert each line to lowercase.

```bash
awk '{print tolower($0)}' input.txt
```

`awk '{gsub(/<pattern>/, "<replacement>"); print}' <file>` — Global substitution on each line (like sed s///g).

```bash
awk '{gsub(/foo/, "bar"); print}' input.txt
```

`awk '{sub(/<pattern>/, "<replacement>"); print}' <file>` — Replace only the first occurrence on each line.

```bash
awk '{sub(/^[ \t]+/, ""); print}' messy.txt
```

`awk '{print substr($0, <start>, <length>)}' <file>` — Extract a substring from each line (1-indexed start position).

```bash
awk '{print substr($0, 1, 10)}' data.txt
```

`awk '{n=split($0, arr, "<sep>"); print arr[1]}' <file>` — Split a string into an array by a separator. Returns the number of elements.

```bash
awk '{n=split($0, parts, ","); print parts[2]}' data.csv
```

`awk 'match($0, /<pattern>/) {print substr($0, RSTART, RLENGTH)}' <file>` — Extract the matched portion of a regex. Sets RSTART and RLENGTH.

```bash
awk 'match($0, /[0-9]+\.[0-9]+/) {print substr($0, RSTART, RLENGTH)}' data.txt
```
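
`sub` and `gsub` accept an optional third argument naming the string to change, and both return the number of replacements made. The following sketch restricts a substitution to one field; the input line is made up.

```bash
# Hypothetical record: the letter "o" occurs in both fields.
line='foo boo'

# Only the second field is modified; n receives the replacement count.
out=$(printf '%s\n' "$line" | awk '{n = gsub(/o/, "0", $2); print n, $0}')
echo "$out"
```

Because a field was modified, awk rebuilds `$0` with the output field separator before printing it.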

## Formatted Output

`awk '{printf "%-20s %s\n", $1, $2}' <file>` — Print with formatted, aligned columns using printf.

```bash
awk '{printf "%-20s %10s\n", $1, $2}' data.txt
```

`awk '{printf "%05d %s\n", NR, $0}' <file>` — Print line numbers zero-padded to 5 digits.

```bash
awk '{printf "%05d %s\n", NR, $0}' script.sh
```

`awk '{printf "%.2f\n", $1}' <file>` — Format numbers with 2 decimal places.

```bash
awk '{printf "$%.2f\n", $3}' prices.txt
```

`awk '{printf "%s,%s,%s\n", $1, $2, $3}' <file>` — Convert whitespace-separated data to CSV.

```bash
awk '{printf "%s,%s,%s\n", $1, $2, $3}' data.txt
```

`awk -v OFS='\t' '{$1=$1; print}' <file>` — Convert any whitespace separation to tab-separated output.

```bash
awk -v OFS='\t' '{$1=$1; print}' data.txt
```
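
Combining a `BEGIN` header with matching `printf` widths gives an aligned table. This is a sketch on invented data; the column widths (12, 8 and 5) are arbitrary choices that happen to fit the sample.

```bash
# Hypothetical inventory: item, unit price, quantity.
data='apple 1.5 12
kiwi 0.25 140
watermelon 3 2'

table=$(printf '%s\n' "$data" | awk '
  BEGIN { printf "%-12s %8s %5s\n", "ITEM", "PRICE", "QTY" }   # header row
        { printf "%-12s %8.2f %5d\n", $1, $2, $3 }             # data rows
')
echo "$table"
```

Using the same widths in the header and in the data format is what keeps the columns lined up; `%-12s` left-aligns text, while the numeric columns stay right-aligned.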

## Variables & Assignment

`awk -v <var>=<value> '{print <var>, $1}' <file>` — Pass an external variable into the awk program.

```bash
awk -v threshold=100 '$3 > threshold {print}' data.txt
```

`awk -v var="$SHELL_VAR" '{print var, $0}' <file>` — Pass a shell variable into awk.

```bash
awk -v user="$USER" '{print user, $0}' log.txt
```

`awk '{$<n> = "<value>"; print}' <file>` — Replace a specific field value and print the modified line.

```bash
awk '{$2 = "REDACTED"; print}' users.txt
```

`awk '{$(NF+1) = "<value>"; print}' <file>` — Append a new field to the end of each line.

```bash
awk -v OFS=',' '{$(NF+1) = "new_col"; print}' data.csv
```
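
One detail worth knowing about `-v`: awk interprets backslash escape sequences in the assigned value. When a shell value has to arrive byte for byte, reading it from the `ENVIRON` array is an alternative. A sketch under the assumption of a made-up Windows-style path; `VALUE` is a hypothetical environment variable name.

```bash
# Hypothetical value containing a backslash sequence.
value='C:\temp'

# -v interprets escape sequences, so \t turns into a tab character.
via_v=$(awk -v v="$value" 'BEGIN {print v}')

# ENVIRON hands the value over untouched.
via_env=$(VALUE="$value" awk 'BEGIN {print ENVIRON["VALUE"]}')

printf '%s\n' "$via_v" "$via_env"
```

For plain values such as numbers or user names, `-v` remains the simpler choice.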

## Arrays & Deduplication

`awk '!seen[$0]++' <file>` — Remove duplicate lines while preserving order (like sort -u but without sorting).

```bash
awk '!seen[$0]++' list.txt
```

`awk '!seen[$<n>]++' <file>` — Remove duplicates based on a specific field.

```bash
awk '!seen[$1]++' data.txt
```

`awk '{a[$1]+=$2} END {for (k in a) print k, a[k]}' <file>` — Aggregate values by key using an associative array.

```bash
awk -F',' '{a[$1]+=$2} END {for (k in a) print k, a[k]}' sales.csv
```

`awk '{a[$1]++} END {for (k in a) if (a[k]>1) print k, a[k]}' <file>` — Find and print only duplicate entries.

```bash
awk '{a[$1]++} END {for (k in a) if (a[k]>1) print k, a[k]}' access.log
```

`awk '{a[NR]=$0} END {for (i=NR; i>=1; i--) print a[i]}' <file>` — Reverse the order of lines (like tac).

```bash
awk '{a[NR]=$0} END {for (i=NR; i>=1; i--) print a[i]}' data.txt
```
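
The `seen` idiom also works with more than one field: a comma inside the subscript combines several values into a single key. A minimal sketch that keeps the first line for each user and role pair, using invented records:

```bash
# Hypothetical records: user, role, date.
data='alice admin 2024-01-01
alice admin 2024-02-01
alice guest 2024-03-01
bob admin 2024-04-01'

# Keep only the first line for each (user, role) combination.
first_seen=$(printf '%s\n' "$data" | awk '!seen[$1, $2]++')
echo "$first_seen"
```

The second `alice admin` line is dropped, while `alice guest` survives because its key differs in the second field.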

## Multi-File & Pipelines

`<command> | awk '{print $<n>}'` — Extract a specific column from command output.

```bash
ps aux | awk '{print $1, $11}'
```

`<command> | awk 'NR>1 {print $<n>}'` — Extract a column from command output, skipping the header line.

```bash
df -h | awk 'NR>1 {print $1, $5}'
```

`awk 'FNR==1 {print "--- " FILENAME " ---"} {print}' <file1> <file2>` — Process multiple files with a separator. FNR resets per file, FILENAME holds the name.

```bash
awk 'FNR==1 {print "--- " FILENAME " ---"} {print}' *.log
```

`awk '{print > ("output_" $<n> ".txt")}' <file>` — Split a file into multiple files based on a field value. The parentheses keep the file name expression unambiguous across awk implementations.

```bash
awk -F',' '{print > ("dept_" $1 ".csv")}' employees.csv
```
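
Each distinct output file stays open until awk exits, which can hit the open-file limit when the key column has many values. One way around this, sketched here on invented data in a temporary directory, is to append and then close the file after every write.

```bash
# Work in a throwaway directory so the sketch leaves nothing behind.
workdir=$(mktemp -d)

# Hypothetical input: department in the first CSV field.
printf '%s\n' 'sales,ann' 'ops,bob' 'sales,cy' > "$workdir/employees.csv"

# Build the file name once, append to it, then close it again.
# ">>" matters here: reopening with ">" after close() would truncate the file.
awk -F',' -v dir="$workdir" '{
  out = dir "/dept_" $1 ".csv"
  print >> out
  close(out)
}' "$workdir/employees.csv"

sales_rows=$(cat "$workdir/dept_sales.csv")
echo "$sales_rows"
rm -r "$workdir"
```

Since the files are opened in append mode, running the command twice doubles their contents; remove old output first if the split has to be repeatable.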

`awk '{print | "sort"}' <file>` — Pipe awk output to an external command.

```bash
awk '{print $1 | "sort | uniq -c | sort -rn"}' access.log
```

## Common Recipes

`awk -F',' 'NR==1 {for (i=1;i<=NF;i++) header[i]=$i} NR>1 {for (i=1;i<=NF;i++) print header[i]": "$i; print ""}' <file>` — Display CSV data vertically with column headers as labels.

```bash
awk -F',' 'NR==1 {for (i=1;i<=NF;i++) h[i]=$i} NR>1 {for (i=1;i<=NF;i++) print h[i]": "$i; print ""}' users.csv
```

`awk '{for (i=NF; i>0; i--) printf "%s ", $i; printf "\n"}' <file>` — Reverse the field order on each line.

```bash
awk '{for (i=NF; i>0; i--) printf "%s ", $i; printf "\n"}' data.txt
```

`awk 'NR==FNR {a[$1]; next} $1 in a' <file1> <file2>` — Print lines from file2 where the first field exists in file1 (like a join/lookup).

```bash
awk 'NR==FNR {a[$1]; next} $1 in a' ids.txt data.txt
```
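
The same two-file pattern can carry a value across instead of only testing membership. The sketch below looks up a name for each id; the files `names.txt` and `scores.txt` and their contents are hypothetical.

```bash
# Create two small hypothetical input files in a throwaway directory.
workdir=$(mktemp -d)
printf '%s\n' '1 alice' '2 bob' > "$workdir/names.txt"
printf '%s\n' '2 30' '3 99' '1 12' > "$workdir/scores.txt"

# First file: remember the name for each id.
# Second file: print id, name and score for ids seen in the first file.
joined=$(awk 'NR==FNR {name[$1] = $2; next} $1 in name {print $1, name[$1], $2}' \
  "$workdir/names.txt" "$workdir/scores.txt")
echo "$joined"
rm -r "$workdir"
```

`NR==FNR` is only true while the first file is being read, which is what separates the loading phase from the lookup phase. The row with id 3 is dropped because no name was loaded for it.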

`awk '{sum=0; for(i=1;i<=NF;i++) sum+=$i; print sum}' <file>` — Sum all fields on each row (row totals).

```bash
awk '{sum=0; for(i=1;i<=NF;i++) sum+=$i; print $0, sum}' matrix.txt
```

`awk 'NR%<n>==0' <file>` — Print every Nth line.

```bash
awk 'NR%5==0' data.txt
```

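`awk '{$1=""; print substr($0,2)}' <file>` — Remove the first field and print the rest of the line. Assigning to `$1` rebuilds the line, so runs of whitespace between the remaining fields collapse to single spaces.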

```bash
awk '{$1=""; print substr($0,2)}' data.txt
```

<!-- PROSE:outro -->
## Conclusion

awk shines the moment your data has columns: `awk '{print $2}'`, a quick `sum += $3` with the total printed in an `END` block, or a one-line frequency table with `count[$1]++` often replaces a whole script. Start with `-F` to set the field separator, lean on `NR`/`NF` for line and field counts, and remember that awk reads input without modifying your files – its output goes to stdout, so redirect deliberately when you write results back. A common gotcha: awk regex literals have no `/i` flag, so for case-insensitive matching use gawk's `IGNORECASE=1` or the portable `tolower()`. For anything beyond a few lines, GNU awk (gawk) is the most feature-rich implementation, but the default `awk` on some systems is mawk or BWK awk, so stick to POSIX features when a script has to run everywhere.
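
The portable route to case-insensitive matching is to lowercase the text before comparing it with a lowercase pattern. A minimal sketch on made-up log lines:

```bash
# Hypothetical log lines with mixed capitalization.
log='Error: disk full
INFO: ok
fatal ERROR in module'

# Lowercase a copy of the line for the test; the original line is printed.
matches=$(printf '%s\n' "$log" | awk 'tolower($0) ~ /error/')
echo "$matches"
```

Only the comparison sees the lowercased copy, so the matching lines come out with their original capitalization.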

## Further Reading

- [GNU awk (gawk) manual](https://www.gnu.org/software/gawk/manual/gawk.html) – the comprehensive reference for the language
- [The GNU Awk User's Guide – One-liners](https://www.gnu.org/software/gawk/manual/gawk.html#One_002dliners) – practical, ready-to-use awk snippets
<!-- PROSE:outro:end -->

## Related Commands

- [sed](https://www.jpkc.com/db/en/cheatsheets/files-text/sed/) – stream editor for substitution and line edits
- [grep](https://www.jpkc.com/db/en/cheatsheets/files-text/grep/) – fast pattern search to pre-filter lines for awk
- [cut](https://www.jpkc.com/db/en/cheatsheets/files-text/cut/) – lightweight field extraction for simple column jobs

