# Log Analysis — Inspecting Web Server Logs with Built-in Tools

> Practical recipes for analyzing Apache and nginx logs with grep, awk, sort and uniq — top IPs, status codes, bandwidth and real-time monitoring.

Source: https://www.jpkc.com/db/en/cheatsheets/web-servers/log-analysis/

<!-- PROSE:intro -->
Before you fire up GoAccess or a full log stack for a quick lookup, the tools already on your system usually get you there: `grep`, `awk`, `sort`, `uniq` and `tail -f` answer most questions about Apache and nginx logs in seconds. This recipe collection shows you how to find top IPs, check the distribution of status codes, surface the most-requested URLs, sum up bandwidth and watch errors in real time. Keep in mind that the field indices (`$1`, `$7`, `$9` …) depend on your actual log format — the examples assume the Combined Log Format. For ad-hoc analysis these one-liners are more than enough; only recurring reports or a dashboard make a dedicated tool worthwhile.
<!-- PROSE:intro:end -->
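
To make the field indices concrete, here is a minimal sketch that splits one entry the way the recipes below do. The log line is a made-up sample in Combined Log Format, not output from a real server, and the variable names are only illustrative.

```bash
# Hypothetical sample entry in Combined Log Format; all values are made up.
line='203.0.113.7 - - [16/Feb/2026:14:05:33 +0000] "GET /index.html HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0"'

# Whitespace-split fields: $1 client IP, $4 timestamp (with leading "["),
# $7 request path, $9 status code, $10 response size in bytes.
fields=$(printf '%s\n' "$line" | awk '{print $1, $4, $7, $9, $10}')
echo "$fields"
```

If your format puts extra fields in front of the request (a virtual host name, for instance), every index after that point shifts accordingly.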

## Quick Inspection

`wc -l <file>` — Count the total number of lines in a log file.

```bash
wc -l access.log
```

`head -n <count> <file>` — View the first N lines to understand the log format.

```bash
head -n 20 /var/log/syslog
```

`tail -n <count> <file>` — View the most recent N log entries.

```bash
tail -n 50 /var/log/nginx/error.log
```

`less +F <file>` — Open a log file in follow mode (like tail -f but with scrollback). Press Ctrl+C to scroll, F to resume following.

```bash
less +F /var/log/syslog
```

`file <logfile>` — Detect if a log file is plain text, gzipped, or another format.

```bash
file /var/log/syslog.2.gz
```

`du -sh <file>` — Check the size of a log file before processing.

```bash
du -sh /var/log/nginx/access.log
```

## Filtering by Pattern

`grep '<pattern>' <file>` — Find all lines matching a pattern.

```bash
grep 'ERROR' /var/log/app.log
```

`grep -i '<pattern>' <file>` — Case-insensitive pattern search.

```bash
grep -i 'timeout' /var/log/app.log
```

`grep -E 'ERROR|WARN|FATAL' <file>` — Find lines matching any of multiple patterns.

```bash
grep -E 'ERROR|WARN|FATAL' /var/log/app.log
```

`grep -v '<pattern>' <file>` — Exclude lines matching a pattern (invert match).

```bash
grep -v 'healthcheck' access.log
```

`grep -v -e '<a>' -e '<b>' <file>` — Exclude lines matching multiple patterns.

```bash
grep -v -e 'bot' -e 'crawler' -e 'monitoring' access.log
```

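`grep -c '<pattern>' <file>` — Count how many lines match a pattern. Include surrounding characters to avoid accidental hits: a bare `500` also matches byte counts and URLs that contain those digits.

```bash
grep -c '" 500 ' access.log
```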

`grep -B <n> -A <n> '<pattern>' <file>` — Show context lines before and after each match.

```bash
grep -B 3 -A 5 'OutOfMemoryError' app.log
```

## Time-Based Filtering

`grep '<date>' <file>` — Filter log entries for a specific date.

```bash
grep '2026-02-16' /var/log/app.log
```

`grep '<date>.*<time_prefix>' <file>` — Filter entries for a specific date and hour.

```bash
grep '16/Feb/2026:14' access.log
```

`awk '$0 >= "<start>" && $0 <= "<end>"' <file>` — Extract log entries within a time range (works with ISO timestamps at line start).

```bash
awk '$0 >= "2026-02-16 10:00" && $0 <= "2026-02-16 12:00"' app.log
```
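
The range filter relies on plain string comparison, which is worth seeing on a few lines. The sketch below uses a made-up `app.log` excerpt and assumes every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp.

```bash
# Hypothetical application log with ISO-style timestamps at line start.
log=$(mktemp)
cat > "$log" <<'EOF'
2026-02-16 09:59:58 INFO warmup
2026-02-16 10:00:01 INFO job started
2026-02-16 11:30:12 ERROR job failed
2026-02-16 12:00:05 INFO job restarted
EOF

# Lexical comparison of the whole line against the two bounds.
in_range=$(awk '$0 >= "2026-02-16 10:00" && $0 <= "2026-02-16 12:00"' "$log")
echo "$in_range"
rm -f "$log"
```

Only the 10:00:01 and 11:30:12 lines are printed. The 12:00:05 line sorts after the string `2026-02-16 12:00`, so the upper bound behaves as exclusive; extend it (for example to `12:01`) if the final minute should be included.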

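`sed -n '/<start>/,/<end>/p' <file>` — Extract all lines between two timestamp patterns. Both patterns must actually occur in the file: without a start match nothing is printed, and without an end match `sed` prints through to the last line.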

```bash
sed -n '/2026-02-16 10:00/,/2026-02-16 12:00/p' app.log
```

`awk '/^<start>/,/^<end>/' <file>` — Extract a range of lines between two matching patterns with awk.

```bash
awk '/^2026-02-16 14:00/,/^2026-02-16 15:00/' app.log
```

`journalctl --since '<start>' --until '<end>'` — Query systemd journal for a specific time range.

```bash
journalctl --since '2026-02-16 10:00' --until '2026-02-16 12:00'
```

## Frequency & Counting

`sort <file> | uniq -c | sort -rn | head -<n>` — Count and rank the most frequent lines.

```bash
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
```

`awk '{print $<field>}' <file> | sort | uniq -c | sort -rn` — Count occurrences of a specific field (column).

```bash
awk '{print $9}' access.log | sort | uniq -c | sort -rn
```

`grep -oE '<pattern>' <file> | sort | uniq -c | sort -rn` — Extract and count specific patterns from log lines.

```bash
grep -oE '\b[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\b' access.log | sort | uniq -c | sort -rn | head
```

`awk '{count[$<field>]++} END {for (k in count) print count[k], k}' <file> | sort -rn` — Count field occurrences with awk (faster for large files, single pass).

```bash
awk '{count[$9]++} END {for (k in count) print count[k], k}' access.log | sort -rn
```
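
The `sort | uniq -c` chain and the single-pass awk array should produce the same ranking. A small sketch on made-up status codes (standing in for the output of `awk '{print $9}' access.log`) shows both side by side:

```bash
# Hypothetical status codes, one per line.
codes=$(printf '%s\n' 200 404 200 500 404 200)

# Variant 1: sort first so uniq -c can collapse adjacent duplicates.
by_sort=$(printf '%s\n' "$codes" | sort | uniq -c | sort -rn | awk '{print $1, $2}')

# Variant 2: count in an awk array in one pass, then rank the totals.
by_awk=$(printf '%s\n' "$codes" | awk '{count[$1]++} END {for (k in count) print count[k], k}' | sort -rn)

echo "$by_sort"
```

The trailing `awk '{print $1, $2}'` in the first variant only strips the padding that `uniq -c` adds, so the two results can be compared directly.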

`grep -c '<pattern>' <file1> <file2> <file3>` — Count matches per file across multiple log files.

```bash
grep -c 'ERROR' /var/log/app.log.*
```

## Apache/Nginx Access Logs

`awk '{print $9}' <file> | sort | uniq -c | sort -rn` — Count HTTP status codes (field 9 in Combined Log Format).

```bash
awk '{print $9}' access.log | sort | uniq -c | sort -rn
```

`awk '{print $1}' <file> | sort | uniq -c | sort -rn | head -<n>` — Find the top N client IP addresses by request count.

```bash
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
```

`awk '{print $7}' <file> | sort | uniq -c | sort -rn | head -<n>` — Find the most requested URLs.

```bash
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -20
```

`awk '$9 == <code>' <file>` — Filter requests by a specific HTTP status code.

```bash
awk '$9 == 404' access.log
```

`awk '$9 >= 500' <file>` — Find all server errors (5xx status codes).

```bash
awk '$9 >= 500' access.log
```

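`awk '$10 ~ /^[0-9]+$/ {print $10}' <file> | paste -sd+ - | bc` — Calculate total bytes transferred (field 10 in Combined Log Format). The numeric test skips the `-` that Apache logs for responses without a body, which would otherwise break `bc`; the example limits the sum to 200 responses.

```bash
awk '$9 == 200 && $10 ~ /^[0-9]+$/ {print $10}' access.log | paste -sd+ - | bc
```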

`awk '{sum += $10} END {printf "%.2f GB\n", sum/1073741824}' <file>` — Calculate total bandwidth in human-readable format.

```bash
awk '{sum += $10} END {printf "%.2f GB\n", sum/1073741824}' access.log
```
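
A byte count of `-` is easy to overlook when summing field 10. The following sketch runs the sum on three made-up entries, one of which is a 304 response with no body:

```bash
# Hypothetical access log excerpt; the 304 entry has "-" as its size.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.1 - - [16/Feb/2026:14:05:01 +0000] "GET / HTTP/1.1" 200 1024 "-" "curl/8.0"
198.51.100.2 - - [16/Feb/2026:14:05:02 +0000] "GET /a HTTP/1.1" 304 - "-" "curl/8.0"
198.51.100.1 - - [16/Feb/2026:14:05:03 +0000] "GET /b HTTP/1.1" 200 2048 "-" "curl/8.0"
EOF

# Add up only numeric sizes; "sum + 0" prints 0 instead of an empty line
# when nothing matched.
total=$(awk '$10 ~ /^[0-9]+$/ {sum += $10} END {print sum + 0}' "$log")
echo "$total bytes"
rm -f "$log"
```

awk itself treats a non-numeric field as 0 in arithmetic, so the explicit test mainly documents the intent and keeps the recipe safe to reuse with `bc`.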

`awk -F'"' '{print $6}' <file> | sort | uniq -c | sort -rn | head -<n>` — Find the top User-Agent strings (field 6 when splitting by quotes).

```bash
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head -10
```

`awk '$9 == 404 {print $7}' <file> | sort | uniq -c | sort -rn | head -<n>` — Find the most common 404 URLs.

```bash
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -20
```

## Requests per Time Period

`awk '{print substr($4, 2, 17)}' <file> | uniq -c` — Count requests per minute (Apache/Nginx Combined Log Format).

```bash
awk '{print substr($4, 2, 17)}' access.log | uniq -c
```

`awk '{print substr($4, 2, 14)}' <file> | uniq -c` — Count requests per hour.

```bash
awk '{print substr($4, 2, 14)}' access.log | uniq -c
```

`awk '{print substr($4, 2, 17)}' <file> | uniq -c | sort -rn | head -<n>` — Find the busiest minutes by request volume (traffic spikes).

```bash
awk '{print substr($4, 2, 17)}' access.log | uniq -c | sort -rn | head -10
```

`awk '$9 >= 500 {print substr($4, 2, 17)}' <file> | uniq -c | sort -rn | head -<n>` — Count server errors per minute and rank the worst minutes to identify error spikes.

```bash
awk '$9 >= 500 {print substr($4, 2, 17)}' access.log | uniq -c | sort -rn | head -10
```
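
The `substr($4, 2, 17)` call cuts the timestamp down to minute resolution. A short sketch on made-up entries shows what the buckets look like:

```bash
# Hypothetical access log excerpt: two requests in 14:05, one in 14:06.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.1 - - [16/Feb/2026:14:05:01 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"
198.51.100.2 - - [16/Feb/2026:14:05:40 +0000] "GET /a HTTP/1.1" 500 128 "-" "curl/8.0"
198.51.100.3 - - [16/Feb/2026:14:06:02 +0000] "GET /b HTTP/1.1" 200 256 "-" "curl/8.0"
EOF

# $4 is "[16/Feb/2026:14:05:01"; skip the "[" and keep 17 characters.
# The last awk swaps the columns to "minute count" and drops uniq's padding.
per_minute=$(awk '{print substr($4, 2, 17)}' "$log" | uniq -c | awk '{print $2, $1}')
echo "$per_minute"
rm -f "$log"
```

`uniq -c` only merges adjacent lines. That is fine for a single chronological log, but when several files are concatenated out of order, a `sort` in front of `uniq -c` keeps the buckets from being split.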

## Real-Time Monitoring

`tail -f <file>` — Follow a log file in real-time as new entries are appended.

```bash
tail -f /var/log/nginx/error.log
```

`tail -f <file> | grep --line-buffered '<pattern>'` — Follow a log and filter for a specific pattern in real-time.

```bash
tail -f /var/log/app.log | grep --line-buffered 'ERROR'
```

`tail -f <file1> <file2>` — Follow multiple log files simultaneously with filename headers.

```bash
tail -f /var/log/nginx/access.log /var/log/nginx/error.log
```

`tail -F <file>` — Follow with retry. Keeps following even after log rotation.

```bash
tail -F /var/log/app.log
```

`journalctl -f -u <service>` — Follow systemd journal output for a specific service.

```bash
journalctl -f -u nginx
```

`tail -f <file> | awk '$9 >= 400 {print}'` — Monitor access log and show only error responses in real-time.

```bash
tail -f access.log | awk '$9 >= 400 {print}'
```

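`tail -f <file> | while IFS= read -r line; do printf '%s %s\n' "$(date +%T)" "$line"; done` — Follow a log and prepend wall-clock timestamps to each line. `IFS=` and `-r` keep leading whitespace and backslashes in the log line intact.

```bash
tail -f app.log | while IFS= read -r line; do printf '%s %s\n' "$(date +%T)" "$line"; done
```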

## Compressed & Rotated Logs

`zcat <file.gz>` — View the full contents of a gzipped log file.

```bash
zcat /var/log/syslog.2.gz
```

`zgrep '<pattern>' <file.gz>` — Search inside gzipped log files without decompressing.

```bash
zgrep 'ERROR' /var/log/app.log.*.gz
```

`zless <file.gz>` — Browse a gzipped log file interactively with a pager.

```bash
zless /var/log/syslog.3.gz
```

`zcat <files.gz> | grep '<pattern>'` — Search across multiple compressed log files.

```bash
zcat /var/log/nginx/access.log.*.gz | grep '" 500 '
```

`cat <current> <(zcat <rotated.gz>) | grep '<pattern>'` — Search across both current and rotated compressed logs.

```bash
cat access.log <(zcat access.log.*.gz) | grep 'POST /api/login'
```
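
Process substitution (`<(...)`) is a bash and zsh feature. Where only a POSIX shell is available, a command group gives the same combined stream. The sketch below builds a throwaway directory with simplified, made-up log lines to show the idea:

```bash
# Hypothetical current and rotated logs in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' 'POST /api/login 200' 'GET / 200' > "$dir/access.log"
printf '%s\n' 'POST /api/login 401' 'GET /about 200' | gzip -c > "$dir/access.log.1.gz"

# gzip -dc is equivalent to zcat; { ...; } concatenates both outputs
# into one stream without relying on process substitution.
logins=$({ cat "$dir/access.log"; gzip -dc "$dir"/access.log.*.gz; } | grep -c 'POST /api/login')
echo "$logins"
rm -rf "$dir"
```

Here one login attempt comes from the current file and one from the rotated archive, so the count is 2.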

## Systemd Journal (journalctl)

`journalctl -u <service>` — Show all journal entries for a specific systemd service.

```bash
journalctl -u nginx
```

`journalctl -u <service> -n <count>` — Show the last N entries for a service.

```bash
journalctl -u mysql -n 50
```

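`journalctl -p <priority>` — Show entries at the given priority or more severe. Levels from most to least severe: emerg, alert, crit, err, warning, notice, info, debug, so `-p err` also includes crit, alert and emerg.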

```bash
journalctl -p err
```

`journalctl -p <priority> --since '<time>'` — Show entries at or above a given priority since a given time.

```bash
journalctl -p warning --since '1 hour ago'
```

`journalctl --since '<start>' --until '<end>'` — Query entries within a time range.

```bash
journalctl --since '2026-02-16 08:00' --until '2026-02-16 12:00'
```

`journalctl -u <service> --no-pager -o json-pretty` — Output journal entries as formatted JSON for further processing.

```bash
journalctl -u nginx --no-pager -o json-pretty | head -100
```

`journalctl --disk-usage` — Show how much disk space the journal is using.

`journalctl -k` — Show only kernel messages from the current boot (similar to `dmesg`).

```bash
journalctl -k -p err
```

## Multi-Line Log Entries

`grep -A <n> '<pattern>' <file>` — Show a fixed number of lines after each match (e.g. stack traces).

```bash
grep -A 20 'Exception' /var/log/app.log
```

`awk '/<start>/{found=1} found; /<end>/{found=0}' <file>` — Extract blocks between a start and end pattern.

```bash
awk '/BEGIN STACKTRACE/{found=1} found; /END STACKTRACE/{found=0}' app.log
```

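`grep -Pzo '(?s)<start>.*?<end>' <file> | tr '\0' '\n'` — Extract multi-line blocks using Perl-compatible regex (GNU grep). With `-z`, each match is NUL-terminated, so `tr` turns the separators back into newlines.

```bash
grep -Pzo '(?s)Exception.*?\n\n' app.log | tr '\0' '\n'
```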

`awk '/^[0-9]{4}-/{if(buf && buf ~ /<pattern>/) print buf; buf=$0; next} {buf=buf ORS $0} END {if(buf ~ /<pattern>/) print buf}' <file>` — Collect multi-line entries starting with a timestamp and filter by pattern.

```bash
awk '/^[0-9]{4}-/{if(buf && buf ~ /ERROR/) print buf; buf=$0; next} {buf=buf ORS $0} END {if(buf ~ /ERROR/) print buf}' app.log
```
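
The buffering logic is easier to follow when spread over several lines. This sketch applies it to a made-up log in which one ERROR entry carries a two-line stack trace. It spells out the year as four `[0-9]` classes as a conservative choice, on the assumption that not every awk build accepts the `{4}` interval syntax.

```bash
# Hypothetical application log; continuation lines are indented.
log=$(mktemp)
cat > "$log" <<'EOF'
2026-02-16 10:00:01 INFO request handled
2026-02-16 10:00:02 ERROR unhandled exception
    at app.handler (handler.js:42)
    at app.router (router.js:17)
2026-02-16 10:00:03 INFO request handled
EOF

entries=$(awk '
  # A line starting with a four-digit year opens a new entry:
  # flush the previous entry if it mentions ERROR, then start over.
  /^[0-9][0-9][0-9][0-9]-/ {
    if (buf ~ /ERROR/) print buf
    buf = $0
    next
  }
  # Any other line is a continuation of the current entry.
  { buf = buf ORS $0 }
  # Flush the final entry as well.
  END { if (buf ~ /ERROR/) print buf }
' "$log")
echo "$entries"
rm -f "$log"
```

The result is the ERROR line together with its two indented continuation lines, while both INFO entries are dropped.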

## Field Extraction with awk

`awk '{print $<n>}' <file>` — Print a specific field (column) from each log line.

```bash
awk '{print $1}' access.log
```

`awk -F'<sep>' '{print $<n>}' <file>` — Split lines by a custom field separator and extract a field.

```bash
awk -F'|' '{print $3}' app.log
```

`awk -F'"' '{print $2}' <file>` — Extract the request line from Apache/Nginx Combined Log Format.

```bash
awk -F'"' '{print $2}' access.log
```
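
Splitting on double quotes is what keeps multi-word values together. A minimal sketch with a made-up entry shows which quote-delimited field holds what:

```bash
# Hypothetical sample entry; the User-Agent contains spaces.
line='203.0.113.7 - - [16/Feb/2026:14:05:33 +0000] "GET /index.html HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"'

# With -F'"': $2 is the request line, $4 the referer, $6 the User-Agent.
request=$(printf '%s\n' "$line" | awk -F'"' '{print $2}')
agent=$(printf '%s\n' "$line" | awk -F'"' '{print $6}')

echo "$request"
echo "$agent"
```

With the default whitespace splitting, the same User-Agent would be scattered across four separate fields.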

`awk '{print $NF}' <file>` — Print the last field of each line.

```bash
awk '{print $NF}' access.log
```

`awk '{for(i=<start>;i<=NF;i++) printf "%s ", $i; print ""}' <file>` — Print all fields from field N to end of line (skip leading columns).

```bash
awk '{for(i=4;i<=NF;i++) printf "%s ", $i; print ""}' /var/log/syslog
```

## Sorting & Deduplication

`sort <file> | uniq` — Remove duplicate lines from sorted output.

```bash
awk '{print $7}' access.log | sort | uniq
```

`sort -u <file>` — Sort and remove duplicates in one step.

```bash
awk '{print $1}' access.log | sort -u
```

`sort <file> | uniq -c | sort -rn` — Count occurrences and sort by frequency (descending).

```bash
awk '{print $7}' access.log | sort | uniq -c | sort -rn
```

`sort <file> | uniq -d` — Show only lines that appear more than once (duplicates only).

```bash
awk '{print $1}' access.log | sort | uniq -d
```

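`sort -t'<sep>' -k<field> -rn <file>` — Sort by a specific field numerically in reverse order. The example swaps `-n` for `-h` to compare human-readable sizes and uses bash's `$'\t'` to pass a literal tab, because `sort` rejects the two-character string `'\t'` as a separator.

```bash
du -sh /var/log/*.log | sort -t$'\t' -k1,1 -rh
```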

## Practical Recipes

`awk '{ip[$1]++} END {for (k in ip) if (ip[k]>100) print ip[k], k}' access.log` — Find IP addresses with more than 100 requests (potential abuse).

`awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -20` — Top 20 URLs returning 404 errors.

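`awk -F'"' '$2 ~ /^POST / {split($2, req, " "); print req[2]}' access.log | sort | uniq -c | sort -rn` — Count POST requests grouped by URL. Anchoring the match avoids counting other requests whose path merely contains "POST".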

`grep 'ERROR' app.log | awk '{print $1, $2}' | cut -d: -f1 | uniq -c` — Count errors grouped by hour for trend analysis (assumes `YYYY-MM-DD HH:MM:SS` timestamps at line start).
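
As a sketch of hourly grouping, the following runs on a made-up log and assumes each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp. Cutting at the first colon leaves the date and the hour as the grouping key.

```bash
# Hypothetical application log: two errors in hour 10, one in hour 11.
log=$(mktemp)
cat > "$log" <<'EOF'
2026-02-16 10:05:11 ERROR db timeout
2026-02-16 10:47:30 ERROR db timeout
2026-02-16 10:59:02 INFO recovered
2026-02-16 11:02:45 ERROR disk full
EOF

# Keep date and time, cut everything from the first ":" onward,
# count adjacent duplicates, then print "date hour count".
per_hour=$(grep 'ERROR' "$log" | awk '{print $1, $2}' | cut -d: -f1 | uniq -c | awk '{print $2, $3, $1}')
echo "$per_hour"
rm -f "$log"
```

Keeping the first two colon-separated parts instead (`-f1,2`) would group by minute rather than by hour.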

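`diff <(grep 'ERROR' app.log.1) <(grep 'ERROR' app.log)` — Compare error lines between the rotated log and the current one. Lines that carry timestamps will all differ, so the comparison is most useful after stripping the timestamp fields first (for example with `cut -d' ' -f3-`).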

`awk 'NR==1{start=$0} END{print "First:", start; print "Last:", $0}' <file>` — Show the first and last log entry to determine the time span of a log file.

```bash
awk 'NR==1{start=$0} END{print "First:", start; print "Last:", $0}' access.log
```

`awk '{total+=$10; count++} END {printf "Avg: %.0f bytes (%d requests)\n", total/count, count}' access.log` — Calculate the average response size across all requests.

`awk '$NF > <seconds>' <file>` — Find slow requests where the last field contains response time.

```bash
awk '$NF > 5.0' access.log
```
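
This recipe only works if the log format actually ends with a timing value. The sketch below assumes a hypothetical format that appends the request duration in seconds after the User-Agent, which is not part of the standard Combined Log Format.

```bash
# Hypothetical entries: Combined Log Format plus a trailing duration field.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.1 - - [16/Feb/2026:14:05:01 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/8.0" 0.120
198.51.100.2 - - [16/Feb/2026:14:05:09 +0000] "GET /report HTTP/1.1" 200 90210 "-" "curl/8.0" 7.532
EOF

# $NF is the last field; print path and duration for requests over 5 seconds.
slow=$(awk '$NF > 5.0 {print $7, $NF}' "$log")
echo "$slow"
rm -f "$log"
```

If the last field is wrapped in quotes or is not a number, awk falls back to string comparison and the filter no longer means "slower than N seconds".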

<!-- PROSE:outro -->
## Conclusion

For ad-hoc analysis you do not need a heavy toolkit: `grep`, `awk` and the familiar `sort | uniq -c | sort -rn` chain reliably cover top IPs, status codes, URLs and bandwidth, while `tail -f` turns the same data into live monitoring. Once those same reports become a daily task or you want a dashboard, though, it pays to move to a dedicated tool such as GoAccess. Either way, remember to adapt the field indices to your actual log format.

## Further Reading

- [GoAccess](https://goaccess.io/) – interactive real-time log analyzer for Apache and nginx
- [GNU Awk manual](https://www.gnu.org/software/gawk/manual/) – complete reference for awk
- [grep(1) – manual page](https://www.gnu.org/software/grep/manual/grep.html) – every option of GNU grep
<!-- PROSE:outro:end -->

## Related Commands

- [apache](https://www.jpkc.com/db/en/cheatsheets/web-servers/apache/) – web server whose access and error logs you analyze here
- [caddy](https://www.jpkc.com/db/en/cheatsheets/web-servers/caddy/) – modern web server with its own (often JSON) log format
- [certbot](https://www.jpkc.com/db/en/cheatsheets/web-servers/certbot/) – manage TLS certificates whose renewals show up in the log

