Log Analysis — Inspecting Web Server Logs with Built-in Tools

Practical recipes for analyzing Apache and nginx logs with grep, awk, sort and uniq — top IPs, status codes, bandwidth and real-time monitoring.

Before you fire up GoAccess or a full log stack for a quick lookup, the tools already on your system usually get you there: grep, awk, sort, uniq and tail -f answer most questions about Apache and nginx logs in seconds. This recipe collection shows you how to find top IPs, check the distribution of status codes, surface the most-requested URLs, sum up bandwidth and watch errors in real time. Keep in mind that the field indices ($1, $7, $9 …) depend on your actual log format — the examples assume the Combined Log Format. For ad-hoc analysis these one-liners are more than enough; only recurring reports or a dashboard make a dedicated tool worthwhile.
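Because every recipe below depends on those field numbers, it can help to print one log line with its fields numbered before trusting $7 or $9. The following is a minimal sketch: show_fields is a hypothetical helper name, and it splits on whitespace the same way awk does by default.

```shell
# show_fields FILE
# Print each whitespace-separated field of the first log line together
# with its awk index, so you can see which $N holds the status code,
# the URL, the byte count and so on.
# (show_fields is an illustrative name, not a standard tool.)
show_fields() {
    head -n 1 "$1" |
        awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
}
```

Run it as show_fields access.log. If the status code does not show up as $9, adjust the indices in the recipes accordingly.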

Quick Inspection

wc -l <file> — Count the total number of lines in a log file.

wc -l access.log

head -n <count> <file> — View the first N lines to understand the log format.

head -n 20 /var/log/syslog

tail -n <count> <file> — View the most recent N log entries.

tail -n 50 /var/log/nginx/error.log

less +F <file> — Open a log file in follow mode (like tail -f but with scrollback). Press Ctrl+C to scroll, F to resume following.

less +F /var/log/syslog

file <logfile> — Detect if a log file is plain text, gzipped, or another format.

file /var/log/syslog.2.gz

du -sh <file> — Check the size of a log file before processing.

du -sh /var/log/nginx/access.log

Filtering by Pattern

grep '<pattern>' <file> — Find all lines matching a pattern.

grep 'ERROR' /var/log/app.log

grep -i '<pattern>' <file> — Case-insensitive pattern search.

grep -i 'timeout' /var/log/app.log

grep -E 'ERROR|WARN|FATAL' <file> — Find lines matching any of multiple patterns.

grep -E 'ERROR|WARN|FATAL' /var/log/app.log

grep -v '<pattern>' <file> — Exclude lines matching a pattern (invert match).

grep -v 'healthcheck' access.log

grep -v -e '<a>' -e '<b>' <file> — Exclude lines matching multiple patterns.

grep -v -e 'bot' -e 'crawler' -e 'monitoring' access.log

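grep -c '<pattern>' <file> — Count how many lines match a pattern. Anchor the pattern where you can: a bare '500' also matches byte counts and URLs that contain 500.

grep -c '" 500 ' access.log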

grep -B <n> -A <n> '<pattern>' <file> — Show context lines before and after each match.

grep -B 3 -A 5 'OutOfMemoryError' app.log

Time-Based Filtering

grep '<date>' <file> — Filter log entries for a specific date.

grep '2026-02-16' /var/log/app.log

grep '<date>:<hour>' <file> — Filter entries for a specific date and hour (Combined Log Format timestamp).

grep '16/Feb/2026:14' access.log

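awk '$0 >= "<start>" && $0 <= "<end>"' <file> — Extract log entries within a time range (works with ISO timestamps at line start). This is a plain string comparison, so an end value of "2026-02-16 12:00" stops just before 12:00:00: any line that continues with seconds sorts after the bare prefix.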

awk '$0 >= "2026-02-16 10:00" && $0 <= "2026-02-16 12:00"' app.log

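sed -n '/<start>/,/<end>/p' <file> — Extract all lines between two timestamp patterns. Both patterns must actually occur in the log: the range opens at a line matching <start> and closes at the first following line matching <end>. If <end> never matches, sed prints through to the end of the file.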

sed -n '/2026-02-16 10:00/,/2026-02-16 12:00/p' app.log

awk '/^<start>/,/^<end>/' <file> — Extract a range of lines between two matching patterns with awk.

awk '/^2026-02-16 14:00/,/^2026-02-16 15:00/' app.log

journalctl --since '<start>' --until '<end>' — Query systemd journal for a specific time range.

journalctl --since '2026-02-16 10:00' --until '2026-02-16 12:00'
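The string-comparison recipes above rely on ISO timestamps, which sort lexically. Combined Log Format timestamps such as [16/Feb/2026:14:05:32 do not, because the day comes first and the month is a name. One possible workaround, sketched here under assumptions, is to rebuild a sortable key per line. clf_between is a hypothetical helper name, the YYYYMMDDHHMMSS bounds are a convention chosen for this sketch, and the timezone offset in field 5 is ignored.

```shell
# clf_between LO HI FILE
# Print Combined Log Format lines whose timestamp lies in [LO, HI].
# LO and HI are 14-digit keys of the form YYYYMMDDHHMMSS (an assumed
# convention for this sketch). The timezone offset is not evaluated.
clf_between() {
    awk -v lo="$1" -v hi="$2" '
        BEGIN {
            # Map month names to two-digit numbers: Jan -> 01 ... Dec -> 12
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= n; i++) month[names[i]] = sprintf("%02d", i)
        }
        {
            # $4 looks like [16/Feb/2026:14:05:32 ; drop the bracket and
            # split into day, month name, year, hour, minute, second.
            split(substr($4, 2), t, "[/:]")
            key = t[3] month[t[2]] t[1] t[4] t[5] t[6]
            if (key >= lo && key <= hi) print
        }
    ' "$3"
}
```

For example, clf_between 20260216140000 20260216145959 access.log would print the requests logged between 14:00:00 and 14:59:59 on 16 February 2026.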

Frequency & Counting

sort <file> | uniq -c | sort -rn | head -<n> — Count and rank the most frequent lines.

awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20

awk '{print $<field>}' <file> | sort | uniq -c | sort -rn — Count occurrences of a specific field (column).

awk '{print $9}' access.log | sort | uniq -c | sort -rn

grep -oE '<pattern>' <file> | sort | uniq -c | sort -rn — Extract and count specific patterns from log lines.

grep -oE '\b[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\b' access.log | sort | uniq -c | sort -rn | head

awk '{count[$<field>]++} END {for (k in count) print count[k], k}' <file> | sort -rn — Count field occurrences with awk (faster for large files, single pass).

awk '{count[$9]++} END {for (k in count) print count[k], k}' access.log | sort -rn

grep -c '<pattern>' <file1> <file2> <file3> — Count matches per file across multiple log files.

grep -c 'ERROR' /var/log/app.log*

Apache/Nginx Access Logs

awk '{print $9}' <file> | sort | uniq -c | sort -rn — Count HTTP status codes (field 9 in Combined Log Format).

awk '{print $9}' access.log | sort | uniq -c | sort -rn

awk '{print $1}' <file> | sort | uniq -c | sort -rn | head -<n> — Find the top N client IP addresses by request count.

awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20

awk '{print $7}' <file> | sort | uniq -c | sort -rn | head -<n> — Find the most requested URLs.

awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -20

awk '$9 == <code>' <file> — Filter requests by a specific HTTP status code.

awk '$9 == 404' access.log

awk '$9 >= 500' <file> — Find all server errors (5xx status codes).

awk '$9 >= 500' access.log

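awk '$10 ~ /^[0-9]+$/ {print $10}' <file> | paste -sd+ - | bc — Calculate total bytes transferred (field 10 in Combined Log Format). The numeric test skips the "-" that Apache logs for responses without a body, which would otherwise break the bc expression; the trailing - makes paste read standard input on every platform.

awk '$9 == 200 && $10 ~ /^[0-9]+$/ {print $10}' access.log | paste -sd+ - | bc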

awk '{sum += $10} END {printf "%.2f GiB\n", sum/1073741824}' <file> — Calculate total bandwidth in human-readable format (GiB, because the divisor is 1024^3).

awk '{sum += $10} END {printf "%.2f GiB\n", sum/1073741824}' access.log
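A total often raises the follow-up question of which clients caused the traffic. As a sketch under the same Combined Log Format assumption ($1 is the client, $10 the byte count), the sum can be kept per IP instead of globally. top_ips_by_bytes is a hypothetical helper name.

```shell
# top_ips_by_bytes FILE [N]
# Sum the bytes in field 10 per client IP (field 1) and print the N
# largest consumers (default 10) as "<bytes> <ip>".
# Non-numeric byte fields such as "-" are skipped.
top_ips_by_bytes() {
    awk '$10 ~ /^[0-9]+$/ { bytes[$1] += $10 }
         END { for (ip in bytes) printf "%.0f %s\n", bytes[ip], ip }' "$1" |
        sort -rn |
        head -n "${2:-10}"
}
```

The %.0f format is used rather than %d because some awk implementations cap %d at 32-bit values, which a busy log exceeds quickly.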

awk -F'"' '{print $6}' <file> | sort | uniq -c | sort -rn | head -<n> — Find the top User-Agent strings (field 6 when splitting by quotes).

awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head -10

awk '$9 == 404 {print $7}' <file> | sort | uniq -c | sort -rn | head -<n> — Find the most common 404 URLs.

awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -20

Requests per Time Period

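awk '{print substr($4, 2, 17)}' <file> | uniq -c — Count requests per minute (Apache/Nginx Combined Log Format). No sort is needed before uniq -c as long as the log is in chronological order, which access logs normally are.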

awk '{print substr($4, 2, 17)}' access.log | uniq -c

awk '{print substr($4, 2, 14)}' <file> | uniq -c — Count requests per hour.

awk '{print substr($4, 2, 14)}' access.log | uniq -c

awk '{print substr($4, 2, 17)}' <file> | uniq -c | sort -rn | head -<n> — Find the busiest minutes by request volume (traffic spikes).

awk '{print substr($4, 2, 17)}' access.log | uniq -c | sort -rn | head -10

awk '$9 >= 500 {print substr($4, 2, 17)}' <file> | uniq -c | sort -rn | head -<n> — Count server errors per minute and list the worst minutes first to identify error spikes.

awk '$9 >= 500 {print substr($4, 2, 17)}' access.log | uniq -c | sort -rn | head -10
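Raw error counts are hard to judge without the request volume next to them. A possible extension, sketched here under the Combined Log Format assumption, reports errors, total requests and the resulting 5xx percentage per hour in a single pass. error_rate_per_hour is a hypothetical helper name.

```shell
# error_rate_per_hour FILE
# For each hour, print "<hour> <5xx count>/<total> <percent>%".
# Hours are printed in the order they first appear in the log.
error_rate_per_hour() {
    awk '{
            hour = substr($4, 2, 14)            # e.g. 16/Feb/2026:14
            if (!(hour in total)) order[++n] = hour
            total[hour]++
            if ($9 >= 500 && $9 < 600) errors[hour]++
         }
         END {
            for (i = 1; i <= n; i++) {
                h = order[i]
                printf "%s %d/%d %.1f%%\n", h, errors[h], total[h],
                       100 * errors[h] / total[h]
            }
         }' "$1"
}
```

Called as error_rate_per_hour access.log, it makes it easier to tell a real error spike from an hour that simply had more traffic.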

Real-Time Monitoring

tail -f <file> — Follow a log file in real-time as new entries are appended.

tail -f /var/log/nginx/error.log

tail -f <file> | grep --line-buffered '<pattern>' — Follow a log and filter for a specific pattern in real-time.

tail -f /var/log/app.log | grep --line-buffered 'ERROR'

tail -f <file1> <file2> — Follow multiple log files simultaneously with filename headers.

tail -f /var/log/nginx/access.log /var/log/nginx/error.log

tail -F <file> — Follow with retry. Keeps following even after log rotation.

tail -F /var/log/app.log

journalctl -f -u <service> — Follow systemd journal output for a specific service.

journalctl -f -u nginx

tail -f <file> | awk '$9 >= 400 {print}' — Monitor access log and show only error responses in real-time.

tail -f access.log | awk '$9 >= 400 {print}'

tail -f <file> | while IFS= read -r line; do echo "$(date +%T) $line"; done — Follow a log and prepend wall-clock timestamps to each line. IFS= and -r keep leading whitespace and backslashes in the log line intact.

tail -f app.log | while IFS= read -r line; do echo "$(date +%T) $line"; done
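When a live stream scrolls too fast to read, a periodic tally can be easier to follow than individual lines. The sketch below reads access-log lines on standard input and prints a status-code summary after every N lines. status_summary_every is a hypothetical helper name, and the field index again assumes the Combined Log Format.

```shell
# status_summary_every [N]
# Read access-log lines from stdin; after every N lines (default 100)
# print how often each status code (field 9) occurred, then reset.
# The order of the codes within a summary line is not defined.
status_summary_every() {
    awk -v n="${1:-100}" '
        { count[$9]++ }
        NR % n == 0 {
            line = ""
            for (code in count) line = line " " code "=" count[code]
            print "last " n " requests:" line
            fflush()            # push the summary out immediately
            split("", count)    # portable way to empty the array
        }'
}
```

A typical call would be tail -F access.log | status_summary_every 200. Some awk implementations buffer piped input, so summaries may arrive late; with mawk, the -W interactive option is one way around that.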

Compressed & Rotated Logs

zcat <file.gz> — View the full contents of a gzipped log file.

zcat /var/log/syslog.2.gz

zgrep '<pattern>' <file.gz> — Search inside gzipped log files without unpacking them to disk first.

zgrep 'ERROR' /var/log/app.log.*.gz

zless <file.gz> — Browse a gzipped log file interactively with a pager.

zless /var/log/syslog.3.gz

zcat <files.gz> | grep '<pattern>' — Search across multiple compressed log files.

zcat /var/log/nginx/access.log.*.gz | grep '500'

cat <current> <(zcat <rotated.gz>) | grep '<pattern>' — Search across both current and rotated compressed logs. The <(...) process substitution needs bash, zsh or ksh; it is not available in plain POSIX sh.

cat access.log <(zcat access.log.*.gz) | grep 'POST /api/login'
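If you would rather not depend on process substitution, one alternative is to let gzip handle both kinds of file. With -c -d -f, GNU gzip decompresses what it recognizes and passes anything else through unchanged. The sketch below wraps that in a function; search_logs is a hypothetical helper name, and the pass-through behaviour is assumed to match GNU gzip.

```shell
# search_logs PATTERN FILE...
# Search plain and gzipped log files in one go. gzip -cdf decompresses
# .gz files and copies plain files through unchanged, so the caller can
# mix both and decide the order (for example oldest first).
# The body runs in a subshell so the helper variable does not leak.
search_logs() (
    pattern=$1
    shift
    gzip -cdf "$@" | grep -e "$pattern"
)
```

An example call is search_logs 'POST /api/login' access.log.2.gz access.log.1 access.log, which keeps the matches in chronological order.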

Systemd Journal (journalctl)

journalctl -u <service> — Show all journal entries for a specific systemd service.

journalctl -u nginx

journalctl -u <service> -n <count> — Show the last N entries for a service.

journalctl -u mysql -n 50

journalctl -p <priority> — Filter by priority level: emerg, alert, crit, err, warning, notice, info, debug.

journalctl -p err

journalctl -p <priority> --since '<time>' — Show entries of a specific priority since a given time.

journalctl -p warning --since '1 hour ago'

journalctl --since '<start>' --until '<end>' — Query entries within a time range.

journalctl --since '2026-02-16 08:00' --until '2026-02-16 12:00'

journalctl -u <service> --no-pager -o json-pretty — Output journal entries as formatted JSON for further processing.

journalctl -u nginx --no-pager -o json-pretty | head -100

journalctl --disk-usage — Show how much disk space the journal is using.

journalctl -k — Show only kernel messages from the current boot (similar to dmesg).

journalctl -k -p err

Multi-Line Log Entries

grep -A <n> '<pattern>' <file> — Show a fixed number of lines after each match (e.g. stack traces).

grep -A 20 'Exception' /var/log/app.log

awk '/<start>/{found=1} found; /<end>/{found=0}' <file> — Extract blocks between a start and end pattern.

awk '/BEGIN STACKTRACE/{found=1} found; /END STACKTRACE/{found=0}' app.log

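grep -Pzo '(?s)<start>.*?<end>' <file> — Extract multi-line blocks using Perl-compatible regex (requires GNU grep built with PCRE support). With -z the matches are NUL-terminated, so pipe the result through tr '\0' '\n' for readable output.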

grep -Pzo '(?s)Exception.*?\n\n' app.log

awk '/^[0-9]{4}-/{if(buf && buf ~ /<pattern>/) print buf; buf=$0; next} {buf=buf ORS $0} END {if(buf ~ /<pattern>/) print buf}' <file> — Collect multi-line entries starting with a timestamp and filter by pattern.

awk '/^[0-9]{4}-/{if(buf && buf ~ /ERROR/) print buf; buf=$0; next} {buf=buf ORS $0} END {if(buf ~ /ERROR/) print buf}' app.log

Field Extraction with awk

awk '{print $<n>}' <file> — Print a specific field (column) from each log line.

awk '{print $1}' access.log

awk -F'<sep>' '{print $<n>}' <file> — Split lines by a custom field separator and extract a field.

awk -F'|' '{print $3}' app.log

awk -F'"' '{print $2}' <file> — Extract the request line from Apache/Nginx Combined Log Format.

awk -F'"' '{print $2}' access.log

awk '{print $NF}' <file> — Print the last field of each line.

awk '{print $NF}' access.log

awk '{for(i=<start>;i<=NF;i++) printf "%s ", $i; print ""}' <file> — Print all fields from field N to end of line (skip leading columns).

awk '{for(i=4;i<=NF;i++) printf "%s ", $i; print ""}' /var/log/syslog

Sorting & Deduplication

sort <file> | uniq — Remove duplicate lines from sorted output.

awk '{print $7}' access.log | sort | uniq

sort -u <file> — Sort and remove duplicates in one step.

awk '{print $1}' access.log | sort -u

sort <file> | uniq -c | sort -rn — Count occurrences and sort by frequency (descending).

awk '{print $7}' access.log | sort | uniq -c | sort -rn

sort <file> | uniq -d — Show only lines that appear more than once (duplicates only).

awk '{print $1}' access.log | sort | uniq -d

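sort -t'<sep>' -k<field> -rn <file> — Sort by a specific field numerically in reverse order. The example uses -h (human-readable sizes such as 4.0K or 1.2G) in place of -n, and passes a literal tab with the bash syntax $'\t'; a quoted '\t' is rejected by GNU sort as a multi-character separator.

du -sh /var/log/*.log | sort -t$'\t' -k1,1 -rh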

Practical Recipes

awk '{ip[$1]++} END {for (k in ip) if (ip[k]>100) print ip[k], k}' access.log — Find IP addresses with more than 100 requests (potential abuse).

awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -20 — Top 20 URLs returning 404 errors.

awk -F'"' '$2 ~ /^POST / {split($2, req, " "); print req[2]}' access.log | sort | uniq -c | sort -rn — Count POST requests grouped by URL.

grep 'ERROR' app.log | awk '{print $1, $2}' | cut -d: -f1 | uniq -c — Count errors grouped by hour for trend analysis (assumes each line starts with a "YYYY-MM-DD HH:MM:SS" timestamp).

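diff <(grep 'ERROR' app.log.1) <(grep 'ERROR' app.log) — Compare error lines between yesterday's and today's log. Timestamps make every line differ, so strip them first (for example with cut or sed) if you want to compare the messages themselves.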

awk 'NR==1{start=$0} END{print "First:", start; print "Last:", $0}' <file> — Show the first and last log entry to determine the time span of a log file.

awk 'NR==1{start=$0} END{print "First:", start; print "Last:", $0}' access.log

awk '{total+=$10; count++} END {printf "Avg: %.0f bytes (%d requests)\n", total/count, count}' access.log — Calculate the average response size across all requests.

awk '$NF > <seconds>' <file> — Find slow requests where the last field contains response time.

awk '$NF > 5.0' access.log

Conclusion

For ad-hoc analysis you do not need a heavy toolkit: grep, awk and the familiar sort | uniq -c | sort -rn chain reliably cover top IPs, status codes, URLs and bandwidth, while tail -f turns the same data into live monitoring. Once those same reports become a daily task or you want a dashboard, though, it pays to move to a dedicated tool such as GoAccess. Either way, remember to adapt the field indices to your actual log format.

Further Reading

  • apache – web server whose access and error logs you analyze here
  • caddy – modern web server with its own (often JSON) log format
  • certbot – manage TLS certificates whose renewals show up in the log