Log Analysis — Inspecting Web Server Logs with Built-in Tools
Practical recipes for analyzing Apache and nginx logs with grep, awk, sort and uniq — top IPs, status codes, bandwidth and real-time monitoring.
Before you fire up GoAccess or a full log stack for a quick lookup, the tools already on your system usually get you there: grep, awk, sort, uniq and tail -f answer most questions about Apache and nginx logs in seconds. This recipe collection shows you how to find top IPs, check the distribution of status codes, surface the most-requested URLs, sum up bandwidth and watch errors in real time. Keep in mind that the field indices ($1, $7, $9 …) depend on your actual log format — the examples assume the Combined Log Format. For ad-hoc analysis these one-liners are more than enough; only recurring reports or a dashboard make a dedicated tool worthwhile.
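To make those field indices concrete, the sketch below writes one made-up Combined Log Format entry to a temporary file and prints the fields the recipes rely on. The IP address, URL, referrer and user agent are invented sample values, and `sample` is just a scratch variable for this illustration.

```shell
# One made-up Combined Log Format entry (all values are illustrative).
sample=$(mktemp)
cat > "$sample" <<'EOF'
203.0.113.7 - - [16/Feb/2026:14:05:33 +0000] "GET /index.html HTTP/1.1" 200 5120 "https://example.com/" "Mozilla/5.0 (X11; Linux x86_64)"
EOF

# Whitespace-separated fields used throughout this collection.
awk '{
    print "client IP ($1):  " $1
    print "timestamp ($4):  " $4
    print "URL ($7):        " $7
    print "status ($9):     " $9
    print "bytes ($10):     " $10
}' "$sample"

# Quote-separated fields: $2 is the request line, $6 the User-Agent.
awk -F'"' '{ print "request: " $2; print "agent:   " $6 }' "$sample"
```

If your own log prints different values for these positions, adjust the field numbers in the recipes accordingly. Remove the scratch file with rm -f "$sample" when done.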
Quick Inspection
wc -l <file> — Count the total number of lines in a log file.
wc -l access.log
head -n <count> <file> — View the first N lines to understand the log format.
head -n 20 /var/log/syslog
tail -n <count> <file> — View the most recent N log entries.
tail -n 50 /var/log/nginx/error.log
less +F <file> — Open a log file in follow mode (like tail -f but with scrollback). Press Ctrl+C to scroll, F to resume following.
less +F /var/log/syslog
file <logfile> — Detect if a log file is plain text, gzipped, or another format.
file /var/log/syslog.2.gz
du -sh <file> — Check the size of a log file before processing.
du -sh /var/log/nginx/access.log
Filtering by Pattern
grep '<pattern>' <file> — Find all lines matching a pattern.
grep 'ERROR' /var/log/app.log
grep -i '<pattern>' <file> — Case-insensitive pattern search.
grep -i 'timeout' /var/log/app.log
grep -E 'ERROR|WARN|FATAL' <file> — Find lines matching any of multiple patterns.
grep -E 'ERROR|WARN|FATAL' /var/log/app.log
grep -v '<pattern>' <file> — Exclude lines matching a pattern (invert match).
grep -v 'healthcheck' access.log
grep -v -e '<a>' -e '<b>' <file> — Exclude lines matching multiple patterns.
grep -v -e 'bot' -e 'crawler' -e 'monitoring' access.log
grep -c '<pattern>' <file> — Count how many lines match a pattern.
grep -c '500' access.log
grep -B <n> -A <n> '<pattern>' <file> — Show context lines before and after each match.
grep -B 3 -A 5 'OutOfMemoryError' app.log
Time-Based Filtering
grep '<date>' <file> — Filter log entries for a specific date.
grep '2026-02-16' /var/log/app.log
grep '<date>.*<time_prefix>' <file> — Filter entries for a specific date and hour.
grep '16/Feb/2026:14' access.log
awk '$0 >= "<start>" && $0 <= "<end>"' <file> — Extract log entries within a time range (works with ISO timestamps at line start).
awk '$0 >= "2026-02-16 10:00" && $0 <= "2026-02-16 12:00"' app.log
sed -n '/<start>/,/<end>/p' <file> — Extract all lines between two timestamp patterns.
sed -n '/2026-02-16 10:00/,/2026-02-16 12:00/p' app.log
awk '/^<start>/,/^<end>/' <file> — Extract a range of lines between two matching patterns with awk.
awk '/^2026-02-16 14:00/,/^2026-02-16 15:00/' app.log
journalctl --since '<start>' --until '<end>' — Query systemd journal for a specific time range.
journalctl --since '2026-02-16 10:00' --until '2026-02-16 12:00'
Frequency & Counting
sort <file> | uniq -c | sort -rn | head -<n> — Count and rank the most frequent lines.
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
awk '{print $<field>}' <file> | sort | uniq -c | sort -rn — Count occurrences of a specific field (column).
awk '{print $9}' access.log | sort | uniq -c | sort -rn
grep -oE '<pattern>' <file> | sort | uniq -c | sort -rn — Extract and count specific patterns from log lines.
grep -oE '\b[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\b' access.log | sort | uniq -c | sort -rn | head
awk '{count[$<field>]++} END {for (k in count) print count[k], k}' <file> | sort -rn — Count field occurrences with awk (faster for large files, single pass).
awk '{count[$9]++} END {for (k in count) print count[k], k}' access.log | sort -rn
grep -c '<pattern>' <file1> <file2> <file3> — Count matches per file across multiple log files.
grep -c 'ERROR' /var/log/app.log.*
Apache/Nginx Access Logs
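Before looking at individual status codes, it can help to see how responses split across the status classes. The sketch below is one way to get that in a single awk pass. It assumes Combined Log Format with the status code in field 9, and the function name status_classes is made up for this example.

```shell
# Hypothetical helper: count responses per status class (2xx, 3xx, 4xx, 5xx).
# Assumes the status code is field 9, as in Combined Log Format.
# Lines whose field 9 is not a three-digit status are skipped.
status_classes() {
    awk '$9 ~ /^[1-5][0-9][0-9]$/ { classes[substr($9, 1, 1) "xx"]++ }
         END { for (c in classes) print c, classes[c] }' "$@" | sort
}

# Example: status_classes /var/log/nginx/access.log
```

A sudden rise in the 5xx line is usually the cue to switch to the per-code and per-URL recipes in this section.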
awk '{print $9}' <file> | sort | uniq -c | sort -rn — Count HTTP status codes (field 9 in Combined Log Format).
awk '{print $9}' access.log | sort | uniq -c | sort -rn
awk '{print $1}' <file> | sort | uniq -c | sort -rn | head -<n> — Find the top N client IP addresses by request count.
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
awk '{print $7}' <file> | sort | uniq -c | sort -rn | head -<n> — Find the most requested URLs.
awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -20
awk '$9 == <code>' <file> — Filter requests by a specific HTTP status code.
awk '$9 == 404' access.log
awk '$9 >= 500' <file> — Find all server errors (5xx status codes).
awk '$9 >= 500' access.log
awk '{print $10}' <file> | paste -sd+ | bc — Calculate total bytes transferred (field 10 in Combined Log Format).
awk '$9 == 200 {print $10}' access.log | paste -sd+ | bc
awk '{sum += $10} END {printf "%.2f GB\n", sum/1073741824}' <file> — Calculate total bandwidth in human-readable format.
awk '{sum += $10} END {printf "%.2f GB\n", sum/1073741824}' access.log
awk -F'"' '{print $6}' <file> | sort | uniq -c | sort -rn | head -<n> — Find the top User-Agent strings (field 6 when splitting by quotes).
awk -F'"' '{print $6}' access.log | sort | uniq -c | sort -rn | head -10
awk '$9 == 404 {print $7}' <file> | sort | uniq -c | sort -rn | head -<n> — Find the most common 404 URLs.
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -20
Requests per Time Period
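The recipes in this section pipe straight into uniq -c without a preceding sort. uniq only merges adjacent duplicates, so the counts are correct only when entries appear in time order. That is not guaranteed: Apache, for example, logs the time a request was received but writes the line when the request completes, and concatenated rotated files may be out of order too. A minimal sketch of an order-independent variant follows, assuming the timestamp is field 4 in Combined Log Format. The name requests_per_hour is made up for this example.

```shell
# Hypothetical helper: requests per hour, independent of line order.
# Assumes field 4 looks like [16/Feb/2026:14:05:33 (Combined Log Format),
# so substr($4, 2, 14) yields 16/Feb/2026:14.
requests_per_hour() {
    awk '{ hits[substr($4, 2, 14)]++ }
         END { for (h in hits) print hits[h], h }' "$@" | sort -rn
}

# Example: busiest hours first
# requests_per_hour access.log | head -n 10
```

Changing the length argument from 14 to 17 gives per-minute buckets, matching the substr calls used in the recipes of this section.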
awk '{print substr($4, 2, 17)}' <file> | uniq -c — Count requests per minute (Apache/Nginx Combined Log Format).
awk '{print substr($4, 2, 17)}' access.log | uniq -c
awk '{print substr($4, 2, 14)}' <file> | uniq -c — Count requests per hour.
awk '{print substr($4, 2, 14)}' access.log | uniq -c
awk '{print substr($4, 2, 17)}' <file> | uniq -c | sort -rn | head -<n> — Find the busiest minutes by request volume (traffic spikes).
awk '{print substr($4, 2, 17)}' access.log | uniq -c | sort -rn | head -10
awk '$9 >= 500 {print substr($4, 2, 17)}' <file> | uniq -c — Count server errors per minute to identify error spikes.
awk '$9 >= 500 {print substr($4, 2, 17)}' access.log | uniq -c | sort -rn | head -10
Real-Time Monitoring
tail -f <file> — Follow a log file in real-time as new entries are appended.
tail -f /var/log/nginx/error.log
tail -f <file> | grep --line-buffered '<pattern>' — Follow a log and filter for a specific pattern in real-time.
tail -f /var/log/app.log | grep --line-buffered 'ERROR'
tail -f <file1> <file2> — Follow multiple log files simultaneously with filename headers.
tail -f /var/log/nginx/access.log /var/log/nginx/error.log
tail -F <file> — Follow with retry. Keeps following even after log rotation.
tail -F /var/log/app.log
journalctl -f -u <service> — Follow systemd journal output for a specific service.
journalctl -f -u nginx
tail -f <file> | awk '$9 >= 400 {print}' — Monitor access log and show only error responses in real-time.
tail -f access.log | awk '$9 >= 400 {print}'
tail -f <file> | while IFS= read -r line; do printf '%s %s\n' "$(date +%T)" "$line"; done — Follow a log and prepend wall-clock timestamps to each line. IFS= and -r keep leading whitespace and backslashes in the log line intact.
tail -f app.log | while IFS= read -r line; do printf '%s %s\n' "$(date +%T)" "$line"; done
Compressed & Rotated Logs
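The last recipe in this section uses process substitution, which is a bash feature. With GNU gzip there is an alternative: gzip -cdf (equivalent to zcat -f) decompresses gzipped input and copies plain files through unchanged, so current and rotated logs can be read with one command. The sketch below relies on that GNU behaviour; other gzip implementations may differ, and search_all_logs is a made-up name.

```shell
# Hypothetical helper: search plain and gzipped logs alike.
# Assumes GNU gzip, where -f together with -c copies non-gzip input unchanged.
# Usage: search_all_logs PATTERN FILE...
search_all_logs() {
    pattern=$1
    shift
    gzip -cdf -- "$@" | grep -e "$pattern"
}

# Example: search_all_logs 'POST /api/login' access.log access.log.*.gz
```

Because everything is merged into one stream, matches are printed without file names. Use zgrep on the individual files when you need to know where a match came from.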
zcat <file.gz> — View the full contents of a gzipped log file.
zcat /var/log/syslog.2.gz
zgrep '<pattern>' <file.gz> — Search inside gzipped log files without decompressing.
zgrep 'ERROR' /var/log/app.log.*.gz
zless <file.gz> — Browse a gzipped log file interactively with a pager.
zless /var/log/syslog.3.gz
zcat <files.gz> | grep '<pattern>' — Search across multiple compressed log files.
zcat /var/log/nginx/access.log.*.gz | grep '500'
cat <current> <(zcat <rotated.gz>) | grep '<pattern>' — Search across both current and rotated compressed logs.
cat access.log <(zcat access.log.*.gz) | grep 'POST /api/login'
Systemd Journal (journalctl)
journalctl -u <service> — Show all journal entries for a specific systemd service.
journalctl -u nginx
journalctl -u <service> -n <count> — Show the last N entries for a service.
journalctl -u mysql -n 50
journalctl -p <priority> — Filter by priority level: emerg, alert, crit, err, warning, notice, info, debug.
journalctl -p err
journalctl -p <priority> --since '<time>' — Show entries of a specific priority since a given time.
journalctl -p warning --since '1 hour ago'
journalctl --since '<start>' --until '<end>' — Query entries within a time range.
journalctl --since '2026-02-16 08:00' --until '2026-02-16 12:00'
journalctl -u <service> --no-pager -o json-pretty — Output journal entries as formatted JSON for further processing.
journalctl -u nginx --no-pager -o json-pretty | head -100
journalctl --disk-usage — Show how much disk space the journal is using.
journalctl -k — Show only kernel messages (equivalent to dmesg).
journalctl -k -p err
Multi-Line Log Entries
grep -A <n> '<pattern>' <file> — Show a fixed number of lines after each match (e.g. stack traces).
grep -A 20 'Exception' /var/log/app.log
awk '/<start>/{found=1} found; /<end>/{found=0}' <file> — Extract blocks between a start and end pattern.
awk '/BEGIN STACKTRACE/{found=1} found; /END STACKTRACE/{found=0}' app.log
grep -Pzo '(?s)<start>.*?<end>' <file> — Extract multi-line blocks using Perl-compatible regex.
grep -Pzo '(?s)Exception.*?\n\n' app.log
awk '/^[0-9]{4}-/{if(buf && buf ~ /<pattern>/) print buf; buf=$0; next} {buf=buf ORS $0} END {if(buf ~ /<pattern>/) print buf}' <file> — Collect multi-line entries starting with a timestamp and filter by pattern.
awk '/^[0-9]{4}-/{if(buf && buf ~ /ERROR/) print buf; buf=$0; next} {buf=buf ORS $0} END {if(buf ~ /ERROR/) print buf}' app.log
Field Extraction with awk
awk '{print $<n>}' <file> — Print a specific field (column) from each log line.
awk '{print $1}' access.log
awk -F'<sep>' '{print $<n>}' <file> — Split lines by a custom field separator and extract a field.
awk -F'|' '{print $3}' app.log
awk -F'"' '{print $2}' <file> — Extract the request line from Apache/Nginx Combined Log Format.
awk -F'"' '{print $2}' access.log
awk '{print $NF}' <file> — Print the last field of each line.
awk '{print $NF}' access.log
awk '{for(i=<start>;i<=NF;i++) printf "%s ", $i; print ""}' <file> — Print all fields from field N to end of line (skip leading columns).
awk '{for(i=4;i<=NF;i++) printf "%s ", $i; print ""}' /var/log/syslog
Sorting & Deduplication
sort <file> | uniq — Remove duplicate lines from sorted output.
awk '{print $7}' access.log | sort | uniq
sort -u <file> — Sort and remove duplicates in one step.
awk '{print $1}' access.log | sort -u
sort <file> | uniq -c | sort -rn — Count occurrences and sort by frequency (descending).
awk '{print $7}' access.log | sort | uniq -c | sort -rn
sort <file> | uniq -d — Show only lines that appear more than once (duplicates only).
awk '{print $1}' access.log | sort | uniq -d
sort -t'<sep>' -k<field> -rn <file> — Sort by a specific field numerically in reverse order. The separator must be a single character; for a tab, pass a literal tab (in bash, $'\t'), because GNU sort rejects the two-character string '\t'. The example uses -h instead of -n so that human-readable sizes such as 4.0K and 1.2G sort correctly.
du -sh /var/log/*.log | sort -t$'\t' -k1,1 -rh
Practical Recipes
awk '{ip[$1]++} END {for (k in ip) if (ip[k]>100) print ip[k], k}' access.log — Find IP addresses with more than 100 requests (potential abuse).
awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn | head -20 — Top 20 URLs returning 404 errors.
awk -F'"' '$2 ~ /^POST / {print $2}' access.log | sort | uniq -c | sort -rn — Count POST requests grouped by URL. Anchoring the match with ^ avoids counting other requests whose URL merely contains "POST".
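grep 'ERROR' app.log | awk '{print $1, $2}' | cut -d: -f1 | uniq -c — Count errors grouped by hour for trend analysis (assumes each line starts with a timestamp such as 2026-02-16 14:05:33; use -f1,2 to group by minute instead).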
diff <(grep 'ERROR' app.log.1) <(grep 'ERROR' app.log) — Compare error patterns between yesterday's and today's log.
awk 'NR==1{start=$0} END{print "First:", start; print "Last:", $0}' <file> — Show the first and last log entry to determine the time span of a log file.
awk 'NR==1{start=$0} END{print "First:", start; print "Last:", $0}' access.log
awk '{total+=$10; count++} END {printf "Avg: %.0f bytes (%d requests)\n", total/count, count}' access.log — Calculate the average response size across all requests.
awk '$NF > <seconds>' <file> — Find slow requests where the last field contains response time.
awk '$NF > 5.0' access.log
Conclusion
For ad-hoc analysis you do not need a heavy toolkit: grep, awk and the familiar sort | uniq -c | sort -rn chain reliably cover top IPs, status codes, URLs and bandwidth, while tail -f turns the same data into live monitoring. Once those same reports become a daily task or you want a dashboard, though, it pays to move to a dedicated tool such as GoAccess. Either way, remember to adapt the field indices to your actual log format.
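If you find yourself retyping that chain, one option is to wrap it in a small shell function so the field number and result count become parameters. This is only a sketch: top_field is a made-up name, and the field numbers in the usage comments assume Combined Log Format.

```shell
# Hypothetical helper: rank the values of one whitespace-separated field.
# Usage: top_field FIELD COUNT FILE...
top_field() {
    field=$1
    count=$2
    shift 2
    awk -v f="$field" '{ print $f }' "$@" | sort | uniq -c | sort -rn | head -n "$count"
}

# Examples, assuming Combined Log Format:
# top_field 1 20 access.log   # top 20 client IPs
# top_field 9 10 access.log   # status code distribution
# top_field 7 20 access.log   # most-requested URLs
```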
Further Reading
- GoAccess – interactive real-time log analyzer for Apache and nginx
- GNU Awk manual – complete reference for awk
- grep(1) – manual page – every option of GNU grep