Linux Performance Monitoring Cheat Sheet for Sysadmins

Overview

Linux performance monitoring involves measuring and analyzing system metrics to identify bottlenecks, optimize resource utilization, and ensure system stability. This cheat sheet covers:

  • CPU performance: utilization, context switches, run queues
  • Memory performance: usage, swapping, page faults
  • Disk I/O: throughput, latency, utilization
  • Network performance: connections, throughput, errors
  • System-wide metrics: load average, process accounting
  • Advanced tracing: eBPF, bpftrace, perf, BCC tools for deep-dive analysis

Key performance concepts:

  • Utilization: % of time resource is busy
  • Saturation: amount of queued work (backlog)
  • Errors: count of error events
  • Capacity: maximum throughput of resource

Quick Start

Essential One-Liners

# System overview (CPU, memory, load)
top -b -n 1 | head -20

# Real-time CPU and memory usage
htop

# Disk I/O statistics (all devices)
iostat -xz 1

# Network connections and listening ports
ss -tuln

# Memory usage summary
free -h

# System load and process count
uptime

# CPU performance counters (basic)
perf stat -a sleep 1

# Summarize system calls made by one process (Ctrl-C prints the table)
strace -c -p <PID>

CPU Performance Monitoring

Basic CPU Tools

Command | Description | Example
top | Interactive process viewer, CPU usage | top -u <user>
htop | Enhanced top with colors, tree view | htop -p <PID>
mpstat | Per-CPU statistics | mpstat -P ALL 1
vmstat | Virtual memory stats (includes CPU) | vmstat 1
sar | System activity reporter (historical) | sar -u 1 5
dstat | Versatile resource statistics | dstat -c -d -n 1
nmon | Interactive system monitor | nmon
uptime | Load average and uptime | uptime
cat /proc/loadavg | Raw load average data | cat /proc/loadavg

CPU Metrics Explained

  • %us: User space CPU time
  • %sy: System (kernel) CPU time
  • %ni: User-space CPU time of niced (reprioritized) processes
  • %id: Idle CPU time
  • %wa: I/O wait (CPU idle waiting for I/O)
  • %hi: Hardware interrupts
  • %si: Software interrupts
  • %st: Steal time (virtualized environments)
  • Load Average: Runnable tasks plus tasks in uninterruptible sleep (usually waiting on disk I/O), averaged over 1, 5, and 15 minutes
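
The percentages above are derived from the cumulative counters in /proc/stat. A minimal sketch of that calculation, assuming the usual field order on the aggregate "cpu" line; cpu_busy_pct is a hypothetical helper name, not a standard tool:

```shell
# cpu_busy_pct: percent of non-idle CPU time between two "cpu" lines
# taken from /proc/stat. Fields after the label are, in order:
# user nice system idle iowait irq softirq steal (guest fields ignored).
cpu_busy_pct() {
  # $1 = earlier sample, $2 = later sample (hypothetical helper)
  printf '%s\n%s\n' "$1" "$2" | awk '
    { total = 0
      for (i = 2; i <= 9; i++) total += $i
      idle = $5 + $6              # idle + iowait both count as not busy
      if (NR == 1) { t0 = total; i0 = idle }
      else         { t1 = total; i1 = idle } }
    END {
      dt = t1 - t0
      if (dt <= 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * (dt - (i1 - i0)) / dt
    }'
}

# Usage: sample the aggregate line twice, one second apart.
# a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
# cpu_busy_pct "$a" "$b"
```

Counting iowait as idle matches how top reports %id and %wa separately; fold it into busy time if you want to see I/O stalls as load.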

Advanced CPU Analysis with perf

# Profile CPU cycles for 5 seconds
perf record -F 99 -g -a sleep 5

# Show annotated disassembly
perf annotate

# Generate flame graph (requires FlameGraph scripts)
perf record -F 99 -g -a sleep 5
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flamegraph.svg

# Top functions by CPU cycles
perf top

# Count specific events
perf stat -e cycles,instructions,cache-misses -a sleep 1

# Dump scheduler run-queue state (requires CONFIG_SCHED_DEBUG;
# kernels 5.13+ moved this file to /sys/kernel/debug/sched/debug)
cat /proc/sched_debug

Memory Performance Monitoring

Basic Memory Tools

Command | Description | Example
free | Memory usage summary | free -h
vmstat | Includes swap and memory stats | vmstat 1
smem | Proportional set size (PSS) | smem -t -p
ps | Process memory usage | ps aux --sort=-%mem
top | Interactive memory view | top (press M)
htop | Memory columns, tree view | htop (F5 for tree)
sar | Historical memory stats | sar -r 1 5
numastat | NUMA memory allocation | numastat

Memory Metrics

  • total: Total usable RAM
  • used: Memory in use, excluding buffers and cache (recent procps versions compute it as total minus available)
  • free: Unused memory
  • shared: Memory used by tmpfs
  • buff/cache: Buffer and cache memory
  • available: Memory available for new apps (without swapping)
  • Swap: Virtual memory on disk
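  • PSS: Proportional set size (private memory plus each shared page divided by the number of processes sharing it)
  • RSS: Resident set size (physical memory mapped by the process, with shared pages counted in full)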
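
The "available" figure is usually the one to alert on. A small sketch that turns it into a percentage, reading meminfo-formatted text on stdin; mem_available_pct is a hypothetical helper name:

```shell
# mem_available_pct: percent of RAM the kernel estimates is available
# for new workloads, read from meminfo-formatted text on stdin.
mem_available_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total <= 0) { print "unknown"; exit 1 }
      printf "%.1f\n", 100 * avail / total
    }'
}

# Usage: mem_available_pct < /proc/meminfo
```

Reading from stdin keeps the function usable against a saved copy of /proc/meminfo from another host.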

Advanced Memory Analysis

# Memory leak detection with valgrind
valgrind --leak-check=full ./program

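# Page fault statistics (fault/s and majflt/s columns)
sar -B 1 10

# Automatic NUMA balancing: setting (0 = off, 1 = on) and counters
cat /proc/sys/kernel/numa_balancing
grep numa_ /proc/vmstat

# Slab allocator usage (root only)
sudo cat /proc/slabinfo

# Memory fragmentation: free blocks per order and zone
cat /proc/buddyinfo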

Disk I/O Performance Monitoring

Basic Disk I/O Tools

Command | Description | Example
iostat | Device I/O statistics | iostat -xz 1
iotop | Process-level I/O | iotop
df | Disk space usage | df -h
du | Directory disk usage | du -sh /*
lsblk | Block device tree | lsblk
blkid | Block device attributes | blkid
fio | Flexible I/O tester | fio --name=test --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --size=1G --numjobs=4 --runtime=60 --group_reporting

I/O Metrics

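  • %util: Percentage of time the device had at least one request in flight; devices that serve requests in parallel (SSDs, RAID arrays) can have headroom left at 100%
  • await: Average time per request in ms, including time spent queued (reported as r_await and w_await in newer sysstat)
  • svctm: Service time (ms); unreliable, and removed from recent sysstat releases
  • r/s, w/s: Read/write operations per second
  • rkB/s, wkB/s: Read/write throughput (KB/s)
  • avgqu-sz: Average queue length (named aqu-sz in newer sysstat)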
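
%util can be approximated from /proc/diskstats, where field 13 is the cumulative number of milliseconds the device spent with I/O in flight. A rough sketch, assuming that field layout; disk_util_pct is a hypothetical helper and sda is only an example device name:

```shell
# disk_util_pct: approximate %util for one device from two diskstats
# lines and the elapsed time between them in milliseconds.
disk_util_pct() {
  # $1 = earlier line, $2 = later line, $3 = elapsed ms (caller-supplied)
  printf '%s\n%s\n' "$1" "$2" | awk -v ms="$3" '
    NR == 1 { busy0 = $13 }        # field 13: ms spent doing I/O
    NR == 2 { busy1 = $13 }
    END { if (ms > 0) printf "%.1f\n", 100 * (busy1 - busy0) / ms }'
}

# Usage (sda is an assumed device name):
# a=$(grep -w sda /proc/diskstats); sleep 1; b=$(grep -w sda /proc/diskstats)
# disk_util_pct "$a" "$b" 1000
```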

Advanced Disk I/O Analysis

# Block layer tracing with bpftrace
bpftrace -e 'tracepoint:block:block_rq_issue { @[comm] = count(); }'

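# I/O latency distribution in microseconds (issue to completion)
bpftrace -e 'tracepoint:block:block_rq_issue { @start[args->dev, args->sector] = nsecs; } tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ { @usecs = hist((nsecs - @start[args->dev, args->sector]) / 1000); delete(@start[args->dev, args->sector]); }'

# Per-request block I/O trace with PID and latency
sudo /usr/share/bcc/tools/biosnoop

# Disk I/O flame graph (-g records the stacks that issued each request)
perf record -e block:block_rq_issue -g -a sleep 10
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > io-flamegraph.svg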

# File system latency
sudo /usr/share/bcc/tools/ext4slower 1

Network Performance Monitoring

Basic Network Tools

Command | Description | Example
ss | Socket statistics (replaces netstat) | ss -tuln
netstat | Legacy network stats | netstat -tuln
ip | IP configuration and stats | ip -s link
ping | Connectivity and latency | ping -c 10 host
traceroute | Path and latency | traceroute host
mtr | Combined ping/traceroute | mtr host
nethogs | Per-process network usage | nethogs
iftop | Bandwidth usage by connection | iftop -nP
bmon | Bandwidth monitor | bmon
sar | Network interface stats | sar -n DEV 1 5

Network Metrics

  • rx/tx: Received/transmitted packets/bytes
  • errs/drop: Errors and drops
  • fifo: FIFO buffer overrun
  • coll: Collisions (Ethernet)
  • rx/tx-queue: Network driver queue lengths
  • TCP metrics: retransmits, timewait, established, listen
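
The errs and drop counters above are exposed per interface in /proc/net/dev. A short sketch that extracts just those columns, assuming the standard two-line header and column order; net_err_drop is a hypothetical helper name:

```shell
# net_err_drop: print "iface rx_errs rx_drop tx_errs tx_drop" for each
# interface, reading /proc/net/dev-formatted text on stdin.
net_err_drop() {
  awk '
    NR <= 2 { next }                 # skip the two header lines
    { sub(/:/, " ")                  # "eth0:123" -> "eth0 123"
      print $1, $4, $5, $12, $13 }'
}

# Usage: net_err_drop < /proc/net/dev
```

The counters are cumulative since the interface came up, so compare two readings to see whether errors are still occurring.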

Advanced Network Analysis

# TCP connection tracking
ss -t -i state established

# Network packet capture (basic)
tcpdump -i eth0 -c 100 -w capture.pcap

# Packet capture with filters
tcpdump -i eth0 'tcp port 80'

# Network latency with hping3
hping3 -S -p 80 -c 10 host

# TCP retransmission analysis (retrans counters appear in the -i output)
ss -ti state established | grep -B1 retrans

# BPF-based packet tracing: packets queued for transmit, by process
bpftrace -e 'tracepoint:net:net_dev_queue { @[comm] = count(); }'

# Socket buffer limit events, by process
bpftrace -e 'tracepoint:sock:sock_exceed_buf_limit { @[comm] = count(); }'

System-Wide Performance Monitoring

System Metrics Tools

Command | Description | Example
uptime | Load average and uptime | uptime
w | Who is logged in and load | w
who | Logged in users | who
last | Login history | last -a
dmesg | Kernel ring buffer | dmesg | tail -50
journalctl | Systemd journal logs | journalctl -u nginx --since "1 hour ago"
sar | Historical system data | sar -q 1 10
collectl | Comprehensive system monitor | collectl -scdn

Load Average Interpretation

  • 1-minute: Current load
  • 5-minute: Medium-term load
  • 15-minute: Long-term load

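Rule of thumb: a load average that stays above the number of CPU cores means tasks are queuing. On Linux the figure also counts tasks in uninterruptible sleep, so a high load with idle CPUs usually points at I/O rather than CPU.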

# Check CPU count
nproc

# Compare load to CPU count (values above 1.00 mean more demand than cores)
awk -v n="$(nproc)" '{ printf "%.2f %.2f %.2f\n", $1/n, $2/n, $3/n }' /proc/loadavg

# Historical load from sar
sar -q 1 10

Process-Level Monitoring

Process Tools

Command | Description | Example
ps | Process status | ps aux --sort=-%cpu
top | Interactive process viewer | top -p <PID>
htop | Enhanced process viewer | htop
pgrep | Find process by name | pgrep -f nginx
pkill | Kill process by name | pkill -f nginx
kill | Send signal to process | kill -9 <PID>
strace | Trace system calls | strace -p <PID>
ltrace | Trace library calls | ltrace -p <PID>
lsof | List open files | lsof -p <PID>
nice | Start process with priority | nice -n 19 command
renice | Change process priority | renice -n 10 -p <PID>
systemd-cgls | Show the cgroup hierarchy | systemd-cgls

Advanced Process Tracing with bpftrace

# Trace process execve (new processes)
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s %s\n", comm, str(args->filename)); }'

# Count process creations by executable
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { @[str(args->filename)] = count(); }'

# Trace process exits
bpftrace -e 'tracepoint:sched:sched_process_exit { @[comm] = count(); }'

# On-CPU time distribution per scheduling interval (using BCC)
sudo /usr/share/bcc/tools/cpudist

# Process I/O
sudo /usr/share/bcc/tools/biosnoop

# Outstanding memory allocations in one process
sudo /usr/share/bcc/tools/memleak -p <PID>

Advanced eBPF and BCC Tools

What is eBPF?

eBPF (extended Berkeley Packet Filter) is a Linux kernel technology that runs verified, sandboxed programs inside the kernel, enabling safe, low-overhead, programmable tracing without kernel modules. It’s used for:

  • Performance profiling: CPU, memory, I/O, network
  • Security monitoring: Syscall filtering, file access
  • Networking: XDP, TC, socket filters
  • Observability: Dynamic tracing with low overhead

Key eBPF concepts:

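  • CO-RE (Compile Once – Run Everywhere): One compiled BPF object runs across kernel versions by relocating field accesses against the kernel's BTF type information
  • BPF tokens: Delegate a limited set of BPF permissions to otherwise unprivileged processes
  • BPF arena: Memory region shared between a BPF program and user space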
  • Tracepoints: Stable kernel instrumentation points
  • Kprobes/Uprobes: Dynamic function entry/exit tracing
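
CO-RE depends on the kernel exposing its BTF type information, which current kernels publish at /sys/kernel/btf/vmlinux. A quick check, sketched with a hypothetical has_btf helper:

```shell
# has_btf: succeed if kernel BTF type information is readable.
# The optional argument overrides the path and exists only to make
# the function easy to exercise against a test file.
has_btf() {
  [ -r "${1:-/sys/kernel/btf/vmlinux}" ]
}

if has_btf; then
  echo "kernel BTF found"
else
  echo "no kernel BTF: tracing tools may need kernel headers installed"
fi
```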

Essential BCC Tools

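BCC (BPF Compiler Collection) provides ready-to-use eBPF tools. Install: apt install bpfcc-tools or yum install bcc-tools. Debian and Ubuntu install the tools on the PATH with a -bpfcc suffix (for example execsnoop-bpfcc); RHEL-family packages place them under /usr/share/bcc/tools.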

Tool | Purpose | Example
execsnoop | Trace process execution | execsnoop
opensnoop | Trace file opens | opensnoop -n <process>
ext4slower | Slow ext4 operations | ext4slower 1
biolatency | Block I/O latency | biolatency
biosnoop | Block I/O requests | biosnoop
cachestat | Cache hit/miss stats | cachestat 1
tcpconnect | TCP connection attempts | tcpconnect
tcpretrans | TCP retransmissions | tcpretrans
funccount | Count function calls | funccount -p <PID> <function>
profile | CPU profiling (sampling) | profile -F 99 10
offcputime | Off-CPU time analysis | offcputime -p <PID>
runqlat | Run queue latency | runqlat
wakeuptime | Blocked time by waker stack | wakeuptime
bpflist | Processes with loaded BPF programs | bpflist

bpftrace One-Liners

bpftrace is a high-level tracing language for eBPF. Install: apt install bpftrace

# System call rate every second
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ = count(); } interval:s:1 { print(@); clear(@); }'

# Top system calls by count
bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); } interval:s:5 { print(@); clear(@); }'

# Trace file opens by specific process
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /comm == "nginx"/ { printf("%s %s\n", comm, str(args->filename)); }'

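# Count read/write bytes per process (actual bytes, from the return value)
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @read_bytes[comm] = sum(args->ret); } tracepoint:syscalls:sys_exit_write /args->ret > 0/ { @write_bytes[comm] = sum(args->ret); } interval:s:10 { print(@read_bytes); print(@write_bytes); clear(@read_bytes); clear(@write_bytes); }'

# Off-CPU time per process in microseconds (blocked plus run-queue wait)
bpftrace -e 'tracepoint:sched:sched_switch { if (args->prev_pid != 0) { @start[args->prev_pid] = nsecs; } $s = @start[args->next_pid]; if ($s != 0) { @usecs[args->next_comm] = hist((nsecs - $s) / 1000); delete(@start[args->next_pid]); } } END { clear(@start); }'

# Network packet drops
bpftrace -e 'tracepoint:skb:kfree_skb { @[comm] = count(); }'

# CPU migration events
bpftrace -e 'tracepoint:sched:sched_migrate_task { printf("%s migrated from %d to %d\n", args->comm, args->orig_cpu, args->dest_cpu); }'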

# Kernel function entry/exit
bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; } kretprobe:vfs_read /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'

# Kernel memory allocation (kmalloc) bytes by process
bpftrace -e 'tracepoint:kmem:kmalloc { @bytes[comm] = sum(args->bytes_alloc); } interval:s:5 { print(@bytes); clear(@bytes); }'

Advanced bpftrace Scripts

Save these as .bt files and run with bpftrace script.bt

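#!/usr/bin/env bpftrace
// offcpu.bt - Off-CPU time in microseconds by kernel stack
// (blocked time plus run-queue wait). Ctrl-C to end.

tracepoint:sched:sched_switch
/args->prev_pid != 0/
{
  // The outgoing thread leaves the CPU now.
  @start[args->prev_pid] = nsecs;
}

// Symbol name varies by kernel build (for example finish_task_switch.isra.0).
kprobe:finish_task_switch
/@start[tid]/
{
  // The current thread is back on-CPU; its kernel stack shows where it blocked.
  @[kstack, comm] = sum((nsecs - @start[tid]) / 1000);
  delete(@start[tid]);
}

END
{
  clear(@start);
}

#!/usr/bin/env bpftrace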
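// iolatency.bt - Disk I/O latency in microseconds by device (major, minor)

tracepoint:block:block_rq_issue
{
  // Key on device and sector so concurrent requests do not overwrite each other.
  @start[args->dev, args->sector] = nsecs;
}

tracepoint:block:block_rq_complete
/@start[args->dev, args->sector]/
{
  // args->dev is the kernel dev_t: major = dev >> 20, minor = dev & 0xfffff.
  @usecs[args->dev >> 20, args->dev & 0xfffff] =
    hist((nsecs - @start[args->dev, args->sector]) / 1000);
  delete(@start[args->dev, args->sector]);
}

END
{
  clear(@start);
}

#!/usr/bin/env bpftrace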
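// tcplife.bt - TCP connection lifetime in milliseconds

// protocol 6 = IPPROTO_TCP, newstate 1 = TCP_ESTABLISHED
tracepoint:sock:inet_sock_set_state
/args->protocol == 6 && args->newstate == 1/
{
  // Key on the socket address; state changes can run in any task context.
  @start[args->skaddr] = nsecs;
}

// newstate 7 = TCP_CLOSE
tracepoint:sock:inet_sock_set_state
/@start[args->skaddr] && args->newstate == 7/
{
  // comm is whichever task was on-CPU at close, so treat it as a hint only.
  @lifetime_ms[comm] = hist((nsecs - @start[args->skaddr]) / 1000000);
  delete(@start[args->skaddr]);
}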

Flame Graphs

Flame graphs visualize profiled software, showing which code paths are hottest (most frequently on-CPU).

Generating CPU Flame Graphs

# Using perf (Linux)
perf record -F 99 -g -a sleep 10
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cpu-flamegraph.svg

# Using bpftrace (simpler); exit() stops after 10 s and prints the map
bpftrace -e 'profile:hz:99 { @[ustack] = count(); } interval:s:10 { exit(); }' | ./stackcollapse-bpftrace.pl | ./flamegraph.pl > bpftrace-flamegraph.svg

# Using BCC (-f emits folded stacks, so no stackcollapse step is needed)
/usr/share/bcc/tools/profile -F 99 -f 10 | ./flamegraph.pl > bcc-flamegraph.svg

Generating Off-CPU Flame Graphs

Off-CPU time shows where threads block (I/O, locks, etc.).

# Using bpftrace offcpu.bt script (Ctrl-C to stop recording)
bpftrace offcpu.bt > offcpu.stacks
./stackcollapse-bpftrace.pl < offcpu.stacks | ./flamegraph.pl > offcpu-flamegraph.svg

# Using BCC offcputime (-f emits folded stacks; 10 = duration in seconds)
/usr/share/bcc/tools/offcputime -p <PID> -f 10 | ./flamegraph.pl > offcpu-bcc.svg

Flame Graph Resources

  • Download scripts: https://github.com/brendangregg/FlameGraph
  • Interactive zoom: Click any box to zoom in
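  • Colors: Usually not significant; the default palette picks random warm hues to tell adjacent frames apart
  • Width: Proportional to the number of samples in which the function was on the stack (itself or its callees)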
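
The stackcollapse scripts emit "folded" stacks: one line per unique stack, frames separated by semicolons, followed by a sample count. That format is easy to query directly. A small sketch that totals samples by leaf function; top_leaf_frames is a hypothetical helper name:

```shell
# top_leaf_frames: sum sample counts by leaf (on-CPU) function from
# folded stacks on stdin: "frame1;frame2;...;leaf count".
top_leaf_frames() {
  awk '
    { count = $NF                      # last field is the sample count
      $NF = ""                         # drop it, leaving the stack
      sub(/ +$/, "")
      n = split($0, frames, ";")
      total[frames[n]] += count }
    END { for (f in total) print total[f], f }' | sort -rn
}

# Usage: perf script | ./stackcollapse-perf.pl | top_leaf_frames | head
```

This gives a quick text answer to "which function is on-CPU most" when opening the SVG is not convenient.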

Performance Analysis Workflows

CPU Bottleneck Diagnosis

  1. Check overall CPU utilization: top or htop
  2. Identify high CPU processes: ps aux --sort=-%cpu
  3. Profile with perf top or perf record + flame graph
  4. Analyze kernel vs user time: vmstat 1 (us vs sy)
  5. Check run queue: uptime (load average vs CPU count)
  6. Investigate interrupts: mpstat -P ALL 1 (%irq, %soft) or vmstat 1 (in column)

Memory Bottleneck Diagnosis

  1. Check memory usage: free -h
  2. Identify memory-hog processes: ps aux --sort=-%mem
  3. Check for swapping: vmstat 1 (si/so columns)
  4. Analyze page faults: sar -B 1 (majflt/s)
  5. NUMA issues: numastat
  6. Slab usage: cat /proc/slabinfo
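
For step 2, RSS overstates processes that share memory. Summing the Pss lines of a process's smaps file gives a fairer figure. A minimal sketch; pss_total_kb is a hypothetical helper name:

```shell
# pss_total_kb: sum the Pss lines of smaps-formatted text on stdin.
# Only exact "Pss:" lines are counted, not Pss_Anon and similar.
pss_total_kb() {
  awk '$1 == "Pss:" { kb += $2 } END { print kb + 0 }'
}

# Usage (needs permission to read the target process):
# pss_total_kb < /proc/<PID>/smaps
```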

Disk I/O Bottleneck Diagnosis

  1. Check device utilization: iostat -xz 1 (%util above roughly 80% suggests saturation on single-queue devices)
  2. Identify slow devices: high await, avgqu-sz
  3. Process-level I/O: iotop
  4. Trace I/O with BCC: biolatency, biosnoop
  5. File system level: ext4slower, xfsslower
  6. Generate I/O flame graphs

Network Bottleneck Diagnosis

  1. Check interface stats: ip -s link (errors, drops)
  2. Connection count: ss -s
  3. Listening ports: ss -tuln
  4. Packet drops: netstat -s | grep -i drop
  5. TCP retransmits: ss -ti (retrans field)
  6. Trace with tcpconnect, tcpretrans
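
A system-wide retransmit percentage can be derived from the Tcp counters in /proc/net/snmp (OutSegs and RetransSegs). A sketch that looks the columns up by name instead of position; tcp_retrans_pct is a hypothetical helper name:

```shell
# tcp_retrans_pct: percent of sent TCP segments that were retransmitted,
# from /proc/net/snmp-formatted text on stdin. Counters are cumulative
# since boot, so diff two samples for a current rate.
tcp_retrans_pct() {
  awk '
    $1 == "Tcp:" && !seen {          # first Tcp: line holds column names
      for (i = 2; i <= NF; i++) col[$i] = i
      seen = 1; next }
    $1 == "Tcp:" {                   # second Tcp: line holds the values
      out = $(col["OutSegs"]); re = $(col["RetransSegs"])
      if (out > 0) printf "%.2f\n", 100 * re / out }'
}

# Usage: tcp_retrans_pct < /proc/net/snmp
```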

Application Slowdown Diagnosis

  1. Check system metrics (CPU, memory, I/O)
  2. Identify slow processes: top, ps
  3. Trace system calls: strace -p <PID>
  4. Profile CPU: perf record -p <PID>
  5. Analyze off-CPU time: offcputime -p <PID>
  6. Generate flame graphs for both on-CPU and off-CPU

Reference Tables

Metric Thresholds

Metric | Warning | Critical
CPU Utilization | >80% sustained | >95% sustained
Load Average | >cores * 2 | >cores * 4
Memory Usage | >90% | >95% + swapping
Disk %util | >70% | >90%
Disk await | >20ms | >50ms
Network drops | >0.1% | >1%
TCP retransmits | >1% | >5%
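
These thresholds are easy to apply in scripts. A minimal sketch of a two-level check; classify is a hypothetical helper, and the numbers passed to it are simply the table values above:

```shell
# classify: compare a numeric reading against warning and critical
# thresholds and print OK, WARNING, or CRITICAL.
classify() {
  # $1 = value, $2 = warning threshold, $3 = critical threshold
  awk -v v="$1" -v w="$2" -v c="$3" 'BEGIN {
    if (v > c)      print "CRITICAL"
    else if (v > w) print "WARNING"
    else            print "OK"
  }'
}

# Disk %util of 75 against the 70/90 thresholds:
classify 75 70 90    # prints WARNING
```

Doing the comparison in awk lets the function accept fractional readings such as 0.5, which shell integer tests cannot compare.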

Tool Quick Reference

Task | Tool | Command
System overview | top | top
CPU profiling | perf | perf record -F 99 -g -a sleep 5
Memory usage | free | free -h
Disk I/O | iostat | iostat -xz 1
Network stats | ss | ss -tuln
Process tree | pstree | pstree -p
System calls | strace | strace -p <PID>
Advanced tracing | bpftrace | bpftrace -e '...'
BCC tools | Various | /usr/share/bcc/tools/<tool>
Flame graphs | perf + scripts | See Flame Graphs section


This cheat sheet provides both essential daily monitoring commands and advanced tracing techniques for comprehensive Linux performance analysis.

This post is licensed under CC BY 4.0 by the author.