Linux Performance Monitoring

Posted Mar 13, 2026

By Muzammil Shaik

Linux Performance Monitoring Cheat Sheet for Sysadmins

Overview

Linux performance monitoring involves measuring and analyzing system metrics to identify bottlenecks, optimize resource utilization, and ensure system stability. This cheat sheet covers:

CPU performance: utilization, context switches, run queues
Memory performance: usage, swapping, page faults
Disk I/O: throughput, latency, utilization
Network performance: connections, throughput, errors
System-wide metrics: load average, process accounting
Advanced tracing: eBPF, bpftrace, perf, BCC tools for deep dive analysis

Key performance concepts:

Utilization: % of time resource is busy
Saturation: amount of queued work (backlog)
Errors: count of error events
Capacity: maximum throughput of resource

Quick Start

Essential One-Liners

# System overview (CPU, memory, load)
top -b -n 1 | head -20

# Real-time CPU and memory usage
htop

# Disk I/O statistics (all devices)
iostat -xz 1

# Network connections and listening ports
ss -tuln

# Memory usage summary
free -h

# System load and process count
uptime

# CPU performance counters (basic)
perf stat -a sleep 1

# Summarize system calls for one process (Ctrl-C prints the count table)
strace -c -p <PID>

CPU Performance Monitoring

Basic CPU Tools

Command	Description	Example
`top`	Interactive process viewer, CPU usage	`top -u <user>`
`htop`	Enhanced top with colors, tree view	`htop -p <PID>`
`mpstat`	Per-CPU statistics	`mpstat -P ALL 1`
`vmstat`	Virtual memory stats (includes CPU)	`vmstat 1`
`sar`	System activity reporter (historical)	`sar -u 1 5`
`dstat`	Versatile resource statistics	`dstat -c -d -n 1`
`nmon`	Interactive system monitor	`nmon`
`uptime`	Load average and uptime	`uptime`
`cat /proc/loadavg`	Raw load average data	`cat /proc/loadavg`

CPU Metrics Explained

%us: User space CPU time
%sy: System (kernel) CPU time
%ni: User-space CPU time spent on niced (lowered-priority) processes
%id: Idle CPU time
%wa: I/O wait (CPU idle waiting for I/O)
%hi: Hardware interrupts
%si: Software interrupts
%st: Steal time (virtualized environments)
Load Average: Average number of runnable tasks (running + queued) plus, on Linux, tasks in uninterruptible sleep (usually disk I/O), over 1, 5, and 15 minutes
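
The percentages above are derived from the cumulative counters in /proc/stat. As a rough sketch of that arithmetic, the hypothetical helper below (cpu_util_pct is not a standard tool) takes two aggregate "cpu" lines sampled some seconds apart and prints the busy percentage. It assumes that idle and iowait both count as not busy.

```shell
#!/bin/sh
# cpu_util_pct LINE1 LINE2 (hypothetical helper, not a standard tool)
# Each argument is an aggregate "cpu ..." line from /proc/stat.
# Fields after the label: user nice system idle iowait irq softirq steal ...
cpu_util_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      idle = $5 + $6                        # idle + iowait
      total = 0
      for (f = 2; f <= 9; f++) total += $f  # user..steal (guest is already in user)
      if (NR == 1) { idle0 = idle; total0 = total }
      else         { idle1 = idle; total1 = total }
    }
    END {
      dt = total1 - total0
      if (dt <= 0) { print "n/a"; exit }
      printf "%.1f\n", 100 * (1 - (idle1 - idle0) / dt)
    }'
}

# Usage on a Linux host:
#   a=$(grep '^cpu ' /proc/stat); sleep 1; b=$(grep '^cpu ' /proc/stat)
#   cpu_util_pct "$a" "$b"
```

Tools such as top and mpstat do the same kind of delta calculation per interval, which is why the first sample of many tools reports averages since boot.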

Advanced CPU Analysis with perf

# Profile CPU cycles for 5 seconds
perf record -F 99 -g -a sleep 5

# Show annotated disassembly
perf annotate

# Generate flame graph (requires FlameGraph scripts)
perf record -F 99 -g -a sleep 5
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flamegraph.svg

# Top functions by CPU cycles
perf top

# Count specific events
perf stat -e cycles,instructions,cache-misses -a sleep 1

# Dump scheduler debug state (needs CONFIG_SCHED_DEBUG)
cat /proc/sched_debug                   # kernels before 5.13
sudo cat /sys/kernel/debug/sched/debug  # kernel 5.13 and later (debugfs)

Memory Performance Monitoring

Basic Memory Tools

Command	Description	Example
`free`	Memory usage summary	`free -h`
`vmstat`	Includes swap and memory stats	`vmstat 1`
`smem`	Proportional set size (PSS)	`smem -t -p`
`ps`	Process memory usage	`ps aux --sort=-%mem`
`top`	Interactive memory view	`top` (press `M`)
`htop`	Memory columns, tree view	`htop` (F5 for tree)
`sar`	Historical memory stats	`sar -r 1 5`
`numastat`	NUMA memory allocation	`numastat`

Memory Metrics

total: Total usable RAM
used: Memory in use, excluding reclaimable buffers and cache (the exact formula varies by procps version)
free: Unused memory
shared: Memory used by tmpfs
buff/cache: Buffer and cache memory
available: Memory available for new apps (without swapping)
Swap: Virtual memory on disk
PSS: Proportional set size (shared memory divided)
RSS: Resident set size (physical memory used)
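
The free columns above come from /proc/meminfo. The sketch below shows one way to derive them; mem_summary is a hypothetical helper name, and the buff/cache formula (Buffers + Cached + SReclaimable) is an assumption based on how procps documents that column.

```shell
#!/bin/sh
# mem_summary: reads /proc/meminfo-style text on stdin (hypothetical helper)
mem_summary() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemFree:"      { free = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    $1 == "Buffers:"      { buffers = $2 }
    $1 == "Cached:"       { cached = $2 }
    $1 == "SReclaimable:" { srecl = $2 }
    END {
      if (total == 0) { print "no MemTotal found"; exit 1 }
      cache = buffers + cached + srecl
      printf "total_kB=%d free_kB=%d buff_cache_kB=%d available_kB=%d available_pct=%.1f\n",
             total, free, cache, avail, 100 * avail / total
    }'
}

# Usage on a Linux host:
#   mem_summary < /proc/meminfo
```

Watching available_pct rather than free is usually the more useful signal, because cache is reclaimed on demand.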

Advanced Memory Analysis

# Memory leak detection with valgrind
valgrind --leak-check=full ./program

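# Page fault statistics (fault/s, majflt/s)
sar -B 1 10

# NUMA balancing setting (0 = off, nonzero = on) and counters
cat /proc/sys/kernel/numa_balancing
grep numa_ /proc/vmstat

# Slab allocator usage (root only)
sudo cat /proc/slabinfo

# Free-memory fragmentation (free blocks per order, per zone)
cat /proc/buddyinfo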

Disk I/O Performance Monitoring

Basic Disk I/O Tools

Command	Description	Example
`iostat`	Device I/O statistics	`iostat -xz 1`
`iotop`	Process-level I/O	`iotop`
`df`	Disk space usage	`df -h`
`du`	Directory disk usage	`du -sh /*`
`lsblk`	Block device tree	`lsblk`
`blkid`	Block device attributes	`blkid`
`fio`	Flexible I/O tester	`fio --name=test --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --size=1G --numjobs=4 --runtime=60 --group_reporting`

I/O Metrics

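%util: Share of time the device had I/O in flight (0-100%); parallel devices (SSD, NVMe, RAID) can still take more work at 100%
await: Average time per I/O request, queueing plus service (ms)
svctm: Service time (ms); unreliable, and removed from recent sysstat releases
r/s, w/s: Read/write operations per second
rkB/s, wkB/s: Read/write throughput (KB/s)
avgqu-sz: Average queue length (shown as aqu-sz in newer sysstat)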
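
To make the definitions above concrete, here is a sketch of how await and %util can be computed from two samples of /proc/diskstats. The helper name disk_delta is hypothetical; the field positions follow the kernel's documented diskstats layout (reads completed, ms reading, writes completed, ms writing, ms doing I/O).

```shell
#!/bin/sh
# disk_delta INTERVAL_MS (hypothetical helper)
# stdin: two /proc/diskstats lines for the same device, taken INTERVAL_MS apart
disk_delta() {
  awk -v ms="$1" '
    NR == 1 { r0 = $4; rt0 = $7; w0 = $8; wt0 = $11; busy0 = $13 }
    NR == 2 {
      ios = ($4 - r0) + ($8 - w0)              # completed reads + writes
      wait_ms = ($7 - rt0) + ($11 - wt0)       # total ms spent on those requests
      await = (ios > 0) ? wait_ms / ios : 0
      printf "%s iops=%.1f await_ms=%.2f util_pct=%.1f\n",
             $3, ios * 1000 / ms, await, 100 * ($13 - busy0) / ms
    }'
}

# Usage on a Linux host (device name sda is an example):
#   { grep ' sda ' /proc/diskstats; sleep 1; grep ' sda ' /proc/diskstats; } | disk_delta 1000
```

The sleep is only approximately one second, so treat the output as an estimate; iostat measures the interval more precisely.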

Advanced Disk I/O Analysis

# Block layer tracing with bpftrace
bpftrace -e 'tracepoint:block:block_rq_issue { @[comm] = count(); }'

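# I/O latency distribution (microseconds, issue to completion)
bpftrace -e 'tracepoint:block:block_rq_issue { @start[args->dev, args->sector] = nsecs; } tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ { @usecs = hist((nsecs - @start[args->dev, args->sector]) / 1000); delete(@start[args->dev, args->sector]); }'

# Per-request block I/O with process name and latency
sudo /usr/share/bcc/tools/biosnoop

# Disk I/O flame graph (-g records the call stacks the graph needs)
perf record -e block:block_rq_issue -g -a sleep 10
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > io-flamegraph.svg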

# File system latency
sudo /usr/share/bcc/tools/ext4slower 1

Network Performance Monitoring

Basic Network Tools

Command	Description	Example
`ss`	Socket statistics (replaces netstat)	`ss -tuln`
`netstat`	Legacy network stats	`netstat -tuln`
`ip`	IP configuration and stats	`ip -s link`
`ping`	Connectivity and latency	`ping -c 10 host`
`traceroute`	Path and latency	`traceroute host`
`mtr`	Combined ping/traceroute	`mtr host`
`nethogs`	Per-process network usage	`nethogs`
`iftop`	Bandwidth usage by connection	`iftop -nP`
`bmon`	Bandwidth monitor	`bmon`
`sar`	Network interface stats	`sar -n DEV 1 5`

Network Metrics

rx/tx: Received/transmitted packets/bytes
errs/drop: Errors and drops
fifo: FIFO buffer overrun
coll: Collisions (Ethernet)
rx/tx-queue: Network driver queue lengths
TCP metrics: retransmits, timewait, established, listen
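
The per-interface counters listed above are exposed in /proc/net/dev. As an illustration, the hypothetical helper below (net_errors is not a standard tool) sums errors and drops per interface and expresses drops as a percentage of packets. The counters are cumulative since the interface came up, so this is a lifetime figure rather than a current rate.

```shell
#!/bin/sh
# net_errors: reads /proc/net/dev-style text on stdin (hypothetical helper)
net_errors() {
  awk '
    NR <= 2 { next }                 # skip the two header lines
    {
      sub(/:/, " ")                  # "eth0:123" and "eth0: 123" both split cleanly
      rx_pkts = $3; rx_errs = $4; rx_drop = $5
      tx_pkts = $11; tx_errs = $12; tx_drop = $13
      pkts = rx_pkts + tx_pkts
      pct = (pkts > 0) ? 100 * (rx_drop + tx_drop) / pkts : 0
      printf "%s errs=%d drops=%d drop_pct=%.3f\n", $1, rx_errs + tx_errs, rx_drop + tx_drop, pct
    }'
}

# Usage on a Linux host:
#   net_errors < /proc/net/dev
```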

Advanced Network Analysis

# TCP connection tracking
ss -t -i state established

# Network packet capture (basic)
tcpdump -i eth0 -c 100 -w capture.pcap

# Packet capture with filters
tcpdump -i eth0 'tcp port 80'

# Network latency with hping3
hping3 -S -p 80 -c 10 host

# TCP retransmission analysis (-i prints per-socket TCP info, including retrans)
ss -ti state established | grep retrans

# BPF-based packet tracing (transmitted packets by process)
bpftrace -e 'tracepoint:net:net_dev_xmit { @[comm] = count(); }'

# Socket buffer limit exceeded events
bpftrace -e 'tracepoint:sock:sock_exceed_buf_limit { printf("%s rmem_alloc=%d\n", comm, args->rmem_alloc); }'

System-Wide Performance Monitoring

System Metrics Tools

Command	Description	Example
`uptime`	Load average and uptime	`uptime`
`w`	Who is logged in and load	`w`
`who`	Logged in users	`who`
`last`	Login history	`last -a`
`dmesg`	Kernel ring buffer	`dmesg | tail -50`
`journalctl`	Systemd journal logs	`journalctl -u nginx --since "1 hour ago"`
`sar`	Historical system data	`sar -q 1 10`
`collectl`	Comprehensive system monitor	`collectl -s +C +D +N +S`

Load Average Interpretation

1-minute: Current load
5-minute: Medium-term load
15-minute: Long-term load

Rule of thumb: Load average should not exceed number of CPU cores for sustained periods.

# Check CPU count
nproc

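# Compare load to CPU count (load per core; sustained values above 1.0 suggest saturation)
awk -v n="$(nproc)" '{ printf "%.2f %.2f %.2f\n", $1/n, $2/n, $3/n }' /proc/loadavg

# Live run-queue and load samples from sar (omit "1 10" to read today's history)
sar -q 1 10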
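
The rule of thumb can also be turned into a small check. The sketch below uses a hypothetical load_status helper that separates a sustained overload (15-minute load above the core count) from a short spike (only the 1-minute load above it). The cut-off of 1.0 per core is this cheat sheet's rule of thumb, not a hard limit.

```shell
#!/bin/sh
# load_status LOAD1 LOAD5 LOAD15 NCPU (hypothetical helper)
load_status() {
  awk -v l1="$1" -v l15="$3" -v n="$4" 'BEGIN {
    if (l15 / n > 1)      print "sustained overload"
    else if (l1 / n > 1)  print "short spike"
    else                  print "ok"
  }'
}

# Usage on a Linux host:
#   load_status $(cut -d' ' -f1-3 /proc/loadavg) "$(nproc)"
```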

Process-Level Monitoring

Process Tools

Command	Description	Example
`ps`	Process status	`ps aux --sort=-%cpu`
`top`	Interactive process viewer	`top -p <PID>`
`htop`	Enhanced process viewer	`htop`
`pgrep`	Find process by name	`pgrep -f nginx`
`pkill`	Kill process by name	`pkill -f nginx`
`kill`	Send signal to process	`kill -9 <PID>`
`strace`	Trace system calls	`strace -p <PID>`
`ltrace`	Trace library calls	`ltrace -p <PID>`
`lsof`	List open files	`lsof -p <PID>`
`nice`	Start process with priority	`nice -n 19 command`
`renice`	Change process priority	`renice -n 10 -p <PID>`
`cgroups`	Control resource usage	`systemd-cgls`

Advanced Process Tracing with bpftrace

# Trace process execve (new processes)
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s %s\n", comm, str(args->filename)); }'

# Count process creations by executable
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { @[str(args->filename)] = count(); }'

# Trace process exits
bpftrace -e 'tracepoint:sched:sched_process_exit { @[comm] = count(); }'

# On-CPU time distribution per scheduling run (using BCC)
sudo /usr/share/bcc/tools/cpudist

# Process I/O
sudo /usr/share/bcc/tools/biosnoop

# Process memory allocation (outstanding allocations, for leak hunting)
sudo /usr/share/bcc/tools/memleak -p <PID>

Advanced eBPF and BCC Tools

What is eBPF?

eBPF (extended Berkeley Packet Filter) is a Linux kernel technology that runs verified, sandboxed programs inside the kernel, which allows safe, efficient, programmable tracing without kernel modules. It’s used for:

Performance profiling: CPU, memory, I/O, network
Security monitoring: Syscall filtering, file access
Networking: XDP, TC, socket filters
Observability: Dynamic tracing with low overhead

Key eBPF concepts:

CO-RE (Compile Once – Run Everywhere): Single BPF bytecode runs on multiple kernel versions
BPF tokens: Fine-grained permissions for BPF operations
BPF arena: New memory model for BPF programs
Tracepoints: Stable kernel instrumentation points
Kprobes/Uprobes: Dynamic function entry/exit tracing

Essential BCC Tools

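BCC (BPF Compiler Collection) provides ready-to-use eBPF tools. Install: apt install bpfcc-tools or yum install bcc-tools. On Debian and Ubuntu the tools are installed with a -bpfcc suffix (for example execsnoop-bpfcc) rather than under /usr/share/bcc/tools.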
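
Because the install location differs between distributions, scripts that call BCC tools may want to look them up first. A minimal sketch, assuming the tool is either under /usr/share/bcc/tools, installed with the Debian/Ubuntu -bpfcc suffix, or on PATH under its plain name (find_bcc_tool is a hypothetical helper):

```shell
#!/bin/sh
# find_bcc_tool NAME: print the first matching executable (hypothetical helper)
find_bcc_tool() {
  tool="$1"
  for candidate in "/usr/share/bcc/tools/$tool" "$tool-bpfcc" "$tool"; do
    # command -v succeeds for an executable path or a command found on PATH
    path=$(command -v "$candidate" 2>/dev/null) && { printf '%s\n' "$path"; return 0; }
  done
  echo "BCC tool not found: $tool" >&2
  return 1
}

# Usage:
#   sudo "$(find_bcc_tool biolatency)"
```

The plain-name fallback can match an unrelated command with the same name (profile is a likely candidate), so check the printed path before relying on it.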

Tool	Purpose	Example
`execsnoop`	Trace process execution	`execsnoop`
`opensnoop`	Trace file opens	`opensnoop -n <process>`
`ext4slower`	Slow ext4 operations	`ext4slower 1`
`biolatency`	Block I/O latency	`biolatency`
`biosnoop`	Block I/O requests	`biosnoop`
`cachestat`	Cache hit/miss stats	`cachestat 1`
`tcpconnect`	TCP connection attempts	`tcpconnect`
`tcpretrans`	TCP retransmissions	`tcpretrans`
`funccount`	Count function calls	`funccount -p <PID> <function>`
`profile`	CPU profiling (sampling)	`profile -F 99 10`
`offcputime`	Off-CPU time analysis	`offcputime -p <PID>`
`runqlat`	Run queue latency	`runqlat`
`wakeuptime`	Off-CPU time by waker stack	`wakeuptime`
`bpflist`	Processes with loaded BPF programs and maps	`bpflist`

bpftrace One-Liners

bpftrace is a high-level tracing language for eBPF. Install: apt install bpftrace

# System call rate every second
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ = count(); } interval:s:1 { print(@); clear(@); }'

# Top system calls by count
bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); } interval:s:5 { print(@); clear(@); }'

# Trace file opens by specific process
bpftrace -e 'tracepoint:syscalls:sys_enter_openat /comm == "nginx"/ { printf("%s %s\n", comm, str(args->filename)); }'

# Count bytes read and written per process (actual bytes, from syscall return values)
bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @[comm] = sum(args->ret); } tracepoint:syscalls:sys_exit_write /args->ret > 0/ { @[comm] = sum(args->ret); } interval:s:10 { print(@); clear(@); }'

# Trace process blocking (off-CPU time in microseconds, by process)
bpftrace -e 'tracepoint:sched:sched_switch { if (args->prev_pid != 0) { @off[args->prev_pid] = nsecs; } $t = @off[args->next_pid]; if ($t != 0) { @usecs[args->next_comm] = hist((nsecs - $t) / 1000); delete(@off[args->next_pid]); } } END { clear(@off); }'

# Network packet drops
bpftrace -e 'tracepoint:skb:kfree_skb { @[comm] = count(); }'

# CPU migration events
bpftrace -e 'tracepoint:sched:sched_migrate_task { printf("%s migrated from %d to %d\n", args->comm, args->orig_cpu, args->dest_cpu); }'

# Kernel function entry/exit
bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; } kretprobe:vfs_read /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'

# Kernel memory allocation (kmalloc bytes by process)
bpftrace -e 'tracepoint:kmem:kmalloc { @[comm] = sum(args->bytes_alloc); } interval:s:5 { print(@); clear(@); }'

Advanced bpftrace Scripts

Save these as .bt files and run with bpftrace script.bt

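#!/usr/bin/env bpftrace
// offcpu.bt - Off-CPU time by kernel stack trace

tracepoint:sched:sched_switch
{
  // prev is leaving the CPU: remember when (pid 0 is the idle task)
  if (args->prev_pid != 0) {
    @start[args->prev_pid] = nsecs;
  }
}

// Runs in the context of the thread that was just switched back in.
// On some kernels the symbol is finish_task_switch.isra.0.
kprobe:finish_task_switch
/@start[tid]/
{
  // kstack is the path this thread blocked on; sum the off-CPU time in ns
  @[kstack, comm] = sum(nsecs - @start[tid]);
  delete(@start[tid]);
}

END
{
  clear(@start);
}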

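#!/usr/bin/env bpftrace
// iolatency.bt - Disk I/O latency by device

tracepoint:block:block_rq_issue
{
  // Key by device and sector so concurrent requests do not overwrite each other
  @start[args->dev, args->sector] = nsecs;
}

tracepoint:block:block_rq_complete
/@start[args->dev, args->sector]/
{
  // dev is the kernel dev_t: major = dev >> 20, minor = dev & 0xfffff
  @latency_us[args->dev >> 20, args->dev & 0xfffff] =
      hist((nsecs - @start[args->dev, args->sector]) / 1000);
  delete(@start[args->dev, args->sector]);
}

END
{
  clear(@start);
}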

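#!/usr/bin/env bpftrace
// tcplife.bt - TCP connection lifetime
// Protocol 6 is IPPROTO_TCP; state 1 is TCP_ESTABLISHED, state 7 is TCP_CLOSE.
// Sockets are keyed by address because state changes happen in varying contexts.

tracepoint:sock:inet_sock_set_state
/args->protocol == 6 && args->newstate == 1/
{
  @start[args->skaddr] = nsecs;
}

tracepoint:sock:inet_sock_set_state
/args->protocol == 6 && args->newstate == 7 && @start[args->skaddr]/
{
  @lifetime_ms = hist((nsecs - @start[args->skaddr]) / 1000000);
  delete(@start[args->skaddr]);
}

END
{
  clear(@start);
}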

Flame Graphs

Flame graphs visualize profiled software, showing which code paths are hottest (most frequently on-CPU).

Generating CPU Flame Graphs

# Using perf (Linux)
perf record -F 99 -g -a sleep 10
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cpu-flamegraph.svg

# Using bpftrace (simpler; maps are printed when the script exits)
bpftrace -e 'profile:hz:99 { @[ustack] = count(); } interval:s:10 { exit(); }' | ./stackcollapse-bpftrace.pl | ./flamegraph.pl > bpftrace-flamegraph.svg

# Using BCC (-f emits folded stacks, so no stackcollapse step is needed)
/usr/share/bcc/tools/profile -F 99 -f 10 | ./flamegraph.pl > bcc-flamegraph.svg

Generating Off-CPU Flame Graphs

Off-CPU time shows where threads block (I/O, locks, etc.).

# Using bpftrace offcpu.bt script
bpftrace offcpu.bt | ./stackcollapse-bpftrace.pl | ./flamegraph.pl > offcpu-flamegraph.svg

# Using BCC offcputime (-f emits folded stacks; counts are microseconds)
/usr/share/bcc/tools/offcputime -p <PID> -f 10 | ./flamegraph.pl --countname=us > offcpu-bcc.svg

Flame Graph Resources

Download scripts: https://github.com/brendangregg/FlameGraph
Interactive zoom: Click any box to zoom in
Colors: Random warm hues by default, chosen to tell adjacent frames apart; they carry no meaning
Width = share of samples in which that function was on the stack (itself plus its callees)
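
The width of a frame can be checked numerically from the folded-stack text that the stackcollapse scripts emit (one line per stack, frames joined by semicolons, followed by a sample count). The sketch below uses a hypothetical helper, frame_share, to print each function's inclusive share of samples.

```shell
#!/bin/sh
# frame_share: reads folded stacks ("frameA;frameB;frameC COUNT") on stdin
# (hypothetical helper; not part of the FlameGraph repository)
frame_share() {
  awk '
    {
      count = $NF
      stack = $0
      sub(/ [0-9]+$/, "", stack)             # drop the trailing sample count
      n = split(stack, frames, ";")
      split("", seen)                        # reset the per-stack set
      for (i = 1; i <= n; i++) {
        if (!(frames[i] in seen)) {          # count recursion once per stack
          seen[frames[i]] = 1
          incl[frames[i]] += count
        }
      }
      total += count
    }
    END {
      if (total == 0) exit
      for (f in incl) printf "%.1f%% %s\n", 100 * incl[f] / total, f
    }' | sort -rn
}

# Usage:
#   perf script | ./stackcollapse-perf.pl | frame_share | head
```

A function near the top of this list with few callees of its own is a wide plateau in the flame graph and is usually the first place to look.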

Performance Analysis Workflows

CPU Bottleneck Diagnosis

Check overall CPU utilization: top or htop
Identify high CPU processes: ps aux --sort=-%cpu
Profile with perf top or perf record + flame graph
Analyze kernel vs user time: vmstat 1 (us vs sy)
Check run queue: uptime (load average vs CPU count)
Investigate interrupts: mpstat -P ALL 1 (%irq, %soft); vmstat 1 shows the interrupt rate (in)
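
The steps above can be strung together in a small triage script. This is only a sketch: run_step and cpu_triage are hypothetical names, output is capped at 15 lines per step to keep it readable, and any tool that is not installed is skipped rather than treated as an error.

```shell
#!/bin/sh
# run_step LABEL COMMAND [ARGS...]: run one step if the command exists
run_step() {
  label="$1"; shift
  printf '== %s ==\n' "$label"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 | head -n 15
  else
    printf 'skipped: %s is not installed\n' "$1"
  fi
}

# cpu_triage: the CPU checklist from this section, in order
cpu_triage() {
  run_step "Overall utilization"    top -b -n 1
  run_step "Top CPU processes"      ps aux --sort=-%cpu
  run_step "User vs system time"    vmstat 1 5
  run_step "Load average"           uptime
  run_step "CPU count"              nproc
  run_step "Per-CPU interrupt time" mpstat -P ALL 1 5
}

# Usage:
#   cpu_triage
```

Profiling with perf is left out on purpose, since it needs elevated privileges and a decision about which process to sample.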

Memory Bottleneck Diagnosis

Check memory usage: free -h
Identify memory-hog processes: ps aux --sort=-%mem
Check for swapping: vmstat 1 (si/so columns)
Analyze page faults: sar -B 1 (majflt/s)
NUMA issues: numastat
Slab usage: cat /proc/slabinfo
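
When sar is not installed, swap and major-fault rates can be estimated from two snapshots of /proc/vmstat. A minimal sketch, assuming the counters pgmajfault, pswpin, and pswpout are present (vm_rates is a hypothetical helper name):

```shell
#!/bin/sh
# vm_rates BEFORE AFTER SECONDS (hypothetical helper)
# BEFORE and AFTER are saved copies of /proc/vmstat taken SECONDS apart
vm_rates() {
  awk -v secs="$3" '
    NR == FNR { before[$1] = $2; next }      # first file: remember each counter
    $1 == "pgmajfault" || $1 == "pswpin" || $1 == "pswpout" {
      printf "%s/s=%.1f\n", $1, ($2 - before[$1]) / secs
    }' "$1" "$2"
}

# Usage on a Linux host:
#   cat /proc/vmstat > /tmp/vm.before; sleep 10; cat /proc/vmstat > /tmp/vm.after
#   vm_rates /tmp/vm.before /tmp/vm.after 10
```

Sustained nonzero pswpin together with major faults is a stronger sign of memory pressure than swap usage alone.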

Disk I/O Bottleneck Diagnosis

Check device utilization: iostat -xz 1 (%util > 80% suggests saturation on single-queue devices)
Identify slow devices: high await, avgqu-sz (aqu-sz in newer sysstat)
Process-level I/O: iotop
Trace I/O with BCC: biolatency, biosnoop
File system level: ext4slower, xfsslower
Generate I/O flame graphs

Network Bottleneck Diagnosis

Check interface stats: ip -s link (errors, drops)
Connection count: ss -s
Listening ports: ss -tuln
Packet drops: netstat -s | grep -i drop
TCP retransmits: ss -ti (retrans field), netstat -s | grep -i retrans
Trace with tcpconnect, tcpretrans
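
A system-wide retransmit percentage can be derived from the Tcp counters in /proc/net/snmp. The sketch below (tcp_retrans_pct is a hypothetical helper) looks up the OutSegs and RetransSegs columns by name from the header row. The counters are cumulative since boot, so compare two readings if you need a current rate.

```shell
#!/bin/sh
# tcp_retrans_pct: reads /proc/net/snmp-style text on stdin (hypothetical helper)
tcp_retrans_pct() {
  awk '
    $1 == "Tcp:" && !have_header {
      for (i = 2; i <= NF; i++) col[$i] = i   # header row: map name -> column
      have_header = 1
      next
    }
    $1 == "Tcp:" {
      out = $(col["OutSegs"]); re = $(col["RetransSegs"])
      if (out > 0) printf "%.2f\n", 100 * re / out; else print "0.00"
    }'
}

# Usage on a Linux host:
#   tcp_retrans_pct < /proc/net/snmp
```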

Application Slowdown Diagnosis

Check system metrics (CPU, memory, I/O)
Identify slow processes: top, ps
Trace system calls: strace -p <PID>
Profile CPU: perf record -p <PID>
Analyze off-CPU time: offcputime -p <PID>
Generate flame graphs for both on-CPU and off-CPU

Reference Tables

Metric Thresholds

Metric	Warning	Critical
CPU Utilization	>80% sustained	>95% sustained
Load Average	>cores * 2	>cores * 4
Memory Usage	>90%	>95% + swapping
Disk %util	>70%	>90%
Disk await	>20ms	>50ms
Network drops	>0.1%	>1%
TCP retransmits	>1%	>5%
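
For alerting scripts, the table can be encoded as a small lookup. The sketch below covers only the rows with plain numeric thresholds; load average (which depends on core count) and memory (which also depends on swapping) are left out. The function name classify and the metric keys are hypothetical.

```shell
#!/bin/sh
# classify METRIC VALUE -> ok | warning | critical (hypothetical helper)
# Thresholds are copied from the table above.
classify() {
  case "$1" in
    cpu_util)    warn=80;  crit=95 ;;   # percent
    disk_util)   warn=70;  crit=90 ;;   # percent
    disk_await)  warn=20;  crit=50 ;;   # milliseconds
    net_drops)   warn=0.1; crit=1  ;;   # percent of packets
    tcp_retrans) warn=1;   crit=5  ;;   # percent of segments
    *) echo "unknown metric: $1" >&2; return 2 ;;
  esac
  awk -v v="$2" -v w="$warn" -v c="$crit" 'BEGIN {
    if (v > c) print "critical"; else if (v > w) print "warning"; else print "ok"
  }'
}

# Usage:
#   classify disk_util 85
```

The thresholds are starting points; sustained values matter more than single samples, so feed this averaged readings rather than instantaneous ones.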

Tool Quick Reference

Task	Tool	Command
System overview	`top`	`top`
CPU profiling	`perf`	`perf record -F 99 -g -a sleep 5`
Memory usage	`free`	`free -h`
Disk I/O	`iostat`	`iostat -xz 1`
Network stats	`ss`	`ss -tuln`
Process tree	`pstree`	`pstree -p`
System calls	`strace`	`strace -p <PID>`
Advanced tracing	`bpftrace`	`bpftrace -e '...'`
BCC tools	Various	`/usr/share/bcc/tools/<tool>`
Flame graphs	`perf` + scripts	See Flame Graphs section

Additional Resources

This cheat sheet provides both essential daily monitoring commands and advanced tracing techniques for comprehensive Linux performance analysis.


This post is licensed under CC BY 4.0 by the author.