Troubleshooting Scenarios¶
Practice scenarios to prepare for interview troubleshooting questions.
Scenario 1: High CPU Usage¶
Situation: Production web server showing 95% CPU usage. Application is slow.
Your investigation:
- "What recent changes were made?"
- Check which process: `top`, `ps aux --sort=-%cpu` → Multiple Apache workers at 100% CPU
- Check threads: `top -H -p <PID>`
- System call trace: `strace -p <PID>` → Many `accept()` calls and processing
- Check connections: `ss -tan | wc -l` → 10,000 connections
- Check Apache config: MaxClients too high? Attack?
- Check logs: `/var/log/apache2/access.log` → Many requests from the same IPs (see the sketch below)
- Diagnosis: DDoS attack
Actions: Rate limiting, block IPs, scale capacity
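For the log-analysis step, a minimal sketch of counting requests per client IP; the log path and line count are assumptions from this scenario, and it presumes the default combined log format:

```bash
#!/bin/bash
# Count requests per client IP over the most recent portion of the access log.
# Assumes the default combined format, where the client IP is the first field.
LOG=/var/log/apache2/access.log
tail -n 100000 "$LOG" | awk '{print $1}' | sort | uniq -c | sort -rn | head -20
```

IPs that dominate this count are candidates for rate limiting or blocking once confirmed malicious.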
Scenario 2: Memory Leak¶
Situation: Application memory usage growing over time, eventually OOM killed.
Investigation:
- Monitor memory: `watch 'ps -p <PID> -o pid,vsz,rss'` → Memory increases linearly
- Check memory breakdown: `pmap -x <PID>` → Large heap allocation
- Run with debugging: `valgrind --leak-check=full ./app` → Shows memory allocated but not freed
- Diagnosis: Memory leak in code
Actions: Fix leak, monitor, restart process periodically as workaround
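A small sketch of the monitoring step: sample the process's memory at an interval and append to a CSV so the growth trend is easy to plot. The 60-second interval and output filename are arbitrary choices:

```bash
#!/bin/bash
# Sample a process's memory every 60 seconds so growth over time can be plotted.
# Pass the target PID as the first argument.
PID=$1
OUT="mem_${PID}.csv"
echo "timestamp,vsz_kb,rss_kb" > "$OUT"
while kill -0 "$PID" 2>/dev/null; do          # loop while the process still exists
    read -r vsz rss < <(ps -p "$PID" -o vsz=,rss=)
    echo "$(date +%s),$vsz,$rss" >> "$OUT"
    sleep 60
done
```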
Scenario 3: Disk Full¶
Situation: Server stops working, errors about disk space.
Investigation:
- Check filesystems: `df -h` → /var at 100%
- Find large files: `du -sh /var/*` → /var/log is 50GB
- Find specific files: `du -ah /var/log | sort -rh | head -20` → Old rotated logs not deleted
- Check logrotate config: `/etc/logrotate.d/` → Logrotate not running? Check cron
- Diagnosis: Logrotate misconfigured
Actions: Delete old logs, fix logrotate, add monitoring
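A hedged sketch of the cleanup and verification steps; the 14-day retention and the /var/log scope are assumptions, so review the candidate list before deleting anything:

```bash
#!/bin/bash
# Review and clear old rotated logs, then check that logrotate is actually scheduled.
find /var/log -name "*.gz" -mtime +14 -print        # list deletion candidates first
# find /var/log -name "*.gz" -mtime +14 -delete     # run only after reviewing the list
logrotate -d /etc/logrotate.conf                    # -d = debug/dry run: shows what would rotate
systemctl list-timers 2>/dev/null | grep -i logrotate   # on systemd, confirm a logrotate timer exists
```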
Scenario 4: Network Connectivity¶
Situation: Can't SSH to server, but it's pingable.
Investigation:
- Ping works: `ping 192.168.1.10` ✓
- SSH timeout: `ssh user@192.168.1.10`
- Check if SSH listening: `nmap -p 22 192.168.1.10` → Filtered
- Try from a different location → Same result
- Log in via console/KVM
- Check SSH status: `systemctl status sshd` → Running
- Check listening: `ss -tln | grep :22` → Listening on 0.0.0.0:22
- Check firewall: `iptables -L -n -v` → INPUT chain DROPs port 22
- Diagnosis: Firewall rule blocking SSH
Actions: Fix firewall rule, investigate who changed it
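From the console session, a hedged sketch of the firewall fix; the rule handling is illustrative, so review the full ruleset before deleting or reordering anything:

```bash
#!/bin/bash
# Locate the rule dropping SSH, then either delete it or accept SSH ahead of it.
iptables -L INPUT -n -v --line-numbers | grep -E 'dpt:22|DROP'   # find the offending rule number
# iptables -D INPUT <rule-number>                 # option 1: delete the bad rule
iptables -I INPUT 1 -p tcp --dport 22 -j ACCEPT   # option 2: accept SSH before any DROP
# Persist the change; the mechanism (iptables-save, netfilter-persistent, etc.) varies by distro.
```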
Scenario 5: Slow DNS Resolution¶
Situation: Applications slow, DNS lookups taking 5-10 seconds.
Investigation:
- Test DNS: `time nslookup google.com` → 8 seconds
- Check resolv.conf: `cat /etc/resolv.conf` → nameserver 8.8.8.8
- Ping nameserver: `ping 8.8.8.8` → Timeout; network issue to Google DNS
- Try local resolver: `dig @192.168.1.1 google.com` → Fast
- Check route: `traceroute 8.8.8.8` → Times out at firewall
- Diagnosis: Firewall blocking outbound DNS to 8.8.8.8
Actions: Use local DNS, fix firewall, redundant DNS servers
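A minimal sketch for timing the same lookup against several resolvers side by side; the resolver addresses here are assumptions, so substitute the ones from /etc/resolv.conf and your LAN:

```bash
#!/bin/bash
# Compare DNS response times across resolvers to see which one is slow.
for ns in 8.8.8.8 1.1.1.1 192.168.1.1; do
    # +time=2 +tries=1 keeps an unreachable resolver from stalling the loop
    ms=$(dig @"$ns" google.com +time=2 +tries=1 | awk '/Query time/ {print $4}')
    echo "resolver $ns: ${ms:-no answer} msec"
done
```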
Scenario 6: Process Won't Die¶
Situation: Process won't terminate even with kill -9.
Investigation:
- Try SIGKILL: `kill -9 <PID>` → Still running
- Check process state: `ps aux | grep <PID>` → State 'D' ('D' = uninterruptible sleep, usually I/O)
- Check what it's doing: `cat /proc/<PID>/wchan` → Shows kernel function; stuck in kernel doing I/O
- Check disk: `iostat -x 1` → Device /dev/sdb has high await, 100% util
- Check dmesg: `dmesg | tail` → Disk errors
- Diagnosis: Failing disk, process stuck in I/O
Actions: Can't kill (kernel operation), fix disk, may need reboot
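For the "what is it stuck on" step, a small sketch that lists every D-state process together with the kernel function it is blocked in:

```bash
#!/bin/bash
# List uninterruptible (D-state) processes and the kernel function they are blocked in.
# wchan is a standard ps output keyword; :32 just widens the column.
ps -eo pid,stat,wchan:32,comm --no-headers | awk '$2 ~ /^D/'
```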
Scenario 7: Web Server 502 Errors¶
Situation: Nginx returning 502 Bad Gateway errors.
Investigation:
- Check Nginx logs: `/var/log/nginx/error.log` → "Connection refused" to backend (127.0.0.1:8080)
- Check backend status: `systemctl status app` → Failed
- Check why it failed: `journalctl -u app -n 100` → "bind: Address already in use"
- Check who's on 8080: `ss -tlnp | grep :8080` → Different process; rogue process on backend port
- Diagnosis: Another process took the backend port
Actions: Kill rogue process, start backend, investigate how it happened
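A hedged sketch of confirming what owns the backend port before restarting the real service; the port and the "app" unit name come from this scenario and will differ in your environment:

```bash
#!/bin/bash
# Find out what owns the backend port, then restart the real service.
PORT=8080
ss -tlnp "sport = :$PORT"                       # listener plus owning pid/program (needs root for -p)
pid=$(lsof -t -iTCP:"$PORT" -sTCP:LISTEN)       # alternative lookup via lsof
echo "Port $PORT is held by PID ${pid:-<none>}"
# Only after confirming it is safe to stop:
# kill "$pid" && systemctl restart app
```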
Scenario 8: High Load but Low CPU¶
Situation: Load average 20.0 on 4-CPU system, but CPU idle.
Investigation:
- Check load and CPU: `top` → Load 20, CPU 95% idle (load average includes D-state processes)
- Check D state: `ps aux | grep ' D '` → 16 processes in D state, all doing I/O
- Check I/O: `iostat -x 1` → %iowait high, %util 100%; disk saturated
- Check what I/O: `iotop` → Database writes
- Diagnosis: Disk I/O bottleneck
Actions: Faster disks, optimize queries, add caching
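A quick sketch that ties the pieces of this scenario together: is the load coming from runnable (CPU-bound) tasks or uninterruptible (I/O-bound) ones?

```bash
#!/bin/bash
# Split the load between runnable and uninterruptible tasks, then show iowait.
echo "Load average: $(cut -d' ' -f1-3 /proc/loadavg)"
echo "Runnable (R) tasks:        $(ps -eo stat --no-headers | grep -c '^R')"
echo "Uninterruptible (D) tasks: $(ps -eo stat --no-headers | grep -c '^D')"
vmstat 1 2 | tail -1   # last sample: the "wa" column is CPU time spent waiting on I/O
```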
How to Approach Interview Scenarios¶
- Ask questions:
  - When did it start?
  - What changed?
  - Intermittent or constant?
  - Error messages?
- Start broad, narrow down (a first-pass sketch follows this list):
  - System-level stats first
  - Then process-specific
  - Finally deep dive
- Explain your reasoning:
  - Why each command
  - What you're looking for
  - How it helps
- Form hypotheses:
  - Based on symptoms
  - Test each one
  - Adjust based on findings
- Document findings:
  - What you discovered
  - Root cause
  - Fix applied
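In the "broad before narrow" spirit, a first-pass triage sketch; it only reads system-wide state, and the 80% disk threshold is an arbitrary cut-off:

```bash
#!/bin/bash
# Broad first pass: overall load, memory, disk, then the biggest CPU/memory consumers.
uptime                                  # load averages
free -h                                 # memory and swap
df -h | awk 'NR==1 || $5+0 > 80'        # header plus any filesystem over 80% full
ps aux --sort=-%cpu | head -6           # top CPU consumers
ps aux --sort=-%mem | head -6           # top memory consumers
```

Whatever stands out here decides which process-specific tools (strace, pmap, iostat, journalctl) to reach for next.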
Practice Exercise¶
Create your own scenarios:
- Pick a symptom
- Work backward to root cause
- List investigation steps
- Test on actual system if possible