Memory Management¶
Deep Understanding Required
Memory management is a critical topic for Linux Internals interviews. Expect questions that go from high-level concepts down to kernel implementation details, especially around malloc() and virtual memory.
Overview¶
Linux uses sophisticated memory management to provide each process with its own virtual address space while efficiently using physical RAM.
Memory Management Unit (MMU)¶
The MMU is a hardware component that translates virtual addresses to physical addresses.
MMU Functionality¶
graph LR
A[CPU generates Virtual Address] --> B[MMU]
B --> C[Page Table Lookup]
C --> D[Physical Address]
D --> E[Physical RAM]
C --> F[Page Fault]
F --> G[Kernel Handler]
MMU Operations:
- Address Translation: Virtual → Physical using page tables
- Access Control: Check read/write/execute permissions
- Cache Management: TLB (Translation Lookaside Buffer) for fast lookups
- Page Fault Generation: When page not in memory
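As a rough illustration of the translation step, the sketch below (plain C, assuming 4KB pages) splits a virtual address into the virtual page number and in-page offset that the MMU resolves in hardware.
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int x = 42;
    uintptr_t vaddr = (uintptr_t)&x;      /* some virtual address in this process */

    uintptr_t page_size = 4096;           /* assumed 4 KB pages (12 offset bits) */
    uintptr_t offset = vaddr & (page_size - 1);
    uintptr_t vpn    = vaddr >> 12;       /* virtual page number */

    printf("virtual address: 0x%lx\n", (unsigned long)vaddr);
    printf("virtual page:    0x%lx\n", (unsigned long)vpn);
    printf("page offset:     0x%lx\n", (unsigned long)offset);
    return 0;
}
The MMU performs the equivalent split in hardware, then uses the TLB or a page-table walk to turn the virtual page number into a physical frame number.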
Translation Lookaside Buffer (TLB)¶
The TLB is a cache of recent virtual-to-physical address translations.
- Purpose: Speed up address translation
- Size: Typically 64-512 entries
- Performance: A TLB hit adds roughly a cycle; a miss forces a page-table walk costing tens to hundreds of cycles
- Management: Traditionally flushed on context switch (expensive!); modern CPUs tag entries with PCIDs/ASIDs to avoid full flushes
Virtual Memory vs Physical Memory¶
Virtual Memory¶
- Per-process: Each process has its own address space
- Size: 4GB on 32-bit; on 64-bit with 48-bit addressing, typically 128TB of user space (256TB total virtual address space)
- Isolation: Processes can't access each other's memory
- Contiguous: Appears as one contiguous range of addresses to the process
Physical Memory (RAM)¶
- Shared: All processes share physical RAM
- Limited: Actual RAM size (e.g., 8GB, 16GB)
- Fragmented: Can be physically non-contiguous
- Managed: Kernel allocates and tracks usage
Address Space Layout¶
High Address (0xFFFFFFFF on 32-bit)
┌─────────────────────────────────┐
│ Kernel Space (1GB) │
│ 0xC0000000+ │
├─────────────────────────────────┤
│ Stack (grows down) │
│ ↓ │
│ │
│ Memory-mapped region │
│ (shared libraries, mmap) │
│ │
│ ↑ │
│ Heap (grows up) │
├─────────────────────────────────┤
│ BSS (uninitialized data) │
├─────────────────────────────────┤
│ Data (initialized data) │
├─────────────────────────────────┤
│ Text (code) - read-only │
└─────────────────────────────────┘
Low Address (0x00000000)
Pages and Page Tables¶
Page Structure¶
A page is the basic unit of memory management, typically 4KB (4096 bytes).
# View page size
getconf PAGE_SIZE
# Usually 4096
# View huge page sizes
cat /proc/meminfo | grep Huge
Page Sizes:
- Regular: 4KB (most common)
- Huge Pages: 2MB (x86-64)
- Gigantic Pages: 1GB (x86-64)
Page Tables¶
Multi-level page tables map virtual to physical addresses:
4-Level Page Table (x86-64):
Virtual Address (48 bits used)
├── PGD Index (9 bits) → Page Global Directory
├── PUD Index (9 bits) → Page Upper Directory
├── PMD Index (9 bits) → Page Middle Directory
├── PTE Index (9 bits) → Page Table
└── Offset (12 bits) → Within physical page
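A minimal sketch of that breakdown in C: it decodes the four 9-bit table indices and the 12-bit offset from a 48-bit x86-64 virtual address (the example address is arbitrary).
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t vaddr = 0x00007f3a12345678ULL;   /* arbitrary user-space address */

    uint64_t offset = vaddr & 0xFFF;          /* bits 0-11:  offset in the page */
    uint64_t pte    = (vaddr >> 12) & 0x1FF;  /* bits 12-20: page table index   */
    uint64_t pmd    = (vaddr >> 21) & 0x1FF;  /* bits 21-29: PMD index          */
    uint64_t pud    = (vaddr >> 30) & 0x1FF;  /* bits 30-38: PUD index          */
    uint64_t pgd    = (vaddr >> 39) & 0x1FF;  /* bits 39-47: PGD index          */

    printf("PGD=%llu PUD=%llu PMD=%llu PTE=%llu offset=0x%llx\n",
           (unsigned long long)pgd, (unsigned long long)pud,
           (unsigned long long)pmd, (unsigned long long)pte,
           (unsigned long long)offset);
    return 0;
}
Each 9-bit index selects one of 512 entries in its table.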
Page Table Entry (PTE) flags:
- Present: Page is in RAM
- Read/Write: Write permission
- User/Supervisor: User-space accessible
- Accessed: Page has been read or written
- Dirty: Page has been written
- Page Size: Entry maps a larger (huge) page
- No Execute (NX): Cannot execute code
# View process page table
cat /proc/<PID>/pagemap # Binary format
cat /proc/<PID>/maps # Human readable
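As a quick sketch, a process can inspect its own layout programmatically by reading /proc/self/maps, the same file shown above:
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    char line[512];
    /* each line: address range, permissions, offset, device, inode, path */
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}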
Heap vs Stack¶
Stack¶
Characteristics:
- LIFO: Last In, First Out
- Automatic: Variables created/destroyed automatically
- Fixed size: Typically 8MB (ulimit -s)
- Fast: Simple pointer manipulation
- Thread-local: Each thread has its own stack
- Grows down: From high to low addresses
Used for:
- Local variables
- Function parameters
- Return addresses
- Function call frames
void function() {
int local_var = 42; // On stack
char buffer[1024]; // On stack
} // Automatically freed when function returns
Stack Frame:
High Address
┌──────────────────┐
│ Previous Frame   │
├──────────────────┤
│ Parameters       │
├──────────────────┤
│ Return Address   │
├──────────────────┤
│ Saved Registers  │
├──────────────────┤
│ Local Variables  │
└──────────────────┘ ← Stack Pointer (SP)
Low Address
Heap¶
Characteristics:
- Dynamic: Manual allocation/deallocation
- Variable size: Can grow up to virtual memory limit
- Slower: Requires system calls or allocator management
- Shared: Shared among threads
- Grows up: From low to high addresses
- Fragmentation: Can become fragmented
Used for:
- Dynamic arrays
- Long-lived objects
- Large data structures
- Shared memory
void function() {
int *ptr = malloc(1024); // On heap
// ... use ptr ...
free(ptr); // Must manually free
}
Comparison¶
| Feature | Stack | Heap |
|---|---|---|
| Allocation | Automatic | Manual (malloc/free) |
| Speed | Fast | Slower |
| Size | Limited (MB) | Large (GB) |
| Lifetime | Function scope | Until freed |
| Fragmentation | No | Yes |
| Thread safety | Per-thread | Shared, needs locks |
malloc() Implementation¶
Key Interview Question
"How does malloc() work?" - Expect to explain both the glibc implementation and kernel-level details.
High-Level Flow¶
What happens:
- Check size: Small (<128KB) or large allocation?
- Small allocations: Use brk()/sbrk() to expand the heap
- Large allocations: Use mmap() to get anonymous memory
- Internal management: glibc maintains free lists
- Return pointer: To allocated memory
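A small sketch of that split in practice, assuming glibc's default mmap threshold of 128KB (tunable and version-dependent); running it under strace -e brk,mmap should show the heap growing for the small request and an anonymous mmap() for the large one.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *small = malloc(1024);           /* usually served from the brk()-managed heap */
    void *large = malloc(1024 * 1024);    /* above the threshold: usually mmap()ed */

    printf("small allocation: %p\n", small);
    printf("large allocation: %p\n", large);

    free(large);                          /* typically munmap()ed back to the kernel */
    free(small);                          /* returned to glibc's free lists, not the kernel */
    return 0;
}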
glibc malloc (ptmalloc2)¶
Internal Structures:
Bins: Free chunks organized by size
- Fast bins: Small, recently freed chunks (LIFO)
- Small bins: Small chunks (16-512 bytes)
- Large bins: Large chunks (512+ bytes)
- Unsorted bin: Recently freed chunks awaiting sorting into the small/large bins
Chunk Structure (simplified):
struct malloc_chunk {
size_t prev_size; // Size of previous chunk (if free)
size_t size; // Size of this chunk
void *fd; // Forward pointer (free list)
void *bk; // Backward pointer (free list)
// ... user data ...
};
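Because of the chunk header and size rounding, the usable size glibc hands back can exceed the requested size. A sketch using the glibc-specific malloc_usable_size():
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific: malloc_usable_size() */

int main(void) {
    void *p = malloc(100);
    if (!p) return 1;
    /* glibc rounds the request up to its internal chunk granularity */
    printf("requested: 100 bytes, usable: %zu bytes\n", malloc_usable_size(p));
    free(p);
    return 0;
}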
brk() vs mmap()¶
brk() System Call¶
Characteristics:
- Extends program break: Top of heap
- Linear growth: Heap grows continuously
- Fast: Single system call
- Cannot free middle: Can only shrink from top
- Used for: Small allocations (<128KB)
Before brk():
Heap: [====] ← program break
After brk():
Heap: [========] ← new program break (expanded)
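A sketch of watching the program break move with sbrk(0), which returns the current break (exact numbers depend on the glibc version and its arena growth policy):
#define _DEFAULT_SOURCE   /* for sbrk() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    void *before = sbrk(0);           /* current program break */
    void *p = malloc(4096);           /* small request: glibc typically extends the heap */
    void *after = sbrk(0);

    printf("break before malloc: %p\n", before);
    printf("break after  malloc: %p\n", after);

    free(p);
    return 0;
}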
mmap() System Call¶
#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
Characteristics:
- Arbitrary location: Can allocate anywhere
- Page-aligned: Always multiples of page size
- Can free individually: With munmap()
- Used for: Large allocations (>128KB), file mapping
- Flexible: Can set permissions, share memory
// Anonymous memory allocation
void *ptr = mmap(NULL, 8192,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
Flags:
- MAP_PRIVATE: Changes not visible to other processes
- MAP_SHARED: Changes shared with other processes
- MAP_ANONYMOUS: Not backed by a file
- MAP_FIXED: Use the specified address
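To illustrate MAP_SHARED versus MAP_PRIVATE, the sketch below maps anonymous shared memory and forks; because the mapping is MAP_SHARED, the parent sees the child's write (with MAP_PRIVATE it would not).
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                       /* child writes into the shared page */
        strcpy(buf, "written by child");
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent reads: %s\n", buf);    /* sees the child's write */

    munmap(buf, 4096);
    return 0;
}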
free() Implementation¶
What happens:
- Get chunk header: Access metadata before ptr
- Check neighbors: Are adjacent chunks free?
- Coalesce: Merge with adjacent free chunks
- Add to bin: Place in appropriate free list
- Release to OS: If large chunk or top chunk, may call munmap() or brk()
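glibc also exposes malloc_trim(), which explicitly asks the allocator to return free memory at the top of the heap to the kernel; a rough sketch:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific: malloc_trim() */

int main(void) {
    enum { N = 10000 };
    void *p[N];

    for (int i = 0; i < N; i++)
        p[i] = malloc(1024);          /* grows the heap */
    for (int i = 0; i < N; i++)
        free(p[i]);                   /* chunks go back to glibc's bins, not the kernel */

    int released = malloc_trim(0);    /* returns 1 if memory was given back to the OS */
    printf("malloc_trim released memory: %s\n", released ? "yes" : "no");
    return 0;
}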
Page Faults¶
A page fault occurs when a process accesses a memory page that's not currently in physical RAM.
Types of Page Faults¶
Minor Page Fault¶
Page is in memory but not mapped in process's page table.
Causes:
- Copy-on-Write (COW)
- Lazy loading
- Page reclaimed but still in cache
Resolution:
- Fast (~microseconds)
- No disk I/O
- Update page table
Major Page Fault¶
Page must be loaded from disk.
Causes:
- First access to memory-mapped file
- Page swapped to disk
- Page never loaded
Resolution:
- Slow (~milliseconds)
- Disk I/O required
- Update page table after load
Page Fault Handling¶
graph TD
A[Process accesses memory] --> B{Page in RAM?}
B -->|Yes| C{Mapped in page table?}
C -->|Yes| D[Access succeeds]
C -->|No| E[Minor page fault]
E --> F[Update page table]
F --> D
B -->|No| G[Major page fault]
G --> H[Find/Load page from disk]
H --> F
Kernel handler steps:
- CPU generates fault: Page not present
- Kernel handles interrupt
- Determine fault type: Minor or major
- Locate page: Page cache, swap, file
- Allocate physical frame: If needed
- Load data: From disk if major fault
- Update page table: Map virtual to physical
- Flush TLB: Invalidate cached entry
- Resume execution: Return to faulting instruction
# Monitor page faults
/usr/bin/time ./program # GNU time (not the shell builtin) shows fault counts
/usr/bin/time -v ./program # Detailed stats
ps -o min_flt,maj_flt <PID> # Per-process faults
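A program can also read its own fault counters with getrusage(). In the sketch below, touching a freshly allocated buffer typically shows up as minor faults because the kernel backs the pages lazily:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

int main(void) {
    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);

    size_t size = 16 * 1024 * 1024;
    char *buf = malloc(size);
    if (!buf) return 1;
    memset(buf, 1, size);             /* first touch of each page faults it in */

    getrusage(RUSAGE_SELF, &after);
    printf("minor faults: %ld\n", after.ru_minflt - before.ru_minflt);
    printf("major faults: %ld\n", after.ru_majflt - before.ru_majflt);

    free(buf);
    return 0;
}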
Swapping and Swap Space¶
Swap Space¶
When physical RAM is full, Linux moves inactive pages to swap space on disk.
# View swap usage
free -h
swapon --show
# Add swap file
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Remove swap
swapoff /swapfile
Swap Process¶
Swap Out (memory → disk):
- Select victim pages: Least recently used
- Write to swap: If page is dirty
- Update page table: Mark as swapped
- Free physical frame: For other use
Swap In (disk → memory):
- Page fault occurs: Access swapped page
- Allocate physical frame
- Read from swap: Load page data
- Update page table: Map to new frame
- Resume execution
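A process that must never have certain pages written to swap (key material, for example) can pin them with mlock(); a minimal sketch (locked memory counts against RLIMIT_MEMLOCK):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 4096;
    char *secret = malloc(len);
    if (!secret) return 1;

    if (mlock(secret, len) != 0) {    /* fails if RLIMIT_MEMLOCK is too low */
        perror("mlock");
        return 1;
    }

    strcpy(secret, "sensitive data"); /* this page can no longer be swapped out */
    /* ... use the data ... */

    memset(secret, 0, len);           /* wipe before unlocking and freeing */
    munlock(secret, len);
    free(secret);
    return 0;
}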
Swappiness¶
Controls tendency to swap vs reclaim page cache.
# View current swappiness (default: 60)
cat /proc/sys/vm/swappiness
# Set swappiness (0-100)
# 0 = avoid swapping until absolutely necessary
# 100 = aggressive swapping
echo 10 | sudo tee /proc/sys/vm/swappiness
Out of Memory (OOM) Killer¶
When the system runs out of memory and cannot free enough, the OOM killer terminates processes.
OOM Killer Selection¶
The kernel scores each process based on:
- Memory usage: RSS, swap, and page-table pages (more memory = higher score)
- oom_score_adj: Administrator-set bias from -1000 to +1000
- Older kernels also weighed runtime, nice value, and root privilege
# View OOM score
cat /proc/<PID>/oom_score
# Adjust OOM score (-1000 to 1000)
echo -500 > /proc/<PID>/oom_score_adj # Less likely to be killed
echo 1000 > /proc/<PID>/oom_score_adj # Very likely to be killed
# Disable OOM killer for process
echo -1000 > /proc/<PID>/oom_score_adj
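The same adjustment can be made from inside a process by writing to /proc/self/oom_score_adj; a sketch (positive values need no privileges, negative ones generally require root):
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    if (!f) { perror("fopen"); return 1; }
    fprintf(f, "500\n");              /* make this process a more likely OOM victim */
    fclose(f);

    f = fopen("/proc/self/oom_score", "r");   /* read back the resulting score */
    int score;
    if (f && fscanf(f, "%d", &score) == 1)
        printf("current oom_score: %d\n", score);
    if (f) fclose(f);
    return 0;
}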
OOM Killer Messages¶
Example output:
Out of memory: Kill process 1234 (myapp) score 800 or sacrifice child
Killed process 1234 (myapp) total-vm:4000000kB, anon-rss:2000000kB, file-rss:0kB
Memory-Mapped Files¶
Use mmap() to map files directly into process memory.
Benefits¶
- Efficient I/O: No separate read/write calls
- Shared memory: Multiple processes can map same file
- Lazy loading: Pages loaded on-demand
- Automatic sync: Changes written back to the file (with MAP_SHARED)
Example¶
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int fd = open("data.bin", O_RDWR);
struct stat sb;
fstat(fd, &sb);
// Map file into memory
void *mapped = mmap(NULL, sb.st_size,
PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
// Access file as memory
char *data = (char *)mapped;
data[0] = 'A'; // Modifies file
// Sync changes to disk
msync(mapped, sb.st_size, MS_SYNC);
// Unmap
munmap(mapped, sb.st_size);
close(fd);
Buffer Cache and Page Cache¶
Page Cache¶
The page cache stores pages of files in RAM to speed up file I/O.
Contents:
- Regular file pages
- Block device pages
- Memory-mapped file pages
# View page cache usage
free -h
cat /proc/meminfo | grep -i cache
# Clear page cache (don't do in production!)
sync
echo 3 > /proc/sys/vm/drop_caches
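Instead of dropping the whole cache, a program can hint that one file's pages are no longer needed with posix_fadvise(); a sketch that assumes a file named data.bin exists:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    while (read(fd, buf, sizeof buf) > 0)
        ;                             /* reading pulls the file into the page cache */

    /* len 0 = to end of file; hint that these pages can be evicted */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: error %d\n", err);

    close(fd);
    return 0;
}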
Buffer Cache¶
The buffer cache stores block device metadata (historically separate, now unified with page cache).
Contains:
- Superblocks
- Inodes
- Directory entries
- Block bitmaps
Copy-on-Write (COW)¶
COW is an optimization that delays copying memory until a write occurs.
How COW Works¶
After fork():
- Share pages: Child and parent point to same physical pages
- Mark read-only: Set pages as copy-on-write
- On write: Page fault handler creates copy
graph LR
A[Parent process] -->|fork| B[Child process]
A -->|shares| C[Physical Pages read-only]
B -->|shares| C
A -->|write| D[Copy page]
D --> E[Parent has own copy]
C --> F[Child still has original]
Benefits¶
- Fast fork(): No immediate copying
- Memory efficient: Only copy pages that change
- Common case: Many processes fork then exec (no writes)
Implementation¶
Page table flags:
- Both processes have read-only PTEs
- On write attempt:
- Page fault occurs
- Kernel allocates new physical page
- Copies data to new page
- Updates writing process's PTE
- Makes page writable
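A small sketch of the semantics COW preserves: after fork() the child's write triggers a COW fault and lands on its own private copy of the page, so the parent still sees the original value.
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int value = 1;

    pid_t pid = fork();
    if (pid == 0) {                    /* child: the write forces the kernel to copy the page */
        value = 2;
        printf("child  sees value = %d\n", value);   /* 2 */
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    printf("parent sees value = %d\n", value);        /* still 1 */
    return 0;
}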
Memory Zones¶
The kernel divides physical memory into zones:
Zone Types¶
ZONE_DMA (0-16MB)
- Legacy DMA devices
- Old hardware that can't address >16MB
ZONE_DMA32 (0-4GB)
- 32-bit DMA devices
- Devices that can't address >4GB
ZONE_NORMAL (varies)
- Normal allocations
- Directly mapped by kernel
ZONE_HIGHMEM (32-bit only, >896MB)
- Not directly mapped by kernel
- Requires temporary mapping
- Not present on 64-bit systems
Practice Questions¶
- Explain the complete flow of malloc() for a 1KB allocation.
- What is the difference between minor and major page faults?
- Why does fork() not immediately copy all memory?
- When would you use brk() vs mmap()?
- How does the OOM killer decide which process to kill?
- What is the purpose of the TLB and why is it important?
- Explain the difference between virtual and physical memory.
- What happens to memory-mapped files when a process crashes?
Further Reading¶
- man 3 malloc
- man 2 mmap, man 2 brk
- man 5 proc (/proc/meminfo, /proc/PID/maps)
- "The Linux Programming Interface" Chapters 6-8
- "Understanding the Linux Virtual Memory Manager" by Mel Gorman