Memory Management¶
Deep Understanding Required
Memory management is a critical topic for Linux Internals interviews. Expect questions that go from high-level concepts down to kernel implementation details, especially around malloc() and virtual memory.
Overview¶
Linux uses sophisticated memory management to provide each process with its own virtual address space while efficiently using physical RAM.
Memory Management Unit (MMU)¶
The MMU is a hardware component that translates virtual addresses to physical addresses.
MMU Functionality¶
graph LR
A[CPU generates Virtual Address] --> B[MMU]
B --> C[Page Table Lookup]
C --> D[Physical Address]
D --> E[Physical RAM]
C --> F[Page Fault]
F --> G[Kernel Handler]
MMU Operations:
- Address Translation: Virtual → Physical using page tables
- Access Control: Check read/write/execute permissions
- Cache Management: TLB (Translation Lookaside Buffer) for fast lookups
- Page Fault Generation: When page not in memory
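As a rough illustration of the translation step, the sketch below (plain C, assuming 4KB pages) splits a virtual address into the virtual page number and in-page offset that the MMU resolves in hardware.
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int x = 42;
    uintptr_t vaddr = (uintptr_t)&x;      /* some virtual address in this process */

    uintptr_t page_size = 4096;           /* assumed 4 KB pages (12 offset bits) */
    uintptr_t offset = vaddr & (page_size - 1);
    uintptr_t vpn    = vaddr >> 12;       /* virtual page number */

    printf("virtual address: 0x%lx\n", (unsigned long)vaddr);
    printf("virtual page:    0x%lx\n", (unsigned long)vpn);
    printf("page offset:     0x%lx\n", (unsigned long)offset);
    return 0;
}
The MMU performs the equivalent split in hardware, then uses the TLB or a page-table walk to turn the virtual page number into a physical frame number.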
Translation Lookaside Buffer (TLB)¶
The TLB is a cache of recent virtual-to-physical address translations.
- Purpose: Speed up address translation
- Size: Typically 64-512 entries
- Performance: A TLB hit adds roughly a cycle; a miss forces a page-table walk costing tens to hundreds of cycles
- Management: Traditionally flushed on context switch (expensive!); modern CPUs tag entries with PCIDs/ASIDs to avoid full flushes
Virtual Memory vs Physical Memory¶
Virtual Memory¶
- Per-process: Each process has its own address space
- Size: 4GB on 32-bit; on 64-bit with 48-bit addressing, typically 128TB of user space (256TB total virtual address space)
- Isolation: Processes can't access each other's memory
- Contiguous: Appears as one contiguous range of addresses to the process
Physical Memory (RAM)¶
- Shared: All processes share physical RAM
- Limited: Actual RAM size (e.g., 8GB, 16GB)
- Fragmented: Can be physically non-contiguous
- Managed: Kernel allocates and tracks usage
Address Space Layout¶
High Address (0xFFFFFFFF on 32-bit)
┌─────────────────────────────────┐
│ Kernel Space (1GB) │
│ 0xC0000000+ │
├─────────────────────────────────┤
│ Stack (grows down) │
│ ↓ │
│ │
│ Memory-mapped region │
│ (shared libraries, mmap) │
│ │
│ ↑ │
│ Heap (grows up) │
├─────────────────────────────────┤
│ BSS (uninitialized data) │
├─────────────────────────────────┤
│ Data (initialized data) │
├─────────────────────────────────┤
│ Text (code) - read-only │
└─────────────────────────────────┘
Low Address (0x00000000)
Pages and Page Tables¶
Page Structure¶
A page is the basic unit of memory management, typically 4KB (4096 bytes).
# View page size
getconf PAGE_SIZE
# Usually 4096
# View huge page sizes
cat /proc/meminfo | grep Huge
Page Sizes:
- Regular: 4KB (most common)
- Huge Pages: 2MB (x86-64)
- Gigantic Pages: 1GB (x86-64)
Page Tables¶
Multi-level page tables map virtual to physical addresses:
4-Level Page Table (x86-64):
Virtual Address (48 bits used)
├── PGD Index (9 bits) → Page Global Directory
├── PUD Index (9 bits) → Page Upper Directory
├── PMD Index (9 bits) → Page Middle Directory
├── PTE Index (9 bits) → Page Table
└── Offset (12 bits) → Within physical page
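A minimal sketch of that breakdown in C: it decodes the four 9-bit table indices and the 12-bit offset from a 48-bit x86-64 virtual address (the example address is arbitrary).
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t vaddr = 0x00007f3a12345678ULL;   /* arbitrary user-space address */

    uint64_t offset = vaddr & 0xFFF;          /* bits 0-11:  offset in the page */
    uint64_t pte    = (vaddr >> 12) & 0x1FF;  /* bits 12-20: page table index   */
    uint64_t pmd    = (vaddr >> 21) & 0x1FF;  /* bits 21-29: PMD index          */
    uint64_t pud    = (vaddr >> 30) & 0x1FF;  /* bits 30-38: PUD index          */
    uint64_t pgd    = (vaddr >> 39) & 0x1FF;  /* bits 39-47: PGD index          */

    printf("PGD=%llu PUD=%llu PMD=%llu PTE=%llu offset=0x%llx\n",
           (unsigned long long)pgd, (unsigned long long)pud,
           (unsigned long long)pmd, (unsigned long long)pte,
           (unsigned long long)offset);
    return 0;
}
Each 9-bit index selects one of 512 entries in its table.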
Page Table Entry (PTE) flags:
- Present: Page is in RAM
- Read/Write: Write permission
- User/Supervisor: User-space accessible
- Accessed: Page has been read or written
- Dirty: Page has been written
- Page Size: Entry maps a larger (huge) page
- No Execute (NX): Cannot execute code
# View process page table
cat /proc/<PID>/pagemap # Binary format
cat /proc/<PID>/maps # Human readable
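As a quick sketch, a process can inspect its own layout programmatically by reading /proc/self/maps, the same file shown above:
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    char line[512];
    /* each line: address range, permissions, offset, device, inode, path */
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}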
Heap vs Stack¶
Stack¶
Characteristics:
- LIFO: Last In, First Out
- Automatic: Variables created/destroyed automatically
- Fixed size: Typically 8MB (ulimit -s)
- Fast: Simple pointer manipulation
- Thread-local: Each thread has its own stack
- Grows down: From high to low addresses
Used for:
- Local variables
- Function parameters
- Return addresses
- Function call frames
void function() {
int local_var = 42; // On stack
char buffer[1024]; // On stack
} // Automatically freed when function returns
Stack Frame:
High Address
┌──────────────────┐
│ Previous Frame   │
├──────────────────┤
│ Parameters       │
├──────────────────┤
│ Return Address   │
├──────────────────┤
│ Saved Registers  │
├──────────────────┤
│ Local Variables  │
└──────────────────┘ ← Stack Pointer (SP)
Low Address
Heap¶
Characteristics:
- Dynamic: Manual allocation/deallocation
- Variable size: Can grow up to virtual memory limit
- Slower: Requires system calls or allocator management
- Shared: Shared among threads
- Grows up: From low to high addresses
- Fragmentation: Can become fragmented
Used for:
- Dynamic arrays
- Long-lived objects
- Large data structures
- Shared memory
void function() {
int *ptr = malloc(1024); // On heap
// ... use ptr ...
free(ptr); // Must manually free
}
Comparison¶
| Feature | Stack | Heap |
|---|---|---|
| Allocation | Automatic | Manual (malloc/free) |
| Speed | Fast | Slower |
| Size | Limited (MB) | Large (GB) |
| Lifetime | Function scope | Until freed |
| Fragmentation | No | Yes |
| Thread safety | Per-thread | Shared, needs locks |
malloc() Implementation¶
Key Interview Question
"How does malloc() work?" - Expect to explain both the glibc implementation and kernel-level details.
High-Level Flow¶
What happens:
- Check size: Small (<128KB) or large allocation?
- Small allocations: Use brk()/sbrk() to expand the heap
- Large allocations: Use mmap() to get anonymous memory
- Internal management: glibc maintains free lists
- Return pointer: To allocated memory
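A small sketch of that split in practice, assuming glibc's default mmap threshold of 128KB (tunable and version-dependent); running it under strace -e brk,mmap should show the heap growing for the small request and an anonymous mmap() for the large one.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *small = malloc(1024);           /* usually served from the brk()-managed heap */
    void *large = malloc(1024 * 1024);    /* above the threshold: usually mmap()ed */

    printf("small allocation: %p\n", small);
    printf("large allocation: %p\n", large);

    free(large);                          /* typically munmap()ed back to the kernel */
    free(small);                          /* returned to glibc's free lists, not the kernel */
    return 0;
}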
glibc malloc (ptmalloc2)¶
Internal Structures:
Bins: Free chunks organized by size
- Fast bins: Small, recently freed chunks (LIFO)
- Small bins: Small chunks (16-512 bytes)
- Large bins: Large chunks (512+ bytes)
- Unsorted bin: Recently freed chunks awaiting sorting into the small/large bins
Chunk Structure (simplified):
struct malloc_chunk {
size_t prev_size; // Size of previous chunk (if free)
size_t size; // Size of this chunk
void *fd; // Forward pointer (free list)
void *bk; // Backward pointer (free list)
// ... user data ...
};
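Because of the chunk header and size rounding, the usable size glibc hands back can exceed the requested size. A sketch using the glibc-specific malloc_usable_size():
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific: malloc_usable_size() */

int main(void) {
    void *p = malloc(100);
    if (!p) return 1;
    /* glibc rounds the request up to its internal chunk granularity */
    printf("requested: 100 bytes, usable: %zu bytes\n", malloc_usable_size(p));
    free(p);
    return 0;
}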
brk() vs mmap()¶
brk() System Call¶
Characteristics:
- Extends program break: Top of heap
- Linear growth: Heap grows continuously
- Fast: Single system call
- Cannot free middle: Can only shrink from top
- Used for: Small allocations (<128KB)
Before brk():
Heap: [====] ← program break
After brk():
Heap: [========] ← new program break (expanded)
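A sketch of watching the program break move with sbrk(0), which returns the current break (exact numbers depend on the glibc version and its arena growth policy):
#define _DEFAULT_SOURCE   /* for sbrk() on glibc */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    void *before = sbrk(0);           /* current program break */
    void *p = malloc(4096);           /* small request: glibc typically extends the heap */
    void *after = sbrk(0);

    printf("break before malloc: %p\n", before);
    printf("break after  malloc: %p\n", after);

    free(p);
    return 0;
}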
mmap() System Call¶
#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
Characteristics:
- Arbitrary location: Can allocate anywhere
- Page-aligned: Always multiples of page size
- Can free individually: With munmap()
- Used for: Large allocations (>128KB), file mapping
- Flexible: Can set permissions, share memory
// Anonymous memory allocation
void *ptr = mmap(NULL, 8192,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
Flags:
- MAP_PRIVATE: Changes not visible to other processes
- MAP_SHARED: Changes shared with other processes
- MAP_ANONYMOUS: Not backed by a file
- MAP_FIXED: Use the specified address
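To illustrate MAP_SHARED versus MAP_PRIVATE, the sketch below maps anonymous shared memory and forks; because the mapping is MAP_SHARED, the parent sees the child's write (with MAP_PRIVATE it would not).
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    pid_t pid = fork();
    if (pid == 0) {                       /* child writes into the shared page */
        strcpy(buf, "written by child");
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    printf("parent reads: %s\n", buf);    /* sees the child's write */

    munmap(buf, 4096);
    return 0;
}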
free() Implementation¶
What happens:
- Get chunk header: Access metadata before ptr
- Check neighbors: Are adjacent chunks free?
- Coalesce: Merge with adjacent free chunks
- Add to bin: Place in appropriate free list
- Release to OS: If large chunk or top chunk, may call munmap() or brk()
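glibc also exposes malloc_trim(), which explicitly asks the allocator to return free memory at the top of the heap to the kernel; a rough sketch:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific: malloc_trim() */

int main(void) {
    enum { N = 10000 };
    void *p[N];

    for (int i = 0; i < N; i++)
        p[i] = malloc(1024);          /* grows the heap */
    for (int i = 0; i < N; i++)
        free(p[i]);                   /* chunks go back to glibc's bins, not the kernel */

    int released = malloc_trim(0);    /* returns 1 if memory was given back to the OS */
    printf("malloc_trim released memory: %s\n", released ? "yes" : "no");
    return 0;
}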
Page Faults¶
A page fault occurs when a process accesses a memory page that's not currently in physical RAM.
Types of Page Faults¶
Minor Page Fault¶
Page is in memory but not mapped in process's page table.
Causes:
- Copy-on-Write (COW)
- Lazy loading
- Page reclaimed but still in cache
Resolution:
- Fast (~microseconds)
- No disk I/O
- Update page table
Major Page Fault¶
Page must be loaded from disk.
Causes:
- First access to memory-mapped file
- Page swapped to disk
- Page never loaded
Resolution:
- Slow (~milliseconds)
- Disk I/O required
- Update page table after load
Page Fault Handling¶
graph TD
A[Process accesses memory] --> B{Page in RAM?}
B -->|Yes| C{Mapped in page table?}
C -->|Yes| D[Access succeeds]
C -->|No| E[Minor page fault]
E --> F[Update page table]
F --> D
B -->|No| G[Major page fault]
G --> H[Find/Load page from disk]
H --> F
Kernel handler steps:
- CPU generates fault: Page not present
- Kernel handles interrupt
- Determine fault type: Minor or major
- Locate page: Page cache, swap, file
- Allocate physical frame: If needed
- Load data: From disk if major fault
- Update page table: Map virtual to physical
- Flush TLB: Invalidate cached entry
- Resume execution: Return to faulting instruction
# Monitor page faults
/usr/bin/time ./program # GNU time (not the shell builtin) shows fault counts
/usr/bin/time -v ./program # Detailed stats
ps -o min_flt,maj_flt <PID> # Per-process faults
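A program can also read its own fault counters with getrusage(). In the sketch below, touching a freshly allocated buffer typically shows up as minor faults because the kernel backs the pages lazily:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

int main(void) {
    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);

    size_t size = 16 * 1024 * 1024;
    char *buf = malloc(size);
    if (!buf) return 1;
    memset(buf, 1, size);             /* first touch of each page faults it in */

    getrusage(RUSAGE_SELF, &after);
    printf("minor faults: %ld\n", after.ru_minflt - before.ru_minflt);
    printf("major faults: %ld\n", after.ru_majflt - before.ru_majflt);

    free(buf);
    return 0;
}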
Swapping and Swap Space¶
Swap Space¶
When physical RAM is full, Linux moves inactive pages to swap space on disk.
# View swap usage
free -h
swapon --show
# Add swap file
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Remove swap
swapoff /swapfile
Swap Process¶
Swap Out (memory → disk):
- Select victim pages: Least recently used
- Write to swap: If page is dirty
- Update page table: Mark as swapped
- Free physical frame: For other use
Swap In (disk → memory):
- Page fault occurs: Access swapped page
- Allocate physical frame
- Read from swap: Load page data
- Update page table: Map to new frame
- Resume execution
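A process that must never have certain pages written to swap (key material, for example) can pin them with mlock(); a minimal sketch (locked memory counts against RLIMIT_MEMLOCK):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 4096;
    char *secret = malloc(len);
    if (!secret) return 1;

    if (mlock(secret, len) != 0) {    /* fails if RLIMIT_MEMLOCK is too low */
        perror("mlock");
        return 1;
    }

    strcpy(secret, "sensitive data"); /* this page can no longer be swapped out */
    /* ... use the data ... */

    memset(secret, 0, len);           /* wipe before unlocking and freeing */
    munlock(secret, len);
    free(secret);
    return 0;
}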
Swappiness¶
Controls tendency to swap vs reclaim page cache.
# View current swappiness (default: 60)
cat /proc/sys/vm/swappiness
# Set swappiness (0-100)
# 0 = avoid swapping until absolutely necessary
# 100 = aggressive swapping
echo 10 | sudo tee /proc/sys/vm/swappiness
Out of Memory (OOM) Killer¶
When the system runs out of memory and cannot free enough, the OOM killer terminates processes.
OOM Killer Selection¶
The kernel scores each process based on:
- Memory usage: RSS, swap, and page-table pages (more memory = higher score)
- oom_score_adj: Administrator-set bias from -1000 to +1000
- Older kernels also weighed runtime, nice value, and root privilege
# View OOM score
cat /proc/<PID>/oom_score
# Adjust OOM score (-1000 to 1000)
echo -500 > /proc/<PID>/oom_score_adj # Less likely to be killed
echo 1000 > /proc/<PID>/oom_score_adj # Very likely to be killed
# Disable OOM killer for process
echo -1000 > /proc/<PID>/oom_score_adj
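The same adjustment can be made from inside a process by writing to /proc/self/oom_score_adj; a sketch (positive values need no privileges, negative ones generally require root):
#include <stdio.h>

int main(void) {
    FILE *f = fopen("/proc/self/oom_score_adj", "w");
    if (!f) { perror("fopen"); return 1; }
    fprintf(f, "500\n");              /* make this process a more likely OOM victim */
    fclose(f);

    f = fopen("/proc/self/oom_score", "r");   /* read back the resulting score */
    int score;
    if (f && fscanf(f, "%d", &score) == 1)
        printf("current oom_score: %d\n", score);
    if (f) fclose(f);
    return 0;
}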
OOM Killer Messages¶
Example output:
Out of memory: Kill process 1234 (myapp) score 800 or sacrifice child
Killed process 1234 (myapp) total-vm:4000000kB, anon-rss:2000000kB, file-rss:0kB
Memory-Mapped Files¶
Use mmap() to map files directly into process memory.
Benefits¶
- Efficient I/O: No separate read/write calls
- Shared memory: Multiple processes can map same file
- Lazy loading: Pages loaded on-demand
- Automatic sync: Changes written back to the file (with MAP_SHARED)
Example¶
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int fd = open("data.bin", O_RDWR);
struct stat sb;
fstat(fd, &sb);
// Map file into memory
void *mapped = mmap(NULL, sb.st_size,
PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
// Access file as memory
char *data = (char *)mapped;
data[0] = 'A'; // Modifies file
// Sync changes to disk
msync(mapped, sb.st_size, MS_SYNC);
// Unmap
munmap(mapped, sb.st_size);
close(fd);
Buffer Cache and Page Cache¶
Page Cache¶
The page cache stores pages of files in RAM to speed up file I/O.
Contents:
- Regular file pages
- Block device pages
- Memory-mapped file pages
# View page cache usage
free -h
cat /proc/meminfo | grep -i cache
# Clear page cache (don't do in production!)
sync
echo 3 > /proc/sys/vm/drop_caches
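Instead of dropping the whole cache, a program can hint that one file's pages are no longer needed with posix_fadvise(); a sketch that assumes a file named data.bin exists:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    while (read(fd, buf, sizeof buf) > 0)
        ;                             /* reading pulls the file into the page cache */

    /* len 0 = to end of file; hint that these pages can be evicted */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: error %d\n", err);

    close(fd);
    return 0;
}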
Buffer Cache¶
The buffer cache stores block device metadata (historically separate, now unified with page cache).
Contains:
- Superblocks
- Inodes
- Directory entries
- Block bitmaps
Copy-on-Write (COW)¶
COW is an optimization that delays copying memory until a write occurs.
How COW Works¶
After fork():
- Share pages: Child and parent point to same physical pages
- Mark read-only: Set pages as copy-on-write
- On write: Page fault handler creates copy
graph LR
A[Parent process] -->|fork| B[Child process]
A -->|shares| C[Physical Pages read-only]
B -->|shares| C
A -->|write| D[Copy page]
D --> E[Parent has own copy]
C --> F[Child still has original]
Benefits¶
- Fast fork(): No immediate copying
- Memory efficient: Only copy pages that change
- Common case: Many processes fork then exec (no writes)
Implementation¶
Page table flags:
- Both processes have read-only PTEs
- On write attempt:
- Page fault occurs
- Kernel allocates new physical page
- Copies data to new page
- Updates writing process's PTE
- Makes page writable
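A small sketch of the semantics COW preserves: after fork() the child's write triggers a COW fault and lands on its own private copy of the page, so the parent still sees the original value.
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int value = 1;

    pid_t pid = fork();
    if (pid == 0) {                    /* child: the write forces the kernel to copy the page */
        value = 2;
        printf("child  sees value = %d\n", value);   /* 2 */
        _exit(0);
    }

    waitpid(pid, NULL, 0);
    printf("parent sees value = %d\n", value);        /* still 1 */
    return 0;
}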
Memory Zones¶
The kernel divides physical memory into zones:
Zone Types¶
ZONE_DMA (0-16MB)
- Legacy DMA devices
- Old hardware that can't address >16MB
ZONE_DMA32 (0-4GB)
- 32-bit DMA devices
- Devices that can't address >4GB
ZONE_NORMAL (varies)
- Normal allocations
- Directly mapped by kernel
ZONE_HIGHMEM (32-bit only, >896MB)
- Not directly mapped by kernel
- Requires temporary mapping
- Not present on 64-bit systems
Practice Questions¶
- Explain the complete flow of malloc() for a 1KB allocation.
- What is the difference between minor and major page faults?
- Why does fork() not immediately copy all memory?
- When would you use brk() vs mmap()?
- How does the OOM killer decide which process to kill?
- What is the purpose of the TLB and why is it important?
- Explain the difference between virtual and physical memory.
- What happens to memory-mapped files when a process crashes?
Further Reading¶
- man 3 malloc
- man 2 mmap, man 2 brk
- man 5 proc (/proc/meminfo, /proc/PID/maps)
- "The Linux Programming Interface" Chapters 6-8
- "Understanding the Linux Virtual Memory Manager" by Mel Gorman