Memory Management

Deep Understanding Required

Memory management is a critical topic for Linux Internals interviews. Expect questions that go from high-level concepts down to kernel implementation details, especially around malloc() and virtual memory.

Overview

Linux uses sophisticated memory management to provide each process with its own virtual address space while efficiently using physical RAM.

Memory Management Unit (MMU)

The MMU is a hardware component that translates virtual addresses to physical addresses.

MMU Functionality

graph LR
    A[CPU generates Virtual Address] --> B[MMU]
    B --> C[Page Table Lookup]
    C --> D[Physical Address]
    D --> E[Physical RAM]
    C --> F[Page Fault]
    F --> G[Kernel Handler]

MMU Operations:

  1. Address Translation: Virtual → Physical using page tables
  2. Access Control: Check read/write/execute permissions
  3. Cache Management: TLB (Translation Lookaside Buffer) for fast lookups
  4. Page Fault Generation: When page not in memory

Translation Lookaside Buffer (TLB)

The TLB is a cache of recent virtual-to-physical address translations.

  • Purpose: Speed up address translation
  • Size: Typically 64–1536 entries, usually split across L1 and L2 TLB levels
  • Performance: A TLB hit costs about a cycle; a miss forces a page-table walk that can take tens to hundreds of cycles
  • Management: Traditionally flushed on context switch (expensive!); modern CPUs soften this with tagged entries (PCID/ASID)
# View TLB information
cat /proc/cpuinfo | grep -i tlb
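
To make the translation cost visible rather than theoretical, a rough microbenchmark helps. The sketch below (illustrative only; exact numbers depend on the CPU's TLB hierarchy, and 4KB pages are assumed) pre-faults a 256MB buffer, then touches one byte per page so that nearly every access needs a fresh translation.

/* Rough sketch: touch one byte per 4KB page across a 256MB buffer,
 * forcing a translation per access. The buffer is pre-faulted so the
 * timed pass measures TLB pressure, not page faults. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define PAGE  4096
#define PAGES (64 * 1024)                    /* 64K pages = 256MB */

int main(void) {
    char *buf = malloc((size_t)PAGES * PAGE);
    if (!buf) return 1;
    memset(buf, 0, (size_t)PAGES * PAGE);    /* pre-fault every page */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < PAGES; i++)
        buf[i * PAGE]++;                     /* one access per page */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    printf("page-stride pass: %ld ns\n",
           (t1.tv_sec - t0.tv_sec) * 1000000000L +
           (t1.tv_nsec - t0.tv_nsec));
    return 0;
}

Re-running the loop with a 64-byte stride over the same buffer gives a baseline; the gap between the two passes is mostly translation cost.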

Virtual Memory vs Physical Memory

Virtual Memory

  • Per-process: Each process has its own address space
  • Size: 32-bit: 4GB; 64-bit: 128TB of user space with the common 48-bit addressing
  • Isolation: Processes can't access each other's memory
  • Contiguous: Appears as one contiguous block to the process

Physical Memory (RAM)

  • Shared: All processes share physical RAM
  • Limited: Actual RAM size (e.g., 8GB, 16GB)
  • Fragmented: Can be physically non-contiguous
  • Managed: Kernel allocates and tracks usage
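
The difference is easy to observe from /proc. In the sketch below (a minimal illustration, assuming a 64-bit Linux system with 4KB pages), reserving a large anonymous mapping inflates the process's virtual size (VmSize) immediately, while the resident size (VmRSS) only grows once pages are actually touched.

/* Sketch: virtual size (VmSize) grows when address space is reserved;
 * resident size (VmRSS) grows only when pages are actually touched. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void show(const char *tag) {
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return;
    while (fgets(line, sizeof(line), f))
        if (!strncmp(line, "VmSize", 6) || !strncmp(line, "VmRSS", 5))
            printf("%s %s", tag, line);
    fclose(f);
}

int main(void) {
    show("start  ");
    char *p = mmap(NULL, 1UL << 30, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);   /* reserve 1GB */
    if (p == MAP_FAILED) return 1;
    show("mapped ");
    for (size_t i = 0; i < (1UL << 30); i += 4096)
        p[i] = 1;                                         /* touch every page */
    show("touched");
    return 0;
}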

Address Space Layout

High Address (0xFFFFFFFF on 32-bit)
┌─────────────────────────────────┐
│      Kernel Space (1GB)         │
│         0xC0000000+             │
├─────────────────────────────────┤
│      Stack (grows down)         │
│            ↓                    │
│                                 │
│     Memory-mapped region        │
│   (shared libraries, mmap)      │
│                                 │
│            ↑                    │
│      Heap (grows up)            │
├─────────────────────────────────┤
│      BSS (uninitialized data)   │
├─────────────────────────────────┤
│      Data (initialized data)    │
├─────────────────────────────────┤
│      Text (code) - read-only    │
└─────────────────────────────────┘
Low Address (0x00000000)
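
A quick way to verify this layout on a live system is to print one address per region, as in the sketch below, and compare against the matching lines in /proc/self/maps (ASLR shifts the exact values between runs).

/* Sketch: print one address from each region to observe the layout. */
#include <stdio.h>
#include <stdlib.h>

int initialized = 42;            /* data segment */
int uninitialized;               /* BSS */

int main(void) {
    int local;                   /* stack */
    void *heap = malloc(16);     /* heap */

    printf("text:  %p\n", (void *)main);
    printf("data:  %p\n", (void *)&initialized);
    printf("bss:   %p\n", (void *)&uninitialized);
    printf("heap:  %p\n", heap);
    printf("stack: %p\n", (void *)&local);

    free(heap);
    return 0;
}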

Pages and Page Tables

Page Structure

A page is the basic unit of memory management, typically 4KB (4096 bytes).

# View page size
getconf PAGE_SIZE
# Usually 4096

# View huge page sizes
cat /proc/meminfo | grep Huge

Page Sizes:

  • Regular: 4KB (most common)
  • Huge Pages: 2MB (x86-64)
  • Gigantic Pages: 1GB (x86-64)

Page Tables

Multi-level page tables map virtual to physical addresses:

4-Level Page Table (x86-64):

Virtual Address (48 bits used)
├── PGD Index (9 bits) → Page Global Directory
├── PUD Index (9 bits) → Page Upper Directory
├── PMD Index (9 bits) → Page Middle Directory
├── PTE Index (9 bits) → Page Table Entry
└── Offset (12 bits)   → Within physical page
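
The split is pure bit manipulation. The sketch below (the address value is made up for illustration) extracts each 9-bit index and the 12-bit offset from a 48-bit x86-64 virtual address.

/* Sketch: split a canonical 48-bit x86-64 virtual address into the
 * four 9-bit table indices and the 12-bit page offset. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t va = 0x00007f1234567abcULL;   /* hypothetical address */

    unsigned pgd = (va >> 39) & 0x1ff;     /* bits 47-39 */
    unsigned pud = (va >> 30) & 0x1ff;     /* bits 38-30 */
    unsigned pmd = (va >> 21) & 0x1ff;     /* bits 29-21 */
    unsigned pte = (va >> 12) & 0x1ff;     /* bits 20-12 */
    unsigned off = va & 0xfff;             /* bits 11-0  */

    printf("PGD=%u PUD=%u PMD=%u PTE=%u offset=0x%x\n",
           pgd, pud, pmd, pte, off);
    return 0;
}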

Page Table Entry (PTE) flags:

  • Present: Page is in RAM
  • Read/Write: Write permission
  • User/Supervisor: User-space accessible
  • Accessed: Page has been read
  • Dirty: Page has been written
  • Page Size: Entry maps a huge page (2MB/1GB) instead of a 4KB one
  • No Execute (NX): Cannot execute code
# View process page table
cat /proc/<PID>/pagemap  # Binary format
cat /proc/<PID>/maps     # Human readable
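
As a hedged sketch of how the pagemap interface works: each 64-bit entry describes one virtual page, and bit 63 indicates whether the page is present in RAM (per the kernel's pagemap documentation). Reading the physical frame number additionally requires CAP_SYS_ADMIN on modern kernels; the present bit alone does not.

/* Sketch: query /proc/self/pagemap for one virtual page. */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(void) {
    long psize = sysconf(_SC_PAGESIZE);
    char *page = malloc(psize);
    page[0] = 1;                             /* fault the page in */

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0) return 1;

    uint64_t entry;
    off_t offset = ((uintptr_t)page / psize) * sizeof(entry);
    if (pread(fd, &entry, sizeof(entry), offset) != sizeof(entry))
        return 1;

    printf("present=%d\n", (int)(entry >> 63) & 1);
    close(fd);
    return 0;
}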

Heap vs Stack

Stack

Characteristics:

  • LIFO: Last In, First Out
  • Automatic: Variables created/destroyed automatically
  • Fixed size: Typically 8MB (ulimit -s)
  • Fast: Simple pointer manipulation
  • Thread-local: Each thread has its own stack
  • Grows down: From high to low addresses

Used for:

  • Local variables
  • Function parameters
  • Return addresses
  • Function call frames
void function() {
    int local_var = 42;        // On stack
    char buffer[1024];         // On stack
}  // Automatically freed when function returns

Stack Frame:

High Address
┌──────────────────┐
│  Previous Frame  │
├──────────────────┤
│  Parameters      │
├──────────────────┤
│  Return Address  │
├──────────────────┤
│  Saved Registers │
├──────────────────┤
│  Local Variables │
└──────────────────┘ ← Stack Pointer (SP)
Low Address

(Caller-pushed parameters sit above the return address; on x86-64 the first few arguments are passed in registers instead.)
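
The downward growth is easy to confirm: in the sketch below, each nested call's local variable lands at a lower address than its caller's.

/* Sketch: each deeper call's local sits at a lower address,
 * showing the stack growing downward. */
#include <stdio.h>

void probe(int depth) {
    int local;
    printf("depth %d: &local = %p\n", depth, (void *)&local);
    if (depth < 3)
        probe(depth + 1);
}

int main(void) {
    probe(0);
    return 0;
}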

Heap

Characteristics:

  • Dynamic: Manual allocation/deallocation
  • Variable size: Can grow up to virtual memory limit
  • Slower: Requires system calls or allocator management
  • Shared: Shared among threads
  • Grows up: From low to high addresses
  • Fragmentation: Can become fragmented

Used for:

  • Dynamic arrays
  • Long-lived objects
  • Large data structures
  • Shared memory
void function() {
    int *ptr = malloc(1024);   // On heap
    // ... use ptr ...
    free(ptr);                 // Must manually free
}

Comparison

Feature          Stack              Heap
Allocation       Automatic          Manual (malloc/free)
Speed            Fast               Slower
Size             Limited (MB)       Large (GB)
Lifetime         Function scope     Until freed
Fragmentation    No                 Yes
Thread safety    Per-thread         Shared, needs locks

malloc() Implementation

Key Interview Question

"How does malloc() work?" - Expect to explain both the glibc implementation and kernel-level details.

High-Level Flow

void *ptr = malloc(1024);

What happens:

  1. Check size: Small (<128KB) or large allocation?
  2. Small allocations: Use brk()/sbrk() to expand heap
  3. Large allocations: Use mmap() to get anonymous memory
  4. Internal management: glibc maintains free lists
  5. Return pointer: To allocated memory
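
One way to observe the small/large split is glibc's M_MMAP_THRESHOLD knob (documented in mallopt(3)): lowering the threshold makes even a small request go through mmap(). A minimal glibc-specific sketch:

/* Sketch (glibc-specific): lower the mmap threshold so a small
 * request is served by mmap() instead of the brk heap. */
#include <malloc.h>
#include <stdlib.h>

int main(void) {
    mallopt(M_MMAP_THRESHOLD, 64);   /* requests above 64 bytes now use mmap */
    void *p = malloc(1024);          /* served via mmap(MAP_ANONYMOUS) */
    free(p);                         /* returned straight to the OS via munmap */
    return 0;
}

Running it under strace -e trace=brk,mmap,munmap ./a.out shows the system-call difference directly.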

glibc malloc (ptmalloc2)

Internal Structures:

Bins: Free chunks organized by size

  • Fast bins: Small, recently freed chunks (LIFO)
  • Small bins: Small chunks (16-512 bytes)
  • Large bins: Large chunks (512+ bytes)
  • Unsorted bin: Recently freed chunks

Chunk Structure:

struct malloc_chunk {              // simplified from glibc's malloc.c
    size_t prev_size;              // Size of previous chunk (valid only if it is free)
    size_t size;                   // Size of this chunk; low bits hold status flags
    struct malloc_chunk *fd;       // Forward pointer (used only while on a free list)
    struct malloc_chunk *bk;       // Backward pointer (used only while on a free list)
    // ... user data begins here when the chunk is allocated ...
};
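
The rounding implied by this chunk layout is visible with glibc's malloc_usable_size(); the sketch below typically reports a little more than the 100 bytes requested.

/* Sketch (glibc): usable size reflects chunk rounding. */
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>

int main(void) {
    void *p = malloc(100);
    printf("requested 100, usable %zu\n", malloc_usable_size(p));
    free(p);
    return 0;
}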

brk() vs mmap()

brk() System Call

#include <unistd.h>
int brk(void *addr);
void *sbrk(intptr_t increment);

Characteristics:

  • Extends program break: Top of heap
  • Linear growth: Heap grows continuously
  • Fast: Single system call
  • Cannot free middle: Can only shrink from top
  • Used for: Small allocations (<128KB)
Before brk():
Heap: [====] ← program break

After brk():
Heap: [========] ← new program break (expanded)
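
The break movement can be watched from user space with sbrk(0), which returns the current break without moving it. In the sketch below, glibc typically extends the break by much more than the single request, to amortize future allocations.

/* Sketch: watch the program break move as the heap grows. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    void *before = sbrk(0);
    void *p = malloc(4096);          /* small: served from the brk heap */
    void *after = sbrk(0);

    printf("break before: %p\n", before);
    printf("break after:  %p (moved %ld bytes)\n",
           after, (long)((char *)after - (char *)before));
    free(p);
    return 0;
}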

mmap() System Call

#include <sys/mman.h>
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);

Characteristics:

  • Arbitrary location: Can allocate anywhere
  • Page-aligned: Always multiples of page size
  • Can free individually: With munmap()
  • Used for: Large allocations (>128KB), file mapping
  • Flexible: Can set permissions, share memory
// Anonymous memory allocation
void *ptr = mmap(NULL, 8192,
                 PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS,
                 -1, 0);

Flags:

  • MAP_PRIVATE: Changes not visible to other processes
  • MAP_SHARED: Changes shared with other processes
  • MAP_ANONYMOUS: Not backed by file
  • MAP_FIXED: Use specified address
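
The MAP_SHARED vs MAP_PRIVATE distinction is worth a demonstration: in the sketch below, a shared anonymous mapping survives fork() as genuinely shared memory, so the child's write is visible to the parent (with MAP_PRIVATE it would not be).

/* Sketch: MAP_SHARED | MAP_ANONYMOUS memory is shared across fork(). */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int *shared = mmap(NULL, sizeof(int),
                       PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED) return 1;
    *shared = 0;

    if (fork() == 0) {               /* child */
        *shared = 42;
        _exit(0);
    }
    wait(NULL);
    printf("parent sees %d\n", *shared);  /* prints 42 */
    munmap(shared, sizeof(int));
    return 0;
}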

free() Implementation

free(ptr);

What happens:

  1. Get chunk header: Access metadata before ptr
  2. Check neighbors: Are adjacent chunks free?
  3. Coalesce: Merge with adjacent free chunks
  4. Add to bin: Place in appropriate free list
  5. Release to OS: If large chunk or top chunk, may call munmap() or brk()

Page Faults

A page fault occurs when a process accesses a memory page that's not currently in physical RAM.

Types of Page Faults

Minor Page Fault

Page is in memory but not mapped in process's page table.

Causes:

  • Copy-on-Write (COW)
  • Lazy loading
  • Page reclaimed but still in cache

Resolution:

  • Fast (~microseconds)
  • No disk I/O
  • Update page table

Major Page Fault

Page must be loaded from disk.

Causes:

  • First access to memory-mapped file
  • Page swapped to disk
  • Page never loaded

Resolution:

  • Slow (~milliseconds)
  • Disk I/O required
  • Update page table after load
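
Minor faults can be counted from inside a program via getrusage(). The sketch below first-touches 256 freshly mapped anonymous pages and reports roughly that many minor faults (assuming 4KB pages).

/* Sketch: count minor faults caused by first-touching fresh pages. */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

#define N (256 * 4096)               /* 256 pages */

int main(void) {
    struct rusage a, b;
    char *buf = mmap(NULL, N, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;

    getrusage(RUSAGE_SELF, &a);
    for (size_t i = 0; i < N; i += 4096)
        buf[i] = 1;                  /* each first touch is a minor fault */
    getrusage(RUSAGE_SELF, &b);

    printf("minor faults: %ld\n", b.ru_minflt - a.ru_minflt);
    return 0;
}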

Page Fault Handling

graph TD
    A[Process accesses memory] --> B{Page in RAM?}
    B -->|Yes| C{Mapped in page table?}
    C -->|Yes| D[Access succeeds]
    C -->|No| E[Minor page fault]
    E --> F[Update page table]
    F --> D
    B -->|No| G[Major page fault]
    G --> H[Find/Load page from disk]
    H --> F

Kernel handler steps:

  1. CPU generates fault: Page not present
  2. Kernel handles interrupt
  3. Determine fault type: Minor or major
  4. Locate page: Page cache, swap, file
  5. Allocate physical frame: If needed
  6. Load data: From disk if major fault
  7. Update page table: Map virtual to physical
  8. Flush TLB: Invalidate cached entry
  9. Resume execution: Return to faulting instruction
# Monitor page faults
/usr/bin/time -v ./program       # Detailed stats, incl. major/minor faults
ps -o min_flt,maj_flt -p <PID>   # Per-process fault counters

Swapping and Swap Space

Swap Space

When physical RAM is full, Linux moves inactive pages to swap space on disk.

# View swap usage
free -h
swapon --show

# Add swap file
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# Remove swap
swapoff /swapfile

Swap Process

Swap Out (memory → disk):

  1. Select victim pages: Least recently used
  2. Write to swap: If page is dirty
  3. Update page table: Mark as swapped
  4. Free physical frame: For other use

Swap In (disk → memory):

  1. Page fault occurs: Access swapped page
  2. Allocate physical frame
  3. Read from swap: Load page data
  4. Update page table: Map to new frame
  5. Resume execution

Swappiness

Controls tendency to swap vs reclaim page cache.

# View current swappiness (default: 60)
cat /proc/sys/vm/swappiness

# Set swappiness (0-100)
# 0   = avoid swapping as much as possible (kernel may still swap under pressure)
# 100 = swap aggressively
echo 10 | sudo tee /proc/sys/vm/swappiness

Out of Memory (OOM) Killer

When the system runs out of memory and cannot free enough, the OOM killer terminates processes.

OOM Killer Selection

The kernel assigns each process an oom_score, based primarily on:

  • Memory usage (RSS, swap, and page tables; higher = worse)
  • The administrator-set adjustment in oom_score_adj
  • A small bonus for root-owned processes (slightly less likely to be killed)
# View OOM score
cat /proc/<PID>/oom_score

# Adjust OOM score (-1000 to 1000)
echo -500 > /proc/<PID>/oom_score_adj  # Less likely to be killed
echo 1000 > /proc/<PID>/oom_score_adj  # Very likely to be killed

# Disable OOM killer for process
echo -1000 > /proc/<PID>/oom_score_adj

OOM Killer Messages

# View OOM killer logs
dmesg | grep -i "out of memory"
journalctl -k | grep -i oom

Example output:

Out of memory: Kill process 1234 (myapp) score 800 or sacrifice child
Killed process 1234 (myapp) total-vm:4000000kB, anon-rss:2000000kB, file-rss:0kB

Memory-Mapped Files

Use mmap() to map files directly into process memory.

Benefits

  • Efficient I/O: No separate read/write calls
  • Shared memory: Multiple processes can map same file
  • Lazy loading: Pages loaded on-demand
  • Automatic sync: Changes written back to file

Example

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int fd = open("data.bin", O_RDWR);
struct stat sb;
fstat(fd, &sb);

// Map file into memory
void *mapped = mmap(NULL, sb.st_size,
                    PROT_READ | PROT_WRITE,
                    MAP_SHARED, fd, 0);
if (mapped == MAP_FAILED) { /* handle error */ }

// Access file as memory
char *data = (char *)mapped;
data[0] = 'A';  // Modifies file

// Sync changes to disk
msync(mapped, sb.st_size, MS_SYNC);

// Unmap
munmap(mapped, sb.st_size);
close(fd);

Buffer Cache and Page Cache

Page Cache

The page cache stores pages of files in RAM to speed up file I/O.

Contents:

  • Regular file pages
  • Block device pages
  • Memory-mapped file pages
# View page cache usage
free -h
cat /proc/meminfo | grep -i cache

# Clear page cache (don't do in production!)
sync
echo 3 > /proc/sys/vm/drop_caches

Buffer Cache

The buffer cache stores block device metadata (historically separate, now unified with page cache).

Contains:

  • Superblocks
  • Inodes
  • Directory entries
  • Block bitmaps

Copy-on-Write (COW)

COW is an optimization that delays copying memory until a write occurs.

How COW Works

pid_t pid = fork();

After fork():

  1. Share pages: Child and parent point to same physical pages
  2. Mark read-only: Set pages as copy-on-write
  3. On write: Page fault handler creates copy
graph LR
    A[Parent process] -->|fork| B[Child process]
    A -->|shares| C[Physical Pages read-only]
    B -->|shares| C
    A -->|write| D[Copy page]
    D --> E[Parent has own copy]
    C --> F[Child still has original]
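
A minimal demonstration: after fork(), parent and child print the same virtual address for a variable, yet the child's write lands in its own private copy, so the parent's value is unchanged.

/* Sketch: the write in the child triggers a COW fault, giving the
 * child a private copy; the parent's page is untouched. */
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    int value = 1;
    if (fork() == 0) {                       /* child */
        value = 2;                           /* COW fault: private copy made */
        printf("child:  &value=%p value=%d\n", (void *)&value, value);
        _exit(0);
    }
    wait(NULL);
    printf("parent: &value=%p value=%d\n", (void *)&value, value);
    return 0;
}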

Benefits

  • Fast fork(): No immediate copying
  • Memory efficient: Only copy pages that change
  • Common case: Many processes fork then exec (no writes)

Implementation

Page table flags:

  • Both processes keep read-only PTEs for the shared pages
  • On a write attempt:
      1. A page fault occurs
      2. The kernel allocates a new physical page
      3. Data is copied to the new page
      4. The writing process's PTE is updated to point at the copy
      5. The page is made writable

Memory Zones

The kernel divides physical memory into zones:

Zone Types

ZONE_DMA (0-16MB)

  • Legacy DMA devices
  • Old hardware that can't address >16MB

ZONE_DMA32 (0-4GB)

  • 32-bit DMA devices
  • Devices that can't address >4GB

ZONE_NORMAL (varies)

  • Normal allocations
  • Directly mapped by kernel

ZONE_HIGHMEM (32-bit only, >896MB)

  • Not directly mapped by kernel
  • Requires temporary mapping
  • Not present on 64-bit systems
# View memory zones
cat /proc/zoneinfo

Practice Questions

  1. Explain the complete flow of malloc() for a 1KB allocation.
  2. What is the difference between minor and major page faults?
  3. Why does fork() not immediately copy all memory?
  4. When would you use brk() vs mmap()?
  5. How does the OOM killer decide which process to kill?
  6. What is the purpose of the TLB and why is it important?
  7. Explain the difference between virtual and physical memory.
  8. What happens to memory-mapped files when a process crashes?

Further Reading

  • man 3 malloc
  • man 2 mmap, man 2 brk
  • man 5 proc - /proc/meminfo, /proc/PID/maps
  • "The Linux Programming Interface" Chapters 6-8
  • "Understanding the Linux Virtual Memory Manager" by Mel Gorman