Understanding ELF File Layout in Memory

Understanding Section-to-Segment Relationships

Jan 18, 2025

Introduction

In Unix-like operating systems, the ELF (Executable and Linkable Format) is the standard file format for executables, object code, and shared libraries. Understanding how ELF files are laid out in memory is crucial for developers, system programmers, and anyone interested in low-level programming.

This blog post will provide a brief guide to the ELF file layout in memory, including how segments are arranged, how sections are mapped to segments, and how to use tools like readelf to inspect these details.

ELF File Structure Overview

Before diving into memory layout, it's essential to understand the basic structure of an ELF file. An ELF file consists of several parts:

ELF Header: Contains information about the file type, architecture, and other global properties.
Program Headers: Describe the segments to be loaded into memory.
Section Headers: Provide details about the sections used by the linker and debugger.
Segments and Sections: The actual data that makes up the executable or library.
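
To make the header fields concrete, here is a minimal Python sketch that decodes the ELF header of a 64-bit little-endian file. The function name parse_elf_header and the dictionary keys are illustrative choices, not part of any library; the field order follows the Elf64_Ehdr layout.

```python
import struct

# Elf64_Ehdr: 16-byte e_ident followed by the fixed-size fields (64 bytes total).
# Little-endian byte order is assumed in this sketch.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")

# e_type values defined by the ELF specification.
E_TYPE_NAMES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def parse_elf_header(data):
    """Decode a 64-bit little-endian ELF header from the first 64 bytes of a file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    # e_ident[4] is EI_CLASS (2 = 64-bit), e_ident[5] is EI_DATA (1 = little-endian).
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian files")
    (_ident, e_type, _machine, _version, e_entry, e_phoff, e_shoff,
     _flags, _ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     _shstrndx) = ELF64_HEADER.unpack_from(data)
    return {
        "type": E_TYPE_NAMES.get(e_type, hex(e_type)),
        "entry": e_entry,
        "phoff": e_phoff, "phentsize": e_phentsize, "phnum": e_phnum,
        "shoff": e_shoff, "shentsize": e_shentsize, "shnum": e_shnum,
    }
```

Reading the first 64 bytes of a binary (for example with open(path, "rb").read(64)) and passing them to parse_elf_header should give the same type, entry point, and program header count that readelf -h reports.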

Sections vs. Segments

Sections: Logical components of the file used by the linker and debugger (e.g., .text, .data, .bss, .rodata).
Segments: Physical components of the file that the loader maps into memory (e.g., loadable segments, dynamic linking information).

ELF Types: ET_EXEC and ET_DYN

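Two ELF types matter when a file is loaded into memory (relocatable objects and core dumps have their own types and are not covered here):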

ET_EXEC: Non-position-independent executables that are loaded at fixed addresses in memory. They can be statically or dynamically linked.
ET_DYN: Position-Independent Executables (PIE) and shared libraries that can be loaded at any address.

Memory Layout for ET_EXEC

For ET_EXEC files, the segments are loaded into memory at the addresses specified in the program headers. These addresses are fixed and determined at link time. If an ET_EXEC file is loaded at a different address, it will not function correctly, because its code and data refer to absolute addresses.

To view the program headers and linked addresses of an ET_EXEC file, you can use the readelf command:

readelf -l a.out

This command displays the program headers, showing the virtual addresses (p_vaddr) where each segment is loaded.
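
As a rough illustration of what readelf -l reads, the following Python sketch walks the program header table of a 64-bit little-endian ELF image and returns the loadable segments. The name load_segments and the "R-X" flag rendering are choices made for this sketch (readelf prints "R E" instead); the offsets 0x20 and 0x36 are the positions of e_phoff and e_phentsize in the ELF64 header.

```python
import struct

# Elf64_Phdr fields: p_type, p_flags, p_offset, p_vaddr, p_paddr,
# p_filesz, p_memsz, p_align (56 bytes, little-endian assumed).
ELF64_PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4


def load_segments(data):
    """Return (p_vaddr, p_memsz, flags) for each PT_LOAD entry of an ELF64 image."""
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    result = []
    for i in range(e_phnum):
        (p_type, p_flags, _offset, p_vaddr, _paddr,
         _filesz, p_memsz, _align) = ELF64_PHDR.unpack_from(
            data, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue  # only loadable segments occupy memory
        flags = "".join(ch if p_flags & bit else "-"
                        for ch, bit in (("R", PF_R), ("W", PF_W), ("X", PF_X)))
        result.append((p_vaddr, p_memsz, flags))
    return result
```

Called on the full contents of a file, load_segments(open(path, "rb").read()) should list the same VirtAddr and MemSiz values as the LOAD rows of readelf -l.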

Memory Layout for ET_DYN

For ET_DYN files, the loader picks a base address at load time and shifts every segment's virtual address by the same offset. This allows ET_DYN files to be loaded at any address in memory, which is what makes Position-Independent Code (PIC) and ASLR (Address Space Layout Randomization) possible.

To determine the base address and calculate the loaded addresses of segments, you can use the following approach:

Identify the linked address of the first loadable segment.
Determine the loaded address of the first segment at runtime.
Calculate the offset between the linked and loaded addresses.
Apply this offset to the p_vaddr of every other segment to find its loaded address.
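
The four steps above can be sketched in a few lines of Python. The p_vaddr values are those of the four LOAD segments in the readelf output shown later in this post; the runtime address 0x555555554000 is a hypothetical value chosen for illustration, and the function names are likewise made up for this sketch.

```python
def load_bias(first_linked_vaddr, first_loaded_addr):
    """Offset between where the first segment was linked and where it was loaded."""
    return first_loaded_addr - first_linked_vaddr


def loaded_addresses(p_vaddrs, bias):
    """Apply the same offset to every segment's linked address."""
    return [bias + vaddr for vaddr in p_vaddrs]


# Step 1: linked addresses (p_vaddr) of the LOAD segments.
linked = [0x0, 0x1000, 0x2000, 0x3db8]
# Step 2: hypothetical address at which the first segment was found at runtime.
first_loaded = 0x555555554000

# Step 3: the offset; step 4: apply it to all segments.
bias = load_bias(linked[0], first_loaded)
for vaddr, addr in zip(linked, loaded_addresses(linked, bias)):
    print(f"p_vaddr {vaddr:#x} -> loaded at {addr:#x}")
```

Because the first segment of this file is linked at address 0, the offset equals the runtime base address itself, which is the common case for PIE executables and shared libraries.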

Mapping Sections to Segments

Sections are not directly loaded into memory; instead, they are grouped into segments. To understand how sections like .rodata and .bss are mapped to segments, you can use the readelf command to view the section-to-segment mappings:

readelf -l a.out

This command will show you which sections are included in each segment.
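
The mapping that readelf prints can be approximated by an address-range check: a section belongs to a segment when its address range lies inside the segment's range. The sketch below is a simplified version of that rule, not readelf's actual implementation, and the section addresses and sizes are hypothetical values picked for illustration.

```python
def sections_in_segment(segment, sections):
    """Names of sections whose address range lies inside the segment's range.

    segment is (p_vaddr, p_memsz); sections maps name -> (address, size).
    """
    seg_start, seg_memsz = segment
    seg_end = seg_start + seg_memsz
    return [name for name, (addr, size) in sections.items()
            if seg_start <= addr and addr + size <= seg_end]


# Hypothetical section addresses and sizes, for illustration only.
sections = {
    ".text": (0x1060, 0x100),
    ".rodata": (0x2000, 0x10),
    ".data": (0x4000, 0x10),
    ".bss": (0x4010, 0x8),
}

print(sections_in_segment((0x1000, 0x175), sections))  # executable segment
print(sections_in_segment((0x3db8, 0x260), sections))  # read-write segment
```

The check uses p_memsz rather than p_filesz so that .bss, which takes up memory but no space in the file, still counts as part of the read-write segment.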

Example: Mapping Sections to Segments

Consider the following simple C program:

#include <stdio.h>

int main() {
    printf("Hello, ELF!\n");
    return 0;
}

Compile the program with debugging symbols:

gcc -g -o example example.c

View the program headers:

readelf -l example

Output:

Elf file type is DYN (Position-Independent Executable file)
Entry point 0x1060
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002d8 0x00000000000002d8  R      0x8
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000628 0x0000000000000628  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x0000000000000175 0x0000000000000175  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x00000000000000f4 0x00000000000000f4  R      0x1000
  LOAD           0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
                 0x0000000000000258 0x0000000000000260  RW     0x1000
  DYNAMIC        0x0000000000002dc8 0x0000000000003dc8 0x0000000000003dc8
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000030 0x0000000000000030  R      0x8
  NOTE           0x0000000000000368 0x0000000000000368 0x0000000000000368
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_PROPERTY   0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000030 0x0000000000000030  R      0x8
  GNU_EH_FRAME   0x0000000000002010 0x0000000000002010 0x0000000000002010
                 0x0000000000000034 0x0000000000000034  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
                 0x0000000000000248 0x0000000000000248  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .interp 
   02     .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt 
   03     .init .plt .plt.got .plt.sec .text .fini 
   04     .rodata .eh_frame_hdr .eh_frame 
   05     .init_array .fini_array .dynamic .got .data .bss 
   06     .dynamic 
   07     .note.gnu.property 
   08     .note.gnu.build-id .note.ABI-tag 
   09     .note.gnu.property 
   10     .eh_frame_hdr 
   11     
   12     .init_array .fini_array .dynamic .got

From this output, you can see:

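Segment 03 (R E): Contains the code sections .init, .plt, .plt.got, .plt.sec, .text, and .fini.
Segment 04 (R): Contains .rodata, .eh_frame_hdr, and .eh_frame.
Segment 05 (RW): Contains .init_array, .fini_array, .dynamic, .got, .data, and .bss.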

This means that the read-only data (rodata) is placed in a different segment than the code (text), and the read-write data (data and bss) is in a separate segment.
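
One detail worth checking in the output is that the read-write LOAD segment has a MemSiz (0x260) larger than its FileSiz (0x258). The extra bytes are not stored in the file; the loader zero-fills them, which is how .bss gets its memory. The sketch below works through that arithmetic with the values from the output; the function name split_segment and the dictionary keys are illustrative.

```python
def split_segment(p_offset, p_vaddr, p_filesz, p_memsz):
    """Describe which part of a segment comes from the file and which is zero-filled."""
    return {
        # Byte range within the file that backs the segment.
        "file_range": (p_offset, p_offset + p_filesz),
        # Virtual addresses initialized from those file bytes.
        "file_backed": (p_vaddr, p_vaddr + p_filesz),
        # Virtual addresses that exist only in memory and start as zeros.
        "zero_filled": (p_vaddr + p_filesz, p_vaddr + p_memsz),
        "zero_bytes": p_memsz - p_filesz,
    }


# Offset, VirtAddr, FileSiz, and MemSiz of the RW LOAD segment in the output above.
rw = split_segment(0x2db8, 0x3db8, 0x258, 0x260)
for key, value in rw.items():
    print(key, [hex(v) for v in value] if isinstance(value, tuple) else value)
```

For this binary the zero-filled tail is 8 bytes long, which presumably corresponds to the .bss section at the end of segment 05.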

Determining Segment Arrangement

To determine the arrangement of segments in memory, you can use the following steps:

Identify the ELF Type: Use readelf -h a.out to check if the file is ET_EXEC or ET_DYN.
View Program Headers: Use readelf -l a.out to see the program headers and segment details.
Check Section to Segment Mapping: Understand which sections are included in each segment.
Analyze Memory Layout: For ET_EXEC, the segments are loaded at the linked addresses. For ET_DYN, calculate the offset based on the base address.
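
As a sketch of the last step, the function below turns a list of LOAD segments into page-aligned address ranges, using the linked addresses for ET_EXEC and adding a base address for ET_DYN. It assumes a 4 KiB page size and a page-aligned base; the function name, the tuple layout, and the base address used in the example are all hypothetical.

```python
PAGE = 0x1000  # assumed page size (4 KiB)


def memory_layout(e_type, segments, base=0):
    """Page-aligned (start, end, flags) ranges for (p_vaddr, p_memsz, flags) segments."""
    # ET_EXEC is loaded at its linked addresses; only ET_DYN is shifted by the base.
    bias = base if e_type == "ET_DYN" else 0
    layout = []
    for p_vaddr, p_memsz, flags in segments:
        start = (bias + p_vaddr) & ~(PAGE - 1)                      # round down
        end = (bias + p_vaddr + p_memsz + PAGE - 1) & ~(PAGE - 1)   # round up
        layout.append((start, end, flags))
    return layout


# LOAD segments from the example output, with a hypothetical base address.
segments = [(0x0, 0x628, "R"), (0x1000, 0x175, "R E"),
            (0x2000, 0xf4, "R"), (0x3db8, 0x260, "RW")]
for start, end, flags in memory_layout("ET_DYN", segments, base=0x555555554000):
    print(f"{start:#x}-{end:#x} {flags}")
```

The ranges are rounded to page boundaries because memory is mapped in whole pages, so the read-write segment that starts at 0x3db8 ends up occupying the pages from 0x3000 up to 0x5000 relative to the base.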

Example: Analyzing an ET_EXEC File

Recompile the previous example with -no-pie to produce an ET_EXEC file, then check its type:

➜  ~ gcc -no-pie -o example example.c
➜  ~ readelf -h example | grep Type
  Type:                              EXEC (Executable file)

Example: Analyzing an ET_DYN File

Take a shared library like libc.so.6:

readelf -h /lib/x86_64-linux-gnu/libc.so.6 | grep Type