Introduction
In Unix-like operating systems, the ELF (Executable and Linkable Format) is the standard file format for executables, object code, and shared libraries. Understanding how ELF files are laid out in memory is crucial for developers, system programmers, and anyone interested in low-level programming.
This blog post provides a brief guide to the ELF file layout in memory, including how segments are arranged, how sections are mapped to segments, and how to use tools like readelf to inspect these details.
ELF File Structure Overview
Before diving into memory layout, it's essential to understand the basic structure of an ELF file. An ELF file consists of several parts:
ELF Header: Contains information about the file type, architecture, and other global properties.
Program Headers: Describe the segments to be loaded into memory.
Section Headers: Provide details about the sections used by the linker and debugger.
Segments and Sections: The actual data that makes up the executable or library.
Sections vs. Segments
Sections: Logical components of the file used by the linker and debugger (e.g., .text, .data, .bss, .rodata).
Segments: Runtime components, described by the program headers, that the loader maps into memory (e.g., loadable segments, dynamic linking information).
ELF Types: ET_EXEC and ET_DYN
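ELF files come in several types (relocatable objects and core dumps among them), but two matter for memory layout:
ET_EXEC: Non-position-independent executables, statically or dynamically linked, whose segments must be loaded at the fixed addresses chosen at link time.
ET_DYN: Position-Independent Executables (PIE) and shared libraries, which can be loaded at any (suitably aligned) base address.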
Memory Layout for ET_EXEC
For ET_EXEC files, the segments are loaded into memory at the addresses specified in the program headers. These addresses are fixed at link time, and the code contains absolute addresses that depend on them, so an ET_EXEC file loaded at a different address will not function correctly.
To view the program headers and linked addresses of an ET_EXEC file, you can use the readelf command:
readelf -l a.out
This command displays the program headers, including the virtual address (p_vaddr, shown in the VirtAddr column) at which each segment is loaded.
Memory Layout for ET_DYN
For ET_DYN files, the loader chooses a base address at runtime and adds that same offset (the load bias) to every segment's p_vaddr, so the segments keep their relative layout. Because the code is position-independent (PIC), this allows ET_DYN files to be loaded at any address in memory, which is what lets many shared libraries share one address space and lets ASLR (Address Space Layout Randomization) randomize where they are placed.
To determine the base address and calculate the loaded addresses of segments, you can use the following approach:
Identify the linked address (p_vaddr) of the first loadable segment; for ET_DYN files this is usually 0.
Determine the loaded address of that segment at runtime (for example, from /proc/<pid>/maps).
Calculate the offset between the linked and loaded addresses.
Add this offset to the p_vaddr of all other segments to find their loaded addresses.
Mapping Sections to Segments
Sections are not directly loaded into memory; instead, they are grouped into segments. To understand how sections like .rodata and .bss are mapped to segments, you can use the readelf command to view the section-to-segment mappings:
readelf -l a.out
After the program headers, this command prints a "Section to Segment mapping" table that shows which sections are included in each segment.
Example: Mapping Sections to Segments
Consider the following simple C program:
#include <stdio.h>

int main(void) {
    printf("Hello, ELF!\n");
    return 0;
}
Save it as example.c and compile it with debugging symbols:
gcc -g -o example example.c
View the program headers:
readelf -l example
Output:
Elf file type is DYN (Position-Independent Executable file)
Entry point 0x1060
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002d8 0x00000000000002d8  R      0x8
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000628 0x0000000000000628  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x0000000000000175 0x0000000000000175  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x00000000000000f4 0x00000000000000f4  R      0x1000
  LOAD           0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
                 0x0000000000000258 0x0000000000000260  RW     0x1000
  DYNAMIC        0x0000000000002dc8 0x0000000000003dc8 0x0000000000003dc8
                 0x00000000000001f0 0x00000000000001f0  RW     0x8
  NOTE           0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000030 0x0000000000000030  R      0x8
  NOTE           0x0000000000000368 0x0000000000000368 0x0000000000000368
                 0x0000000000000044 0x0000000000000044  R      0x4
  GNU_PROPERTY   0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000030 0x0000000000000030  R      0x8
  GNU_EH_FRAME   0x0000000000002010 0x0000000000002010 0x0000000000002010
                 0x0000000000000034 0x0000000000000034  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
                 0x0000000000000248 0x0000000000000248  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03     .init .plt .plt.got .plt.sec .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .dynamic .got .data .bss
   06     .dynamic
   07     .note.gnu.property
   08     .note.gnu.build-id .note.ABI-tag
   09     .note.gnu.property
   10     .eh_frame_hdr
   11
   12     .init_array .fini_array .dynamic .got
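From this output, you can see:
Segment 03 (flags R E, readable and executable): contains .text, together with .init, .plt, .plt.got, .plt.sec, and .fini.
Segment 04 (flags R, read-only): contains .rodata, .eh_frame_hdr, and .eh_frame.
Segment 05 (flags RW, read-write): contains .data and .bss, together with .init_array, .fini_array, .dynamic, and .got.
This means that the read-only data (.rodata) is placed in a different segment than the code (.text), and the read-write data (.data and .bss) is in a separate segment of its own. Note also that segment 05 has a MemSiz (0x260) larger than its FileSiz (0x258): .bss occupies no space in the file, so the loader zero-fills the extra bytes in memory.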
Determining Segment Arrangement
To determine the arrangement of segments in memory, you can use the following steps:
Identify the ELF Type: Use readelf -h a.out to check whether the file is ET_EXEC or ET_DYN.
View Program Headers: Use readelf -l a.out to see the program headers and segment details.
Check Section to Segment Mapping: Understand which sections are included in each segment.
Analyze Memory Layout: For ET_EXEC, the segments are loaded at their linked addresses. For ET_DYN, add the runtime base address (load bias) to each segment's p_vaddr.
Example: Analyzing an ET_EXEC File
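Consider the example program from earlier, this time built as a non-PIE executable:
gcc -no-pie -o example example.c
readelf -h example | grep Type
Output:
Type: EXEC (Executable file)
Running readelf -l example on this build shows LOAD segments at fixed, non-zero virtual addresses (GNU ld on x86-64 typically starts them at 0x400000), and these are exactly the addresses the segments occupy at runtime.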
Example: Analyzing an ET_DYN File
Take a shared library like libc.so.6:
readelf -h /lib/x86_64-linux-gnu/libc.so.6 | grep Type
Output:
Type: DYN (Shared object file)
View the program headers:
readelf -l /lib/x86_64-linux-gnu/libc.so.6
You'll notice that the first LOAD segment typically has a p_vaddr of 0, so the p_vaddr values are effectively offsets from the library's base address; the actual loaded addresses are determined at runtime by adding the base address the dynamic linker chooses.
Practical Implications
Understanding the ELF file layout in memory has several practical implications:
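Debugging: Knowing which sections are loaded where lets you map a runtime address (for example, from a crash) back to the segment and section it belongs to.
Optimization: Controlling section placement, for example with a linker script, lets you influence segment layout, alignment padding, and memory usage.
Security: ASLR relies on ET_DYN objects being loadable at any base address, and segment flags (R, RW, R E), together with GNU_RELRO and GNU_STACK, determine which memory is writable or executable.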
Exercises
Exercise 1: Create a simple C program, compile it, and analyze its ELF headers using readelf. Identify the sections and their corresponding segments.
Exercise 2: Write a program that uses the .bss, .data, and .rodata sections. Use readelf to see how these sections are mapped into segments.
Exercise 3: Explore the program headers of a shared library and compare them to those of an executable. Note the differences in segment mappings.