Understanding the Role of File Descriptors in Anonymous Mappings with mmap

Purpose of the MAP_ANONYMOUS flag in the mmap system call

Dec 31, 2024

Introduction

In system programming, the mmap system call is a powerful tool for memory management, allowing developers to map files or devices directly into memory. One of its lesser-understood aspects is the use of anonymous mappings with the MAP_ANONYMOUS flag.

This blog post explains why some implementations require the file descriptor (fd) to be set to -1 when using MAP_ANONYMOUS, exploring the historical context, portability concerns, and best practices for writing code that behaves reliably across different systems.

What Is mmap and What Are Its Parameters?

The mmap function maps a file or device into memory, providing an efficient way to work with large data sets. Its key parameters include:

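Address: A hint for the starting address of the mapping; passing NULL lets the kernel choose.
Length: The size of the mapping in bytes.
Protection Flags: Specify read, write, and execute permissions (PROT_READ, PROT_WRITE, PROT_EXEC).
Flags: Determine whether the mapping is shared or private (MAP_SHARED or MAP_PRIVATE) and modify its behavior, for example with MAP_ANONYMOUS.
File Descriptor (fd): Identifies the file or device to map.
Offset: The starting point within the file; it must be a multiple of the page size.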
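To see all six parameters in one place, here is a minimal sketch of an ordinary file-backed mapping. It assumes a POSIX system with a writable /tmp; make_temp_file and count_byte_in_file are hypothetical helper names used only for illustration. The file is mapped read-only and scanned like an in-memory array.

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* strlen */
#include <sys/mman.h>   /* mmap, munmap */
#include <sys/stat.h>   /* fstat */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* write, unlink, close */

/* Hypothetical helper: create an unlinked temporary file holding `contents`
   and return an open descriptor, or -1 on error. */
int make_temp_file(const char *contents) {
    char path[] = "/tmp/mmap-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path); /* the file persists until the descriptor is closed */
    size_t len = strlen(contents);
    if (write(fd, contents, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: count occurrences of byte `c` in the file behind `fd`
   by mapping it into memory. Returns -1 on error. */
long count_byte_in_file(int fd, char c) {
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    if (st.st_size == 0)
        return 0; /* mmap rejects a zero length */
    size_t len = (size_t)st.st_size;
    char *p = mmap(NULL,        /* address: NULL lets the kernel choose */
                   len,         /* length: the whole file */
                   PROT_READ,   /* protection: read-only */
                   MAP_PRIVATE, /* flags: private mapping */
                   fd,          /* file descriptor to map */
                   0);          /* offset: page-aligned; 0 = start of file */
    if (p == MAP_FAILED)
        return -1;
    long n = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == c)
            n++;
    munmap(p, len);
    return n;
}
```

For example, after make_temp_file("a\nb\nc\n"), counting '\n' through the mapping gives 3.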

When using MAP_ANONYMOUS, a file does not back the mapping; thus, the fd parameter becomes irrelevant.

What is the purpose of the MAP_ANONYMOUS flag in the mmap system call?

Anonymous mappings are a powerful and flexible memory management mechanism in modern operating systems, particularly in Unix-like systems such as Linux. They can be thought of as zero-filled virtual files: blocks of memory pre-filled with zeros and ready for use. Unlike memory allocated on the heap, anonymous mappings reside outside the traditional data segment, which helps avoid fragmenting the heap.
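The "zero-filled virtual file" description can be checked directly. The sketch below assumes Linux with glibc (hence _DEFAULT_SOURCE to expose MAP_ANONYMOUS); anon_mapping_is_zero_filled is a hypothetical name. It maps fresh anonymous memory and confirms that every byte reads back as zero without any explicit initialization.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Returns 1 if a fresh anonymous mapping of `len` bytes reads back as all
   zeros, 0 if any byte is nonzero, and -1 if mmap fails. */
int anon_mapping_is_zero_filled(size_t len) {
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }
    munmap(p, len);
    return all_zero;
}
```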

This makes them particularly useful for allocating large blocks of memory or for specialized use cases like inter-process communication (IPC). Anonymous mappings are created using the mmap system call, and their behavior can be fine-tuned using flags such as MAP_ANONYMOUS, MAP_PRIVATE, and MAP_SHARED.

When MAP_ANONYMOUS is combined with MAP_PRIVATE, each call to mmap creates a distinct, zero-filled memory region. This type of mapping is commonly used for allocating new memory blocks: glibc's malloc, for example, serves requests larger than MMAP_THRESHOLD (128 kB by default, adjustable with mallopt) from private anonymous mappings instead of from the heap.
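A small sketch of the "distinct region per call" property, assuming Linux/glibc; private_anon_regions_are_distinct is a hypothetical name, and the code calls mmap directly rather than going through malloc. Two separate MAP_PRIVATE | MAP_ANONYMOUS calls are made, the first region is overwritten, and the second is checked to still be zero-filled.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>

/* Returns 1 if two MAP_PRIVATE | MAP_ANONYMOUS calls yield separate regions,
   i.e. writing to the first leaves the second zero-filled. Returns -1 on
   error. `len` must be greater than zero. */
int private_anon_regions_are_distinct(size_t len) {
    unsigned char *a = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (a == MAP_FAILED)
        return -1;
    unsigned char *b = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (b == MAP_FAILED) {
        munmap(a, len);
        return -1;
    }
    memset(a, 0xAB, len); /* dirty every byte of the first region */
    int distinct = (a != b) && b[0] == 0 && b[len - 1] == 0;
    munmap(a, len);
    munmap(b, len);
    return distinct;
}
```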

In this mode, if a process forks, the child inherits the parent’s mappings, but modifications are handled copy-on-write: the first write by either process to a page gives that process its own private copy, so changes made by the child do not affect the parent’s memory. MAP_ANONYMOUS combined with MAP_SHARED likewise gives each mmap call its own region, one that shares no pages with mappings created by other mmap calls.
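The fork behavior can be sketched as follows, assuming a POSIX system with fork and waitpid; value_seen_by_parent is a hypothetical name. The parent stores 1 in a one-int anonymous mapping, the child overwrites it with 2, and the parent reports what it sees afterwards. With MAP_PRIVATE, copy-on-write keeps the parent's value at 1; passing MAP_SHARED instead shows the shared behavior described next.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one int with the given sharing flag (MAP_PRIVATE or MAP_SHARED), let a
   forked child overwrite it, and return the value the parent sees afterwards
   (or -1 on error). */
int value_seen_by_parent(int share_flag) {
    int *p = mmap(NULL, sizeof *p, PROT_READ | PROT_WRITE,
                  share_flag | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    *p = 1;
    pid_t pid = fork();
    if (pid == -1) {
        munmap(p, sizeof *p);
        return -1;
    }
    if (pid == 0) { /* child */
        *p = 2;     /* private copy under MAP_PRIVATE; shared page under MAP_SHARED */
        _exit(0);
    }
    waitpid(pid, NULL, 0); /* wait for the child's write to finish */
    int seen = *p;
    munmap(p, sizeof *p);
    return seen;
}
```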

Processes that inherit a shared anonymous mapping (such as child processes), however, all access the same pages: a write by one process is visible to the others, with no copy-on-write involved. This makes shared anonymous mappings useful for IPC between related processes, similar to System V shared memory segments.
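As an IPC illustration, here is a minimal sketch assuming Linux, C11 <stdatomic.h>, and lock-free atomic_long (true on common 64-bit targets); shared_counter_total is a hypothetical name. Several forked children increment one counter that lives in a MAP_SHARED | MAP_ANONYMOUS region, and the parent reads the combined total once all of them have exited.

```c
#define _DEFAULT_SOURCE
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork `nchildren` processes that each add `per_child` to a counter living in
   a MAP_SHARED | MAP_ANONYMOUS region; return the final total (or -1). */
long shared_counter_total(int nchildren, int per_child) {
    atomic_long *counter = mmap(NULL, sizeof *counter, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    atomic_init(counter, 0);
    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break; /* fewer children started; the total will be smaller */
        if (pid == 0) {
            for (int j = 0; j < per_child; j++)
                atomic_fetch_add(counter, 1); /* same page in every process */
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ; /* reap every child before reading the total */
    long total = atomic_load(counter);
    munmap(counter, sizeof *counter);
    return total;
}
```

The atomic increment matters here: because there is no copy-on-write, all children update the same memory concurrently.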

The Role of fd in mmap

For file-backed mappings, fd specifies the file to map. With MAP_ANONYMOUS, Linux ignores fd, but some implementations require it to be -1. Passing -1 is therefore the portable choice, and it also makes explicit that no file is involved.

There are two primary ways to create anonymous mappings on Linux. The first specifies the MAP_ANONYMOUS flag and passes -1 as the file descriptor (fd) and 0 as the offset in the mmap call. This is the most straightforward approach on Linux.

The second opens /dev/zero and passes the resulting file descriptor to mmap. This was the usual technique on older Unix systems that lacked an anonymous-mapping flag; some BSD-derived systems also spell the flag MAP_ANON rather than MAP_ANONYMOUS. Both methods achieve the same result: a zero-filled block of memory that can be used for various purposes.
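The /dev/zero approach can be sketched like this, assuming a system that provides /dev/zero; map_dev_zero is a hypothetical name. Note that the descriptor can be closed immediately, because an established mapping stays valid after its file descriptor is closed.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create a private, zero-filled mapping by mapping /dev/zero instead of
   using MAP_ANONYMOUS. Returns NULL on failure; release with munmap. */
void *map_dev_zero(size_t len) {
    int fd = open("/dev/zero", O_RDWR);
    if (fd == -1)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```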

Anonymous mappings offer several advantages. They do not fragment the heap's virtual address space, and once a mapping is unmapped, its memory is returned to the system immediately. They are also flexible: permissions can be changed with mprotect, memory advice can be given with madvise, and on Linux a mapping can be resized with mremap. Additionally, each allocation is a distinct mapping, separate from the global heap, which can simplify memory management in certain scenarios.
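A sketch of adjusting a mapping after creation, assuming Linux semantics for madvise; adjust_anonymous_mapping is a hypothetical name. It writes to a two-page anonymous mapping, discards the contents with MADV_DONTNEED (on Linux, private anonymous pages then read back as zeros), drops write permission with mprotect, and finally unmaps the region. Resizing with the Linux-specific mremap is not shown.

```c
#define _DEFAULT_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Returns the first byte read after MADV_DONTNEED (expected 0 on Linux),
   or -1 on error. */
int adjust_anonymous_mapping(void) {
    size_t len = 2 * (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 42;
    /* Tell the kernel the contents are no longer needed. On Linux, private
       anonymous pages are zero-filled on the next access. */
    if (madvise(p, len, MADV_DONTNEED) == -1) {
        munmap(p, len);
        return -1;
    }
    /* Drop write permission; a later store would raise SIGSEGV. */
    if (mprotect(p, len, PROT_READ) == -1) {
        munmap(p, len);
        return -1;
    }
    int first = p[0];
    munmap(p, len); /* remove the mapping and release its pages */
    return first;
}
```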

However, there are some disadvantages to consider. Each mapping occupies a whole number of pages (the kernel rounds the length up to a multiple of the system’s page size), so small allocations waste memory and address space. Furthermore, creating and releasing a mapping requires system calls, which costs more than carving an allocation out of the already-reserved heap. Despite these drawbacks, anonymous mappings remain a valuable tool for developers needing efficient and flexible memory management, especially in scenarios involving large allocations or IPC.
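The page-rounding cost is simple arithmetic. This sketch assumes the page size reported by sysconf(_SC_PAGESIZE); mapped_size and wasted_bytes are hypothetical names. With 4096-byte pages, for instance, a 100-byte request still occupies a full page and leaves 3996 bytes unused.

```c
#include <stddef.h>
#include <unistd.h>

/* Round a requested length up to a whole number of pages: the amount of
   address space an anonymous mapping of that length actually occupies. */
size_t mapped_size(size_t requested) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (requested + page - 1) / page * page;
}

/* Bytes of the final page left unused by a request of `requested` bytes. */
size_t wasted_bytes(size_t requested) {
    return mapped_size(requested) - requested;
}
```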


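#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON /* older BSD and macOS spelling */
#endif

/* Returns a zero-filled private mapping of `size` bytes, or NULL on failure
   (errno says why). Release it with munmap(ptr, size). */
void *map_anonymous_memory(size_t size) {
    /* fd = -1 and offset = 0: no file backs this mapping */
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (addr == MAP_FAILED) {
        return NULL; /* never hand MAP_FAILED back to the caller */
    }
    return addr;
}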

The above code shows the correct usage of mmap with MAP_ANONYMOUS.

Conclusion

Understanding the role of fd in anonymous mappings is crucial for writing portable and reliable code. By setting fd to -1 when using MAP_ANONYMOUS, developers ensure their code works across diverse systems, respecting historical implementation quirks and modern standards.


Low-Level Lore