Why You Can't Directly Move Data Between Two Memory Addresses in x86 Assembly and How to Do It

Understanding instruction encoding limitations and mastering efficient data movement techniques from single bytes to large blocks

Jun 12, 2025

In x86 assembly, you can't directly move data between two memory addresses using the mov instruction due to encoding limitations
Using a register as an intermediate, like mov al, [src] then mov [dest], al, is the standard approach for small data
The movs instructions, like movsb, are recommended for moving blocks of data, especially with the rep prefix for efficiency

Why You Can't Directly Move Data

In x86 assembly, the mov instruction is the workhorse for moving data, but it has a catch: it can't take two memory operands. If you try something like mov [dest], [src], the assembler rejects it, because no such encoding of mov exists. Like almost every x86 instruction, mov encodes at most one explicit memory operand; the other operand must be a register or an immediate. This keeps the instruction encoding compact and the hardware simpler.

How to Move Data Instead

The most common method is to use a register as a middleman. For example, to move a byte:

Load the data into a register: mov al, [src]
Then store it to the destination: mov [dest], al

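For larger data, use wider registers like EAX for 32 bits or RAX for 64 bits. Prefer call-clobbered registers like EAX, ECX, or EDX, since a function may overwrite them without saving and restoring them first. In 64-bit mode, R8 through R11 are call-clobbered too, while R12 through R15 must be preserved in both the System V and Windows x64 calling conventions. Watch out, though: any instruction that touches R8-R15 needs a REX prefix, which can make it one byte longer than the same instruction on a legacy register.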

Moving Larger Blocks

Handling Larger Data with movs Instructions

Using registers works great for small data, but what if you need to copy a string, an array, or a large buffer? Looping with mov instructions gets clunky fast. Enter the movs family of instructions:

movsb for bytes (move byte)
movsw for words (move word, 2 bytes)
movsd for doublewords (move doubleword, 4 bytes)
movsq for quadwords (move quadword, 8 bytes, in 64-bit mode)

These are specialized for string operations, using specific registers for addressing:

Source: SI (16-bit), ESI (32-bit), or RSI (64-bit).
Destination: DI, EDI, or RDI.

The movs instructions automatically update these pointers after each move, based on the direction flag (DF):

DF = 0 (cleared with cld): Pointers increment (move forward).
DF = 1 (set with std): Pointers decrement (move backward).

For example, movsb does this:

Copies a byte from [SI] or [RSI] to [DI] or [RDI].
Adjusts SI and DI by 1 (byte size), either up or down based on DF.

Similarly, movsd moves 4 bytes and adjusts pointers by 4. The adjustment matches the data size: 1 for movsb, 2 for movsw, 4 for movsd, 8 for movsq.
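To make the pointer bookkeeping concrete, here is a rough sketch of what a single movsb does when DF = 0, written out with ordinary instructions in 64-bit NASM syntax. It is only an approximation: the real movsb needs no scratch register, whereas this version overwrites AL. The label name is made up for illustration.

```nasm
bits 64
section .text

; Hypothetical helper: behaves like one movsb with DF = 0,
; except that it overwrites AL (movsb itself uses no scratch register).
movsb_by_hand:
    mov     al, [rsi]       ; load the byte at the source pointer
    mov     [rdi], al       ; store it at the destination pointer
    lea     rsi, [rsi + 1]  ; advance source (lea leaves the flags alone, like movsb)
    lea     rdi, [rdi + 1]  ; advance destination
    ret
```

With DF = 1 the two lea instructions would subtract 1 instead, and for movsw, movsd, and movsq the step becomes 2, 4, or 8.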

You can find detailed specs for movs at Felix Cloutier’s x86 reference. It’s a handy resource for checking exact behavior, especially for edge cases like segment overrides.

Detailed Exploration of Memory Operations

The Limitation of mov Instruction

The mov instruction is a fundamental tool in x86 assembly, used for moving data between registers, memory, and immediates. However, it has a critical limitation: it cannot directly move data between two memory addresses. Attempting something like mov [dest], [src] fails at assembly time with an error such as NASM's "invalid combination of opcode and operands."

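The reason lies in the x86 instruction encoding. Most x86 instructions, including mov, describe their operands with a single ModRM byte, which names one register operand and one operand that is either a register or a memory location. There is no field left for a second memory address, so these instructions have at most one explicit memory operand. This design keeps the encoding compact and the hardware simpler. The string instructions covered later get around it by using fixed, implicit pointer registers.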
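A quick way to see the single-memory-operand rule is to look at the machine code. The sketch below, in 64-bit NASM syntax, lists three valid forms of mov with the bytes they should assemble to. In each one the ModRM byte (03 here) describes exactly one memory operand, [rbx]. The choice of RBX is arbitrary.

```nasm
bits 64
section .text

    mov     eax, [rbx]          ; 8B 03              register <- memory
    mov     [rbx], eax          ; 89 03              memory   <- register
    mov     dword [rbx], 1      ; C7 03 01 00 00 00  memory   <- immediate

    ; mov   [rdi], [rsi]        ; no such encoding: the assembler rejects this line
```

You can check the bytes yourself by assembling with a listing file (nasm -l) or by disassembling the output.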

Using Registers as Intermediates

Since direct memory-to-memory moves aren't possible with mov, the standard approach is to use a register as an intermediary:

For a single byte:

mov al, [src]
mov [dest], al

For 32-bit data:

mov eax, [src]
mov [dest], eax

For 64-bit data:

mov rax, [src]
mov [dest], rax

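The choice of register matters. It's recommended to use call-clobbered registers (those a function may overwrite without preserving them) like EAX, ECX, and EDX in 32-bit code. In 64-bit code, R8 through R11 are call-clobbered as well. R12 through R15 are call-preserved in both the System V and Windows x64 conventions, so save and restore them if you use them.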
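Register choice also affects code size in 64-bit mode, because the extended registers need a REX prefix. The following sketch, in NASM syntax, loads from memory into several registers. The byte sequences in the comments are the encodings an assembler should produce, and the leading 4x byte is the REX prefix.

```nasm
bits 64
section .text

    mov     al,  [rsi]      ; 8A 06       no prefix needed
    mov     r8b, [rsi]      ; 44 8A 06    REX.R selects R8
    mov     sil, [rdi]      ; 40 8A 37    SIL needs a bare REX prefix

    mov     eax, [rsi]      ; 8B 06
    mov     r8d, [rsi]      ; 44 8B 06    one byte longer than the EAX form
    mov     rax, [rsi]      ; 48 8B 06    REX.W for 64-bit operand size
```

A 64-bit operand already carries a REX prefix, so using R8 through R15 there costs nothing extra. The penalty applies only to 8-, 16-, and 32-bit operations.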

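When dealing with bytes, be cautious with high-byte registers like AH, BH, CH, DH. On some CPUs, writing one of these and then reading the full register costs an extra merge step or a partial-register stall, hurting performance. Writing a low-byte register (AL, CL, DL) is cheaper, but it still merges into the old value of the full register, which can create a false dependency on some microarchitectures. To sidestep both, use movzx to zero-extend the byte into a full register: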

movzx eax, byte [src]
mov [dest], al

Exploring movs for String Operations

For moving larger blocks, like strings or arrays, mov with registers can get tedious if you need to loop. That's where the movs family comes in:

movsb (move byte)
movsw (move word, 2 bytes)
movsd (move doubleword, 4 bytes)
movsq (move quadword, 8 bytes, in 64-bit mode)

These instructions use SI (ESI with 32-bit addressing, RSI in 64-bit mode) as the source pointer and DI (EDI, RDI) as the destination. After each move, the pointers are adjusted based on the direction flag (DF). If DF is 0 (cleared with cld), they increment; if it is 1 (set with std), they decrement.

Examples:

movsb moves a byte, then adjusts SI and DI by 1
movsd moves 4 bytes, adjusting by 4

Leveraging rep for Block Moves

The real power of movs shines when paired with the rep prefix, which repeats the instruction based on a count in the CX register (ECX in 32-bit, RCX in 64-bit). For instance, to copy 100 bytes in 32-bit code:

mov esi, src    ; Source address
mov edi, dest   ; Destination address
mov ecx, 100    ; Count of bytes
cld             ; Clear DF for forward movement
rep movsb       ; Repeat movsb 100 times

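This copies 100 bytes from [src] to [dest], incrementing ESI and EDI each time. The rep prefix checks the count register before each iteration, decrements it after each move, and stops when it hits zero, so a count of zero copies nothing. In 64-bit code the same sequence uses RSI, RDI, and RCX.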
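As a mental model, rep movsb behaves roughly like the loop below (64-bit NASM syntax, label names hypothetical). The real instruction leaves the arithmetic flags untouched and lets the CPU batch the work. This loop modifies the flags through test and dec and copies strictly one byte per iteration.

```nasm
bits 64
section .text

; Rough equivalent of "rep movsb": copy RCX bytes from [RSI] to [RDI].
rep_movsb_by_hand:
    test    rcx, rcx        ; a zero count means nothing to copy
    jz      .done
.next:
    movsb                   ; copy one byte, step RSI and RDI according to DF
    dec     rcx             ; one fewer byte remaining
    jnz     .next           ; keep going until the count reaches zero
.done:
    ret
```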

Performance Considerations and Optimizations

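While rep movs is great for large blocks, it's not always the fastest for small data. Setting up the pointers and count register has overhead, and the instruction itself has a startup cost on many CPUs, so for just a few bytes, going through a register is usually quicker. For large blocks, though, the CPU can run rep movs as a microcoded fast-string operation that copies in chunks much wider than the element size. Intel processors since Ivy Bridge advertise this as Enhanced REP MOVSB (ERMSB), and recent AMD processors have similar optimizations.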

Alignment is another factor. Memory accesses are faster when data is aligned to boundaries like 16 bytes or 64 bytes, depending on the CPU. Some CPUs optimize rep movs for aligned data, so aligning your buffers can help.
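One common way to cut the iteration count on CPUs where rep movsb is not specially optimized is to copy eight bytes at a time and finish with a byte-sized tail. The routine below is a sketch of that idea, assuming the System V x86-64 calling convention (RDI = destination, RSI = source, RDX = byte count) and non-overlapping buffers. The name copy_bytes is hypothetical, and whether this beats a plain rep movsb depends on the processor, so measure before relying on it.

```nasm
bits 64
section .text
global copy_bytes

; void copy_bytes(void *dest, const void *src, size_t n)   (hypothetical)
copy_bytes:
    cld                     ; forward copy (the ABI already guarantees DF = 0)
    mov     rcx, rdx
    shr     rcx, 3          ; RCX = n / 8 whole quadwords
    rep movsq               ; copy them, advancing RSI and RDI by 8 each time
    mov     ecx, edx
    and     ecx, 7          ; RCX = n % 8 leftover bytes (upper half zeroed)
    rep movsb               ; copy the tail
    ret
```

Because the argument registers already match what the string instructions expect, the routine needs no pointer setup at all.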

Edge Cases and Gotchas

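Direction Flag: Make sure DF is in the state you expect before using movs. The common calling conventions (System V and Windows) require DF to be clear on function entry and exit, so compiled code assumes DF = 0. If you use std, follow it with cld before you return or call anything else. If your code can be entered from a context that doesn't follow the ABI, issue cld yourself.
Segment Overrides: In 16-bit or 32-bit code, movs reads the source through DS and writes the destination through ES by default. A segment-override prefix can change the source segment, but the destination always uses ES. In 64-bit mode the DS and ES bases are treated as zero, so this rarely matters, but be cautious in mixed-mode code.
Interrupts: rep movs can be interrupted between iterations, which keeps interrupt latency low even during a long copy. The count and pointer registers always hold the current progress, and the instruction resumes where it left off when the handler returns. That only works if the handler preserves ECX, ESI, and EDI, so save and restore them if you’re writing interrupt handlers that use them.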
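The direction-flag advice matters most when you copy backward, for example when the destination overlaps the source at a higher address. Here is a minimal sketch, assuming the System V x86-64 convention (RDI = destination, RSI = source, RDX = byte count) and a hypothetical routine name:

```nasm
bits 64
section .text
global copy_backward

; Copies RDX bytes from high addresses to low, so an overlapping
; destination above the source is not overwritten before it is read.
copy_backward:
    mov     rcx, rdx
    lea     rsi, [rsi + rdx - 1]    ; last byte of the source
    lea     rdi, [rdi + rdx - 1]    ; last byte of the destination
    std                             ; DF = 1: pointers decrement
    rep movsb
    cld                             ; restore DF = 0 before returning
    ret
```

Leaving out the final cld is a classic bug: the caller and everything it calls will assume DF = 0.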

Practical Examples

Copy a single byte:

mov al, [buffer1]  ; Load byte from buffer1
mov [buffer2], al  ; Store to buffer2

Copy a 100-byte block:

mov esi, buffer1   ; Source
mov edi, buffer2   ; Destination
mov ecx, 100       ; 100 bytes
cld                ; Forward direction
rep movsb          ; Copy the block
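For something you can assemble and run, here is the block copy as a complete program. It is a sketch that assumes 64-bit Linux, NASM, and the GNU linker. The message text and the labels src, dest, and len are made up for the example. It copies the message with rep movsb, writes the copy to standard output, and exits.

```nasm
; Build (assumed toolchain): nasm -f elf64 copy.asm -o copy.o
;                            ld copy.o -o copy
bits 64
default rel

section .data
src:    db "Hello from rep movsb!", 10
len     equ $ - src             ; message length in bytes

section .bss
dest:   resb len                ; destination buffer of the same size

section .text
global _start
_start:
    lea     rsi, [src]          ; source pointer
    lea     rdi, [dest]         ; destination pointer
    mov     ecx, len            ; byte count (zero-extends into RCX)
    cld                         ; forward direction
    rep movsb                   ; copy the block

    mov     eax, 1              ; Linux syscall number for write
    mov     edi, 1              ; file descriptor 1 = stdout
    lea     rsi, [dest]         ; print the copy, not the original
    mov     edx, len
    syscall

    mov     eax, 60             ; Linux syscall number for exit
    xor     edi, edi            ; exit status 0
    syscall
```

The lea instructions with default rel produce RIP-relative addresses, so the program also works when linked as position-independent code, where mov esi, src would not.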

Conclusion

In x86 assembly, you can't directly move data between two memory addresses with mov due to encoding limitations, but you can use a register as an intermediate for small data or leverage movs with rep for blocks. Understanding these methods, along with optimization tips like alignment, is key to writing efficient assembly code.

Low-Level Lore