Relearning Assembly for Malware Analysis: General-Purpose Registers (x86/x64)

When I opened a malware sample in Ghidra/IDA after not touching assembly for a while, I felt like I had lost touch with the basics.

So I’m rebuilding from the bottom: 8086-era 16-bit regs, then the 32-bit IA-32 expansions, and finally x86-64. What I want at the end is simple: when I see a mov, a jmp, or a call, I want to know exactly what is happening rather than settle for a high-level interpretation.

The 8086 mental model: 8 “general” registers

The 8086 gives you eight 16-bit registers that show up everywhere:

AX, BX, CX, DX
SI, DI, BP, SP

A key detail I tend to forget until I trip over it again: AX/BX/CX/DX can be accessed as 8-bit halves (AH/AL, etc.), while SI/DI/BP/SP are 16-bit only in the original 8086 model.

AX: accumulator, commonly used in arithmetic and I/O patterns
BX: historically the “base” register for addressing (you’ll see it holding a base address, e.g. [BX+SI])
CX: the “count” register; shows up in loops and repeat/string ops
DX: often paired with AX for wider results (mul/div patterns)
SP/BP: stack pointer and base pointer energy (stack frame / locals / args)
SI/DI: “source” / “destination” index vibes, especially around string/memory ops

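Modern compilers treat most of these registers as interchangeable, so these roles are conventions rather than rules (though a few instructions still hard-wire them), but I still find them helpful as a starting point.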
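To make the DX:AX pairing concrete, here is a minimal Python sketch of what the 16-bit unsigned mul and div do to the two registers. The function names mul16 and div16 are hypothetical helpers for illustration only; the behavior modeled is the 16-bit operand form, where the product goes to DX:AX and the quotient/remainder go to AX/DX.

```python
MASK16 = 0xFFFF

def mul16(ax, src):
    """Model `mul src` (16-bit operand): DX:AX = AX * src, unsigned."""
    product = (ax & MASK16) * (src & MASK16)
    dx = (product >> 16) & MASK16   # high half of the 32-bit product lands in DX
    ax = product & MASK16           # low half stays in AX
    return dx, ax

def div16(dx, ax, src):
    """Model `div src` (16-bit operand): DX:AX / src -> AX = quotient, DX = remainder."""
    dividend = ((dx & MASK16) << 16) | (ax & MASK16)
    src &= MASK16
    if src == 0:
        raise ZeroDivisionError("divide error: division by zero")
    quotient, remainder = divmod(dividend, src)
    if quotient > MASK16:
        # the real CPU raises a divide error when the quotient does not fit in AX
        raise OverflowError("divide error: quotient does not fit in AX")
    return remainder, quotient      # returned in (DX, AX) order

dx, ax = mul16(0x1234, 0x0100)
print(hex(dx), hex(ax))             # 0x12 0x3400
```

Seeing a write to DX (or a cdq/xor edx, edx in 32-bit code) right before a div is the same pattern: the high half of the dividend is being set up.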

16-bit → 32-bit (IA-32)

When the 80386/IA-32 era arrives, the big change is: those same registers become 32-bit, and the naming gets an E prefix:

EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP

What clicked for me is that the old 16-bit names don’t disappear: they become the low 16 bits of the 32-bit register, and the 8-bit pieces still exist where they existed before. The classic lists show this explicitly (eax/ecx/… and ax/cx/… and al/ah/etc.).

EAX (32)
└── AX (16)
    ├── AH (8 high)
    └── AL (8 low)

(Figure: a graphical representation of the EAX / AX / AH / AL layout)
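The nesting above can be sketched as a tiny Python class where AX, AH, and AL are just views onto the same 32 bits. The class name Reg32 and its attribute names (e, x, h, l) are made up for this illustration; the point is only that writing a smaller piece changes part of the larger register and leaves the rest alone.

```python
class Reg32:
    """Toy model of a 32-bit register like EAX with its AX/AH/AL views."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF          # full 32 bits (EAX)

    @property
    def x(self):                             # low 16 bits (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, v):
        self.e = (self.e & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def h(self):                             # bits 8..15 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, v):
        self.e = (self.e & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def l(self):                             # bits 0..7 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, v):
        self.e = (self.e & 0xFFFFFF00) | (v & 0xFF)

eax = Reg32(0x11223344)
eax.l = 0xFF                                 # like: mov al, 0FFh
print(hex(eax.e))                            # 0x112233ff
```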

32-bit → 64-bit (x86-64)

In x86-64, everything expands again:

RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, plus the significant upgrade: R8–R15. Each of these also has names for its smaller (32/16/8-bit) parts.

Example with RAX:

RAX = 64-bit
EAX = low 32-bit
AX = low 16-bit
AL/AH = low/high 8-bit pieces (of AX)

The extra registers (R8–R15) also have R8D/R8W/R8B forms for 32/16/8-bit access. (Stack Overflow)

(Figure: expansion of register size)
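When tracking dataflow by hand, it helps to normalize every sub-register name back to its 64-bit parent. Below is a small sketch of such a lookup table in Python. The names build_alias_table, ALIASES, and parent_of are hypothetical, not part of any tool. One addition beyond the text above: in 64-bit mode SI/DI/BP/SP also gain low-byte names (SIL, DIL, BPL, SPL), which the table includes.

```python
def build_alias_table():
    """Map each register name to (64-bit parent, width in bits)."""
    table = {}
    # rax/rbx/rcx/rdx: 32-, 16-bit and two 8-bit views (low byte and bits 8..15)
    for letter in "abcd":
        parent = f"r{letter}x"
        table[parent] = (parent, 64)
        table[f"e{letter}x"] = (parent, 32)
        table[f"{letter}x"] = (parent, 16)
        table[f"{letter}l"] = (parent, 8)
        table[f"{letter}h"] = (parent, 8)
    # rsi/rdi/rbp/rsp: low-byte names (sil, dil, bpl, spl) exist only in 64-bit mode
    for base in ("si", "di", "bp", "sp"):
        parent = "r" + base
        table[parent] = (parent, 64)
        table["e" + base] = (parent, 32)
        table[base] = (parent, 16)
        table[base + "l"] = (parent, 8)
    # r8..r15 use the d/w/b suffixes
    for n in range(8, 16):
        parent = f"r{n}"
        table[parent] = (parent, 64)
        table[parent + "d"] = (parent, 32)
        table[parent + "w"] = (parent, 16)
        table[parent + "b"] = (parent, 8)
    return table

ALIASES = build_alias_table()

def parent_of(name):
    """Return (parent, width) for a register name, case-insensitively."""
    return ALIASES[name.lower()]

print(parent_of("R10D"))   # ('r10', 32)
```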

The x64 gotcha I must remember (because it changes dataflow)

On x64, if an instruction writes to a 32-bit subregister (like EAX), it zero-extends into the full 64-bit register (RAX). Microsoft’s x64 register overview states this plainly. (learn.microsoft.com)

mov eax, 1
; implies rax = 00000000_00000001h

This matters in RE because it’s a clue: heavy eax/ecx/edx/r8d activity usually means 32-bit integers or counters, not “full pointer” math.
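The zero-extension rule is easier to remember next to its counterpart: writes to 8-bit and 16-bit sub-registers do not zero-extend; they leave the upper bits of the 64-bit register untouched. Here is a minimal Python sketch of both behaviors. The helper name write_sub is hypothetical, and it only models the low sub-registers (AL/AX/EAX style), not AH.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def write_sub(reg64, value, width):
    """Return the new 64-bit register value after writing `value`
    to its low `width`-bit sub-register (width in 8, 16, 32, 64)."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit writes zero the upper 32 bits of the full register
        return value & 0xFFFFFFFF
    if width in (8, 16):
        # 8/16-bit writes keep every bit above the written part
        mask = (1 << width) - 1
        return (reg64 & ~mask & MASK64) | (value & mask)
    raise ValueError("width must be 8, 16, 32 or 64")

rax = 0xFFFFFFFFFFFFFFFF
print(hex(write_sub(rax, 1, 32)))   # 0x1                 (mov eax, 1)
print(hex(write_sub(rax, 1, 16)))   # 0xffffffffffff0001  (mov ax, 1)
```

This is also why xor eax, eax is enough to clear all of RAX.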

What they’re “used for” (in the way I actually see them in malware RE)

1) Stack/navigation registers: SP/BP and friends

SP/ESP/RSP: where the stack is right now
BP/EBP/RBP: commonly used as a stable frame reference (especially in x86; in x64 it depends on compiler/flags, but it still appears a lot in RE)
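To see why RBP works as a stable reference while RSP keeps moving, here is a toy Python model of the classic frame-pointer prologue and epilogue (push rbp / mov rbp, rsp / sub rsp, N, then the reverse). It is a sketch under assumptions: the state dictionary and the function names are invented for illustration, memory is a plain dict keyed by address, and it assumes a function that actually uses a frame pointer, which many x64 builds omit.

```python
def make_state(rsp=0x1000, rbp=0):
    """Toy CPU state: two registers and a sparse memory dict."""
    return {"rsp": rsp, "rbp": rbp, "mem": {}}

def push(s, value):
    s["rsp"] -= 8                 # the stack grows toward lower addresses
    s["mem"][s["rsp"]] = value

def pop(s):
    value = s["mem"][s["rsp"]]
    s["rsp"] += 8
    return value

def prologue(s, locals_size):
    push(s, s["rbp"])             # push rbp      (save the caller's frame pointer)
    s["rbp"] = s["rsp"]           # mov  rbp, rsp (anchor the new frame)
    s["rsp"] -= locals_size       # sub  rsp, N   (reserve space for locals)

def epilogue(s):
    s["rsp"] = s["rbp"]           # mov rsp, rbp  (drop locals)
    s["rbp"] = pop(s)             # pop rbp       (restore the caller's frame pointer)

s = make_state(rsp=0x1000, rbp=0xAAAA)
prologue(s, 0x20)
push(s, 123)                      # RSP moves again, RBP does not
print(hex(s["rbp"]), hex(s["rsp"]))   # 0xff8 0xfd0
```

In this model, locals sit at negative offsets from RBP no matter how many pushes happen afterwards, which matches the [rbp-8] style accesses that show up in disassembly.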

2) Pointers + indexes: SI/DI and modern equivalents

SI/ESI/RSI and DI/EDI/RDI: frequently become “this is a pointer to a buffer” registers in decompiled/disassembled code (even if it’s not string ops)
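The source/destination roles come from the string instructions, and rep movsb ties three of the classic registers together: RSI is the source pointer, RDI is the destination pointer, and RCX is the count. Below is a minimal Python sketch of that loop, assuming the direction flag is clear (pointers move forward). The function name rep_movsb and the use of a bytearray as “memory” are illustrative assumptions.

```python
def rep_movsb(mem, rsi, rdi, rcx):
    """Model `rep movsb` with DF=0: copy rcx bytes from mem[rsi] to mem[rdi].
    Returns the final (rsi, rdi, rcx), as the CPU would leave them."""
    while rcx != 0:
        mem[rdi] = mem[rsi]   # movsb: copy one byte
        rsi += 1              # both pointers advance by one
        rdi += 1
        rcx -= 1              # rep: decrement the count and repeat until zero
    return rsi, rdi, rcx

mem = bytearray(b"ABCD\x00\x00\x00\x00")
print(rep_movsb(mem, 0, 4, 4))   # (4, 8, 0)
print(bytes(mem))                # b'ABCDABCD'
```

After the loop, RCX is zero and both pointers sit just past the copied region, which is a handy fingerprint for an inlined memcpy.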

3) “Return value lives here” intuition

In the common x64 calling conventions (and, as EAX, in most 32-bit ones), the return value comes back in RAX. Windows x64 docs list RAX as a volatile register that holds the return value, so it’s the one to watch after calls. (learn.microsoft.com)

A quick “evolution cheat sheet” (what I keep nearby)

8086 / 16-bit set

AX BX CX DX SI DI BP SP (Wikipedia)

32-bit IA-32 set (the same, widened)

EAX EBX ECX EDX ESI EDI EBP ESP (docs.oracle.com)

64-bit x86-64 set (widened + extra)

RAX RBX RCX RDX RSI RDI RBP RSP R8 R9 R10 R11 R12 R13 R14 R15 (learn.microsoft.com)

Where I’m going next

Next I want to write up the part that actually makes control flow readable in malware:

flags (ZF/CF/SF/OF) + cmp/test + conditional jumps