Welcome to CYB3RFY Reverse Engineering
I'm cyb3rfy, creator of the CYB3RFY YouTube channel, where I publish CTF walkthroughs and TryHackMe rooms. I'm an intermediate CTF player: I often rank high in competitions, but for a long time I struggled with reverse engineering challenges. I didn't know where to start, what tools to use, or how to read assembly.
Because of that, I lost many potential top ranks.
After weeks of studying assembly, registers, syscalls, GDB, IDA, Ghidra, radare2, strings, and more, I created the notes that finally helped me understand it all.
This website is built from those notes: a complete beginner's guide so anyone can start reverse engineering the right way.
What You'll Learn
This comprehensive guide covers everything from CPU architecture and assembly language to professional reverse engineering tools. Whether you're preparing for CTF competitions, bug bounty hunting, or building security expertise, you'll find detailed explanations, real code examples, practical exercises, and command references.
How to Use This Guide
Use the search bar at the top to instantly find any assembly instruction, GDB command, radare2 command, syscall, or reverse engineering concept. Every result includes detailed explanations and examples.
CPU & Registers
The CPU (Central Processing Unit) is the brain of your computer. To reverse engineer binaries, you must understand how CPUs work at a fundamental level. This chapter explains CPU architecture, registers, and why they matter for security.
What is a CPU?
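A CPU executes instructions in sequence. It reads data from memory, processes it, and writes results back. The two main components of a CPU are the Arithmetic Logic Unit (ALU), which performs arithmetic and logical operations, and the Control Unit (CU), which fetches and decodes instructions and directs the rest of the processor.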
What Are Registers?
Registers are tiny, ultra-fast storage locations inside the CPU that hold the data it is actively working on. RAM is measured in gigabytes. A general-purpose register on x86-64 holds just 64 bits, but the CPU can access it far faster than memory.
When you debug a binary with GDB or examine assembly code, you're watching data move through registers. Understanding registers is essential because:
Register Hierarchy (x86-64 Architecture)
On modern 64-bit x86 CPUs, registers come in different sizes. The main "General Purpose Registers" are:
| 64-bit (QWORD) | 32-bit (DWORD) | 16-bit (WORD) | 8-bit HIGH | 8-bit LOW |
|---|---|---|---|---|
| RAX | EAX | AX | AH | AL |
| RBX | EBX | BX | BH | BL |
| RCX | ECX | CX | CH | CL |
| RDX | EDX | DX | DH | DL |
| RSI | ESI | SI | - | SIL |
| RDI | EDI | DI | - | DIL |
| RSP | ESP | SP | - | SPL |
| RBP | EBP | BP | - | BPL |
| R8 | R8D | R8W | - | R8B |
| R9 | R9D | R9W | - | R9B |
| R10 | R10D | R10W | - | R10B |
| R11 | R11D | R11W | - | R11B |
| R12 | R12D | R12W | - | R12B |
| R13 | R13D | R13W | - | R13B |
| R14 | R14D | R14W | - | R14B |
| R15 | R15D | R15W | - | R15B |
SIL, DIL, SPL, BPL, and R8B through R15B can only be used in 64-bit mode. Only RAX, RBX, RCX, and RDX have an 8-bit high register.
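When you write to a 32-bit register (like EAX), the CPU automatically zeroes the upper 32 bits of the full 64-bit register (RAX). Writes to 16-bit and 8-bit registers (AX, AL, AH) do not do this: they leave the other bits unchanged. This matters in reverse engineering because it determines which bits of a register still hold old data.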
Key x86-64 Registers
Special Registers - Flags Register (RFLAGS)
The RFLAGS register (EFLAGS in 32-bit mode) contains condition flags: single bits that describe the result of the last arithmetic or logical operation.
| Flag | Name | Meaning | Set When |
|---|---|---|---|
| ZF | Zero Flag | Result is zero | Result of last operation = 0 |
| CF | Carry Flag | Unsigned overflow | Addition/subtraction carries/borrows |
| SF | Sign Flag | Result is negative | MSB (most significant bit) = 1 |
| OF | Overflow Flag | Signed overflow | Signed arithmetic overflow occurs |
| PF | Parity Flag | Even parity | Lowest byte of the result has an even number of 1 bits |
| AF | Adjust Flag | BCD carry | Carry or borrow out of the low nibble (bit 3) |
Why flags matter: Conditional jumps (JE, JNE, JZ, JG, etc.) check these flags to decide whether to jump. Understanding flags is essential for reading assembly code.
cmp rax, rbx ; Compare RAX with RBX (subtract, discard result)
; Sets ZF=1 if RAX==RBX, ZF=0 if different
je success ; Jump if Equal (checks ZF flag)
mov rax, 0 ; If not equal, RAX = 0
jmp end
success:
mov rax, 1 ; If equal, RAX = 1
end:
x86-64 Calling Convention
When a function is called, arguments are passed through specific registers in a specific order. This is the calling convention. Understanding it is crucial for debugging.
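This is the System V AMD64 ABI, the calling convention used on 64-bit Linux (and also on macOS and the BSDs). Windows x64 uses a different convention (RCX, RDX, R8, R9).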
| Argument # | Register |
|---|---|
| 1st argument | RDI |
| 2nd argument | RSI |
| 3rd argument | RDX |
| 4th argument | RCX |
| 5th argument | R8 |
| 6th argument | R9 |
| Return value | RAX |
Integer and pointer arguments beyond the 6th are passed on the stack. Floating-point arguments use a separate set of registers (XMM0 through XMM7).
mov rdi, 10 ; arg1 = 10
mov rsi, 20 ; arg2 = 20
mov rdx, 30 ; arg3 = 30
mov rcx, 40 ; arg4 = 40
call my_function ; Call function
; RAX now contains the return value
Putting It Together
Now you understand:
These fundamentals are essential. In the next section, you'll learn how to use this knowledge to understand system calls.
System Calls (Syscalls)
A system call is a request from a user program to the kernel to perform a privileged operation. When a program needs to read a file, write to the screen, or allocate memory, it can't do this directly; it must ask the kernel through a syscall.
User Space vs Kernel Space
Modern operating systems use a layered privilege model:
User space (Ring 3), where normal programs run:
Can perform calculations
Cannot access hardware directly
Cannot access other processes' memory
Cannot perform privileged operations
Crossing from user space to kernel space: the SYSCALL instruction (RAX = syscall number, RDI/RSI/RDX = arguments)
Kernel space (Ring 0), where the kernel runs:
Can execute privileged instructions
Can control hardware devices
Can manage processes and resources
Can handle interrupts and exceptions
The user/kernel space separation is fundamental to operating system security. User programs cannot directly access hardware or other processes' memory. All privileged operations must go through the kernel via system calls, where permissions are checked and validated.
How System Calls Work
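When your program executes a SYSCALL instruction, the CPU saves the return address in RCX and RFLAGS in R11, switches to kernel mode (Ring 0), and jumps to the kernel's syscall entry point. The kernel uses the number in RAX to pick the handler, validates the arguments and permissions, does the work, puts the result in RAX, and returns to user mode at the instruction after SYSCALL.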
x86-64 Linux Syscall ABI
On 64-bit Linux, syscalls follow a specific convention. Let's break it down:
Every syscall has a number. You put that number in RAX, then set up arguments in specific registers:
| Register | Purpose |
|---|---|
| RAX | Syscall number (which syscall to call) |
| RDI | 1st argument |
| RSI | 2nd argument |
| RDX | 3rd argument |
| R10 | 4th argument (functions use RCX, but the SYSCALL instruction overwrites RCX with the return address, so syscalls use R10) |
| R8 | 5th argument |
| R9 | 6th argument |
Return value: After the syscall, RAX contains the result. A value from -4095 to -1 is an error code (the negated errno value). The SYSCALL instruction also overwrites RCX and R11.
Common Syscalls Explained
Complete System Call Database
Here's a comprehensive reference of Linux x86-64 system calls with arguments, return values, and detailed descriptions:
| RAX | Name | Arguments (RDI, RSI, RDX, R10, R8, R9) | Return Value | Description |
|---|---|---|---|---|
| 0 | read | RDI: unsigned int fd; RSI: char *buf; RDX: size_t count | ssize_t (bytes read, or -1) | Reads up to count bytes from file descriptor fd into buffer buf. Returns the number of bytes read, 0 at EOF, or -1 on error. Commonly used with fd=0 (stdin) to read user input. |
| 1 | write | RDI: unsigned int fd; RSI: const char *buf; RDX: size_t count | ssize_t (bytes written, or -1) | Writes up to count bytes from buffer buf to file descriptor fd. Returns the number of bytes written, or -1 on error. Use fd=1 (stdout) for console output and fd=2 (stderr) for error messages. |
| 2 | open | RDI: const char *filename; RSI: int flags; RDX: mode_t mode | int (file descriptor, or -1) | Opens the file specified by filename. Flags: O_RDONLY(0), O_WRONLY(1), O_RDWR(2), O_CREAT(64), O_APPEND(1024). Mode sets permissions for newly created files (e.g., 0644). Returns a file descriptor on success. Modern glibc implements open() with openat (257), so that is what you will often see in traces. |
| 3 | close | RDI: unsigned int fd | int (0 or -1) | Closes file descriptor fd, freeing the resource. Returns 0 on success, -1 on error. Always close files when done to prevent resource leaks. |
| 4 | stat | RDI: const char *filename; RSI: struct stat *statbuf | int (0 or -1) | Retrieves file information (size, permissions, timestamps) for filename and stores it in statbuf. Returns 0 on success. Follows symbolic links; use lstat (6) to get information about the link itself. |
| 5 | fstat | RDI: unsigned int fd; RSI: struct stat *statbuf | int (0 or -1) | Like stat, but operates on an already-open file descriptor instead of a filename. |
| 8 | lseek | RDI: unsigned int fd; RSI: off_t offset; RDX: unsigned int whence | off_t (new position, or -1) | Repositions the file offset of fd. Whence: SEEK_SET(0) = absolute, SEEK_CUR(1) = relative to the current position, SEEK_END(2) = relative to the end. Returns the new offset from the beginning of the file. |
| 9 | mmap | RDI: void *addr; RSI: size_t length; RDX: int prot; R10: int flags; R8: int fd; R9: off_t offset | void* (address, or MAP_FAILED) | Maps a file or device (or, with MAP_ANONYMOUS, zero-filled memory) into memory. Prot: PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Flags: MAP_PRIVATE(2), MAP_ANONYMOUS(32). Returns a pointer to the mapped area. Critical for memory management. |
| 11 | munmap | RDI: void *addr; RSI: size_t length | int (0 or -1) | Unmaps a previously mapped memory region starting at addr with the given length. Returns 0 on success. Always unmap when done to free memory. |
| 12 | brk | RDI: void *addr | void* (program break) | Changes the program break (the end of the data segment/heap) to addr. Used internally by malloc(). The raw syscall returns the new break on success and the current, unchanged break on failure; glibc's brk() wrapper turns that failure into -1. Rarely used directly in modern programs (prefer mmap). |
| 16 | ioctl | RDI: unsigned int fd; RSI: unsigned int cmd; RDX: unsigned long arg | int (varies) | Device-specific input/output control. The command and argument vary by device. Used for operations that don't fit the read/write model (terminal settings, disk operations, etc.). |
| 22 | pipe | RDI: int pipefd[2] | int (0 or -1) | Creates a pipe (unidirectional data channel). pipefd[0] is the read end and pipefd[1] the write end. Returns 0 on success. Used for inter-process communication. |
| 32 | dup | RDI: unsigned int fildes | int (new fd, or -1) | Duplicates fildes using the lowest available fd number. Both fds refer to the same open file. Returns the new fd on success. |
| 33 | dup2 | RDI: unsigned int oldfd; RSI: unsigned int newfd | int (new fd, or -1) | Duplicates oldfd onto newfd. If newfd is open, it is closed first. Commonly used to redirect stdin/stdout/stderr in child processes. |
| 57 | fork | (none) | pid_t (child PID in parent, 0 in child, or -1) | Creates a new process by duplicating the calling process. Returns the child's PID to the parent and 0 to the child. The child gets a copy of the parent's memory and file descriptors. |
| 59 | execve | RDI: const char *filename; RSI: char *const argv[]; RDX: char *const envp[] | int (does not return on success; -1 on error) | Executes the program specified by filename, replacing the current process image. argv is the argument array and envp the environment. Only returns on error. Used with fork() to run new programs. |
| 60 | exit | RDI: int status | void (never returns) | Terminates the calling thread (in a single-threaded program, the whole process) with the given status: 0 indicates success, non-zero an error. The kernel closes file descriptors and reports the status to the parent. It does not flush stdio buffers; the C library's exit() does that before making the syscall. |
| 61 | wait4 | RDI: pid_t pid; RSI: int *status; RDX: int options; R10: struct rusage *rusage | pid_t (PID, or -1) | Waits for a child process to change state. Returns the child's PID on success; status receives the exit code. Used by a parent to reap terminated children (preventing zombies). |
| 39 | getpid | (none) | pid_t (process ID) | Returns the process ID (PID) of the calling process. Always succeeds. Useful for logging, creating unique filenames, or process identification. |
| 110 | getppid | (none) | pid_t (parent PID) | Returns the parent process ID of the calling process. If the original parent has exited, the process has been reparented (usually to PID 1, init/systemd, or to a subreaper) and that PID is returned. Always succeeds. |
| 102 | getuid | (none) | uid_t (user ID) | Returns the real user ID of the calling process. Used for permission checks. Always succeeds. |
| 104 | getgid | (none) | gid_t (group ID) | Returns the real group ID of the calling process. Used for permission checks. Always succeeds. |
| 105 | setuid | RDI: uid_t uid | int (0 or -1) | Sets the effective user ID. If the caller is privileged, sets the real, effective, and saved UIDs. Returns 0 on success. Used for dropping privileges and in SUID executables. |
| 106 | setgid | RDI: gid_t gid | int (0 or -1) | Sets the effective group ID. If the caller is privileged, sets the real, effective, and saved GIDs. Returns 0 on success. |
| 62 | kill | RDI: pid_t pid; RSI: int sig | int (0 or -1) | Sends signal sig to process pid. Common signals: SIGTERM(15), SIGKILL(9), SIGUSR1(10). Returns 0 on success. If pid=0, sends the signal to every process in the caller's process group. |
| 13 | rt_sigaction | RDI: int sig; RSI: const struct sigaction *act; RDX: struct sigaction *oldact; R10: size_t sigsetsize | int (0 or -1) | Examines or changes the handler for signal sig. act specifies the new action; oldact receives the old one. Returns 0 on success. The modern signal-handling interface. |
| 34 | pause | (none) | int (always -1) | Suspends the process until a signal is received. Always returns -1 with errno=EINTR after a signal handler returns. Used for waiting on signals. |
| 41 | socket | RDI: int domain; RSI: int type; RDX: int protocol | int (socket fd, or -1) | Creates a communication endpoint. Domain: AF_INET(2) = IPv4, AF_INET6(10) = IPv6. Type: SOCK_STREAM(1) = TCP, SOCK_DGRAM(2) = UDP. Returns a socket file descriptor. |
| 49 | bind | RDI: int sockfd; RSI: const struct sockaddr *addr; RDX: socklen_t addrlen | int (0 or -1) | Assigns an address (IP and port) to socket sockfd. Servers must call it before listen(). Returns 0 on success. Port numbers below 1024 require root. |
| 50 | listen | RDI: int sockfd; RSI: int backlog | int (0 or -1) | Marks the socket as passive (ready to accept connections). backlog sets the maximum queue length for pending connections. Returns 0 on success. |
| 43 | accept | RDI: int sockfd; RSI: struct sockaddr *addr; RDX: socklen_t *addrlen | int (new socket fd, or -1) | Accepts an incoming connection on a listening socket. Blocks until a connection arrives. Returns a new socket for the connection; the original socket keeps listening. |
| 42 | connect | RDI: int sockfd; RSI: const struct sockaddr *addr; RDX: socklen_t addrlen | int (0 or -1) | Initiates a connection to a remote address. For TCP, performs the 3-way handshake. Blocks until the connection is established or times out. Returns 0 on success. |
| 44 | sendto | RDI: int sockfd; RSI: const void *buf; RDX: size_t len; R10: int flags; R8: const struct sockaddr *dest_addr; R9: socklen_t addrlen | ssize_t (bytes sent, or -1) | Sends a message on a socket to a specific address, typically for UDP sockets. Use send() for connected sockets. Returns the number of bytes sent. |
| 45 | recvfrom | RDI: int sockfd; RSI: void *buf; RDX: size_t len; R10: int flags; R8: struct sockaddr *src_addr; R9: socklen_t *addrlen | ssize_t (bytes received, or -1) | Receives a message from a socket and captures the sender's address, typically for UDP. Returns the number of bytes received, or 0 on connection close. |
| 201 | time | RDI: time_t *tloc | time_t (seconds since epoch) | Returns the current time as seconds since the Unix epoch (Jan 1, 1970). If tloc is non-NULL, the value is also stored there. Simple but low precision (1 second). |
| 96 | gettimeofday | RDI: struct timeval *tv; RSI: struct timezone *tz | int (0 or -1) | Gets the current time with microsecond precision: tv receives seconds and microseconds since the epoch. tz is obsolete (pass NULL). Returns 0 on success. |
| 35 | nanosleep | RDI: const struct timespec *req; RSI: struct timespec *rem | int (0 or -1) | Suspends execution for the time in req (seconds + nanoseconds). If interrupted by a signal, the remaining time is stored in rem. Returns 0 on success. |
| 228 | clock_gettime | RDI: clockid_t clk_id; RSI: struct timespec *tp | int (0 or -1) | Retrieves the time from the specified clock. clk_id: CLOCK_REALTIME(0) = wall clock, CLOCK_MONOTONIC(1) = monotonic time. Nanosecond precision. Returns 0 on success. |
| 83 | mkdir | RDI: const char *pathname; RSI: mode_t mode | int (0 or -1) | Creates the directory pathname with permissions mode (e.g., 0755). Returns 0 on success, or -1 if it already exists or permission is denied. |
| 84 | rmdir | RDI: const char *pathname | int (0 or -1) | Removes an empty directory. Returns 0 on success, or -1 if the directory is not empty or doesn't exist. |
| 87 | unlink | RDI: const char *pathname | int (0 or -1) | Deletes the name pathname. Decrements the link count; when it reaches 0 and no process has the file open, the file is deleted. Returns 0 on success. |
| 82 | rename | RDI: const char *oldpath; RSI: const char *newpath | int (0 or -1) | Renames/moves a file from oldpath to newpath. Atomic operation. If newpath exists, it is replaced. Returns 0 on success. |
| 90 | chmod | RDI: const char *pathname; RSI: mode_t mode | int (0 or -1) | Changes file permissions. Mode is octal, like 0644 (rw-r--r--) or 0755 (rwxr-xr-x). Returns 0 on success. Only the owner or root can change permissions. |
| 92 | chown | RDI: const char *pathname; RSI: uid_t owner; RDX: gid_t group | int (0 or -1) | Changes the file owner and/or group; pass -1 to leave one unchanged. Only root can change the owner; the file's owner can change the group to one they belong to. Returns 0 on success. |
| 80 | chdir | RDI: const char *path | int (0 or -1) | Changes the current working directory to path, which affects relative path resolution. Each process has its own working directory. Returns 0 on success. |
| 79 | getcwd | RDI: char *buf; RSI: size_t size | char* (buf, or NULL) | Copies the current working directory into buf (at most size bytes). The glibc wrapper returns buf on success and NULL on error; the raw syscall returns the length of the path, including the terminating NUL. size must be large enough for the full path. |
| 21 | access | RDI: const char *pathname; RSI: int mode | int (0 or -1) | Checks whether the calling process can access the file. Mode: F_OK(0) = exists, R_OK(4) = read, W_OK(2) = write, X_OK(1) = execute. Returns 0 if permitted. |
| 86 | link | RDI: const char *oldpath; RSI: const char *newpath | int (0 or -1) | Creates a hard link named newpath to the existing file oldpath. Both names refer to the same inode; deleting one doesn't affect the other. Returns 0 on success. |
| 88 | symlink | RDI: const char *target; RSI: const char *linkpath | int (0 or -1) | Creates a symbolic link named linkpath containing the string target. Symlinks can point to non-existent files and cross filesystems. Returns 0 on success. |
| 89 | readlink | RDI: const char *pathname; RSI: char *buf; RDX: size_t bufsiz | ssize_t (bytes copied, or -1) | Reads the target of symbolic link pathname into buf. Does not null-terminate. Returns the number of bytes placed in buf, or -1 on error. |
| 10 | mprotect | RDI: void *addr; RSI: size_t len; RDX: int prot | int (0 or -1) | Changes memory protection for the pages starting at addr (which must be page-aligned). Prot: PROT_NONE(0), PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Used for security and exploit mitigation. Returns 0 on success. |
| 28 | madvise | RDI: void *addr; RSI: size_t length; RDX: int advice | int (0 or -1) | Gives the kernel advice about memory usage. Advice: MADV_NORMAL(0), MADV_SEQUENTIAL(2), MADV_DONTNEED(4). Hints for performance optimization. Returns 0 on success. |
| 56 | clone | RDI: unsigned long flags; RSI: void *child_stack; RDX: int *ptid; R10: int *ctid; R8: unsigned long newtls | pid_t (child PID, or -1) | Creates a new process or thread. More flexible than fork(): flags control what is shared (memory, files, etc.). Used to implement threads. Returns the child's PID in the parent and 0 in the child. |
| 231 | exit_group | RDI: int status | void (never returns) | Terminates all threads in the calling process's thread group. Like exit (60), but affects every thread. glibc's exit() and _exit() finish with this syscall, so it is what you usually see at the end of an strace log. Never returns. |
| 157 | prctl | RDI: int option; RSI: unsigned long arg2; RDX: unsigned long arg3; R10: unsigned long arg4; R8: unsigned long arg5 | int (varies) | Process control operations. Options include PR_SET_NAME (set the process name), PR_SET_DUMPABLE, and PR_SET_SECCOMP. Highly versatile; the return value depends on the option. |
| 37 | alarm | RDI: unsigned int seconds | unsigned int (seconds left on the previous alarm) | Arranges for SIGALRM to be delivered after the given number of seconds; pass 0 to cancel. Returns the seconds remaining on any previous alarm. Only one alarm can be scheduled at a time. |
| 72 | fcntl | RDI: unsigned int fd; RSI: unsigned int cmd; RDX: unsigned long arg | int (varies) | Performs various operations on a file descriptor. Cmd: F_DUPFD(0) = duplicate, F_GETFL(3) = get flags, F_SETFL(4) = set flags. The return value depends on cmd. |
| 76 | truncate | RDI: const char *path; RSI: off_t length | int (0 or -1) | Sets the file's size to length. If the file was longer, the extra data is discarded; if it was shorter, it is extended with null bytes. Returns 0 on success. |
| 74 | fsync | RDI: unsigned int fd | int (0 or -1) | Synchronizes the file's in-memory state with the storage device (flushes all modified data and metadata). Returns 0 once the data is safely on disk. Critical for data integrity. |
| 14 | rt_sigprocmask | RDI: int how; RSI: const sigset_t *set; RDX: sigset_t *oldset; R10: size_t sigsetsize | int (0 or -1) | Examines and changes the set of blocked signals. How: SIG_BLOCK(0) = add, SIG_UNBLOCK(1) = remove, SIG_SETMASK(2) = replace. Returns 0 on success. Used to protect critical sections. |
| 131 | sigaltstack | RDI: const stack_t *ss; RSI: stack_t *old_ss | int (0 or -1) | Sets or gets the alternate signal stack, used when the main stack is unusable (for example, after a stack overflow). Returns 0 on success. Important for robust signal handling. |
| 101 | ptrace | RDI: long request; RSI: pid_t pid; RDX: void *addr; R10: void *data | long (varies) | Process tracing and debugging. Lets a tracer control a tracee's execution and read/write its memory and registers. Used by debuggers (GDB) and tracers (strace). Powerful and dangerous; the return value depends on the request. |
| 169 | reboot | RDI: int magic; RSI: int magic2; RDX: int cmd; R10: void *arg | int (does not return on success) | Reboots or halts the system. Requires the CAP_SYS_BOOT capability (normally root). Cmd: LINUX_REBOOT_CMD_RESTART, LINUX_REBOOT_CMD_HALT, LINUX_REBOOT_CMD_POWER_OFF. Returns only on error. |
| 23 | select | RDI: int nfds; RSI: fd_set *readfds; RDX: fd_set *writefds; R10: fd_set *exceptfds; R8: struct timeval *timeout | int (ready fds, or -1) | Monitors multiple file descriptors for I/O readiness and returns when one is ready or the timeout expires. Used for I/O multiplexing. Returns the number of ready fds. |
| 7 | poll | RDI: struct pollfd *fds; RSI: nfds_t nfds; RDX: int timeout | int (ready fds, or -1) | Like select, but with a better API. Monitors file descriptors for events (POLLIN, POLLOUT, POLLERR). Timeout in milliseconds (-1 = infinite). Returns the number of ready fds. |
| 213 | epoll_create | RDI: int size | int (epoll fd, or -1) | Creates an epoll instance for scalable I/O event notification; more efficient than select/poll for many file descriptors. size is ignored (kept for compatibility) but must be greater than zero. Returns an epoll fd. |
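This table covers over 60 commonly used Linux x86-64 system calls, with their arguments and descriptions. The Return Value column follows the C library wrapper convention (-1 on error, with errno set). The raw SYSCALL instruction instead returns the negated error code in RAX, as described below. For the complete list of 300+ syscalls, run man syscalls or check unistd_64.h (/usr/include/asm/unistd_64.h, or /usr/include/x86_64-linux-gnu/asm/unistd_64.h on Debian/Ubuntu). You can also use ausyscall --dump if auditd is installed.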
Complete Syscall Example: Write to Console
Let's write a complete program that uses syscalls to print "Hello World":
section .data
msg: db "Hello World", 0x0a
len: equ $ - msg ; Calculate length
section .text
global _start
_start:
; write syscall to print message
mov rax, 1 ; syscall: write
mov rdi, 1 ; fd: stdout
mov rsi, msg ; buffer
mov rdx, len ; length
syscall
; exit syscall
mov rax, 60 ; syscall: exit
mov rdi, 0 ; status: 0 (success)
syscall
Syscall Return Values & Error Handling
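After a syscall, the kernel returns a value in RAX. A non-negative value means success (for example, a file descriptor or a byte count), and a negative value is an error. Error returns are the negated errno value, in the range -4095 to -1. Common errors: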
| Error Code | Constant | Meaning |
|---|---|---|
| -1 | EPERM | Operation not permitted |
| -2 | ENOENT | No such file or directory |
| -13 | EACCES | Permission denied |
| -14 | EFAULT | Bad address |
mov rax, 2 ; open syscall
mov rdi, filename
mov rsi, 0 ; O_RDONLY
syscall
; Check if error (RAX < 0)
cmp rax, 0
jl error_handler ; Jump if less (negative)
; File opened successfully, RAX contains FD
mov rbx, rax ; Save FD in RBX
jmp continue
error_handler:
; Handle error - RAX contains negative error code
neg rax ; Convert to positive for easier reading
; Now RAX = positive error code
You now understand how user programs communicate with the kernel. This is essential for writing assembly programs and understanding low-level program behavior.
Assembly Language - Complete Tutorial
Assembly language is the lowest-level programming language that directly corresponds to machine instructions. Each assembly instruction performs one CPU operation. To reverse engineer binaries, you must be able to read and understand assembly.
What is Assembly?
Assembly is a symbolic representation of machine code. Instead of writing binary (1s and 0s), you write mnemonics like MOV, ADD, JMP that are more readable. An assembler converts this into machine code.
Assembly: mov rax, 1
Machine code (hex): 48 c7 c0 01 00 00 00
They're the same instruction, just different representations.
Instruction Format
Most assembly instructions follow the format mnemonic destination, source. For example, mov rax, rbx copies the value in RBX into RAX.
Important: In Intel syntax (which we use), the destination comes first, then the source. This is opposite to AT&T syntax.
Core Assembly Instructions
Memory Operations - Loading & Storing
To access memory, use square brackets []:
mov rax, [rbx] ; Load 8 bytes from the address in RBX
mov qword [rax], 100 ; Store 100 as an 8-byte value at the address in RAX (NASM needs the size keyword here)
mov rcx, [rax + 8] ; Load 8 bytes from address (RAX + 8)
mov [rbx - 16], rax ; Store RAX to address (RBX - 16)
Stack Operations - PUSH & POP
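The stack is a Last-In-First-Out (LIFO) data structure that grows toward lower addresses. PUSH and POP manage it: push rax subtracts 8 from RSP and stores RAX at [RSP], and pop rbx loads the value at [RSP] into RBX and adds 8 to RSP.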
Complete Assembly Program Example
Let's combine everything into a complete program:
section .text
global _start
_start:
mov rax, 0 ; Counter = 0
count_loop: ; "loop" is an instruction name in NASM, so use a different label
; (Loop body: a real program might print the number here)
add rax, 1 ; Increment counter
cmp rax, 10 ; Compare with 10
jl count_loop ; Jump if Less: repeat the loop
; Exit
mov rax, 60
mov rdi, 0
syscall
You now understand the fundamental instructions that make up all programs. These instructions are the building blocks of everything in reverse engineering.
In the next sections, we'll learn how to structure assembly code, use assemblers and linkers, debug with GDB, and analyze binaries with professional tools.
Assembly Code Structure
Every assembly program has a specific structure with defined sections for code and data. Understanding this structure is essential for writing and analyzing assembly.
The Three Main Sections
Complete Program Structure
; ============================================
; DATA SECTION - Initialized data
; ============================================
section .data
msg: db "Hello", 0x0a
msg_len: equ $ - msg
; ============================================
; BSS SECTION - Uninitialized data (buffers)
; ============================================
section .bss
buffer: resb 1024
; ============================================
; TEXT SECTION - Code (executable)
; ============================================
section .text
global _start
_start:
; Program entry point
mov rax, 1 ; write syscall
mov rdi, 1 ; fd = stdout
mov rsi, msg ; buffer = msg
mov rdx, msg_len ; count = length
syscall
; Exit cleanly
mov rax, 60
mov rdi, 0
syscall
Global Symbols & Labels
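In assembly, you can define global symbols (exported with the global directive so the linker and other files can see them) and labels (names for addresses, used as call and jump targets):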
global _start ; _start is visible outside this file (the linker needs it)
section .text
_start: ; Program entry (global symbol)
call my_function
mov rax, 60 ; exit syscall
mov rdi, 0 ; status 0
syscall
my_function: ; Ordinary label, not exported with global (visible only in this file)
mov rax, 1
ret
loop_start: ; Another label (a jump target)
add rcx, 1
cmp rcx, 10
jl loop_start ; Jump back to label
Symbol Definition with EQU
Use EQU to define constants:
section .data
msg: db "Hello World", 0x0a
msg_len: equ $ - msg ; $ = current position, msg_len = length
section .text
global _start
BUFFER_SIZE equ 256
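_start:
sub rsp, BUFFER_SIZE ; Allocate BUFFER_SIZE bytes on the stack
add rsp, BUFFER_SIZE ; Free the space again
mov rax, 60 ; exit syscall
mov rdi, 0 ; status 0
syscall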
Key insight: $ - msg calculates the distance between current position and msg, giving the string length.
Now you can organize assembly programs properly with code and data sections.
Assembler, Compiler, Linker & ELF Format
To run an assembly program, you need to convert it from assembly language to machine code. This involves the assembler, linker, and understanding the ELF binary format.
The Compilation Pipeline
Step 1: Assembly → Machine Code (NASM)
NASM (Netwide Assembler) converts assembly source code to machine code object files.
nasm -f elf64 program.asm -o program.o
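What happens: NASM translates each instruction into machine code and writes an ELF64 object file (program.o) containing the code, the data, and a symbol table.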
The object file (.o) is NOT yet executable. It contains machine code but references to external symbols aren't resolved. We need the linker.
Step 2: Linking (LD)
The linker (ld) combines object files and resolves all symbol references to create the final executable.
ld program.o -o program
Complete Assembly Workflow Example
# Step 1: Create assembly file
cat > program.asm << 'EOF'
section .text
global _start
_start:
mov rax, 60
mov rdi, 0
syscall
EOF
# Step 2: Assemble (ASM → Object)
nasm -f elf64 program.asm -o program.o
# Step 3: Link (Object → Executable)
ld program.o -o program
# Step 4: Run
./program
echo $? # Prints the exit code: 0 (success)
Understanding ELF64 Format
ELF (Executable and Linkable Format) is the standard binary format for Linux. All executables, libraries, and object files use this format.
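An ELF file contains an ELF header, a program header table (the segments the loader maps into memory), a section header table (.text, .data, .bss, and so on), and the contents of those sections, such as code, data, and the symbol table.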
$ file program
program: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
Meaning of each part:
| Term | Meaning |
|---|---|
| 64-bit | 64-bit architecture (x86-64) |
| LSB | Little-Endian Byte Order (least significant byte first) |
| executable | Can be directly run as a program |
| x86-64 | Intel x86-64 instruction set |
| statically linked | No shared-library (.so) dependencies; everything the program needs is inside the file |
| not stripped | Symbol table intact (function names visible) |
Viewing ELF Structure with readelf & objdump
You can now write assembly, assemble it with NASM, link with LD, and understand the ELF binary format.
Reverse Engineering with GDB
GDB (GNU Debugger) is the industry-standard debugger for Linux. It lets you execute programs step-by-step, inspect memory and registers, and understand exactly what code is doing.
GDB Installation
# Ubuntu/Debian
sudo apt-get install gdb
# Fedora/RHEL
sudo dnf install gdb
# macOS (Intel Macs only; Homebrew's GDB does not support Apple Silicon)
brew install gdb
Starting GDB
gdb ./program
gdb -q ./program # Quiet mode (no banner)
Essential GDB Commands
Complete GDB Debugging Walkthrough
Let's debug a real binary step-by-step:
$ gdb ./crackme
(gdb) set disassembly-flavor intel ; Use Intel syntax
(gdb) info functions ; List all functions
(gdb) break main ; Set breakpoint at main
Breakpoint 1 at 0x0010149a
(gdb) run secret123 ; Run with password argument
(gdb) disassemble main ; View main function code
(gdb) si ; Step one machine instruction (enters calls)
(gdb) info registers ; Check all registers
(gdb) x/s *(char **)($rsi + 8) ; View argv[1] (at main, RDI = argc and RSI = argv)
(gdb) continue ; Run to next breakpoint
(gdb) quit ; Exit GDB
pwndbg - Enhanced GDB
pwndbg is an awesome GDB plugin that adds powerful reverse engineering features.
git clone https://github.com/pwndbg/pwndbg
cd pwndbg
./setup.sh
pwndbg enhancements:
You can now debug binaries, inspect memory, and understand program execution flow in real-time.
Radare2 - Advanced Binary Analysis
Radare2 is a powerful, open-source framework for reverse engineering and analyzing binaries. It combines static analysis, dynamic analysis, and visualization in one tool.
Radare2 Installation
# Linux
git clone https://github.com/radareorg/radare2
cd radare2
sys/install.sh
# Or via package manager
sudo apt-get install radare2
Launching Radare2
r2 ./binary # Open binary for analysis
r2 -w ./binary # Write mode (can modify binary)
Essential Radare2 Commands
Complete Radare2 Workflow
$ r2 ./crackme
[0x08048400]> aaa ; Analyze everything
[0x08048400]> afl ; List functions
[0x08048400]> iz ; Show strings
[0x08048400]> pdf @ sym.main ; View main function
[0x08048400]> V ; Visual mode to explore
You can now analyze binaries statically with Radare2 and visualize code flow.
Static Analysis - Professional Tools
Static analysis means examining a binary without running it. You analyze code structure, disassembly, and data flow to understand what a program does. Professional tools like Ghidra, IDA Pro, and Binary Ninja dominate this space.
Professional Static Analysis Tools
Command-Line Static Analysis Tools
Essential tools for quick binary inspection and analysis:
Advanced Static Analysis Tools
String Analysis & Pattern Matching
Beyond basic string extraction, pattern analysis helps identify functionality:
strings ./binary | grep -Eo "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" # Find emails
strings ./binary | grep -Eo "https?://[^[:space:]]+" # Find URLs
strings ./binary | grep -Eo "([0-9]{1,3}\.){3}[0-9]{1,3}" # Find IP addresses
strings ./binary | grep -i "key\|password\|secret\|token\|api" # Find credential-related strings
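The grep pipelines above can also be scripted. This Python sketch is a rough stand-in for `strings` (runs of 4 or more printable ASCII bytes, the same default minimum length) followed by simplified versions of the same patterns. The regexes are illustrative and will produce false positives on real binaries.

```python
import re
import sys
from typing import Dict, List

# Simplified versions of the grep patterns above (expect false positives)
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "url": re.compile(r"https?://\S+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "credential": re.compile(r"key|password|secret|token|api", re.IGNORECASE),
}

def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Runs of at least `min_len` printable ASCII bytes, roughly like `strings`."""
    run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in run.finditer(data)]

def classify(strings: List[str]) -> Dict[str, List[str]]:
    """Group strings by which pattern(s) they match."""
    hits: Dict[str, List[str]] = {name: [] for name in PATTERNS}
    for s in strings:
        for name, rx in PATTERNS.items():
            if rx.search(s):
                hits[name].append(s)
    return hits

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for name, found in classify(extract_strings(f.read())).items():
            for s in found:
                print(f"[{name}] {s}")
```

Run it as `python3 script.py ./binary`; each hit is printed with the pattern that matched it.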
Why strings are useful:
Control Flow Analysis
Understanding how code branches and jumps helps identify:
All professional tools (Ghidra, IDA, Binary Ninja) show control flow graphs that visualize this.
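To make the idea concrete, here is a toy Python sketch of how a control flow graph is derived: split a linear instruction listing into basic blocks at jump targets and after jumps/returns, then connect each block to its successors. The `Insn` record and the small mnemonic set are simplifications for illustration, and the listing is assumed to be sorted by address; real tools decode raw bytes and also handle indirect jumps and switch tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Simplified instruction record: address, mnemonic, and jump target (if known)
@dataclass
class Insn:
    addr: int
    mnemonic: str
    target: Optional[int] = None

COND_JUMPS = {"je", "jne", "jz", "jnz", "jl", "jle", "jg", "jge", "ja", "jae", "jb", "jbe"}

def ends_block(insn: Insn) -> bool:
    return insn.mnemonic in COND_JUMPS or insn.mnemonic in ("jmp", "ret")

def build_cfg(insns: List[Insn]) -> Dict[int, List[int]]:
    """Return {block start address: [successor block start addresses]}."""
    # 1. Leaders: first instruction, every jump target, every instruction after a jump/ret
    leaders = {insns[0].addr}
    for i, insn in enumerate(insns):
        if ends_block(insn):
            if insn.target is not None:
                leaders.add(insn.target)
            if i + 1 < len(insns):
                leaders.add(insns[i + 1].addr)
    # 2. Group consecutive instructions into blocks, starting a new block at each leader
    blocks: Dict[int, List[Insn]] = {}
    current = insns[0].addr
    for insn in insns:
        if insn.addr in leaders:
            current = insn.addr
            blocks[current] = []
        blocks[current].append(insn)
    # 3. Edges: jump targets, plus fall-through to the next block unless jmp/ret
    starts = sorted(blocks)
    cfg: Dict[int, List[int]] = {}
    for idx, start in enumerate(starts):
        last = blocks[start][-1]
        nxt = starts[idx + 1] if idx + 1 < len(starts) else None
        succ = []
        if (last.mnemonic == "jmp" or last.mnemonic in COND_JUMPS) and last.target is not None:
            succ.append(last.target)
        if last.mnemonic not in ("jmp", "ret") and nxt is not None:
            succ.append(nxt)
        cfg[start] = succ
    return cfg
```

A block ending in a conditional jump gets two successors (the jump target and the fall-through), which is the diamond shape graph views draw for an if/else.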
You can now use professional tools to analyze binaries without running them.
Dynamic Analysis - Runtime Behavior
Dynamic analysis means running the binary in a controlled environment while monitoring its behavior. Watch system calls, library calls, memory modifications, and network traffic to understand what code actually does.
System Call Tracing with strace
strace intercepts and logs all system calls made by a process.
strace ./program # Trace all syscalls
strace -e trace=open,openat,read ./program # Trace specific syscalls (modern glibc opens files with openat)
strace -o trace.txt ./program # Save to file
strace -c ./program # Summary (count syscalls)
strace -p 1234 # Attach to running process
What strace reveals:
open("/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0:root:/root:/bin/bash\n", 32) = 32
write(1, "User found!\n", 12) = 12
exit_group(0) = ?
Meaning: Program opened /etc/passwd, read 32 bytes, wrote "User found!" to stdout (fd 1), then exited with status 0.
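strace output is line-oriented, so a saved trace (`strace -o trace.txt`) is easy to post-process. A minimal Python sketch, assuming one complete call per line (no `-f` PID prefixes and no `<unfinished ...>` splits), that lists which files were opened successfully:

```python
import re
from typing import List, Tuple

# name(args) = return value [errno text]; assumes one complete call per line
STRACE_LINE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+)")

def parse_strace(lines) -> List[Tuple[str, str, str]]:
    """Turn trace lines into (syscall, raw argument text, return value) tuples."""
    calls = []
    for line in lines:
        m = STRACE_LINE.match(line.strip())
        if m:
            calls.append((m["name"], m["args"], m["ret"]))
    return calls

def opened_files(calls) -> List[Tuple[str, int]]:
    """(path, fd) for every successful open/openat call; failures return -1."""
    result = []
    for name, args, ret in calls:
        if name in ("open", "openat") and ret.isdigit():
            path = re.search(r'"([^"]*)"', args)
            if path:
                result.append((path.group(1), int(ret)))
    return result
```

For example, `opened_files(parse_strace(open("trace.txt")))` lists the files the program touched, with the file descriptor each one received.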
Library Call Tracing with ltrace
ltrace traces library function calls (libc, libcrypto, etc.).
ltrace ./program # Trace library calls
ltrace -c ./program # Summary (count function calls)
ltrace -o trace.txt ./program # Save to file
ltrace -e strcmp ./program # Trace specific functions
Useful library functions to trace:
strcmp("admin123", "password123") = -1
puts("Incorrect password") = 19
exit(1)
Insight: Program compared input with "password123". Now you know the password!
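The same trick can be automated over a saved ltrace log: find `strcmp`/`strncmp` calls where one argument is the input you typed and report the other one. This is a sketch that assumes each line starts with the function name and that strings contain no escaped quotes; ltrace also truncates long strings (raise the limit with `-s`), so a reported value may be incomplete.

```python
import re
from typing import List

# strcmp("a", "b") or strncmp("a", "b", n) as printed by ltrace
CMP_CALL = re.compile(r'^(?:strcmp|strncmp)\("(?P<a>[^"]*)", "(?P<b>[^"]*)"')

def candidate_secrets(ltrace_lines, user_input: str) -> List[str]:
    """Return the other operand of every comparison that involves the typed input."""
    found = []
    for line in ltrace_lines:
        m = CMP_CALL.match(line.strip())
        if not m:
            continue
        a, b = m["a"], m["b"]
        if a == user_input and b != user_input:
            found.append(b)
        elif b == user_input and a != user_input:
            found.append(a)
    return found
```

Run the program once with a recognizable input, then pass that input and the log lines to `candidate_secrets`.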
Combined strace + ltrace
Use together for complete picture:
ltrace -S ./program # Library calls and syscalls in one trace
strace -e trace=file ./program # Focus on file operations
Advanced Dynamic Analysis - Frida
Frida is a powerful instrumentation framework. Inject code into running processes to hook functions and modify behavior in real-time.
# Install
pip install frida frida-tools
# List processes
frida-ps
# Attach to process
frida -p 1234
# Spawn a program under Frida
frida -f ./program
# Spawn and trace calls to a function (here strcmp)
frida-trace -f ./program -i strcmp
Frida capabilities:
You can now trace system calls, monitor library calls, and use advanced instrumentation.
Analyzing Stripped Binaries
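A stripped binary has its symbol table and debug information removed: function names, variable names, and type information are gone. Names of dynamically linked library functions such as strcmp remain, because the dynamic linker needs them. This makes reverse engineering harder but not impossible.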
Identifying Stripped Binaries
file ./program
Output examples:
not stripped - has symbols
stripped - symbols removed
file -i ./program # MIME type info
readelf -S ./program # Show sections (no .symtab if stripped)
nm ./program # Prints "no symbols" if stripped
objdump -t ./program # Symbol table (empty if stripped)
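The check behind `nm`'s "no symbols" message is simple: a stripped file has no `.symtab` section. This minimal Python sketch reads the section header table of a 64-bit little-endian ELF and looks for that name; other ELF classes, byte orders, and malformed files are out of scope.

```python
import struct
import sys
from typing import List

def section_names(data: bytes) -> List[str]:
    """Return the section names of a 64-bit little-endian ELF file."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    if e_shnum == 0:
        return []  # no section header table at all (some tools remove it entirely)
    headers = []
    for i in range(e_shnum):
        # Elf64_Shdr starts with sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        sh_name, _type, _flags, _addr, sh_offset, sh_size = struct.unpack_from(
            "<IIQQQQ", data, e_shoff + i * e_shentsize)
        headers.append((sh_name, sh_offset, sh_size))
    # Section names are stored in the string table selected by e_shstrndx
    _, str_off, str_size = headers[e_shstrndx]
    strtab = data[str_off:str_off + str_size]
    names = []
    for sh_name, _, _ in headers:
        end = strtab.index(b"\x00", sh_name)
        names.append(strtab[sh_name:end].decode("ascii", errors="replace"))
    return names

def is_stripped(data: bytes) -> bool:
    """Stripped binaries lack .symtab (the .dynsym table for imports remains)."""
    return ".symtab" not in section_names(data)

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print("stripped" if is_stripped(f.read()) else "not stripped")
```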
Techniques for Stripped Binaries
Dynamic Analysis of Stripped Binaries
Use runtime tracing to understand behavior without symbols:
# Trace syscalls to understand behavior
strace -o syscalls.txt ./program
# Trace library calls
ltrace -o libcalls.txt ./program
# Use GDB to set breakpoints and inspect registers
gdb ./program
(gdb) break *0x401000
(gdb) run
(gdb) info registers # See actual values
Practical Example - Analyzing Stripped Binary
# 1. Identify if stripped
$ file ./crackme
crackme: ELF 64-bit, stripped
# 2. Extract strings - look for clues
$ strings ./crackme | grep -i password
Incorrect password
Access granted
# 3. Open in Ghidra
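- Run auto-analysis (Analysis → Auto Analyze) with the Function ID analyzer enabled
- Known library functions (mainly statically linked ones) get their names back
- Search → For Strings → find "password" references
- Double-click the string, then follow its XREF to the code that uses it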
# 4. Analyze the function using string as anchor
- Look at function prologue/epilogue
- Identify comparisons and jumps
- Look for password check logic
# 5. Use dynamic analysis if stuck
$ ltrace ./crackme
strcmp("myinput", "secretpass") = -37
puts("Incorrect password") = 19
Now you know the password!
You can identify functions, recover symbols, and analyze behavior even without debug information.
Binary Patching - Code Modification
Binary patching means modifying a binary's machine code to change its behavior. It is commonly used to bypass password checks, remove license verification, or alter logic flow.
Why Patch Binaries?
Three Patching Approaches
Real-World Patching Example
# Binary: crackme - asks for password
$ ./crackme
Enter password: test
Incorrect!
# Step 1: Open in Ghidra, find password check
0x401234: mov rax, [rip + 0x2dc6] ; Load input
0x40123b: mov rbx, [rip + 0x2dc5] ; Load expected password
0x401242: cmp rax, rbx ; Compare
0x401245: jne 0x401260 ; Jump to fail if not equal
0x401247: call print_success ; Otherwise print success
# Step 2: We want to skip the jne (jump to fail)
# Option A: Replace jne with NOPs
jne opcode at 0x401245: 75 19 (2 bytes)
Replace with: 90 90 (2 NOPs)
# Step 3: Use hex editor to patch
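Go to the file offset of 0x401245 (a virtual address, not a file offset: offset = address - segment vaddr + segment file offset, e.g. 0x1245 if the code segment maps file offset 0x1000 at 0x401000)
Find bytes: 75 19
Replace with: 90 90
Save as crackme_patched (chmod +x crackme_patched if needed)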
# Step 4: Test
$ ./crackme_patched
Enter password: anything
Success!
# Password check bypassed! Any input works now
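0x401245 is a virtual address, so the hex-editor step needs the matching file offset. The program headers describe the mapping: for the `PT_LOAD` segment containing the address, offset = address - p_vaddr + p_offset. This Python sketch (64-bit little-endian ELF only) does the conversion and applies the patch only if the original bytes match, which guards against patching the wrong place.

```python
import struct

PT_LOAD = 1

def vaddr_to_offset(elf: bytes, vaddr: int) -> int:
    """Map a virtual address to a file offset via the PT_LOAD program headers."""
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        # Elf64_Phdr starts with p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", elf, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_vaddr <= vaddr < p_vaddr + p_filesz:
            return vaddr - p_vaddr + p_offset
    raise ValueError(f"{vaddr:#x} is not backed by file data in any PT_LOAD segment")

def patch(elf: bytes, vaddr: int, expected: bytes, new: bytes) -> bytes:
    """Return a copy of the file with `expected` at `vaddr` replaced by `new`."""
    if len(expected) != len(new):
        raise ValueError("replacement must have the same length as the original bytes")
    off = vaddr_to_offset(elf, vaddr)
    actual = elf[off:off + len(expected)]
    if actual != expected:
        raise ValueError(f"expected {expected.hex()} at offset {off:#x}, found {actual.hex()}")
    return elf[:off] + new + elf[off + len(new):]
```

For the example above, `patch(data, 0x401245, bytes.fromhex("7519"), bytes.fromhex("9090"))` returns the bytes to write to `crackme_patched`.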
Common Patching Targets
| What to Patch | Pattern | Replacement |
|---|---|---|
| Password check | cmp; jne failure | Replace jne with NOPs |
| License validation | call validate_license; jne fail | NOP out the jne |
| Anti-debug | call is_debugged; jne exit | Make function return 0 |
| Trial expiration | cmp rax, expiration_date | Change expiration_date value |
| Error message | lea rdi, [rip + error_str] | Change string pointer/content |
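NOPing a `jne` is one option; inverting the condition or making the jump unconditional are others. For 2-byte short conditional jumps (opcodes 0x70 to 0x7F followed by a signed 8-bit displacement), every rewrite keeps the 2-byte length, as this Python sketch shows. Note that inverting `jne` to `je` makes wrong passwords pass and the correct one fail.

```python
def short_jump_target(addr: int, insn: bytes) -> int:
    """Target = address of the next instruction + signed 8-bit displacement."""
    disp = int.from_bytes(insn[1:2], "little", signed=True)
    return addr + 2 + disp

def rewrite_short_jcc(insn: bytes, mode: str) -> bytes:
    """Rewrite a 2-byte short conditional jump (opcode 0x70-0x7F + rel8)."""
    if len(insn) != 2 or not 0x70 <= insn[0] <= 0x7F:
        raise ValueError("expected a 2-byte short conditional jump")
    if mode == "nop":     # never jump: fall through to the next instruction
        return b"\x90\x90"
    if mode == "invert":  # flip the condition (je<->jne, jl<->jge, ...) by toggling bit 0
        return bytes([insn[0] ^ 1, insn[1]])
    if mode == "always":  # unconditional short jmp (0xEB) with the same displacement
        return bytes([0xEB, insn[1]])
    raise ValueError(f"unknown mode: {mode}")
```

For the crackme, `short_jump_target(0x401245, b"\x75\x19")` is 0x401260, the failure path, and `rewrite_short_jcc(b"\x75\x19", "nop")` gives the `90 90` patch.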
You can modify binaries to change behavior, bypass checks, and test modifications.
Anti-Reversing Techniques & Bypasses
Software developers implement anti-reversing techniques to protect intellectual property and prevent cracking. Understanding these techniques helps you bypass them and analyze protected binaries.
Common Anti-Reversing Techniques
ASLR - Address Space Layout Randomization
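ASLR randomizes the base addresses of the stack, heap, shared libraries, and (for PIE binaries) the executable itself on every run. It is an OS exploit mitigation rather than deliberate anti-reversing, but it makes both exploitation and analysis harder because addresses change between runs.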
# Check ASLR status
cat /proc/sys/kernel/randomize_va_space
0 = disabled, 1 = conservative, 2 = full
# Disable ASLR (requires root)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
# Or run single binary without ASLR
setarch $(uname -m) -R ./program
# In GDB
(gdb) set disable-randomization on # Already GDB's default
Stack Canaries
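Stack canaries detect stack buffer overflows by placing a random value (the canary) between local buffers and the saved return address. The function checks it before returning and aborts if it has changed.
checksec ./program
Output shows: "Canary found" or "No canary found"
readelf -s ./program | grep __stack_chk_fail
A __stack_chk_fail entry means the binary calls the canary check, i.e. it was compiled with a stack protector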
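A toy Python model of the mechanism: a 16-byte buffer, an 8-byte canary, and a saved return address, laid out in that order. The sizes are illustrative, and the real check is code emitted by the compiler (e.g. GCC's `-fstack-protector`). An overflow that reaches the return address must first overwrite the canary, which is why exploits typically need to leak the canary value first.

```python
import secrets
import struct

BUF_SIZE = 16  # illustrative buffer size

def make_frame(return_address: int):
    """Toy frame in memory order: [buffer][8-byte canary][8-byte saved return address]."""
    canary = secrets.token_bytes(8)  # random value; real programs pick one per process
    frame = bytearray(BUF_SIZE) + canary + struct.pack("<Q", return_address)
    return frame, canary

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Like gets/strcpy: copies without checking the buffer size (kept inside the toy frame)."""
    data = data[:len(frame)]
    frame[:len(data)] = data

def epilogue(frame: bytearray, canary: bytes) -> int:
    """Check the canary before 'returning', as compiler-inserted code does."""
    if bytes(frame[BUF_SIZE:BUF_SIZE + 8]) != canary:
        # Real binaries call __stack_chk_fail here, which aborts the process
        raise RuntimeError("*** stack smashing detected ***")
    (ret,) = struct.unpack_from("<Q", frame, BUF_SIZE + 8)
    return ret
```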
DEP/NX - Data Execution Prevention
DEP/NX marks data pages such as the stack and heap as non-executable, which prevents injected shellcode from running there.
checksec ./program
Output shows: NX enabled/disabled
readelf -lW ./program | grep GNU_STACK
RWE = executable stack (no NX), RW = NX enabled
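`checksec` is convenient, but the NX check is easy to reproduce from readelf's wide output (`readelf -lW`, where `-W` keeps each program header on a single line). This Python sketch inspects only the GNU_STACK flags; a complete checker would also look at the other segments.

```python
import subprocess
from typing import Optional

def stack_is_executable(readelf_output: str) -> Optional[bool]:
    """Parse `readelf -lW` output: True if GNU_STACK has the E flag, None if absent."""
    for line in readelf_output.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "GNU_STACK":
            # Flags are the tokens made only of R, W, E (readelf prints e.g. "RW" or "RWE")
            flags = "".join(t for t in tokens[1:] if set(t) <= set("RWE"))
            return "E" in flags
    return None  # no GNU_STACK program header found

def check_nx(path: str) -> Optional[bool]:
    """Run readelf and report whether the stack is non-executable (True = NX enabled)."""
    out = subprocess.run(["readelf", "-lW", path], capture_output=True, text=True, check=True).stdout
    executable = stack_is_executable(out)
    return None if executable is None else not executable
```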
You understand how protections work and how to bypass them.
angr - Automated Symbolic Execution
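angr is a powerful binary analysis framework that uses symbolic execution to find inputs that reach specific code paths. Instead of analyzing every branch by hand, you give angr a target address (and addresses to avoid); it explores the paths leading there and solves the collected constraints to produce a concrete input.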
What is Symbolic Execution?
Instead of concrete values, variables are treated as symbolic, representing all possible values. Branches create constraints.
NORMAL EXECUTION:
input = 5
if input > 10:
    print("big")
else:
    print("small")    ← This path taken
SYMBOLIC EXECUTION:
input = X (symbolic variable)
if input > 10:
    → Explores this path (constraint: X > 10)
    print("big")
if input ≤ 10:
    → Explores this path too (constraint: X ≤ 10)
    print("small")
Result: angr finds values satisfying each constraint!
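The idea can be shown without angr. This toy Python sketch treats each path of the example as a list of constraints on one unsigned 32-bit symbolic integer X, narrows an interval, and returns one witness value per path. It only handles comparisons against constants; angr instead hands arbitrary bit-vector constraints to an SMT solver (Z3, through its claripy library).

```python
from typing import List, Optional, Tuple

def solve(constraints: List[Tuple[str, int]], lo: int = 0, hi: int = 2**32 - 1) -> Optional[int]:
    """Find a value of X satisfying every (operator, constant) constraint, or None."""
    for op, c in constraints:
        if op == ">":
            lo = max(lo, c + 1)
        elif op == ">=":
            lo = max(lo, c)
        elif op == "<":
            hi = min(hi, c - 1)
        elif op == "<=":
            hi = min(hi, c)
        elif op == "==":
            lo, hi = max(lo, c), min(hi, c)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return lo if lo <= hi else None  # empty interval = infeasible path

# Path conditions of the example program above
PATHS = {
    'print("big")': [(">", 10)],
    'print("small")': [("<=", 10)],
}

if __name__ == "__main__":
    for path, condition in PATHS.items():
        print(path, "is reached with X =", solve(condition))
```

Running it reports X = 11 for the "big" path and X = 0 for the "small" path; a contradictory path such as X > 10 and X < 5 returns None.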
Installation & Setup
pip install angr
pip install "angr[all]" # Install with optional dependencies
Basic angr Workflow
import angr
import claripy
# Load binary
project = angr.Project("./crackme")
# Symbolic 16-byte input, served to the program as stdin
password = claripy.BVS("password", 16 * 8)
initial_state = project.factory.entry_state(
    stdin=angr.SimFileStream(name="stdin", content=password, has_end=True)
)
# Create simulation manager
simgr = project.factory.simgr(initial_state)
# Addresses of the success and failure code (placeholders: use the ones from your disassembler)
success_addr = 0x401234
failure_addr = 0x401256
# Explore until a state reaches success; drop states that reach failure
simgr.explore(
    find=success_addr,
    avoid=failure_addr
)
# Get the solution
if simgr.found:
    solution_state = simgr.found[0]
    solution = solution_state.posix.dumps(0)  # 0 = stdin file descriptor
    print(f"Password found: {solution.decode(errors='replace')}")
else:
    print("No solution found")
Key angr Concepts
Real-World Example - CTF Challenge
import angr
import claripy
# Load the binary
binary_path = "./crackme"
project = angr.Project(binary_path, auto_load_libs=False)
# Create symbolic argv[1] (16 bytes)
password = claripy.BVS("password", 16 * 8)  # 16 bytes * 8 bits
# Start at the entry point with argv = [binary_path, password]
# (assumes the binary reads argv[1] as the password)
state = project.factory.entry_state(args=[binary_path, password])
# Optional: keep every byte printable so the answer can be typed
for byte in password.chop(8):
    state.solver.add(byte >= 0x20, byte <= 0x7e)
# Create simulation manager
simgr = project.factory.simgr(state)
# Explore: find "Correct!" message at 0x401300, avoid "Incorrect!" at 0x401350
# (placeholder addresses: use the ones from your disassembler)
simgr.explore(find=0x401300, avoid=[0x401350])
# Check results
if simgr.found:
    solution_state = simgr.found[0]
    password_value = solution_state.solver.eval(password, cast_to=bytes)
    print(f"[+] Password found: {password_value}")
else:
    print("[-] No solution found")
    print(f"[!] {len(simgr.avoid)} state(s) reached the avoided address")
Advanced Techniques
When angr Excels vs Struggles
| Best For | Struggles With |
|---|---|
| Finding password/key (simple comparison) | Complex floating-point math |
| Reaching specific code path | Cryptographic operations (very slow) |
| Constraint solving (small inputs) | Large state spaces (too many branches) |
| CTF challenges (designed for automation) | Real-world complex binaries |
You can now automate binary analysis and solve constraints to find inputs reaching target code paths.
Assembly Simulator & Practice
Learn assembly by writing and executing code in real-time. This interactive simulator lets you write assembly instructions, step through execution, and watch registers and memory change.