Welcome to CYB3RFY Reverse Engineering

I'm cyb3rfy — creator of the CYB3RFY YouTube channel. I publish CTF walkthroughs and TryHackMe rooms. I'm an intermediate CTF player — I often rank high in competitions, but for a long time I struggled with reverse engineering challenges. I didn't know where to start, what tools to use, or how to read assembly.

Because of that, I lost many potential top ranks.

After studying for weeks — assembly, registers, syscalls, GDB, IDA, Ghidra, radare2, strings, and more — I created notes that helped me finally understand everything.

This website is built from those notes — a complete beginner guide so anyone can start reverse engineering the right way.

What You'll Learn

This comprehensive guide covers everything from CPU architecture and assembly language to professional reverse engineering tools. Whether you're preparing for CTF competitions, bug bounty hunting, or building security expertise, you'll find detailed explanations, real code examples, practical exercises, and command references.

How to Use This Guide

Start with CPU & Registers to understand the foundation
Progress through Assembly Language basics
Learn debugging with GDB and Radare2
Master static and dynamic analysis techniques
Practice with real binaries and challenges
Use the searchable command database for quick lookups
ℹ️ Search Feature

Use the search bar at the top to instantly find any assembly instruction, GDB command, radare2 command, syscall, or reverse engineering concept. Every result includes detailed explanations and examples.

CPU & Registers

The CPU (Central Processing Unit) is the brain of your computer. To reverse engineer binaries, you must understand how CPUs work at a fundamental level. This chapter explains CPU architecture, registers, and why they matter for security.

What is a CPU?

A CPU executes instructions in sequence. It reads data from memory, processes it, and writes results back. The two main components of a CPU are:

Control Unit (CU)
The Control Unit directs traffic in the CPU. It reads instructions from memory, decodes them, tells other parts what to do, and manages the flow of data. Think of it as the conductor of an orchestra.
Execution Unit (EU)
The Execution Unit actually performs calculations and operations. It executes arithmetic (ADD, SUB), logical operations (AND, OR), comparisons (CMP), and memory operations. It's the worker that does the real work.

What Are Registers?

Registers are tiny, ultra-fast storage units inside the CPU. They hold the data the CPU is actively using. Unlike RAM, which is measured in gigabytes, each general-purpose register on x86-64 holds just 64 bits, and the CPU can access it far faster than memory.

Why Registers Matter for Reverse Engineering

When you debug a binary with GDB or examine assembly code, you're watching data move through registers. Understanding registers is essential because:

Function arguments are passed through registers
Return values are stored in registers
Local variables are often kept in registers
System calls use specific registers
Many vulnerabilities involve register manipulation

Register Hierarchy (x86-64 Architecture)

On modern 64-bit x86 CPUs, registers come in different sizes. The main "General Purpose Registers" are:

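64-bit (QWORD) | 32-bit (DWORD) | 16-bit (WORD) | 8-bit HIGH | 8-bit LOW
RAX | EAX | AX | AH | AL
RBX | EBX | BX | BH | BL
RCX | ECX | CX | CH | CL
RDX | EDX | DX | DH | DL
RSI | ESI | SI | - | SIL
RDI | EDI | DI | - | DIL
RSP | ESP | SP | - | SPL
RBP | EBP | BP | - | BPL
R8 | R8D | R8W | - | R8B
R9 | R9D | R9W | - | R9B
R10 | R10D | R10W | - | R10B
R11 | R11D | R11W | - | R11B
R12 | R12D | R12W | - | R12B
R13 | R13D | R13W | - | R13B
R14 | R14D | R14W | - | R14B
R15 | R15D | R15W | - | R15B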

Writing to a 32-bit register (like EAX) automatically zeros the upper 32 bits of the full 64-bit register (RAX). Writing to a 16-bit or 8-bit register (AX, AL, AH) leaves the remaining bits unchanged. This matters in reverse engineering because it determines which data is preserved.
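The sub-register behavior can be modeled in a few lines of Python. This is only a sketch: the helper names (`write_reg32`, `write_reg16`, `write_reg8_low`) are made up for illustration, and the starting value of RAX is arbitrary.

```python
MASK64 = (1 << 64) - 1

def write_reg32(rax: int, value: int) -> int:
    """Model `mov eax, value`: the upper 32 bits of RAX become zero."""
    return value & 0xFFFFFFFF

def write_reg16(rax: int, value: int) -> int:
    """Model `mov ax, value`: bits 16..63 of RAX are preserved."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)

def write_reg8_low(rax: int, value: int) -> int:
    """Model `mov al, value`: bits 8..63 of RAX are preserved."""
    return (rax & ~0xFF & MASK64) | (value & 0xFF)

rax = 0x1122334455667788                     # arbitrary starting value
print(hex(write_reg32(rax, 0xAABBCCDD)))     # 0xaabbccdd
print(hex(write_reg16(rax, 0xBEEF)))         # 0x112233445566beef
print(hex(write_reg8_low(rax, 0xFF)))        # 0x11223344556677ff
```

Only the 32-bit write discards the old upper half; the 16-bit and 8-bit writes merge the new value into the existing register contents.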

Key x86-64 Registers

Special Registers - Flags Register (RFLAGS)

The RFLAGS register (EFLAGS in 32-bit mode) contains condition flags: single bits that indicate the status of the last operation.

Flag | Name | Meaning | Set When
ZF | Zero Flag | Result is zero | Result of last operation = 0
CF | Carry Flag | Unsigned overflow | Addition carries out or subtraction borrows
SF | Sign Flag | Result is negative | MSB (most significant bit) = 1
OF | Overflow Flag | Signed overflow | Signed arithmetic overflow occurs
PF | Parity Flag | Even parity | Low byte of result has an even number of 1 bits
AF | Adjust Flag | BCD carry | Carry out of the lower nibble (bit 3)

Why flags matter: Conditional jumps (JE, JNE, JZ, JG, etc.) check these flags to decide whether to jump. Understanding flags is essential for reading assembly code.

Example: Flags in Action
    cmp rax, rbx        ; Compare RAX with RBX (subtract, discard result)
                        ; Sets ZF=1 if RAX==RBX, ZF=0 if different
    je success          ; Jump if Equal (checks ZF flag)
    mov rax, 0          ; If not equal, RAX = 0
    jmp end
success:
    mov rax, 1          ; If equal, RAX = 1
end:
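To see how a comparison turns into flag bits, here is a rough Python model of what CMP computes for 64-bit operands. The function name `cmp_flags` is invented for this sketch, and only ZF, CF, SF and OF are modeled.

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63

def cmp_flags(a: int, b: int) -> dict:
    """Model `cmp a, b`: compute a - b, keep only the flags."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    return {
        "ZF": int(result == 0),                    # operands were equal
        "CF": int(a < b),                          # unsigned borrow
        "SF": int(bool(result & SIGN_BIT)),        # top bit of the result
        # Signed overflow: operands differ in sign and the result's sign differs from a's
        "OF": int(bool((a ^ b) & (a ^ result) & SIGN_BIT)),
    }

print(cmp_flags(5, 5))   # ZF=1, so JE/JZ would jump
print(cmp_flags(1, 2))   # CF=1 and SF=1, so JB and JL would jump
```

JE and JZ look only at ZF, while a signed jump such as JL is taken when SF and OF differ.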

x86-64 Calling Convention

When a function is called, arguments are passed through specific registers in a specific order. This is the calling convention. Understanding it is crucial for debugging.

📌 x86-64 System V ABI (Linux/Unix)

This is the calling convention used on 64-bit Linux systems:

Argument # Register
1st argument RDI
2nd argument RSI
3rd argument RDX
4th argument RCX
5th argument R8
6th argument R9
Return value RAX

Any arguments beyond the 6th are passed on the stack.

Example: Function Call with 4 Arguments
mov rdi, 10         ; arg1 = 10
mov rsi, 20         ; arg2 = 20
mov rdx, 30         ; arg3 = 30
mov rcx, 40         ; arg4 = 40
call my_function    ; Call function
                    ; RAX now contains the return value
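When reading a call site in a disassembler, it helps to map argument positions to registers mentally. A small Python sketch of that mapping follows. It covers integer and pointer arguments only, and `place_arguments` is a hypothetical helper name.

```python
INT_ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def place_arguments(args):
    """Return (registers, stack) for a list of integer/pointer arguments.

    The stack list is ordered by increasing address: its first element
    is the 7th argument, which sits closest to the return address.
    """
    registers = dict(zip(INT_ARG_REGISTERS, args))
    stack = list(args[len(INT_ARG_REGISTERS):])
    return registers, stack

regs, stack = place_arguments([10, 20, 30, 40])
print(regs)    # {'rdi': 10, 'rsi': 20, 'rdx': 30, 'rcx': 40}
print(stack)   # []
```

With eight arguments, the first six land in registers and the last two end up in the stack list.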

Putting It Together

Now you understand:

CPUs have two main components: Control Unit and Execution Unit
Registers are ultra-fast CPU storage
Different registers have different purposes
The Flags register controls conditional jumps
Function arguments are passed through specific registers
RIP points to the next instruction
RSP and RBP manage the call stack
✓ You Now Understand CPU Architecture!

These fundamentals are essential. In the next section, you'll learn how to use this knowledge to understand system calls.

System Calls (Syscalls)

A system call is a request from a user program to the kernel to perform a privileged operation. When a program needs to read a file, write to the screen, or allocate memory, it can't do it directly — it must ask the kernel through a syscall.

User Space vs Kernel Space

Modern operating systems use a layered privilege model:

USER SPACE (Ring 3 - Unprivileged)
Applications: Firefox, Chrome
User Programs: ./your_binary
Libraries: libc, libssl
Capabilities:
✓ Read/write own memory
✓ Perform calculations
✗ Cannot access hardware directly
✗ Cannot access other process memory
✗ Cannot perform privileged operations

↓ SYSCALL INSTRUCTION (Context Switch) ↓
CPU transitions from Ring 3 → Ring 0
RAX = syscall number, RDI/RSI/RDX = arguments

KERNEL SPACE (Ring 0 - Privileged)
File System: open, read, write
Memory Mgmt: mmap, brk
Process Mgmt: fork, execve
Device Drivers: Hardware I/O
Full Capabilities:
✓ Access all memory
✓ Execute privileged instructions
✓ Control hardware devices
✓ Manage processes and resources
✓ Handle interrupts and exceptions

↑ RETURN ↑
Result in RAX, returns to user space

Why This Separation Matters

The user/kernel space separation is fundamental to operating system security. User programs cannot directly access hardware or other processes' memory. All privileged operations must go through the kernel via system calls, where permissions are checked and validated.

How System Calls Work

When your program executes a SYSCALL instruction:

1. Program sets up registers with syscall number and arguments
2. SYSCALL instruction transitions to kernel mode
3. Kernel executes the requested operation
4. Control returns to user program with result in RAX
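The same request-and-return cycle is visible from a high-level language. In the sketch below, each Python `os` call is a thin wrapper that ends up issuing the corresponding kernel request (pipe, write, read, close). The message text is arbitrary.

```python
import os

message = b"hello from user space\n"

read_fd, write_fd = os.pipe()            # kernel creates a pipe and returns two descriptors
written = os.write(write_fd, message)    # write: the kernel copies the bytes into the pipe
os.close(write_fd)                       # close: release the write end
data = os.read(read_fd, 1024)            # read: the kernel copies the bytes back out
os.close(read_fd)

print(written, data)
```

The program never touches the pipe buffer directly. It only hands descriptors and buffers to the kernel and receives a result back, which is the pattern the four steps describe.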

x86-64 Linux Syscall ABI

On 64-bit Linux, syscalls follow a specific convention. Let's break it down:

📌 Syscall Number & Arguments

Every syscall has a number. You put that number in RAX, then set up arguments in specific registers:

Register Purpose
RAX Syscall number (which syscall to call)
RDI 1st argument
RSI 2nd argument
RDX 3rd argument
R10 4th argument (note: RCX for functions, but R10 for syscalls)
R8 5th argument
R9 6th argument

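Return value: After the syscall, RAX contains the result (or an error code if negative). The SYSCALL instruction itself overwrites RCX and R11, which is why the 4th argument travels in R10 instead of RCX.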

Common Syscalls Explained

Complete System Call Database

Here's a comprehensive reference of Linux x86-64 system calls with arguments, return values, and detailed descriptions:

RAX | Name | Arguments (RDI, RSI, RDX, R10, R8, R9) | Return Value | Description
0 | read | RDI: unsigned int fd; RSI: char *buf; RDX: size_t count | ssize_t (bytes read or -1) | Reads up to count bytes from file descriptor fd into buffer buf. Returns number of bytes read, 0 on EOF, or -1 on error. Commonly used with fd=0 (stdin) to read user input.
1 | write | RDI: unsigned int fd; RSI: const char *buf; RDX: size_t count | ssize_t (bytes written or -1) | Writes up to count bytes from buffer buf to file descriptor fd. Returns number of bytes written or -1 on error. Use fd=1 (stdout) for console output, fd=2 (stderr) for error messages.
2 | open | RDI: const char *filename; RSI: int flags; RDX: mode_t mode | int (file descriptor or -1) | Opens file specified by filename. Flags: O_RDONLY(0), O_WRONLY(1), O_RDWR(2), O_CREAT(64), O_APPEND(1024). Mode specifies permissions (e.g., 0644). Returns file descriptor on success.
3 | close | RDI: unsigned int fd | int (0 or -1) | Closes file descriptor fd, freeing the resource. Returns 0 on success, -1 on error. Always close files when done to prevent resource leaks.
4 | stat | RDI: const char *filename; RSI: struct stat *statbuf | int (0 or -1) | Retrieves file information (size, permissions, timestamps) for filename and stores it in statbuf. Returns 0 on success. Follows symlinks; lstat (syscall 6) does not.
5 | fstat | RDI: unsigned int fd; RSI: struct stat *statbuf | int (0 or -1) | Like stat, but operates on an already-opened file descriptor instead of a filename. Useful when you already have the file open.
8 | lseek | RDI: unsigned int fd; RSI: off_t offset; RDX: unsigned int whence | off_t (new position or -1) | Repositions file offset of fd. Whence: SEEK_SET(0)=absolute, SEEK_CUR(1)=relative, SEEK_END(2)=from end. Returns new offset from beginning of file.
9 | mmap | RDI: void *addr; RSI: size_t length; RDX: int prot; R10: int flags; R8: int fd; R9: off_t offset | void* (address or MAP_FAILED) | Maps file or device into memory. Prot: PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Flags: MAP_PRIVATE(2), MAP_ANONYMOUS(32). Returns pointer to mapped area. Critical for memory management.
11 | munmap | RDI: void *addr; RSI: size_t length | int (0 or -1) | Unmaps a previously mapped memory region starting at addr with length. Returns 0 on success. Always unmap when done to free memory.
12 | brk | RDI: void *addr | void* (new break or -1) | Changes program break (end of data segment) to addr. Used by malloc() internally. Returns new program break on success. Rarely used directly in modern programs (prefer mmap).
16 | ioctl | RDI: unsigned int fd; RSI: unsigned int cmd; RDX: unsigned long arg | int (varies) | Device-specific input/output control. Command and arguments vary by device. Used for operations that don't fit read/write model (terminal settings, disk operations, etc.).
22 | pipe | RDI: int pipefd[2] | int (0 or -1) | Creates a pipe (unidirectional data channel). pipefd[0] is read end, pipefd[1] is write end. Returns 0 on success. Used for inter-process communication.
32 | dup | RDI: unsigned int fildes | int (new fd or -1) | Duplicates file descriptor fildes using the lowest available fd number. Both fds refer to same file. Returns new fd on success.
33 | dup2 | RDI: unsigned int oldfd; RSI: unsigned int newfd | int (new fd or -1) | Duplicates oldfd to newfd. If newfd is open, it's closed first. Commonly used to redirect stdin/stdout/stderr in child processes.
57 | fork | (none) | pid_t (child PID or 0 in child) | Creates new process by duplicating calling process. Returns child PID to parent, returns 0 in child process. Child gets copy of parent's memory and file descriptors.
59 | execve | RDI: const char *filename; RSI: char *const argv[]; RDX: char *const envp[] | int (never returns on success) | Executes program specified by filename, replacing current process image. argv is argument array, envp is environment. Only returns on error. Used with fork() to run new programs.
60 | exit | RDI: int status | void (never returns) | Terminates the calling thread (the whole process in a single-threaded program) with exit status. 0 indicates success, non-zero indicates error. The kernel closes file descriptors and returns status to parent; stdio buffers are flushed by the C library's exit(), not by the raw syscall.
61 | wait4 | RDI: pid_t pid; RSI: int *status; RDX: int options; R10: struct rusage *rusage | pid_t (PID or -1) | Waits for child process to change state. Returns child PID on success. Status contains exit code. Used by parent to collect terminated children (prevent zombies).
39 | getpid | (none) | pid_t (process ID) | Returns process ID (PID) of calling process. Always succeeds. Useful for logging, creating unique filenames, or process identification.
110 | getppid | (none) | pid_t (parent PID) | Returns parent process ID of calling process. If parent has exited, returns 1 (init/systemd). Always succeeds.
102 | getuid | (none) | uid_t (user ID) | Returns real user ID of calling process. Used for permission checks. Always succeeds.
104 | getgid | (none) | gid_t (group ID) | Returns real group ID of calling process. Used for permission checks. Always succeeds.
105 | setuid | RDI: uid_t uid | int (0 or -1) | Sets effective user ID. If privileged, sets real, effective, and saved UIDs. Returns 0 on success. Used for privilege dropping or SUID executables.
106 | setgid | RDI: gid_t gid | int (0 or -1) | Sets effective group ID. If privileged, sets real, effective, and saved GIDs. Returns 0 on success.
62 | kill | RDI: pid_t pid; RSI: int sig | int (0 or -1) | Sends signal sig to process pid. Common signals: SIGTERM(15), SIGKILL(9), SIGUSR1(10). Returns 0 on success. If pid=0, sends to process group.
13 | rt_sigaction | RDI: int sig; RSI: const struct sigaction *act; RDX: struct sigaction *oldact; R10: size_t sigsetsize | int (0 or -1) | Examines or changes signal handler for signal sig. act specifies new action, oldact receives old action. Returns 0 on success. Modern signal handling interface.
34 | pause | (none) | int (always -1) | Suspends process until signal is received. Always returns -1 with errno=EINTR after signal handler returns. Used for waiting on signals.
41 | socket | RDI: int domain; RSI: int type; RDX: int protocol | int (socket fd or -1) | Creates communication endpoint. Domain: AF_INET(2)=IPv4, AF_INET6(10)=IPv6. Type: SOCK_STREAM(1)=TCP, SOCK_DGRAM(2)=UDP. Returns socket file descriptor.
49 | bind | RDI: int sockfd; RSI: const struct sockaddr *addr; RDX: socklen_t addrlen | int (0 or -1) | Assigns address (IP and port) to socket sockfd. Must be called before listen() for servers. Returns 0 on success. Port numbers below 1024 require root.
50 | listen | RDI: int sockfd; RSI: int backlog | int (0 or -1) | Marks socket as passive (ready to accept connections). Backlog specifies maximum queue length for pending connections. Returns 0 on success.
43 | accept | RDI: int sockfd; RSI: struct sockaddr *addr; RDX: socklen_t *addrlen | int (new socket fd or -1) | Accepts incoming connection on listening socket. Blocks until connection arrives. Returns new socket for the connection, original socket continues listening.
42 | connect | RDI: int sockfd; RSI: const struct sockaddr *addr; RDX: socklen_t addrlen | int (0 or -1) | Initiates connection to remote address. For TCP, performs 3-way handshake. Blocks until connection established or timeout. Returns 0 on success.
44 | sendto | RDI: int sockfd; RSI: const void *buf; RDX: size_t len; R10: int flags; R8: const struct sockaddr *dest_addr; R9: socklen_t addrlen | ssize_t (bytes sent or -1) | Sends message on socket to specific address. For UDP sockets. Use send() for connected sockets. Returns number of bytes sent.
45 | recvfrom | RDI: int sockfd; RSI: void *buf; RDX: size_t len; R10: int flags; R8: struct sockaddr *src_addr; R9: socklen_t *addrlen | ssize_t (bytes received or -1) | Receives message from socket and captures sender's address. For UDP. Returns number of bytes received, 0 on connection close.
201 | time | RDI: time_t *tloc | time_t (seconds since epoch) | Returns current time as seconds since Unix epoch (Jan 1, 1970). If tloc is non-NULL, also stores there. Simple but low precision (1 second).
96 | gettimeofday | RDI: struct timeval *tv; RSI: struct timezone *tz | int (0 or -1) | Gets current time with microsecond precision. tv contains seconds and microseconds since epoch. tz is obsolete (pass NULL). Returns 0 on success.
35 | nanosleep | RDI: const struct timespec *req; RSI: struct timespec *rem | int (0 or -1) | Suspends execution for time specified in req (seconds + nanoseconds). If interrupted by signal, remaining time stored in rem. Returns 0 on success.
228 | clock_gettime | RDI: clockid_t clk_id; RSI: struct timespec *tp | int (0 or -1) | Retrieves time from specified clock. clk_id: CLOCK_REALTIME(0)=wall clock, CLOCK_MONOTONIC(1)=monotonic time. Nanosecond precision. Returns 0 on success.
83 | mkdir | RDI: const char *pathname; RSI: mode_t mode | int (0 or -1) | Creates directory specified by pathname. Mode specifies permissions (e.g., 0755). Returns 0 on success, -1 if already exists or permission denied.
84 | rmdir | RDI: const char *pathname | int (0 or -1) | Removes empty directory. Returns -1 if directory not empty or doesn't exist. Returns 0 on success.
87 | unlink | RDI: const char *pathname | int (0 or -1) | Deletes file specified by pathname. Decrements link count; if 0 and no process has file open, file is deleted. Returns 0 on success.
82 | rename | RDI: const char *oldpath; RSI: const char *newpath | int (0 or -1) | Renames/moves file from oldpath to newpath. Atomic operation. If newpath exists, it's replaced. Returns 0 on success.
90 | chmod | RDI: const char *pathname; RSI: mode_t mode | int (0 or -1) | Changes file permissions. Mode is octal like 0644 (rw-r--r--) or 0755 (rwxr-xr-x). Returns 0 on success. Only owner or root can change permissions.
92 | chown | RDI: const char *pathname; RSI: uid_t owner; RDX: gid_t group | int (0 or -1) | Changes file owner and/or group. Pass -1 to leave unchanged. Only root can change owner. File owner can change group to one they belong to. Returns 0 on success.
80 | chdir | RDI: const char *path | int (0 or -1) | Changes current working directory to path. Affects relative path resolution. Returns 0 on success. Each process has its own working directory.
79 | getcwd | RDI: char *buf; RSI: size_t size | int (path length or -1) | Copies current working directory into buf (max size bytes). The raw syscall returns the length of the path including the terminating null byte; the C library wrapper returns buf on success, NULL on error. Size must be large enough for full path.
21 | access | RDI: const char *pathname; RSI: int mode | int (0 or -1) | Checks whether calling process can access file. Mode: F_OK(0)=exists, R_OK(4)=read, W_OK(2)=write, X_OK(1)=execute. Returns 0 if permitted.
86 | link | RDI: const char *oldpath; RSI: const char *newpath | int (0 or -1) | Creates hard link named newpath to existing file oldpath. Both names refer to same inode. Deleting one doesn't affect the other. Returns 0 on success.
88 | symlink | RDI: const char *target; RSI: const char *linkpath | int (0 or -1) | Creates symbolic link named linkpath containing string target. Symlink can point to non-existent files and cross filesystems. Returns 0 on success.
89 | readlink | RDI: const char *pathname; RSI: char *buf; RDX: size_t bufsiz | ssize_t (bytes copied or -1) | Reads value of symbolic link pathname into buf. Does not null-terminate. Returns number of bytes placed in buf, -1 on error.
10 | mprotect | RDI: void *addr; RSI: size_t len; RDX: int prot | int (0 or -1) | Changes memory protection for page(s) starting at addr. Prot: PROT_NONE(0), PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Used for security and exploit mitigation. Returns 0 on success.
28 | madvise | RDI: void *addr; RSI: size_t length; RDX: int advice | int (0 or -1) | Gives kernel advice about memory usage. Advice: MADV_NORMAL(0), MADV_SEQUENTIAL(2), MADV_DONTNEED(4). Hints for performance optimization. Returns 0 on success.
56 | clone | RDI: unsigned long flags; RSI: void *child_stack; RDX: int *ptid; R10: int *ctid; R8: unsigned long newtls | pid_t (child PID or -1) | Creates new process/thread. More flexible than fork(). Flags control what is shared (memory, files, etc.). Used to implement threads. Returns child PID in parent, 0 in child.
231 | exit_group | RDI: int status | void (never returns) | Terminates all threads in calling process's thread group. Like exit(), but affects all threads. Used by exit() in threaded programs. Never returns.
157 | prctl | RDI: int option; RSI: unsigned long arg2; RDX: unsigned long arg3; R10: unsigned long arg4; R8: unsigned long arg5 | int (varies) | Process control operations. Options include PR_SET_NAME (set process name), PR_SET_DUMPABLE, PR_SET_SECCOMP. Highly versatile. Return value depends on option.
37 | alarm | RDI: unsigned int seconds | unsigned int (previous alarm) | Arranges for SIGALRM to be delivered in seconds. Pass 0 to cancel. Returns number of seconds remaining from previous alarm. Only one alarm can be scheduled.
72 | fcntl | RDI: unsigned int fd; RSI: unsigned int cmd; RDX: unsigned long arg | int (varies) | Performs various operations on file descriptor. Cmd: F_GETFL(3)=get flags, F_SETFL(4)=set flags, F_DUPFD(0)=duplicate. Return value depends on cmd.
76 | truncate | RDI: const char *path; RSI: off_t length | int (0 or -1) | Truncates file to specified length. If the file was longer, extra data is discarded. If it was shorter, it is extended with null bytes. Returns 0 on success.
74 | fsync | RDI: unsigned int fd | int (0 or -1) | Synchronizes file's in-memory state with storage device (flushes all modified data and metadata). Returns 0 when data is safely on disk. Critical for data integrity.
14 | rt_sigprocmask | RDI: int how; RSI: const sigset_t *set; RDX: sigset_t *oldset; R10: size_t sigsetsize | int (0 or -1) | Examines and changes blocked signals. How: SIG_BLOCK(0)=add, SIG_UNBLOCK(1)=remove, SIG_SETMASK(2)=replace. Returns 0 on success. Used to protect critical sections.
131 | sigaltstack | RDI: const stack_t *ss; RSI: stack_t *old_ss | int (0 or -1) | Sets or gets alternate signal stack. Used when main stack is compromised (stack overflow). Returns 0 on success. Important for robust signal handling.
101 | ptrace | RDI: long request; RSI: pid_t pid; RDX: void *addr; R10: void *data | long (varies) | Process trace and debug. Allows parent to control child execution, read/write memory and registers. Used by debuggers (GDB). Powerful and dangerous. Return value varies by request.
169 | reboot | RDI: int magic; RSI: int magic2; RDX: int cmd; R10: void *arg | int (never on success) | Reboots or halts system. Requires CAP_SYS_BOOT capability (root). Cmd: LINUX_REBOOT_CMD_RESTART, HALT, POWER_OFF. For emergency use. Returns only on error.
23 | select | RDI: int nfds; RSI: fd_set *readfds; RDX: fd_set *writefds; R10: fd_set *exceptfds; R8: struct timeval *timeout | int (ready fds or -1) | Monitors multiple file descriptors for I/O readiness. Returns when fd is ready or timeout. Used for non-blocking I/O multiplexing. Returns number of ready fds.
7 | poll | RDI: struct pollfd *fds; RSI: nfds_t nfds; RDX: int timeout | int (ready fds or -1) | Like select but better API. Monitors file descriptors for events (POLLIN, POLLOUT, POLLERR). Timeout in milliseconds (-1=infinite). Returns number of ready fds.
213 | epoll_create | RDI: int size | int (epoll fd or -1) | Creates epoll instance for scalable I/O event notification. More efficient than select/poll for many file descriptors. Size is ignored (kept for compatibility). Returns epoll fd.

Complete Reference

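This table lists commonly used Linux x86-64 system calls with full argument details and descriptions. The "-1" return values follow the C library wrapper convention; the raw SYSCALL instruction reports an error as a negative code in RAX. For a complete list of all 300+ syscalls, run man syscalls or check /usr/include/asm/unistd_64.h (the exact path varies by distribution). You can also use ausyscall --dump if auditd is installed.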
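When you stop at a SYSCALL instruction in a debugger, a table like this lets you turn a register snapshot into a readable call. The sketch below does that for a handful of entries. The `SYSCALLS` excerpt, the `describe_syscall` helper, and the register values (including the buffer address 0x402000) are all illustrative assumptions.

```python
SYSCALL_ARG_REGISTERS = ["rdi", "rsi", "rdx", "r10", "r8", "r9"]

# Small excerpt of the syscall table: number -> (name, argument names)
SYSCALLS = {
    0: ("read", ["fd", "buf", "count"]),
    1: ("write", ["fd", "buf", "count"]),
    2: ("open", ["filename", "flags", "mode"]),
    9: ("mmap", ["addr", "length", "prot", "flags", "fd", "offset"]),
    60: ("exit", ["status"]),
}

def describe_syscall(regs: dict) -> str:
    """Render a register snapshot taken at a SYSCALL instruction."""
    number = regs["rax"]
    if number not in SYSCALLS:
        return f"syscall_{number}(?)"
    name, arg_names = SYSCALLS[number]
    parts = [f"{arg}={regs[reg]:#x}"
             for arg, reg in zip(arg_names, SYSCALL_ARG_REGISTERS)]
    return f"{name}({', '.join(parts)})"

# Hypothetical snapshot: values are made up for the example
snapshot = {"rax": 1, "rdi": 1, "rsi": 0x402000, "rdx": 12,
            "r10": 0, "r8": 0, "r9": 0}
print(describe_syscall(snapshot))   # write(fd=0x1, buf=0x402000, count=0xc)
```

Only as many registers as the syscall has arguments are read, so stale values in the unused registers are ignored.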

Complete Syscall Example: Write to Console

Let's write a complete program that uses syscalls to print "Hello World":

Assembly - Complete "Hello World" using syscalls
section .data
    msg: db "Hello World", 0x0a
    len: equ $ - msg        ; Calculate length

section .text
    global _start

_start:
    ; write syscall to print message
    mov rax, 1              ; syscall: write
    mov rdi, 1              ; fd: stdout
    mov rsi, msg            ; buffer
    mov rdx, len            ; length
    syscall

    ; exit syscall
    mov rax, 60             ; syscall: exit
    mov rdi, 0              ; status: 0 (success)
    syscall

Syscall Return Values & Error Handling

After a syscall, the kernel returns a value in RAX:

If RAX ≥ 0: Success, RAX contains the result
If RAX < 0: Error occurred, RAX contains negative error code

Error codes are typically in the range -1 to -4095. Common errors:

Error Code Constant Meaning
-1 EPERM Operation not permitted
-2 ENOENT No such file or directory
-13 EACCES Permission denied
-14 EFAULT Bad address
Example: Error Handling with Syscalls
    mov rax, 2              ; open syscall
    mov rdi, filename
    mov rsi, 0              ; O_RDONLY
    syscall

    ; Check if error (RAX < 0)
    cmp rax, 0
    jl error_handler        ; Jump if less (negative)

    ; File opened successfully, RAX contains FD
    mov rbx, rax            ; Save FD in RBX
    jmp continue

error_handler:
    ; Handle error - RAX contains negative error code
    neg rax                 ; Convert to positive for easier reading
    ; Now RAX = positive error code
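The same check can be written as a small decoder for a raw RAX value copied out of a debugger. This is a sketch: `decode_syscall_result` is a hypothetical helper, and it relies on Python's standard errno module to map numbers to names.

```python
import errno
import os

def decode_syscall_result(rax: int):
    """Interpret a raw 64-bit RAX value after a syscall."""
    # Reinterpret the unsigned 64-bit register value as a signed integer
    signed = rax - (1 << 64) if rax >= (1 << 63) else rax
    if -4095 <= signed <= -1:
        code = -signed
        return ("error", errno.errorcode.get(code, "UNKNOWN"), os.strerror(code))
    return ("ok", rax)

print(decode_syscall_result(3))                    # a file descriptor
print(decode_syscall_result(0xFFFFFFFFFFFFFFFE))   # -2 as seen in a register dump
```

Restricting the error test to the range -4095 to -1 keeps large pointer-like return values, such as those from mmap, from being mistaken for errors.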
✓ System Calls Mastered!

You now understand how user programs communicate with the kernel. This is essential for writing assembly programs and understanding low-level program behavior.

Assembly Language - Complete Tutorial

Assembly language is the lowest-level programming language that directly corresponds to machine instructions. Each assembly instruction performs one CPU operation. To reverse engineer binaries, you must be able to read and understand assembly.

What is Assembly?

Assembly is a symbolic representation of machine code. Instead of writing binary (1s and 0s), you write mnemonics like MOV, ADD, JMP that are more readable. An assembler converts this into machine code.

Assembly vs Machine Code

Assembly: mov rax, 1

Machine code (hex): 48 c7 c0 01 00 00 00

They're the same instruction, just different representations.
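To make the correspondence concrete, the sketch below decodes exactly this 7-byte encoding by hand: a REX.W prefix (0x48), opcode 0xC7, a ModRM byte selecting the register, and a 32-bit little-endian immediate. It handles only this one instruction form, and `decode_mov_imm32` is a made-up name.

```python
import struct

# Register numbers 0..7 as encoded in the ModRM byte
REGISTERS_64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_mov_imm32(code: bytes) -> str:
    """Decode only the REX.W + C7 /0 form: mov r64, imm32."""
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0xC7:
        raise ValueError("not the REX.W C7 encoding handled by this sketch")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11 or reg != 0:
        raise ValueError("only the register-direct /0 form is handled")
    (imm,) = struct.unpack("<i", code[3:7])   # signed 32-bit, little-endian
    return f"mov {REGISTERS_64[rm]}, {imm}"

print(decode_mov_imm32(bytes.fromhex("48 c7 c0 01 00 00 00")))   # mov rax, 1
```

A real disassembler applies the same idea across thousands of encodings; this only shows that the bytes and the mnemonic carry the same information.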

Instruction Format

Most assembly instructions follow this format:

OPCODE destination, source

Example: MOV RAX, RBX
- OPCODE: MOV
- destination: RAX (where to put the result)
- source: RBX (where to get the data)

Important: In Intel syntax (which we use), the destination comes first, then the source. This is opposite to AT&T syntax.

Core Assembly Instructions

Memory Operations - Loading & Storing

To access memory, use square brackets []:

Memory Operations
mov rax, [rbx]          ; Load 8 bytes from address in RBX
mov qword [rax], 100    ; Store 100 as 8 bytes at address in RAX (an immediate needs an explicit size)
mov rcx, [rax + 8]      ; Load from address (RAX + 8)
mov [rbx - 16], rax     ; Store to address (RBX - 16)
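A bracketed operand is simply "compute an address, then read or write bytes there". The Python sketch below models that with a bytearray standing in for memory. The 64-byte region and the addresses are hypothetical, and only 8-byte (qword) accesses are modeled.

```python
import struct

memory = bytearray(64)    # hypothetical memory region, addresses 0..63

def store_qword(address: int, value: int) -> None:
    """Model `mov qword [address], value` (8 bytes, little-endian)."""
    struct.pack_into("<Q", memory, address, value & 0xFFFFFFFFFFFFFFFF)

def load_qword(address: int) -> int:
    """Model `mov reg, [address]` for a 64-bit register."""
    return struct.unpack_from("<Q", memory, address)[0]

rbx = 16
store_qword(rbx, 100)                       # mov qword [rbx], 100
store_qword(rbx + 8, 0x1122334455667788)    # mov [rbx + 8], rax
rcx = load_qword(rbx + 8)                   # mov rcx, [rbx + 8]

print(load_qword(rbx), hex(rcx))            # 100 0x1122334455667788
print(memory[rbx + 8 : rbx + 16].hex())     # 8877665544332211
```

The last line shows the little-endian byte order you will see in a memory dump: the least significant byte comes first.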

Stack Operations - PUSH & POP

The stack is a Last-In-First-Out (LIFO) data structure. PUSH and POP manage it:
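As a rough model, PUSH subtracts 8 from RSP and stores a value at the new top, while POP loads from the top and adds 8 back. The Python sketch below imitates that with a small byte buffer. The `Stack` class and its 64-byte size are assumptions made for illustration.

```python
import struct

class Stack:
    """Toy model of the x86-64 stack: grows toward lower addresses."""

    def __init__(self, size: int = 64):
        self.memory = bytearray(size)
        self.rsp = size                 # empty stack: RSP at the top of the region

    def push(self, value: int) -> None:
        self.rsp -= 8                   # push: decrement RSP first...
        struct.pack_into("<Q", self.memory, self.rsp, value & 0xFFFFFFFFFFFFFFFF)

    def pop(self) -> int:
        value = struct.unpack_from("<Q", self.memory, self.rsp)[0]
        self.rsp += 8                   # ...pop: read, then increment RSP
        return value

stack = Stack()
stack.push(1)
stack.push(2)
stack.push(3)
print(stack.rsp)                              # 40
print(stack.pop(), stack.pop(), stack.pop())  # 3 2 1
```

The values come back in reverse order, which is the LIFO behavior, and RSP ends where it started once every push has a matching pop.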

Complete Assembly Program Example

Let's combine everything into a complete program:

Complete Assembly - Print numbers using loop
section .text
    global _start

_start:
    mov rax, 0          ; Counter = 0

count_loop:             ; Renamed from "loop", which is also an instruction mnemonic
    ; Print number (simplified)
    add rax, 1          ; Increment counter
    cmp rax, 10         ; Compare with 10
    jl count_loop       ; Jump if Less - repeat loop

    ; Exit
    mov rax, 60
    mov rdi, 0
    syscall
✓ Assembly Language Basics Mastered!

You now understand the fundamental instructions that make up all programs. These instructions are the building blocks of everything in reverse engineering.

ℹ️ Continue Learning

In the next sections, we'll learn how to structure assembly code, use assemblers and linkers, debug with GDB, and analyze binaries with professional tools.

Assembly Code Structure

Every assembly program has a specific structure with defined sections for code and data. Understanding this structure is essential for writing and analyzing assembly.

The Three Main Sections

Complete Program Structure

Complete Assembly Program Structure
; ============================================
; DATA SECTION - Initialized data
; ============================================
section .data
    msg: db "Hello", 0x0a
    msg_len: equ $ - msg

; ============================================
; BSS SECTION - Uninitialized data (buffers)
; ============================================
section .bss
    buffer: resb 1024

; ============================================
; TEXT SECTION - Code (executable)
; ============================================
section .text
    global _start

_start:
    ; Program entry point
    mov rax, 1          ; write syscall
    mov rdi, 1          ; fd = stdout
    mov rsi, msg        ; buffer = msg
    mov rdx, msg_len    ; count = length
    syscall

    ; Exit cleanly
    mov rax, 60
    mov rdi, 0
    syscall

Global Symbols & Labels

In assembly, you can define:

Labels: Mark positions in code (used for jumps)
Global symbols: Mark entry points that external code can jump to
Labels and Symbols
global _start           ; _start is accessible from outside

section .text
_start:                 ; Program entry (global symbol)
    call my_function
    mov rax, 60
    syscall

my_function:            ; Local label (not global)
    mov rax, 1
    ret

loop_start:             ; Another label
    add rcx, 1
    cmp rcx, 10
    jl loop_start       ; Jump back to label

Symbol Definition with EQU

Use EQU to define constants:

Using EQU for Constants
section .data
    msg: db "Hello World", 0x0a
    msg_len: equ $ - msg    ; $ = current position, msg_len = length

section .text
    global _start

BUFFER_SIZE equ 256

_start:
    sub rsp, BUFFER_SIZE    ; Allocate space on stack

Key insight: $ - msg calculates the distance between current position and msg, giving the string length.

✓ Program Structure Understood!

Now you can organize assembly programs properly with code and data sections.

Assembler, Compiler, Linker & ELF Format

To run an assembly program, you need to convert it from assembly language to machine code. This involves the assembler, linker, and understanding the ELF binary format.

The Compilation Pipeline

SOURCE CODE
program.asm (Assembly) or program.c (C/C++)

↓ NASM / GCC

PREPROCESSOR (C/C++ only)
• Expands #include directives
• Processes #define macros
• Handles conditional compilation (#ifdef)
Output: program.i (preprocessed source)

↓ Compiler (C) or Assembler

ASSEMBLER (NASM)
• Converts assembly to machine code
• Generates symbol table
• Creates relocation entries
Command: nasm -f elf64 program.asm -o program.o
Output: program.o (object file)

↓ Linker (LD)

LINKER
• Combines multiple object files
• Resolves external symbol references
• Assigns final memory addresses
• Links against libraries (libc, etc.)
• Sets entry point (_start)
Command: ld program.o -o program
Output: program (executable ELF binary)

↓ ./program

EXECUTABLE BINARY
• ELF64 executable format
• Contains machine code
• Ready to execute
Run: ./program

Quick Reference:
Assembly: nasm -f elf64 prog.asm -o prog.o && ld prog.o -o prog
C: gcc prog.c -o prog (all steps combined)
C (manual): gcc -E prog.c > prog.i → gcc -S prog.i → gcc -c prog.s → ld ...

Step 1: Assembly → Machine Code (NASM)

NASM (Netwide Assembler) converts assembly source code to machine code object files.

NASM Command - Assemble
nasm -f elf64 program.asm -o program.o
NASM Options Explained
-f elf64: Output format = ELF64 (64-bit Linux executable format)
program.asm: Input assembly file
-o program.o: Output object file

What happens:

NASM parses your .asm file
Converts each instruction to machine code bytes
Creates relocatable code (with placeholder addresses)
Produces program.o (object file)
⚠️ Important

The object file (.o) is NOT yet executable. It contains machine code but references to external symbols aren't resolved. We need the linker.

Step 2: Linking (LD)

The linker (ld) combines object files and resolves all symbol references to create the final executable.

Linker Command - Link Object File
ld program.o -o program
What the Linker Does
Combines all object files into one
Resolves symbol references (labels, external functions)
Assigns final memory addresses
Sets up entry point (_start symbol)
Creates the final executable binary

Complete Assembly Workflow Example

Complete Workflow - Shell Commands
# Step 1: Create assembly file
cat > program.asm << 'EOF'
section .text
global _start
_start:
    mov rax, 60
    mov rdi, 0
    syscall
EOF

# Step 2: Assemble (ASM → Object)
nasm -f elf64 program.asm -o program.o

# Step 3: Link (Object → Executable)
ld program.o -o program

# Step 4: Run
./program
echo $?    # Exit code: 0 (success)

Understanding ELF64 Format

ELF (Executable and Linkable Format) is the standard binary format for Linux. All executables, libraries, and object files use this format.

ELF Structure

An ELF file contains:

ELF Header: Magic number (identifies as ELF), architecture, entry point
Program Headers: How to load the file into memory
Sections: .text (code), .data (data), .bss (uninit data), .symtab (symbols), etc.
Section Headers: Describe each section's location and properties
View ELF Header with `file`
$ file program
program: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

Meaning of each part:

Term | Meaning
64-bit | 64-bit architecture (x86-64)
LSB | Little-endian byte order (least significant byte first)
executable | Can be directly run as a program
x86-64 | x86-64 (AMD64/Intel 64) instruction set
statically linked | All library code is included in the binary (no external .so dependencies)
not stripped | Symbol table intact (function names visible)
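
Most of what `file` reports comes straight from the first bytes of the ELF header. The sketch below decodes those fields by hand; `parse_elf_header` is a hypothetical helper name, the sample header is hand-built for illustration rather than taken from a real binary, and only the ELF64 case is handled.

```python
import struct

def parse_elf_header(data: bytes) -> dict:
    """Decode the ELF64 header fields that `file` summarizes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic)")
    ei_class, ei_data = data[4], data[5]
    if ei_class != 2:
        raise ValueError("this sketch only handles ELF64")
    endian = "<" if ei_data == 1 else ">"   # 1 = little-endian (LSB)
    # e_type, e_machine, e_version, e_entry follow the 16-byte e_ident block
    e_type, e_machine, _e_version, e_entry = struct.unpack_from(endian + "HHIQ", data, 16)
    types = {1: "relocatable", 2: "executable", 3: "shared object"}
    machines = {0x3E: "x86-64"}
    return {
        "class": "64-bit",
        "byte_order": "LSB" if ei_data == 1 else "MSB",
        "type": types.get(e_type, hex(e_type)),
        "machine": machines.get(e_machine, hex(e_machine)),
        "entry": e_entry,
    }

# Hand-built sample: magic, class=2 (64-bit), data=1 (LSB), version=1, padding,
# then type=2 (executable), machine=0x3E (x86-64), version=1, entry=0x401000.
sample = (b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
          + struct.pack("<HHIQ", 2, 0x3E, 1, 0x401000))
info = parse_elf_header(sample)
print(info)
```

On a real file you would pass the first 64 bytes instead, for example `parse_elf_header(open("program", "rb").read(64))`.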

Viewing ELF Structure with readelf & objdump

✓ Build Pipeline Mastered!

You can now write assembly, assemble it with NASM, link with LD, and understand the ELF binary format.

Reverse Engineering with GDB

GDB (GNU Debugger) is the industry-standard debugger for Linux. It lets you execute programs step-by-step, inspect memory and registers, and understand exactly what code is doing.

GDB Installation

Install GDB
# Ubuntu/Debian
sudo apt-get install gdb

# Fedora/RHEL
sudo dnf install gdb

# macOS
brew install gdb

Starting GDB

Launch GDB with a Binary
gdb ./program
gdb -q ./program    # Quiet mode (no banner)

Essential GDB Commands

Complete GDB Debugging Walkthrough

Let's debug a real binary step-by-step:

Complete GDB Session Example
$ gdb ./crackme
(gdb) set disassembly-flavor intel    # Use Intel syntax
(gdb) info functions                  # List all functions
(gdb) break main                      # Set breakpoint at main
Breakpoint 1 at 0x0010149a
(gdb) run secret123                   # Run with password argument
(gdb) disassemble main                # View main function code
(gdb) si                              # Step one instruction
(gdb) info registers                  # Check all registers
(gdb) x/s *(char **)($rsi + 8)        # View argv[1] (at main, RDI = argc and RSI = argv)
(gdb) continue                        # Run to next breakpoint
(gdb) quit                            # Exit GDB

pwndbg - Enhanced GDB

pwndbg is an awesome GDB plugin that adds powerful reverse engineering features.

Install pwndbg
git clone https://github.com/pwndbg/pwndbg
cd pwndbg
./setup.sh

pwndbg enhancements:

Better disassembly display (syntax highlighting)
Visual stack and register display
Memory map view
Additional commands: nearpc, telescope, vmmap
✓ GDB Mastery Achieved!

You can now debug binaries, inspect memory, and understand program execution flow in real-time.

Radare2 - Advanced Binary Analysis

Radare2 is a powerful, open-source framework for reverse engineering and analyzing binaries. It combines static analysis, dynamic analysis, and visualization in one tool.

Radare2 Installation

Install Radare2
# Linux
git clone https://github.com/radareorg/radare2
cd radare2
sys/install.sh

# Or via package manager
sudo apt-get install radare2

Launching Radare2

Start Radare2
r2 ./binary       # Open binary for analysis
r2 -w ./binary    # Write mode (can modify binary)

Essential Radare2 Commands

Complete Radare2 Workflow

Full Radare2 Analysis Session
$ r2 ./crackme
[0x08048400]> aaa               # Analyze everything
[0x08048400]> afl               # List functions
[0x08048400]> iz                # Show strings
[0x08048400]> pdf @ sym.main    # View main function
[0x08048400]> V                 # Visual mode to explore
✓ Radare2 Fundamentals Mastered!

You can now analyze binaries statically with Radare2 and visualize code flow.

Static Analysis - Professional Tools

Static analysis means examining a binary without running it. You analyze code structure, disassembly, and data flow to understand what a program does. Professional tools like Ghidra, IDA Pro, and Binary Ninja dominate this space.

Professional Static Analysis Tools

Command-Line Static Analysis Tools

Essential tools for quick binary inspection and analysis:

Advanced Static Analysis Tools

String Analysis & Pattern Matching

Beyond basic string extraction, pattern analysis helps identify functionality:

Advanced String Analysis
strings ./binary | grep -E "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"    # Find emails
strings ./binary | grep -E "https?://[^[:space:]]+"    # Find URLs
strings ./binary | grep -E "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$"    # Find IP addresses
strings ./binary | grep -i "key\|password\|secret\|token\|api"    # Find credentials

Why strings are useful:

Often reveal hardcoded passwords or API keys
Show error messages that hint at program logic
Identify libraries and functions
Quickly find interesting areas to analyze
Discover hidden features or debug messages
Identify encryption algorithms by string constants
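
To see why strings are so cheap to recover, here is a rough sketch of what the `strings` tool does: scan the file for runs of printable ASCII bytes. `extract_strings` is a hypothetical helper and only approximates the real tool (which also has encoding options); the sample bytes are made up.

```python
import re

def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII (0x20-0x7e) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Made-up binary blob: readable text separated by non-printable bytes.
blob = b"\x00\x01Enter password:\x00\xff\xfeabc\x00Access granted\n\x7fELF"
found = extract_strings(blob)

# Same idea as piping into grep -i: keep only the interesting ones.
interesting = [s for s in found if re.search(r"password|secret|key|token", s, re.I)]
print(found)
print(interesting)
```

Short runs such as "abc" and "ELF" fall below the 4-byte minimum, which is the same default cutoff `strings` uses to reduce noise.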

Control Flow Analysis

Understanding how code branches and jumps helps identify:

Conditional logic: If/else patterns in assembly
Loops: Repeated code sections
Function calls: External dependencies
Dead code: Unreachable branches

All professional tools (Ghidra, IDA, Binary Ninja) show control flow graphs that visualize this.

✓ Static Analysis Mastered!

You can now use professional tools to analyze binaries without running them.

Dynamic Analysis - Runtime Behavior

Dynamic analysis means running the binary in a controlled environment while monitoring its behavior. Watch system calls, library calls, memory modifications, and network traffic to understand what code actually does.

System Call Tracing with strace

strace intercepts and logs all system calls made by a process.

strace Examples
strace ./program                              # Trace all syscalls
strace -e trace=open,openat,read ./program    # Trace specific syscalls (modern libc opens files via openat)
strace -o trace.txt ./program                 # Save to file
strace -c ./program                           # Summary (count syscalls)
strace -p 1234                                # Attach to running process

What strace reveals:

Files being read/written
Network connections (socket, connect syscalls)
Environment variables being read
Memory mappings
Signal handling
Interpreting strace Output
open("/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0:root:/root:/bin/bash\n", 32) = 32
write(1, "User found!\n", 12) = 12
exit_group(0) = ?

Meaning: Program opened /etc/passwd, read 32 bytes, wrote "User found!" to stdout (fd 1), then exited with status 0.
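
For longer traces it helps to post-process the log instead of reading it by eye. The sketch below parses the four lines shown above and rebuilds the file-descriptor table; the regex and the `parse_strace_line` helper are assumptions that cover this simple `name(args) = ret` shape, not every form strace can print.

```python
import re

# name(args) = return-value, the basic shape of an strace line
CALL = re.compile(r'^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>-?\d+|\?)')

def parse_strace_line(line):
    """Split one strace line into syscall name, raw args, and return value."""
    m = CALL.match(line.strip())
    if m is None:
        return None
    ret = m.group("ret")
    return {
        "syscall": m.group("name"),
        "args": m.group("args"),
        "ret": None if ret == "?" else int(ret),   # "?" means no return (process exited)
    }

trace = [
    r'open("/etc/passwd", O_RDONLY) = 3',
    r'read(3, "root:x:0:0:root:/root:/bin/bash\n", 32) = 32',
    r'write(1, "User found!\n", 12) = 12',
    r'exit_group(0) = ?',
]

# Track which path each file descriptor refers to.
fd_paths = {0: "stdin", 1: "stdout", 2: "stderr"}
calls = [parse_strace_line(line) for line in trace]
for call in calls:
    opened = call["syscall"] in ("open", "openat")
    if opened and call["ret"] is not None and call["ret"] >= 0:
        fd_paths[call["ret"]] = re.search(r'"([^"]*)"', call["args"]).group(1)

summary = [(c["syscall"], c["ret"]) for c in calls]
print(fd_paths)
print(summary)
```

With the fd table in hand, a later `read(3, ...)` can be attributed to /etc/passwd without scrolling back to the `open` call.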

Library Call Tracing with ltrace

ltrace traces library function calls (libc, libcrypto, etc.).

ltrace Examples
ltrace ./program                 # Trace library calls
ltrace -c ./program              # Summary (count function calls)
ltrace -o trace.txt ./program    # Save to file
ltrace -e strcmp ./program       # Trace specific functions

Useful library functions to trace:

strcmp: String comparison (password checks)
strcpy: String copying (buffer overflow detection)
malloc/free: Memory allocation
printf: Output (what's being printed)
getenv: Environment variable access
ltrace Example - Password Check
strcmp("admin123", "password123") = -1
puts("Incorrect password") = 19
exit(1)

Insight: Program compared input with "password123". Now you know the password!
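
The same trick can be automated: feed the program a known marker as input, then pull out whatever it was compared against. This is a small sketch over the ltrace output shown above; `leaked_candidates` is a hypothetical helper and only handles `strcmp`/`strncmp` lines with two quoted string arguments.

```python
import re

# strcmp("left", "right") ... with support for escaped quotes inside the strings
COMPARE = re.compile(r'^(?:strcmp|strncmp)\("((?:[^"\\]|\\.)*)", "((?:[^"\\]|\\.)*)"')

def leaked_candidates(trace_text, our_input):
    """Return strings that our_input was compared against in an ltrace log."""
    candidates = []
    for line in trace_text.splitlines():
        m = COMPARE.match(line.strip())
        if m is None:
            continue
        left, right = m.group(1), m.group(2)
        if left == our_input:
            candidates.append(right)
        elif right == our_input:
            candidates.append(left)
    return candidates

trace = '''strcmp("admin123", "password123") = -1
puts("Incorrect password") = 19
exit(1)'''

print(leaked_candidates(trace, "admin123"))
```

This only works when the program compares plaintext; if the input is hashed or transformed first, the operands in the trace will not be the password itself.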

Combined strace + ltrace

Use together for complete picture:

Trace both syscalls and library calls
ltrace -S ./program               # Library calls plus system calls in one trace (slower)
strace -e trace=file ./program    # Focus on file operations

Advanced Dynamic Analysis - Frida

Frida is a powerful instrumentation framework. Inject code into running processes to hook functions and modify behavior in real-time.

Basic Frida Usage
# Install
pip install frida frida-tools

# List processes
frida-ps

# Attach to a running process by PID
frida -p 1234

# Spawn a program and attach to it
frida -f ./program

Frida capabilities:

Hook any function (intercept and modify behavior)
Read/write process memory
Dump arguments and return values
Modify program flow in real-time
Works on binaries you don't have source for
✓ Dynamic Analysis Arsenal Complete!

You can now trace system calls, monitor library calls, and use advanced instrumentation.

Analyzing Stripped Binaries

A stripped binary has all debug symbols removed — function names, variable names, and type information are gone. This makes reverse engineering harder but not impossible.

Identifying Stripped Binaries

Check if binary is stripped
file ./program
# Output examples:
#   not stripped - has symbols
#   stripped     - symbols removed

file -i ./program       # MIME type info
readelf -S ./program    # Show sections
nm ./program            # Reports "no symbols" if stripped
objdump -t ./program    # Symbol table

Techniques for Stripped Binaries

Dynamic Analysis of Stripped Binaries

Use runtime tracing to understand behavior without symbols:

Dynamic approach to stripped binaries
# Trace syscalls to understand behavior
strace -o syscalls.txt ./program

# Trace library calls
ltrace -o libcalls.txt ./program

# Use GDB to set breakpoints and inspect registers
gdb ./program
(gdb) break *0x401000
(gdb) run
(gdb) info registers    # See actual values

Practical Example - Analyzing Stripped Binary

Complete workflow for stripped binary
# 1. Identify if stripped
$ file ./crackme
crackme: ELF 64-bit, stripped

# 2. Extract strings - look for clues
$ strings ./crackme | grep -iE "password|access"
Incorrect password
Access granted

# 3. Open in Ghidra
- Window → Function ID → Load standard library
- Many stdlib functions now identified
- Search → For Strings → Find "password" references
- Double-click string to see code using it

# 4. Analyze the function using string as anchor
- Look at function prologue/epilogue
- Identify comparisons and jumps
- Look for password check logic

# 5. Use dynamic analysis if stuck
$ ltrace ./crackme
strcmp("myinput", "secretpass") = -37
puts("Incorrect password") = 19

Now you know the password!
✓ Stripped Binary Analysis Mastered!

You can identify functions, recover symbols, and analyze behavior even without debug information.

Binary Patching - Code Modification

Binary patching means modifying a binary's machine code to change its behavior. Used to bypass password checks, remove license verification, or modify logic flow.

Why Patch Binaries?

Bypass authentication/license checks
Change program behavior for analysis
Create custom versions without source
Remove anti-debugging code
Test vulnerability fixes

Three Patching Approaches

Real-World Patching Example

Complete patching workflow - Bypass password
# Binary: crackme - asks for password
$ ./crackme
Enter password: test
Incorrect!

# Step 1: Open in Ghidra, find password check
0x401234: mov rax, [rip + 0x2dc6]    ; Load input
0x40123b: mov rbx, [rip + 0x2dc5]    ; Load expected password
0x401242: cmp rax, rbx               ; Compare
0x401245: jne 0x401260               ; Jump to fail if not equal
0x401247: call print_success         ; Otherwise print success

# Step 2: We want to skip the jne (jump to fail)
# Option A: Replace jne with NOPs
jne opcode at 0x401245: 75 19 (2 bytes)
Replace with: 90 90 (2 NOPs)

# Step 3: Use hex editor to patch a copy (crackme_patched)
Convert the virtual address 0x401245 to a file offset (check the segment mapping with readelf -l; a hex editor works on file offsets, not virtual addresses)
Find bytes: 75 19
Replace with: 90 90
Save file

# Step 4: Test
$ ./crackme_patched
Enter password: anything
Success!
# Password check bypassed! Any input works now
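
The hex-editor step can also be scripted, which makes it repeatable and lets you verify the bytes before overwriting them. In this sketch the segment mapping (virtual address 0x401000 at file offset 0x1000) is an assumed example of what `readelf -l` might report, the "binary" is a dummy buffer, and `vaddr_to_offset` and `patch` are hypothetical helper names.

```python
def vaddr_to_offset(vaddr, segments):
    """Map a virtual address to a file offset.

    segments: list of (p_vaddr, p_offset, p_filesz) tuples, as listed by readelf -l.
    """
    for p_vaddr, p_offset, p_filesz in segments:
        if p_vaddr <= vaddr < p_vaddr + p_filesz:
            return vaddr - p_vaddr + p_offset
    raise ValueError(f"address {vaddr:#x} is not backed by the file")

def patch(data: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Replace bytes at offset, but only if the original bytes are what we expect."""
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the file size")
    if data[offset:offset + len(expected)] != expected:
        raise ValueError("unexpected bytes at offset, refusing to patch")
    return data[:offset] + replacement + data[offset + len(expected):]

# Sanity check on the encoding: 75 19 is jne with an 8-bit offset of 0x19,
# counted from the end of the 2-byte instruction.
jump_target = 0x401245 + 2 + 0x19

# Assumed mapping for illustration: code segment at vaddr 0x401000, file offset 0x1000.
segments = [(0x401000, 0x1000, 0x500)]
offset = vaddr_to_offset(0x401245, segments)

# Dummy stand-in for the file contents, with the jne placed at the computed offset.
original = bytearray(0x1500)
original[offset:offset + 2] = b"\x75\x19"
patched = patch(bytes(original), offset, b"\x75\x19", b"\x90\x90")

print(hex(jump_target), hex(offset), patched[offset:offset + 2].hex())
```

For a real file you would read it with `open("crackme", "rb").read()` and write the result to a new file such as crackme_patched, keeping the original untouched.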

Common Patching Targets

What to Patch | Pattern | Replacement
Password check | cmp; jne failure | Replace jne with NOPs
License validation | call validate_license; jne fail | NOP out the jne
Anti-debug | call is_debugged; jne exit | Make function return 0
Trial expiration | cmp rax, expiration_date | Change expiration_date value
Error message | lea rdi, [rip + error_str] | Change string pointer/content
✓ Binary Patching Mastered!

You can modify binaries to change behavior, bypass checks, and test modifications.

Anti-Reversing Techniques & Bypasses

Software developers implement anti-reversing techniques to protect intellectual property and prevent cracking. Understanding these techniques helps you bypass them and analyze protected binaries.

Common Anti-Reversing Techniques

ASLR - Address Space Layout Randomization

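ASLR randomizes the base addresses of the stack, heap, and shared libraries on every run (and of the executable itself, if it is position-independent). Addresses you note in one run no longer match the next, which makes exploitation and analysis harder.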

Disable ASLR for analysis
# Check ASLR status
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled, 1 = conservative, 2 = full

# Disable ASLR system-wide (requires root)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

# Or run single binary without ASLR
setarch $(uname -m) -R ./program

# In GDB (randomization is already disabled by default)
(gdb) set disable-randomization on

Stack Canaries

Stack canaries detect buffer overflows by placing a random value between a function's local buffers and its saved return address.

How Stack Canaries Work
1. Function prologue: Random canary value stored on stack
2. If buffer overflow: Overwrites canary
3. Before return: Check if canary still matches original
4. If mismatch: Crash/exit immediately
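
The four steps above can be mimicked in a few lines. This is a toy model, not how the compiler emits the check: the "stack frame" is just a byte array laid out as buffer, canary, saved return address, and `call_function` is a hypothetical name.

```python
import os

def call_function(user_input: bytes, buf_size: int = 8) -> str:
    # 1. Prologue: a fresh random canary sits between the buffer and the return address.
    canary = os.urandom(8)
    saved_return = b"\x47\x12\x40\x00\x00\x00\x00\x00"
    frame = bytearray(buf_size) + canary + saved_return

    # 2. Unchecked copy (like strcpy): long input spills past the buffer.
    frame[0:len(user_input)] = user_input

    # 3. Epilogue: compare the stored canary with the original value.
    if bytes(frame[buf_size:buf_size + 8]) != canary:
        # 4. Mismatch: abort before the corrupted return address is used.
        return "stack smashing detected"
    return "returned normally"

print(call_function(b"hi"))        # fits inside the 8-byte buffer
print(call_function(b"A" * 16))    # overwrites the canary
```

Because the attacker has to write through the canary to reach the return address, and cannot predict its value, the overflow is caught before the function returns.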
Check if binary has canaries
checksec ./program
# Output shows: Canary found = yes/no

readelf -s ./program | grep __stack_chk_fail
# A reference to __stack_chk_fail means the binary was built with stack protection

DEP/NX - Data Execution Prevention

DEP/NX marks data pages (such as the stack and heap) as non-executable, which prevents injected shellcode from running there.

Check if binary has NX
checksec ./program
# Output shows: NX enabled/disabled

readelf -lW ./program | grep GNU_STACK
# RWE = executable stack (no NX protection), RW = NX enabled
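
The letters readelf prints for GNU_STACK are a rendering of the segment's p_flags bitmask. A minimal sketch of that decoding, using the standard ELF flag values (PF_X = 1, PF_W = 2, PF_R = 4); `describe_stack_flags` is a hypothetical helper name.

```python
# Standard ELF program-header permission bits
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

def describe_stack_flags(p_flags: int) -> str:
    """Render PT_GNU_STACK flags the way readelf does and state the NX verdict."""
    letters = (
        ("R" if p_flags & PF_R else " ")
        + ("W" if p_flags & PF_W else " ")
        + ("E" if p_flags & PF_X else " ")
    )
    if p_flags & PF_X:
        verdict = "stack is executable (NX disabled)"
    else:
        verdict = "stack is not executable (NX enabled)"
    return f"{letters}: {verdict}"

print(describe_stack_flags(PF_R | PF_W))          # typical hardened binary
print(describe_stack_flags(PF_R | PF_W | PF_X))   # e.g. built with -z execstack
```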
✓ Anti-Reversing Techniques Mastered!

You understand how protections work and how to bypass them.

angr - Automated Symbolic Execution

angr is a powerful binary analysis framework that uses symbolic execution to find inputs that reach specific code paths. Instead of manually analyzing, angr explores all possible paths and solves constraints.

What is Symbolic Execution?

Instead of concrete values, variables are treated as symbolic — representing all possible values. Branches create constraints.

Symbolic vs Normal Execution
NORMAL EXECUTION:
input = 5
if input > 10:
    print("big")
else:
    print("small")    ← This path taken

SYMBOLIC EXECUTION:
input = X (symbolic variable)
if input > 10:        ← Explores this path (constraint: X > 10)
    print("big")
if input ≤ 10:        ← Explores this path too (constraint: X ≤ 10)
    print("small")

Result: angr finds values satisfying each constraint!
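
The idea of "one constraint set per path, then solve" can be shown with a toy that has nothing to do with angr's internals. Here each path is a list of Python predicates, and the "solver" is plain brute force over a small domain, standing in for the SMT solver a real engine would use. The `explore` function and the path names are hypothetical.

```python
def explore(paths, domain):
    """For each path, find one input in domain that satisfies all its constraints."""
    results = {}
    for label, constraints in paths.items():
        results[label] = next(
            (x for x in domain if all(check(x) for check in constraints)),
            None,   # no satisfying input: the path is unreachable
        )
    return results

# Constraints collected along each branch of the example above,
# plus one contradictory path to show what "unreachable" looks like.
paths = {
    "big": [lambda x: x > 10],
    "small": [lambda x: x <= 10],
    "dead": [lambda x: x > 10, lambda x: x < 5],
}

solutions = explore(paths, range(256))
print(solutions)
```

Brute force only works here because the input is a single byte; a real symbolic engine hands the constraints to a solver so it can cope with inputs far too large to enumerate.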

Installation & Setup

Install angr
pip install angr
pip install angr[all]    # Install with optional dependencies

Basic angr Workflow

Simple angr Script - Crack Password
import angr
import claripy

# Load binary
project = angr.Project("./crackme")

# Create a 16-byte symbolic value and feed it to the program as stdin
password = claripy.BVS("password", 16 * 8)
initial_state = project.factory.entry_state(
    stdin=angr.SimFileStream(name="stdin", content=password, has_end=False)
)

# Create simulation manager
simgr = project.factory.simgr(initial_state)

# Addresses of the success and failure messages (take these from your disassembly)
success_addr = 0x401234
failure_addr = 0x401256

# Explore until we find success or hit failure
simgr.explore(find=success_addr, avoid=failure_addr)

# Get the solution
if simgr.found:
    solution_state = simgr.found[0]
    solution = solution_state.posix.dumps(0)    # 0 = stdin
    print(f"Password found: {solution!r}")
else:
    print("No solution found")

Key angr Concepts

Real-World Example - CTF Challenge

Complete angr Script - Solve CTF Crackme
import angr
import claripy

# Load the binary
binary_path = "./crackme"
project = angr.Project(binary_path, auto_load_libs=False)

# Create symbolic argv[1] (16 bytes)
password = claripy.BVS('password', 128)    # 16 bytes * 8 bits

# Create the initial state at the program entry point with symbolic argv[1]
# (assumes binary reads argv[1] as password)
state = project.factory.entry_state(args=[binary_path, password])

# Create simulation manager
simgr = project.factory.simgr(state)

# Explore - find "Correct!" message at 0x401300, avoid "Incorrect!" at 0x401350
simgr.explore(find=0x401300, avoid=[0x401350])

# Check results
if simgr.found:
    solution_state = simgr.found[0]
    password_value = solution_state.solver.eval(password, cast_to=bytes)
    print(f"[+] Password found: {password_value}")
else:
    print("[-] No solution found")
    if simgr.avoid:
        print(f"[!] Hit avoided addresses: {simgr.avoid}")

Advanced Techniques

When angr Excels vs Struggles

Best For | Struggles With
Finding password/key (simple comparison) | Complex floating-point math
Reaching specific code path | Cryptographic operations (very slow)
Constraint solving (small inputs) | Large state spaces (too many branches)
CTF challenges (designed for automation) | Real-world complex binaries
✓ angr Symbolic Execution Mastered!

You can now automate binary analysis and solve constraints to find inputs reaching target code paths.

🎮 Assembly Simulator & Practice

Learn assembly by writing and executing code in real-time. This interactive simulator lets you write assembly instructions, step through execution, and watch registers and memory change.

Assembly Code Editor

Execution Output

Practice Challenges