Welcome to CYB3RFY Reverse Engineering

I'm cyb3rfy — creator of the CYB3RFY YouTube channel. I publish CTF walkthroughs and TryHackMe rooms. I'm an intermediate CTF player — I often rank high in competitions, but for a long time I struggled with reverse engineering challenges. I didn't know where to start, what tools to use, or how to read assembly.

Because of that, I lost many potential top ranks.

After studying for weeks — assembly, registers, syscalls, GDB, IDA, Ghidra, radare2, strings, and more — I created notes that helped me finally understand everything.

This website is built from those notes — a complete beginner guide so anyone can start reverse engineering the right way.

What You'll Learn

This comprehensive guide covers everything from CPU architecture and assembly language to professional reverse engineering tools. Whether you're preparing for CTF competitions, bug bounty hunting, or building security expertise, you'll find detailed explanations, real code examples, practical exercises, and command references.

How to Use This Guide

Start with CPU & Registers to understand the foundation

Progress through Assembly Language basics

Learn debugging with GDB and Radare2

Master static and dynamic analysis techniques

Practice with real binaries and challenges

Use the searchable command database for quick lookups

ℹ️ Search Feature

Use the search bar at the top to instantly find any assembly instruction, GDB command, radare2 command, syscall, or reverse engineering concept. Every result includes detailed explanations and examples.

CPU & Registers

The CPU (Central Processing Unit) is the brain of your computer. To reverse engineer binaries, you must understand how CPUs work at a fundamental level. This chapter explains CPU architecture, registers, and why they matter for security.

What is a CPU?

A CPU executes instructions in sequence. It reads data from memory, processes it, and writes results back. The two main components of a CPU are:

Control Unit (CU)

The Control Unit directs traffic in the CPU. It reads instructions from memory, decodes them, tells other parts what to do, and manages the flow of data. Think of it as the conductor of an orchestra.

Execution Unit (EU)

The Execution Unit actually performs calculations and operations. It executes arithmetic (ADD, SUB), logical operations (AND, OR), comparisons (CMP), and memory operations. It's the worker that does the real work.

What Are Registers?

Registers are tiny, ultra-fast storage units inside the CPU. They hold the data the CPU is actively working on. RAM is measured in gigabytes, but a general-purpose register on x86-64 holds just 64 bits (8 bytes). In exchange, registers can be accessed far faster than memory.

Why Registers Matter for Reverse Engineering

When you debug a binary with GDB or examine assembly code, you're watching data move through registers. Understanding registers is essential because:

Function arguments are passed through registers

Return values are stored in registers

Local variables are often kept in registers

System calls use specific registers

Many exploits work by gaining control of a register, most importantly RIP (for example, by overwriting a saved return address)

Register Hierarchy (x86-64 Architecture)

On modern 64-bit x86 CPUs, registers come in different sizes. The main "General Purpose Registers" are:

64-bit (QWORD)	32-bit (DWORD)	16-bit (WORD)	8-bit HIGH	8-bit LOW
RAX	EAX	AX	AH	AL
RBX	EBX	BX	BH	BL
RCX	ECX	CX	CH	CL
RDX	EDX	DX	DH	DL
RSI	ESI	SI	-	SIL
RDI	EDI	DI	-	DIL
RSP	ESP	SP	-	SPL
RBP	EBP	BP	-	BPL
R8	R8D	R8W	-	R8B
R9	R9D	R9W	-	R9B
R10	R10D	R10W	-	R10B
R11	R11D	R11W	-	R11B
R12	R12D	R12W	-	R12B
R13	R13D	R13W	-	R13B
R14	R14D	R14W	-	R14B
R15	R15D	R15W	-	R15B

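When you write to a 32-bit register (like EAX), the CPU automatically zeros the upper 32 bits of the full 64-bit register (RAX). Writes to 16-bit or 8-bit registers (AX, AL, AH) do not: the remaining bits keep their old values. This matters in reverse engineering because it determines which data survives an instruction.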
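To see the rule in action, here is a minimal sketch that assumes an x86-64 Linux machine and GCC or Clang (GCC-style inline assembly). Each helper fills RAX with all ones, overwrites only one sub-register, and returns the full 64-bit RAX. GCC's inline assembler defaults to AT&T syntax, so the snippet switches to the Intel syntax used in this guide with .intel_syntax noprefix and back with .att_syntax prefix. The helper names are illustrative.

```c
#include <stdint.h>

/* Each helper sets RAX to 0xFFFFFFFFFFFFFFFF, writes one sub-register,
   then returns the whole 64-bit RAX so you can see which bits survived. */

uint64_t after_write_eax(void) {
    uint64_t rax;
    __asm__ volatile (
        ".intel_syntax noprefix\n\t"
        "mov rax, -1\n\t"      /* all 64 bits set */
        "mov eax, 1\n\t"       /* 32-bit write: bits 63..32 are zeroed */
        ".att_syntax prefix"
        : "=a"(rax));
    return rax;
}

uint64_t after_write_ax(void) {
    uint64_t rax;
    __asm__ volatile (
        ".intel_syntax noprefix\n\t"
        "mov rax, -1\n\t"
        "mov ax, 1\n\t"        /* 16-bit write: bits 63..16 are preserved */
        ".att_syntax prefix"
        : "=a"(rax));
    return rax;
}

uint64_t after_write_al(void) {
    uint64_t rax;
    __asm__ volatile (
        ".intel_syntax noprefix\n\t"
        "mov rax, -1\n\t"
        "mov al, 1\n\t"        /* 8-bit write: bits 63..8 are preserved */
        ".att_syntax prefix"
        : "=a"(rax));
    return rax;
}
```

after_write_eax() returns 0x0000000000000001, while after_write_ax() returns 0xFFFFFFFFFFFF0001 and after_write_al() returns 0xFFFFFFFFFFFFFF01: only the 32-bit write clears the upper half.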

Key x86-64 Registers

📌 RAX (Accumulator) - General Purpose

RAX is the primary accumulator register. It's used for:

Arithmetic operations (ADD, SUB, MUL)

Return values from function calls

Syscall numbers in system calls

Division operations (holds the low half of the dividend before DIV, the quotient after)

Special I/O operations

Example: When a function returns an integer, it's in RAX.

Assembly - RAX in function return
mov rax, 1       ; Set RAX to 1 (often a success code)
ret              ; Return - RAX contains the return value

📌 RBX (Base) - General Purpose

RBX is traditionally the base register for addressing. It's used for:

Addressing memory (calculating addresses)

General-purpose data storage

Preserved register (must be saved by functions)

Important: RBX is a "callee-saved" register, meaning if a function modifies RBX, it must restore it before returning.
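A minimal sketch of the callee-saved rule, assuming x86-64 with GCC or Clang: the inline assembly below uses RBX as scratch space and lists it as clobbered. Because RBX is callee-saved, the compiler should respond by saving RBX in the function's prologue and restoring it before RET; compiling with gcc -O2 -S -masm=intel should show a push rbx / pop rbx pair. The function name is hypothetical.

```c
#include <stdint.h>

/* Doubles x using RBX as scratch space. Listing "rbx" as clobbered tells
   the compiler this code destroys RBX; since RBX is callee-saved in the
   System V ABI, the compiler must preserve the caller's value for it. */
uint64_t double_via_rbx(uint64_t x) {
    uint64_t result;
    __asm__ (
        ".intel_syntax noprefix\n\t"
        "mov rbx, rdi\n\t"     /* copy the argument into RBX */
        "add rbx, rbx\n\t"     /* RBX = 2 * x */
        "mov rax, rbx\n\t"     /* hand the result back in RAX */
        ".att_syntax prefix"
        : "=a"(result)
        : "D"(x)               /* x is placed in RDI */
        : "rbx", "cc");
    return result;
}
```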

📌 RCX (Counter) - General Purpose

RCX is traditionally the counter register. It's used for:

Loop counters (REP instructions)

Fourth function argument (x86-64 calling convention)

Shift and rotate counts

General-purpose operations

Example: In a for loop, you might load the loop count into RCX and use the LOOP instruction.
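Here is a small sketch of the LOOP pattern, assuming x86-64 and GCC or Clang inline assembly; the function name is hypothetical. LOOP decrements RCX and jumps back while RCX is not zero, so the body runs with RCX = n, n-1, ..., 1. Compilers rarely emit LOOP themselves (they prefer a DEC/JNZ or CMP/JNE pair), but you will still meet it in hand-written code and CTF binaries.

```c
#include <stdint.h>

/* Computes n + (n-1) + ... + 1 with the LOOP instruction.
   RAX holds the running sum, RCX holds the counter. */
uint64_t sum_down_with_loop(uint64_t n) {
    uint64_t sum = 0;
    if (n == 0)
        return 0;   /* LOOP decrements first: RCX = 0 would wrap to 2^64 - 1 */
    __asm__ (
        ".intel_syntax noprefix\n\t"
        ".Lloop%=:\n\t"
        "add rax, rcx\n\t"     /* sum += current counter value */
        "loop .Lloop%=\n\t"    /* RCX -= 1; jump back while RCX != 0 */
        ".att_syntax prefix"
        : "+a"(sum), "+c"(n)
        :
        : "cc");
    return sum;
}
```

The %= sequence expands to a number unique to each copy of the asm statement, which keeps the label name from clashing if the compiler duplicates the code.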

📌 RDX (Data) - General Purpose

RDX is traditionally the data register. It's used for:

Third function argument (x86-64 calling convention)

Division operations (holds the high half of the dividend before DIV, the remainder after)

I/O operations

General-purpose operations

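Example: DIV RBX divides the 128-bit value in RDX:RAX by RBX. Afterward the quotient is in RAX and the remainder is in RDX, so RDX must be cleared (for example, with xor edx, edx) before an unsigned division of a 64-bit value.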
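The sketch below performs that exact division, assuming x86-64 and GCC or Clang inline assembly (the function name is hypothetical). It clears RDX, divides RDX:RAX by RBX, and returns the quotient from RAX and the remainder from RDX. A zero divisor makes DIV raise a divide error, which Linux delivers as SIGFPE, so callers must not pass 0.

```c
#include <stdint.h>

/* Unsigned 64-bit division with DIV. divisor must be non-zero. */
void div_u64(uint64_t dividend, uint64_t divisor,
             uint64_t *quotient, uint64_t *remainder) {
    uint64_t q, r;
    __asm__ (
        ".intel_syntax noprefix\n\t"
        "xor edx, edx\n\t"     /* high half of the dividend (RDX) = 0 */
        "div rbx\n\t"          /* RAX = RDX:RAX / RBX, RDX = RDX:RAX % RBX */
        ".att_syntax prefix"
        : "=a"(q), "=&d"(r)
        : "0"(dividend), "b"(divisor)   /* dividend in RAX, divisor in RBX */
        : "cc");                        /* DIV leaves the flags undefined */
    *quotient = q;
    *remainder = r;
}
```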

📌 RSI/RDI (Source/Destination Index) - General Purpose

RSI (Source Index) and RDI (Destination Index) are used for:

RSI = second function argument

RDI = first function argument

String operations (MOVS, STOS, SCAS)

Memory operations

Important: In the x86-64 calling convention, RDI holds the first argument to a function. This is crucial for understanding function calls.

Assembly - Function call with arguments
mov rdi, 10      ; 1st argument: 10
mov rsi, 20      ; 2nd argument: 20
mov rdx, 30      ; 3rd argument: 30
call add_three   ; Call function - returns result in RAX
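For comparison, here is what the hypothetical add_three function could look like in C. Compiled with gcc -O1 -S -masm=intel, its body should read the three arguments from RDI, RSI and RDX and leave the sum in RAX, matching the register setup above.

```c
/* Hypothetical callee for the snippet above.
   System V AMD64 ABI: a = RDI, b = RSI, c = RDX, return value = RAX. */
long add_three(long a, long b, long c) {
    return a + b + c;
}
```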

📌 RIP (Instruction Pointer) - Program Counter

RIP (or EIP in 32-bit mode) is the Instruction Pointer. It always contains the address of the next instruction to execute.

Automatically advanced past each instruction as it executes (by that instruction's length in bytes)

Jumps change RIP to branch to different code

Function calls push return address and change RIP

Can't be written directly with MOV (use JMP, CALL, or RET)

Why important for reversing: When you see a breakpoint or a crash, RIP tells you exactly where in the code execution stopped.

⚠️ Important

You cannot directly modify RIP with MOV or other instructions. To change program flow, you use JMP (unconditional jump), conditional jumps (JE, JNE, etc.), CALL (function call), or RET (return from a function).
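Although you cannot write RIP with MOV, you can read it indirectly. A minimal sketch, assuming x86-64 with GCC or Clang: a RIP-relative LEA with a zero displacement produces the address of the instruction right after the LEA, which shows that RIP already points past the instruction being executed. The function name is hypothetical, and noinline keeps the code inside that function so the result can be compared with its start address.

```c
#include <stdint.h>

/* Returns the address of the instruction that follows the LEA. */
__attribute__((noinline)) uintptr_t read_rip(void) {
    uintptr_t rip;
    __asm__ volatile (
        ".intel_syntax noprefix\n\t"
        "lea rax, [rip + 0]\n\t"   /* RIP-relative: RAX = address of next instruction */
        ".att_syntax prefix"
        : "=a"(rip));
    return rip;
}
```

The returned value lands a few bytes past the start of read_rip, just after its prologue and the LEA itself.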

📌 RSP & RBP (Stack Pointers)

RSP (Stack Pointer) and RBP (Base Pointer) manage the call stack:

RSP: Points to the top (most recent item) of the stack

RBP: Points to the base of the current stack frame; the function's local variables sit just below it

PUSH decrements RSP and writes data

POP increments RSP and reads data

CALL pushes the return address onto the stack

Stack Layout Example:

STACK MEMORY LAYOUT (grows downward ↓)

0x7fffffffe000        ← Higher address
  Previous stack frame
  Return address      RBP+8
  Saved RBP           RBP       ← RBP (Base Pointer)
  Local Variable 1    RBP-8
  Local Variable 2    RBP-16
  Local Variable 3    RBP-24
  Top of stack                  ← RSP (Stack Pointer)
  Unused stack space
0x7fffffffd000        ← Lower address

Key Points:
• Stack grows from high to low addresses (downward)
• PUSH decrements RSP, POP increments RSP
• RBP marks the base of the current function's frame
• Local variables accessed relative to RBP (RBP-8, RBP-16, etc.)
• Return address pushed by CALL instruction
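A minimal sketch that measures RSP around a single PUSH and POP, assuming x86-64 Linux and GCC or Clang inline assembly; the function name is hypothetical. The System V ABI lets compiled code use a 128-byte "red zone" below RSP without moving RSP, so the snippet first steps over it to make sure the PUSH cannot overwrite the compiler's own data.

```c
#include <stdint.h>

/* Records RSP before a PUSH, RSP after it, and the value that POP reads back. */
void push_pop_demo(uint64_t value, uint64_t *rsp_before,
                   uint64_t *rsp_after_push, uint64_t *popped) {
    uint64_t before, after, out;
    __asm__ volatile (
        ".intel_syntax noprefix\n\t"
        "sub rsp, 128\n\t"     /* skip the red zone */
        "mov rcx, rsp\n\t"     /* RCX = RSP before PUSH */
        "push rax\n\t"         /* RSP -= 8, then [RSP] = RAX */
        "mov rdx, rsp\n\t"     /* RDX = RSP after PUSH */
        "pop rsi\n\t"          /* RSI = [RSP], then RSP += 8 */
        "add rsp, 128\n\t"     /* restore the original RSP */
        ".att_syntax prefix"
        : "=&c"(before), "=&d"(after), "=S"(out)
        : "a"(value)
        : "cc", "memory");
    *rsp_before = before;
    *rsp_after_push = after;
    *popped = out;
}
```

After the call, rsp_before minus rsp_after_push is 8 (the stack grew downward by one qword), and popped equals value.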

Special Registers - Flags Register (RFLAGS)

The RFLAGS register (EFLAGS in 32-bit mode) contains condition flags: single bits that describe the result of the most recent flag-setting instruction, such as arithmetic, logic, CMP, or TEST. MOV does not change them.

Flag	Name	Meaning	Set When
ZF	Zero Flag	Result is zero	Result of last operation = 0
CF	Carry Flag	Unsigned overflow	Addition/subtraction carries/borrows
SF	Sign Flag	Result is negative	MSB (most significant bit) = 1
OF	Overflow Flag	Signed overflow	Signed arithmetic overflow occurs
PF	Parity Flag	Even parity	Low byte of the result has an even number of 1 bits
AF	Adjust Flag	BCD carry	Carry or borrow out of bit 3 (the low nibble)

Why flags matter: Conditional jumps (JE, JNE, JZ, JG, etc.) check these flags to decide whether to jump. Understanding flags is essential for reading assembly code.

Example: Flags in Action

cmp rax, rbx       ; Compare RAX with RBX (subtract, discard result)
                   ; Sets ZF=1 if RAX==RBX, ZF=0 if different
je success         ; Jump if Equal (checks ZF flag)
mov rax, 0         ; If not equal, RAX = 0
jmp end
success:
mov rax, 1         ; If equal, RAX = 1
end:
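To watch CMP set the flags, here is a sketch assuming x86-64 and GCC or Clang inline assembly; the struct and function names are hypothetical. After CMP, the SETcc instructions copy individual flags into byte registers. For reference, JE tests ZF, JB tests CF (unsigned below), and JL jumps when SF differs from OF (signed less).

```c
#include <stdint.h>

struct cmp_flags { unsigned char zf, cf, sf, of; };

/* Runs CMP a, b (computes a - b and discards it), then captures four flags. */
struct cmp_flags flags_after_cmp(int64_t a, int64_t b) {
    unsigned char zf, cf, sf, of;
    __asm__ (
        ".intel_syntax noprefix\n\t"
        "cmp rdi, rsi\n\t"    /* flags now describe RDI - RSI */
        "sete al\n\t"         /* AL = ZF (result was zero) */
        "setc bl\n\t"         /* BL = CF (unsigned borrow) */
        "sets cl\n\t"         /* CL = SF (sign bit of the result) */
        "seto dl\n\t"         /* DL = OF (signed overflow) */
        ".att_syntax prefix"
        : "=a"(zf), "=b"(cf), "=c"(sf), "=d"(of)
        : "D"(a), "S"(b)
        : "cc");
    struct cmp_flags f = { zf, cf, sf, of };
    return f;
}
```

For example, comparing 3 with 5 sets CF and SF (3 is below 5 both unsigned and signed), while comparing INT64_MIN with 1 sets only OF, because INT64_MIN - 1 overflows the signed range.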
                

x86-64 Calling Convention

When a function is called, arguments are passed through specific registers in a specific order. This is the calling convention. Understanding it is crucial for debugging.

📌 x86-64 System V ABI (Linux/Unix)

This is the calling convention used on 64-bit Linux systems:

Argument #	Register
1st argument	RDI
2nd argument	RSI
3rd argument	RDX
4th argument	RCX
5th argument	R8
6th argument	R9
Return value	RAX

Integer and pointer arguments beyond the 6th are passed on the stack. Floating-point arguments use XMM0 through XMM7 instead.

Example: Function Call with 4 Arguments

mov rdi, 10        ; arg1 = 10
mov rsi, 20        ; arg2 = 20
mov rdx, 30        ; arg3 = 30
mov rcx, 40        ; arg4 = 40
call my_function   ; Call function
                   ; RAX now contains the return value
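Only six integer arguments fit in registers. In this C sketch (sum7 is a hypothetical name), a through f arrive in RDI, RSI, RDX, RCX, R8 and R9, while g is passed on the stack: on entry to sum7, [rsp] holds the return address and [rsp+8] holds g.

```c
/* Seven integer arguments under the System V AMD64 ABI:
   a..f travel in RDI, RSI, RDX, RCX, R8, R9; g is read from the stack. */
long sum7(long a, long b, long c, long d, long e, long f, long g) {
    return a + b + c + d + e + f + g;
}
```

Disassembling a call to sum7 is a quick way to spot the extra argument being stored to the stack just before the CALL.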
                

Putting It Together

Now you understand:

CPUs have two main components: Control Unit and Execution Unit

Registers are ultra-fast CPU storage

Different registers have different purposes

The Flags register controls conditional jumps

Function arguments are passed through specific registers

RIP points to the next instruction

RSP and RBP manage the call stack

✓ You Now Understand CPU Architecture!

These fundamentals are essential. In the next section, you'll learn how to use this knowledge to understand system calls.

System Calls (Syscalls)

A system call is a request from a user program to the kernel to perform a privileged operation. When a program needs to read a file, write to the screen, or allocate memory, it can't do it directly — it must ask the kernel through a syscall.

User Space vs Kernel Space

Modern operating systems use a layered privilege model:

USER SPACE (Ring 3 - Unprivileged)

Applications: Firefox, Chrome
User programs: ./your_binary
Libraries: libc, libssl

Capabilities:
✓ Read/write own memory
✓ Perform calculations
✗ Cannot access hardware directly
✗ Cannot access other processes' memory
✗ Cannot perform privileged operations

SYSCALL INSTRUCTION

↓ Mode switch ↓

CPU transitions from Ring 3 → Ring 0
RAX = syscall number, RDI/RSI/RDX/R10/R8/R9 = arguments

KERNEL SPACE (Ring 0 - Privileged)

File system: open, read, write
Memory management: mmap, brk
Process management: fork, execve
Device drivers: hardware I/O

Full Capabilities:
✓ Access all memory
✓ Execute privileged instructions
✓ Control hardware devices
✓ Manage processes and resources
✓ Handle interrupts and exceptions

↑ RETURN ↑

Result in RAX, returns to user space

Why This Separation Matters

The user/kernel space separation is fundamental to operating system security. User programs cannot directly access hardware or other processes' memory. All privileged operations must go through the kernel via system calls, where permissions are checked and validated.

How System Calls Work

A system call proceeds in four steps:

1. Program sets up registers with syscall number and arguments

2. SYSCALL instruction transitions to kernel mode

3. Kernel executes the requested operation

4. Control returns to user program with result in RAX

x86-64 Linux Syscall ABI

On 64-bit Linux, syscalls follow a specific convention. Let's break it down:

📌 Syscall Number & Arguments

Every syscall has a number. You put that number in RAX, then set up arguments in specific registers:

Register	Purpose
RAX	Syscall number (which syscall to call)
RDI	1st argument
RSI	2nd argument
RDX	3rd argument
R10	4th argument (note: RCX for functions, but R10 for syscalls)
R8	5th argument
R9	6th argument

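Return value: After the syscall, RAX contains the result, or a negative error code on failure. The SYSCALL instruction itself overwrites RCX (with the return address) and R11 (with the saved RFLAGS), which is why the 4th argument goes in R10 instead of RCX.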
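The register table can be turned into a tiny wrapper. This is a sketch assuming x86-64 Linux and GCC or Clang; raw_syscall4 is a hypothetical name, not a libc function. There is no constraint letter for R10, so the fourth argument is pinned there with a local register variable, and RCX and R11 are declared clobbered because SYSCALL overwrites them.

```c
#include <unistd.h>   /* getpid(), handy for cross-checking the wrapper */

/* Issues a Linux x86-64 system call directly with the SYSCALL instruction:
   number in RAX, arguments in RDI, RSI, RDX and R10, result back in RAX.
   A result between -4095 and -1 is a negated errno value. */
long raw_syscall4(long nr, long a1, long a2, long a3, long a4) {
    register long r10 __asm__("r10") = a4;   /* 4th argument must be in R10 */
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)
        : "0"(nr), "D"(a1), "S"(a2), "d"(a3), "r"(r10)
        : "rcx", "r11", "memory");           /* SYSCALL overwrites RCX and R11 */
    return ret;
}
```

rt_sigprocmask (14) is a handy check of the R10 rule, because its fourth argument is the signal-set size: raw_syscall4(14, 0, 0, (long)&old_mask, 8) returns 0, while passing any size other than 8 returns -22 (-EINVAL).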

Common Syscalls Explained

📌 exit (Syscall #60) - Terminate Program

exit(int code) terminates the program with an exit status.

Signature

void exit(int status)

RDI: Exit status code (0 = success, non-zero = error)

Return: Never returns (program terminates)

Assembly - exit syscall
mov rax, 60        ; exit syscall number
mov rdi, 0         ; exit code = 0 (success)
syscall            ; Call kernel to exit

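Equivalent C code (approximately):

exit(0); // Terminate with status 0

Note that libc's exit() does more than the raw syscall: it runs atexit handlers and flushes stdio buffers first, and glibc then ends the process with exit_group (syscall 231) rather than exit (60).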

📌 write (Syscall #1) - Write to File/Console

write() writes data to a file descriptor (like stdout for console output).

Signature

ssize_t write(int fd, const void *buf, size_t count)

RDI: File descriptor (1 = stdout/console, 2 = stderr)

RSI: Pointer to buffer (data to write)

RDX: Number of bytes to write

Return (RAX): Number of bytes written, or a negative error code on failure (libc's write() wrapper returns -1 instead and stores the code in errno)

Assembly - write "Hello" to console
mov rax, 1         ; write syscall number
mov rdi, 1         ; fd = 1 (stdout)
mov rsi, msg       ; rsi = pointer to message
mov rdx, 5         ; length = 5 bytes
syscall            ; Write to stdout
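The same request can be made from C through glibc's syscall() wrapper, which loads SYS_write into RAX and the remaining arguments into RDI, RSI and RDX. A minimal sketch, assuming Linux with glibc; write_stdout is a hypothetical helper name. Unlike the raw instruction, syscall() reports failure as -1 with the error code in errno.

```c
#define _GNU_SOURCE            /* declares syscall() in <unistd.h> */
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* write(1, msg, strlen(msg)) issued as a raw system call through libc. */
long write_stdout(const char *msg) {
    return syscall(SYS_write, 1, msg, strlen(msg));
}
```

Calling write_stdout("Hello\n") prints the text and returns 6, the number of bytes written.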

File Descriptor Reference:

FD	Name	Purpose
0	stdin	Standard input (keyboard)
1	stdout	Standard output (console)
2	stderr	Standard error (console)

📌 read (Syscall #0) - Read from File/Console

read() reads data from a file descriptor into a buffer.

Signature

ssize_t read(int fd, void *buf, size_t count)

RDI: File descriptor (0 = stdin)

RSI: Pointer to buffer to store data

RDX: Maximum bytes to read

Return (RAX): Number of bytes read (0 = EOF), or a negative error code on failure

Assembly - read 10 bytes from stdin
mov rax, 0         ; read syscall number
mov rdi, 0         ; fd = 0 (stdin)
mov rsi, buffer    ; rsi = pointer to buffer
mov rdx, 10        ; read up to 10 bytes
syscall            ; Read from stdin
                        ; RAX now contains number of bytes read
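Reading from stdin is awkward to demonstrate in isolation, so this sketch (assuming Linux with glibc; pipe_echo is a hypothetical name) reads from a pipe instead. It writes a message into the pipe, closes the write end, and reads the data back. With the write end closed, a read from an empty pipe returns 0, the same EOF value described above.

```c
#define _GNU_SOURCE            /* declares syscall() in <unistd.h> */
#include <unistd.h>
#include <sys/syscall.h>

/* Sends len bytes of msg through a pipe with SYS_write, then reads them back
   into buf with SYS_read. Returns read's result: bytes read, 0 at EOF,
   or -1 on error (errno set by the libc syscall() wrapper). */
long pipe_echo(const char *msg, size_t len, char *buf, size_t cap) {
    int fds[2];
    if (pipe(fds) != 0)                   /* fds[0] = read end, fds[1] = write end */
        return -1;
    long w = syscall(SYS_write, fds[1], msg, len);
    syscall(SYS_close, fds[1]);           /* no writers left: reads can now see EOF */
    long r = (w < 0) ? -1 : syscall(SYS_read, fds[0], buf, cap);
    syscall(SYS_close, fds[0]);
    return r;
}
```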

📌 open (Syscall #2) - Open a File

open() opens a file and returns a file descriptor.

Signature

int open(const char *pathname, int flags, mode_t mode)

RDI: Pointer to filename string

RSI: Flags (O_RDONLY, O_WRONLY, O_RDWR, etc.)

RDX: Mode (permissions such as 0644; only used when O_CREAT is set)

Return (RAX): File descriptor, or a negative error code on failure

Flag	Value	Meaning
O_RDONLY	0	Read only
O_WRONLY	1	Write only
O_RDWR	2	Read and write
O_CREAT	64	Create if doesn't exist
O_APPEND	1024	Append to file

📌 close (Syscall #3) - Close a File

close() closes a file descriptor, freeing the resource.

Signature

int close(int fd)

RDI: File descriptor to close

Return (RAX): 0 on success, or a negative error code on failure
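Putting open, write, read and close together: this sketch, assuming Linux with glibc (file_roundtrip is a hypothetical name), creates a file with O_WRONLY | O_CREAT | O_TRUNC and mode 0644, writes some text, then reopens it with O_RDONLY and reads it back. O_TRUNC (empty an existing file) is not listed in the flag table above.

```c
#define _GNU_SOURCE            /* declares syscall() in <unistd.h> */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Creates or truncates path, writes text into it, closes it, then reopens it
   read-only and reads up to cap bytes into out. Returns the byte count from
   read, or -1 on any failure. */
long file_roundtrip(const char *path, const char *text, char *out, size_t cap) {
    long fd = syscall(SYS_open, path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    long written = syscall(SYS_write, fd, text, strlen(text));
    syscall(SYS_close, fd);
    if (written < 0)
        return -1;

    fd = syscall(SYS_open, path, O_RDONLY);
    if (fd < 0)
        return -1;
    long got = syscall(SYS_read, fd, out, cap);
    syscall(SYS_close, fd);
    return got;
}
```

With a scratch path such as /tmp/cyb3rfy_roundtrip.txt (an arbitrary example), writing "flag{demo}" reads back 10 bytes.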

Complete System Call Database

Here's a comprehensive reference of Linux x86-64 system calls with arguments, return values, and detailed descriptions:


RAX	Name	Arguments (RDI, RSI, RDX, R10, R8, R9)	Return Value	Description
0	read	RDI: unsigned int fd RSI: char *buf RDX: size_t count	ssize_t (bytes read or -1)	Reads up to count bytes from file descriptor fd into buffer buf. Returns number of bytes read, 0 on EOF, or -1 on error. Commonly used with fd=0 (stdin) to read user input.
1	write	RDI: unsigned int fd RSI: const char *buf RDX: size_t count	ssize_t (bytes written or -1)	Writes up to count bytes from buffer buf to file descriptor fd. Returns number of bytes written or -1 on error. Use fd=1 (stdout) for console output, fd=2 (stderr) for error messages.
2	open	RDI: const char *filename RSI: int flags RDX: mode_t mode	int (file descriptor or -1)	Opens file specified by filename. Flags: O_RDONLY(0), O_WRONLY(1), O_RDWR(2), O_CREAT(64), O_APPEND(1024). Mode specifies permissions (e.g., 0644). Returns file descriptor on success.
3	close	RDI: unsigned int fd	int (0 or -1)	Closes file descriptor fd, freeing the resource. Returns 0 on success, -1 on error. Always close files when done to prevent resource leaks.
4	stat	RDI: const char *filename RSI: struct stat *statbuf	int (0 or -1)	Retrieves file information (size, permissions, timestamps) for filename and stores it in statbuf. Returns 0 on success. Follows symbolic links; use lstat (6) to inspect the link itself.
5	fstat	RDI: unsigned int fd RSI: struct stat *statbuf	int (0 or -1)	Like stat, but operates on an already-opened file descriptor instead of a filename. Useful when you already have the file open.
8	lseek	RDI: unsigned int fd RSI: off_t offset RDX: unsigned int whence	off_t (new position or -1)	Repositions file offset of fd. Whence: SEEK_SET(0)=absolute, SEEK_CUR(1)=relative, SEEK_END(2)=from end. Returns new offset from beginning of file.
9	mmap	RDI: void *addr RSI: size_t length RDX: int prot R10: int flags R8: int fd R9: off_t offset	void* (address or MAP_FAILED)	Maps file or device into memory. Prot: PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Flags: MAP_PRIVATE(2), MAP_ANONYMOUS(32). Returns pointer to mapped area. Critical for memory management.
11	munmap	RDI: void *addr RSI: size_t length	int (0 or -1)	Unmaps a previously mapped memory region starting at addr with length. Returns 0 on success. Always unmap when done to free memory.
12	brk	RDI: void *addr	void* (new break)	Changes program break (end of data segment) to addr. Used by malloc() internally. The raw syscall returns the new program break on success and the current, unchanged break on failure (the libc brk() wrapper returns -1 instead). Rarely used directly in modern programs (prefer mmap).
16	ioctl	RDI: unsigned int fd RSI: unsigned int cmd RDX: unsigned long arg	int (varies)	Device-specific input/output control. Command and arguments vary by device. Used for operations that don't fit read/write model (terminal settings, disk operations, etc.).
22	pipe	RDI: int pipefd[2]	int (0 or -1)	Creates a pipe (unidirectional data channel). pipefd[0] is read end, pipefd[1] is write end. Returns 0 on success. Used for inter-process communication.
32	dup	RDI: unsigned int fildes	int (new fd or -1)	Duplicates file descriptor fildes using the lowest available fd number. Both fds refer to same file. Returns new fd on success.
33	dup2	RDI: unsigned int oldfd RSI: unsigned int newfd	int (new fd or -1)	Duplicates oldfd to newfd. If newfd is open, it's closed first. Commonly used to redirect stdin/stdout/stderr in child processes.
57	fork	(none)	pid_t (child PID or 0 in child)	Creates new process by duplicating calling process. Returns child PID to parent, returns 0 in child process. Child gets copy of parent's memory and file descriptors.
59	execve	RDI: const char *filename RSI: char *const argv[] RDX: char *const envp[]	int (never returns on success)	Executes program specified by filename, replacing current process image. argv is argument array, envp is environment. Only returns on error. Used with fork() to run new programs.
60	exit	RDI: int status	void (never returns)	Terminates the calling thread (the whole process if it is single-threaded) with exit status. 0 indicates success, non-zero indicates error. The kernel closes file descriptors and reports the status to the parent; it does not flush libc stdio buffers (libc's exit() does that before calling exit_group, 231).
61	wait4	RDI: pid_t pid RSI: int *wstatus RDX: int options R10: struct rusage *rusage	pid_t (PID or -1)	Waits for child process to change state. Returns child PID on success. wstatus receives the encoded exit status. Used by parent to collect terminated children (prevent zombies).
39	getpid	(none)	pid_t (process ID)	Returns process ID (PID) of calling process. Always succeeds. Useful for logging, creating unique filenames, or process identification.
110	getppid	(none)	pid_t (parent PID)	Returns parent process ID of calling process. If the parent has exited, returns the PID of the process that adopted it (usually 1, init/systemd, or a subreaper). Always succeeds.
102	getuid	(none)	uid_t (user ID)	Returns real user ID of calling process. Used for permission checks. Always succeeds.
104	getgid	(none)	gid_t (group ID)	Returns real group ID of calling process. Used for permission checks. Always succeeds.
105	setuid	RDI: uid_t uid	int (0 or -1)	Sets effective user ID. If privileged, sets real, effective, and saved UIDs. Returns 0 on success. Used for privilege dropping or SUID executables.
106	setgid	RDI: gid_t gid	int (0 or -1)	Sets effective group ID. If privileged, sets real, effective, and saved GIDs. Returns 0 on success.
62	kill	RDI: pid_t pid RSI: int sig	int (0 or -1)	Sends signal sig to process pid. Common signals: SIGTERM(15), SIGKILL(9), SIGUSR1(10). Returns 0 on success. If pid=0, sends to process group.
13	rt_sigaction	RDI: int sig RSI: const struct sigaction *act RDX: struct sigaction *oldact R10: size_t sigsetsize	int (0 or -1)	Examines or changes signal handler for signal sig. act specifies new action, oldact receives old action. Returns 0 on success. Modern signal handling interface.
34	pause	(none)	int (always -1)	Suspends process until signal is received. Always returns -1 with errno=EINTR after signal handler returns. Used for waiting on signals.
41	socket	RDI: int domain RSI: int type RDX: int protocol	int (socket fd or -1)	Creates communication endpoint. Domain: AF_INET(2)=IPv4, AF_INET6(10)=IPv6. Type: SOCK_STREAM(1)=TCP, SOCK_DGRAM(2)=UDP. Returns socket file descriptor.
49	bind	RDI: int sockfd RSI: const struct sockaddr *addr RDX: socklen_t addrlen	int (0 or -1)	Assigns address (IP and port) to socket sockfd. Must be called before listen() for servers. Returns 0 on success. Port numbers below 1024 require root.
50	listen	RDI: int sockfd RSI: int backlog	int (0 or -1)	Marks socket as passive (ready to accept connections). Backlog specifies maximum queue length for pending connections. Returns 0 on success.
43	accept	RDI: int sockfd RSI: struct sockaddr *addr RDX: socklen_t *addrlen	int (new socket fd or -1)	Accepts incoming connection on listening socket. Blocks until connection arrives. Returns new socket for the connection, original socket continues listening.
42	connect	RDI: int sockfd RSI: const struct sockaddr *addr RDX: socklen_t addrlen	int (0 or -1)	Initiates connection to remote address. For TCP, performs 3-way handshake. Blocks until connection established or timeout. Returns 0 on success.
44	sendto	RDI: int sockfd RSI: const void *buf RDX: size_t len R10: int flags R8: const struct sockaddr *dest_addr R9: socklen_t addrlen	ssize_t (bytes sent or -1)	Sends message on socket to specific address. For UDP sockets. Use send() for connected sockets. Returns number of bytes sent.
45	recvfrom	RDI: int sockfd RSI: void *buf RDX: size_t len R10: int flags R8: struct sockaddr *src_addr R9: socklen_t *addrlen	ssize_t (bytes received or -1)	Receives message from socket and captures sender's address. For UDP. Returns number of bytes received (0 on orderly shutdown for stream sockets).
201	time	RDI: time_t *tloc	time_t (seconds since epoch)	Returns current time as seconds since Unix epoch (Jan 1, 1970). If tloc is non-NULL, also stores there. Simple but low precision (1 second).
96	gettimeofday	RDI: struct timeval *tv RSI: struct timezone *tz	int (0 or -1)	Gets current time with microsecond precision. tv contains seconds and microseconds since epoch. tz is obsolete (pass NULL). Returns 0 on success.
35	nanosleep	RDI: const struct timespec *req RSI: struct timespec *rem	int (0 or -1)	Suspends execution for time specified in req (seconds + nanoseconds). If interrupted by signal, remaining time stored in rem. Returns 0 on success.
228	clock_gettime	RDI: clockid_t clk_id RSI: struct timespec *tp	int (0 or -1)	Retrieves time from specified clock. clk_id: CLOCK_REALTIME(0)=wall clock, CLOCK_MONOTONIC(1)=monotonic time. Nanosecond precision. Returns 0 on success.
83	mkdir	RDI: const char *pathname RSI: mode_t mode	int (0 or -1)	Creates directory specified by pathname. Mode specifies permissions (e.g., 0755). Returns 0 on success, -1 if already exists or permission denied.
84	rmdir	RDI: const char *pathname	int (0 or -1)	Removes empty directory. Returns -1 if directory not empty or doesn't exist. Returns 0 on success.
87	unlink	RDI: const char *pathname	int (0 or -1)	Deletes file specified by pathname. Decrements link count; if 0 and no process has file open, file is deleted. Returns 0 on success.
82	rename	RDI: const char *oldpath RSI: const char *newpath	int (0 or -1)	Renames/moves file from oldpath to newpath. Atomic operation. If newpath exists, it's replaced. Returns 0 on success.
90	chmod	RDI: const char *pathname RSI: mode_t mode	int (0 or -1)	Changes file permissions. Mode is octal like 0644 (rw-r--r--) or 0755 (rwxr-xr-x). Returns 0 on success. Only owner or root can change permissions.
92	chown	RDI: const char *pathname RSI: uid_t owner RDX: gid_t group	int (0 or -1)	Changes file owner and/or group. Pass -1 to leave unchanged. Only root can change owner. File owner can change group to one they belong to. Returns 0 on success.
80	chdir	RDI: const char *path	int (0 or -1)	Changes current working directory to path. Affects relative path resolution. Returns 0 on success. Each process has its own working directory.
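79	getcwd	RDI: char *buf RSI: size_t size	long (length including the NUL, or negative error)	Copies current working directory into buf (max size bytes). The raw syscall returns the length of the path including its terminating NUL; the libc getcwd() wrapper returns buf on success and NULL on error. Size must be large enough for full path.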
21	access	RDI: const char *pathname RSI: int mode	int (0 or -1)	Checks whether calling process can access file. Mode: F_OK(0)=exists, R_OK(4)=read, W_OK(2)=write, X_OK(1)=execute. Returns 0 if permitted.
86	link	RDI: const char *oldpath RSI: const char *newpath	int (0 or -1)	Creates hard link named newpath to existing file oldpath. Both names refer to same inode. Deleting one doesn't affect the other. Returns 0 on success.
88	symlink	RDI: const char *target RSI: const char *linkpath	int (0 or -1)	Creates symbolic link named linkpath containing string target. Symlink can point to non-existent files and cross filesystems. Returns 0 on success.
89	readlink	RDI: const char *pathname RSI: char *buf RDX: size_t bufsiz	ssize_t (bytes copied or -1)	Reads value of symbolic link pathname into buf. Does not null-terminate. Returns number of bytes placed in buf, -1 on error.
10	mprotect	RDI: void *addr RSI: size_t len RDX: int prot	int (0 or -1)	Changes memory protection for page(s) starting at addr. Prot: PROT_NONE(0), PROT_READ(1), PROT_WRITE(2), PROT_EXEC(4). Used for security and exploit mitigation. Returns 0 on success.
28	madvise	RDI: void *addr RSI: size_t length RDX: int advice	int (0 or -1)	Gives kernel advice about memory usage. Advice: MADV_NORMAL(0), MADV_SEQUENTIAL(2), MADV_DONTNEED(4). Hints for performance optimization. Returns 0 on success.
56	clone	RDI: unsigned long flags RSI: void *child_stack RDX: int *ptid R10: int *ctid R8: unsigned long newtls	pid_t (child PID or -1)	Creates new process/thread. More flexible than fork(). Flags control what is shared (memory, files, etc.). Used to implement threads. Returns child PID in parent, 0 in child.
231	exit_group	RDI: int status	void (never returns)	Terminates all threads in calling process's thread group. Like exit(), but affects all threads. Used by exit() in threaded programs. Never returns.
157	prctl	RDI: int option RSI: unsigned long arg2 RDX: unsigned long arg3 R10: unsigned long arg4 R8: unsigned long arg5	int (varies)	Process control operations. Options include PR_SET_NAME (set process name), PR_SET_DUMPABLE, PR_SET_SECCOMP. Highly versatile. Return value depends on option.
37	alarm	RDI: unsigned int seconds	unsigned int (previous alarm)	Arranges for SIGALRM to be delivered in seconds. Pass 0 to cancel. Returns number of seconds remaining from previous alarm. Only one alarm can be scheduled.
72	fcntl	RDI: unsigned int fd RSI: unsigned int cmd RDX: unsigned long arg	int (varies)	Performs various operations on file descriptor. Cmd: F_GETFL(3)=get flags, F_SETFL(4)=set flags, F_DUPFD(0)=duplicate. Returns depend on cmd.
76	truncate	RDI: const char *path RSI: off_t length	int (0 or -1)	Sets the file's size to length. If the file was longer, the extra data is discarded; if it was shorter, it is extended with null bytes. Returns 0 on success.
74	fsync	RDI: unsigned int fd	int (0 or -1)	Synchronizes file's in-memory state with storage device (flushes all modified data and metadata). Returns 0 when data is safely on disk. Critical for data integrity.
14	rt_sigprocmask	RDI: int how RSI: const sigset_t *set RDX: sigset_t *oldset R10: size_t sigsetsize	int (0 or -1)	Examines and changes blocked signals. How: SIG_BLOCK(0)=add, SIG_UNBLOCK(1)=remove, SIG_SETMASK(2)=replace. Returns 0 on success. Used to protect critical sections.
131	sigaltstack	RDI: const stack_t *ss RSI: stack_t *old_ss	int (0 or -1)	Sets or gets alternate signal stack. Used when main stack is compromised (stack overflow). Returns 0 on success. Important for robust signal handling.
101	ptrace	RDI: long request RSI: pid_t pid RDX: void *addr R10: void *data	long (varies)	Process trace and debug. Allows parent to control child execution, read/write memory and registers. Used by debuggers (GDB). Powerful and dangerous. Returns vary by request.
169	reboot	RDI: int magic RSI: int magic2 RDX: int cmd R10: void *arg	int (never on success)	Reboots or halts system. Requires CAP_SYS_BOOT capability (root). Cmd: LINUX_REBOOT_CMD_RESTART, HALT, POWER_OFF. For emergency use. Returns only on error.
23	select	RDI: int nfds RSI: fd_set *readfds RDX: fd_set *writefds R10: fd_set *exceptfds R8: struct timeval *timeout	int (ready fds or -1)	Monitors multiple file descriptors for I/O readiness. Returns when fd is ready or timeout. Used for non-blocking I/O multiplexing. Returns number of ready fds.
7	poll	RDI: struct pollfd *fds RSI: nfds_t nfds RDX: int timeout	int (ready fds or -1)	Like select but better API. Monitors file descriptors for events (POLLIN, POLLOUT, POLLERR). Timeout in milliseconds (-1=infinite). Returns number of ready fds.
213	epoll_create	RDI: int size	int (epoll fd or -1)	Creates epoll instance for scalable I/O event notification. More efficient than select/poll for many file descriptors. Size is ignored (kept for compatibility). Returns epoll fd.

Complete Reference

This table covers 66 commonly used Linux x86-64 system calls with their arguments and descriptions. For the complete list of 300+ syscalls, run man syscalls or check the kernel header unistd_64.h (/usr/include/asm/unistd_64.h, or /usr/include/x86_64-linux-gnu/asm/unistd_64.h on Debian and Ubuntu). You can also use ausyscall --dump if auditd is installed.
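Rather than copying numbers by hand, a C program can take them from the SYS_* macros in sys/syscall.h, so they always match the machine you compile on. A minimal sketch, assuming Linux on x86-64; syscall_number and its lookup table are hypothetical helpers covering only a few rows.

```c
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

/* A few rows of the table, with numbers taken from the system headers. */
static const struct { const char *name; long nr; } known_syscalls[] = {
    { "read",       SYS_read       },
    { "write",      SYS_write      },
    { "open",       SYS_open       },
    { "mmap",       SYS_mmap       },
    { "execve",     SYS_execve     },
    { "exit",       SYS_exit       },
    { "exit_group", SYS_exit_group },
};

/* Returns the syscall number for name, or -1 if it is not in the list. */
long syscall_number(const char *name) {
    for (size_t i = 0; i < sizeof known_syscalls / sizeof known_syscalls[0]; i++)
        if (strcmp(known_syscalls[i].name, name) == 0)
            return known_syscalls[i].nr;
    return -1;
}
```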

Complete Syscall Example: Write to Console

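Let's write a complete program that uses syscalls to print "Hello World". Assuming NASM and the GNU linker, save it as hello.asm, then build it with nasm -f elf64 hello.asm -o hello.o followed by ld hello.o -o hello: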

Assembly - Complete "Hello World" using syscalls
section .data
msg:        db "Hello World", 0x0a
len:        equ $ - msg        ; Calculate length

section .text
global _start

_start:
    ; write syscall to print message
    mov rax, 1         ; syscall: write
    mov rdi, 1         ; fd: stdout
    mov rsi, msg       ; buffer
    mov rdx, len       ; length
    syscall

    ; exit syscall
    mov rax, 60        ; syscall: exit
    mov rdi, 0         ; status: 0 (success)
    syscall

Syscall Return Values & Error Handling

After a syscall, the kernel returns a value in RAX:

If RAX ≥ 0: Success, RAX contains the result

If RAX < 0: Error occurred, RAX contains negative error code

Error codes are typically in the range -1 to -4095. Common errors:

Error Code	Constant	Meaning
-1	EPERM	Operation not permitted
-2	ENOENT	No such file or directory
-13	EACCES	Permission denied
-14	EFAULT	Bad address

Example: Error Handling with Syscalls

mov rax, 2         ; open syscall
mov rdi, filename  ; pointer to a NUL-terminated path (label defined in .data)
mov rsi, 0         ; flags = O_RDONLY
syscall

; Check for an error (RAX < 0)
cmp rax, 0
jl error_handler   ; Jump if less (signed: RAX is negative)

; File opened successfully, RAX contains the FD
mov rbx, rax       ; Save FD in RBX
jmp continue

error_handler:
    ; RAX contains a negative error code, e.g. -2 (ENOENT)
    neg rax            ; Convert to the positive errno value
    ; Now RAX = positive error code, e.g. 2

continue:
    ; program continues here
                
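The difference between the raw kernel return value and the C library's convention is easy to see from C. This sketch assumes x86-64 Linux and GCC or Clang (for the inline assembly); raw_syscall3, raw_open and wrapped_open_errno are hypothetical helper names. raw_open executes SYSCALL directly, so the negative error code comes back untouched, while glibc's syscall() wrapper returns -1 and puts the positive code in errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* x86-64 Linux only: issue a 3-argument syscall with the SYSCALL
   instruction, so RAX is returned exactly as the kernel left it. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                              /* result in RAX */
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)      /* RAX, RDI, RSI, RDX */
                      : "rcx", "r11", "memory");                /* SYSCALL clobbers RCX, R11 */
    return ret;
}

/* Raw open(path, O_RDONLY): an fd (>= 0, caller must close it) or -errno. */
long raw_open(const char *path)
{
    return raw_syscall3(SYS_open, (long)path, 0 /* O_RDONLY */, 0);
}

/* Same request through glibc's syscall(): -1 on failure, code in errno.
   Returns the errno value on failure, 0 on success. */
int wrapped_open_errno(const char *path)
{
    errno = 0;
    long r = syscall(SYS_open, path, 0 /* O_RDONLY */);
    if (r >= 0)
        close((int)r);
    return r == -1 ? errno : 0;
}
```

For a missing file, raw_open returns -2 (-ENOENT), the value the error handler above would NEG into 2, while wrapped_open_errno reports errno 2 (ENOENT).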

✓ System Calls Mastered!

You now understand how user programs communicate with the kernel. This is essential for writing assembly programs and understanding low-level program behavior.

Assembly Language - Complete Tutorial

Assembly language is the lowest-level programming language that directly corresponds to machine instructions. Each assembly instruction performs one CPU operation. To reverse engineer binaries, you must be able to read and understand assembly.

What is Assembly?

Assembly is a symbolic representation of machine code. Instead of writing binary (1s and 0s), you write mnemonics like MOV, ADD, JMP that are more readable. An assembler converts this into machine code.

Assembly vs Machine Code

Assembly: mov rax, 1

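Machine code (hex): 48 c7 c0 01 00 00 00 (one valid encoding; NASM optimizes this instruction to the shorter b8 01 00 00 00, i.e. mov eax, 1, which has the same effect)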

They're the same instruction, just different representations.

Instruction Format

Most assembly instructions follow this format:

OPCODE destination, source

Example: MOV RAX, RBX

OPCODE: MOV

destination: RAX (where to put the result)

source: RBX (where to get the data)

Important: In Intel syntax (which we use), the destination comes first, then the source. AT&T syntax (the default in GDB and objdump) uses the opposite order: source first, then destination.

Core Assembly Instructions

📌 MOV - Move/Copy Data

MOV destination, source copies data from source to destination.

Syntax

mov rax, 1 - Put 1 into RAX
mov rax, rbx - Copy RBX into RAX
mov rax, [rbx] - Load value from memory address in RBX into RAX

Important Notes:

Cannot move between two memory addresses directly

Does NOT affect flags

32-bit register writes zero the upper 32 bits (e.g., mov eax, 1 zeros RAX[63:32]); 8-bit and 16-bit writes (mov al, ... / mov ax, ...) leave the upper bits unchanged

Square brackets [] indicate memory address dereferencing

MOV Examples
mov rax, 5             ; RAX = 5
mov rbx, rax           ; RBX = RAX (which is 5)
mov rcx, [rdx]         ; RCX = 8-byte value at the address in RDX (RDX must hold a valid pointer)
mov qword [rdx], 10    ; Store 10 as 8 bytes at the address in RDX (size keyword required)
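The register-width rules in the notes above can be modeled in C. These hypothetical helpers are a sketch for illustration, not how the CPU is implemented: each one computes the full 64-bit RAX value after a 32-, 16- or 8-bit MOV of an immediate.

```c
#include <stdint.h>

/* mov eax, imm: the 32-bit write zero-extends into all of RAX. */
uint64_t after_mov_eax(uint64_t rax, uint32_t imm)
{
    (void)rax;                                 /* old upper bits are discarded */
    return (uint64_t)imm;
}

/* mov ax, imm: only the low 16 bits change; the upper 48 bits are kept. */
uint64_t after_mov_ax(uint64_t rax, uint16_t imm)
{
    return (rax & ~UINT64_C(0xFFFF)) | imm;
}

/* mov al, imm: only the low 8 bits change; the upper 56 bits are kept. */
uint64_t after_mov_al(uint64_t rax, uint8_t imm)
{
    return (rax & ~UINT64_C(0xFF)) | imm;
}
```

Starting from RAX = 0xFFFFFFFFFFFFFFFF, mov eax, 1 leaves 0x1, while mov al, 1 leaves 0xFFFFFFFFFFFFFF01. The zero-extension rule is also why compilers write xor eax, eax to clear all of RAX.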

📌 ADD - Addition

ADD destination, source adds source to destination: destination = destination + source

Syntax

add rax, 5 - RAX = RAX + 5
add rax, rbx - RAX = RAX + RBX

Flags affected: ZF, CF, SF, OF (also PF and AF)

ADD Examples
mov rax, 10           ; RAX = 10
mov rbx, 20           ; RBX = 20
add rax, rbx           ; RAX = 30 (10 + 20)
add rax, 5             ; RAX = 35 (30 + 5)

📌 SUB - Subtraction

SUB destination, source subtracts source from destination: destination = destination - source

Syntax

sub rax, 5 - RAX = RAX - 5
sub rax, rbx - RAX = RAX - RBX

Flags affected: ZF, CF, SF, OF (also PF and AF)

SUB Examples
mov rax, 50           ; RAX = 50
mov rbx, 30           ; RBX = 30
sub rax, rbx           ; RAX = 20 (50 - 30)

📌 CMP - Compare

CMP destination, source compares two values by subtracting source from destination and setting flags (but doesn't store the result).

Syntax

cmp rax, rbx - Compare RAX with RBX
cmp rax, 10 - Compare RAX with 10

What CMP does: Performs RAX - RBX, sets flags, discards the result

Why use CMP: To check if two values are equal, which is greater, etc.

If RAX == RBX: ZF = 1 (zero flag set)

If RAX != RBX: ZF = 0

If RAX < RBX (unsigned): CF = 1 (carry/borrow)

If RAX ≥ RBX (unsigned): CF = 0

If RAX < RBX (signed): SF ≠ OF

CMP Examples
mov rax, 10
cmp rax, 10            ; Compare RAX with 10
je equal               ; Jump if Equal (checks ZF)

mov rax, 5
cmp rax, 10            ; 5 < 10
jl less_than           ; Jump if Less
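To make these flag rules concrete, here is a C model of what CMP computes for two 64-bit operands. It is a sketch for illustration and the struct and function names are hypothetical: it performs the subtraction with wraparound like the CPU, derives ZF, CF, SF and OF, and then evaluates the conditions that JE, JL and JB test.

```c
#include <stdint.h>

/* Flags that CMP a, b sets. The CPU computes a - b and discards the result. */
struct cmp_flags {
    int zf;   /* zero flag: result == 0 */
    int cf;   /* carry flag: unsigned borrow, i.e. a < b as unsigned */
    int sf;   /* sign flag: top bit of the result */
    int of;   /* overflow flag: signed overflow of a - b */
};

struct cmp_flags cmp64(uint64_t a, uint64_t b)
{
    uint64_t r = a - b;                  /* wraps modulo 2^64, like the CPU */
    struct cmp_flags f;
    f.zf = (r == 0);
    f.cf = (a < b);
    f.sf = (int)(r >> 63);
    /* overflow: operands have different signs and the result's sign differs from a */
    f.of = (int)((((a ^ b) & (a ^ r)) >> 63) & 1);
    return f;
}

int je_taken(struct cmp_flags f) { return f.zf; }            /* equal */
int jl_taken(struct cmp_flags f) { return f.sf != f.of; }    /* signed less */
int jb_taken(struct cmp_flags f) { return f.cf; }            /* unsigned below */
```

cmp64((uint64_t)-1, 1) shows why signed and unsigned jumps differ: as signed values -1 < 1, so JL jumps, but as unsigned values 0xFFFFFFFFFFFFFFFF is huge, so JB does not.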

📌 TEST - Logical AND

TEST destination, source performs a bitwise AND but only affects flags (doesn't store result).

Common Usage

test rax, rax - Check if RAX is zero

Why: TEST rax, rax sets ZF=1 if RAX==0, ZF=0 if RAX!=0. It does the same job as CMP rax, 0 with a shorter encoding (no immediate operand), which is why compilers prefer it.

TEST Example - Check if value is zero
mov rax, 0
test rax, rax          ; ZF = 1 (result is zero)
jz zero_case           ; Jump if Zero

📌 JMP - Unconditional Jump

JMP label unconditionally jumps to a label (changes program flow).

Syntax

jmp loop_start - Jump to loop_start label
jmp 0x08048400 - Jump to address 0x08048400

What it does: Sets RIP to the target address, so the next instruction executed is at the jump target.

JMP Example
mov rax, 1
jmp skip              ; Skip next instruction
mov rax, 2             ; This is never executed
skip:
mov rbx, rax           ; RBX = 1 (not 2)

📌 Conditional Jumps (JE, JNE, JZ, JG, JL, etc.)

Conditional jumps jump only if certain flags are set. They check the result of a previous CMP or TEST instruction.

Instruction	Condition	Jumps When
JE / JZ	Jump if Equal / Jump if Zero	ZF = 1 (result was zero)
JNE / JNZ	Jump if Not Equal / Jump if Not Zero	ZF = 0 (result was non-zero)
JG	Jump if Greater	Destination > Source (signed): ZF = 0 and SF = OF
JL	Jump if Less	Destination < Source (signed): SF ≠ OF
JGE	Jump if Greater or Equal	Destination ≥ Source (signed): SF = OF
JLE	Jump if Less or Equal	Destination ≤ Source (signed): ZF = 1 or SF ≠ OF
JA	Jump if Above	Destination > Source (unsigned): CF = 0 and ZF = 0
JB / JC	Jump if Below / Jump if Carry	Destination < Source (unsigned) / CF = 1
JAE / JNC	Jump if Above or Equal / Jump if No Carry	Destination ≥ Source (unsigned) / CF = 0
JBE	Jump if Below or Equal	Destination ≤ Source (unsigned): CF = 1 or ZF = 1
JO	Jump if Overflow	OF = 1 (overflow occurred)
JNO	Jump if No Overflow	OF = 0

Conditional Jump Examples
mov rax, 10
cmp rax, 20
jl less_than           ; Jump if Less (10 < 20) - WILL jump
mov rbx, 1
jmp done

less_than:
mov rbx, 0
done:

📌 CALL & RET - Function Calls

CALL label calls a function. RET returns from a function.

How CALL Works

1. Push the return address (the address of the instruction after CALL) onto the stack
2. Jump to function label
3. Function executes
4. RET pops return address from stack and jumps back

CALL and RET Example
global _start
_start:
    call my_function       ; Call function
    ; After function returns, we continue here
    mov rax, 60
    mov rdi, 0
    syscall

my_function:
    ; Function code here
    mov rax, 1            ; Put return value in RAX
    ret                   ; Return to caller

📌 SYSCALL - System Call

SYSCALL transitions to kernel mode and executes a system call.

Setup Before SYSCALL

RAX = syscall number
RDI, RSI, RDX, R10, R8, R9 = arguments
After SYSCALL: RAX = return value; RCX and R11 are overwritten (the CPU saves the return RIP and RFLAGS in them)

SYSCALL Example - exit
mov rax, 60           ; exit syscall number
mov rdi, 0            ; exit code = 0
syscall               ; Call kernel

Memory Operations - Loading & Storing

To access memory, use square brackets []:

Memory Operations
mov rax, [rbx]         ; Load 8 bytes from address in RBX
mov qword [rax], 100   ; Store 100 (8 bytes) at address in RAX (size keyword required)
mov rcx, [rax + 8]     ; Load from address (RAX + 8)
mov [rbx - 16], rax    ; Store to address (RBX - 16)

Stack Operations - PUSH & POP

The stack is a Last-In-First-Out (LIFO) data structure. PUSH and POP manage it:

📌 PUSH - Push Value onto Stack

PUSH source decrements RSP by 8 and writes the value to [RSP].

PUSH Example
mov rax, 123
push rax               ; RSP decreases by 8, [RSP] = 123

📌 POP - Pop Value from Stack

POP destination reads the value at [RSP] into the destination, then increments RSP by 8.

POP Example
pop rbx                ; RBX = [RSP], RSP increases by 8
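To see the RSP arithmetic behind PUSH and POP, here is a toy C model. The names are hypothetical, and it simulates the stack in an array rather than touching real process memory: RSP is kept as a byte offset that starts at the top and moves down by 8 on every push.

```c
#include <stdint.h>

#define STACK_SLOTS 64                    /* 64 slots of 8 bytes each */

struct toy_stack {
    uint64_t mem[STACK_SLOTS];
    uint64_t rsp;                         /* byte offset into mem; grows downward */
};

void toy_init(struct toy_stack *s) { s->rsp = STACK_SLOTS * 8; }   /* empty: RSP at the top */

/* PUSH: RSP -= 8, then [RSP] = value. No overflow checks in this toy model. */
void toy_push(struct toy_stack *s, uint64_t value)
{
    s->rsp -= 8;
    s->mem[s->rsp / 8] = value;
}

/* POP: value = [RSP], then RSP += 8. No underflow checks in this toy model. */
uint64_t toy_pop(struct toy_stack *s)
{
    uint64_t value = s->mem[s->rsp / 8];
    s->rsp += 8;
    return value;
}
```

Pushing 123 and then 456 and popping twice returns 456 first, then 123 (LIFO), and RSP ends where it started.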

Complete Assembly Program Example

Let's combine everything into a complete program:

Complete Assembly - Count to 10 using a loop
section .text
global _start

_start:
    mov rax, 0             ; Counter = 0

loop_start:                ; ("loop" is an instruction name, so NASM won't accept it as a label)
    ; (Printing each number is omitted for brevity)
    add rax, 1             ; Increment counter
    cmp rax, 10            ; Compare with 10
    jl loop_start          ; Jump if Less - repeat loop

    ; Exit
    mov rax, 60
    mov rdi, 0
    syscall

✓ Assembly Language Basics Mastered!

You now understand the fundamental instructions that make up all programs. These instructions are the building blocks of everything in reverse engineering.

ℹ️ Continue Learning

In the next sections, we'll learn how to structure assembly code, use assemblers and linkers, debug with GDB, and analyze binaries with professional tools.

Assembly Code Structure

Every assembly program has a specific structure with defined sections for code and data. Understanding this structure is essential for writing and analyzing assembly.

The Three Main Sections

📌 .text Section - Executable Code

The .text section contains all executable code — the assembly instructions that the CPU actually runs.

Key Points

Read-only (usually)
Loaded at program start
Contains functions, loops, logic
Program execution starts at _start label

📌 .data Section - Initialized Data

The .data section contains data with known values — strings, constants, arrays, etc.

.data Section Example
section .data
    msg:    db "Hello World", 0x0a
    num:    dq 12345
    array:  dd 1, 2, 3, 4, 5

Directive	Size	Meaning
db	1 byte	Define Byte
dw	2 bytes	Define Word
dd	4 bytes	Define Double-word
dq	8 bytes	Define Quad-word
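Reverse engineers usually meet this data as raw bytes in a hex dump. As a sketch, assuming an x86-64 (little-endian) target, here is how C stores the same dd and dq values; the variable names mirror the NASM example, and dump_data is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

static const uint32_t array[5] = {1, 2, 3, 4, 5};   /* array: dd 1, 2, 3, 4, 5 */
static const uint64_t num = 12345;                  /* num:   dq 12345 (0x3039) */

/* Copies the raw in-memory bytes of `array` (20 bytes) followed by
   `num` (8 bytes) into `out`, exactly as they sit in memory. */
void dump_data(unsigned char out[28])
{
    memcpy(out, array, sizeof array);     /* 5 values * 4 bytes */
    memcpy(out + 20, &num, sizeof num);   /* 8 bytes */
}
```

The first dd value appears as 01 00 00 00, and dq 12345 appears as 39 30 00 00 00 00 00 00: least significant byte first, which is what "LSB" means in the output of file.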

📌 .bss Section - Uninitialized Data

The .bss section reserves space for variables without initial values (like buffers, arrays, etc.)

.bss Section Example
section .bss
    buffer:     resb 256    ; Reserve 256 bytes (uninitialized)
    array:      resq 10     ; Reserve space for 10 quad-words

resb: Reserve bytes

resw: Reserve words (2 bytes each)

resd: Reserve double-words (4 bytes each)

resq: Reserve quad-words (8 bytes each)
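For comparison, a C compiler puts a static array without an initializer into .bss, the equivalent of buffer: resb 256. A minimal sketch (the names bss_buffer and bss_is_zeroed are hypothetical):

```c
#include <stddef.h>

/* No initializer, static storage: the compiler places this in .bss.
   The object file records only its size (256 bytes), not its contents. */
static char bss_buffer[256];

/* Returns 1 if every byte of the .bss buffer is zero, as the loader provides it. */
int bss_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof bss_buffer; i++)
        if (bss_buffer[i] != 0)
            return 0;
    return 1;
}
```

Compiled with gcc -O0 -c, readelf -S on the object file should list .bss with type NOBITS: the section occupies no space in the file, and the loader maps zero-filled memory for it at startup, which is why the check returns 1.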

Complete Program Structure

Complete Assembly Program Structure
; ============================================
; DATA SECTION - Initialized data
; ============================================
section .data
    msg:        db "Hello", 0x0a
    msg_len:    equ $ - msg

; ============================================
; BSS SECTION - Uninitialized data (buffers)
; ============================================
section .bss
    buffer:     resb 1024

; ============================================
; TEXT SECTION - Code (executable)
; ============================================
section .text
global _start

_start:
    ; Program entry point
    mov rax, 1             ; write syscall
    mov rdi, 1             ; fd = stdout
    mov rsi, msg           ; buffer = msg
    mov rdx, msg_len       ; count = length
    syscall

    ; Exit cleanly
    mov rax, 60
    mov rdi, 0
    syscall

Global Symbols & Labels

In assembly, you can define:

Labels: Mark positions in code (used for jumps)

Global symbols: Labels declared with global, so the linker and other object files can see them (e.g., the _start entry point)

Labels and Symbols
global _start            ; _start is accessible from outside

section .text

_start:                 ; Program entry (global symbol)
    call my_function

    mov rax, 60
    mov rdi, 0             ; exit status
    syscall

my_function:            ; Ordinary label (not declared global)
    mov rax, 1
    ret

loop_start:             ; Another label
    add rcx, 1
    cmp rcx, 10
    jl loop_start             ; Jump back to label

Symbol Definition with EQU

Use EQU to define constants:

Using EQU for Constants
section .data
    msg:        db "Hello World", 0x0a
    msg_len:    equ $ - msg    ; $ = current position, msg_len = length

section .text
global _start
BUFFER_SIZE equ 256

_start:
    sub rsp, BUFFER_SIZE   ; Allocate space on stack

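Key insight: $ is the current assembly position (right after the string), so $ - msg is the distance from msg to the end of the string, i.e. its length in bytes. EQU constants take up no space in the binary.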

✓ Program Structure Understood!

Now you can organize assembly programs properly with code and data sections.

Assembler, Compiler, Linker & ELF Format

To run an assembly program, you need to convert it from assembly language to machine code. This involves the assembler, linker, and understanding the ELF binary format.

The Compilation Pipeline

SOURCE CODE
program.asm (Assembly) or program.c (C/C++)

↓ NASM / GCC

PREPROCESSOR (C/C++ only)
• Expands #include directives
• Processes #define macros
• Handles conditional compilation (#ifdef)
Output: program.i (preprocessed source)

↓ Compiler (C: program.i → program.s) or Assembler

ASSEMBLER (NASM)
• Converts assembly to machine code
• Generates symbol table
• Creates relocation entries
Command: nasm -f elf64 program.asm -o program.o
Output: program.o (object file)

↓ Linker (LD)

LINKER
• Combines multiple object files
• Resolves external symbol references
• Assigns final memory addresses
• Links against libraries (libc, etc.)
• Sets entry point (_start)
Command: ld program.o -o program
Output: program (executable ELF binary)

↓ ./program

EXECUTABLE BINARY
• ELF64 executable format
• Contains machine code
• Ready to execute
Run: ./program

Quick Reference:

Assembly: nasm -f elf64 prog.asm -o prog.o && ld prog.o -o prog
C: gcc prog.c -o prog (all steps combined)
C (manual): gcc -E prog.c > prog.i → gcc -S prog.i → gcc -c prog.s → ld ...

Step 1: Assembly → Machine Code (NASM)

NASM (Netwide Assembler) converts assembly source code to machine code object files.

NASM Command - Assemble
nasm -f elf64 program.asm -o program.o

NASM Options Explained

-f elf64: Output format = ELF64 (64-bit ELF object file for Linux)

program.asm: Input assembly file

-o program.o: Output object file

What happens:

NASM parses your .asm file

Converts each instruction to machine code bytes

Creates relocatable code (with placeholder addresses)

Produces program.o (object file)

⚠️ Important

The object file (.o) is NOT yet executable. It contains machine code but references to external symbols aren't resolved. We need the linker.

Step 2: Linking (LD)

The linker (ld) combines object files and resolves all symbol references to create the final executable.

Linker Command - Link Object File
ld program.o -o program

What the Linker Does

Combines all object files into one

Resolves symbol references (labels, external functions)

Assigns final memory addresses

Sets up entry point (_start symbol)

Creates the final executable binary

Complete Assembly Workflow Example

Complete Workflow - Shell Commands
# Step 1: Create assembly file
cat > program.asm << 'EOF'
section .text
global _start
_start:
    mov rax, 60
    mov rdi, 0
    syscall
EOF

# Step 2: Assemble (ASM → Object)
nasm -f elf64 program.asm -o program.o

# Step 3: Link (Object → Executable)
ld program.o -o program

# Step 4: Run
./program
echo $?                # Exit code: 0 (success)

Understanding ELF64 Format

ELF (Executable and Linkable Format) is the standard binary format for Linux. All executables, libraries, and object files use this format.

ELF Structure

An ELF file contains:

ELF Header: Magic number (identifies as ELF), architecture, entry point

Program Headers: How to load the file into memory

Sections: .text (code), .data (data), .bss (uninit data), .symtab (symbols), etc.

Section Headers: Describe each section's location and properties
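To see the ELF header fields directly, here is a minimal C sketch that uses the Elf64_Ehdr definitions from glibc's <elf.h>. It assumes Linux and a 64-bit little-endian ELF file; read_elf64_header is a hypothetical name. It checks the magic number, class and byte order, the same facts file and readelf -h summarize.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Reads the 64-byte ELF64 header at the start of `path` into *out.
   Returns 1 if the file starts with the ELF magic and is 64-bit
   little-endian (what `file` reports as "ELF 64-bit LSB"), else 0. */
int read_elf64_header(const char *path, Elf64_Ehdr *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t got = fread(out, 1, sizeof *out, f);
    fclose(f);
    if (got != sizeof *out)
        return 0;

    return memcmp(out->e_ident, ELFMAG, SELFMAG) == 0   /* 7f 45 4c 46 = "\x7f" "ELF" */
        && out->e_ident[EI_CLASS] == ELFCLASS64          /* 64-bit */
        && out->e_ident[EI_DATA] == ELFDATA2LSB;         /* little-endian (LSB) */
}
```

Called on /proc/self/exe (the running program itself), it returns 1. On x86-64, e_machine is EM_X86_64; e_type is ET_EXEC for a classic executable or ET_DYN for a PIE, and e_entry holds the entry point address.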

View ELF Header with `file`
$ file program
program: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

Meaning of each part:

Term	Meaning
64-bit	64-bit architecture (x86-64)
LSB	Little-Endian Byte Order (least significant byte first)
executable	Can be directly run as a program
x86-64	Intel x86-64 instruction set
statically linked	All library code linked into the executable (no external .so dependencies)
not stripped	Symbol table intact (function names visible)

Viewing ELF Structure with readelf & objdump

📌 readelf - Display ELF Information

readelf displays detailed information about ELF files.

Common readelf Commands
readelf -h program        # Show ELF header
readelf -S program        # Show section headers
readelf -l program        # Show program headers
readelf -s program        # Show symbol table
readelf -d program        # Show dynamic section

📌 objdump - Disassemble Binaries

objdump displays disassembled machine code and detailed binary information.

Common objdump Commands
objdump -d program              # Disassemble executable sections
objdump -M intel -d program      # Disassemble in Intel syntax
objdump -s program               # Show all sections (hex dump)
objdump -t program               # Show symbol table
objdump -h program               # Show section headers

✓ Build Pipeline Mastered!

You can now write assembly, assemble it with NASM, link with LD, and understand the ELF binary format.

Reverse Engineering with GDB

GDB (GNU Debugger) is the industry-standard debugger for Linux. It lets you execute programs step-by-step, inspect memory and registers, and understand exactly what code is doing.

GDB Installation

Install GDB
# Ubuntu/Debian
sudo apt-get install gdb

# Fedora/RHEL
sudo dnf install gdb

# macOS
brew install gdb

Starting GDB

Launch GDB with a Binary
gdb ./program
gdb -q ./program        # Quiet mode (no banner)

Essential GDB Commands

📌 run - Execute the Program

run [args] starts the program with optional command-line arguments.

run Command
(gdb) run
(gdb) run arg1 arg2     ; Pass arguments
(gdb) run < input.txt   ; Redirect stdin

📌 break - Set Breakpoints

break location sets a breakpoint, pausing execution at that point.

break Command Examples
(gdb) break main           ; Break at function main
(gdb) break *0x08048400 ; Break at address (the * is required for a raw address)
(gdb) info break        ; List all breakpoints
(gdb) delete 1          ; Delete breakpoint 1
(gdb) disable 1         ; Disable (don't remove) breakpoint 1
(gdb) enable 1          ; Re-enable breakpoint 1

📌 continue - Resume Execution

continue (or c) resumes execution until the next breakpoint.

continue Command
(gdb) continue
(gdb) c                 ; Shorthand

📌 info functions - List Functions

info functions shows all functions in the binary.

info functions Example
(gdb) info functions
All defined functions:

File program.c:
void check_password(char*);
int main();
void hidden_function();

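Why useful: Quickly find all named functions in a binary. In a stripped binary only imported (dynamic) symbols remain, so info functions shows little and you have to locate code from the entry point or with a disassembler instead.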

📌 info registers - Show Register Values

info registers (or i r) displays current register values.

info registers Output
(gdb) info registers
rax            0x1                 1
rbx            0x0                 0
rcx            0x7ffffffde7f8      140737488281592
rdx            0x7ffffffde8f8      140737488282360
rsi            0x7ffffffde8e8      140737488282344
rdi            0x1                 1
rbp            0x7ffffffde820      0x7ffffffde820
rsp            0x7ffffffde800      0x7ffffffde800
rip            0x401000            0x401000 <_start>

View Specific Register
(gdb) print $rax         ; Print RAX in decimal
(gdb) print/x $rax    ; Print RAX in hex
(gdb) print/d $rax    ; Print RAX in decimal
(gdb) x/s $rsi        ; Print the string RSI points to (print/s $rsi would just print the number)

📌 disassemble - Show Assembly Code

disassemble function shows assembly code for a function.

disassemble Command
(gdb) disassemble main    ; Disassemble main function
(gdb) disassemble      ; Disassemble current function
(gdb) disassemble 0x401000,0x401050  ; Range of addresses (start,end)

⚠️ Syntax Format

By default, GDB uses AT&T syntax. Switch to Intel syntax (more readable):

(gdb) set disassembly-flavor intel
(gdb) disassemble main  ; Now shows Intel syntax

📌 x - Examine Memory

x [/format] address displays memory at an address.

x Command Examples
(gdb) x/10gx $rsp        ; View 10 8-byte (giant) values in hex at RSP
(gdb) x/10w $rbp         ; View 10 words (4 bytes) at RBP
(gdb) x/20b $rax         ; View 20 bytes at RAX
(gdb) x/s $rsi           ; View string at RSI
(gdb) x/i $rip           ; View instruction at RIP

Format	Display As
x	Hex
d	Decimal (signed)
u	Unsigned decimal
s	String
i	Instruction
c	Character
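o	Octal

Size letters: b = 1 byte, h = 2 bytes (halfword), w = 4 bytes (word), g = 8 bytes (giant). Example: x/4gx $rsp shows four 8-byte values in hex.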

📌 si/ni - Step Instructions

si (step instruction) executes one instruction, stepping into function calls.

ni (next instruction) executes one instruction, stepping over function calls.

si/ni Commands
(gdb) si                  ; Step into next instruction
(gdb) si 5              ; Step 5 times
(gdb) ni                ; Next instruction (over calls)
(gdb) step              ; Source-level step into
(gdb) next              ; Source-level step over (next source line)

Difference Between SI and NI

0x401000: call print_msg  ; CALL instruction
0x401005: mov rax, 0

Using SI: Execute CALL, stop at the first instruction inside print_msg
Using NI: Execute the whole call, stop at 0x401005

📌 set $register = value - Modify Registers

set $register = value modifies register values during debugging.

Modify Register Values
(gdb) set $rax = 100      ; Set RAX to 100
(gdb) set $rdi = 0     ; Set RDI to 0
(gdb) info registers   ; Verify changes

Why useful: Bypass password checks, change comparison results, test alternative code paths.

Complete GDB Debugging Walkthrough

Let's debug a real binary step-by-step:

Complete GDB Session Example
$ gdb ./crackme
(gdb) set disassembly-flavor intel   ; Use Intel syntax
(gdb) info functions                 ; List all functions
(gdb) break main                     ; Set breakpoint at main
Breakpoint 1 at 0x0010149a
(gdb) run secret123                  ; Run with password argument

(gdb) disassemble main               ; View main function code
(gdb) si                             ; Step into first instruction
(gdb) info registers                 ; Check all registers
(gdb) x/s *(char **)($rsi + 8)       ; View argv[1] (main's 2nd arg, argv, is in RSI; RDI holds argc)
(gdb) continue                       ; Run to next breakpoint
(gdb) quit                           ; Exit GDB
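If you don't have a crackme handy, a target like the following is enough to practice the session above. This is a hypothetical example, not the binary used in the walkthrough: check_password compares its argument with a hardcoded string.

```c
#include <string.h>

/* Hypothetical practice target: returns 1 ("Access granted") only for the
   hardcoded password. strcmp's result comes back in EAX, and the test/branch
   on it is what you inspect or flip in GDB. */
int check_password(const char *input)
{
    return strcmp(input, "secret123") == 0;
}
```

Wrap it in a main that passes argv[1] to check_password and prints a message, compile with gcc -O0 -g -o crackme crackme.c, and break on the instruction right after the call to strcmp: running set $rax = 0 there should make any password pass the check.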

pwndbg - Enhanced GDB

pwndbg is an awesome GDB plugin that adds powerful reverse engineering features.

Install pwndbg
git clone https://github.com/pwndbg/pwndbg
cd pwndbg
./setup.sh

pwndbg enhancements:

Better disassembly display (syntax highlighting)

Visual stack and register display

Memory map view

Additional commands: nearpc, telescope, vmmap

✓ GDB Mastery Achieved!

You can now debug binaries, inspect memory, and understand program execution flow in real-time.

Radare2 - Advanced Binary Analysis

Radare2 is a powerful, open-source framework for reverse engineering and analyzing binaries. It combines static analysis, dynamic analysis, and visualization in one tool.

Radare2 Installation

Install Radare2
# Linux
git clone https://github.com/radareorg/radare2
cd radare2
sys/install.sh

# Or via package manager
sudo apt-get install radare2

Launching Radare2

Start Radare2
r2 ./binary              # Open binary for analysis
r2 -w ./binary           # Write mode (can modify binary)

Essential Radare2 Commands

📌 aaa - Analyze All

aaa performs full analysis on the binary — finds functions, data, and creates control flow graphs.

aaa Command
[0x08048400]> aaa    ; Full analysis
[0x08048400]> afl     ; List all functions (use after aaa)

📌 afl - List Functions

afl lists all discovered functions with addresses.

afl Output Example
[0x08048400]> afl
0x08048400  1  42   entry0
0x08048432  1  37   sym.main
0x08048460  1  52   sym.check_password
0x08048495  1  25   sym.print_success

📌 pdf - Print Disassembly of Function

pdf @ address prints the disassembly of the function at that address.

pdf Command
[0x08048400]> pdf @ sym.main    ; Disassemble main
[0x08048400]> pdf                           ; Disassemble current function

📌 db - Set Breakpoints (Debug Mode)

db sets a breakpoint. Debug commands such as db, dc, dr and ds only work when the binary is opened in debug mode: r2 -d ./binary.

Debug Mode Commands
[0x08048400]> db main           ; Set breakpoint at main
[0x08048400]> dc               ; Continue execution
[0x08048400]> dr               ; Show registers
[0x08048400]> ds               ; Step instruction

📌 dc - Continue Execution

dc continues binary execution until breakpoint.

📌 V - Visual Mode

V opens visual/interactive mode with graphical display.

Visual Mode
[0x08048400]> V           ; Enter visual mode
                        ; Inside visual mode:
p                       ; Change view mode
j/k                     ; Move down/up
q                       ; Quit visual mode

📌 VV - Graph Mode

VV shows control flow graph in visual mode.

Graph Mode Navigation
[0x08048400]> VV          ; Enter graph mode
j/k                     ; Navigate blocks
Enter                   ; Follow jump
Esc                     ; Go back
q                       ; Exit

📌 iz - Show Strings

iz lists the strings found in the binary's data sections (izz searches the whole binary).

iz Command
[0x08048400]> iz
Strings
0x08049f00 14 Wrong password
0x08049f0f 14 Access granted
0x08049f1e 15 Enter password:

Complete Radare2 Workflow

Full Radare2 Analysis Session
$ r2 ./crackme
[0x08048400]> aaa                ; Analyze everything
[0x08048400]> afl                ; List functions
[0x08048400]> iz                 ; Show strings
[0x08048400]> pdf @ sym.main     ; View main function
[0x08048400]> V                  ; Visual mode to explore

✓ Radare2 Fundamentals Mastered!

You can now analyze binaries statically with Radare2 and visualize code flow.

Static Analysis - Professional Tools

Static analysis means examining a binary without running it. You analyze code structure, disassembly, and data flow to understand what a program does. Professional tools like Ghidra, IDA Pro, and Binary Ninja dominate this space.

Professional Static Analysis Tools

📌 Ghidra - NSA Open-Source Reverse Engineering

Ghidra is the free, open-source reverse engineering tool from the NSA. It's powerful enough to compete with commercial tools.

Ghidra Features

Decompilation: Converts assembly back to C-like pseudocode

Cross-platform: Windows, Mac, Linux, and supports multiple architectures

Collaborative: Multiple analysts can work on same binary simultaneously

Scripting: Python/Java API for automation

Free: Open-source, no licensing fees

Basic Ghidra Usage
# Launch Ghidra
ghidraRun

# Then:
1. File → New Project
2. File → Import File → select the binary
3. Double-click the binary to open it in the CodeBrowser
4. Click Yes when asked to analyze it (auto-analysis)
5. Window → Functions to see all functions
6. Double-click a function to decompile it

The CodeBrowser window shows (default layout):

Left: Program Trees and Symbol Tree (functions, imports, labels)

Center: Listing (assembly code)

Right: Decompiler (C-like pseudocode)

Bottom: Comments and cross-references

📌 IDA Pro - Industry Standard

IDA Pro is the gold standard in reverse engineering. Used by security researchers worldwide.

IDA Pro Advantages

Best decompiler: Hex-Rays decompiler (separate license, worth it)

Architecture support: 70+ architectures

Scripting: Python, IDC (IDA's native language)

Professional plugins: Massive ecosystem

Cost: ~$900+ (professional license)

⚠️ Learning Curve

IDA is powerful but has a steep learning curve. For CTFs and learning, Ghidra is often the better choice; for professional work, IDA is the industry standard.

📌 Binary Ninja - Modern Alternative

Binary Ninja is a modern reverse engineering platform with excellent Python API and collaborative features.

Binary Ninja Highlights

Modern UI: Cleaner, more intuitive than IDA

Python API: Scriptable and extensible

Headless mode: Command-line analysis

Reasonable price: ~$400/year (personal) or $600 (professional)

Active development: Regular updates and improvements

Command-Line Static Analysis Tools

Essential tools for quick binary inspection and analysis:

📌 file - Identify File Type

file determines file type by examining magic bytes and file structure.

file Command Usage
file ./program                    # Identify binary type
file -i ./program                 # Show MIME type
file -b ./program                 # Brief mode (no filename)
file * | grep ELF                 # Find all ELF files in directory

Example output:

program: ELF 64-bit LSB executable, x86-64, dynamically linked, stripped

When to use: First step in binary analysis to understand architecture, linking type, and whether symbols are present.

📌 strings - Extract Printable Strings

strings extracts human-readable text from binary files, useful for finding hardcoded credentials, URLs, error messages, and function names.

strings Command Usage
strings ./program                    # Extract ASCII strings (default min length: 4)
strings -n 10 ./program             # Minimum string length 10
strings -a -t x ./program           # Scan the whole file, show hex offsets
strings -e l ./program              # 16-bit little-endian strings (e.g., Windows UTF-16LE)
strings ./program | grep -i password # Search for specific strings
strings ./program | grep "^/"       # Find file paths

When to use: First step in binary analysis to quickly identify interesting text, function names, library paths, or hardcoded secrets.

Pro tip: Combine with grep to search for URLs, IP addresses, API keys, or specific keywords.
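Under the hood, strings looks for runs of printable characters. A simplified C sketch of that idea (ASCII only, tab counted as printable, whole buffer scanned; count_strings is a hypothetical name):

```c
#include <ctype.h>
#include <stddef.h>

/* Counts runs of printable ASCII characters (plus tab) that are at least
   `min_len` bytes long, the same idea as `strings -n min_len`. */
size_t count_strings(const unsigned char *buf, size_t n, size_t min_len)
{
    size_t count = 0, run = 0;
    for (size_t i = 0; i <= n; i++) {
        /* the position just past the end acts as a terminator for the last run */
        int printable = (i < n) && (isprint(buf[i]) || buf[i] == '\t');
        if (printable) {
            run++;
        } else {
            if (run >= min_len)
                count++;
            run = 0;
        }
    }
    return count;
}
```

With the default minimum length of 4, a buffer holding \x7fELF, Wrong password, abc and Enter password: (separated by non-printable bytes) yields 2 strings, because ELF and abc are too short; with a minimum of 3 it yields 4.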

📌 hexedit / hexdump / xxd - Hex Editors & Viewers

Hex editors allow viewing and modifying binary files at the byte level.

Hex Viewer Commands
# hexdump - View hex representation
hexdump -C ./program | head        # Canonical hex+ASCII view
hexdump -C ./program | grep "ELF"  # Find ELF magic bytes

# xxd - Hex dump tool
xxd ./program | head               # Hex dump with ASCII
xxd -l 100 ./program               # First 100 bytes only

# hexedit - Terminal hex editor (interactive)
hexedit ./program                  # Edit binary files

When to use: Examine file headers, find magic bytes, analyze packed/obfuscated binaries, or patch binaries directly.
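The canonical hexdump -C layout is simple enough to reproduce. This C sketch formats one 16-byte row; hexdump_row is a hypothetical name, and it mimics the layout rather than reproducing hexdump's source.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats up to 16 bytes from `buf` as one row in the layout of
   `hexdump -C`: 8-digit hex offset, two groups of 8 hex bytes, then the
   printable ASCII characters between bars ('.' for anything else).
   `out` must have room for at least 80 characters. */
void hexdump_row(const unsigned char *buf, size_t n, size_t offset, char *out)
{
    int pos = sprintf(out, "%08zx  ", offset);
    for (size_t i = 0; i < 16; i++) {
        if (i < n)
            pos += sprintf(out + pos, "%02x ", buf[i]);
        else
            pos += sprintf(out + pos, "   ");   /* pad a short last row */
        if (i == 7)
            out[pos++] = ' ';                     /* extra space between the 8-byte groups */
    }
    out[pos++] = ' ';
    out[pos++] = '|';
    for (size_t i = 0; i < n && i < 16; i++)
        out[pos++] = (buf[i] >= 0x20 && buf[i] < 0x7f) ? (char)buf[i] : '.';
    out[pos++] = '|';
    out[pos] = '\0';
}
```

Fed the 16-byte identification block of a typical 64-bit little-endian ELF file, it produces 00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|: the magic bytes 7f 45 4c 46, then class 02 (64-bit) and data encoding 01 (little-endian).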

📌 nm - List Symbols

nm lists symbols from object files, executables and libraries. On a stripped binary it reports "no symbols"; nm -D still lists the dynamic symbols.

nm Command Usage
nm ./program                    # List all symbols
nm -D ./program                 # List dynamic symbols only
nm -g ./program                 # List external (global) symbols only
nm -C ./program                 # Demangle C++ symbols
nm -A *.o                       # List symbols from all object files (each line prefixed with its file name)

Symbol types:

T: Text section (code)

D: Initialized data

B: Uninitialized data (BSS)

U: Undefined (external reference)

Lowercase letters (t, d, b) mark local symbols; uppercase letters mark global ones.

When to use: Check if binary is stripped, identify imported/exported functions, or find specific symbols.

📌 ldd - Print Shared Library Dependencies

ldd prints shared libraries required by a dynamically linked binary.

ldd Command Usage
ldd ./program                   # Show library dependencies
ldd -v ./program                # Verbose (version information)
ldd -r ./program                # Report missing symbols

Example output:

linux-vdso.so.1 => (0x00007fff...)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)
/lib64/ld-linux-x86-64.so.2 (0x00007f...)

When to use: Understand binary dependencies, troubleshoot missing libraries, or identify which libc version is required.

⚠️ Security Warning

Never run ldd on untrusted binaries! It executes the binary's dynamic linker. Use objdump -p ./program | grep NEEDED instead for safe analysis.

📌 readelf - Display ELF File Information

readelf displays detailed information about ELF files (covered in the assembler section, but worth repeating here).

readelf Command Usage
readelf -h ./program           # Show ELF header
readelf -S ./program           # Show section headers
readelf -l ./program           # Show program headers (segments)
readelf -s ./program           # Show symbol table
readelf -d ./program           # Show dynamic section
readelf -r ./program           # Show relocations
readelf -n ./program           # Show notes (build ID, etc.)

When to use: Deep dive into ELF structure, find entry points, analyze security features (NX, PIE, RELRO), or debug linking issues.

📌 objdump - Object File Dumper & Disassembler

objdump is GNU's Swiss Army knife for binary analysis and disassembly.

objdump Command Usage
objdump -d ./program              # Disassemble executable sections
objdump -M intel -d ./program      # Disassemble in Intel syntax
objdump -D ./program               # Disassemble ALL sections
objdump -s ./program               # Full hex dump of all sections
objdump -t ./program               # Symbol table
objdump -T ./program               # Dynamic symbol table
objdump -h ./program               # Section headers
objdump -p ./program               # Program headers
objdump -R ./program               # Dynamic relocations

When to use: Quick disassembly, examine specific sections, or verify compiler output.

📌 radare2 / r2 - Reverse Engineering Framework ▼

radare2 is covered in detail in its own section, but deserves mention here as a powerful command-line static analysis tool.

Quick radare2 Static Analysis
r2 -A ./program            # Auto-analyze on load
r2 -c "aaa; pdf @ main" ./program  # Analyze and disassemble main (add -q to quit afterwards)

See the Radare2 section for comprehensive commands and usage.

📌 checksec - Check Binary Security Properties ▼

checksec checks security features enabled in a binary (RELRO, Stack Canary, NX, PIE, RPATH, RUNPATH).

checksec Command Usage
# Install checksec
sudo apt-get install checksec   # Debian/Ubuntu
wget https://github.com/slimm609/checksec.sh/raw/master/checksec && chmod +x checksec

# Check single binary
checksec --file=./program

# Check all binaries in directory
checksec --dir=/bin

Security features explained:

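RELRO (Relocation Read-Only): Partial RELRO makes some relocated data read-only; Full RELRO also resolves all symbols at startup and makes the whole GOT read-only

Stack Canary: Detects stack buffer overflows before the function returns

NX (No Execute): Marks stack/heap non-executable

PIE (Position Independent Executable): Lets ASLR load the program's own code at a random base address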

When to use: Assess exploit difficulty, verify compiler flags, or check if binary was compiled with security hardening.
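Three of these properties can be read straight from the ELF headers. The Python sketch below assumes a 64-bit little-endian ELF and mirrors checksec's usual rules rather than its exact code: NX comes from the PT_GNU_STACK flags, PIE from the ELF type, and RELRO from PT_GNU_RELRO plus the BIND_NOW flags. Canary detection (looking for __stack_chk_fail) is left out to keep it short.

```python
import struct
import sys

PT_DYNAMIC, PT_GNU_STACK, PT_GNU_RELRO = 2, 0x6474E551, 0x6474E552
PF_X = 0x1
ET_DYN = 3
DT_NULL, DT_BIND_NOW, DT_FLAGS, DT_FLAGS_1 = 0, 24, 30, 0x6FFFFFFB
DF_BIND_NOW, DF_1_NOW = 0x8, 0x1


def check_hardening(path):
    """Report NX, PIE and RELRO for a 64-bit little-endian ELF file."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF files are handled")
    e_type = struct.unpack_from("<H", data, 0x10)[0]
    e_phoff = struct.unpack_from("<Q", data, 0x20)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)

    nx = False  # no PT_GNU_STACK header: treat the stack as executable
    relro = bind_now = False
    for i in range(e_phnum):
        p_type, p_flags, p_offset, _, _, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            nx = not (p_flags & PF_X)
        elif p_type == PT_GNU_RELRO:
            relro = True
        elif p_type == PT_DYNAMIC:
            for off in range(p_offset, p_offset + p_filesz, 16):
                tag, value = struct.unpack_from("<qQ", data, off)
                if tag == DT_NULL:
                    break
                if tag == DT_BIND_NOW:
                    bind_now = True
                elif tag == DT_FLAGS and value & DF_BIND_NOW:
                    bind_now = True
                elif tag == DT_FLAGS_1 and value & DF_1_NOW:
                    bind_now = True

    return {
        "NX": nx,
        # ET_DYN also covers shared libraries; checksec reports those separately
        "PIE": e_type == ET_DYN,
        "RELRO": "Full" if relro and bind_now else ("Partial" if relro else "No"),
    }


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, check_hardening(path))
```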

📌 binwalk - Firmware Analysis Tool ▼

binwalk analyzes, extracts, and reverse engineers firmware images and embedded files.

binwalk Command Usage
binwalk firmware.bin           # Scan for embedded files/filesystems
binwalk -e firmware.bin         # Extract embedded files
binwalk -E firmware.bin         # Entropy analysis (detect encryption/compression)
binwalk -A firmware.bin         # Scan for executable code

When to use: Analyze firmware images, extract embedded file systems (squashfs, cramfs), or identify packed/encrypted sections.
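binwalk -E plots entropy for each block of the file. The underlying measure is Shannon entropy over byte frequencies, and the Python sketch below computes it per fixed-size block. The 1024-byte block size is an arbitrary choice for this example, not binwalk's setting. Values near 8 bits per byte suggest compressed or encrypted data, while plain code and text usually score noticeably lower.

```python
import math
import sys
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not block:
        return 0.0
    total = len(block)
    return sum((n / total) * math.log2(total / n) for n in Counter(block).values())


def entropy_profile(data: bytes, block_size: int = 1024):
    """Return (offset, entropy) pairs, one per block, like an entropy plot."""
    return [(off, shannon_entropy(data[off:off + block_size]))
            for off in range(0, len(data), block_size)]


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for offset, value in entropy_profile(f.read()):
                print(f"{offset:#010x}  {value:.2f}")
```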

📌 exiftool - Extract Metadata ▼

exiftool reads and writes metadata in files. Useful for forensics and identifying compilation details.

exiftool Command Usage
exiftool ./program             # Extract all metadata
exiftool -time:all ./program    # Show timestamps
exiftool -G ./program           # Show the group each tag belongs to

When to use: Find compilation timestamps and linker versions (recorded in PE headers; ELF files carry far less metadata), or embedded metadata that may reveal the development environment.

📌 ltrace - Library Call Tracer (Static Context) ▼

ltrace is primarily for dynamic analysis (covered in Dynamic Analysis section). For a static view of the library functions a binary imports, objdump -T ./program lists its dynamic symbols.

See Dynamic Analysis section for comprehensive ltrace usage.

📌 strace - System Call Tracer (Static Context) ▼

strace is primarily for dynamic analysis (covered in Dynamic Analysis section), but understanding syscall usage is part of static analysis.

See Dynamic Analysis section for comprehensive strace usage.

Advanced Static Analysis Tools

📌 Cutter - GUI for Radare2 ▼

Cutter provides a modern Qt-based GUI with decompilation support. It began as a radare2 front end and is now built on Rizin, a fork of radare2.

Install Cutter
# Download from https://cutter.re (Linux builds are distributed as an AppImage)
sudo apt-get install cutter    # Ubuntu 20.04+

Features: Graphical control flow, decompiler (Ghidra plugin), hex editor, debugger integration.

When to use: A free, visual alternative to IDA, backed by the radare2/Rizin engine.

📌 Hopper - macOS/Linux Disassembler ▼

Hopper is a commercial reverse engineering tool for macOS and Linux.

Price: ~$100 (personal), cheaper than IDA Pro.

Features: Disassembler, pseudo-code decompiler, Python scripting, x86/ARM/MIPS support.

When to use: Professional alternative to IDA at lower cost, especially on macOS.

String Analysis & Pattern Matching

Beyond basic string extraction, pattern analysis helps identify functionality:

Advanced String Analysis
strings ./binary | grep -Eo "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"  # Find emails
strings ./binary | grep -Eo "https?://[^[:space:]]+"  # Find URLs
strings ./binary | grep -Eo "([0-9]{1,3}\.){3}[0-9]{1,3}"  # Find IPv4-like addresses
strings ./binary | grep -iE "key|password|secret|token|api"  # Find credentials

Why strings are useful:

Often reveal hardcoded passwords or API keys

Show error messages that hint at program logic

Identify libraries and functions

Quickly find interesting areas to analyze

Discover hidden features or debug messages

Identify encryption algorithms by string constants
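The strings-plus-grep pipeline above can be sketched in Python. It pulls out runs of printable ASCII (4+ characters, matching the strings default) and sorts them into buckets with the same kinds of regular expressions. The pattern set is illustrative, so tune it for the binary at hand.

```python
import re
import sys

PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "url": re.compile(r"https?://[^\s\"']+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "credential": re.compile(r"key|password|secret|token|api", re.IGNORECASE),
}


def extract_strings(data: bytes, min_len: int = 4):
    """Runs of printable ASCII, roughly what `strings` prints by default."""
    run = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in run.finditer(data)]


def classify(strings):
    """Map each pattern name to the strings that match it."""
    hits = {name: [] for name in PATTERNS}
    for s in strings:
        for name, regex in PATTERNS.items():
            if regex.search(s):
                hits[name].append(s)
    return hits


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for name, found in classify(extract_strings(f.read())).items():
                print(name, found)
```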

Control Flow Analysis

Understanding how code branches and jumps helps identify:

Conditional logic: If/else patterns in assembly

Loops: Repeated code sections

Function calls: External dependencies

Dead code: Unreachable branches

All professional tools (Ghidra, IDA, Binary Ninja) show control flow graphs that visualize this.
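To make these ideas concrete, here is a small Python sketch that does four things with a toy instruction list:

- splits it into basic blocks
- links the blocks into a control flow graph
- flags backward edges as likely loops
- reports blocks unreachable from the entry as dead-code candidates

The `Insn` tuple format is a simplification invented for this example; real tools decode the machine code first.

```python
from collections import deque, namedtuple

Insn = namedtuple("Insn", "addr mnemonic target")  # target: jump destination or None
CONDITIONAL = {"je", "jne", "jg", "jge", "jl", "jle", "ja", "jb"}


def build_cfg(insns):
    """Split instructions into basic blocks and return (blocks, edges)."""
    leaders = {insns[0].addr}
    for idx, ins in enumerate(insns):
        if ins.mnemonic in CONDITIONAL or ins.mnemonic in ("jmp", "ret"):
            if ins.target is not None:
                leaders.add(ins.target)            # jump targets start blocks
            if idx + 1 < len(insns):
                leaders.add(insns[idx + 1].addr)   # so does the next instruction
    blocks = {}
    for ins in insns:
        if ins.addr in leaders:
            current = ins.addr
            blocks[current] = []
        blocks[current].append(ins)

    starts = sorted(blocks)
    edges = {}
    for n, start in enumerate(starts):
        last = blocks[start][-1]
        fallthrough = starts[n + 1] if n + 1 < len(starts) else None
        if last.mnemonic == "jmp":
            succ = [last.target]
        elif last.mnemonic in CONDITIONAL:
            succ = [last.target] + ([fallthrough] if fallthrough is not None else [])
        elif last.mnemonic == "ret" or fallthrough is None:
            succ = []
        else:
            succ = [fallthrough]
        edges[start] = succ
    return blocks, edges


def back_edges(edges):
    """Edges that jump to the same or an earlier block: likely loops."""
    return [(src, dst) for src, succ in edges.items() for dst in succ if dst <= src]


def unreachable(edges, entry):
    """Blocks never reached from the entry block: dead-code candidates."""
    seen, queue = {entry}, deque([entry])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(set(edges) - seen)


program = [
    Insn(0x00, "mov", None), Insn(0x03, "cmp", None), Insn(0x06, "jle", 0x0E),
    Insn(0x08, "call", None), Insn(0x0C, "jmp", 0x12),
    Insn(0x0E, "inc", None), Insn(0x10, "jmp", 0x03),  # loops back to the cmp
    Insn(0x12, "ret", None), Insn(0x13, "nop", None),   # nop after ret: dead code
]
blocks, edges = build_cfg(program)
print([(hex(s), hex(d)) for s, d in back_edges(edges)])  # [('0xe', '0x3')]
print([hex(a) for a in unreachable(edges, 0x00)])       # ['0x13']
```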

✓ Static Analysis Mastered!

You can now use professional tools to analyze binaries without running them.

Dynamic Analysis - Runtime Behavior

Dynamic analysis means running the binary in a controlled environment while monitoring its behavior. Watch system calls, library calls, memory modifications, and network traffic to understand what code actually does.

System Call Tracing with strace

strace intercepts and logs all system calls made by a process.

strace Examples
strace ./program                   # Trace all syscalls
strace -e trace=open,openat,read ./program # Trace specific syscalls
strace -o trace.txt ./program       # Save to file
strace -c ./program                 # Summary (count syscalls)
strace -p 1234                      # Attach to running process

What strace reveals:

Files being read/written

Network connections (socket, connect syscalls)

Environment passed to the program via execve (getenv() lookups are library calls, visible with ltrace)

Memory mappings

Signal handling

Interpreting strace Output

openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0:root:/root:/bin/bash\n", 32) = 32
write(1, "User found!\n", 12)        = 12
exit_group(0)                        = ?

Meaning: Program opened /etc/passwd (recent glibc versions use openat rather than open) and got file descriptor 3, read 32 bytes from it, wrote "User found!" to stdout (fd 1), then exited with status 0.

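When a trace is saved with strace -o, the lines are regular enough to post-process. Below is a minimal Python sketch that splits each line into name, arguments and return value, then maps file descriptors back to the paths that opened them. The regex is an assumption: it covers simple one-line calls, not every strace output form (for example, unfinished/resumed calls).

```python
import re
import sys

LINE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+)")


def parse_strace(lines):
    """Return (syscall, raw_args, return_value) tuples for simple one-line calls."""
    calls = []
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            calls.append((m["name"], m["args"], m["ret"]))
    return calls


def opened_files(calls):
    """Map each successfully opened file descriptor to its path."""
    fds = {}
    for name, args, ret in calls:
        if name in ("open", "openat") and ret.lstrip("-").isdigit() and int(ret) >= 0:
            path = re.search(r'"([^"]*)"', args)
            if path:
                fds[int(ret)] = path.group(1)
    return fds


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path) as f:
            print(opened_files(parse_strace(f)))
```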
Library Call Tracing with ltrace

ltrace traces library function calls (libc, libcrypto, etc.).

ltrace Examples
ltrace ./program                    # Trace library calls
ltrace -c ./program                 # Summary (count function calls)
ltrace -o trace.txt ./program       # Save to file
ltrace -e strcmp ./program          # Trace specific functions

Useful library functions to trace:

strcmp: String comparison (password checks)

strcpy: String copying (buffer overflow detection)

malloc/free: Memory allocation

printf: Output (what's being printed)

getenv: Environment variable access

ltrace Example - Password Check

strcmp("admin123", "password123")       = -1
puts("Incorrect password")             = 19
exit(1)

Insight: Program compared input with "password123". Now you know the password!

Combined strace + ltrace

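Use them together for a complete picture. Nesting the two (strace -f ltrace ...) tends to fail because both rely on ptrace, but ltrace -S shows syscalls alongside library calls:

Trace both syscalls and library calls
ltrace -S ./program                 # Library calls plus syscalls in one trace
strace -e trace=file ./program      # Focus on file operations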

Advanced Dynamic Analysis - Frida

Frida is a powerful instrumentation framework. Inject code into running processes to hook functions and modify behavior in real-time.

Basic Frida Usage
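# Install
pip install frida frida-tools

# List processes
frida-ps

# Attach to a running process by PID
frida -p 1234

# Spawn a program and attach to it
frida -f ./program

# Spawn and trace calls to strcmp
frida-trace -f ./program -i "strcmp"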

Frida capabilities:

Hook any function (intercept and modify behavior)

Read/write process memory

Dump arguments and return values

Modify program flow in real-time

Works on binaries you don't have source for
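As a concrete example of hooking, here is a minimal sketch that uses Frida's Python bindings to spawn a program, hook strcmp and print both arguments. That is the same information ltrace showed in the password example.

The sketch assumes the target calls strcmp from the C library. The JavaScript export lookup differs across Frida versions (Module.getGlobalExportByName in Frida 17+, Module.getExportByName(null, ...) before), so the script tries both. If no calls show up, the binary may be using an inlined or implementation-specific variant; treat this as a starting point.

```python
import sys

# JavaScript injected into the target. The export-lookup API changed in Frida 17,
# so use whichever one this Frida version provides.
HOOK_JS = """
const strcmpAddr = (typeof Module.getGlobalExportByName === "function")
    ? Module.getGlobalExportByName("strcmp")
    : Module.getExportByName(null, "strcmp");

Interceptor.attach(strcmpAddr, {
    onEnter(args) {
        send({ a: args[0].readCString(), b: args[1].readCString() });
    }
});
"""


def on_message(message, data):
    """Print strcmp arguments sent by the script, or any error it reports."""
    if message["type"] == "send":
        payload = message["payload"]
        print(f"strcmp({payload['a']!r}, {payload['b']!r})")
    else:
        print(message)


def main(target):
    import frida  # imported here so the helpers above work without Frida installed

    pid = frida.spawn([target])          # start the program suspended
    session = frida.attach(pid)
    script = session.create_script(HOOK_JS)
    script.on("message", on_message)
    script.load()
    frida.resume(pid)                    # let it run with the hook in place
    sys.stdin.read()                     # keep the session alive until Ctrl-D


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```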

✓ Dynamic Analysis Arsenal Complete!

You can now trace system calls, monitor library calls, and use advanced instrumentation.

Analyzing Stripped Binaries

A stripped binary has all debug symbols removed — function names, variable names, and type information are gone. This makes reverse engineering harder but not impossible.

Identifying Stripped Binaries

Check if binary is stripped
file ./program
# Output examples:
#   ..., not stripped   -> has symbols
#   ..., stripped       -> symbols removed

file -i ./program               # MIME type info
readelf -S ./program            # Show sections (.symtab is missing if stripped)
nm ./program                    # Prints "no symbols" if stripped
objdump -t ./program            # Symbol table (empty if stripped)

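The file and nm checks boil down to one question: does the binary still contain a .symtab section? Below is a minimal Python sketch, assuming a 64-bit little-endian ELF. It lists section names from the section header table and reports the file as stripped when .symtab is missing. (.dynsym survives stripping, which is why nm -D still shows imported functions.)

```python
import struct
import sys


def section_names(path):
    """Names of all sections, read from the ELF section header table."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF files are handled")
    e_shoff = struct.unpack_from("<Q", data, 0x28)[0]
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 0x3A)
    if e_shnum == 0:
        return []  # section headers removed entirely

    def header(index):
        # sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size
        return struct.unpack_from("<IIQQQQ", data, e_shoff + index * e_shentsize)

    names_base = header(e_shstrndx)[4]  # file offset of .shstrtab
    names = []
    for i in range(e_shnum):
        start = names_base + header(i)[0]
        names.append(data[start:data.index(b"\x00", start)].decode())
    return names


def is_stripped(path):
    return ".symtab" not in section_names(path)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, "stripped" if is_stripped(path) else "not stripped")
```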
Techniques for Stripped Binaries

📌 Function Identification via Signatures ▼

Even without names, you can identify common library functions by their machine code patterns.

Function Signature Databases

FLIRT (IDA): Fast Library Identification and Recognition Technology

Ghidra Function ID: Built-in pattern matching

YARA rules: Pattern matching for functions

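How it works: For a given compiler and library version, library functions (strlen, malloc, etc.) compile to the same byte patterns. When they are statically linked into a binary, tools match these patterns against a signature database and name the functions automatically.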

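The core matching idea can be sketched in a few lines of Python. A signature is a byte pattern in which `??` marks bytes that vary between builds (relocated addresses, for instance). Real FLIRT signatures carry more information, such as checksums and referenced names, so this is a simplified illustration. The signature below is a hypothetical one for a generic frame-pointer prologue, not taken from any database.

```python
def parse_signature(sig: str):
    """'55 48 89 E5 ??' -> [0x55, 0x48, 0x89, 0xE5, None] (None = wildcard)."""
    return [None if tok == "??" else int(tok, 16) for tok in sig.split()]


def find_signature(data: bytes, sig: str):
    """Offsets where the signature matches, honoring wildcards."""
    pattern = parse_signature(sig)
    return [
        start
        for start in range(len(data) - len(pattern) + 1)
        if all(p is None or data[start + i] == p for i, p in enumerate(pattern))
    ]


def identify(data: bytes, signatures: dict):
    """Return {function_name: [offsets]} for every signature that matches."""
    result = {}
    for name, sig in signatures.items():
        hits = find_signature(data, sig)
        if hits:
            result[name] = hits
    return result


# Hypothetical database entry: a prologue that reserves a stack frame of any size
SIGNATURES = {"frame_setup_with_locals": "55 48 89 E5 48 83 EC ??"}

code = bytes.fromhex("90 90 55 48 89 e5 48 83 ec 10 c3")
print(identify(code, SIGNATURES))  # {'frame_setup_with_locals': [2]}
```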
Using Ghidra's Function ID
In Ghidra:
1. Analysis → Auto Analyze
2. Keep the "Function ID" analyzer enabled (it is on by default)
3. Review the results: matching library functions get their names back
Works best on statically linked binaries, where libc code is embedded in the file

📌 Heuristic Analysis - Entry Points ▼

Without symbols, look for patterns that reveal function boundaries:

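Function prologue: `push rbp; mov rbp, rsp` (function start; often preceded by `endbr64` on CET-enabled builds)

Function epilogue: `pop rbp; ret` or `leave; ret` (function end)

Call targets: an address reached by `call` that begins with a prologue marks a new function

Loops: Backward jumps to earlier code

Data references: Addresses that reference strings or constants

These are heuristics: optimized builds often omit the frame pointer, so some functions have no `push rbp` prologue at all.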

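Below is a minimal Python sketch of the prologue and call-target heuristics. It scans raw x86-64 code for the `push rbp; mov rbp, rsp` byte pattern (55 48 89 E5), pulling the start back over a preceding `endbr64` (F3 0F 1E FA), and collects the targets of `call rel32` (E8) instructions. A linear byte scan is only an approximation: E8 and 55 also occur inside other instructions. The code bytes and base address are made up for the example.

```python
PROLOGUE = bytes.fromhex("55 48 89 e5")  # push rbp; mov rbp, rsp
ENDBR64 = bytes.fromhex("f3 0f 1e fa")   # endbr64, emitted before it on CET builds


def prologue_starts(code: bytes, base: int):
    """Addresses of likely function starts found by prologue matching."""
    starts = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        start = pos - 4 if code[max(pos - 4, 0):pos] == ENDBR64 else pos
        starts.append(base + start)
        pos = code.find(PROLOGUE, pos + 1)
    return starts


def call_targets(code: bytes, base: int):
    """Destinations of `call rel32` (E8 xx xx xx xx) instructions."""
    targets = set()
    for i in range(len(code) - 4):
        if code[i] == 0xE8:
            rel = int.from_bytes(code[i + 1:i + 5], "little", signed=True)
            targets.add(base + i + 5 + rel)  # relative to the next instruction
    return targets


code = bytes.fromhex(
    "f3 0f 1e fa 55 48 89 e5 "   # 0x401000: endbr64; push rbp; mov rbp, rsp
    "e8 03 00 00 00 5d c3 90 "   # 0x401008: call 0x401010; pop rbp; ret; nop
    "55 48 89 e5 5d c3"          # 0x401010: push rbp; mov rbp, rsp; pop rbp; ret
)
print([hex(a) for a in prologue_starts(code, 0x401000)])        # ['0x401000', '0x401010']
print([hex(a) for a in sorted(call_targets(code, 0x401000))])   # ['0x401010']
```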
📌 Cross-referencing & String Analysis ▼

Strings often identify function purposes:

Stripped binary analysis approach
# Step 1: Extract strings
strings ./program | grep -i error

# Step 2: Find where strings are referenced
In Ghidra: Search → For Strings...
Double-click a string, then follow its XREF to the code that uses it

# Step 3: Identify surrounding function
Look at prologue/epilogue to find function bounds
Analyze logic based on string context

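Under the hood, following an XREF usually means finding instructions whose RIP-relative operand resolves to the string's address. In x86-64 code that is often `lea reg, [rip + disp32]`: a REX.W prefix, opcode 8D and a ModRM byte with mod=00 and r/m=101. The Python sketch below scans raw bytes for that encoding only. Other reference forms, such as `mov` with a RIP-relative operand, are ignored, and the addresses are made up for the example.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def lea_xrefs(code: bytes, base: int, target: int):
    """Find `lea r64, [rip+disp32]` instructions that point at `target`."""
    hits = []
    for i in range(len(code) - 6):
        rex, opcode, modrm = code[i], code[i + 1], code[i + 2]
        # REX.W set (0x48-0x4F), opcode 8D, ModRM mod=00 and r/m=101 (RIP-relative)
        if (rex & 0xF8) == 0x48 and opcode == 0x8D and (modrm & 0xC7) == 0x05:
            disp = int.from_bytes(code[i + 3:i + 7], "little", signed=True)
            insn_addr = base + i
            if insn_addr + 7 + disp == target:  # RIP = address of the next instruction
                reg = ((modrm >> 3) & 7) | ((rex & 0x4) << 1)  # REX.R selects r8-r15
                hits.append((insn_addr, REGS[reg]))
    return hits


# lea rdi, [rip+0x1009]; call ...; lea rsi, [rip+0xffd]  (both point at 0x402010)
code = bytes.fromhex("48 8d 3d 09 10 00 00 e8 00 00 00 00 48 8d 35 fd 0f 00 00")
for addr, reg in lea_xrefs(code, 0x401000, 0x402010):
    print(f"{addr:#x}: lea {reg}, [rip+...] -> 0x402010")
```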
📌 Machine Learning-Based Symbol Recovery ▼

Modern research uses LLMs to recover function names from stripped binaries.

Recent Tools

ReSym (Purdue 2024): Recovers variable names and types from stripped binaries (56.4% accuracy)

SYMGEN: Uses domain-adapted LLMs for function name inference (outperforms SOTA by 400%+)

DeLink: Recovers source file information from binaries

How it works: Train ML models on decompiled code patterns. Given stripped binary, model predicts likely function names and variable types.

Dynamic Analysis of Stripped Binaries

Use runtime tracing to understand behavior without symbols:

Dynamic approach to stripped binaries
# Trace syscalls to understand behavior
strace -o syscalls.txt ./program

# Trace library calls
ltrace -o libcalls.txt ./program

# Use GDB to set breakpoints and inspect registers
gdb ./program
(gdb) break *0x401000
(gdb) run
(gdb) info registers   # See actual values

Practical Example - Analyzing Stripped Binary

Complete workflow for stripped binary
# 1. Identify if stripped
$ file ./crackme
crackme: ELF 64-bit, stripped

# 2. Extract strings - look for clues
$ strings ./crackme | grep -i password
Incorrect password
Access granted

# 3. Open in Ghidra
- Run Auto Analyze with the Function ID analyzer enabled (the default)
- Statically linked stdlib functions are now identified
- Search → For Strings → Find "password" references
- Double-click the string, then follow its XREF to the code using it

# 4. Analyze the function using string as anchor
- Look at function prologue/epilogue
- Identify comparisons and jumps
- Look for password check logic

# 5. Use dynamic analysis if stuck
$ ltrace ./crackme
strcmp("myinput", "secretpass") = -37
puts("Incorrect password") = 19
# Now you know the password: secretpass

✓ Stripped Binary Analysis Mastered!

You can identify functions, recover symbols, and analyze behavior even without debug information.

Binary Patching - Code Modification

Binary patching means modifying a binary's machine code to change its behavior. Used to bypass password checks, remove license verification, or modify logic flow.

Why Patch Binaries?

Bypass authentication/license checks

Change program behavior for analysis

Create custom versions without source

Remove anti-debugging code

Test vulnerability fixes

Three Patching Approaches

📌 Method 1: Hex Editor - Direct Modification ▼

Most direct method: Use hex editor to change machine code bytes.

Hex Editors Available

HxD: Windows (excellent)

Hex Fiend: macOS

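xxd: Linux command line (dump with xxd, edit the text, rebuild with xxd -r); hexdump only displays bytes

vim: open with `vim -b`, convert with `:%!xxd`, edit the hex, then run `:%!xxd -r` before saving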

Hex Editor Patching Workflow
# Step 1: Find the instruction to patch in IDA/Ghidra
cmp eax, 0x12345
jne fail            ; Assume this is at file offset 0x1234

# Step 2: Replace jne with NOPs (0x90)
jne opcode = 75 05 (short jne: opcode 75, 8-bit displacement 05; EB would be an unconditional jmp)
NOP opcode = 90

# Step 3: Open hex editor, go to file offset 0x1234
Replace: 75 05 → 90 90 (2 NOPs to fill space)

# Step 4: Save and test
./patched_binary

Key instruction to know:

NOP (0x90): No operation - does nothing, safe filler

Replace a conditional jump that leads to the failure path with NOPs, so execution falls through

📌 Method 2: IDA/Ghidra Built-in Patching ▼

Both IDA and Ghidra have native patching capabilities.

IDA Patching
# In IDA hex view:
1. Right-click on byte
2. Select "Edit"
3. Type new hex values
4. Right-click → "Apply changes"

# Save patched binary:
Edit → Patch program → Apply patches to input file
(File → Produce file → Create DIF file only writes a list of changed bytes)

Ghidra Patching
# In Ghidra:
1. Window → Bytes
2. Enable editing (pencil icon), then select the byte to change
3. Type replacement values
4. File → Export Program → Format: Original File

# Now you have a modified binary

📌 Method 3: Assembly Modification + Reassemble ▼

For more complex changes, write assembly, assemble it, and patch the resulting bytes in.

Advanced Patching - Replace Function
# Step 1: Identify function to replace (virtual address 0x401000, 50 bytes)

# Step 2: Write replacement assembly (replacement.asm)
BITS 64         ; nasm -f bin assembles 16-bit code unless told otherwise
mov rax, 1      ; Return 1 (success)
ret

# Step 3: Assemble it
nasm -f bin replacement.asm -o replacement.bin
hexdump -C replacement.bin
Output: 48 c7 c0 01 00 00 00 c3 (8 bytes), or b8 01 00 00 00 c3 (6 bytes) if NASM shortens it to the equivalent mov eax, 1

# Step 4: Pad with NOPs to match original size (50 bytes)
Need 50 bytes total; with 8 bytes of code, add 42 NOPs (0x90)

# Step 5: Patch the bytes in at the file offset that corresponds to 0x401000
hex editor: Go to that file offset, replace with new bytes

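The addresses shown in IDA or Ghidra are virtual addresses, while a hex editor works with file offsets, and confusing the two is a common patching mistake. The Python sketch below translates one into the other through the PT_LOAD program headers, assuming a 64-bit little-endian ELF, and only writes a patch when the original bytes are exactly the ones expected.

The helper names and the example file names are illustrative. For PIE binaries, first subtract the image base your disassembler added; Ghidra, for example, typically rebases them to 0x100000.

```python
import shutil
import struct

PT_LOAD = 1


def vaddr_to_offset(data: bytes, vaddr: int) -> int:
    """Translate a virtual address into a file offset via PT_LOAD segments."""
    e_phoff = struct.unpack_from("<Q", data, 0x20)[0]
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        p_type, _, p_offset, p_vaddr, _, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_vaddr <= vaddr < p_vaddr + p_filesz:
            return vaddr - p_vaddr + p_offset
    raise ValueError(f"{vaddr:#x} is not backed by file data")


def patch_bytes(data: bytes, vaddr: int, expected: bytes, replacement: bytes) -> bytes:
    """Return a patched copy, refusing to patch if the original bytes differ."""
    if len(expected) != len(replacement):
        raise ValueError("replacement must keep the instruction length")
    offset = vaddr_to_offset(data, vaddr)
    if data[offset:offset + len(expected)] != expected:
        raise ValueError(f"unexpected bytes at file offset {offset:#x}")
    return data[:offset] + replacement + data[offset + len(expected):]


def patch_file(src: str, dst: str, vaddr: int, expected: bytes, replacement: bytes):
    """Write a patched copy of `src` to `dst`, keeping its permissions."""
    with open(src, "rb") as f:
        patched = patch_bytes(f.read(), vaddr, expected, replacement)
    with open(dst, "wb") as f:
        f.write(patched)
    shutil.copymode(src, dst)  # keep the executable bit


# Example (hypothetical file names): NOP out `jne` (75 19) at 0x401245
# patch_file("./crackme", "./crackme_patched", 0x401245, b"\x75\x19", b"\x90\x90")
```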
Real-World Patching Example

Complete patching workflow - Bypass password
# Binary: crackme - asks for password
$ ./crackme
Enter password: test
Incorrect!

# Step 1: Open in Ghidra, find password check
0x401234: mov rax, [rip + 0x2dc6]  ; Load input
0x40123b: mov rbx, [rip + 0x2dc5]  ; Load expected password
0x401242: cmp rax, rbx              ; Compare
0x401245: jne 0x401260              ; Jump to fail if not equal
0x401247: call print_success        ; Otherwise print success

# Step 2: We want to skip the jne (jump to fail)
# Option A: Replace jne with NOPs
jne opcode at 0x401245: 75 19 (2 bytes)
Replace with: 90 90 (2 NOPs)

# Step 3: Use hex editor to patch
Go to the file offset of virtual address 0x401245 (typically 0x1245 for a non-PIE binary loaded at 0x400000)
Find bytes: 75 19
Replace with: 90 90
Save file as crackme_patched

# Step 4: Test
$ ./crackme_patched
Enter password: anything
Success!
# Password check bypassed! Any input works now
            
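The bytes in this example can be checked by hand. A short conditional jump is a one-byte opcode plus a signed 8-bit displacement measured from the end of the 2-byte instruction, so 75 19 at 0x401245 lands at 0x401247 + 0x19 = 0x401260.

Below is a small Python sketch of that decoding and of two common 2-byte patches; its opcode table covers only a few common jumps. Note that inverting the condition (75 → 74) would make every wrong password succeed and the right one fail, while NOPs accept any input.

```python
SHORT_JUMPS = {0x74: "je", 0x75: "jne", 0x7C: "jl", 0x7D: "jge",
               0x7E: "jle", 0x7F: "jg", 0xEB: "jmp"}


def decode_short_jump(addr: int, insn: bytes):
    """Decode a 2-byte short jump; return (mnemonic, target address)."""
    opcode, rel8 = insn[0], insn[1]
    displacement = rel8 - 0x100 if rel8 & 0x80 else rel8  # signed 8-bit value
    return SHORT_JUMPS[opcode], addr + 2 + displacement


def patch_options(insn: bytes):
    """Two common 2-byte replacements for a short conditional jump (70-7F)."""
    if not 0x70 <= insn[0] <= 0x7F:
        raise ValueError("not a short conditional jump")
    return {
        "nop_out": b"\x90\x90",                      # never jump: fall through
        "invert": bytes([insn[0] ^ 0x01, insn[1]]),  # je <-> jne, jl <-> jge, ...
    }


mnemonic, target = decode_short_jump(0x401245, bytes.fromhex("75 19"))
print(mnemonic, hex(target))  # jne 0x401260
```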

Common Patching Targets

What to Patch	Pattern	Replacement
Password check	cmp; jne failure	Replace jne with NOPs
License validation	call validate_license; jne fail	NOP out the jne
Anti-debug	call is_debugged; jne exit	Make function return 0
Trial expiration	cmp rax, expiration_date	Change expiration_date value
Error message	lea rdi, [rip + error_str]	Change string pointer/content

✓ Binary Patching Mastered!

You can modify binaries to change behavior, bypass checks, and test modifications.

Anti-Reversing Techniques & Bypasses

Software developers implement anti-reversing techniques to protect intellectual property and prevent cracking. Understanding these techniques helps you bypass them and analyze protected binaries.

Common Anti-Reversing Techniques

📌 Anti-Debugging - Detect Debuggers ▼

Anti-debug code detects whether a debugger is attached and then terminates or behaves differently.

Detection Methods

ptrace check: Call ptrace(PTRACE_TRACEME) - fails if a debugger is already attached

Parent process check: Verify the parent is a shell, not a debugger

Breakpoint detection: Check for INT3 (0xCC) bytes in code

Timing checks: Measure execution time - slower under a debugger

/proc/self/status: Check TracerPid (non-zero if debugged)

Example: ptrace anti-debug check
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>

int main(void) {
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) {
        printf("Debugger detected! Exiting.\n");
        exit(1);
    }
    // Program continues if not debugged
    return 0;
}

Bypassing Anti-Debug

With GDB:

Set a breakpoint before the ptrace call: `break main`

Step over the call with `ni` until ptrace has returned

Modify the return value: `set $rax = 0` (pretend success)

Continue execution

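The /proc/self/status method from the list above is easy to reproduce, which also shows what a bypass has to fake. Below is a minimal Python sketch for Linux that reads the TracerPid field. The field is 0 when no debugger, or other ptrace-based tracer such as strace, is attached.

```python
import os


def tracer_pid(pid="self"):
    """Return the TracerPid field of /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("TracerPid:"):
                return int(line.split()[1])
    raise RuntimeError("TracerPid field not found")


if __name__ == "__main__":
    tracer = tracer_pid()
    if tracer:
        print(f"Being traced by PID {tracer}")
    else:
        print(f"PID {os.getpid()} is not being traced")
```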

📌 Code Obfuscation - Hide Logic ▼

Code obfuscation makes code hard to understand without changing functionality.

Obfuscation Techniques

Control flow obfuscation: Reorder code, add fake branches, or flatten it into a dispatcher loop

Dead code injection: Add irrelevant but realistic-looking code

String encryption: Encrypt literal strings, decrypt at runtime

Instruction transformation: Replace a simple instruction with a complex equivalent

Virtualization: Interpret custom bytecode instead of native instructions

Example: Control Flow Obfuscation
ORIGINAL:
if (x > 10)
    print("big")
else
    print("small")

OBFUSCATED (control flow flattening):
state = 0
while (state != 3)
    switch (state)
        case 0: state = (x > 10) ? 1 : 2
        case 1: print("big");   state = 3
        case 2: print("small"); state = 3
Same logic, but every branch now goes through one dispatcher, so the original structure is hidden!

Defeating Obfuscation

Use decompilers (Ghidra, IDA) to reconstruct logic

Dynamic analysis to see actual behavior

Symbolic execution (angr) to explore paths

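In CTF binaries, string encryption is often just XOR with a single-byte key. Assuming that scheme (real protectors frequently use stronger ciphers), the Python sketch below tries all 255 non-zero keys on an encrypted blob copied out of the binary. It keeps the results that decode to mostly printable text. The 0.95 threshold is an arbitrary choice for this example.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)


def printable_ratio(data: bytes) -> float:
    return sum(0x20 <= b < 0x7F for b in data) / len(data) if data else 0.0


def brute_force_single_byte_xor(blob: bytes, min_ratio: float = 0.95):
    """Return (key, plaintext) pairs whose plaintext looks like ASCII text."""
    results = []
    for key in range(1, 256):
        plain = xor_bytes(blob, key)
        if printable_ratio(plain) >= min_ratio:
            results.append((key, plain))
    return results


# Simulate an encrypted string as it might appear in the binary's data section
blob = xor_bytes(b"password123", 0x42)
for key, plain in brute_force_single_byte_xor(blob):
    print(f"key={key:#04x}: {plain!r}")
```

Several keys can produce printable output, so read through the candidates rather than trusting the first one.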

📌 Packing & Compression - Hide Code ▼

Packers compress or encrypt the entire binary. The original code only appears in memory at runtime, after a small stub unpacks it.

Popular Packers

UPX: Open-source, compresses binaries

Themida: Commercial, strong obfuscation + packing

Code Virtualizer: Turns native code into VM bytecode

UPX Example
# Pack a binary
upx -9 ./program -o program.packed

# Detect if packed
strings ./program.packed | grep UPX
Output: stock UPX leaves a UPX! marker and a "packed with the UPX executable packer" banner

# Unpack (if UPX)
upx -d ./program.packed

# If custom packer, must unpack manually:
1. Run in GDB
2. Find OEP (Original Entry Point)
3. Dump memory region
4. Analyze dumped binary

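These checks can be bundled into a quick triage function. The Python sketch below looks for the UPX! marker and banner that stock UPX leaves behind; modified UPX builds, common in CTFs, may remove them. It also flags high overall entropy, using a 7.2 bits-per-byte threshold that is a rule of thumb chosen for this example, not a standard value.

```python
import math
import sys
from collections import Counter


def entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def packer_hints(data: bytes, threshold: float = 7.2):
    """Return human-readable hints that a file may be packed."""
    hints = []
    if b"UPX!" in data:
        hints.append("UPX! marker present")
    if b"packed with the UPX executable packer" in data:
        hints.append("UPX banner present")
    value = entropy(data)
    if value > threshold:
        hints.append(f"high entropy ({value:.2f} bits/byte)")
    return hints


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, packer_hints(f.read()) or ["no packer hints"])
```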

📌 Anti-Tampering - Detect Modifications ▼

Anti-tampering detects if binary or memory has been modified.

Detection Methods

Checksum verification: Calculate CRC/hash of sections

Signature verification: Check digital signature

Self-modifying code checks: Verify code hasn't changed

Import table verification: Check if functions are hooked

Bypassing Anti-Tampering

Patch the check itself (common: `je failure` → `nop nop`)

Modify the checksum value if you know it

Hook the check function with Frida to return false

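To see why patching a single jump can trip a checksum-based check, here is a minimal Python sketch of the idea: compute a CRC-32 over a code region once, store it, and compare at runtime. zlib.crc32 stands in for whatever checksum a real protector uses, and the byte strings are illustrative.

```python
import zlib


def region_checksum(region: bytes) -> int:
    """CRC-32 of a code region, as an anti-tamper check might compute it."""
    return zlib.crc32(region)


def is_tampered(region: bytes, stored_checksum: int) -> bool:
    return region_checksum(region) != stored_checksum


original = bytes.fromhex("48 39 d8 75 19 e8 00 00 00 00")  # cmp rax, rbx; jne ...; call ...
STORED = region_checksum(original)        # value baked into the binary at build time

patched = original.replace(b"\x75\x19", b"\x90\x90")  # the NOP patch from the patching section
print(is_tampered(original, STORED))  # False
print(is_tampered(patched, STORED))   # True
```

This is why the bypasses above either patch the comparison inside the check or update the stored value to match the patched code.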
ASLR - Address Space Layout Randomization

ASLR randomizes memory addresses on each run, which makes exploitation and analysis harder.

Disable ASLR for analysis
# Check ASLR status
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled, 1 = conservative (stack, mmap, VDSO), 2 = full (also the heap)

# Disable ASLR system-wide (requires root)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

# Or run single binary without ASLR
setarch $(uname -m) -R ./program

# In GDB (enabled by default)
(gdb) set disable-randomization on

Stack Canaries

Stack canaries detect buffer overflows by placing a random value between local buffers and the saved return address.

How Stack Canaries Work

1. Function prologue: Random canary value stored on stack
2. If buffer overflow: Overwrites canary
3. Before return: Check if canary still matches original
4. If mismatch: Abort immediately (via __stack_chk_fail)

Check if binary has canaries
checksec --file=./program
# Output shows: Canary found / No canary found

readelf -s ./program | grep __stack_chk_fail
# A match means protected functions call the canary failure handler

DEP/NX - Data Execution Prevention

DEP/NX marks data pages (stack, heap) as non-executable, so shellcode injected there cannot run.

Check if binary has NX
checksec --file=./program
# Output shows: NX enabled/disabled

readelf -lW ./program | grep GNU_STACK
# Flags RWE = executable stack (no NX), RW = NX enabled

✓ Anti-Reversing Techniques Mastered!

You understand how protections work and how to bypass them.

angr - Automated Symbolic Execution

angr is a powerful binary analysis framework that uses symbolic execution to find inputs that reach specific code paths. Instead of analyzing by hand, you let angr explore paths and solve the constraints each one imposes.

What is Symbolic Execution?

Instead of concrete values, variables are treated as symbolic, representing all possible values. Branches create constraints.

Symbolic vs Normal Execution

NORMAL EXECUTION:
input = 5
if input > 10:
    print("big")
else:
    print("small")     ← This path taken

SYMBOLIC EXECUTION:
input = X (symbolic variable)
if input > 10:
    ← Explores this path (constraint: X > 10)
    print("big")
if input ≤ 10:
    ← Explores this path too (constraint: X ≤ 10)
    print("small")

Result: angr finds values satisfying each constraint!

Installation & Setup

Install angr
pip install angr
pip install angr[all]  # Install with optional dependencies

Basic angr Workflow

Simple angr Script - Crack Password
import angr
import claripy

# Load binary (angr substitutes summaries for real shared-library code)
project = angr.Project("./crackme", auto_load_libs=False)

# Create a 16-byte symbolic input and feed it to the program as stdin
password = claripy.BVS("password", 16 * 8)
initial_state = project.factory.entry_state(stdin=password)

# Create simulation manager
simgr = project.factory.simgr(initial_state)

# Addresses of success and failure code (example values; take them from your disassembler)
success_addr = 0x401234
failure_addr = 0x401256

# Explore until we find success, pruning paths that reach failure
simgr.explore(find=success_addr, avoid=failure_addr)

# Get the solution
if simgr.found:
    solution_state = simgr.found[0]
    solution = solution_state.posix.dumps(0)  # 0 = stdin
    print(f"Password found: {solution!r}")
else:
    print("No solution found")

Key angr Concepts

📌 State - Program Snapshot ▼

State represents a point in program execution - registers, memory, constraints.

Working with States
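import angr
import claripy

project = angr.Project("./crackme", auto_load_libs=False)
state = project.factory.entry_state()

# Access registers
print(state.regs.rax)

# Read memory (here: 16 bytes at the stack pointer)
data = state.memory.load(state.regs.rsp, 16)

# Symbolic variable
sym_input = claripy.BVS("input", 64)  # 64-bit symbolic input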

📌 SimulationManager - Explore States ▼

SimulationManager (simgr) manages multiple execution states simultaneously.

SimulationManager Usage
simgr = project.factory.simgr(initial_state)

# Explore automatically
simgr.explore(find=success_address)

# Manual stepping
simgr.step()

# Check stashes (lists of states)
print(simgr.active)      # Active (still running)
print(simgr.found)       # Reached a find address
print(simgr.avoid)       # Reached an avoid address
print(simgr.deadended)   # Finished (e.g. the program exited)

📌 Constraint Solving with Z3 ▼

angr uses Z3 solver to solve constraints and find satisfying values.

Solve Constraints
# Get concrete values from symbolic state
solution = state.solver.eval(sym_variable)                  # Get one solution
some_solutions = state.solver.eval_upto(sym_variable, 10)   # Get up to 10 distinct solutions

Real-World Example - CTF Challenge

Complete angr Script - Solve CTF Crackme
import angr
import claripy

# Load the binary
binary_path = "./crackme"
project = angr.Project(binary_path, auto_load_libs=False)

# Create symbolic argv[1] (16 bytes)
password = claripy.BVS("password", 16 * 8)

# Start at the entry point with the symbolic password as argv[1]
state = project.factory.entry_state(args=[binary_path, password])

# Keep every password byte printable
for byte in password.chop(8):
    state.solver.add(byte >= 0x20, byte <= 0x7E)

# Create simulation manager
simgr = project.factory.simgr(state)

# Explore - find "Correct!" message at 0x401300,
# avoid "Incorrect!" at 0x401350 (example addresses)
simgr.explore(find=0x401300, avoid=[0x401350])

# Check results
if simgr.found:
    solution_state = simgr.found[0]
    password_value = solution_state.solver.eval(password, cast_to=bytes)
    print(f"[+] Password found: {password_value}")
else:
    print("[-] No solution found")
    if simgr.avoid:
        print(f"[!] States that hit avoided addresses: {len(simgr.avoid)}")

Advanced Techniques

📌 Function Hooking - Speed Up Analysis ▼

Hook slow functions to avoid symbolic execution overhead.

Hooking Example
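import angr

project = angr.Project("./crackme", auto_load_libs=False)

# Replace strlen with angr's built-in summary (a SimProcedure)
project.hook_symbol("strlen", angr.SIM_PROCEDURES["libc"]["strlen"]())

# Skip an expensive function at 0x401000 (example address) and make it return 1
class ReturnOne(angr.SimProcedure):
    def run(self):
        return 1

project.hook(0x401000, ReturnOne())  # Hook at function address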

📌 Taint Analysis - Track Data Flow ▼

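Track how user input flows through the program to find sensitive operations. A lightweight way to do this in angr is to give the input a distinctive symbolic name and check which expressions depend on it.

Taint Input
import claripy

# Mark input as "tainted" by giving it a recognizable symbolic variable
user_input = claripy.BVS("user_input", 64)

# Later, on an explored state: does RAX depend on the user input?
if user_input.variables & state.regs.rax.variables:
    print("RAX contains tainted data (user input)")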

When angr Excels vs Struggles

Best For	Struggles With
Finding password/key (simple comparison)	Complex floating-point math
Reaching specific code path	Cryptographic operations (very slow)
Constraint solving (small inputs)	Large state spaces (too many branches)
CTF challenges (designed for automation)	Real-world complex binaries

✓ angr Symbolic Execution Mastered!

You can now automate binary analysis and solve constraints to find inputs reaching target code paths.

🎮 Assembly Simulator & Practice

Learn assembly by writing and executing code in real-time. This interactive simulator lets you write assembly instructions, step through execution, and watch registers and memory change.

Practice Challenges

🎯 Challenge 1: Simple Addition ▼

Objective: Add 15 and 25, store result in RAX

🎯 Challenge 2: Conditional Logic ▼

Objective: If RAX equals 10, set RBX to 1, else set RBX to 0

🎯 Challenge 3: Loop Counter ▼

Objective: Count from 1 to 5 in RAX using a loop