Peeking inside call stacks
Introduction
Lately, I’ve been spending my free time learning Rust in an attempt to [re-]learn a Systems Programming language. Having spent the last few years programming in interpreted languages like Python and JavaScript, I felt like it was time to get back to something close to the bare metals.
One important topic in Rust is the borrow checker and trying to understand it took me down a rabbit hole. It requires an understanding of dynamic memory allocation. I was left scratching my head after being unable to visualize how stacks vs heaps (not the data structures in this context) worked. I have authored this post documenting my research. Understanding what goes on stack vs heap is an important step towards understanding Rust’s borrow checker.
Note: While I mentioned Rust in the beginning, for this post, I have stuck to the C programming language.
Memory Layout
A traditional program loaded in memory will have these segments:
- Code/text segment
- Data segment
- Heap
- Stack
The first two segments will be outside of the scope of this post. However, the names should be explanatory and provide insights into what data is stored in them.
Stack
Stack is built upon the same idea as stack data structure i.e. last-in-first-out (LIFO). Each function corresponds to its own stack “frame” which stores local variables, parameters, and return address for the respective function call.
The memory is calculated and set aside during compile time. As an implication, we can only have data whose size is known at compile time to be allocated on the stack.
Heap
Heap or dynamic memory is the part of memory that can grow/shrink over the lifetime of the process. A program can execute system calls requesting the operating system to allocate or de-allocate a segment of memory to store data of variable sizes such as vectors.
This is the interesting piece where Rust’s ownership-based memory management shines over the manual memory management that we do in C/C++.
Diving in Deeper
What makes stacks different?
Stacks are very closely associated with functions - every function gets its frame in the call stack. A simple C program with 2 functions will help us understand this better:
#include<stdio.h>
int add(int a, int b, int c) {
return a + b + c;
}
int main() {
int a = 1;
int b = 2;
int c = 3;
printf("%d", add(a, b, c));
return 0;
}
Let’s compile this and use gdb
to dig in deeper. Let’s also take a little help from Claude to understand what is going on with the compiled assembly.
Sidebar: I don’t know if AI will take over jobs or not but it sure does an exceptional job explaining a piece of code.
I have set a breakpoint on main
and stepped through the code till the last return 0;
.
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000401142 <+0>: push rbp
0x0000000000401143 <+1>: mov rbp,rsp
0x0000000000401146 <+4>: sub rsp,0x10
0x000000000040114a <+8>: mov DWORD PTR [rbp-0x4],0x1
0x0000000000401151 <+15>: mov DWORD PTR [rbp-0x8],0x2
0x0000000000401158 <+22>: mov DWORD PTR [rbp-0xc],0x3
0x000000000040115f <+29>: mov edx,DWORD PTR [rbp-0xc]
0x0000000000401162 <+32>: mov ecx,DWORD PTR [rbp-0x8]
0x0000000000401165 <+35>: mov eax,DWORD PTR [rbp-0x4]
0x0000000000401168 <+38>: mov esi,ecx
0x000000000040116a <+40>: mov edi,eax
0x000000000040116c <+42>: call 0x401126 <add>
0x0000000000401171 <+47>: mov esi,eax
0x0000000000401173 <+49>: mov edi,0x402010
0x0000000000401178 <+54>: mov eax,0x0
0x000000000040117d <+59>: call 0x401030 <printf@plt>
0x0000000000401182 <+64>: mov eax,0x0
0x0000000000401187 <+69>: leave
0x0000000000401188 <+70>: ret
End of assembler dump.
(gdb) disassemble add
Dump of assembler code for function add:
0x0000000000401126 <+0>: push rbp
0x0000000000401127 <+1>: mov rbp,rsp
0x000000000040112a <+4>: mov DWORD PTR [rbp-0x4],edi
0x000000000040112d <+7>: mov DWORD PTR [rbp-0x8],esi
0x0000000000401130 <+10>: mov DWORD PTR [rbp-0xc],edx
0x0000000000401133 <+13>: mov edx,DWORD PTR [rbp-0x4]
0x0000000000401136 <+16>: mov eax,DWORD PTR [rbp-0x8]
0x0000000000401139 <+19>: add edx,eax
0x000000000040113b <+21>: mov eax,DWORD PTR [rbp-0xc]
0x000000000040113e <+24>: add eax,edx
0x0000000000401140 <+26>: pop rbp
0x0000000000401141 <+27>: ret
End of assembler dump.
One thing in common with both add
and main
subroutines are the prologue and epilogue i.e. we set the stack and base pointer registers to execute the callee function or return to the caller function. From main
(caller), we can see it pushes the arguments to assigned registers (based on the calling convention for the system) and then calls the add
subroutine.
Inside add
(callee), following the epilogue, the values are moved from the registers onto the stack frame with addressing relative to the base pointer. The values are then moved back to registers to perform ADD
assembly operation. The stack frame is then destroyed and the value is returned to the caller for it to continue its execution.
The interesting thing to note here was for all the local variable declarations, the compiler added a manual instruction in our object file to reserve the memory i.e. by subtracting from the stack pointer and moving those values in the reserved location. It is only possible for the compiler to do it because it knows the size needed to store the values on the stack frame.
Contrary to this, if the size can only be known at runtime, it cannot reserve the memory during compile time. In such cases, it will need to know the size of pointer to such objects instead and store that in the stack frame while the actual item will be stored onto the heap by dynamically requesting memory from the OS (yes, malloc
, looking at you).
In other words, it is the difference between sizeof(type)
vs sizeof(type *)
. In our example, we used int
which has a fixed size of 4 bytes.
(gdb) p $rsp
$5 = (void *) 0x7fffffffdd70
(gdb) p $rbp
$6 = (void *) 0x7fffffffdd80
(gdb) p a
$7 = 1
(gdb) p b
$8 = 2
(gdb) p c
$9 = 3
(gdb) p &a
$10 = (int *) 0x7fffffffdd7c
(gdb) p &b
$11 = (int *) 0x7fffffffdd78
(gdb) p &c
$12 = (int *) 0x7fffffffdd74
(gdb) p sizeof(int *)
$13 = 8
(gdb) p sizeof(int)
$14 = 4
In the above, we can see that the address of values a=1
, b=2
, and c=3
lies in the range of $rsp
and $rbp
and each of them is sizeof(int) = 4
bytes apart in memory.
Conclusion
In summary, digging deeper into the actual machine code was an enlightening experience. It also gave me the chance to appreciate the complexities handled by the OS and the wonder glibc abstracting away all the overwhelming details.
Low-level programming is a never-ending rabbit hole but for now, this answer is satisfactory enough for me to return to my caller aka Rust (pun intended).
References
[1] https://www.recurse.com/blog/7-understanding-c-by-learning-assembly
[2] https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64
[3] https://jvns.ca/blog/2021/05/17/how-to-look-at-the-stack-in-gdb/
[4] https://www.youtube.com/watch?v=yOyaJXpAYZQ\