Peeking inside call stacks

2024-03-18

Introduction

Lately, I’ve been spending my free time learning Rust in an attempt to [re-]learn a Systems Programming language. Having spent the last few years programming in interpreted languages like Python and JavaScript, I felt like it was time to get back to something close to the bare metals.

One important topic in Rust is the borrow checker and trying to understand it took me down a rabbit hole. It requires an understanding of dynamic memory allocation. I was left scratching my head after being unable to visualize how stacks vs heaps (not the data structures in this context) worked. I have authored this post documenting my research. Understanding what goes on stack vs heap is an important step towards understanding Rust’s borrow checker.

Note: While I mentioned Rust in the beginning, for this post, I have stuck to the C programming language.

Memory Layout

A traditional program loaded in memory will have these segments:

Code/text segment
Data segment
Heap
Stack

The first two segments will be outside of the scope of this post. However, the names should be explanatory and provide insights into what data is stored in them.

Stack

Stack is built upon the same idea as stack data structure i.e. last-in-first-out (LIFO). Each function corresponds to its own stack “frame” which stores local variables, parameters, and return address for the respective function call.

The memory is calculated and set aside during compile time. As an implication, we can only have data whose size is known at compile time to be allocated on the stack.

Heap

Heap or dynamic memory is the part of memory that can grow/shrink over the lifetime of the process. A program can execute system calls requesting the operating system to allocate or de-allocate a segment of memory to store data of variable sizes such as vectors.

This is the interesting piece where Rust’s ownership-based memory management shines over the manual memory management that we do in C/C++.

Diving in Deeper

What makes stacks different?

Stacks are very closely associated with functions - every function gets its frame in the call stack. A simple C program with 2 functions will help us understand this better:

#include<stdio.h>

int add(int a, int b, int c) {
    return a + b + c;
}

int main() {
    int a = 1;
    int b = 2;
    int c = 3;

    printf("%d", add(a, b, c));

    return 0;
}

Let’s compile this and use gdb to dig in deeper. Let’s also take a little help from Claude to understand what is going on with the compiled assembly.

Sidebar: I don’t know if AI will take over jobs or not but it sure does an exceptional job explaining a piece of code.

I have set a breakpoint on main and stepped through the code till the last return 0;.

(gdb) disassemble main
Dump of assembler code for function main:
   0x0000000000401142 <+0>:	push   rbp
   0x0000000000401143 <+1>:	mov    rbp,rsp
   0x0000000000401146 <+4>:	sub    rsp,0x10
   0x000000000040114a <+8>:	mov    DWORD PTR [rbp-0x4],0x1
   0x0000000000401151 <+15>:	mov    DWORD PTR [rbp-0x8],0x2
   0x0000000000401158 <+22>:	mov    DWORD PTR [rbp-0xc],0x3
   0x000000000040115f <+29>:	mov    edx,DWORD PTR [rbp-0xc]
   0x0000000000401162 <+32>:	mov    ecx,DWORD PTR [rbp-0x8]
   0x0000000000401165 <+35>:	mov    eax,DWORD PTR [rbp-0x4]
   0x0000000000401168 <+38>:	mov    esi,ecx
   0x000000000040116a <+40>:	mov    edi,eax
   0x000000000040116c <+42>:	call   0x401126 <add>
   0x0000000000401171 <+47>:	mov    esi,eax
   0x0000000000401173 <+49>:	mov    edi,0x402010
   0x0000000000401178 <+54>:	mov    eax,0x0
   0x000000000040117d <+59>:	call   0x401030 <printf@plt>
   0x0000000000401182 <+64>:	mov    eax,0x0
   0x0000000000401187 <+69>:	leave
   0x0000000000401188 <+70>:	ret
End of assembler dump.

(gdb) disassemble add
Dump of assembler code for function add:
   0x0000000000401126 <+0>:	push   rbp
   0x0000000000401127 <+1>:	mov    rbp,rsp
   0x000000000040112a <+4>:	mov    DWORD PTR [rbp-0x4],edi
   0x000000000040112d <+7>:	mov    DWORD PTR [rbp-0x8],esi
   0x0000000000401130 <+10>:	mov    DWORD PTR [rbp-0xc],edx
   0x0000000000401133 <+13>:	mov    edx,DWORD PTR [rbp-0x4]
   0x0000000000401136 <+16>:	mov    eax,DWORD PTR [rbp-0x8]
   0x0000000000401139 <+19>:	add    edx,eax
   0x000000000040113b <+21>:	mov    eax,DWORD PTR [rbp-0xc]
   0x000000000040113e <+24>:	add    eax,edx
   0x0000000000401140 <+26>:	pop    rbp
   0x0000000000401141 <+27>:	ret
End of assembler dump.

One thing in common with both add and main subroutines are the prologue and epilogue i.e. we set the stack and base pointer registers to execute the callee function or return to the caller function. From main (caller), we can see it pushes the arguments to assigned registers (based on the calling convention for the system) and then calls the add subroutine.

Inside add (callee), following the epilogue, the values are moved from the registers onto the stack frame with addressing relative to the base pointer. The values are then moved back to registers to perform ADD assembly operation. The stack frame is then destroyed and the value is returned to the caller for it to continue its execution.

The interesting thing to note here was for all the local variable declarations, the compiler added a manual instruction in our object file to reserve the memory i.e. by subtracting from the stack pointer and moving those values in the reserved location. It is only possible for the compiler to do it because it knows the size needed to store the values on the stack frame.

Contrary to this, if the size can only be known at runtime, it cannot reserve the memory during compile time. In such cases, it will need to know the size of pointer to such objects instead and store that in the stack frame while the actual item will be stored onto the heap by dynamically requesting memory from the OS (yes, malloc, looking at you).

In other words, it is the difference between sizeof(type) vs sizeof(type *). In our example, we used int which has a fixed size of 4 bytes.

(gdb) p $rsp
$5 = (void *) 0x7fffffffdd70
(gdb) p $rbp
$6 = (void *) 0x7fffffffdd80
(gdb) p a
$7 = 1
(gdb) p b
$8 = 2
(gdb) p c
$9 = 3
(gdb) p &a
$10 = (int *) 0x7fffffffdd7c
(gdb) p &b
$11 = (int *) 0x7fffffffdd78
(gdb) p &c
$12 = (int *) 0x7fffffffdd74
(gdb) p sizeof(int *)
$13 = 8
(gdb) p sizeof(int)
$14 = 4

In the above, we can see that the address of values a=1, b=2, and c=3 lies in the range of $rsp and $rbp and each of them is sizeof(int) = 4 bytes apart in memory.

Conclusion

In summary, digging deeper into the actual machine code was an enlightening experience. It also gave me the chance to appreciate the complexities handled by the OS and the wonder glibc abstracting away all the overwhelming details.

Low-level programming is a never-ending rabbit hole but for now, this answer is satisfactory enough for me to return to my caller aka Rust (pun intended).

References

[1] https://www.recurse.com/blog/7-understanding-c-by-learning-assembly
[2] https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64
[3] https://jvns.ca/blog/2021/05/17/how-to-look-at-the-stack-in-gdb/
[4] https://www.youtube.com/watch?v=yOyaJXpAYZQ\