Memory intro for devs. What you need to know
As a software engineer, you want to build fast and efficient programs. In practice, when you develop a program, the thing you can influence most is how it accesses memory. That is why O() notation, the most widely used estimate of program complexity, is applied to memory access and memory usage. In this article I want to cover the basics of memory that every developer should know.
What is memory and what types of memory exist?
Memory is where you store all your data. It can hold a function, a variable or a value.
There are a couple of types of memory (from the hardware perspective). We will consider the following: CPU cache, RAM and I/O (hard drive, GPU, USB stick, network, etc). CPU cache is very fast, RAM is fast and I/O is slow. So try to write programs that use fast memory.
How much memory is available?
Let’s consider a case: you have a laptop with an x86_64 CPU and 16 GB of RAM. How much memory is available to your program? The answer is: a lot, way more than your RAM. How can that be? Your program sees virtual addresses (fake addresses), which the OS maps to physical memory (real addresses).
An x86_64 CPU means that every memory address is defined by 64 bits. Consider a simple program, where we create two variables and print their addresses:
#include <iostream>

int main() {
    int stack_variable = 0;  // lives on the stack
    std::cout << &stack_variable << std::endl;
    int* heap_variable = new int(0);  // lives on the heap
    std::cout << heap_variable << std::endl;
    delete heap_variable;  // release the heap allocation
    return 0;
}
In my case the output is 0x7ffde4eb95ec and 0x563b41a852c0. You may wonder why a 64-bit CPU gives only a 48-bit address. Even a 48-bit address is enough to address 256 terabytes of memory. Because this is so much more than the physical RAM, an infinite loop that keeps allocating memory can bring your system down in seconds. (In reality, only 47 bits are used for user-space addresses. See the kernel documentation for more.)
What can we say from these addresses? We know that the first address is virtual, belongs to user space and lives on the stack. Let's clarify all these words.
Kernel space vs user space memory
The OS does not run for free: it needs some memory too. The memory used by the OS is called kernel space memory. The main takeaway: you, as a developer, do not have access to kernel space. And this is great, because it is very easy to do something wrong there.
All the programs we develop run in user space (unless you write a kernel module). The amount of memory consumed by the OS varies with the OS and its version, but it should be around 1 GB.
Stack vs heap and maybe something else?
So we already know that the program sees a virtual address (fake address). The reason behind this abstraction is to allow multiple programs to run at the same time without overwriting each other's memory. How a program is structured:
- Each program can have multiple processes
- Each process may have multiple threads.
A process is an abstraction level. Each process has its own address space. It means two processes may both use the virtual address 0x0, and it will point to completely different physical memory. So they can safely run in parallel.
What the process memory looks like:
Stack
|
v
--------
Memory Mapped Region
(Shared Libraries or anything else.
Example: /lib/libc.so)
--------
^
|
Heap
--------
Uninitialised Data (.bss)
--------
Initialised Data (.data)
--------
Program Text (.text)
0
Want to see it for real? Run cat /proc/$PID/maps for any running process.
A simple process:
// main.cpp
#include <stdio.h>

int main(int argc, char **argv) {
    printf("Hello world\n");
    getchar(); // pause the run so the process can be inspected
    return 0;
}
Compile it with: g++ -o main main.cpp. Then we can check the running process with pmap -X `pidof main`. The output will be very similar to the previous one, but obviously way smaller.
117419: ./main
Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous LazyFree ShmemPmdMapped FilePmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked THPeligible Mapping
5626d4805000 r--p 00000000 103:02 57410083 4 4 4 4 0 0 0 0 0 0 0 0 0 0 main
5626d4806000 r-xp 00001000 103:02 57410083 4 4 4 4 0 0 0 0 0 0 0 0 0 0 main
5626d4807000 r--p 00002000 103:02 57410083 4 4 4 4 0 0 0 0 0 0 0 0 0 0 main
5626d4808000 r--p 00002000 103:02 57410083 4 4 4 4 4 0 0 0 0 0 0 0 0 0 main
5626d4809000 rw-p 00003000 103:02 57410083 4 4 4 4 4 0 0 0 0 0 0 0 0 0 main
5626d6427000 rw-p 00000000 00:00 0 132 4 4 4 4 0 0 0 0 0 0 0 0 0 [heap]
7f9489b9e000 r--p 00000000 103:02 7085308 148 140 0 140 0 0 0 0 0 0 0 0 0 0 libc-2.31.so
7f9489bc3000 r-xp 00025000 103:02 7085308 1504 756 6 756 0 0 0 0 0 0 0 0 0 0 libc-2.31.so
7f9489d3b000 r--p 0019d000 103:02 7085308 296 64 0 64 0 0 0 0 0 0 0 0 0 0 libc-2.31.so
7f9489d85000 ---p 001e7000 103:02 7085308 4 0 0 0 0 0 0 0 0 0 0 0 0 0 libc-2.31.so
7f9489d86000 r--p 001e7000 103:02 7085308 12 12 12 12 12 0 0 0 0 0 0 0 0 0 libc-2.31.so
7f9489d89000 rw-p 001ea000 103:02 7085308 12 12 12 12 12 0 0 0 0 0 0 0 0 0 libc-2.31.so
7f9489d8c000 rw-p 00000000 00:00 0 24 24 24 24 24 0 0 0 0 0 0 0 0 0
7f9489da7000 r--p 00000000 103:02 7085092 4 4 0 4 0 0 0 0 0 0 0 0 0 0 ld-2.31.so
7f9489da8000 r-xp 00001000 103:02 7085092 140 140 0 140 0 0 0 0 0 0 0 0 0 0 ld-2.31.so
7f9489dcb000 r--p 00024000 103:02 7085092 32 32 0 32 0 0 0 0 0 0 0 0 0 0 ld-2.31.so
7f9489dd4000 r--p 0002c000 103:02 7085092 4 4 4 4 4 0 0 0 0 0 0 0 0 0 ld-2.31.so
7f9489dd5000 rw-p 0002d000 103:02 7085092 4 4 4 4 4 0 0 0 0 0 0 0 0 0 ld-2.31.so
7f9489dd6000 rw-p 00000000 00:00 0 4 4 4 4 4 0 0 0 0 0 0 0 0 0
7ffc60180000 rw-p 00000000 00:00 0 132 16 16 16 16 0 0 0 0 0 0 0 0 0 [stack]
7ffc601be000 r--p 00000000 00:00 0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 [vvar]
7ffc601c2000 r-xp 00000000 00:00 0 8 4 0 4 0 0 0 0 0 0 0 0 0 0 [vdso]
ffffffffff600000 --xp 00000000 00:00 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 [vsyscall]
==== ==== === ========== ========= ======== ============== ============= ============== =============== ==== ======= ====== ===========
2500 1240 106 1240 88 0 0 0 0 0 0 0 0 0 KB
Another rule of thumb: the layout of the stack is decided at compilation time, while heap memory is requested at run time. So if you want a fast program, use the stack. But another rule: the stack is limited. On Linux/x86-32, the default stack size for a new thread is 2 megabytes. To see your stack limit, run: ulimit -s. On my laptop it is 8 MB.
Address, Offset, Device, Inode
So we know that everything has an address, something like 0xdeadbeef. The address is mapped by the OS to a physical address. What is the offset of the address?
First we need to understand what the address actually describes. The address of a variable has two values encoded in it: the page number and the page offset. From the OS perspective, memory is divided into pages. A page is just an abstraction of the memory. A very common page size is 4096 bytes. To encode an offset within 4096 bytes we need 12 bits, which is 3 hex digits. So if we have the address of the variable:
0xdeadbeef001 -> variable address
deadbeef -> page number
001 -> page offset
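The page number is what gets translated when the virtual address is mapped to a physical address. The offset stays as it is. Note that the Offset column in the pmap output means something different: if the region was mapped from a file, it is the offset in that file where the mapping begins.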
Device - if the region was mapped from a file, this is the major and minor device number (in hex) of the device where the file lives. The major number points to a device driver. The minor number is interpreted by that driver and usually identifies the specific device it handles, for example one of multiple floppy drives.
Inode - if the region was mapped from a file, this is the inode number of that file.
What to take from this article
- CPU cache is very fast, RAM is fast, I/O is slow
- The stack layout is decided at compilation time, the heap is allocated at runtime. So to run fast, use the stack. But the stack is limited. The heap is limited by the amount of RAM.
- The OS does a lot to make sure your program is executed and does not conflict with other programs
- What you see in your program is a virtual address (fake address)
Used resources:
- https://gist.github.com/CMCDragonkai/10ab53654b2aa6ce55c11cfc5b2432a4
- https://simonis.github.io/Memory/
- Modern operating systems, Chapter 3
- https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt
- https://www.kernel.org/doc/html/v5.8/arm64/memory.html