The rising ubuquity of 64-bit architectures makes practical many possibilites for systems-level software design that were previously in the realm of theory. For example, the possibility of a single address space memory management scheme is a real possibility for kernel developers.
Both Windows and POSIX systems offer ways for user-mode processes to manipulate their address space in a low-level. In this post I'll focus on Linux, but as I understand, everything holds on Windows as well.
There's a trend in video games, brought to attention by Minecraft, of infinite, procedually generated worlds for the player to explore. Minecraft does not manage its own address space, opting to serialize and unserialize data from files rather than mapping a save file into memory. There are attractors and detractors for both of these options, but memory mapping a file is the more conceptually elegant.
Minecraft represents the world as a 2D array of 16x16x256 block "chunks". On disk, the chunks are stored in "region files" that contain a 32x32 chunk (512x512x256 block) section of the world. It only keeps in memory the chunks that are near a player, because keeping the entire world in memory is infeasible. I this post, however, I will show how it could be possible on 64-bit systems.
Memory mapping a file, for those who don't deal with system programming, is a process of assigning a file, or part of a file, to act as if it was stored in memory rather than wherever it actually is (probably on a disk). The program can read from this block of memory and will get back the contents of the file, or write to the memory to write to the file. A pseudocode example:
// contents of games.csv: // The Binding of Isaac: Rebirth,Nicalis,2014 // Minecraft,Mojang,2010 // Portal,Valve,2007 handle f = open_file("games.csv"); char *pointer = memory_map(handle); for (int i = 0; i < 20; i++) putc(pointer[i]); // => The Binding of Isaac memcpy(&pointer, "Edmund!", 7); // contents of games.csv: // The Binding of Isaac: Rebirth,Edmund!,2014 // Minecraft,Mojang,2010 // Portal,Valve,2007
The way memory mapping is implemented can vary. Some older machines, like the TI-83 and 84 series graphing calculators, implemented it in hardware. The memory management unit (MMU), a component of the CPU's memory pipeline, had a direct connection to the flash memory, and could physically connect pages of flash ROM to memory addresses.
Modern systems, however, implement file-mapped memory in software. The syscall that establishes the mapping creates a record somewhere in the kernel, but doesn't affect the virtual address table quite yet. When the program accesses the mapped region, the MMU complains that the page is not mapped to any physical medium, and triggers an interrupt in the kernel, and gives it the address that caused the fault. Normally this is where the kernel would terminate the process with a segmentation fault, but before doing so it checks for a few things, like if it's part of a memory mapped file. If it is, then the kernel copies the contents of that page from the mapped file to a physical page of RAM, maps the page into the virtual address table, and resumes the process.
Whenever the kernel deems necessary, it can also synchronize the contents of that RAM page back to the file, possibly repuposing the RAM for some other use.
Those familiar with virtual memory will notice that this is essentially swapping to disk, except with a swap file specified by the process.
In real life, the function used to memory map a file is called
mmap, and it has the following signature:
void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off). That's obviously a lot more parameters than just a file handle.
int fildes is what specifies the file to use, via the numeric descriptor of the file in the table the kernel keeps of files opened by the process.
size_t len and
off_t off specify the range of data in the file to map into memory (the first len bytes starting off bytes from the start of the file). These parameters should be multiples of the size of a page of memory (usually 4KiB).
int prot specifies the protections on the memory mapped by mmap, allowing the program to specify whether the memory should be readable, writable, and/or executable.
int flags specifies how the memory is synchronized with the backing file, where the flag
MAP_SHARED specifies the behavior demonstrated in the example above.
flags parameter also controls how the
void *addr parameter in interpreted. If the flag
MAP_FIXED is unset,
mmap will map the memory wherever it feels, presumably making an attempt to keep the start of the block near
addr (for the sake of cache optimization). If
MAP_FIXED is set,
mmap will map the memory so that it starts exactly at
addr. However, if
addr is not a multiple of the page size, or if allocating the block a that position would collide with previously allocated address space,
mmap will fail.
In a 32-bit system with only 4 GiB of address space, which has to contain some memory reserved for the kernel, blocks of memory for the program code and data block, as well as the code and data blocks for all shared libraries, a stack for each thread, and any memory
malloc allocates, address space is precious. It would be impractical to designate a sizable chunk of address space to always allow fixed mappings in.
But there's a huge advantage to fixed mapping. If a program can guarantee that it can always map a file to the same address space, then it can store pointers in that file. One could write a program that allocates all of its memory within a
mmap'd file, and then can resume exactly where it left off the next time it's run.
In a 64-bit system, where processes have 16 EiB (17179869184 GiB) of address space to work with, making a guarantee that
MAP_FIXED will always succeed within a certain address range is trivial.
However the Linux kernel makes no such guarantees, in the name of security. The kernel tries to randomize the locations that things are mapped into address space in order to mitigate certain kinds of attacks. If an attacker can take advantage of a buffer overrun to modify the return address on the call-stack, they can convince the processor to jump to any location in memory, such as libc's
system function which executes a shell command. This attack, however, hinges upon the attacker knowing where the
system function is in memory.
One approach they could take would be to run their own program on the same system (presumably as a user with fewer privileges than the target process), and use
dlsym to find the location of the
system function in memory. This obviously only works if libc is loaded into the same address space in both processes. On older systems, this is not as unrealistic as it sounds, but fortunately this approach won't work on modern OSs.
However, brute forcing the location of libc is quite possible in some circumstances. Address space layout randomization is an attempt to make brute forcing unfeasible. By randomly dispersing the stack, heap, program, and libraries in address space, the chance of any given attempt in a brute-force attack succeeding is greatly reduced. On 32-bit systems, the address space is too small for sufficient entropy and in best-case conditions (for the attacker) they can be brute forced in a matter of minutes. With the larger address space of 64-bit systems, the best-case expected time for brute forcing measures in the days, more than enough time for a sysadmin to notice the anomalous behavior.
The kernel takes full advantage of the large address space to create entropy, so there are no unused address ranges to be reserved for fixed mappings.
I'd like to argue that OSs should reserve a huge, constant range of addresses for programs to manage as they see fit. Specifically I want the entire upper half of the address space, all 8 exbibytes of it. This would mean that address space layout randomization loses one bit of entropy, halving the time of brute force attacks. But even in the best conditions, these attacks take days — is 2 days instead of 4 really any more of a vulnerability?
But having this much reserved address space would be incredibly useful for games like Minecraft. The game could map a file into the entire reserved address space. This wouldn't require having 8 EiB of disk space because files created with
mmap are sparse, and disk space is only allocated for those sections that are actually modified. This address space could support a minecraft world that is at least 226x226x256 blocks, which is infinite from the player's perspective.
This implementation would offload several behaviors from the game to the operating system, making for a simpler implementation. Serializing and unserializing would be eliminated, persisting and deallocating inactive chunks would be handled by the same highly tuned system that implements swap memory, multiple save files would be replaced by a single sparse file. Indirect references within the save file would be actual pointers rather than array offsets, increasing performance.
It's worth mentioning that these benefits aren't limited to just a certain genre of video game. Databases are an example of a "serious" application that would benefit from reserved address ranges. Mongodb uses
mmap to improve file IO performance, but if it could map the entire database into memory, references between records could be implemented pointers rather than IDs, reducing the number of calculations involved in queries and increasing performance.
As I mentioned in the beginning, I am not an expert in this area. Please comment if something is in need of correction.