Querying available memory resources

The amount of memory available in the system and the way it is organized often determine how programs can and must work. Functions like mmap require knowledge of the size of individual memory pages, and knowing how much memory is available enables a program to select appropriate sizes for, say, caches. Before getting into these details, a few words about the memory subsystems of traditional Unix systems are in order.

Overview of traditional Unix memory handling

Unix systems normally provide each process with a virtual address space. This means that the addresses of the memory regions do not have to correspond directly to the addresses of the actual physical memory which stores the data. An extra level of indirection is introduced which translates virtual addresses into physical addresses. This is normally done by the hardware of the processor.

Using a virtual address space has several advantages. The most important is process isolation: the different processes running on the system cannot interfere directly with each other. No process can write into the address space of another process (except when shared memory is used, but then it is wanted and controlled).

Another advantage of virtual memory is that the address space the processes see can actually be larger than the physical memory available. The physical memory can be extended by storage on an external medium where the content of currently unused memory regions is stored. The address translation can then intercept accesses to these memory regions and make the memory content available again by loading the data back into memory.

This concept makes it necessary that programs which have to use lots of memory know the difference between available virtual address space and available physical memory. If the combined working set of all the processes is larger than the available physical memory, the system will slow down dramatically due to constant swapping of memory content from memory to the storage medium and back. This is called "thrashing".

A final aspect of virtual memory which is important, and which follows from what was said in the previous paragraph, is the granularity of the virtual address space handling. When the virtual memory handling stores memory content externally, it cannot do this on a byte-by-byte basis. The administrative overhead does not allow this (let alone the processor hardware). Instead several thousand bytes are handled together and form a page. The size of each page is always a power of two bytes. The smallest page size in use today is 4096 bytes, with 8192, 16384, and 65536 being other popular sizes.

How to get information about the memory subsystem?

The page size of the virtual memory the process sees is essential to know in several situations. Some programming interfaces (e.g., mmap, the section called “Memory-mapped I/O”) require the user to provide information adjusted to the page size. In the case of mmap, the offset argument has to be a multiple of the page size, and since memory is mapped in whole pages it is sensible to choose a length which is a multiple of the page size, too.

Another place where knowledge of the page size is useful is memory allocation. If one allocates memory in larger chunks which are then subdivided by the application code, it is useful to adjust the size of the larger blocks to the page size. If the total memory requirement for the block is close to (but not larger than) a multiple of the page size, the kernel's memory handling can work more effectively since it only has to allocate memory pages which are fully used. (To do this optimization it is necessary to know a bit about the memory allocator, which will require a bit of memory itself for each block; this overhead must not push the total size over the page size multiple.)
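The size adjustments described above can be sketched as two small helpers. This is only an illustration: the names round_up_to_page and chunk_payload_size are hypothetical, and the overhead parameter is a caller-supplied guess, since the real per-block bookkeeping cost depends on the allocator in use.

```c
#include <assert.h>
#include <stddef.h>

/* Round LEN up to the next multiple of PAGE.  PAGE must be a power of
   two, which is the one assumption about page sizes that is safe to
   make.  LEN is assumed to be small enough that LEN + PAGE - 1 does
   not overflow.  */
static size_t
round_up_to_page (size_t len, size_t page)
{
  assert (page != 0 && (page & (page - 1)) == 0);
  return (len + page - 1) & ~(page - 1);
}

/* Number of bytes the application can use in a chunk spanning NPAGES
   pages if the allocator needs OVERHEAD bytes of its own per block.
   OVERHEAD is an assumption made by the caller, not a value the
   system reports.  */
static size_t
chunk_payload_size (size_t npages, size_t page, size_t overhead)
{
  assert (npages * page > overhead);
  return npages * page - overhead;
}
```

In a real program the page argument would come from a runtime query such as (size_t) sysconf (_SC_PAGESIZE) rather than from a constant.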

Traditionally the page size was a compile-time constant, but recent processor developments changed this. Processors now support different page sizes, and the page size can possibly even vary among different processes on the same system. Therefore the system should be queried at runtime about the current page size, and no assumptions (except about it being a power of two) should be made.

The correct interface to query the page size is sysconf (the section called “Definition of sysconf”) with the parameter _SC_PAGESIZE. There is a much older interface available, too.

int getpagesize (void)

The getpagesize function returns the page size of the process. This value is fixed for the runtime of the process but can vary in different runs of the application.

The function is declared in unistd.h.
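A minimal sketch of the recommended runtime query through sysconf might look as follows. The wrapper name query_page_size is hypothetical, and the assertion merely checks the power-of-two property mentioned earlier.

```c
#include <assert.h>
#include <unistd.h>

/* Return the page size in bytes, or -1 if it cannot be determined.  */
static long
query_page_size (void)
{
  long size = sysconf (_SC_PAGESIZE);
  if (size <= 0)
    return -1;
  /* Being a power of two is the only property worth relying on.  */
  assert ((size & (size - 1)) == 0);
  return size;
}
```

Calling the wrapper once at startup and caching the result is sufficient, since the value does not change while the process runs.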

Widely available on System V derived systems is a method to get information about the physical memory the system has. The call

  sysconf (_SC_PHYS_PAGES)

returns the total number of pages of physical memory the system has. This does not mean all this memory is available. The number of pages currently available can be found using

  sysconf (_SC_AVPHYS_PAGES)

These two values help to optimize applications. The value returned for _SC_AVPHYS_PAGES is the amount of memory the application can use without hindering any other process (given that no other process increases its memory usage). The value returned for _SC_PHYS_PAGES is more or less a hard limit for the working set. If all applications together constantly use more than that amount of memory the system is in trouble.
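As a rough illustration of how these two values could be used, the following sketch converts the page counts into bytes and derives a cache size from the available amount. The function names and the "one fraction of available memory" policy are assumptions made for this example only. Note that _SC_PHYS_PAGES and _SC_AVPHYS_PAGES are not part of POSIX, so the code assumes a system which provides them.

```c
#include <assert.h>
#include <unistd.h>

/* Convert a page count into bytes.  Returns 0 if either value is
   unavailable (sysconf reports failure with -1).  */
static unsigned long long
pages_to_bytes (long pages, long page_size)
{
  if (pages < 0 || page_size <= 0)
    return 0;
  return (unsigned long long) pages * (unsigned long long) page_size;
}

/* Total physical memory of the system in bytes.  */
static unsigned long long
total_physical_bytes (void)
{
  return pages_to_bytes (sysconf (_SC_PHYS_PAGES), sysconf (_SC_PAGESIZE));
}

/* Physical memory currently available in bytes.  This is a snapshot
   and can change at any moment.  */
static unsigned long long
available_physical_bytes (void)
{
  return pages_to_bytes (sysconf (_SC_AVPHYS_PAGES), sysconf (_SC_PAGESIZE));
}

/* Hypothetical policy: use 1/DIVISOR of the currently available
   physical memory for a cache.  */
static unsigned long long
suggested_cache_bytes (unsigned int divisor)
{
  assert (divisor != 0);
  return available_physical_bytes () / divisor;
}
```

A program might call suggested_cache_bytes (4) once at startup; because the available amount is only a snapshot, a long-running program may want to repeat the query from time to time.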

In addition to the methods already described, the GNU C library provides two functions to get this information. They are declared in the file sys/sysinfo.h. Programmers should prefer to use the sysconf method described above.

long int get_phys_pages (void)

The get_phys_pages function returns the total number of pages of physical memory the system has. To get the amount of memory this number has to be multiplied by the page size.

This function is a GNU extension.

long int get_avphys_pages (void)

The get_avphys_pages function returns the number of available pages of physical memory the system has. To get the amount of memory this number has to be multiplied by the page size.

This function is a GNU extension.
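For completeness, here is a short sketch of the multiplication mentioned in the descriptions above, using the GNU extension. The helper name gnu_physical_bytes is hypothetical, and treating a failed query as a programming error is a simplification made for this example.

```c
#include <assert.h>
#include <sys/sysinfo.h>
#include <unistd.h>

/* Total physical memory in bytes, computed from the GNU extension
   get_phys_pages and the page size reported by sysconf.  */
static unsigned long long
gnu_physical_bytes (void)
{
  long pages = get_phys_pages ();
  long page_size = sysconf (_SC_PAGESIZE);
  /* Neither call is expected to fail on a GNU system; this sketch
     simply asserts that instead of handling the error.  */
  assert (pages > 0 && page_size > 0);
  return (unsigned long long) pages * (unsigned long long) page_size;
}
```

The same pattern applies to get_avphys_pages. Portable code should still prefer the sysconf queries.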