Tuesday, January 26, 2010

Memory management (part 0)

As promised, I will get into the details of some of the base module components. I'll start with one of the most important ones: memory management.
Memory is (at least on current consoles) a very limited resource and has to be managed efficiently. The main properties of a block of memory are:

  • the size of the block
  • the alignment of the first memory element
  • the granularity at which the memory can be accessed
  • the access speed (both throughput and latency) in both directions
  • the presence of an address translation unit that maps virtual addresses to physical addresses
  • the connection to other processors in the computer (speed, restrictions)

In nearly all consoles (and PCs) you have multiple distinct blocks of memory with different properties. One common scheme for memory architectures is to have local graphics memory "near" the graphics processor.
Another option is to have a (usually small) "scratchpad" memory block that has lower latency and/or higher bandwidth to one of the processors and that can be used for certain algorithms.

Due to the growing discrepancy between the speed of modern processors and the speed of memory, nearly every architecture has some form of cache: a copy of some data held in a smaller, faster block of memory and kept in sync with the original location in 'main' memory. Important attributes of caches are:

  • all of the above (because it's just another block of memory)
  • the associativity of the cache (wikipedia)
  • the presence of special cache-control instructions in the processor

The last point is pretty important. Most of the time the cache works transparently to the programmer, but sometimes you want to control its behavior explicitly to gain a bit more performance. Examples of this are:

  • If you know that you won't need the data you are about to write anytime soon, you can prevent 'cache pollution' (wikipedia) by writing around the cache, directly into main memory. This is normally done through some kind of write-combining (wikipedia) hardware and/or by writing at cache-line granularity.
  • If your memory access pattern is known in advance, you can give the cache a hint about what you will access next. On most platforms this 'prefetching' causes one (or more) cache lines to be read into the cache while the next instructions execute. Which operations are allowed while a cache line is being read depends on the platform.
  • On some platforms caches can be used to synchronize multiple processors. This commonly needs special operations to work.

On current hardware you will generally be memory-bandwidth limited in your processing (in games, at least), and therefore any kind of compression can be very beneficial.

to be continued...

posted while being intoxicated with Long Island Iced Tea.