Edited to reflect comments from Sebastian.
Let's revisit an old school demo maker classic and write a Mandelbrot fractals generator in LLVM assembly.
But first, what's LLVM?
The Low Level Virtual Machine (LLVM) is a compiler tool chain designed for compile-time, link-time & run-time optimization of programs written in arbitrary programming languages. Everything in LLVM compiles down to an efficient target independent specialized bitcode where each instruction is in static single assignment form.
Fortunately, we don't have to read binary bitcode (at least not right now ...) as LLVM defines its own assembly language that can be translated into bitcode & vise versa.
Bitcode can then be either compiled down to machine code on a variety of platforms or distributed as-is and JIT (Just in Time) compiled at runtime.
Now what about Mandelbrot fractals?
Here's the wikipedia definition:
In mathematics the Mandelbrot set, named after Benoît Mandelbrot, is a set of points in the complex plane, the boundary of which forms a fractal. Mathematically the Mandelbrot set can be defined as the set of complex values of c for which the orbit of 0 under iteration of the complex quadratic polynomial zn+1 = zn2 + c remains bounded. That is, a complex number, c, is in the Mandelbrot set if, when starting with z0 = 0 and applying the iteration repeatedly, the absolute value of zn never exceeds a certain number (that number depends on c) however large n gets.
More details on the Mandelbrot set.
Well that didn't help a lot, did it?
A picture is worth a thousands words, here's a fractal generated with this tutorial code:
Hard do believe, this comes from that complex definition!
Here's the pseudo code to get an idea of the assembly knowledge required:
So, what do we need?
- Arithmetics & type conversions
- A big pixel array to store image data
- A way to export an image in memory into a readable image format
The complete code is on available on github.
We need a few constants to play with our fractals, here's the definition section:
Constants are defined like this:
@NAME = internal constant
@NAME is actually a pointer to the constant and not the constant itself, it took me some time to get used to it. It means that you need to load it into a register before the constant value can be used!
%w = load i32* @W
Now %w is a register holding @W constant value that can be used in arithmetics operations.
You can play with xc, yc & zoom to render different images, be careful as little changes can have very large repercussions on the fractal.
Arithmetics & type conversions
First thing to know is that LLVM integers and floats are signless, which is a big difference compared to C for example. Instructions on the other hand are signed, for example signed integer division and unsigned integer division are distinct operations,
Integers can be of any size, i1, i2, i3, ... i8, ... i16, ... i32, ... i64, ..., floats are different and fixed in size, 32 bits, doubles are 64 bits and there's also a fp128 type but we won't be needing that one today.
Here's a list of common Arithmetics operations:
There are specific operations for converting types, ie floats <-> integers or integers <-> integers of different size.
Operations happen on registers, not on constants or pointers so values must be first loaded & then stored like this:
%xc = load double* @XC %xc_div_2 = fdiv double %xc, 2 store double %xc_div_2, double * %some.address
This code loads XC into a register %xc, then divides it by two into %xc_div_2 and proceeds to store it at some address in memory.
Here's what a typical loop looks like:
Why not simply an i++ somewhere? Well LLVM assembly uses SSA form, this means that there are an infinite number of registers but they cannot be modified. This is very useful as it creates a straightforward AST, [Abstract Syntax Tree][AST] that LLVM optimization passes can easily work with. On the other hand this adds complexity to the code and requires much more instructions for simple things like just increment a loop counter.
Well this actually is not a concern for auto generated LLVM assembly, only when coding assembly code by hand so that's a minor gripe.
What's harder to follow is the need for the initial
br label %loop_over_t, if you remove it, the code will not fall back to the beginning of the loop and will actually not compile.
You'll get an obscure message:
llvm-as: mandelbrot.ll:72:1: error: expected instruction opcode loop_over_t: ^ make: *** [mandelbrot.bc] Error 1
The answer is coming from the fact that LLVM sees a function as a succession of blocks:
Every basic block in a program ends with a "Terminator" instruction, which indicates which block should be executed after the current block is finished. These terminator instructions typically yield a 'void' value: they produce control flow, not values (the one exception being the 'invoke' instruction).
There are seven different terminator instructions: the 'ret' instruction, the 'br' instruction, the 'switch' instruction, the ''indirectbr' Instruction, the 'invoke' instruction, the 'unwind' instruction, and the 'unreachable' instruction.
In this case, the beginning of the function definition is an implicit block and
loop_over_t: marks the beginning of a new block. That new block can't start before the previous is closed implicitly and to do that a "Terminator" instruction is required. This is where
br label %loop_over_t comes into play, it closes the current block and jumps to the
%loop_over_tblock. It doesn't matter if the block is just below or somewhere further in the code, you need to close a block before entering a new one.
Storing pixel colors
LLVM ships with a built-in
array type so we're going to make good use of it, here's the in memory image definition:
@IMG = global [2048 x [2048 x i8]] zeroinitializer
This defines a flat array of 2048 * 2048 bytes (i8 = 8 bits, equivalent to C
char) that will store a 8 bits pixel. I decided to handle colors when exporting the image only which gives more flexibility to play with palettes and mapping between this mandelbrot iteration number and final colors.
An easy mistake is to think that this is equivalent to a char ** pointer in C:
No! The equivalent to our LLVM definition in C would be a
char IMG[2048*2048] or char IMG
This is very very important and easily leads to segfault or worse, silent memory corruption.
Indexing this array is done by using the built-in
getelementptr function like this:
%pixel.a = getelementptr [2048 x [2048 x i8]]* @IMG, i32 0, i32 %i, i32 %j
Here's this function definition:
Overview: The 'getelementptr' instruction is used to get the address of a subelement of an aggregate data structure. It performs address calculation only and does not access memory.
Arguments: The first argument is always a pointer, and forms the basis of the calculation. The remaining arguments are indices that indicate which of the elements of the aggregate object are indexed. The interpretation of each index is dependent on the type being indexed into. The first index always indexes the pointer value given as the first argument, the second index indexes a value of the type pointed to (not necessarily the value directly pointed to, since the first index can be non-zero), etc. The first type indexed into must be a pointer value, subsequent types can be arrays, vectors, and structs. Note that subsequent types being indexed into can never be pointers, since that would require loading the pointer before continuing calculation.
The first argument to getelementptr is a pointer with type information, remember than constant are always pointers. Then the second element is to be considered an index to an array of pointers starting at @IMG, 0 in our case as we only have one IMG array. Then the following two arguments are our x and y coordonates, effectively returning a pixel address in %pixel.a.
We can then set a pixel value using the simple
store i8 %color, i8* %pixel.a
Exporting to bmp
Bmp is a pretty straight format image format, it does not compress well but it's supported on all platforms. Here I am calling some utility C code that's compiled with clang. It's a great example of how powerful LLVM is, I am using standard C code that can be fed to clang and generate bitcode and benefit from all optimization passes, in particular inter procedural techniques, across any language that can target LLVM.
Note that the bitcode can be interpreted or JIT compiled even though it started as C code.
Here's how to declare an external C function:
declare void @write_bmp(i8* %img, i32 %width, i32 %height) nounwind
write_bmp takes an i8 (unsigned char) array with width & height as input and writes image.bmp in the current working directory.
Here's the utils.c glue code:
See how easily native LLVM types blend in with C, it's one of the reasons why LLVM is such a good platform for both scripting & high performance projects.
Putting it all together
How it felt?
I've done a lot of x86 assembly in the pentium mmx era and surprisingly LLVM assembly is stricter on syntax (see the block start/stop issue for loops) but more high level at the same time with built-in arrays and complex types.
SSA took some getting used to. I understood the basic idea before but had never worked with it directly. The surprising thing is that you don’t get mutable registers, which means that to do something as simple as “i++” in a for loop is a lot of code, load/inc/store repeat.
Running the example
You will need LLVM of course and clang too. I've been using LLVM 2.6 but it should work on older versions since I am not using any special instruction here.
Well this is a pretty naive implementation although it actually performs OK, next step is to put it through llvm optimization passes, profile it and study the results. If you look at the Makefile, there's a mandelbrot_opt binary and an auto generated mandelbrot_opt.ll, take a peek if you're curious and stay tuned for part II, SSA & Optimization.
Source code on Github.