## Fixing old bugs, without the source

Once upon a time, I made a special kind of demoscene production: a wedtro, which is a small demo made as a present for another member of the demoscene who is getting married. This wedtro turned out to be the buggiest piece of shit I’ve ever released, and it’s been bugging me for the past decade. Until today, when I decided to fix it.

I recently revisited a dormant project of mine, for which I unfortunately need to write a 3dsmax exporter plugin.

Now, I’m always pissed off from the start when I have to write code on windows and visual studio, but having to deal with 3dsmax on top of that really just adds insult to injury. It’s not just that maxsdk is a convoluted mess, or that it needs a very specific version of visual studio to write plugins for it (which is really Microsoft’s fault, to be fair). No, my biggest issue so far is that 3dsmax takes about 3 years to start up, and there is no way to unload or reload a plugin without restarting it.

Whenever I fix a tiny thing in the exporter plugin I’m writing, and I want to try it out and see if it does the business, I have to shut down 3dsmax, start it up again (which takes forever), load my test scene, then try to export again and see what happens. This is obviously unacceptable, so I really had to do something about it.

## Generating multiple sample positions per pixel

Anti-aliasing in ray tracing requires casting multiple rays per pixel, to sample the whole solid angle subtended by the imaginary surface of each pixel, if we consider it to be a small rectangular part of the view plane (see diagram).

It’s obvious that to be able to generate multiple primary rays for each pixel, we need an algorithm that, given the sample number, calculates a sample position within the area of a pixel. Since it’s trivial to map points in the unit square onto the actual area of an arbitrary pixel, it makes sense to write this sample position generation function so that it calculates points in the unit square.
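As a rough sketch of that mapping, assuming the sample position lies in a unit square centered on the origin (each coordinate in [-0.5, 0.5]) and that pixel (0, 0) covers [0, 1) x [0, 1) in image space, something like the following would do. The function names `sample_to_image` and `image_to_ndc` are made up for this illustration and are not part of any existing code.

```c
#include <assert.h>

/* Hypothetical helper: maps a sample offset in the unit square centered
 * on the origin ([-0.5, 0.5] on each axis) to floating-point image
 * coordinates inside pixel (px, py). */
void sample_to_image(int px, int py, const float *spos, float *img)
{
    assert(spos && img);
    img[0] = (float)px + 0.5f + spos[0];
    img[1] = (float)py + 0.5f + spos[1];
}

/* Hypothetical helper: maps image coordinates to normalized device
 * coordinates in [-1, 1] with y pointing up, which is a convenient
 * starting point for computing a primary ray direction. */
void image_to_ndc(const float *img, int xsz, int ysz, float *ndc)
{
    assert(img && ndc && xsz > 0 && ysz > 0);
    ndc[0] = 2.0f * img[0] / (float)xsz - 1.0f;
    ndc[1] = 1.0f - 2.0f * img[1] / (float)ysz;
}
```

A sample at (0, 0) lands on the pixel center, while (-0.5, -0.5) and (0.5, 0.5) land on opposite corners of the same pixel.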

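The easiest way to write such a function would be to generate random points in the unit square (centered on the origin here, so each coordinate lies in [-0.5, 0.5]), like this:

void get_sample_pos(float *pos)
{
    /* uniform random value in [0, 1], shifted to [-0.5, 0.5] */
    pos[0] = (float)rand() / (float)RAND_MAX - 0.5f;
    pos[1] = (float)rand() / (float)RAND_MAX - 0.5f;
}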

The problem with this approach is that, even if our random number generator really has a perfectly uniform distribution, any finite number of sample positions generated, especially if that number is in the low tens, will probably not cover the area of the pixel in anything resembling a uniform sampling. Clusters are very likely to occur, leaving large areas of the pixel space unexplored by our rays.
As the number of samples gets larger and larger, this problem is somewhat mitigated, but especially if we're not writing a path tracer, we're usually dealing with anywhere between 4 and 20 rays per pixel, no more.

The following animation shows random sample positions generated by the code above. Even at about 40 samples, the left part of the pixel is inadequately sampled.
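To put a number on the clustering, here is a small sketch that splits the pixel into a grid of cells and counts how many of them receive no sample at all. The 4x4 grid and the name `count_empty_cells` are arbitrary choices made for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define GRID 4  /* hypothetical choice: split the pixel into 4x4 cells */

/* Throws nsamples purely random points at the pixel and returns how many
 * of the GRID*GRID cells received no sample at all. */
int count_empty_cells(int nsamples)
{
    int hit[GRID][GRID] = {{0}};
    int i, x, y, empty = 0;

    assert(nsamples > 0);

    for(i = 0; i < nsamples; i++) {
        /* uniform random point in [0, 1] x [0, 1] */
        float u = (float)rand() / (float)RAND_MAX;
        float v = (float)rand() / (float)RAND_MAX;

        x = (int)(u * GRID);
        y = (int)(v * GRID);
        /* u or v can be exactly 1.0, which would index past the end */
        if(x >= GRID) x = GRID - 1;
        if(y >= GRID) y = GRID - 1;
        hit[y][x] = 1;
    }

    for(y = 0; y < GRID; y++) {
        for(x = 0; x < GRID; x++) {
            if(!hit[y][x]) empty++;
        }
    }
    return empty;
}
```

With 16 samples on a 4x4 grid, the expected number of empty cells is 16 * (15/16)^16, which is about 5.7, so roughly a third of the pixel typically goes unsampled even though there are as many samples as cells.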

Another approach is to avoid randomness. The following function gets the sample number as input, and calculates its position by recursively subdividing the pixel area, taking care to spread the samples of each recursion level around as much as possible instead of focusing on one quadrant at a time.

static void get_sample_pos_rec(int sidx, float xsz, float ysz, float *pos);

void get_sample_pos(int sidx, float *pos)
{
    pos[0] = pos[1] = 0.0f;
    get_sample_pos_rec(sidx, 1.0f, 1.0f, pos);
}

static void get_sample_pos_rec(int sidx, float xsz, float ysz, float *pos)
{
    int quadrant;
    static const float subpt[4][2] = {
        {-0.25, -0.25}, {0.25, -0.25}, {-0.25, 0.25}, {0.25, 0.25}
    };

    /* base case: sample 0 is always in the middle, do nothing */
    if(!sidx) return;

    /* determine which quadrant to recurse into */
    quadrant = ((sidx - 1) % 4);
    /* move to the center of that quadrant, scaled to the current cell size */
    pos[0] += subpt[quadrant][0] * xsz;
    pos[1] += subpt[quadrant][1] * ysz;

    get_sample_pos_rec((sidx - 1) / 4, xsz / 2, ysz / 2, pos);
}

And here's the animation showing that code in action (colors denote the recursion depth):

This sampling is perfectly uniform, but it's still not ideal. The problem is that whenever we're sampling in a regular grid, no matter how fine that grid is, we will introduce aliasing. By breaking up each pixel into multiple subpixels like this we effectively increase the cutoff frequency after which aliasing occurs, but we do not eliminate it.

The best solution is to combine both techniques. We need randomness to convert aliasing into noise, which is much less perceptible by human brains trained by evolution to recognize patterns. But we also need uniform sampling to properly explore the whole area of each pixel.

So, we'll employ a technique known as jittering: first we uniformly subdivide the pixel into subpixels, and then we randomly perturb the sample position of each subpixel inside the area of that subpixel. The following code implements this algorithm:

static void get_sample_pos_rec(int sidx, float xsz, float ysz, float *pos);

void get_sample_pos(int sidx, float *pos)
{
    pos[0] = pos[1] = 0.0f;
    get_sample_pos_rec(sidx, 1.0f, 1.0f, pos);
}

static void get_sample_pos_rec(int sidx, float xsz, float ysz, float *pos)
{
    int quadrant;
    static const float subpt[4][2] = {
        {-0.25, -0.25}, {0.25, -0.25}, {-0.25, 0.25}, {0.25, 0.25}
    };

    if(!sidx) {
        /* we're done, just add appropriate jitter */
        pos[0] += (float)rand() / (float)RAND_MAX * xsz - xsz / 2.0;
        pos[1] += (float)rand() / (float)RAND_MAX * ysz - ysz / 2.0;
        return;
    }

    /* determine which quadrant to recurse into */
    quadrant = ((sidx - 1) % 4);
    /* move to the center of that quadrant, scaled to the current cell size */
    pos[0] += subpt[quadrant][0] * xsz;
    pos[1] += subpt[quadrant][1] * ysz;

    get_sample_pos_rec((sidx - 1) / 4, xsz / 2, ysz / 2, pos);
}

And here's the animation showing the jittered sample position generator in action:
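For completeness, here is a sketch of how such a generator might be used to anti-alias a single pixel. It is not taken from any renderer: `jittered_sample_pos` is an iterative rewrite of the recursive idea, assuming each subdivision level offsets the position by `subpt[quadrant]` scaled by the current cell size, and `pixel_coverage` uses a made-up "scene" consisting of a single vertical edge.

```c
#include <assert.h>
#include <stdlib.h>

/* Iterative quadrant subdivision plus jitter. The result lies in
 * [-0.5, 0.5] on each axis (unit square centered on the origin). */
void jittered_sample_pos(int sidx, float *pos)
{
    static const float subpt[4][2] = {
        {-0.25f, -0.25f}, {0.25f, -0.25f}, {-0.25f, 0.25f}, {0.25f, 0.25f}
    };
    float sz = 1.0f;    /* size of the cell currently being subdivided */

    assert(sidx >= 0 && pos);
    pos[0] = pos[1] = 0.0f;

    while(sidx) {
        int quadrant = (sidx - 1) % 4;
        pos[0] += subpt[quadrant][0] * sz;
        pos[1] += subpt[quadrant][1] * sz;
        sidx = (sidx - 1) / 4;
        sz *= 0.5f;
    }

    /* jitter inside the cell we ended up in */
    pos[0] += ((float)rand() / (float)RAND_MAX - 0.5f) * sz;
    pos[1] += ((float)rand() / (float)RAND_MAX - 0.5f) * sz;
}

/* Hypothetical scene: everything left of the vertical line x = edge_x
 * (in pixel-local coordinates) is white, the rest is black.
 * Returns the averaged intensity of the pixel, in [0, 1]. */
float pixel_coverage(int nsamples, float edge_x)
{
    float pos[2], sum = 0.0f;
    int i;

    assert(nsamples > 0);
    for(i = 0; i < nsamples; i++) {
        jittered_sample_pos(i, pos);
        if(pos[0] < edge_x) sum += 1.0f;
    }
    return sum / (float)nsamples;
}
```

In a real ray tracer the `if(pos[0] < edge_x)` test would be replaced by casting a primary ray through the sample position and accumulating the color it returns.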

## Color space linearity and gamma correction

Ok so it’s a well known fact among graphics practitioners that pretty much every game does rendering incorrectly. Since performance, and not correctness, is always the prime consideration in game graphics, we usually tend to turn a blind eye towards such considerations. However, with today’s ultra-high performance programmable shading processors and hardware LUT support for gamma correction, excuses for why we continue doing it the wrong way become progressively more and more lame. :)

The gist of the problem with traditional real-time rendering is that we’re trying to do linear operations in non-linear color spaces.

Let’s take lighting calculations for example. When light hits a plane at a 60 degree incidence angle from the normal vector of the plane, Lambert’s cosine law states that the intensity of the diffusely reflected light off the plane (radiant exitance) is exactly half of the intensity of the incident light (irradiance) from that light source. However, the monitor, responsible for taking all those pixel values and sending them rushing into our retinas, does not play along with our assumptions. That half intensity grey light we expect from that surface becomes much darker due to the non-linear (power-law) response curve of the electron gun.

Simply put, when half the voltage of the full input range is applied to the electron gun, much less than half the possible electrons hit the phosphor in the glass, making it emit lower than half-intensity light to the user. That’s not a defect of the CRT monitors; all kinds of monitors, tv screens, projectors, or other display devices work the same way.

So how do we correct that? We need to apply the inverse of the monitor response curve to our output colors before they are fed to the monitor, so that we can be sure that the linear color space where we do our calculations does not get bent out of shape before it reaches our eyes. Since the monitor response curve is approximately a function of the form $x^\gamma$, where usually $\gamma = 2.2$, it mostly suffices to do the following calculation before we write the color value to the framebuffer: $x^{1/\gamma}$. Or in a pixel shader:

gl_FragColor.rgb = pow(color.rgb, vec3(1.0 / 2.2));

That's not entirely correct, because if we are doing any blending, it happens after the pixel shader writes the color value, which means it would operate after this gamma correction, in a non-linear color space. It would be fine if this shader were a final post-processing shader which writes the whole framebuffer without any blending operations, but there is a better and more efficient way. If we just tell OpenGL that we want to output a gamma-corrected framebuffer, or more precisely a framebuffer in the sRGB color space, it can do this calculation using hardware lookup tables, after any blending takes place, which is efficient and correct. This functionality is exposed by the ARB_framebuffer_sRGB extension, and should be available on all modern graphics cards. To use it we need to request an sRGB-capable framebuffer during context creation (GLX_FRAMEBUFFER_SRGB_CAPABLE_ARB / WGL_FRAMEBUFFER_SRGB_CAPABLE_ARB), and enable it with glEnable(GL_FRAMEBUFFER_SRGB).

Now if we do just that, we're probably going to see the following ghastly result:

The problem is that our textures are already gamma-corrected with a similar process, which makes them now completely washed out when we apply gamma correction in the end a second time. The solution is to make color values looked up from textures linear before using them, by raising them to the power of 2.2. This can either be done in the shader simply by: pow(texture2D(tex, tcoord).rgb, vec3(2.2)), or by using the GL_SRGB_EXT internal texture format instead of GL_RGB (EXT_texture_sRGB extension), to let OpenGL know that our textures aren't linear and need conversion on lookups.

The result is correct rendering output, with all operations in a linear color space:

A final pitfall we may encounter is that if we use intermediate render targets during rendering, with 8 bit per color channel resolution, we will observe noticeable banding in the darker areas. That is because our 8bit/channel textures are now raised to a power and the result is again placed in an 8bit/channel render target, which obviously wastes color resolution and loses details, which cannot be recovered later on when we gamma correct the values again. The bottom line is that we need higher precision intermediate render targets if we are going to work in a truly linear color space. The following screenshots show a dark area of the game when using a regular GL_RGBA intermediate render target (top), and when using a half-float GL_RGBA16F render target (bottom):

Color artifacts are clearly visible in the first image, around the dark unlit area.

## Kernel Development from Scratch

I’ve started a series of articles for the new Greek magazine: Linux Inside. The articles aim to introduce the reader to kernel development, by starting from scratch and building up a small working kernel.

The first issue of Linux Inside, which should be available by now, includes the first part of the series which covers setting up the development tools for compiling and running the kernel both on the real machine and in a simulator, bootstrapping with multiboot boot loaders, and text output by driving the VGA in text mode.

For obvious reasons the full source code supplementing each article could not fit in the pages of the magazine; only the most relevant snippets are included. So make sure to follow the URL at the start of the article to download the full source code and have it handy while you read. The source code for the first article can be found here.

So go grab a copy of the Linux Inside magazine and let me know what you think. Each article will also contain exercises for the readers, so have a go at them and send me your code! I’ll try to include it in the next issue.

Oh and of course each article will be released under a free license (I’m thinking either GNU FDL or CC BY-SA) when each issue goes out of circulation.

edit: Just made a website for the series.

## Third OpenGL article

The third part of my OpenGL graphics programming series for the Greek Linux Format magazine has been available for a few days now. So make sure you don’t miss the January-February issue of Linux Format if you wish to learn about texture mapping, a technique which, when used properly, can greatly increase the realism of our 3D objects at virtually no extra processing cost on modern graphics hardware.

This time the accompanying source code for the examples is not included in the magazine’s DVD, so go grab a copy from my website.

Finally, I forgot to mention that the first of these articles is now available for download in PDF format.

## Introductory OpenGL tutorials continued

Just a short notice: the second part of my “introduction to 3D graphics with OpenGL” series should be available as we speak. This time, we’ll perform the full set of transformations that we described while discussing the rendering pipeline in the previous issue. We’ll use the matrix stack to separate the model from the view parts of the modelview matrix, and render multiple objects properly. Finally, we’re going to explain the mathematical model of shading and illumination, and apply lighting to our object in order to increase the realism of our simple 3D environment tremendously.

So, go and grab a copy of the November-December issue of the Greek Linux Format magazine, and let me know what you think. As always I look forward to your comments, suggestions, corrections, etc.

By the way, due to popular demand, I will upload the first tutorial of the series in a couple of weeks, after the previous issue of Linux Format goes out of circulation.

## Introductory OpenGL tutorials

I recently started writing a series of introductory tutorials about graphics programming with OpenGL, for the Greek Linux Format magazine.

The articles are written for the complete beginner, who hasn’t had any previous exposure to graphics programming. However, familiarity with the C programming language is definitely required.

What I’m aiming for is to thoroughly explain the underlying theory, in order to provide a stepping stone for someone who would like to eventually delve deeper into graphics algorithms, rather than just present raw examples for doing this and that with OpenGL.

In any case, the first article of the series will be published in the September-October issue of the Greek Linux Format magazine, which should be available during the next few days. Any feedback is greatly appreciated.