Initial thoughts on Vulkan

Vulkan logo

I just finished watching the video of the Khronos GDC talk on Vulkan and going through the corresponding slides, and I decided to share my initial thoughts on the new API, mostly in an attempt to consolidate them.

For those who missed the announcements of the last couple of days, Vulkan is supposed to be the successor to OpenGL, previously referred to as glNext. The idea is to provide a low-level API, basically a thin abstraction over the GPU hardware, along the lines of Apple Metal and AMD Mantle (as far as I’ve heard at least; I have no personal experience with either of them).

I’ve been going back and forth for a while on whether I like or dislike this development, until I realized that all my objections were centered around the idea that Vulkan is going to replace OpenGL. By that I mean that everyone using OpenGL today would somehow switch over to Vulkan, which is clearly not true. It is certainly talked about as a replacement for OpenGL, but in my opinion it both is and isn’t. Let me explain…

OpenGL, from its inception, has had a multiple personality disorder. This is clear just by reading the introductory chapter of the OpenGL specification document, which needs four consecutive sections, spanning three pages and looking at OpenGL from different perspectives, just to attempt to define what OpenGL is:

  • 1.1 What is the OpenGL Graphics System?
  • 1.2 Programmer’s View of OpenGL
  • 1.3 Implementor’s View of OpenGL
  • 1.4 Our View

For me, and my daily hacking, OpenGL plays two distinct roles:

  • A way to access and utilize the GPU hardware, to take advantage of its capability to accelerate the various graphics algorithms required for real-time rendering.
  • A convenient 3D rendering library, low-level enough not to introduce obstacles in the form of cumbersome abstractions while hacking on graphics algorithms, yet high-level enough to make experimentation possible without having to write frameworks and abstractions for every task.

Based on that, it is now clear to me that Vulkan is indeed a very good replacement for the first use case, but completely unsuitable for the second. That’s why I think Vulkan is not really a replacement for OpenGL. Vulkan and OpenGL can, and should, coexist!

Vulkan looks like a perfect target for reusable frameworks and engines, and it provides some great opportunities for optimization. I’m particularly excited about finally being able to multi-thread rendering code, without jumping through hoops (and performance bottlenecks) to serialize all API interactions through “the context thread”. I love the idea of having the API deal only with a concise intermediate representation of the shading language, which facilitates experimentation with new shading language frontends and off-line shader compilers, and avoids having to work around multiple vendors’ parser bugs and idiosyncrasies. I also like the idea of being able to manage the allocation and use of GPU resources, and the flow of data.

But, at the same time, I want the ease of use of OpenGL for my day to day hacks and experiments. I want immediate mode, matrix stacks, and the option of not having to care whether a texture or vertex buffer is currently backed by GPU memory or not.

I’ll go one step further, and this is really the gist of my thoughts on the matter: not only should there be an OpenGL implementation on top of Vulkan, to use for experimental programs and one-off hacks, but ideally both should be seamlessly interoperable within the same program. Think how powerful it would be to start with an OpenGL prototype, and then go down to the metal with Vulkan wherever you want to optimize! To put it in 90s terms: Vulkan could be the assembly language of OpenGL.

Anyway, this pretty much sums up my initial thoughts on the Vulkan announcement. It’s still quite early and Vulkan itself is still in flux, so I guess we’ll have to wait and see how the whole thing turns out. But one thing is clear to me: Vulkan is very exciting, but OpenGL isn’t going to go away any time soon.

OculusVR SDK and simple oculus rift DK2 / OpenGL test program

Edit: I have since ported this test program to use LibOVR 0.4.4, and it works fine on both GNU/Linux and Windows.

Edit 2: Updated the code to work with LibOVR 0.5.0.1; unfortunately they removed the handy function I was using to disable the obnoxious “health and safety warning”. See the Oculus developer guide on how to disable it system-wide instead.

I’ve been hacking with my new Oculus Rift DK2 on and off for the past couple of weeks now. I won’t go into how awesome it is, or how VR is going to change the world, and save the universe or whatever; everybody who cares, knows everything about it by now. I’ll just share my experiences so far in programming the damn thing, and post a very simple OpenGL test program I wrote last week to try it out, that might serve as a baseline.

OculusDK2 OpenGL test program


Calculating hierarchical animation transformation matrices

This is a short post with my thoughts on what’s the best way to design a transformation hierarchy for animation, and how the “ideal design” changes over time.

Bottom-up lazy evaluation magic

For a long time, I was a big fan of backwards (bottom-up) lazy evaluation of transformation hierarchies: essentially having an XFormNode class for each node in the hierarchy, with a get_matrix(long msec) function which calculates the node’s transformation matrix for the requested time, calls parent->get_matrix(msec), and returns the concatenated matrix.

Matrix4x4 XFormNode::get_matrix(long msec) const
{
    Matrix4x4 xform;
    calc_matrix(&xform, msec);

    if(parent) {
        xform = parent->get_matrix(msec) * xform;
    }
    return xform;
}

Of course, such a scheme would be wasteful if these matrices were calculated every time the get_matrix functions are called. For instance, if a node is part of a hierarchy, its get_matrix will be called when we need to draw the object corresponding to that node, and also every time the get_matrix of any of its descendants is called, due to the recursive bottom-up evaluation of matrices outlined previously. If one considers the possibility of drawing an object multiple times for various multi-pass algorithms, the problem gets worse, the limiting worst case being ray tracing, which would require these functions to be called at the very least once per ray cast.

It follows then, that such a design goes hand in hand with lazy evaluation and caching of calculated node matrices. The XFormNode class would hold the last requested time and the corresponding matrix, and when get_matrix is called with a time matching the last one, we just return the cached matrix instead of recalculating it.

const Matrix4x4 &XFormNode::get_matrix(long msec) const
{
    if(msec == cached_msec) {
        return cached_matrix;
    }

    // cached_matrix and cached_msec are declared mutable, since this
    // function is const
    calc_matrix(&cached_matrix, msec);
    cached_msec = msec;

    if(parent) {
        cached_matrix = parent->get_matrix(msec) * cached_matrix;
    }
    return cached_matrix;
}

This scheme served me well for a long time, and it worked like magic. At any point I could just ask for the transform at time X and would get the matrix automatically, including any effects of hierarchy, keyframe interpolation, etc. It’s all good… until suddenly processors stopped getting faster, Moore’s law went belly up, and after the shock passed we all sooner or later realised that single-threaded graphics programs are a thing of the past.

Multithreading pissed on my rug

In the brave new multithreaded world, lazy evaluation becomes a pain in the ass. The first knee-jerk reaction is to add a mutex to XFormNode, and lock the hell out of the cached matrices. And while that might be OK for an OpenGL program, which won’t have more than a couple of threads working with the scene database at any point (since rendering itself can only be done safely from a single thread), it throws out the window a lot of the concurrency available in a raytracer, where at any point there could be 8 or more threads asking for the matrix of any arbitrary node.
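That knee-jerk mutex version would look something like this. This is a minimal self-contained sketch of the idea, not the actual code: the Node struct and the double standing in for Matrix4x4 are mine, just to keep the example compilable.

```cpp
#include <mutex>

// Minimal stand-in for XFormNode, with a double playing the role of
// Matrix4x4 so the example is self-contained. The cache members are
// mutable so the const get_matrix can update them under the lock.
struct Node {
    Node *parent = nullptr;
    mutable std::mutex cache_mutex;
    mutable long cached_msec = -1;
    mutable double cached_matrix = 1.0;

    // stand-in for calc_matrix: some time-dependent local transform
    double calc_matrix(long msec) const { return 1.0 + msec; }

    double get_matrix(long msec) const {
        // every caller, on every thread, serializes on this per-node lock
        std::lock_guard<std::mutex> lock(cache_mutex);
        if(msec != cached_msec) {
            cached_matrix = calc_matrix(msec);
            cached_msec = msec;
            if(parent) {
                cached_matrix = parent->get_matrix(msec) * cached_matrix;
            }
        }
        return cached_matrix;
    }
};
```

It’s correct, but every lookup from every thread contends on the same per-node lock, which is exactly the concurrency killer described above.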

A second way to deal with this issue is to have each thread keep its own copy of the matrix cache, in thread-specific memory. I’m ashamed to admit I never got around to doing any actual performance comparisons on this, though I’ve used it for quite some time in my programs. In theory it avoids having to wait for other threads to access the cache, so it should be faster, but it needs a pthread_getspecific call in every get_matrix invocation, which comes with its own overhead.

const Matrix4x4 &XFormNode::get_matrix(long msec) const
{
    // cache_key is created in the constructor of each node
    MatrixCache *cache = (MatrixCache*)pthread_getspecific(cache_key);
    if(!cache) {
        // the first time this thread needs a cache, we have to create it
        cache = new MatrixCache;
        pthread_mutex_lock(&cache_list_lock);
        cache_list.push_back(cache);
        pthread_mutex_unlock(&cache_list_lock);
        pthread_setspecific(cache_key, cache);
    }

    if(msec == cache->msec) {
        return cache->matrix;
    }

    calc_matrix(&cache->matrix, msec);
    cache->msec = msec;

    if(parent) {
        cache->matrix = parent->get_matrix(msec) * cache->matrix;
    }
    return cache->matrix;
}

This works fine, and although we managed to avoid blocking concurrent use of get_matrix, we added some overhead for the thread-specific storage calls, and the code became much more complex all over the place: invalidations must also go through thread-specific storage, we need cleanup for the per-thread MatrixCache objects, and so on.
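For instance, invalidating a node’s cached animation (say, after editing its keyframes) now has to reach into every thread’s private cache. A rough sketch of what that entails, using std::mutex instead of the pthread calls above, and a hypothetical MatrixCache holding just what the example needs:

```cpp
#include <vector>
#include <mutex>

// Per-thread cache record, as implied by the listing above.
struct MatrixCache {
    long msec = -1;      // time of the cached matrix; -1 means invalid
    double matrix = 1.0; // stand-in for Matrix4x4
};

struct Node {
    std::vector<MatrixCache*> cache_list; // one entry per thread
    std::mutex cache_list_lock;

    // Invalidation can no longer just reset a single cached timestamp:
    // it has to walk the whole list of per-thread caches, under the
    // same lock that guards insertions into the list.
    void invalidate_caches() {
        std::lock_guard<std::mutex> lock(cache_list_lock);
        for(size_t i = 0; i < cache_list.size(); i++) {
            cache_list[i]->msec = -1; // force recalculation next time
        }
    }
};
```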

Return of the top-down evaluation

So nowadays I’m starting to lean more towards the simpler, less automagic design of top-down evaluation. It boils down to just going through the hierarchy once to calculate all the matrices recursively, then at any point we can just grab the previously calculated matrix of any node and use it.

void XFormNode::eval(long msec)
{
    calc_matrix(&matrix, msec);

    if(parent) {
        matrix = parent->matrix * matrix;
    }

    for(size_t i=0; i<children.size(); i++) {
        children[i]->eval(msec);
    }
}

The simplicity of this two-pass approach is hard to beat; however, it’s just not as flexible as my original method. It works fine for OpenGL programs, where it suffices to calculate transformations once per frame; it even works fine for simple raytracers, where again we have a single time value for any given frame. However, it breaks down for ray tracers doing distribution ray tracing for motion blur.

No rest for the wicked

The best way to add motion blur to a ray tracer is through a Monte Carlo method invented by Cook, Porter and Carpenter, called “distribution ray tracing”. In short, when spawning primary rays we have to choose a random time in an interval centered around the frame time, extending into the past and future as far as dictated by the shutter speed of the camera. This time is then used both to calculate the position and direction of the ray, which might thus differ between sub-pixels if the camera is moving, and to calculate the positions of the objects we’re testing for intersections. Then, if we cast many rays per pixel and average the results, we get motion blur on anything that moves significantly while the virtual shutter is open (example from my old s-ray renderer).
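The per-ray time jitter itself is trivial; something along these lines (just a sketch, with the half-width of the shutter interval passed in explicitly, and rand() standing in for whatever sampler the renderer actually uses):

```cpp
#include <cstdlib>

// Pick a random time for a primary ray, uniformly distributed over the
// shutter interval centered on the frame time. half_shutter_msec would
// be derived from the virtual camera's shutter speed.
long ray_time(long frame_msec, long half_shutter_msec)
{
    double t = (double)rand() / RAND_MAX;              // uniform in [0, 1]
    double offs = (t * 2.0 - 1.0) * half_shutter_msec; // in [-half, +half]
    return frame_msec + (long)offs;
}
```

Each sample within a pixel gets its own ray_time, and that value is what ends up being passed to get_matrix throughout the intersection tests.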

It’s obvious that calculating matrices once per frame won’t cut it for such ray tracers, so there’s no getting rid of the complexity of the lazy bottom-up scheme in that case. Admittedly, however, caching won’t do much for us either, because every sub-pixel will request the matrix at a different time anyway; so we might as well calculate the matrices from scratch every time, and skip the thread-specific access overhead. The jury is still out on that one.

Do you have a favourite design for hierarchical animation code? Feel free to share it by leaving a comment below!

OpenGL stereoscopic anaglyphs and patents

An anaglyph is a combination of two images into one, in such a way that they can later be separated by viewing the image through appropriately colored transparent filters. The objective is to present slightly shifted views of the same 3D environment to each eye, in order to achieve depth perception (i.e. really perceive the 3rd dimension).
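For the classic red-cyan glasses, the combination itself boils down to a simple per-pixel channel split. Here’s a minimal sketch of that idea (it ignores the grayscale conversion or color matrices that fancier anaglyph methods use to reduce ghosting and retinal rivalry):

```cpp
struct Pixel {
    unsigned char r, g, b;
};

// Red-cyan anaglyph: the red channel comes from the left-eye image, the
// green and blue channels from the right-eye image, so the red and cyan
// filters of the glasses separate the two views again.
Pixel anaglyph_pixel(Pixel left, Pixel right)
{
    Pixel out;
    out.r = left.r;
    out.g = right.g;
    out.b = right.b;
    return out;
}
```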

anaglyph glasses

I’ve never dealt with anaglyphs in the past, but during my recent week-old obsession with stereoscopy, I stumbled upon a pair of free anaglyph viewing glasses (made out of cardboard and cellophane, of course). So I couldn’t help but try to find out how I could use them with my own programs.

Raytracing Anamorphic Images

A long time ago, I stumbled upon a couple of strikingly odd images on Jim Arvo’s web site, which are apparently called “anamorphic”. The idea behind an anamorphic image is that it’s distorted in such a way that its true shape can be seen only when viewed in a particular manner. In the case of these images, you’re supposed to print the image and place a highly reflective cylindrical object, such as a chrome pipe, at a specific marked location in order to see the geometric shapes correctly.

I kept the images back then, with the purpose of either finding an appropriate cylindrical object, or raytracing them to see what they look like, but for some reason I’ve forgotten all about them until I accidentally found them again yesterday, in a dusty corner of my filesystem.

So I decided to hack some code to add perfect (x^2 + y^2 = r^2) cylinder primitives to my raytracer and do a couple of renderings with those images texture-mapped onto a ground quad (I could have just done the same thing with another raytracer such as POV-Ray, but where’s the fun in that?).
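The intersection test for such a cylinder boils down to a quadratic in the ray parameter: substituting the ray p = orig + t·dir into x^2 + y^2 = r^2 gives (dx^2 + dy^2)t^2 + 2(ox·dx + oy·dy)t + (ox^2 + oy^2 − r^2) = 0. A sketch of the idea (infinite cylinder around the z axis, no caps, and with the object-space transform of the ray left out; this is not the actual code from my raytracer):

```cpp
#include <cmath>

// Intersect a ray (orig + t * dir) with the infinite cylinder
// x^2 + y^2 = r^2 around the z axis. Returns the nearest positive ray
// parameter t, or a negative value if there is no hit.
double ray_cylinder(const double orig[3], const double dir[3], double r)
{
    double a = dir[0] * dir[0] + dir[1] * dir[1];
    if(a == 0.0) return -1.0;  // ray parallel to the cylinder axis

    double b = 2.0 * (orig[0] * dir[0] + orig[1] * dir[1]);
    double c = orig[0] * orig[0] + orig[1] * orig[1] - r * r;

    double disc = b * b - 4.0 * a * c;
    if(disc < 0.0) return -1.0;  // the ray misses the cylinder

    double sq = sqrt(disc);
    double t1 = (-b - sq) / (2.0 * a);
    double t2 = (-b + sq) / (2.0 * a);

    if(t1 > 1e-6) return t1;  // nearest intersection in front of the origin
    if(t2 > 1e-6) return t2;  // ray origin is inside the cylinder
    return -1.0;
}
```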

So anyway here are the anamorphic images along with the renderings (click on the images for the full rendering):

Introductory OpenGL tutorials continued

Just a short notice: the second part of my “introduction to 3D graphics with OpenGL” series should be available as we speak. This time, we’ll perform the full set of transformations that we described while discussing the rendering pipeline in the previous issue. We’ll use the matrix stack to separate the model from the view part of the modelview matrix, and render multiple objects properly. And finally, we’ll explain the mathematical model of shading and illumination, and apply lighting to our object in order to increase the realism of our simple 3D environment tremendously.

So, go and grab a copy of the November-December issue of the Greek Linux Format magazine, and let me know what you think. As always, I look forward to your comments, suggestions, corrections, etc.

By the way, due to popular demand, I will upload the first tutorial of the series in a couple of weeks, after the previous issue of Linux Format goes out of circulation.

Introductory OpenGL tutorials

I recently started writing a series of introductory tutorials on graphics programming with OpenGL, for the Greek Linux Format magazine.

The articles are written for the complete beginner, who hasn’t had any previous exposure to graphics programming. However, familiarity with the C programming language is definitely required.

What I’m aiming for is to thoroughly explain the underlying theory, in order to provide a stepping stone for someone who would like to eventually delve deeper into graphics algorithms, rather than just present raw examples for doing this and that with OpenGL.

In any case, the first article of the series will be published in the September-October issue of the Greek Linux Format magazine, which should be available within the next few days. Any feedback is greatly appreciated.