Bradys Blog

Ray-tracer Optimisation Using Multi-threading


This trimester I’ve progressed through my course into Studio 3. During Studio 3 we will be learning about networking, optimisation, software security and licensing, plus much more. I plan to post about these different topics throughout the trimester.

In the first week of this trimester we were shown an optimisation technique called multi-threading. You may have heard about multi-threading in the past in relation to CPU cores. In summary, multi-threading lets a program run several threads of execution at the same time, spread across the CPU's cores, to increase performance. However, there are some rules to this.

I won't be going into too much detail about multi-threading itself here, as I will mainly be outlining the process I used to increase the performance of this example.

For the activity we were given a ray tracer that renders an image to screen and displays the time it took to render. The goal of the activity was to use multi-threading to decrease the render time of the ray tracer. For this example I decided to use the C++ standard library's threading support (std::thread), as it is simple to use and was readily available.

For reference later, here’s what the ray tracer looks like and the time it took to render on my computer before multi-threading was implemented.


The first step is to identify which part of the program is taking the longest to run. In this case the two for loops in the image above loop over each pixel of the window and run the ray tracer calculation for that pixel, which takes a long time. This is where we will distribute the calculations across all CPU cores to achieve an increase in performance.

Now that we have identified where the problem is, we can move to the code. The first step here is to move the ray tracer calculations into a separate function, which will be called later from a thread. For the function's parameters we have the start and end rows of the section of the screen it will render, and a pointer to the image data, which is needed for the ray tracer calculation. Inside the function, the y for loop is set to loop between the start and end parameters. To test that everything still works, we can call this function once for the whole screen and check that it still renders correctly. At this stage we shouldn't see any performance increase, as nothing has changed; we have just moved code around.
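As a rough sketch of what the extracted function might look like: the Image struct below is a simple stand-in for the CImg<float> image, and tracePixel is a placeholder for the real ray tracer calculation. Both are assumptions for illustration, as is the name renderRows, and none of them come from the original ray tracer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CImg<float> image used in the activity.
struct Image {
    int width;
    int height;
    std::vector<float> pixels; // one value per pixel, stored row by row

    Image(int w, int h)
        : width(w), height(h), pixels(static_cast<std::size_t>(w) * h, 0.0f) {}
};

// Placeholder for the real per-pixel ray tracer calculation (assumption).
float tracePixel(int x, int y) {
    return static_cast<float>(x + y);
}

// Renders only the rows in [start, end), so several calls can share one image.
void renderRows(int start, int end, Image* image) {
    assert(0 <= start && start <= end && end <= image->height);
    for (int y = start; y < end; ++y) {           // y loops over the given section
        for (int x = 0; x < image->width; ++x) {  // x still covers the full width
            std::size_t index = static_cast<std::size_t>(y) * image->width + x;
            image->pixels[index] = tracePixel(x, y);
        }
    }
}
```

Calling renderRows(0, image.height, &image) renders the whole image on one thread, which is the "nothing has changed yet" check described above.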


Now this is where we get to multi-threading. First, to set everything up we want to include the thread library (#include <thread>) at the top of the file we’re working in, create a vector which will hold the threads we create (std::vector<std::thread*> threads;) and lastly get the number of CPU cores (int cores = std::thread::hardware_concurrency();). Different computers have different core counts, and many modern CPUs also expose extra logical cores (for example through hyper-threading), which are included in this number. In our case we have four cores, so the image will be split into quarters. We can now give each thread a specific quarter of the screen to render, and the operating system will schedule those threads across the cores. We do this by looping over the core count and creating and storing a new thread for each section.
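The set-up step can be sketched roughly as follows. The helper names workerCount and rowBand are made up for this example. The fallback to one thread is there because std::thread::hardware_concurrency() is allowed to return 0 when the core count can't be determined, and handing any leftover rows to the last section is my own choice for heights that don't divide evenly.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Number of threads to use; falls back to 1 if the core count is unknown.
unsigned int workerCount() {
    unsigned int cores = std::thread::hardware_concurrency(); // may return 0
    return cores == 0 ? 1u : cores;
}

// Rows [first, second) that section i of the image covers (hypothetical helper).
std::pair<int, int> rowBand(int i, int workers, int height) {
    assert(workers > 0 && i >= 0 && i < workers);
    int rows = height / workers;  // integer division, so rows may be left over
    int start = i * rows;
    // The last section takes any leftover rows so the whole image is covered.
    int end = (i == workers - 1) ? height : start + rows;
    return {start, end};
}
```

With four workers and a 400 pixel high window, rowBand gives the quarters 0 to 100, 100 to 200, 200 to 300 and 300 to 400.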


std::thread *t = new std::thread(Width, i * (windowHeight / cores), (i + 1) * (windowHeight / cores), &image);

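The above line of code creates a new thread, passing it the "Width" function we created at the start along with the arguments that function needs (int start, int end, CImg<float>* image). The thread starts running as soon as it is created. We then push the thread back onto the thread array we created. Once all the threads have been created, we loop over the array and call threads[i]->join(); on each one, which makes the main thread wait until that thread has finished. Because each thread was created with new, it should also be deleted after it has been joined. Note that if windowHeight does not divide evenly by cores, the integer division leaves the last few rows unrendered, so the final thread should run up to windowHeight.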
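Putting the create and join steps together, a minimal sketch could look like this. It is not the exact code from the activity: renderBand is a placeholder for the "Width" function, the image is a plain std::vector<float> rather than a CImg<float>, and the threads are stored by value in a std::vector<std::thread> instead of as pointers, so there is nothing to delete afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder for the ray tracer work on rows [start, end) (assumption).
void renderBand(int start, int end, int width, std::vector<float>* image) {
    for (int y = start; y < end; ++y) {
        for (int x = 0; x < width; ++x) {
            (*image)[static_cast<std::size_t>(y) * width + x] =
                static_cast<float>(x * y);
        }
    }
}

// Renders the whole image using one thread per section of rows.
void renderThreaded(int width, int height, int cores, std::vector<float>* image) {
    assert(cores > 0);
    std::vector<std::thread> threads;
    int rowsPerThread = height / cores;

    for (int i = 0; i < cores; ++i) {
        int start = i * rowsPerThread;
        // The last thread also renders any rows left over by the division.
        int end = (i == cores - 1) ? height : start + rowsPerThread;
        // The thread begins running renderBand as soon as it is constructed.
        threads.emplace_back(renderBand, start, end, width, image);
    }

    // join() does not start the threads; it waits for each one to finish.
    for (std::thread& t : threads) {
        t.join();
    }
}
```

Each thread writes to its own rows only, so no locking is needed in this sketch.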

We can now run the app again and see if there was an increase in performance. In my case we got a time decrease of about 32 seconds. I made sure to run both of these tests on the same computer, as the times will differ from one computer to another. This is a useful and simple way to speed up situations where a lot of calculations can happen at the same time. There are also more sophisticated ways to do this which will yield better results, but in this case this process achieved the results required for this task. It's also a good starting point for learning the key concepts of threading.

A way this could be improved is by queuing the work rather than giving each thread one fixed section, so that if a thread finishes before the rest, its core can pick up more work instead of sitting idle. I also want to try splitting the work using spatial partitioning techniques.
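One simple way to sketch the queuing idea is a shared counter that hands out one row at a time, so a thread that finishes early just claims the next row. This is only an illustration of the idea and not something I have tested on the ray tracer: renderRow and renderDynamic are made-up names, and the image is again a plain std::vector<float>.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Placeholder for tracing a single row of the image (assumption).
void renderRow(int y, int width, std::vector<float>* image) {
    for (int x = 0; x < width; ++x) {
        (*image)[static_cast<std::size_t>(y) * width + x] =
            static_cast<float>(x + y);
    }
}

// Each thread keeps claiming the next unrendered row until none are left.
void renderDynamic(int width, int height, int workers, std::vector<float>* image) {
    assert(workers > 0);
    std::atomic<int> nextRow{0}; // the shared "queue" of rows still to render

    auto work = [&]() {
        for (;;) {
            int y = nextRow.fetch_add(1); // atomically claim one row
            if (y >= height) {
                break;                    // every row has been claimed
            }
            renderRow(y, width, image);
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back(work);
    }
    for (std::thread& t : threads) {
        t.join();
    }
}
```

Rows that are expensive to trace no longer hold up a whole fixed section, because the remaining rows are shared out between whichever threads are free.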