
Cycles Performance Improvements for CPU + GPU renders


Lukas Stockner just committed a patch that implements tile stealing in Cycles, improving performance when rendering on the CPU and GPU together.

Tile stealing lets the GPU take over render tiles that the CPU has already started. Because the GPU usually runs out of work first, it can quickly finish those remaining tiles instead of sitting idle while the CPU works through them. Lukas provided the following benchmarks (render times, lower is better). Comparing each configuration at its best tile size, OpenCL rendering with stealing is about 14% faster than without it.

OpenCL (AMD Radeon Pro W5700 + AMD Ryzen 3900X):

Tile size      16x16   32x32   64x64  128x128
GPU only      174.44   95.29   82.11    78.52
No stealing    73.24   56.95   69.01   126.08
Stealing       73.45   53.05   49.85    53.30


CUDA (Nvidia GTX 1080Ti + AMD Ryzen 3900X):

Tile size      16x16   32x32   64x64  128x128
GPU only        61.8    58.3   57.98    60.18
No stealing    42.92   45.12   59.58    127.9
Stealing       42.00   40.37   39.84    44.29
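The 14% figure comes from comparing each row at its own best tile size rather than column by column. The short Python sketch below does that arithmetic; the helper names (`best`, `best_to_best_speedup`) are illustrative, and only the render times are taken from the tables.

```python
# Render times copied from the two tables above; columns are tile sizes.
TILE_SIZES = ["16x16", "32x32", "64x64", "128x128"]

BENCHMARKS = {
    "OpenCL": {
        "GPU only":    [174.44, 95.29, 82.11, 78.52],
        "No stealing": [73.24, 56.95, 69.01, 126.08],
        "Stealing":    [73.45, 53.05, 49.85, 53.30],
    },
    "CUDA": {
        "GPU only":    [61.8, 58.3, 57.98, 60.18],
        "No stealing": [42.92, 45.12, 59.58, 127.9],
        "Stealing":    [42.00, 40.37, 39.84, 44.29],
    },
}


def best(times: list[float]) -> tuple[float, str]:
    """Return the lowest render time in a row and the tile size that achieved it."""
    i = min(range(len(times)), key=times.__getitem__)
    return times[i], TILE_SIZES[i]


def best_to_best_speedup(rows: dict[str, list[float]], baseline: str) -> float:
    """Speedup of the 'Stealing' row over `baseline`, each at its own best tile size."""
    return best(rows[baseline])[0] / best(rows["Stealing"])[0]


if __name__ == "__main__":
    for backend, rows in BENCHMARKS.items():
        for baseline in ("No stealing", "GPU only"):
            s = best_to_best_speedup(rows, baseline)
            print(f"{backend}: Stealing vs {baseline}: {(s - 1) * 100:.1f}% faster")
```

On these numbers, stealing beats the best no-stealing setup by about 14% on OpenCL and about 8% on CUDA. Column by column, the gap is much larger at big tile sizes (126.08 vs 53.30 at 128x128 on OpenCL), where, without stealing, the GPU waits on the CPU's last large tiles.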


Bonestudio also shared benchmarks showing tile stealing shaving an extra 20 seconds off the render time at the best tile size.
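Why tile size matters so much without stealing can be illustrated with a toy scheduling model. This is a sketch, not Cycles code. The device speeds, the equal-cost tiles, and the rule that the GPU steals the CPU tile that would finish last are all assumptions for illustration. So is keeping the CPU's partial progress on a stolen tile.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """One render device: the GPU, or a single CPU thread (hypothetical model)."""
    is_gpu: bool
    speed: float            # work units per second (made-up numbers)
    remaining: float = 0.0  # work left on the tile this worker currently holds


def render_time(num_tiles: int, tile_work: float, gpu_speed: float,
                cpu_speeds: list[float], stealing: bool) -> float:
    """Simulate a shared tile queue; return the time until every tile is done."""
    workers = [Worker(True, gpu_speed)] + [Worker(False, s) for s in cpu_speeds]
    queue = num_tiles
    t = 0.0

    def hand_out_work() -> None:
        nonlocal queue
        for w in workers:
            if w.remaining > 0:
                continue
            if queue > 0:
                # Normal case: an idle worker pulls the next unstarted tile.
                w.remaining = tile_work
                queue -= 1
            elif stealing and w.is_gpu:
                # Queue is empty: take over the CPU tile that would finish last,
                # but only from CPU threads slower than the GPU.
                victims = [v for v in workers
                           if not v.is_gpu and v.remaining > 0 and v.speed < w.speed]
                if victims:
                    victim = max(victims, key=lambda v: v.remaining / v.speed)
                    # Assumption: the CPU's partial progress on the tile is kept.
                    w.remaining, victim.remaining = victim.remaining, 0.0

    hand_out_work()
    while any(w.remaining > 0 for w in workers):
        active = [w for w in workers if w.remaining > 0]
        # Advance time to the moment the next tile completes.
        dt = min(w.remaining / w.speed for w in active)
        t += dt
        for w in active:
            w.remaining -= w.speed * dt
            if w.remaining < 1e-12:  # absorb floating-point residue
                w.remaining = 0.0
        hand_out_work()
    return t


if __name__ == "__main__":
    # Same total work (64 units) split into 64 small tiles or 4 large ones.
    for n, work in [(64, 1.0), (4, 16.0)]:
        for steal in (False, True):
            time = render_time(n, work, gpu_speed=8.0, cpu_speeds=[1.0] * 4, stealing=steal)
            print(f"{n} tiles, stealing={steal}: {time:.2f}")
```

With four large tiles and no stealing, the GPU finishes its tile at t = 2 and then idles while the CPU threads run until t = 16, twice as long as the GPU alone would take (8). With stealing, the same split finishes at about 6.6, loosely mirroring the 128x128 columns in the tables.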

This feature has already landed in the latest Blender 2.92 builds, which you can get at the link below:

About the Author

Mario Hawat

Mario Hawat is a Lebanese 3D artist, writer, and musician currently based in Paris. He is a generalist with a special focus on environments, procedural and generative artworks. Open to freelance work.


  1. Interesting, I'll have to check it out and see how it works. In earlier versions, the CPU+GPU option would either crash or cause extremely slow renders, like unbelievably slow, much slower than CPU alone.

  2. I've just tried 2.92.0 and I'm really impressed with the GPU+CPU rendering. I'm using a Dell XPS-15 9560 laptop with a GTX 1050 and a Core i7. In Blender 2.90.1, GPU+CPU was barely any improvement over GPU alone, and in 2.91.0 it was actually about 10% slower, presumably because the CPU grabbed tiles and left the GPU idling at the end of the render. But on 2.92.0 alpha, with tile stealing, GPU+CPU improves the speed dramatically as expected, typically by about 30%, though it's very dependent on tile size of course.

    For a fairly complex scene with a smoke domain and volume shader, 1920x1080 with 100 samples, the optimum tile size was 128x128, with the GPU taking 262s and GPU+CPU taking 198s, a 33% improvement. 240x216 gave an even better relative improvement of 36%, but was still slightly slower overall at 202s. Other frames in the animation without the smoke gave the fastest render of 45s at 240x216 tile size, a 42% improvement over the GPU alone at 64s. So, for my entire animation (9,000+ frames!) I think 240x216 will be the best compromise, with tile-stealing saving me potentially days of render time.

    All these were with OptiX denoising, which is absolutely amazing :-) My GTX also supports OptiX rendering, but I found this to be about 40% slower than CUDA :-(
    I'll try updating my NVIDIA drivers just in case, but I think OptiX on the GTX just isn't up to it, and I really need to upgrade to an Alienware with RTX!
