Lukas Stockner just committed a patch implementing tile stealing in Cycles, improving render performance for combined CPU + GPU renders.
https://twitter.com/s_koenig/status/1322824579866394624
Tile stealing works by reassigning render tiles that the CPU has already started to the GPU. Since the GPU usually finishes its own workload earlier, it can quickly complete the remaining tiles instead of sitting idle while the CPU catches up (a conceptual sketch of the scheduling idea follows the benchmark tables below). Lukas provides the following benchmarks, with renders being up to 14% faster on OpenCL.
OpenCL (AMD Radeon Pro W5700 + AMD Ryzen 3900X):
| Tile size | 16x16 | 32x32 | 64x64 | 128x128 |
| --- | --- | --- | --- | --- |
| GPU only | 174.44 | 95.29 | 82.11 | 78.52 |
| No stealing | 73.24 | 56.95 | 69.01 | 126.08 |
| Stealing | 73.45 | 53.05 | 49.85 | 53.30 |
CUDA (Nvidia GTX 1080Ti + AMD Ryzen 3900X):
| Tile size | 16x16 | 32x32 | 64x64 | 128x128 |
| --- | --- | --- | --- | --- |
| GPU only | 61.8 | 58.3 | 57.98 | 60.18 |
| No stealing | 42.92 | 45.12 | 59.58 | 127.9 |
| Stealing | 42.00 | 40.37 | 39.84 | 44.29 |
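The scheduling idea behind these numbers is straightforward. Below is a minimal, hypothetical Python sketch of tile stealing; it is not the actual Cycles C++ implementation, and all names in it are invented for illustration. A fast device drains the queue of fresh tiles first, then flags a tile that a slower device has already started; the slower device releases that tile between samples, and the fast device finishes the remaining samples.

```python
# Hypothetical sketch of tile stealing (not the Cycles code): the GPU empties
# the queue of fresh tiles, then flags a tile a CPU thread has already started;
# the CPU releases it between samples and the GPU finishes it. "Rendering" a
# sample is simulated with a sleep, with the GPU assumed faster per sample.
import queue
import threading
import time

SAMPLES_PER_TILE = 8
CPU_SAMPLE_COST = 0.02   # simulated seconds per sample on the CPU (slower)
GPU_SAMPLE_COST = 0.005  # simulated seconds per sample on the GPU (faster)


class Tile:
    def __init__(self, tile_id):
        self.tile_id = tile_id
        self.samples_done = 0
        self.steal_requested = threading.Event()


def cpu_worker(fresh, in_flight, stolen, results, lock):
    while True:
        try:
            tile = fresh.get_nowait()
        except queue.Empty:
            return
        with lock:
            in_flight.append(tile)
        # Render sample by sample, checking for a steal request in between.
        while tile.samples_done < SAMPLES_PER_TILE and not tile.steal_requested.is_set():
            time.sleep(CPU_SAMPLE_COST)
            tile.samples_done += 1
        with lock:
            in_flight.remove(tile)
            if tile.steal_requested.is_set():
                stolen.put(tile)  # hand the partially rendered tile to the GPU
            else:
                results.append((tile.tile_id, "CPU"))


def gpu_worker(fresh, in_flight, stolen, results, lock):
    while True:
        try:
            tile = fresh.get_nowait()
        except queue.Empty:
            # No fresh tiles left: steal one that a CPU thread has started.
            with lock:
                if not in_flight:
                    return  # nothing left to steal, the render is done
                in_flight[0].steal_requested.set()
            tile = stolen.get()  # wait until the CPU releases the flagged tile
        while tile.samples_done < SAMPLES_PER_TILE:
            time.sleep(GPU_SAMPLE_COST)
            tile.samples_done += 1
        results.append((tile.tile_id, "GPU"))


if __name__ == "__main__":
    fresh, stolen = queue.Queue(), queue.Queue()
    in_flight, results, lock = [], [], threading.Lock()
    for i in range(12):
        fresh.put(Tile(i))

    workers = [threading.Thread(target=cpu_worker,
                                args=(fresh, in_flight, stolen, results, lock))
               for _ in range(2)]
    workers.append(threading.Thread(target=gpu_worker,
                                    args=(fresh, in_flight, stolen, results, lock)))
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(results)  # tiles started on the CPU may show up as finished by the GPU
```

The key property, which the benchmarks above reflect, is that the faster device no longer sits idle at the end of the render waiting for a large, half-finished tile on the slower device.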
Bonestudio also shared some benchmarks showing the new tile stealing feature shaving an extra 20 seconds off the render time at the best tile size configuration.
https://twitter.com/BoneStudioAnim/status/1323052186687950851
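If you want to experiment with this on your own scenes, here is a small Blender Python sketch, assuming Blender 2.9x (where tile size is still exposed as render.tile_x / tile_y) and a CUDA-capable GPU, that enables hybrid CPU + GPU rendering and sets a tile size before kicking off a render:

```python
# Sketch for Blender 2.9x: enable hybrid CPU + GPU Cycles rendering, set the
# tile size, then render the current frame. Adjust compute_device_type
# ('CUDA', 'OPTIX' or 'OPENCL') to match your hardware.
import bpy

prefs = bpy.context.preferences.addons['cycles'].preferences
prefs.compute_device_type = 'CUDA'
prefs.get_devices()  # refresh the detected device list

# Tick both the GPU(s) and the CPU so Cycles can hand tiles to either.
for device in prefs.devices:
    device.use = device.type in {'CUDA', 'CPU'}

scene = bpy.context.scene
scene.cycles.device = 'GPU'   # "GPU Compute" in the UI; the enabled CPU joins in
scene.render.tile_x = 128     # tile size still matters, as the benchmarks above show
scene.render.tile_y = 128

bpy.ops.render.render(write_still=True)
```

Wrapping this in a loop over a few tile sizes and timing each render is an easy way to find the sweet spot for a given scene and hardware combination.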
This feature has already landed in the latest Blender 2.92 builds, which you can get at the link below:
2 Comments
Interesting, I'll have to check it out and see how it works. In earlier versions the CPU+GPU option would either crash or cause extremely slow renders, like unbelievably slow, much slower than the CPU alone.
I've just tried 2.92.0 and I'm really impressed with the GPU+CPU rendering. I'm using a Dell XPS-15 9560 laptop with a GTX 1050 and a Core i7. In Blender 2.90.1, GPU+CPU was barely any improvement on GPU, and in 2.91.0 it was actually about 10% slower - presumably the CPU grabbed tiles, leaving the GPU idling at the end of the render. But on 2.92.0 alpha, with tile stealing, GPU+CPU improves the speed dramatically as expected, typically by about 30%, though it's very dependent on tile size of course.
For a fairly complex scene with a smoke domain and volume shader, 1920x1080 with 100 samples, the optimum tile size was 128x128, with the GPU taking 262s and GPU+CPU taking 198s, a 33% improvement. 240x216 gave an even better relative improvement of 36%, but was still slightly slower overall at 202s. Other frames in the animation without the smoke gave the fastest render of 45s at 240x216 tile size, a 42% improvement over the GPU alone at 64s. So, for my entire animation (9,000+ frames!) I think 240x216 will be the best compromise, with tile-stealing saving me potentially days of render time.
All these were with OptiX denoising, which is absolutely amazing :-) My GTX also supports OptiX rendering, but I found this to be about 40% slower than CUDA :-(
I'll try updating my Nvidia drivers just in case, but I think the GTX OptiX just isn't up to it, and I really need to upgrade to an Alienware with RTX!