Bachelor Thesis
Efficient GPU Implementation of Stencils on Unstructured Grids
For my bachelor thesis, I implemented and benchmarked three stencil computations — local calculations applied to every cell in a three-dimensonal grid — using the Nvidia CUDA programming model on regular and unstructured grids. I explored five methods to access a grid’s cells in a stencil code and tested four strategies for storing the grid’s structural information (neighborship table). Notably, I conceived a compression scheme for the neighborship table, which improves runtimes compared to uncompressed implementations by 30% in some cases. Most of the other optimizations make use of an assumed regular structure in one of the dimensions.
You can download a PDF of my bachelor thesis here:
Some figures from the paper
Figure from the paper showing median runtimes for varying number of threads and different stencil implementations. This is for the "Laplace-of-Laplace" stencil.
Overview graph showing the performance of different configurations of the Laplace-of-Laplace stenci algorithm, using different representations of the neighborship table (pointer chasing vs explicit storage, and compressed vs uncompressed neighborship table, using different representations of the neighborship table (pointer chasing vs explicit storage, and compressed vs uncompressed neighborship table).
Different memory layouts that were tested, left to right: Row-major, Z-curves, stretched Z-curves.