Optimizing Strassen's multiplication algorithm for modern processors: A study in optimizing matrix multiplications for large matrices on modern CPUs
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
This paper examines how to write code to gain high performance on modern computers as well as the importance of well planned data structures. The experiments were run on a computer with an Intel i5-5200U CPU, 8GB of RAM running Linux Mint 17.For the measurements Winograd's variant of Strassen's matrix multiplication algorithm was implemented and eventually compared to Intel's math kernel library (MKL). A quadtree data structure was implemented to ensure good cache locality. Loop unrolling and tiling was combined to improve cache performance on both L1 and L2 cache taking into regard the out of order behavior of modern CPUs. Compiler hints were partially used but a large part of the time critical code was written in pure assembler. Measurements of the speed performance of both floats and doubles were performed and substantial differences in running times were found.While there was a substantial difference between the best implementation and MKL for both doubles and floats at smaller sizes, a difference of only 1\% in execution time was achieved for floats at the size of 2^14. This was achieved without any specific tuning and could be expected to be improved if more time was spent on the implementation.
Place, publisher, year, edition, pages
IdentifiersURN: urn:nbn:se:kth:diva-186418OAI: oai:DiVA.org:kth-186418DiVA: diva2:927258
Subject / course
Bachelor of Science in Engineering - Computer Engineering