Spark for HPC: a comparison with MPI on compute-intensive applications using Monte Carlo method
2018 (English) Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
With the emergence of various big data platforms in recent years, Apache Spark, a distributed large-scale computing platform, is perceived as a potential substitute for the Message Passing Interface (MPI) in High Performance Computing (HPC). Because of its limitations in fault tolerance, dynamic resource handling and ease of use, MPI, the dominant method for achieving parallel computing in HPC, is often associated with longer development time and higher costs in enterprises such as Scania IT. This thesis project aims to examine Apache Spark as an alternative to MPI on HPC clusters and to compare their performance in various aspects. The test results are obtained by running a compute-intensive application on both platforms to solve a Bayesian inference problem for an extended Lotka-Volterra model using particle Markov chain Monte Carlo methods. The tests confirm that Spark is superior in fault tolerance, dynamic resource handling and ease of use, whilst having shortcomings in performance and resource consumption compared with MPI. Overall, Spark proves to be a promising alternative to MPI on HPC clusters. As a result, Scania IT continues to explore Spark on HPC clusters for use in different departments.
Place, publisher, year, edition, pages
2018, p. 65
Series
IT ; 18048
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-392311
OAI: oai:DiVA.org:uu-392311
DiVA, id: diva2:1347863
Educational program
Master Programme in Computer Science
Supervisors
Examiners
2019-09-02 2019-09-02 2019-09-02 Bibliographically approved