Spark for HPC: a comparison with MPI on compute-intensive applications using Monte Carlo method
2018 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
With the emergence of various big data platforms in recent years, Apache Spark, a distributed large-scale computing platform, has come to be perceived as a potential substitute for the Message Passing Interface (MPI) in High Performance Computing (HPC). MPI is the dominant method for achieving parallel computing in HPC, but because of its limitations in fault tolerance, dynamic resource handling and ease of use, it is often associated with higher development time and costs in enterprises such as Scania IT. This thesis project examines Apache Spark as an alternative to MPI on HPC clusters and compares the two in terms of performance, resource consumption, fault tolerance, dynamic resource handling and ease of use. The test results are obtained by running a compute-intensive application on both platforms to solve a Bayesian inference problem for an extended Lotka-Volterra model using particle Markov chain Monte Carlo methods. The tests show that Spark is superior in fault tolerance, dynamic resource handling and ease of use, but falls short of MPI in performance and resource consumption. Overall, Spark proves to be a promising alternative to MPI on HPC clusters. As a result, Scania IT continues to explore Spark on HPC clusters for use in different departments.
Place, publisher, year, edition, pages
2018, p. 65
Series
IT ; 18048
National Category
Identifiers
URN: urn:nbn:se:uu:diva-392311
OAI: oai:DiVA.org:uu-392311
DiVA, id: diva2:1347863
Educational program
Master Programme in Computer Science
Supervisor
Examiner
2019-09-02, 2019-09-02, 2019-09-02. Bibliographically approved