Performance Characterization of In-Memory Data Analytics on a Scale-up Server
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. KTH Royal Institute of Technology. ORCID iD: 0000-0002-7510-6286
2016 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark defines the state of the art among big data analytics platforms by (i) exploiting data-flow and in-memory computing and (ii) exhibiting superior scale-out performance on commodity machines, little effort has been devoted to understanding the performance of in-memory data analytics with Spark on modern scale-up servers. This thesis characterizes the performance of in-memory data analytics with Spark on scale-up servers.

Through empirical evaluation of representative benchmark workloads on a dual-socket server, we have found that in-memory data analytics with Spark exhibit poor multi-core scalability beyond 12 cores due to thread-level load imbalance and work-time inflation. We have also found that the workloads are bound by the latency of frequent data accesses to DRAM. When the input data size is enlarged, application performance degrades significantly due to a substantial increase in wait time during I/O operations and garbage collection, despite a 10% better instruction retirement rate (owing to lower L1 cache misses and higher core utilization).
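
The 12-core scalability ceiling maps directly onto Spark's executor sizing. A minimal sketch of such a cap, assuming a standalone deployment; the application name, master URL and memory size are illustrative placeholders rather than the thesis configuration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Cap each executor at 12 cores, mirroring the scalability ceiling reported
// above. App name, master URL and memory size are placeholders.
val conf = new SparkConf()
  .setAppName("wordcount-scaleup")
  .setMaster("spark://localhost:7077")
  .set("spark.executor.cores", "12")
  .set("spark.executor.memory", "24g")

val sc = new SparkContext(conf)
```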

For data accesses, we have found that simultaneous multi-threading is effective in hiding the data access latencies. We have also observed that (i) data locality on NUMA nodes can improve performance by 10% on average and (ii) disabling next-line L1-D prefetchers can reduce execution time by up to 14%. Regarding the GC impact, we match the memory behaviour of applications with the garbage collector to improve their performance by 1.6x to 3x, and we recommend using multiple small executors, which can provide up to a 36% speedup over a single large executor.
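
The recommendation to prefer several small executors over one machine-sized executor is a submit-time configuration choice. The sketch below contrasts the two layouts; the instance counts, core counts and memory sizes are illustrative and not the exact values evaluated in the thesis:

```scala
import org.apache.spark.SparkConf

// Layout A: several small executors (e.g. two per socket on a dual-socket box).
// Counts and sizes are illustrative; the thesis attributes up to 36% speedup
// to this style of layout, not to these exact numbers.
val manySmallExecutors = new SparkConf()
  .setAppName("kmeans-many-small-executors")
  .set("spark.executor.instances", "4")
  .set("spark.executor.cores", "6")
  .set("spark.executor.memory", "12g")

// Layout B: the baseline single large executor spanning the whole machine.
val singleLargeExecutor = new SparkConf()
  .setAppName("kmeans-single-large-executor")
  .set("spark.executor.instances", "1")
  .set("spark.executor.cores", "24")
  .set("spark.executor.memory", "48g")
```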

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2016. 111 p.
Series
TRITA-ICT, 2016:07
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-185581
ISBN: 978-91-7595-926-9
OAI: oai:DiVA.org:kth-185581
DiVA: diva2:922539
Presentation
2016-05-23, Ka-210, Electrum 229, Kista, Stockholm, 09:15 (English)
Note

QC 20160425

Available from: 2016-04-25. Created: 2016-04-22. Last updated: 2017-03-02. Bibliographically approved.
List of papers
1. Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server
2015 (English). In: Proceedings - 2015 IEEE 5th International Conference on Big Data and Cloud Computing, BDCloud 2015, IEEE Computer Society, 2015, 1-8 p., 7310708. Conference paper, Published paper (Refereed)
Abstract [en]

In the last decade, data analytics have rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work-time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work-time inflation.
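
Characterization of this kind relies on reading hardware performance counters around the Spark run. Below is a hedged sketch that wraps a submit command in Linux `perf stat`; the event list, application class, jar and input path are illustrative assumptions, not the counters or workloads used in the paper:

```scala
import scala.sys.process._

// Hardware events sampled around the whole Spark run; the summary that
// `perf stat` prints on stderr gives cycles, instructions and cache misses.
val events = "cycles,instructions,cache-references,cache-misses,LLC-load-misses"

// Application class, jar and input path are hypothetical stand-ins.
val run = Seq(
  "perf", "stat", "-e", events,
  "spark-submit",
  "--master", "local[24]",
  "--class", "org.example.WordCount",
  "wordcount-assembly.jar", "/data/wiki"
)

val exitCode = run.!   // runs the command and returns its exit code
println(s"spark-submit exited with code $exitCode")
```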

Place, publisher, year, edition, pages
IEEE Computer Society, 2015
Keyword
cloud chambers, cloud computing, data analysis, resource allocation, storage management, Apache Spark framework, Spark workload, data analysis workload, disk-based processing, in-memory data analytics, in-memory processing, memory bound latency, microarchitecture level performance, modern cloud server, performance characterization, single node NUMA machine, thread level load imbalance, work time inflation, workload scalability, Benchmark testing, Big data, Data analysis, Instruction sets, Scalability, Servers, Sparks, Data Analytics, NUMA, Spark Performance, Workload Characterization
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-179403 (URN), 10.1109/BDCloud.2015.37 (DOI), 000380444200001 (ISI), 2-s2.0-84962757128 (Scopus ID), 978-1-4673-7182-7 (ISBN)
Conference
2015 IEEE Fifth International Conference on Big Data and Cloud Computing (BDCloud), Dalian, China, 26-28 Aug. 2015
Note

QC 20160118 QC 20160922

Available from: 2015-12-16. Created: 2015-12-16. Last updated: 2016-09-22. Bibliographically approved.
2. How Data Volume Affects Spark Based Data Analytics on a Scale-up Server
2015 (English). In: Big Data Benchmarks, Performance Optimization, and Emerging Hardware: 6th Workshop, BPOE 2015, Kohala, HI, USA, August 31 - September 4, 2015. Revised Selected Papers, Springer, 2015, Vol. 9495, 81-92 p. Conference paper, Published paper (Refereed)
Abstract [en]

The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark is gaining popularity for exhibiting superior scale-out performance on commodity machines, the impact of data volume on the performance of Spark-based data analytics in a scale-up configuration is not well understood. We present a deep-dive analysis of Spark-based applications on a large scale-up server machine. Our analysis reveals that Spark-based data analytics are DRAM bound and do not benefit from using more than 12 cores for an executor. When the input data size is enlarged, application performance degrades significantly due to a substantial increase in wait time during I/O operations and garbage collection, despite a 10% better instruction retirement rate (owing to lower L1 cache misses and higher core utilization). We match the memory behaviour of applications with the garbage collector to improve their performance by 1.6x to 3x.
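
"Matching memory behaviour with the garbage collector" amounts to sizing the executor heap and its generations against the working set and picking a collector accordingly. The sketch below only shows where those knobs live in Spark; the heap size, generation ratio and collector choice are placeholders, not the settings behind the reported 1.6x-3x gains:

```scala
import org.apache.spark.SparkConf

// Illustrative executor-side GC tuning: heap size, young-generation ratio and
// collector are placeholders showing where such knobs live, not the settings
// that produced the reported 1.6x-3x improvement.
val conf = new SparkConf()
  .setAppName("sort-gc-tuned")
  .set("spark.executor.memory", "48g")
  .set("spark.executor.extraJavaOptions",
       "-XX:+UseParallelGC -XX:NewRatio=1 -XX:+PrintGCDetails")
```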

Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-181325 (URN), 10.1007/978-3-319-29006-5_7 (DOI), 2-s2.0-84958073801 (Scopus ID), 978-3-319-29005-8 (ISBN)
Conference
6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BPOE), held in conjunction with the 41st International Conference on Very Large Data Bases (VLDB), Kohala, HI, USA, August 31 - September 4, 2015
Note

QC 20160224

Available from: 2016-02-01. Created: 2016-02-01. Last updated: 2016-04-25. Bibliographically approved.
3. Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study
(English) Manuscript (preprint) (Other academic)
Abstract [en]

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to only batch processing workloads. We compare the micro-architectural performance of batch processing and stream processing workloads in Apache Spark using hardware performance counters on a dual-socket server. In our evaluation experiments, we have found that batch processing and stream processing workloads have similar micro-architectural characteristics and are bound by the latency of frequent data accesses to DRAM. For data accesses, we have found that simultaneous multi-threading is effective in hiding the data access latencies. We have also observed that (i) data locality on NUMA nodes can improve performance by 10% on average, (ii) disabling next-line L1-D prefetchers can reduce execution time by up to 14%, and (iii) multiple small executors can provide up to a 36% speedup over a single large executor.
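
One way to reproduce the NUMA-locality observation is to pin a run's CPUs and memory allocations to a single socket with numactl and compare it against an unpinned run. A sketch of the pinned case, launched from Scala; the node id, master URL, class, jar and input path are illustrative assumptions rather than the paper's setup:

```scala
import scala.sys.process._

// Bind both the CPUs and the memory allocations of the run to NUMA node 0,
// approximating the "local" case of the locality experiment. Node id, master
// URL, class, jar and input path are hypothetical stand-ins.
val pinnedRun = Seq(
  "numactl", "--cpunodebind=0", "--membind=0",
  "spark-submit",
  "--master", "local[12]",
  "--class", "org.example.PageRank",
  "pagerank-assembly.jar", "/data/graph"
)

println(s"pinned run exited with code ${pinnedRun.!}")
```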

Keyword
Performance Characterization, Apache Spark, Micro-architecture
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-185580 (URN)
Note

QC 20160425

Available from: 2016-04-22. Created: 2016-04-22. Last updated: 2016-04-25. Bibliographically approved.

Open Access in DiVA

Licentiate_Thesis_AJA (2571 kB)
File name: FULLTEXT01.pdf
File size: 2571 kB
Checksum (SHA-512): 00bd876477b1a2e9c3aa2a98f596d904bf0e205117add6d7f25d1679772177ba1f0cf97763ec3e833ab1b7aa0a834a98989228e228b7d6c8941ce3743e255392
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Awan, Ahsan Javed
By organisation
Software and Computer systems, SCS
Computer Systems
