Change search
ReferencesLink to record
Permanent link

Direct link
Performance Prediction and Improvement Techniques for Parallel Programs in Multiprocessors
Responsible organisation
2002 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The performance of a computer system is important. One way of improving performance is to use multiprocessors with several processors that can work in parallel. Where multiprocessors are used, the programs must also be parallel in order to achieve high performance. However, it is not always easy to write parallel programs for multiprocessors; program developers need support in this area. Such support includes, for example, information regarding how well the parallel program scales-up when the number of processors increases and identification of performance bottlenecks; ideally, the result should be presented graphically. Bottlenecks arise both as a result of parallelization as well as traditional (sequential) code. Further, the developer may need to predict performance on other systems than the one used for development, since the development environment often is the (uni-processor) workstation on the developer's desk. One way of increasing the performance may be to bind threads on processors statically. Finding the optimal allocation is NP-hard and it is necessary to resort to heuristic algorithms. When heuristic algorithms are used we do not know how near/far we are from the optimal allocation. Finding a bound for the program's completion time shows what should be achievable using a heuristic algorithm. In this thesis, I present techniques how to simulate a multiprocessor execution of a parallel program based on a monitored execution on a uni-processor. The result of the (simulated) multiprocessor execution is graphically presented in order to give feedback to the developer. The techniques can be used for heuristic algorithms to find an allocation of threads to processors. Further, I show an algorithm that identifies the critical path of the parallel program on a multiprocessor, thereby identifying the segments that are worthwhile optimizing. I also show how to calculate a tight bound on the minimal completion time for the optimal allocation of threads to processors. Finally, I discuss the implications of the choice of simulation model. The techniques and algorithms described have been manifested in a prototype tool which I have used to perform empirical studies. The tool has been validated using a real multiprocessor.

Place, publisher, year, edition, pages
Ronneby: Blekinge Institute of Technology , 2002. , 162 p.
Blekinge Institute of Technology Dissertation Series, ISSN 1650-2159 ; 3
National Category
Computer Science
URN: urn:nbn:se:bth-00188ISBN: 91-7295-009-9OAI: diva2:838231
Available from: 2012-09-18 Created: 2002-04-11 Last updated: 2016-02-15Bibliographically approved

Open Access in DiVA

fulltext(1462 kB)21 downloads
File information
File name FULLTEXT01.pdfFile size 1462 kBChecksum SHA-512
Type fulltextMimetype application/pdf

Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 21 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 18 hits
ReferencesLink to record
Permanent link

Direct link