A tool for monitoring resource usage in large scale supercomputing clusters
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
In recent years, large-scale computer clusters have become dominant for applications that require extremely high computational capacity. A cluster consists of a large set of ordinary servers interconnected by a fast network. Because each node runs its own instance of the operating system and in that sense works autonomously, supervising the whole cluster is a challenge.
To get an overview of the efficiency and utilization of the system, it is not enough to look at a single computer; all nodes must be monitored to get a good view of how the cluster behaves. Monitoring performance counters in a large-scale computation cluster raises several difficulties. How can samples of performance metrics be made available to an operator? How can they be stored? How can a large set of samples be visualized in a meaningful way?
This thesis discusses how such a monitoring system can be implemented, what problems one may encounter, and possible solutions.
Place, publisher, year, edition, pages
2012. 39 p.
Identifiers
URN: urn:nbn:se:liu:diva-75435
ISRN: LIU-IDA/LITH-EX-G-12/002-SE
OAI: oai:DiVA.org:liu-75435
DiVA: diva2:506838
Subject / course
Computer and information science at the Institute of Technology