Please use this identifier to cite or link to this item: http://hdl.handle.net/1946/28907
With high demand for computational power, the popularity of computer clusters has risen over the years. Clusters are reactive systems in which a resource management system allocates the cluster's resources to user-submitted jobs. Model checking has proven to be an effective way of analysing reactive systems. This thesis aims to analyse the performance of deCODE Genetics’s cluster using UPPAAL, a model checking tool, and to create a framework for comparing the efficiency of different scheduling algorithms on differently sized clusters that use SLURM as their resource management system. The framework is parameterized so that other scheduling algorithms and cluster sizes can be added and the settings of SLURM can be changed. Although others have created such frameworks for analysing schedulability, they do not provide sufficient control over the parameters of the resource management system. The framework presented in this thesis offers a more specialized solution for analysing clusters using SLURM as their resource management system.
Five scheduling algorithms are analysed using the framework proposed in the thesis; they are First-Come-First-Served (FCFS), Shortest-Job-First (SJF), Round-Robin (RR), Shortest-Remaining-Time-First (SRTF) and the SLURM Backfilling algorithm. The scheduling algorithms were modelled and verified using synthetic analysis in UPPAAL.
After verifying the behaviour of the models, dynamic spawning of jobs was added to the models, where resource requirements and job durations were set using data collected from former jobs in deCODE’s cluster. Two analyses were then performed using these models. The first analysis compared the different algorithms, with a focus on resource utilization, throughput and waiting time, on three different sizes of clusters under three different job loads. The results of the analysis confirm the effectiveness of SLURM’s backfilling algorithm. In the second analysis a job trace was collected from deCODE’s SLURM database and compared to the same job trace run in the models. The results of the second analysis show that certain abstractions made in the models do not conform to the behaviour of SLURM. These abstractions concern the time SLURM uses to perform scheduling attempts and other computations, which must be modelled in future work.
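
As a rough illustration of the kind of comparison described above, the following Python sketch computes the mean waiting time under FCFS and SJF for a handful of jobs. It is only a toy under stated assumptions: the job durations and the function name mean_waiting_time are made up for illustration, all jobs are assumed to arrive at time 0, and a single resource runs one job at a time without preemption. It does not model UPPAAL, SLURM, or deCODE's cluster, and it is not the method used in the thesis.

```python
def mean_waiting_time(durations):
    """Mean waiting time when jobs run back to back in the given order.

    Assumes all jobs arrive at time 0 and only one job runs at a time.
    """
    elapsed = 0
    total_wait = 0
    for duration in durations:
        total_wait += elapsed  # this job waited for every job before it
        elapsed += duration
    return total_wait / len(durations)


# Hypothetical job durations (arbitrary time units), in submission order.
jobs = [8, 3, 5, 1]

fcfs = mean_waiting_time(jobs)          # FCFS: run in submission order
sjf = mean_waiting_time(sorted(jobs))   # SJF: run the shortest job first

print(f"FCFS mean wait: {fcfs}")
print(f"SJF mean wait:  {sjf}")
```

With these made-up durations, running the shortest jobs first gives a lower mean waiting time than running them in submission order, which is the kind of difference the waiting-time metric is meant to capture.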