Consider a computational cluster composed of a job dispatcher device and
each one equipped with its own processing and memory resources.
The goal is to develop a dispatching algorithm and scheduling algorithms to achieve the best mean job response time.
Baseline algorithms:
Least Work Left
dispatching (LWL) and First Come First Served
scheduling for servers (FCFS)
- Each server has the same processing power µ, expressed in GNCU.
Hence, the service time of a task is$x = CPU/µ$ . - Each server has a memory amount of 1 GNMU.
- Pre-emption is allowed as well as server sharing, if deemed useful.
- It is not allowed to kill a job or a task, i.e., all tasks must be worked out eventually.
- The dispatcher and servers have knowledge of the exact service time of a task upon its arrival.
- Delay and loss are negligible for message exchange among the dispatcher and the servers.
- Each running task is assigned the memory space it requires as long as it is running.
Swapping a task from running to standby and back requires negligible time. - At any given time, the sum of all assigned memory workspaces to running tasks on a given server shall not exceed the overall memory of that server.
Constraints : µ = 0.1 , N = 64
Metrics used to evaluate the performance of the computational cluster:
-
Job Response Time (R):
Job response time is defined as the time elapsed from the arrival of the first task of a job until all tasks belonging to that job have been fully served. The mean job response time ($\bar{R}$ ) is obtained by averaging the response times of all jobs. -
Job Slowdown (R):
Job slowdown is calculated as the ratio of the response time of a job to the sum of the service times of all tasks belonging to that job. The mean job slowdown ($\bar{S}$ ) is obtained by averaging the slowdown values of all jobs. -
Utilization Coefficient (ρk):
The utilization coefficient of server k (ρk) represents the fraction of time that server k is busy serving tasks. The overall mean utilization coefficient (ρ) is calculated as the average of ρk values for all servers: ρ = (ρ1 + ρ2 + ... + ρN) / N. -
Messaging Load (L):
Messaging load refers to the number of messages exchanged between the dispatcher and servers for a given task dispatching. The mean message load ($\bar{L}$ ) is obtained by averaging the message load values of all tasks.
The workload for the computational cluster is described by a dataset obtained from measurements on a production data center of Google, which is publicly available and can be downloaded from here.
The dataset is a five-column table in CSV format, comprising
Each column represents the following information:
-
Job_ID:
An integer number representing the identifier of a job. -
Task_ID:
An integer number between$0$ and nj representing the identifier of tasks belonging to job j. -
tₐ:
Arrival time of a task measured in microseconds. -
CPU:
Running time in seconds required to run the task on a Google Normalized Computing Unit (GNCU). -
Memory:
Amount of memory required to run the task, expressed in Google Normalized Memory Unit (GNMU).