WHOLEPRO: An Online, Holistic Job Scheduling and Resource Provisioning Framework for Datacenter Architectures and Applications
WHOLEPRO is a top-down, holistic approach that aims to meet diverse user Quality-of-Experience (QoE) requirements for individual customers while allowing for user-utility-aware fair resource allocation, called the dual goal in this project. It starts from the very top, i.e., the QoE or user-utility requirements at the user level. It then works its way downward, mapping user-level requirements into datacenter system-level resource demands, which, together with the datacenter computing and networking resource constraints, set the boundary conditions for a user-utility-aware optimization framework. The solution to this framework is composed of a set of host-based computing and flow controllers and in-network load balancers. These controllers and load balancers not only ensure that QoE requirements for individual users are met, but also work in concert to enable user-utility-aware fair resource allocation among all the users, hence achieving the dual goal.
WHOLEPRO is a resource allocation framework upon which job schedulers can be developed to provide tail-latency SLO guarantees,
high resource utilization, and fair resource sharing. It is a top-down approach that takes the entire system stack into account.
It decouples the upper job-level design from the lower task-level, or runtime-system, design. The solution is composed of a set of fully distributed,
host-based task compute and flow controllers.
The above figure shows the job scheduling framework, which is applicable to both centralized and distributed job scheduling. A job scheduler (J-S) runs in a master node, and distributed task schedulers (T-S's) run in the individual workers of the cluster. Each task scheduler is mainly composed of a computing controller (C-C) and a flow controller (F-C) per flow emitted from the worker.
Consider job j with fanout Nj and the n-th tail-latency SLO Tn arriving at a job master. The master calculates the corresponding task response-time budget (Ej, Vj) for each task. The task response-time budget is divided into a networking budget (Efj, Vfj) and a compute budget (Ecj, Vcj). A network flow control scheme and a task scheduling control scheme are then developed to meet the networking and compute time budgets, respectively.
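The decomposition above can be sketched as follows. This is a minimal illustration, not the paper's exact formula: it assumes the tail latency of a task is approximated by E + k*sqrt(V) for a tail factor k, that k must grow with the fanout Nj (a logarithmic inflation is an assumed, illustrative choice), and that the budget is split between networking and compute by an assumed fixed ratio net_share.

```python
import math

def decompose_slo(T_n, fanout, k=3.0, net_share=0.4):
    """Sketch: translate a job tail-latency SLO T_n into a per-task
    (mean, variance) budget (E, V), then split it into networking
    (Ef, Vf) and compute (Ec, Vc) portions.

    Assumes tail latency ~ E + k_eff * sqrt(V); k_eff inflates with
    fanout so that the max over Nj parallel tasks still meets T_n.
    All constants here are illustrative assumptions.
    """
    k_eff = k + math.log(max(fanout, 1))   # stricter tail for larger fanout
    E = 0.5 * T_n                          # assumed mean share of the budget
    V = ((T_n - E) / k_eff) ** 2           # chosen so E + k_eff*sqrt(V) = T_n
    Ef, Ec = net_share * E, (1 - net_share) * E
    Vf, Vc = net_share * V, (1 - net_share) * V
    return (E, V), (Ef, Vf), (Ec, Vc)
```

For example, a 100 ms job SLO with fanout 50 yields a per-task budget whose mean and variance jointly satisfy the tail bound, with 40% of each assigned to the networking portion.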
Utility Maximization (UM) Framework
A network utility maximization (NUM) framework is formulated to maximize the total user utility while maintaining a minimum
user utility as a flow constraint. Each task is associated with a networking utility function u.
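The NUM formulation above can be illustrated with a small solver sketch. It assumes logarithmic utility functions (a common but here assumed choice) and treats the minimum-utility requirement as a per-flow minimum-rate floor; the optimum of sum(log(x_i)) subject to a capacity constraint and floors then follows from the KKT conditions by water-filling.

```python
def num_allocate(min_rates, capacity):
    """Sketch of a NUM solver: maximize sum(log(x_i)) subject to
    sum(x_i) <= capacity and x_i >= min_rates[i] (the minimum-utility
    floor expressed as a flow constraint).

    KKT conditions give x_i = max(min_rates[i], level) for a common
    water level; flows whose floor exceeds the level are clamped.
    """
    assert sum(min_rates) <= capacity, "infeasible: floors exceed capacity"
    n = len(min_rates)
    clamped = set()
    while True:
        free = [i for i in range(n) if i not in clamped]
        if not free:                      # every flow pinned at its floor
            return list(min_rates)
        # Remaining capacity is shared equally among unclamped flows.
        level = (capacity - sum(min_rates[i] for i in clamped)) / len(free)
        newly = [i for i in free if min_rates[i] > level]
        if not newly:
            return [min_rates[i] if i in clamped else level
                    for i in range(n)]
        clamped.update(newly)
```

With floors [1, 2, 5] and capacity 9, the flow with floor 5 is clamped and the remaining 4 units are split equally, giving the allocation [2, 2, 5].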
HOLNET: Holistic, user-utility-based flow rate allocation framework
HOLNET introduces a network utility maximization (NUM) formulation: a soft-minimum-user-utility-guaranteed, center-of-utility-fairness-based
NUM. It covers a large solution design space, allowing for integrated, multi-Class-of-Service (CoS) enabled, host-based single/multipath
congestion control and in-network load balancing.
The different user utilities are mapped to a weighted base user utility. The weights for the different user utilities are based on the center of utility, hence achieving a center-of-utility-fair flow rate allocation, provided that the minimum flow rates needed to sustain the minimum user utilities are satisfied.
TLG: tail-latency-SLO-guaranteed job scheduler
In TLG, a decomposition technique is employed to translate the job tail-latency SLO into task-level performance budgets
for individual tasks spawned by the job. This effectively decomposes a hard job-level cotask/coflow resource allocation
problem into distributed task-level resource allocation subproblems for individual tasks and task flows. At the task level,
a utility-maximization (UM) framework is proposed to enable joint task compute and flow fair resource sharing, subject to task
budget constraints. The solution to this UM is a set of distributed task compute and flow controllers that work in
concert to achieve the three design objectives.
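A per-flow host-based controller of the kind described above can be sketched as a simple feedback loop. The AIMD-style update law, the 0.8 comfort threshold, and the step sizes are all illustrative assumptions, not the controllers derived in the framework: the point is only that each host adjusts its flow rate locally so the measured flow completion time tracks the networking budget, yielding bandwidth when comfortably under budget to support fair sharing.

```python
class FlowController:
    """Sketch of a host-based flow controller: steers a flow's sending
    rate so its measured completion time tracks the networking budget
    derived from the task budget. All constants are assumptions."""

    def __init__(self, budget_ms, rate_mbps=100.0, step_mbps=10.0):
        self.budget_ms = budget_ms
        self.rate = rate_mbps
        self.step = step_mbps

    def update(self, measured_ms):
        if measured_ms > self.budget_ms:
            # Over budget: additively raise the rate to speed the flow up.
            self.rate += self.step
        elif measured_ms < 0.8 * self.budget_ms:
            # Comfortably under budget: multiplicatively yield bandwidth
            # to other tasks, supporting fair resource sharing.
            self.rate *= 0.9
        return self.rate
```

For instance, a flow with a 10 ms networking budget that measures 15 ms raises its rate from 100 to 110 Mbps; if it then measures 5 ms, it backs off to 99 Mbps.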