Multiple Schedulers

In this tutorial we will discuss multiple schedulers in Kubernetes.

We mainly cover how to deploy an additional scheduler, how to have a POD scheduled by it, and how to view scheduler-related events.

Running Multiple Schedulers in Kubernetes

As we know, the default scheduler has an algorithm that distributes PODs evenly across nodes and also takes into consideration the various conditions we specify through taints and tolerations, node affinity, etc.

But what if none of these satisfies your needs?

Say you have a specific application that requires its components to be placed on nodes after performing some additional checks.

So you decide to have your own scheduling algorithm to place PODs on nodes, so that you can add your own custom conditions and checks to it.

Kubernetes is highly extensible. You can write your own Kubernetes scheduler program, package it and deploy it as the default scheduler or an additional scheduler in the Kubernetes cluster.

That way, all of the other applications can go through the default scheduler, while one specific application uses your custom scheduler.

Your Kubernetes cluster can have multiple schedulers at the same time. When creating a POD or a deployment, you can instruct Kubernetes to have the POD scheduled by a specific scheduler.

Deploy Additional Scheduler

We need to download the kube-scheduler binary and run it as a service with a set of options.

$ wget https://storage.googleapis.com/kubernetes-release/release/v1.18.6/bin/linux/amd64/kube-scheduler
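The service definitions in this section expect the binary at /usr/local/bin/kube-scheduler, so the downloaded file has to be made executable and moved there. A minimal sketch of that step; the function name install_scheduler_binary is hypothetical, and the default destination is taken from the ExecStart path used in this tutorial:

```shell
# Make the downloaded binary executable and move it into place.
# Usage: install_scheduler_binary [BINARY] [DEST_DIR]
install_scheduler_binary() {
  local bin="${1:-kube-scheduler}" dest="${2:-/usr/local/bin}"
  chmod +x "$bin" && sudo mv "$bin" "$dest/"
}
```

Calling install_scheduler_binary with no arguments from the download directory installs ./kube-scheduler into /usr/local/bin.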

One of the options is the scheduler name. If it is not specified, the scheduler assumes the name default-scheduler.

$ sudo vi /etc/systemd/system/kube-scheduler.service 

[Service]
ExecStart=/usr/local/bin/kube-scheduler \
  --config=/etc/kubernetes/config/kube-scheduler.yaml \
  --scheduler-name=default-scheduler

To deploy an additional scheduler, you can use the same kube-scheduler binary or one that you have built yourself, which makes more sense when you need custom scheduling logic.

In this case we are going to use the same binary to deploy the additional scheduler. This time we set the scheduler name to a custom name.

$ sudo vi /etc/systemd/system/my-custom-kube-scheduler.service 

[Service]
ExecStart=/usr/local/bin/kube-scheduler \
  --config=/etc/kubernetes/config/kube-scheduler.yaml \
  --scheduler-name=my-custom-kube-scheduler

The name is important because it differentiates the two schedulers, and it is the name that we will specify in the POD definition file later on.
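A new unit file is not picked up until systemd reloads its configuration. A minimal sketch of starting the additional scheduler as a service, assuming a systemd-managed host and the unit name used above; the function name start_custom_scheduler is hypothetical:

```shell
# Reload systemd units, then enable and start the additional scheduler service.
# Assumes /etc/systemd/system/<unit>.service already exists.
# Usage: start_custom_scheduler [UNIT_NAME]
start_custom_scheduler() {
  local unit="${1:-my-custom-kube-scheduler}"
  sudo systemctl daemon-reload &&
    sudo systemctl enable --now "$unit" &&
    sudo systemctl status "$unit" --no-pager
}
```

Running start_custom_scheduler with no arguments starts my-custom-kube-scheduler and prints its status, which should report the service as active.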

Let's take a look at how it works with the kubeadm tool.

The kubeadm tool deploys the scheduler as a POD. You can find the definition file it uses under the manifests folder.

$ cat /etc/kubernetes/manifests/kube-scheduler.yaml

apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command: 
    - kube-scheduler
    - --address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    image: k8s.gcr.io/kube-scheduler-amd64:v1.18.6
    name: kube-scheduler

Please note that I have removed all the other details from the file. We only focus on key parts of the configuration.

We can create a custom scheduler by making a copy of the same file, and by changing the name of the scheduler.

$ cat /etc/kubernetes/manifests/my-custom-kube-scheduler.yaml

apiVersion: v1
kind: Pod
metadata:
  name: my-custom-kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command: 
    - kube-scheduler
    - --address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --scheduler-name=my-custom-kube-scheduler
    image: k8s.gcr.io/kube-scheduler-amd64:v1.18.6
    name: kube-scheduler
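The copy-and-edit step can also be scripted. A minimal sketch, assuming the manifest layout shown above (a "- kube-scheduler" command entry followed by its flags) and a source file that does not already set --scheduler-name; the function name make_custom_manifest is hypothetical:

```shell
# Copy a kube-scheduler manifest and add a --scheduler-name flag to its command list.
# Usage: make_custom_manifest SRC DST SCHEDULER_NAME
make_custom_manifest() {
  local src="$1" dst="$2" name="$3"
  awk -v name="$name" '
    { print }
    # Right after the "- kube-scheduler" command entry, emit the new flag
    # with the same indentation as that entry.
    /^[ \t]*- kube-scheduler[ \t]*$/ {
      indent = $0
      sub(/- kube-scheduler.*/, "", indent)
      print indent "- --scheduler-name=" name
    }
  ' "$src" > "$dst"
}
```

For example, make_custom_manifest /etc/kubernetes/manifests/kube-scheduler.yaml my-custom-kube-scheduler.yaml my-custom-kube-scheduler writes the edited copy to the current directory. The flag lands directly after the command entry rather than at the end of the list; the order of flags does not matter to kube-scheduler.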

Finally, an important option to look at here is the leader-elect option. The leader-elect option is used when you have multiple copies of the scheduler running on different master nodes.

If multiple copies of the same scheduler are running on different nodes, only one can be active at a time. That’s where the leader-elect option helps in choosing a leader who will lead scheduling activities.

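To get multiple schedulers working, you must either set the leader-elect option to false, in case you don't have multiple masters, or give the custom scheduler its own lock object.

In case you do have multiple masters, you can pass in an additional parameter to set a lock object name, so that the custom scheduler does not compete with the default scheduler for the same lock during leader election.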

$ cat /etc/kubernetes/manifests/my-custom-kube-scheduler.yaml

apiVersion: v1
kind: Pod
metadata:
  name: my-custom-kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command: 
    - kube-scheduler
    - --address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --scheduler-name=my-custom-kube-scheduler
    - --lock-object-name=my-custom-kube-scheduler
    image: k8s.gcr.io/kube-scheduler-amd64:v1.18.6
    name: kube-scheduler

Once done, create the POD using the kubectl create command. Run the get pods command in the kube-system namespace and look for the new custom scheduler.

$ kubectl create -f /etc/kubernetes/manifests/my-custom-kube-scheduler.yaml


$ kubectl get pods --namespace=kube-system

Make sure your custom scheduler POD is in the Running state.
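To avoid scanning the full listing, the check can be narrowed to the scheduler PODs and their phase. A minimal sketch, assuming the scheduler PODs have "scheduler" in their names; the function name scheduler_pod_status is hypothetical:

```shell
# Print the scheduler PODs in kube-system together with their phase.
scheduler_pod_status() {
  kubectl get pods --namespace=kube-system --no-headers \
    -o custom-columns=NAME:.metadata.name,STATUS:.status.phase |
    grep scheduler
}
```

Each line of output is a POD name followed by its phase; the custom scheduler should report Running before you go on.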

The next step is to configure a new POD or a deployment to use the new scheduler.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx-container
      image: nginx
  schedulerName: my-custom-kube-scheduler

When the POD is created, the right scheduler picks it up for scheduling. If the scheduler was not configured correctly, the POD will remain in the Pending state.
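For a deployment, the schedulerName field goes in the spec of the POD template, not in the spec of the deployment itself. A minimal sketch that writes such a manifest; the deployment name nginx-deployment, the label app: nginx, the replica count, and the function name write_deployment_manifest are all hypothetical:

```shell
# Write a Deployment manifest whose PODs request the custom scheduler.
# Usage: write_deployment_manifest FILE [SCHEDULER_NAME]
write_deployment_manifest() {
  local file="$1" scheduler="${2:-my-custom-kube-scheduler}"
  cat > "$file" <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      schedulerName: ${scheduler}
      containers:
        - name: nginx-container
          image: nginx
EOF
}
```

After write_deployment_manifest nginx-deployment.yaml, the file can be passed to kubectl create -f like any other definition file.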

So how do we know which scheduler picked it up? View the events using the kubectl get events command and look for the Scheduled events. The source of the event shows the name of the scheduler that placed the POD.
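The lookup can be narrowed to a single POD. A minimal sketch, assuming the POD lives in the current namespace; the function name show_scheduling_info is hypothetical, and the exact columns printed for events may differ between kubectl versions:

```shell
# Show which scheduler a POD requested and the Scheduled events recorded for it.
# Usage: show_scheduling_info POD_NAME
show_scheduling_info() {
  local pod="$1"
  # The scheduler name the POD asked for (spec.schedulerName).
  kubectl get pod "$pod" -o jsonpath='{.spec.schedulerName}{"\n"}'
  # Scheduled events for this POD; the wide output includes the event source.
  kubectl get events -o wide \
    --field-selector "involvedObject.name=${pod},reason=Scheduled"
}
```

For the POD defined above, show_scheduling_info nginx prints the requested scheduler name first, followed by any matching events. If no Scheduled event is listed and the POD stays Pending, the named scheduler has most likely not picked it up.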
