Multiple Schedulers
In this tutorial we will discuss multiple schedulers in Kubernetes.
We will cover how to deploy an additional scheduler, how to have a POD scheduled by a specific scheduler, and how to view scheduler-related events.
As we know, the default scheduler has an algorithm that distributes PODs across nodes evenly and takes into consideration the various conditions we specify through taints and tolerations, node affinity, etc.
But what if none of these satisfies your needs?
Say you have a specific application that requires its components to be placed on nodes after performing some additional checks.
So you decide to write your own scheduling algorithm to place PODs on nodes, so that you can add your own custom conditions and checks to it.
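To make the idea concrete, here is a minimal sketch of what such custom logic might look like, assuming a simple filter-then-score approach. The Node and Pod classes, the disk=ssd label and the CPU figures are hypothetical; a real scheduler reads nodes and PODs from the API server and binds the chosen node through it, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpu_millicores: int
    labels: Dict[str, str] = field(default_factory=dict)


@dataclass
class Pod:
    name: str
    cpu_request_millicores: int


def passes_custom_checks(pod: Pod, node: Node) -> bool:
    # Hypothetical extra checks: the node must be labelled disk=ssd
    # and must have enough free CPU for the POD's request.
    return (
        node.labels.get("disk") == "ssd"
        and node.free_cpu_millicores >= pod.cpu_request_millicores
    )


def pick_node(pod: Pod, nodes: List[Node]) -> Optional[Node]:
    # Filter step: keep only nodes that pass the custom checks.
    candidates = [n for n in nodes if passes_custom_checks(pod, n)]
    if not candidates:
        return None  # no suitable node, so the POD would stay Pending
    # Score step: prefer the node with the most CPU left after placement.
    return max(candidates, key=lambda n: n.free_cpu_millicores - pod.cpu_request_millicores)


if __name__ == "__main__":
    nodes = [
        Node("node01", 2000, {"disk": "ssd"}),
        Node("node02", 4000, {"disk": "hdd"}),
        Node("node03", 3000, {"disk": "ssd"}),
    ]
    chosen = pick_node(Pod("app", 500), nodes)
    print(chosen.name if chosen else "Pending")  # node03
```

Splitting the logic into a filter step and a score step keeps each custom check small and lets you change the ranking rule without touching the checks.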
Kubernetes is highly extensible. You can write your own Kubernetes scheduler program, package it and deploy it as the default scheduler or an additional scheduler in the Kubernetes cluster.
That way, all other applications can go through the default scheduler, while one specific application uses your custom scheduler.
Your Kubernetes Cluster can have multiple schedulers at the same time. When creating a POD or a deployment you can instruct Kubernetes to have the POD scheduled by a specific scheduler.
Deploy Additional Scheduler
First, download the kube-scheduler binary, place it in /usr/local/bin, and run it as a service with a set of options.
$ wget https://storage.googleapis.com/kubernetes-release/release/v1.18.6/bin/linux/amd64/kube-scheduler
One of the options is the scheduler name. If it is not specified, it defaults to default-scheduler.
$ sudo vi /etc/systemd/system/kube-scheduler.service
[Service]
ExecStart=/usr/local/bin/kube-scheduler \
  --config=/etc/kubernetes/config/kube-scheduler.yaml \
  --scheduler-name=default-scheduler
To deploy an additional scheduler, you can use the same kube-scheduler binary or one that you have built yourself, which makes more sense when you need custom scheduling logic.
In this example we use the same binary for the additional scheduler and set the scheduler name to a custom name.
$ sudo vi /etc/systemd/system/my-custom-kube-scheduler.service
[Service]
ExecStart=/usr/local/bin/kube-scheduler \
  --config=/etc/kubernetes/config/kube-scheduler.yaml \
  --scheduler-name=my-custom-kube-scheduler
The scheduler name is what differentiates the two schedulers, and it is the name we will specify in the POD definition file later on.
Let's take a look at how this works with the kubeadm tool.
The kubeadm tool deploys the scheduler as a POD. You can find the definition file it uses in the manifests folder.
$ cat /etc/kubernetes/manifests/kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    image: k8s.gcr.io/kube-scheduler-amd64:v1.18.6
    name: kube-scheduler
Please note that I have removed all other details from the file so that we can focus on the key parts of the configuration.
We can create a custom scheduler by making a copy of the same file and changing the POD name and the name of the scheduler.
$ cat /etc/kubernetes/manifests/my-custom-kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-custom-kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --scheduler-name=my-custom-kube-scheduler
    image: k8s.gcr.io/kube-scheduler-amd64:v1.18.6
    name: kube-scheduler
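The copy-and-rename step can also be scripted. The following is a minimal sketch that works on the manifest as a Python dictionary and prints it as JSON, which kubectl accepts just like YAML. The function name make_custom_scheduler_pod is hypothetical, and the dictionary holds only the trimmed fields discussed here.

```python
import copy
import json

# Trimmed kubeadm scheduler manifest, holding only the fields discussed here.
DEFAULT_SCHEDULER_POD = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "kube-scheduler", "namespace": "kube-system"},
    "spec": {
        "containers": [
            {
                "name": "kube-scheduler",
                "image": "k8s.gcr.io/kube-scheduler-amd64:v1.18.6",
                "command": [
                    "kube-scheduler",
                    "--address=127.0.0.1",
                    "--kubeconfig=/etc/kubernetes/scheduler.conf",
                    "--leader-elect=true",
                ],
            }
        ]
    },
}


def make_custom_scheduler_pod(base: dict, scheduler_name: str) -> dict:
    pod = copy.deepcopy(base)  # work on a copy so the default manifest is untouched
    # A different POD name avoids clashing with the default scheduler POD.
    pod["metadata"]["name"] = scheduler_name
    container = pod["spec"]["containers"][0]
    # Replace any existing --scheduler-name flag with the custom one.
    args = [a for a in container["command"] if not a.startswith("--scheduler-name=")]
    args.append(f"--scheduler-name={scheduler_name}")
    container["command"] = args
    return pod


if __name__ == "__main__":
    custom = make_custom_scheduler_pod(DEFAULT_SCHEDULER_POD, "my-custom-kube-scheduler")
    print(json.dumps(custom, indent=2))
```

For example, you could redirect the output to a file such as my-custom-kube-scheduler.json (a hypothetical name) and create it with kubectl create -f.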
Finally, an important option to look at here is the leader-elect option. It is used when you have multiple copies of the scheduler running on different master nodes.
If multiple copies of the same scheduler are running on different nodes, only one can be active at a time. The leader-elect option chooses a leader that carries out the scheduling activities.
To get multiple schedulers working, either set the leader-elect option to false, if you don't have multiple masters, or, if you do have multiple masters, pass in an additional parameter that sets a lock object name.
A separate lock object name keeps the custom scheduler from competing with the default scheduler for the same lock during leader election.
$ cat /etc/kubernetes/manifests/my-custom-kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-custom-kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --scheduler-name=my-custom-kube-scheduler
    - --lock-object-name=my-custom-kube-scheduler
    image: k8s.gcr.io/kube-scheduler-amd64:v1.18.6
    name: kube-scheduler
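The following toy model illustrates why the lock name matters. It is not the real leader-election algorithm, which stores a lock object in the cluster and is implemented in the Kubernetes client libraries; the LeaseStore class and the candidate names are hypothetical. It assumes the default scheduler's lock is called kube-scheduler: a custom scheduler competing for that same lock stays passive while the default scheduler holds it, whereas one with its own lock name can become active.

```python
import time
from typing import Dict, Optional, Tuple


class LeaseStore:
    """Toy, in-memory stand-in for lock objects; each lock has one holder at a time."""

    def __init__(self, lease_seconds: float = 15.0):
        self.lease_seconds = lease_seconds
        # lock name -> (current holder, time at which the lease expires)
        self._locks: Dict[str, Tuple[Optional[str], float]] = {}

    def try_acquire(self, lock_name: str, candidate: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        holder, expiry = self._locks.get(lock_name, (None, 0.0))
        if holder in (None, candidate) or now >= expiry:
            # The lock is free, expired, or already ours: take or renew the lease.
            self._locks[lock_name] = (candidate, now + self.lease_seconds)
            return True
        return False  # another candidate holds a valid lease, so stay passive


if __name__ == "__main__":
    store = LeaseStore()
    # Shared lock name: the custom scheduler waits behind the default one.
    print(store.try_acquire("kube-scheduler", "default-scheduler@master1", now=0))  # True
    print(store.try_acquire("kube-scheduler", "custom@master1", now=1))  # False
    # Separate lock name: the custom scheduler can lead its own replicas.
    print(store.try_acquire("my-custom-kube-scheduler", "custom@master1", now=1))  # True
```

Among several replicas of the custom scheduler on different masters, the same rule applies: only the one holding the my-custom-kube-scheduler lock is active, and another replica takes over once the lease expires.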
Once done, create the POD. A file placed in the manifests folder is started automatically by the kubelet as a static POD; if you saved the file elsewhere, create it with the kubectl create command. Then run the get pods command in the kube-system namespace and look for the new custom scheduler.
$ kubectl create -f my-custom-kube-scheduler.yaml
$ kubectl get pods --namespace=kube-system
Make sure your custom scheduler POD is in the Running state.
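If you prefer to check this from a script, a minimal sketch follows that reads the JSON output of kubectl get pods. The function name, the check_scheduler.py script name and the pods.json file are hypothetical. It matches on a name prefix because a static POD started by the kubelet appears in the API with the node name appended, for example my-custom-kube-scheduler-master on a node called master.

```python
import json
import sys
from typing import Dict


def scheduler_pods_running(pod_list: dict, name_prefix: str) -> Dict[str, bool]:
    """Map each POD whose name starts with name_prefix to True if its phase is Running."""
    return {
        item["metadata"]["name"]: item.get("status", {}).get("phase") == "Running"
        for item in pod_list.get("items", [])
        if item["metadata"]["name"].startswith(name_prefix)
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: kubectl get pods --namespace=kube-system -o json > pods.json
    #          python3 check_scheduler.py pods.json
    with open(sys.argv[1]) as f:
        status = scheduler_pods_running(json.load(f), "my-custom-kube-scheduler")
    for name, running in status.items():
        print(f"{name}: {'Running' if running else 'not running'}")
    if not status:
        print("no custom scheduler POD found")
```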
The next step is to configure a new POD or a deployment to use the new scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx-container
    image: nginx
  schedulerName: my-custom-kube-scheduler
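For a deployment, the same schedulerName field goes into the POD template rather than the top-level spec. The sketch below builds such a manifest as a Python dictionary and prints it as JSON; the function name deployment_with_scheduler and the app label are hypothetical choices for illustration.

```python
import json


def deployment_with_scheduler(name: str, image: str, scheduler_name: str, replicas: int = 1) -> dict:
    """Build a Deployment manifest whose PODs ask for scheduler_name."""
    labels = {"app": name}  # hypothetical label used by the selector and the template
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Same field as in the POD definition, but inside the POD template.
                    "schedulerName": scheduler_name,
                    "containers": [{"name": f"{name}-container", "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = deployment_with_scheduler("nginx", "nginx", "my-custom-kube-scheduler")
    print(json.dumps(manifest, indent=2))
```

Saved as, say, make_deployment.py (a hypothetical name), the output could be piped straight into the cluster with python3 make_deployment.py | kubectl apply -f -.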
When the POD is created, the matching scheduler picks it up and schedules it. If that scheduler is not configured correctly, the POD remains in the Pending state.
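The selection rule can be summarized in a few lines. This is a simplified model, not scheduler source code: each scheduler handles only PODs whose schedulerName matches its own name, and a POD without the field is treated as requesting default-scheduler, the value Kubernetes fills in by default. The function names are hypothetical, and the model ignores other reasons a POD can stay Pending, such as no node having enough resources.

```python
from typing import Dict, List

DEFAULT_SCHEDULER = "default-scheduler"


def owning_scheduler(pod_spec: dict) -> str:
    # A POD without schedulerName is handled by the default scheduler.
    return pod_spec.get("schedulerName") or DEFAULT_SCHEDULER


def expected_status(pods: Dict[str, dict], running_schedulers: List[str]) -> Dict[str, str]:
    """Predict which scheduler picks up each POD, or Pending if no such scheduler runs."""
    status = {}
    for name, spec in pods.items():
        scheduler = owning_scheduler(spec)
        if scheduler in running_schedulers:
            status[name] = f"picked up by {scheduler}"
        else:
            status[name] = "Pending"
    return status


if __name__ == "__main__":
    pods = {
        "web": {},
        "nginx": {"schedulerName": "my-custom-kube-scheduler"},
        "typo": {"schedulerName": "my-custom-scheduler"},  # misspelled name
    }
    for pod, result in expected_status(pods, ["default-scheduler", "my-custom-kube-scheduler"]).items():
        print(pod, "->", result)
```

The misspelled name in the last POD shows the most common cause of this problem: nothing rejects the POD, it simply waits for a scheduler that never claims it.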
So how do we know which scheduler picked it up? View the events using the kubectl get events command and look for the Scheduled events; the source of each event shows the name of the scheduler that placed the POD.
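To filter the events from a script, here is a minimal sketch that reads the JSON output of kubectl get events. The function name and the events.json file are hypothetical. It assumes the scheduler's name appears in the event's source.component field, or in reportingComponent on clusters that fill that field instead; which one is populated can vary between Kubernetes versions, so the sketch checks both.

```python
import json
import sys
from typing import List, Tuple


def scheduled_events(event_list: dict) -> List[Tuple[str, str, str]]:
    """Return (POD name, scheduler component, message) for each Scheduled event."""
    rows = []
    for ev in event_list.get("items", []):
        if ev.get("reason") != "Scheduled":
            continue  # skip pulls, starts and other non-scheduling events
        # Prefer source.component; fall back to reportingComponent when it is empty.
        component = (ev.get("source") or {}).get("component") or ev.get("reportingComponent", "")
        rows.append((ev["involvedObject"]["name"], component, ev.get("message", "")))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: kubectl get events -o json > events.json
    #          python3 scheduled_events.py events.json
    with open(sys.argv[1]) as f:
        for pod, component, message in scheduled_events(json.load(f)):
            print(f"{pod}\t{component}\t{message}")
```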