Fault Injection
In this tutorial, we are going to discuss fault injection and how to test the resiliency of an application by injecting faults into it dynamically. With Istio deployed alongside the application, I don’t need to change any code to inject or test a fault. I can inject the fault dynamically by deploying virtual services, which the sidecar proxies then enforce.
Let me go ahead and make a change to this application so that the ratings service is delayed by 7 seconds. This is the Bookinfo sample application, where all the required destination rules, virtual services and the gateway are already deployed.
So let me access the application and check the result within the Jaeger UI. Let me search the traces.
So this is the trace, and I have the response in 24.4 milliseconds. Now, I’m going to change the application so that every service uses only the version 1 subset.
At present all the versions are being used. For example, reviews version 1, version 2 and version 3 are used equally by the reviews service. I can confirm that by accessing the application a number of times.
I am getting red stars. Now, let me go ahead and refresh the screen.
I do not have any stars. Let me go ahead and refresh the screen.
Now I am getting black stars. So the requests are getting distributed equally across all 3 subsets.
Deploy Virtual Services
Now let me go ahead and apply the following change, after which all the virtual services use only the version 1 subset.
root@cluster-node:~/istio-1.10.0# kubectl apply -f samples/bookinfo/networking/virtual-service-all-v1.yaml
virtualservice.networking.istio.io/productpage created
virtualservice.networking.istio.io/reviews created
virtualservice.networking.istio.io/ratings created
virtualservice.networking.istio.io/details created
Let me allow a few seconds for the change to propagate to the proxies. Let me access the application.
I’m not getting any stars. That means the request is going to version 1 of reviews. No matter how many times I access the page, it will always go to version 1.
Now let me go ahead and route one user to reviews version 2 to test it.
Deploy Virtual Services based on headers
Now, I’m going to change the application so that the user named jason gets version 2 of reviews, while all other users get version 1.
root@cluster-node:~/istio-1.10.0# cat samples/bookinfo/networking/virtual-service-reviews-test-v2.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
When the user jason logs in, the application automatically adds a header with the key end-user and the value jason, and only requests carrying that header are routed to the version 2 subset. Every other request is routed to the version 1 subset of reviews. Let me go ahead and apply this particular YAML file.
root@cluster-node:~/istio-1.10.0# kubectl apply -f samples/bookinfo/networking/virtual-service-reviews-test-v2.yaml
virtualservice.networking.istio.io/reviews configured
So it is applied. I can verify that within the Kiali UI. Let me open the reviews virtual service and look at its YAML.
Here I have the change: requests whose end-user header is jason are routed to reviews subset v2. All other users are routed to version 1 of reviews. Now let me go ahead and access the application.
I have not logged in, so I am not getting any stars, because the request is routed to version 1. Now, let me log in as the user jason, no password required. The application adds the header automatically.
Now I’m getting black stars, which means version 2. No matter how many times I access the page, I will always get version 2. Next, I’m going to test by injecting a delay into ratings.
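The routing rule used in this section is evaluated top to bottom, and the first entry whose match holds decides the subset. The following is a minimal Python sketch of that behavior. It is only a toy model of the reviews rule, not how Istio or Envoy implement routing, and the names `ROUTES` and `pick_subset` are hypothetical, introduced here for illustration.

```python
# Toy model of the reviews VirtualService: routes are checked in order
# and the first one whose match conditions hold wins.
# ROUTES and pick_subset are illustrative names, not part of Istio.

ROUTES = [
    # Exact header match: end-user == "jason" goes to subset v2.
    {"match": {"end-user": "jason"}, "subset": "v2"},
    # No match block: catch-all route to subset v1.
    {"match": None, "subset": "v1"},
]


def pick_subset(headers):
    """Return the subset chosen for a request with the given headers."""
    for route in ROUTES:
        match = route["match"]
        if match is None:
            return route["subset"]
        if all(headers.get(key) == value for key, value in match.items()):
            return route["subset"]
    raise LookupError("no route matched")


print(pick_subset({"end-user": "jason"}))  # v2
print(pick_subset({"end-user": "alice"}))  # v1
print(pick_subset({}))                     # v1
```

Because the match is exact, a request without the header, or with any other value, falls through to the catch-all route, which is why the page shows no stars before logging in.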
Inject Delay in ratings
I will be applying the following YAML file, which injects a fault into ratings in the form of a 7 second delay.
root@cluster-node:~/istio-1.10.0# cat samples/bookinfo/networking/virtual-service-ratings-test-delay.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    fault:
      delay:
        percentage:
          value: 100.0
        fixedDelay: 7s
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
The delay is injected only for requests that carry the header end-user with the value jason. All other requests get the normal response from version 1 of ratings. Of course, ratings has only one version.
So all I’m going to concentrate on is the delay that I am injecting. We already ran this scenario as part of the request timeout use cases.
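To make the two conditions in the rule explicit (the header match and the percentage), here is a small Python sketch. It is a toy model under stated assumptions, not the proxy's actual logic: `FAULT_RULE` and `injected_delay` are hypothetical names, and the `roll` argument stands in for whatever random draw the proxy makes per request.

```python
# Toy model of the ratings delay rule. FAULT_RULE and injected_delay are
# illustrative names; the real decision is made inside the sidecar proxy.

FAULT_RULE = {
    "header": ("end-user", "jason"),  # match.headers.end-user.exact
    "percentage": 100.0,              # fault.delay.percentage.value
    "fixed_delay_s": 7.0,             # fault.delay.fixedDelay
}


def injected_delay(headers, roll):
    """Return the delay in seconds added before the request is forwarded.

    `roll` stands in for the proxy's random draw and must be in [0, 100).
    """
    key, expected = FAULT_RULE["header"]
    if headers.get(key) != expected:
        return 0.0  # falls through to the plain route, no fault
    if roll < FAULT_RULE["percentage"]:
        return FAULT_RULE["fixed_delay_s"]
    return 0.0


print(injected_delay({"end-user": "jason"}, roll=42.0))  # 7.0
print(injected_delay({}, roll=42.0))                     # 0.0
```

With the percentage set to 100.0, every request from jason is delayed. A lower value would delay only that share of the matching requests.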
What’s going to happen?
Now, let me go ahead and verify what’s going to happen. All I’m going to do is inject a delay of 7 seconds into ratings. Reviews version 2, which calls ratings, waits up to 10 seconds, so the ratings response arrives after 7 seconds, well within that limit, and reviews could pass its response back to the product page.
But the product page has a timeout of 3 seconds for its call to reviews. So after 3 seconds it times out and tries once more. On the second try it does not get a response in time either.
So it shows an error message. If the user is not jason, the fault rule does not match: the request goes to reviews version 1, which does not call ratings, so no delay is seen. Reviews version 3, by contrast, waits only 2.5 seconds for ratings. With a 7 second delay it would give up on ratings, but it would still answer the product page within its 3 second timeout.
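The timing argument can be checked with a few lines of arithmetic. The sketch below is a back-of-the-envelope model, assuming the numbers used in this walkthrough (3 second product page timeout with two attempts, 10 second wait in reviews version 2) and ignoring network latency. The function name `productpage_outcome` is hypothetical.

```python
# Back-of-the-envelope model of the timeouts described in this section.
# All names are illustrative; the numbers come from the walkthrough.


def productpage_outcome(ratings_delay_s, reviews_timeout_s,
                        productpage_timeout_s, attempts):
    """Return (seconds_waited, got_reviews) as seen by the product page."""
    # reviews answers once ratings answers, or when its own timeout fires.
    reviews_latency_s = min(ratings_delay_s, reviews_timeout_s)
    if reviews_latency_s <= productpage_timeout_s:
        return reviews_latency_s, True
    # Every attempt runs into the product page timeout.
    return productpage_timeout_s * attempts, False


# 7 s delay, reviews v2 waits 10 s, product page waits 3 s and tries twice.
print(productpage_outcome(7, 10, 3, 2))   # (6, False)
# A 2 s delay fits inside the 3 s budget.
print(productpage_outcome(2, 10, 3, 2))   # (2, True)
# A reviews timeout of 2.5 s lets reviews answer before the page gives up.
print(productpage_outcome(7, 2.5, 3, 2))  # (2.5, True)
```

The first case is the one exercised in the demo: two attempts of 3 seconds each, so roughly 6 seconds before the error appears. In the last case reviews answers in time but without data from ratings.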
Simple Demo
Let’s go ahead and do the demo and verify it within the Jaeger UI. Let me apply the YAML file shown above, which injects a 7 second delay only for the user jason, and then allow a couple of seconds for it to take effect.
root@cluster-node:~/istio-1.10.0# kubectl apply -f samples/bookinfo/networking/virtual-service-ratings-test-delay.yaml
virtualservice.networking.istio.io/ratings configured
Now I can verify the change to the virtual service from the Kiali web UI. Let me open the ratings virtual service and look at its YAML.
Here I have the change. Now let me go ahead and access the application. It took around 6 seconds, because the product page tries 2 times with a 3 second timeout each. Both tries timed out, so the entire reviews section errored out.
So the fix for this is one of three things: increase the time the product page waits for reviews, change the code so that the response time is lower, or route all users to version 3, which works fine.
And in the meantime, while I’m working on the fix, I can inject a different fault into ratings so that the other functionality continues to work. That keeps the impact on the other microservices small until the bug is fixed.
Inject Fault
Let me go ahead and apply this particular YAML file, which injects a fault that aborts 100 percent of the matching requests with HTTP error 500.
root@cluster-node:~/istio-1.10.0# cat samples/bookinfo/networking/virtual-service-ratings-test-abort.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    fault:
      abort:
        percentage:
          value: 100.0
        httpStatus: 500
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
Let me go ahead and apply this particular virtual service for the ratings.
root@cluster-node:~/istio-1.10.0# kubectl apply -f samples/bookinfo/networking/virtual-service-ratings-test-abort.yaml
virtualservice.networking.istio.io/ratings created
Now let me go ahead and verify the same within the Kiali UI by opening the ratings virtual service and viewing its YAML.
Here I have the change, where it returns a 500 error. Let me go ahead and access the page.
Now I’m getting the response from reviews, but in place of the ratings it says the ratings service is unavailable. Earlier, just because ratings was taking a long time, it was impacting reviews as well.
Now that I have injected an abort fault in the ratings virtual service, I get the response from reviews. Once the ratings microservice is fixed, I can go ahead and remove the fault that we injected.
So fault injection is helpful in two ways: as a stopgap until the actual microservice is fixed, and as a way to test the resiliency of the application by seeing how it behaves when a fault is injected into the system.
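The difference between the two faults, as seen from the product page, can be summarised in a small Python sketch. This is a toy comparison under the assumptions of this walkthrough (a 3 second product page timeout, and a reviews version that waits longer than that for ratings). The names `PRODUCTPAGE_TIMEOUT_S` and `page_result`, and the returned strings, are hypothetical.

```python
# Toy comparison of the two faults from the product page's point of view.
# Names and return strings are illustrative only.

PRODUCTPAGE_TIMEOUT_S = 3.0  # per-attempt wait for reviews, from the text


def page_result(fault):
    """Describe what the page shows for a given ratings fault."""
    if fault["type"] == "abort":
        # ratings fails immediately, so reviews can answer without stars.
        return "reviews shown, ratings unavailable"
    if fault["type"] == "delay" and fault["seconds"] > PRODUCTPAGE_TIMEOUT_S:
        # reviews is still waiting on ratings when the product page gives up.
        return "reviews error"
    return "reviews shown with ratings"


print(page_result({"type": "delay", "seconds": 7.0}))
print(page_result({"type": "abort", "status": 500}))
```

A fast failure in ratings lets reviews degrade gracefully, while a slow ratings service holds up reviews until the product page gives up on the whole section.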
Summary
So, in a quick summary, we learnt how to inject faults dynamically through virtual services to test the resiliency of the application.