What is Kubernetes HPA and How Can It Help You Save on the Cloud?
Autoscaling is a core capability of Kubernetes. The tighter you configure the scaling mechanisms - HPA, VPA, and Cluster Autoscaler - the lower the waste and costs of running your application.
Kubernetes comes with three types of autoscaling mechanisms: Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler. Each of these adds a unique ingredient to your overarching goal of autoscaling for cloud cost optimization.
In this article, we focus on horizontal pod autoscaling. You can control how many pods run based on various metrics by configuring the Horizontal Pod Autoscaler (HPA) settings on your cluster. This gives you the ability to scale up or down according to demand.
Keep reading to learn what Kubernetes HPA is and how it works in a hands-on example.
Let's start with a quick recap of Kubernetes autoscaling
Before diving into Horizontal Pod Autoscaler (HPA), let's look at Kubernetes autoscaling mechanisms.
Kubernetes supports three types of autoscaling:
- Horizontal Pod Autoscaler (HPA), which scales the number of replicas of an application.
- Vertical Pod Autoscaler (VPA), which scales the resource requests and limits of a container.
- Cluster Autoscaler, which adjusts the number of nodes of a cluster.
These autoscalers work on one of two Kubernetes levels: pod and cluster. While Kubernetes HPA and VPA methods adjust resources at the pod level, the Cluster Autoscaler scales up or down the number of nodes in a cluster.
What is Kubernetes Horizontal Pod Autoscaler (HPA)?
In many applications, usage changes over time - for example, more people visit an e-commerce store in the evening than around noon. When the demands of your application change, you can use the Horizontal Pod Autoscaler (HPA) to add or remove pods automatically based on CPU utilization.
HPA can also make autoscaling decisions based on custom metrics or on metrics that you provide from external sources.
To get started, you need to define the minimum and maximum number of replicas that may run at any given time using the minReplicas and maxReplicas values. Once configured, the Horizontal Pod Autoscaler controller takes care of checking metrics and making adjustments as necessary. It checks metrics every 15 seconds by default.
How does Horizontal Pod Autoscaler work?
Once configured, the HPA controller monitors your deployment's pods and determines whether the number of pod replicas needs to change. To do this, HPA takes the mean of a per-pod metric value and calculates whether removing or adding replicas would bring that value closer to its target value.
Example scenario
Imagine that your deployment has a target CPU utilization of 50%. You currently have five pods running there, and the mean CPU utilization is 75%. In this scenario, the HPA controller will add three replicas to bring the pod average closer to the target of 50%.
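The arithmetic behind this scenario can be written out using the replica formula given in the Kubernetes documentation. The symbol names mirror that formula, and the numbers are the ones from the scenario above; this is a simplified sketch that ignores details such as the controller's tolerance band and pods that are not yet ready.

```latex
\[
\text{desiredReplicas}
  = \left\lceil \text{currentReplicas} \times
      \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
  = \left\lceil 5 \times \frac{75}{50} \right\rceil
  = \lceil 7.5 \rceil
  = 8
\]
```

Eight replicas is three more than the current five. Assuming the total load stays the same, spreading it over eight pods would bring mean utilization to roughly 5 × 75 / 8 ≈ 47%, just under the 50% target.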
When to use Kubernetes HPA?
Horizontal Pod Autoscaler is an autoscaling mechanism that comes in handy for scaling stateless applications. But you can also use it to scale StatefulSets.
To achieve cost savings for workloads that experience regular changes in demand, use HPA in combination with cluster autoscaling. This will help you reduce the number of active nodes when the number of pods decreases.
Limitations of Horizontal Pod Autoscaler
Note that HPA comes with some limitations:
- It might require architecting your application with a scale-out in mind so that distributing workloads across multiple servers is possible.
- HPA might not always keep up with unexpected demand peaks since new virtual machines may take a few minutes to load.
- If you set CPU and memory limits too low on your pods, they may terminate frequently; if you set them too high, you waste resources.
- If the cluster is out of capacity, HPA cannot scale up until new nodes are added to the cluster. Cluster Autoscaler (CA) can automate this process.
What do you need to run Horizontal Pod Autoscaler?
Horizontal Pod Autoscaler (HPA) is a Kubernetes feature that watches the CPU usage of pod containers and automatically adjusts the number of pod replicas as necessary to maintain a target level of utilization.
To do that, HPA requires a source of metrics. For example, when scaling based on CPU usage, it uses metrics-server. If you want to use custom or external metrics for HPA scaling, you need to deploy a service implementing the custom.metrics.k8s.io API or external.metrics.k8s.io API; this provides an interface with a monitoring service or metrics source.
Custom metrics include network traffic, memory, or any value that relates to the pod's application. And if your workloads use the standard CPU metric, make sure to configure CPU resource requests for the containers in the pod spec, because utilization is calculated as a percentage of the requested amount.
Expert tips for running Kubernetes HPA
1. Install metrics-server
Kubernetes HPA needs to access per-pod resource metrics to make scaling decisions. These values are retrieved from the metrics.k8s.io API provided by the metrics-server.
2. Configure resource requests for all pods
Another key source of information for HPA's scaling decisions is the observed CPU utilization of pods. But how are these values calculated? They are a percentage of the resource requests from individual pods.
If resource request values are missing for some containers, these calculations might become entirely inaccurate and lead to suboptimal operation and poor scaling decisions. That's why it's worth configuring resource request values for all containers of every pod that's part of the Kubernetes controller scaled by the HPA.
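As a rough illustration of that calculation for a single pod, take the figures from the demo later in this article: a CPU request of 200m and an HPA reading of 16%. The 32m usage figure below is back-calculated from those two numbers for illustration; it is not a measured value.

```latex
\[
\text{utilization}
  = \frac{\text{CPU usage}}{\text{CPU request}} \times 100\%
\qquad\Rightarrow\qquad
\frac{32\,\text{m}}{200\,\text{m}} \times 100\% = 16\%
\]
```

With several pods, HPA compares the mean of these per-pod percentages against the target, which is why a container without a request leaves the calculation without a denominator.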
3. Configure custom and external metrics
Custom metrics
You can configure Horizontal Pod Autoscaler (HPA) to scale based on custom metrics, which are internal metrics that you collect from your application. HPA supports two types of custom metrics:
- Pod metrics - averaged across all the pods in an application, which support only the target type of AverageValue.
- Object metrics - metrics describing any other object in the same namespace as your application and supporting target types of Value and AverageValue.
Remember to use the correct target type for pod and object metrics when configuring custom metrics.
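The two custom metric types could be declared roughly as follows. This is a minimal sketch, assuming a cluster that serves the autoscaling/v2 API (Kubernetes 1.23 or newer) and a metrics adapter that implements custom.metrics.k8s.io. The metric names `packets-per-second` and `requests-per-second`, the Ingress name `main-route`, and the target values are hypothetical placeholders.

```yaml
# Sketch only: metric names, Ingress name, and target values are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 5
  metrics:
    # Pod metric: averaged across pods, so only AverageValue is allowed.
    - type: Pods
      pods:
        metric:
          name: packets-per-second
        target:
          type: AverageValue
          averageValue: 1k
    # Object metric: describes another object in the same namespace;
    # accepts Value or AverageValue.
    - type: Object
      object:
        metric:
          name: requests-per-second
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: main-route
        target:
          type: Value
          value: 10k
```

When several metrics are listed, HPA computes a replica count for each and uses the largest one.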
External metrics
These metrics allow HPA to autoscale applications based on metrics that are provided by third-party monitoring systems. External metrics support target types of Value and AverageValue.
When deciding between custom and external metrics, go for custom metrics where possible, because securing an external metrics API is more difficult than securing an internal one.
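For comparison, an external metric could be wired in along these lines. Again a sketch under assumptions: it needs an adapter that implements external.metrics.k8s.io, and the metric name `queue_messages_ready`, the `queue: worker_tasks` label, and the target of 30 are hypothetical values standing in for whatever your monitoring system exposes.

```yaml
# Sketch only: metric name, selector label, and target are hypothetical.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-external
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: worker_tasks
        # AverageValue divides the metric by the current replica count;
        # use "type: Value" to compare the raw number instead.
        target:
          type: AverageValue
          averageValue: 30
```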
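4. Verify that your HPA and VPA policies don't clash
Vertical Pod Autoscaler automates requests and limits configuration, reducing overhead and achieving cost savings. Horizontal Pod Autoscaler, on the other hand, aims to scale out rather than up or down. If both act on the same workload using the same CPU or memory metric, they can work against each other, so when VPA manages a workload's CPU and memory, drive HPA from custom or external metrics instead.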
Double-check that your binning and packing density settings aren't in conflict with each other when designing clusters for business or purpose-class tier of service.
5. Use instance weighting scores
Suppose one of your workloads ends up consuming more than it requested. Is this happening because the resources are needed? Or did the workload consume them because they were available but not critically required?
Use instance weighting when choosing instance sizes and types for autoscaling. Instance weighting is useful, especially when you adopt a diversified allocation strategy and use spot instances.
Example: HPA demo
As this is one of the core Kubernetes features, the cloud service provider we use shouldn't matter. But for this example, we will be using GKE.
You can create a cluster via the UI, or via the gcloud utils like so:
gcloud container --project "[your-project]" clusters create "[cluster-name]" \
  --release-channel None \
  --zone "europe-west3-c" \
  --node-locations "europe-west3-c" \
  --machine-type "e2-standard-2" \
  --image-type "COS_CONTAINERD" \
  --disk-size "50" \
  --enable-autorepair \
  --num-nodes "3"
We can then connect to the cluster using:
gcloud container clusters get-credentials [cluster-name] --zone europe-west3-c --project [your-project]
This should also switch your context to the cluster, so whenever you use kubectl, you'll be within this cluster's context.
After we've done that, we can verify that we can see the nodes with
> kubectl get nodes
NAME                                      STATUS   ROLES    AGE     VERSION
gke-valdas-1-default-pool-cf1cd6be-cvc4   Ready    <none>   4m50s   v1.22.11-gke.400
gke-valdas-1-default-pool-cf1cd6be-q62h   Ready    <none>   4m49s   v1.22.11-gke.400
gke-valdas-1-default-pool-cf1cd6be-xrf0   Ready    <none>   4m50s   v1.22.11-gke.400
GKE comes with the metrics server preinstalled; we can verify that using
> kubectl get pods -n kube-system | grep metrics
gke-metrics-agent-n92rl                 1/1   Running   0   7m34s
gke-metrics-agent-p5d49                 1/1   Running   0   7m33s
gke-metrics-agent-tf96r                 1/1   Running   0   7m34s
metrics-server-v0.4.5-fb4c49dd6-knw6v   2/2   Running   0   7m20s
Note that if your cluster doesn't have a metrics server, you can easily install it using one command:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
You can find more info here: https://github.com/kubernetes-sigs/metrics-server
We can also use the `top` command to verify that the metrics are collected:
> kubectl top pods -A
NAMESPACE     NAME                                                 CPU(cores)   MEMORY(bytes)
kube-system   event-exporter-gke-5479fd58c8-5rt4b                  1m           12Mi
kube-system   fluentbit-gke-5r6v5                                  5m           22Mi
kube-system   fluentbit-gke-nwcx9                                  4m           25Mi
kube-system   fluentbit-gke-zz6gl                                  4m           25Mi
kube-system   gke-metrics-agent-n92rl                              2m           26Mi
kube-system   gke-metrics-agent-p5d49                              3m           27Mi
kube-system   gke-metrics-agent-tf96r                              3m           26Mi
kube-system   konnectivity-agent-6fbc8c774c-4vvff                  1m           6Mi
kube-system   konnectivity-agent-6fbc8c774c-lsk26                  1m           7Mi
kube-system   konnectivity-agent-6fbc8c774c-rnvp2                  2m           6Mi
kube-system   konnectivity-agent-autoscaler-555f599d94-f2w5n       1m           4Mi
kube-system   kube-dns-85df8994db-lvf55                            2m           30Mi
kube-system   kube-dns-85df8994db-mvxv5                            2m           30Mi
kube-system   kube-dns-autoscaler-f4d55555-pctcj                   1m           11Mi
kube-system   kube-proxy-gke-valdas-1-default-pool-cf1cd6be-cvc4   1m           22Mi
kube-system   kube-proxy-gke-valdas-1-default-pool-cf1cd6be-q62h   1m           23Mi
kube-system   kube-proxy-gke-valdas-1-default-pool-cf1cd6be-xrf0   1m           26Mi
kube-system   l7-default-backend-69fb9fd9f9-fctch                  1m           1Mi
kube-system   metrics-server-v0.4.5-fb4c49dd6-knw6v                27m          24Mi
kube-system   pdcsi-node-mf6pt                                     2m           9Mi
kube-system   pdcsi-node-sxdrg                                     3m           9Mi
kube-system   pdcsi-node-wcnw7                                     3m           9Mi
Now, let's create a single-replica deployment with resource requests and limits:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: k8s.gcr.io/nginx-slim:0.8
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: 200m
            limits:
              cpu: 1000m
Let's save this to a file called demo.yaml and run
> kubectl apply -f demo.yaml
We can check that it's deployed via
> kubectl get deploy -n default
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
hpa-demo   1/1     1            1           17m
Lastly, before configuring HPA, we need to expose a service that we can call to increase the load, which HPA will act upon.
Let's create a service.yaml like so:
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo
  labels:
    app: nginx
spec:
  ports:
    - port: 80
  selector:
    app: nginx
And apply it using
> kubectl apply -f service.yaml
We can verify that the service is working like so:
> kubectl get services
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
hpa-demo     ClusterIP   10.92.62.191   <none>        80/TCP    7m35s
kubernetes   ClusterIP   10.92.48.1     <none>        443/TCP   48m
As you can see, the hpa-demo service exists.
Finally, we need to configure HPA. To do that, we can create a file called hpa.yaml and fill it in with:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 60
Once again, we apply it using
> kubectl apply -f hpa.yaml
Now, let's watch HPA in action using
> kubectl get hpa -w
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo   Deployment/hpa-demo   0%/60%    1         5         1          2m31s
As you can see, nothing is happening. But remember how we created a service earlier? Let's start generating some load using busybox:
> kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://hpa-demo; done"
We should immediately see a load increase under our HPA watch command:
hpa-demo Deployment/hpa-demo 16%/60% 1 5 1 8m47s
The load is still not enough to reach our target, but for the sake of this demo, let's set targetCPUUtilizationPercentage to 10 and re-apply hpa.yaml.
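For reference, the edited hpa.yaml would look roughly like this. Only the target value differs from the manifest created earlier; everything else is assumed to stay unchanged.

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 5
  # Lowered from 60 so the demo load is enough to trigger a scale-up.
  targetCPUUtilizationPercentage: 10
```

Re-applying it with the same kubectl apply -f hpa.yaml command updates the existing HPA object in place.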
Under the watch command, we should see that our target is exceeded and a replica gets added.
hpa-demo   Deployment/hpa-demo   16%/10%   1         5         1          15m
hpa-demo   Deployment/hpa-demo   16%/10%   1         5         2          15m
hpa-demo   Deployment/hpa-demo   9%/10%    1         5         2          16m
We can verify this via
> kubectl get deploy
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
hpa-demo   2/2     2            2           40m
As you can see, we have two pods while the deployment specified only one - so our HPA policy worked.
We can proceed to test downscaling by deleting the load-gen pod:
> kubectl delete pod load-generator
After a while, we should see:
NAME       REFERENCE             TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
hpa-demo   Deployment/hpa-demo   0%/10%    1         5         1          26m
This means that downscaling worked.
We can verify that using:
> kubectl get deploy
NAME       READY   UP-TO-DATE   AVAILABLE   AGE
hpa-demo   1/1     1            1           48m
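The wait before the replica count drops is expected: by default, HPA applies a five-minute (300-second) stabilization window to scale-down decisions to avoid flapping. If you want to tune that, the autoscaling/v2 API exposes a behavior field. The manifest below is a sketch, assuming a cluster that serves autoscaling/v2 (Kubernetes 1.23 or newer; the 1.22 cluster in this demo would need autoscaling/v2beta2 instead), and the 60-second value is an arbitrary illustration rather than a recommendation.

```yaml
# Sketch only: the 60-second window is an illustrative value.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo
  minReplicas: 1
  maxReplicas: 5
  metrics:
    # Equivalent of targetCPUUtilizationPercentage in autoscaling/v1.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 10
  behavior:
    scaleDown:
      # How long HPA looks back at earlier recommendations before
      # removing replicas; the default is 300 seconds.
      stabilizationWindowSeconds: 60
```

A shorter window releases capacity sooner but makes the replica count more sensitive to brief dips in load.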
That's basically it. We've seen how we can easily upscale and downscale based on pod resource usage.
Gain real-time cost visibility when using HPA
Increased scalability poses a challenge to cost monitoring and control in Kubernetes because autoscalers constantly adjust capacity.
CAST AI provides a free cost monitoring product you can use to get an hourly, daily, weekly, and monthly overview of your cloud cost.
Connect your cluster in one minute or less to instantly see your current costs in real time and access months of past cost data for comprehensive reporting.
Original Link: https://dev.to/castai/once-again-thank-you-for-your-response-and-if-anything-just-go-ahead-and-ask-me-looking-forward-to-hearing-back-from-you-soon-1b