This case study describes how we helped a client in the travel and hospitality industry optimize their AWS EKS (Elastic Kubernetes Service) infrastructure and cut their monthly bill by 30%. By leveraging spot instances, selecting appropriate instance types, implementing failover mechanisms, configuring Pod Disruption Budgets, enabling alerting, keeping a share of on-demand instances, and using cluster overprovisioning, we helped the client achieve these savings without compromising performance or reliability.
Our client, a prominent player in the travel and hospitality industry, relied heavily on their AWS EKS cluster to support their critical applications and services. However, they faced challenges regarding cost optimization and ensuring high availability, which prompted them to seek our expertise.
The client's primary concerns revolved around the escalating costs associated with running their AWS EKS cluster. They were also keen on implementing robust failover mechanisms and maintaining a high level of availability to minimize service disruptions. The absence of proper alerting mechanisms for spot instance terminations further intensified their concerns.
To address the challenges faced by our client, we devised a comprehensive solution encompassing the following key elements:
Spot Instances and Instance Types:
We implemented AWS spot instances, which offer significant cost savings compared to on-demand instances. By carefully selecting the appropriate instance types based on the workload requirements, we maximized cost efficiency while ensuring optimal performance.
Failover Mechanism:
We implemented a failover strategy to enhance the cluster's resilience and minimize downtime. Each workload was distributed across multiple nodes so that at least three pods were running on distinct nodes at all times. This redundancy protected against single points of failure: losing one node, for example to a spot termination, still left two pods serving traffic.
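One common way to keep replicas on distinct nodes is a pod anti-affinity rule keyed on the node hostname. The sketch below builds such a Deployment manifest as a Python dictionary and prints it as JSON, which `kubectl` accepts alongside YAML. The source does not say which mechanism was used, so the anti-affinity rule is an assumption, and the name `booking-api` and the image reference are hypothetical placeholders.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 3) -> dict:
    """Return a Deployment manifest whose pods must land on distinct nodes."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            # Hard rule: no two pods carrying this label may
                            # be scheduled onto the same node.
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "labelSelector": {"matchLabels": labels},
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical workload name and image, for illustration only.
    manifest = build_deployment("booking-api", "example.com/booking-api:1.0")
    print(json.dumps(manifest, indent=2))
```

With a hard anti-affinity rule, a replica stays pending if no eligible node is free, so the cluster needs at least as many schedulable nodes as replicas.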
Pod Disruption Budget (PDB):
To further enhance the availability and stability of the cluster, we configured Pod Disruption Budgets for all mission-critical deployments. A PDB limits how many pods of a deployment can be evicted at the same time during voluntary disruptions, such as node drains for maintenance or drains triggered by a spot instance termination notice. By enforcing PDBs, we minimized service disruptions and improved overall cluster reliability.
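As an illustration, a PDB for a three-replica deployment could require that at least two pods stay available. The sketch below generates such a manifest. The `minAvailable` value of 2 and the `app` label are assumptions made for the example, not figures from the engagement.

```python
import json


def build_pdb(app: str, min_available: int = 2) -> dict:
    """Return a PodDisruptionBudget covering pods labelled app=<app>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},
        "spec": {
            # Evictions are refused if they would leave fewer than this
            # many matching pods available.
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app}},
        },
    }


if __name__ == "__main__":
    # "booking-api" is a hypothetical deployment name.
    print(json.dumps(build_pdb("booking-api"), indent=2))
```

A PDB only governs evictions that go through the Kubernetes eviction API. It helps during a spot interruption only if something drains the node when the termination notice arrives.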
Alerting:
We integrated alerting that sent real-time notifications to Slack channels whenever a spot instance termination event occurred. These alerts let the operations team track how frequently each instance type was being terminated and act promptly, for example by switching to instance types with a better availability history.
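The alerting flow can be sketched in two parts: turning an interruption event into a Slack message, and tallying past interruptions per instance type. The sketch assumes the event arrives in the Amazon EventBridge "EC2 Spot Instance Interruption Warning" format. The shape of the `history` records is hypothetical, and the HTTP POST to the Slack webhook is left out.

```python
import json
from collections import Counter


def slack_payload(event: dict) -> dict:
    """Turn an EventBridge spot interruption warning into a Slack webhook payload."""
    detail = event["detail"]
    text = (
        f"Spot interruption warning: instance {detail['instance-id']} "
        f"in {event['region']} will {detail['instance-action']} "
        f"(notice sent {event['time']})"
    )
    return {"text": text}


def terminations_by_type(history: list[dict]) -> Counter:
    """Count past interruptions per instance type (hypothetical record shape)."""
    return Counter(record["instance_type"] for record in history)


if __name__ == "__main__":
    sample_event = {
        "detail-type": "EC2 Spot Instance Interruption Warning",
        "source": "aws.ec2",
        "time": "2024-01-01T12:00:00Z",
        "region": "us-east-1",
        "detail": {"instance-id": "i-1234567890abcdef0", "instance-action": "terminate"},
    }
    print(json.dumps(slack_payload(sample_event)))

    # Illustrative records only; real history would come from stored alerts.
    history = [
        {"instance_type": "m5.large"},
        {"instance_type": "c5.xlarge"},
        {"instance_type": "m5.large"},
    ]
    print(terminations_by_type(history).most_common())
```

Instance types at the top of the tally are candidates for replacement by types with fewer recorded interruptions.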
On-Demand Instances:
While spot instances offer substantial cost savings, they come with the risk of sudden termination. To mitigate this risk and keep critical services running without interruption, we kept at least 30% of the cluster's instances on-demand. This hybrid approach provided a safety net of guaranteed capacity to absorb workload spikes and spot instance interruptions.
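To make the 30% floor concrete, the sketch below splits a node count into on-demand and spot shares, rounding the on-demand share up so that it never falls below the floor. It also shows how the same ratio could be expressed if the node group is backed by an EC2 Auto Scaling group with a mixed instances policy. That backing is an assumption, since the source does not say how the split was configured.

```python
def split_capacity(total_nodes: int, on_demand_percent: int = 30) -> tuple[int, int]:
    """Return (on_demand, spot) node counts, rounding the on-demand share up."""
    if total_nodes < 0 or not 0 <= on_demand_percent <= 100:
        raise ValueError("invalid capacity or percentage")
    # Integer ceiling division avoids floating-point rounding surprises.
    on_demand = (total_nodes * on_demand_percent + 99) // 100
    return on_demand, total_nodes - on_demand


# Assumption: an Auto Scaling group with a mixed instances policy; these are
# the field names of its InstancesDistribution setting.
INSTANCES_DISTRIBUTION = {
    "OnDemandBaseCapacity": 0,
    "OnDemandPercentageAboveBaseCapacity": 30,
    "SpotAllocationStrategy": "capacity-optimized",
}

if __name__ == "__main__":
    for nodes in (7, 10, 20):
        on_demand, spot = split_capacity(nodes)
        print(f"{nodes} nodes -> {on_demand} on-demand, {spot} spot")
```

For a 10-node cluster this gives 3 on-demand and 7 spot nodes. Rounding up means small clusters end up slightly above 30% on-demand.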
Cluster Overprovisioning:
To further soften the impact of spot instance terminations, we implemented the cluster overprovisioner. Configured with a suitable priority class in Kubernetes, this tool sets up a placeholder deployment with a configurable capacity that reserves a pool of CPU and memory. When mission-critical pods go down and need to be rescheduled, the lower-priority placeholder pods are evicted and their CPU and memory become available to the critical pods.
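The overprovisioning pattern can be sketched as two manifests: a PriorityClass with a negative value and a Deployment of pause containers that requests the CPU and memory to hold in reserve. The priority value of -10, the resource names, and the pool size below are illustrative assumptions rather than the client's actual settings.

```python
import json


def build_overprovisioner(replicas: int, cpu: str, memory: str) -> dict:
    """Return a List manifest with a low PriorityClass and placeholder pods."""
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "overprovisioning"},
        # Pods without a priority class default to priority 0, so any
        # regular workload outranks and can preempt these placeholders.
        "value": -10,
        "globalDefault": False,
        "description": "Placeholder pods that real workloads may preempt.",
    }
    labels = {"app": "overprovisioner"}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "overprovisioner"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": "overprovisioning",
                    "containers": [
                        {
                            "name": "pause",
                            # The pause container does nothing; only its
                            # resource requests matter to the scheduler.
                            "image": "registry.k8s.io/pause:3.9",
                            "resources": {"requests": {"cpu": cpu, "memory": memory}},
                        }
                    ],
                },
            },
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [priority_class, deployment]}


if __name__ == "__main__":
    # Reserve 2 x (1 CPU, 2Gi) as an illustrative pool size.
    print(json.dumps(build_overprovisioner(2, "1", "2Gi"), indent=2))
```

The reserved pool is `replicas` times the per-pod request, so sizing it is a trade-off between the cost of idle capacity and how quickly critical pods can be rescheduled.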
Results and Benefits:
Through the implementation of our solution, our client experienced several significant benefits:
Cost Savings:
By migrating to spot instances, selecting appropriate instance types, and implementing cost-effective failover mechanisms, our client achieved a remarkable 30% reduction in their monthly AWS EKS bill. This cost optimization allowed them to allocate resources to other areas of their business, fostering growth and innovation.
Enhanced Availability and Reliability:
The failover design, along with the implementation of Pod Disruption Budgets, contributed to improved availability and resilience. The three-pod distribution across different nodes minimized the risk of service disruptions, ensuring a seamless experience for their customers.
Proactive Spot Instance Termination Handling:
The alerting mechanism integrated with Slack enabled the operations team to respond promptly to spot instance terminations. This proactive approach minimized downtime and maintained uninterrupted service availability.
Hybrid Capacity and Overprovisioning:
The hybrid approach of running on-demand instances alongside spot instances provided the capacity needed to handle workload spikes and mitigated the risks associated with spot instance interruptions. The cluster overprovisioner complemented this by keeping a reserved pool of CPU and memory ready, so that mission-critical pods could be rescheduled quickly after an interruption.
Conclusion:
Through a combination of spot instances, instance type selection, failover design, Pod Disruption Budgets, alerting, a share of on-demand instances, and cluster overprovisioning, we helped our travel and hospitality client achieve a 30% reduction in their monthly AWS EKS bill. The improved availability and reliability further strengthened their infrastructure, letting them focus on their core business while benefiting from the cost savings.