MORU


Date: August 25th, 2024
Client: E-commerce
Services: GitHub Actions CI/CD, AWS EKS, Helm Charts

Building a Production-Ready EKS Cluster with Terraform

Introduction

Amazon Elastic Kubernetes Service (EKS) has become a go-to solution for organizations that run containerized applications at scale. AWS manages the Kubernetes control plane, but a production-ready EKS environment needs much more than that: networking, IAM, storage drivers, ingress, and a container registry all have to be configured carefully.

In this article, I’ll walk through a complete Terraform implementation that automates the deployment of a production-grade EKS environment. This implementation is available in my GitHub repository: https://github.com/prasad-moru/AWS_EKS_TF.

Project Objectives

The main goals of this implementation are:

  1. Infrastructure as Code: Manage the entire AWS EKS infrastructure using Terraform
  2. Multi-Environment Support: Enable deployment across development, staging, and production environments
  3. Modularity: Create reusable Terraform modules for key components
  4. Security: Implement AWS security best practices including least privilege IAM roles
  5. Networking: Configure proper VPC segmentation with public and private subnets
  6. Add-ons: Include essential components like EBS CSI Driver and ALB Ingress Controller
  7. Container Registry: Set up ECR repositories with appropriate lifecycle policies

Architecture Overview

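The architecture follows AWS best practices and is built from the following components, each covered in detail below: a VPC with public and private subnets, an EKS control plane with managed worker nodes in the private subnets, dedicated IAM roles and an OIDC provider, the EBS CSI Driver and ALB Ingress Controller add-ons, and ECR repositories for container images.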

Project Structure

The repository is organized with a modular approach:

.
├── .gitignore
├── README.md
├── backend.tf              # Terraform state configuration
├── main.tf                 # Main entry point
├── outputs.tf              # Output definitions
├── providers.tf            # Provider configurations
├── variables.tf            # Variable definitions
├── modules/                # Reusable modules
│   ├── vpc/                # VPC configuration
│   ├── eks/                # EKS cluster configuration
│   ├── ebs-csi/            # EBS CSI Driver configuration
│   ├── alb-ingress/        # ALB Ingress Controller
│   └── ecr/                # ECR repositories
├── environments/           # Environment-specific configurations
│   ├── development/
│   ├── staging/
│   └── production/
└── kubernetes-workloads/   # Sample Kubernetes manifests

Key Components

VPC Configuration

The VPC module creates a network foundation with proper segmentation:

module "vpc" {
  source = "./modules/vpc"

  name             = local.name
  cidr             = var.vpc_cidr
  azs              = local.azs
  subnet_cidr_bits = var.subnet_cidr_bits
  cluster_name     = local.cluster_name
  tags             = local.vpc_tags
}

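Key features, as reflected in the module inputs: public and private subnets spread across the availability zones in local.azs, subnet ranges carved out of var.vpc_cidr according to var.subnet_cidr_bits, and the cluster name passed in so the subnets can carry Kubernetes discovery tags.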
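The sketch below makes the subnet layout concrete. It shows one way a VPC module like this could derive subnet ranges and discovery tags, and it rests on several assumptions. subnet_cidr_bits is treated as the newbits argument of Terraform's cidrsubnet() function. Public subnets are numbered before private ones. The AZ list and cluster name are hypothetical. The repository's module may do this differently.

```hcl
variable "vpc_cidr" {
  type    = string
  default = "10.0.0.0/16"
}

variable "subnet_cidr_bits" {
  # Assumption: used as the "newbits" argument of cidrsubnet()
  type    = number
  default = 8
}

variable "cluster_name" {
  type    = string
  default = "demo-eks" # hypothetical cluster name
}

locals {
  azs = ["us-east-1a", "us-east-1b", "us-east-1c"] # hypothetical AZ list

  # Public subnets use netnums 0..2: 10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24
  public_subnet_cidrs = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, var.subnet_cidr_bits, i)]

  # Private subnets use netnums 3..5: 10.0.3.0/24, 10.0.4.0/24, 10.0.5.0/24
  private_subnet_cidrs = [for i, az in local.azs : cidrsubnet(var.vpc_cidr, var.subnet_cidr_bits, i + length(local.azs))]

  # Tags the AWS Load Balancer Controller uses to pick subnets
  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1" # internet-facing load balancers
  }

  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1" # internal load balancers
  }
}

output "subnet_plan" {
  value = {
    public  = local.public_subnet_cidrs
    private = local.private_subnet_cidrs
  }
}
```

With the defaults shown, the public subnets come out as 10.0.0.0/24 through 10.0.2.0/24 and the private subnets as 10.0.3.0/24 through 10.0.5.0/24.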

EKS Cluster

The EKS module provisions a managed Kubernetes control plane:

module "eks" {
  source = "./modules/eks"
  
  cluster_name           = local.cluster_name
  cluster_version        = var.eks_version
  vpc_id                 = module.vpc.vpc_id
  private_subnet_ids     = module.vpc.private_subnet_ids
  public_subnet_ids      = module.vpc.public_subnet_ids
  cluster_role_arn       = aws_iam_role.cluster.arn
  node_security_group_id = aws_security_group.nodes.id
  enable_core_addons     = false
  tags                   = local.eks_tags
}
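The article does not show the eks module's internals. A module with these inputs could wrap an aws_eks_cluster resource roughly like the sketch below. The admin_cidrs variable and the chosen control plane log types are illustrative assumptions. Turning on the private endpoint keeps node-to-API-server traffic inside the VPC. Restricting the public endpoint to known CIDRs limits who can reach the API from outside.

```hcl
# Hypothetical variables standing in for the module's inputs
variable "cluster_name" { type = string }
variable "cluster_version" { type = string }
variable "cluster_role_arn" { type = string }
variable "private_subnet_ids" { type = list(string) }
variable "public_subnet_ids" { type = list(string) }

variable "admin_cidrs" {
  description = "Hypothetical: CIDR ranges allowed to reach the public API endpoint"
  type        = list(string)
}

resource "aws_eks_cluster" "this" {
  name     = var.cluster_name
  version  = var.cluster_version
  role_arn = var.cluster_role_arn

  vpc_config {
    subnet_ids              = concat(var.private_subnet_ids, var.public_subnet_ids)
    endpoint_private_access = true            # nodes reach the API server inside the VPC
    endpoint_public_access  = true            # kubectl from outside, limited below
    public_access_cidrs     = var.admin_cidrs # e.g. office or VPN ranges
  }

  # Ship selected control plane logs to CloudWatch Logs
  enabled_cluster_log_types = ["api", "audit", "authenticator"]
}
```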

The worker nodes run in a managed node group in the private subnets, sized within the bounds of a scaling configuration:

resource "aws_eks_node_group" "this" {
  cluster_name    = module.eks.cluster_name
  node_group_name = local.node_group_name
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = module.vpc.private_subnet_ids

  scaling_config {
    desired_size = var.eks_node_desired_size
    max_size     = var.eks_node_max_size
    min_size     = var.eks_node_min_size
  }

  # Additional configuration...
}
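Note that scaling_config only sets the bounds. Adding or removing nodes in response to pending pods requires a component such as the Cluster Autoscaler or Karpenter. When one is installed, a common pattern is to make Terraform ignore changes to desired_size, so that a later apply does not undo the autoscaler's decisions. The lines below are a hedged sketch of extra arguments that could go inside the node group resource. instance_types reuses the eks_node_instance_types variable shown under Customization Options.

```hcl
  # Sketch: extra arguments inside resource "aws_eks_node_group" "this" { ... }
  instance_types = var.eks_node_instance_types

  lifecycle {
    # Let an autoscaler own the live node count between applies
    ignore_changes = [scaling_config[0].desired_size]
  }
```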

IAM Security

The implementation follows AWS security best practices by creating dedicated IAM roles with least privilege:

resource "aws_iam_role" "cluster" {
  name = "${local.cluster_name}-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })

  # Tags and additional configuration...
}
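The trust policy above only controls who may assume the role. It grants no permissions by itself. A minimal sketch of the usual companion attachment follows, using the AWS managed AmazonEKSClusterPolicy. The resource label cluster_policy is an arbitrary choice.

```hcl
# Grant the EKS control plane the permissions it needs to manage AWS resources
resource "aws_iam_role_policy_attachment" "cluster_policy" {
  role       = aws_iam_role.cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}
```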

For the worker nodes:

resource "aws_iam_role" "node" {
  name = "${local.node_group_name}-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
        Action = "sts:AssumeRole"
      }
    ]
  })

  # Tags and additional configuration...
}
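In the same way, the node role needs permissions attached before nodes can join the cluster. The sketch below attaches the three AWS managed policies that EKS worker nodes typically use; the resource label is illustrative. AmazonEC2ContainerRegistryReadOnly is also what allows nodes to pull images from ECR repositories in the same account.

```hcl
# Attach the standard managed policies for EKS worker nodes
resource "aws_iam_role_policy_attachment" "node" {
  for_each = toset([
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",          # join the cluster
    "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",               # VPC CNI manages pod IPs
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly", # pull images from ECR
  ])

  role       = aws_iam_role.node.name
  policy_arn = each.value
}
```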

The implementation also includes OIDC provider integration, allowing Kubernetes service accounts to assume IAM roles:

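# Fetch the OIDC issuer's TLS certificate chain to derive its thumbprint
data "tls_certificate" "eks" {
  url = module.eks.cluster_identity_oidc_issuer
}

resource "aws_iam_openid_connect_provider" "eks" {
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = [data.tls_certificate.eks.certificates[0].sha1_fingerprint]
  url             = module.eks.cluster_identity_oidc_issuer

  # Tags and additional configuration...
}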
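The sketch below shows how the OIDC provider is put to use: an IAM role that exactly one Kubernetes service account can assume through IRSA (IAM Roles for Service Accounts). The shop namespace, the orders-api service account and the role name are all hypothetical. The trust policy accepts only tokens whose sub and aud claims match.

```hcl
locals {
  # Issuer host without the scheme, as used in IAM condition keys
  oidc_issuer_host = replace(aws_iam_openid_connect_provider.eks.url, "https://", "")
}

data "aws_iam_policy_document" "orders_api_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [aws_iam_openid_connect_provider.eks.arn]
    }

    # Only the hypothetical shop/orders-api service account may assume the role
    condition {
      test     = "StringEquals"
      variable = "${local.oidc_issuer_host}:sub"
      values   = ["system:serviceaccount:shop:orders-api"]
    }

    condition {
      test     = "StringEquals"
      variable = "${local.oidc_issuer_host}:aud"
      values   = ["sts.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "orders_api" {
  name               = "${local.cluster_name}-orders-api" # hypothetical role name
  assume_role_policy = data.aws_iam_policy_document.orders_api_trust.json
}
```

The service account then needs an eks.amazonaws.com/role-arn annotation that points at this role's ARN. Pods running under that service account receive temporary credentials for the role. Permission policies still have to be attached to the role separately.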

Storage with EBS CSI Driver

The EBS CSI Driver module enables Kubernetes persistent volumes:

module "ebs_csi" {
  source = "./modules/ebs-csi"
  
  cluster_name      = module.eks.cluster_name
  oidc_provider_arn = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url = aws_iam_openid_connect_provider.eks.url
}

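Its OIDC inputs let the module bind an IAM role to the driver's Kubernetes service account. The driver can then create, attach, and delete EBS volumes for PersistentVolumeClaims without relying on node-level credentials.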
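One way such a module could install the driver is as an EKS managed add-on, as sketched below. The sketch assumes an IRSA role already exists for the kube-system/ebs-csi-controller-sa service account, with the AWS managed policy arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy attached. The role's ARN comes in through the hypothetical ebs_csi_role_arn variable.

```hcl
variable "ebs_csi_role_arn" {
  description = "Hypothetical: IRSA role for kube-system/ebs-csi-controller-sa with AmazonEBSCSIDriverPolicy attached"
  type        = string
}

# Install the EBS CSI Driver as an EKS managed add-on
resource "aws_eks_addon" "ebs_csi" {
  cluster_name             = module.eks.cluster_name
  addon_name               = "aws-ebs-csi-driver"
  service_account_role_arn = var.ebs_csi_role_arn # credentials via IRSA, not the node role
}
```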

ALB Ingress Controller

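For external access to Kubernetes services, the ALB Ingress Controller (since renamed the AWS Load Balancer Controller) can be deployed. The enable_alb_ingress flag switches it on or off: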

module "alb_ingress" {
  source = "./modules/alb-ingress"
  count  = var.enable_alb_ingress ? 1 : 0
  
  cluster_name      = module.eks.cluster_name
  vpc_id            = module.vpc.vpc_id
  oidc_provider_arn = aws_iam_openid_connect_provider.eks.arn
  oidc_provider_url = aws_iam_openid_connect_provider.eks.url
}

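The module receives the VPC ID and the OIDC provider details. The controller needs both: the VPC ID to create load balancers and target groups in the VPC, and the OIDC details to obtain AWS permissions through an IAM role bound to its service account.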
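The controller is distributed as a Helm chart in the aws/eks-charts repository. Below is a minimal sketch of installing it with the Terraform Helm provider, assuming a provider already configured against the cluster. It uses the set block syntax of the 2.x Helm provider; version 3.x turned set into a list attribute. alb_controller_role_arn is a hypothetical variable holding an IRSA role that already carries the controller's IAM policy.

```hcl
variable "alb_controller_role_arn" {
  description = "Hypothetical: IRSA role for kube-system/aws-load-balancer-controller"
  type        = string
}

resource "helm_release" "aws_load_balancer_controller" {
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  namespace  = "kube-system"

  set {
    name  = "clusterName" # required by the chart
    value = module.eks.cluster_name
  }

  set {
    name  = "vpcId"
    value = module.vpc.vpc_id
  }

  set {
    name  = "serviceAccount.name"
    value = "aws-load-balancer-controller"
  }

  # Escaped dots keep "eks.amazonaws.com/role-arn" as a single annotation key
  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = var.alb_controller_role_arn
  }
}
```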

Container Registry (ECR)

To store container images, ECR repositories are created:

module "ecr" {
  source = "./modules/ecr"
  
  repository_names        = var.ecr_repository_names
  image_tag_mutability    = var.ecr_image_tag_mutability
  scan_on_push            = var.ecr_scan_on_push
  enable_lifecycle_policy = var.ecr_enable_lifecycle_policy
  max_image_count         = var.ecr_max_image_count
  node_role_arn           = aws_iam_role.node.arn
  
  # Tags and additional configuration...
}

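Key features, as reflected in the module inputs: configurable tag mutability, vulnerability scanning on push, an optional lifecycle policy that keeps at most max_image_count images per repository, and the node role ARN, so that worker nodes can be granted permission to pull images.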
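The sketch below shows what an ECR module with these inputs might create: one repository per name, scanning on push, and a lifecycle rule that expires images once a repository exceeds a set count. The repository names and the default count are hypothetical. Tag mutability is hardcoded here for brevity, whereas the module takes it as a variable.

```hcl
variable "repository_names" {
  type    = list(string)
  default = ["storefront", "checkout"] # hypothetical repository names
}

variable "max_image_count" {
  type    = number
  default = 30 # hypothetical retention limit
}

resource "aws_ecr_repository" "this" {
  for_each             = toset(var.repository_names)
  name                 = each.value
  image_tag_mutability = "IMMUTABLE" # a pushed tag can never be overwritten

  image_scanning_configuration {
    scan_on_push = true
  }
}

resource "aws_ecr_lifecycle_policy" "this" {
  for_each   = aws_ecr_repository.this
  repository = each.value.name

  # Expire the oldest images once a repository holds more than max_image_count
  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Keep only the most recent ${var.max_image_count} images"
      selection = {
        tagStatus   = "any"
        countType   = "imageCountMoreThan"
        countNumber = var.max_image_count
      }
      action = { type = "expire" }
    }]
  })
}
```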

Multi-Environment Support

The repository supports multiple environments (development, staging, production) through environment-specific directories:

environments/
├── development/
├── staging/
└── production/

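Each environment has its own configuration values and its own state file. The backend key under State Management, environments/development/terraform.tfstate, is the development one.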

This approach enables consistent infrastructure deployment across environments while allowing for environment-specific customizations.
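For example, an environment's variable file could override only the values that differ between environments. The file path and every value below are hypothetical. The variable names match the ones used elsewhere in this article.

```hcl
# environments/production/terraform.tfvars (hypothetical path and values)
environment = "production"
vpc_cidr    = "10.30.0.0/16"
eks_version = "1.30"

# Larger, more redundant node group than in development
eks_node_instance_types = ["m5.large"]
eks_node_min_size       = 3
eks_node_desired_size   = 3
eks_node_max_size       = 6

ecr_scan_on_push = true
```

It could then be selected with terraform plan -var-file=environments/production/terraform.tfvars.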

State Management

The project uses an S3 backend with DynamoDB locking for state management:

terraform {
  backend "s3" {
    bucket         = "aws-eks-tt-automation"
    key            = "environments/development/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}

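This provides shared remote state instead of a local file, encryption of the state file at rest (encrypt = true), and a lock in the DynamoDB table that prevents two concurrent terraform apply runs from modifying the same state.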
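Backend blocks cannot reference variables, and the block above hardcodes the development state key. Terraform's partial backend configuration is one way to reuse the same code for other environments. The sketch below assumes that approach; the per-environment backend.hcl file is hypothetical.

```hcl
# backend.tf: leave the S3 settings empty so each environment supplies them
terraform {
  backend "s3" {}
}

# environments/production/backend.hcl (hypothetical file, one per environment)
bucket         = "aws-eks-tt-automation"
key            = "environments/production/terraform.tfstate" # separate state per environment
region         = "us-east-1"
encrypt        = true
dynamodb_table = "terraform-state-lock"
```

The environment is then chosen at init time with terraform init -backend-config=environments/production/backend.hcl. Add -reconfigure when switching an existing working directory to a different environment.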

Deployment Process

To deploy the infrastructure:

  1. Clone the repository:

     git clone https://github.com/prasad-moru/AWS_EKS_TF.git
     cd AWS_EKS_TF

  2. Create an S3 bucket and DynamoDB table for state management. S3 bucket names are globally unique, so pick your own bucket name and update backend.tf to match:

     aws s3api create-bucket --bucket aws-eks-tt-automation --region us-east-1
     aws dynamodb create-table \
       --table-name terraform-state-lock \
       --attribute-definitions AttributeName=LockID,AttributeType=S \
       --key-schema AttributeName=LockID,KeyType=HASH \
       --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
       --region us-east-1

  3. Create a terraform.tfvars file with your configuration. This file holds credentials, so keep it out of version control:

     aws_access_key = "your-access-key"
     aws_secret_key = "your-secret-key"
     region         = "us-east-1"
     project        = "YourProject"
     environment    = "dev"

  4. Initialize, plan, and apply:

     terraform init
     terraform plan
     terraform apply

  5. Configure kubectl to connect to your cluster:

     aws eks update-kubeconfig --name <cluster-name> --region <region>
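To avoid looking up the cluster name by hand for step 5, the configuration could expose the exact command as an output. This is a sketch that assumes the region variable set in terraform.tfvars and the eks module's cluster_name output; the output name is hypothetical.

```hcl
# Prints a ready-to-run kubeconfig command after apply
output "configure_kubectl" {
  value = "aws eks update-kubeconfig --name ${module.eks.cluster_name} --region ${var.region}"
}
```

After apply, terraform output -raw configure_kubectl prints the command without quotes, ready to paste.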

Sample Kubernetes Deployments

The repository includes sample Kubernetes manifests for testing:

# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

These can be deployed with:

kubectl apply -f kubernetes-workloads/nginx-deployment.yaml
kubectl apply -f kubernetes-workloads/nginx-service.yaml
kubectl apply -f kubernetes-workloads/nginx-ingress.yaml

Security Hardening

The implementation includes several security best practices:

  1. Network Segmentation: Nodes are in private subnets with controlled access
  2. IAM Least Privilege: Roles with minimum necessary permissions
  3. Security Groups: Tightly controlled communication paths
  4. OIDC Federation: Service-account level permissions
  5. ECR Scanning: Automatic vulnerability scanning
  6. Private Endpoint Access: API server accessible within VPC
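The EKS module call references a node security group as aws_security_group.nodes, but the article does not show its definition. The sketch below shows what tightly controlled paths could look like. Nodes talk to each other freely. Only the control plane's security group may reach them, on 443 and 1025-65535. All outbound traffic is allowed. The variable names are hypothetical. The port ranges follow rules AWS has commonly recommended for node security groups, so check them against the add-ons you run.

```hcl
variable "vpc_id" { type = string }

variable "cluster_security_group_id" {
  description = "Hypothetical: security group attached to the EKS control plane"
  type        = string
}

resource "aws_security_group" "nodes" {
  name_prefix = "eks-nodes-"
  description = "EKS worker nodes"
  vpc_id      = var.vpc_id

  ingress {
    description = "Node-to-node traffic (pod networking, CoreDNS)"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    self        = true
  }

  ingress {
    description     = "Control plane to kubelet and webhooks"
    from_port       = 1025
    to_port         = 65535
    protocol        = "tcp"
    security_groups = [var.cluster_security_group_id]
  }

  ingress {
    description     = "Control plane to pods serving HTTPS (e.g. extension API servers)"
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    security_groups = [var.cluster_security_group_id]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```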

Monitoring and Management

While not fully implemented, the repository structure supports integration with monitoring tools:

monitoring/
└── grafana/
    └── dashboards/
        └── kubernetes-overview.json

This can be expanded to include CloudWatch, Prometheus, and other observability solutions.
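One possible expansion is the Prometheus community's kube-prometheus-stack chart, which bundles Prometheus, Alertmanager, and Grafana. The sketch below installs it with the Terraform Helm provider and loads the repository's kubernetes-overview.json dashboard as a ConfigMap. It assumes the chart's default Grafana sidecar settings, which import ConfigMaps labeled grafana_dashboard = "1". It also assumes the monitoring/ directory sits at the Terraform root. The release and ConfigMap names are illustrative.

```hcl
resource "helm_release" "kube_prometheus_stack" {
  name             = "kube-prometheus-stack"
  repository       = "https://prometheus-community.github.io/helm-charts"
  chart            = "kube-prometheus-stack"
  namespace        = "monitoring"
  create_namespace = true
}

resource "kubernetes_config_map_v1" "grafana_kubernetes_overview" {
  metadata {
    name      = "grafana-kubernetes-overview"
    namespace = "monitoring"
    labels = {
      # Label the chart's Grafana sidecar watches for by default
      grafana_dashboard = "1"
    }
  }

  data = {
    "kubernetes-overview.json" = file("${path.root}/monitoring/grafana/dashboards/kubernetes-overview.json")
  }

  # The monitoring namespace is created by the Helm release
  depends_on = [helm_release.kube_prometheus_stack]
}
```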

Customization Options

The implementation is highly configurable through variables:

variable "eks_node_instance_types" {
  description = "EC2 instance types for EKS node groups"
  type        = list(string)
  default     = ["t3.medium"]
}

variable "eks_node_desired_size" {
  description = "Desired number of worker nodes"
  type        = number
  default     = 2
}

This allows for easy customization without modifying the core code.
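Variables like these can also reject bad values at plan time through validation blocks. The sketch below covers the node-count bounds that the node group references. The limits and defaults are hypothetical guardrails.

```hcl
variable "eks_node_min_size" {
  description = "Minimum number of worker nodes"
  type        = number
  default     = 1

  validation {
    condition     = var.eks_node_min_size >= 1
    error_message = "eks_node_min_size must be at least 1."
  }
}

variable "eks_node_max_size" {
  description = "Maximum number of worker nodes"
  type        = number
  default     = 4

  validation {
    # Hypothetical cost guardrail
    condition     = var.eks_node_max_size <= 20
    error_message = "eks_node_max_size must not exceed 20."
  }
}
```

Before Terraform 1.9, a validation rule may only reference the variable it validates. From 1.9 onward, a rule can also compare against other variables, for example to enforce min_size <= desired_size <= max_size.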

Conclusion

This EKS Terraform implementation provides a solid foundation for running containerized workloads on AWS. It follows best practices for security, scalability, and maintainability, making it suitable for both development and production use.

Key takeaways:

  1. Infrastructure as Code enables consistent, repeatable deployments
  2. Modular design allows for reusability and clear separation of concerns
  3. Multi-environment support facilitates the development lifecycle
  4. Security is implemented at multiple layers
  5. Essential add-ons (EBS CSI Driver and ALB Ingress Controller) are included for production readiness

For more details and to contribute, visit the GitHub repository: https://github.com/prasad-moru/AWS_EKS_TF.

