Deploy Elastic Cloud on Kubernetes (ECK) on Google Cloud Platform (GKE) with Terraform – Part I
This will be a very technical post, but I think it will also be quite interesting if you work with cloud technologies.
Elasticsearch is a great technology, widely used for big data, analytics and search. However, it is heavy and a little difficult to deploy and keep in a healthy state.
I work a lot with Google Cloud Platform (GCP), which is why I decided to cover that part as well.
First things first
If you don't have a GCP account, it is pretty straightforward to get one, even with some free usage: Google will give you 300 dollars to spend... after registering with your credit card 😉 Go ahead and do it: https://console.cloud.google.com
Also download the gcloud CLI: https://cloud.google.com/sdk/docs/install
We will be using a project called GKE Terraform project.
Log in to gcloud from the CLI and complete the browser steps to authenticate:
$ gcloud auth login
Then set the CLI to work with your project:
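The exact command depends on your project ID; as a sketch, assuming a hypothetical project ID of gke-terraform-project (replace it with yours, gcloud projects list shows the IDs available to your account):

$ gcloud projects list
$ gcloud config set project gke-terraform-project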
Let's also create an empty VPC to simulate an environment that already has other things deployed on it, such as existing instances.
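Since vpc.tf (shown later) peers the GKE VPC with a network called vms-vpc-network, a minimal sketch of creating that pre-existing network and a subnet with gcloud could look like the following; the subnet name and range are just assumptions to simulate the old environment:

$ gcloud compute networks create vms-vpc-network --subnet-mode=custom
$ gcloud compute networks subnets create vms-subnet --network=vms-vpc-network --region=us-east1 --range=192.168.0.0/24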
Well, at this point we have the very basic infrastructure to start using Terraform.
Infrastructure as Code, what does that mean?
Terraform is the leading tool to deploy infrastructure this way: you can define a very complex set of infrastructure as code, treating its pieces like objects and variables.
The GKE Terraform project is available here:
https://github.com/calvarado2004/terraform-gke
Please note that the size of the nodes is huge; you can go ahead and delete some of those node pools and customize the CPUs and memory according to your needs and budget, as I will do, of course. You can check another branch with smaller nodes here: https://github.com/calvarado2004/terraform-gke/tree/resize-to-small
ECK can be deployed on a single node, but the minimal enterprise configuration should have:
- One Kibana node
- One Coordinator node
- One Master node
- Two Data nodes
This deployment creates a separate node pool for each node type, in order to enable autoscaling later in the infrastructure's lifecycle. That should give you an idea of the complexity you can handle easily with Terraform.
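For example, if you later want to let the data pool grow beyond the limits defined in gke.tf, you could raise them with gcloud; the cluster, pool and zone names below are the defaults from the Terraform variables, and keep in mind that changes made outside Terraform will show up as drift on the next plan, so editing the autoscaling block in gke.tf is usually preferable:

$ gcloud container clusters update gke-cluster --node-pool data-pool --enable-autoscaling --min-nodes 2 --max-nodes 6 --zone us-east1-b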
Kubernetes has two internal layers of networking. We will be using the following three CIDRs:
- 170.35.0.0/24 for our GCP VPC, the outermost layer.
- 10.99.240.0/20 for our Kubernetes services.
- 10.96.0.0/14 for our Kubernetes Pods.
If you are on Ubuntu, you can install Terraform like this:
$ curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
$ sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
$ sudo apt-get update && sudo apt-get install terraform
Otherwise, check how to install it on your machine:
https://www.terraform.io/downloads.html
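Either way, you can confirm the binary is installed and on your PATH:

$ terraform version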
This is the content of the file gke.tf:
variable "gke_username" { default = "" description = "gke username" } variable "gke_password" { default = "" description = "gke password" } variable "cluster_name" { default = "gke-cluster" description = "cluster name" } variable "zone" { default = "us-east1-b" description = "cluster zone" } #Your pods will have an IP address from this CIDR variable "cluster_ipv4_cidr" { default = "10.96.0.0/14" description = "internal cidr for pods" } #Your Kubernetes services will have an IP from this range variable "services_ipv4_cidr_block" { default = "10.99.240.0/20" description = "nternal range for the kubernetes services" } # GKE cluster resource "google_container_cluster" "primary" { name = var.cluster_name location = var.zone remove_default_node_pool = true initial_node_count = 1 network = google_compute_network.vpc-gke.name subnetwork = google_compute_subnetwork.subnet.name cluster_ipv4_cidr = var.cluster_ipv4_cidr services_ipv4_cidr_block = var.services_ipv4_cidr_block min_master_version = "1.17.13-gke.2001" master_auth { username = var.gke_username password = var.gke_password client_certificate_config { issue_client_certificate = false } } cluster_autoscaling { enabled = false } } # Separately Managed Master Pool resource "google_container_node_pool" "master-pool" { name = "master-pool" location = var.zone cluster = google_container_cluster.primary.name node_count = 1 autoscaling { min_node_count = 1 max_node_count = 2 } management { auto_repair = true auto_upgrade = false } node_config { oauth_scopes = [ "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring", "https://www.googleapis.com/auth/devstorage.read_only", ] labels = { es_type = "master_nodes" } # 6 CPUs, 12GB of RAM preemptible = false image_type = "ubuntu_containerd" machine_type = "custom-6-12288" local_ssd_count = 0 disk_size_gb = 50 disk_type = "pd-standard" tags = ["gke-node", "${var.cluster_name}-master"] metadata = { disable-legacy-endpoints = "true" } } } # Separately Managed Data Pool resource "google_container_node_pool" "data-pool" { name = "data-pool" location = var.zone cluster = google_container_cluster.primary.name node_count = 2 autoscaling { min_node_count = 2 max_node_count = 4 } management { auto_repair = true auto_upgrade = false } node_config { oauth_scopes = [ "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring", "https://www.googleapis.com/auth/devstorage.read_only", ] labels = { es_type = "data_nodes" } # 14 CPUs, 41GB of RAM preemptible = false image_type = "ubuntu_containerd" machine_type = "custom-14-41984" local_ssd_count = 0 disk_size_gb = 50 disk_type = "pd-standard" tags = ["gke-node", "${var.cluster_name}-data"] metadata = { disable-legacy-endpoints = "true" } } } # Separately Managed Coordinator Pool resource "google_container_node_pool" "coord-pool" { name = "coord-pool" location = var.zone cluster = google_container_cluster.primary.name node_count = 1 autoscaling { min_node_count = 1 max_node_count = 2 } management { auto_repair = true auto_upgrade = false } node_config { oauth_scopes = [ "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring", "https://www.googleapis.com/auth/devstorage.read_only", ] labels = { es_type = "coordinator_nodes" } # 6 CPUs, 22GB of RAM preemptible = false image_type = "ubuntu_containerd" machine_type = "custom-6-22528" local_ssd_count = 0 disk_size_gb = 50 disk_type = "pd-standard" tags = ["gke-node", "${var.cluster_name}-coord"] metadata = { 
disable-legacy-endpoints = "true" } } } # Separately Managed Kibana Pool resource "google_container_node_pool" "kibana-pool" { name = "kibana-pool" location = var.zone cluster = google_container_cluster.primary.name node_count = 1 autoscaling { min_node_count = 1 max_node_count = 2 } management { auto_repair = true auto_upgrade = false } node_config { oauth_scopes = [ "https://www.googleapis.com/auth/logging.write", "https://www.googleapis.com/auth/monitoring", "https://www.googleapis.com/auth/devstorage.read_only", ] labels = { es_type = "kibana_nodes" } # 4 CPUs, 13GB of RAM preemptible = false image_type = "ubuntu_containerd" machine_type = "custom-4-13312" local_ssd_count = 0 disk_size_gb = 50 disk_type = "pd-standard" tags = ["gke-node", "${var.cluster_name}-kibana"] metadata = { disable-legacy-endpoints = "true" } } } output "kubernetes_cluster_name" { value = google_container_cluster.primary.name description = "GKE Cluster Name" }
And the content of the file vpc.tf
variable "project_id" { description = "project id" } variable "region" { description = "region" } provider "google" { project = var.project_id region = var.region } # VPC resource "google_compute_network" "vpc-gke" { name = "${var.cluster_name}-vpc" auto_create_subnetworks = "false" } # Subnet resource "google_compute_subnetwork" "subnet" { name = "${var.cluster_name}-subnet" region = var.region network = google_compute_network.vpc-gke.name ip_cidr_range = "170.35.0.0/24" } #Peering between OLD VMs vpc and GKE K8s vpc resource "google_compute_network_peering" "to-vms-vpc" { name = "to-vms-vpc-vpc-network" network = google_compute_network.vpc-gke.id peer_network = "projects/sigma-scheduler-297405/global/networks/vms-vpc-network" } resource "google_compute_network_peering" "to-gke-cluster" { name = "to-gke-cluster-vpc-network" network = "projects/sigma-scheduler-297405/global/networks/vms-vpc-network" peer_network = google_compute_network.vpc-gke.id } output "region" { value = var.region description = "region" } #Enable communication from GKE pods to external instances, networks and services outside the Cluster. resource "google_compute_firewall" "gke-cluster-to-all-vms-on-network" { name = "gke-cluster-k8s-to-all-vms-on-network" network = google_compute_network.vpc-portal.id allow { protocol = "tcp" } allow { protocol = "udp" } allow { protocol = "icmp" } allow { protocol = "esp" } allow { protocol = "ah" } allow { protocol = "sctp" } source_ranges = ["10.96.0.0/14"] }
Let's deploy this GKE Cluster with Terraform!
Deploying the whole cluster is quite easy:
$ git clone https://github.com/calvarado2004/terraform-gke.git
$ cd terraform-gke
$ git checkout resize-to-small
Switched to branch 'resize-to-small'
Your branch is up to date with 'origin/resize-to-small'.

$ terraform init

Initializing the backend...

Initializing provider plugins...
- Finding latest version of hashicorp/google...
- Installing hashicorp/google v3.49.0...
- Installed hashicorp/google v3.49.0 (signed by HashiCorp)

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, we recommend adding version constraints in a required_providers block
in your configuration, with the constraint strings suggested below.

* hashicorp/google: version = "~> 3.49.0"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

$ terraform plan -out=gke-cluster.plan

$ terraform apply "gke-cluster.plan"
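Once the apply finishes, you can point kubectl at the new cluster and check that each pool came up with its es_type label (the cluster name and zone below are the defaults from gke.tf):

$ gcloud container clusters get-credentials gke-cluster --zone us-east1-b
$ kubectl get nodes -L es_type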