Deploy a Hybrid EKS Kubernetes Cluster: Use On-Premises nodes on Amazon EKS with Cilium CNI

Alberto Crescini
Dec 10, 2024


Kubernetes has become the leading platform to orchestrate containerized workloads, enabling companies and individuals to achieve scalability, redundancy, and consistency of data across their applications. Using a declarative language, Kubernetes enables users to define a desired state of their applications—such as the number of replicas, network configurations and service discovery—and ensures this state is continuously maintained. The area of a Kubernetes cluster responsible for this is the control plane, more specifically the Kubernetes Controller Manager. This component makes high-level decisions and governs the cluster, ensuring that the desired state is achieved. Meanwhile, the data plane executes these decisions by running the actual workloads on worker nodes. As the control plane plays such a critical role in maintaining the desired state of a Kubernetes cluster, it would be a good practice to ensure it is backed by a redundant and replicated infrastructure. The control plane should also be able to handle increased traffic without being overwhelmed, maintaining a consistent level of performance.

Most Kubernetes clusters run on the cloud, and for a good reason. Managed services like Amazon Elastic Kubernetes Service (EKS) provide you with a fully managed control plane, removing the operational complexity of managing your own control plane infrastructure. EKS also offers seamless integration with AWS resources and infrastructure, and it simplifies permission management through features like EKS Roles or IAM Roles for Service Accounts (IRSA), making it a solid choice for hosting your control plane.

But what if you want to extend beyond the cloud? Perhaps you need low-latency processing, have data sovereignty requirements, or need to comply with regulatory standards. Or, like me, you might have idle on-premises hardware — such as servers in a Proxmox home lab — that could host Kubernetes workloads. Alternatively, you may want to save costs by avoiding cloud-based worker nodes for certain workloads. In such cases, using a hybrid approach by combining the flexibility of EKS with on-premises hardware can provide the best of both worlds.

During the latest re:Invent 2024 conference in Las Vegas, AWS introduced a new feature called EKS Hybrid Nodes, which is something I’ve personally been eagerly awaiting. This feature opens up new possibilities for integrating on-premises resources directly with the EKS control plane, enabling you to maintain the benefits of a managed Kubernetes experience while utilizing your existing infrastructure.

In this guide, we will explore how to create this hybrid EKS setup, diving deep into key considerations such as the operating system, networking, security, and costs.

Network configuration

For this guide, I will assume that your AWS account already has a VPC with the appropriate routing table configurations to ensure that every instance you create in the VPC is accessible from your local network. I won’t dive into the specifics of setting up this connectivity for now, as the implementation can vary depending on your network topology. The simplest approaches include setting up a Site-to-Site VPN or configuring an EC2 instance as a gateway to handle VPN traffic. AWS provides detailed guides on these setups, which you can find on their official blog. Just make sure that the IP ranges of your remote and local networks do not overlap.

In a future article, I’ll walk you through how I configured my hybrid networking setup using an EC2 host and Tailscale to create a VPN over NAT-ted networks. For a comprehensive list of the required ports and protocols, refer to the AWS documentation.

If you encounter issues, consider using the VPC Reachability Analyzer to troubleshoot connectivity problems. This tool provides a detailed view of potential routing or permission misconfigurations, helping to identify and resolve issues quickly.

I’ve included a sample diagram below representing the current network scenario. This diagram offers an overview of the hybrid architecture, demonstrating how on-premises nodes seamlessly integrate with an EKS control plane hosted in AWS. Let’s break down the components and networking details to better understand the setup.

A high-level diagram of the network setup

On-Premises Networking

The on-premises infrastructure consists of Kubernetes worker nodes hosted within a private subnet. These nodes can be either physical servers or virtual machines, like the ones you might run in a Proxmox home lab. In this setup, the local nodes need a secure way to communicate with the EKS control plane and other AWS services in the cloud. This is achieved through a dedicated VPN client.

In my setup, the VPN client is another virtual machine, hosted on the same hypervisor, that establishes and maintains a secure connection with the VPN instance in the AWS Hybrid VPC (as shown in the diagram). Here’s how it works:

Packet Forwarding: The VPN client VM is configured to perform IPv4 forwarding, which enables it to route packets from other devices in the local network (e.g., the worker nodes) to the AWS VPC. This ensures that even devices without direct VPN connectivity can access the AWS environment through the VPN client.

Routing to AWS VPC: The VPN client acts as a bridge between the on-premises private subnet and the AWS Hybrid VPC. When a local worker node attempts to send a packet to an IP address within the AWS VPC’s CIDR range, the VPN client forwards it securely through the VPN tunnel.

Static Route Configuration

To enable any device in my network to reach remote AWS resources deployed in the VPC, I had to configure a static route on my local network router. This static route tells the router that any traffic destined for the AWS VPC CIDR range — which is different from the local subnet — must be routed through the VPN client instance.

In my scenario, the AWS VPC uses the CIDR 10.0.0.0/16, whereas my on-premises network uses 192.168.0.0/16. For this reason, I have configured my router to forward all 10.0.x.x traffic to the IP address of the VPN client instance.

A screenshot of my local router’s static routes page

If you prefer not to configure a static route on your local router, you can instead create a local route table directly on each worker node. By adding a specific route to the AWS VPC CIDR range on the node itself, you can ensure that only the necessary instances are allowed to forward traffic through the VPN client, while all other local traffic remains unaffected. This approach gives you more granular, least-privileged control over routing, allowing you to restrict access to only the required destinations. However, it requires manual configuration on each node, which may become cumbersome in larger environments.
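For illustration, here is a minimal sketch of what that per-node route could look like on a Linux worker, assuming a hypothetical VPN client VM at 192.168.0.10 and the 10.0.0.0/16 AWS VPC CIDR described above:

# Route traffic for the AWS VPC CIDR via the VPN client VM (assumed address 192.168.0.10)
sudo ip route add 10.0.0.0/16 via 192.168.0.10

# Verify the route is in place
ip route show 10.0.0.0/16

Keep in mind that routes added with ip route do not survive a reboot; to make them permanent, persist them through your distribution's network configuration (for example a netplan or systemd-networkd file).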

AWS Networking

The EKS Control Plane operates within a dedicated section of the AWS network that is managed entirely by AWS, which remains invisible in your account. However, when you create your EKS cluster, you must associate it with a VPC from your account. This associated VPC will house the resources such as your EC2 worker nodes, load balancers, and storage volumes (EBS, EFS). To facilitate communication between the AWS-managed EKS control plane VPC and your associated VPC, a VPC Endpoint is used, ensuring secure and seamless connectivity.

In my case, the associated VPC has a CIDR block of 10.0.0.0/16, with two subnets per Availability Zone (AZ) — one private subnet and one public subnet. Additionally, I’ve configured a NAT Gateway in the public subnet to allow outbound internet access for resources in the private subnets. This way, my EC2 VPN instance and other private resources are not directly exposed to the public internet while still having the necessary outbound connectivity.

The VPC that will be associated with the EKS Cluster

This grants a secure and controlled flow of traffic: your private resources are shielded from direct internet access while still benefiting from the outbound access provided through the NAT device.

Prerequisites

At the time of writing this article, you can run EKS Hybrid nodes on the following operating systems:

  • Amazon Linux 2023
  • Ubuntu — 20.04, 22.04, 24.04
  • Red Hat Enterprise Linux — RHEL 8, RHEL 9

If you are running your nodes on Ubuntu 24.04, ensure you have installed the latest security patches, as a recent update addressed an issue with AppArmor and containerd. In this guide, we will be installing containerd, as it is the default container runtime for Kubernetes. If you prefer a different runtime, you can specify an alternative during the nodeadm install process using the --containerd-source flag. For a list of supported runtime options, check the available options here. In this article, I will be using an Ubuntu 22.04 node.
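If you want to double-check the release and patch level of your node before proceeding, a quick sketch (generic Ubuntu commands, not specific to the AppArmor fix mentioned above) looks like this:

# Check which Ubuntu release the node is running
lsb_release -a

# Pull in the latest available package and security updates
sudo apt update && sudo apt full-upgrade -y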

From a networking perspective, Hybrid Nodes require a reliable connection between your on-premises environment and AWS. It is therefore recommended to have at least a 100 Mbps connection and a maximum round-trip latency of 200 ms between the hybrid nodes and the AWS Region. You can check your latency to reach an AWS region here.
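As a rough sanity check from the node itself, you can also time an HTTPS connection to the regional EC2 endpoint; eu-west-2 below is just an example region:

# Approximate connection and total round-trip time to the eu-west-2 EC2 API endpoint
curl -s -o /dev/null -w 'connect: %{time_connect}s  total: %{time_total}s\n' https://ec2.eu-west-2.amazonaws.com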

Create your certificates

Let’s start by defining how we can establish a trust relationship between our local node and the EKS cluster. Security is always the number one priority when building in the cloud, and ensuring that only trusted entities can interact with your cloud resources is essential. Thus, we need to let EKS know that our node is a trusted entity, so it should be treated as if it were an EC2 instance running within the VPC.

EKS allows two methods to grant temporary credentials to the cluster. The first option is AWS Systems Manager Hybrid Activations, which requires installing the Systems Manager agent on the node. The second option is IAM Roles Anywhere, which is the approach we will use here.

IAM Roles Anywhere enables you to grant temporary credentials to on-premises workloads using X.509 certificates issued by either a private or a public Certificate Authority (CA).

Here’s how it works:

  1. Private Certificate Authority (CA): First, you need a private CA to issue certificates. This CA acts as a trusted authority for signing the certificates used by your on-premises nodes.
  2. Trust Anchor: Trust anchors are a key component of digital security and are often considered the “root of trust” for secure communication, digital certificates, and cryptographic protocols. In IAM Roles Anywhere, you configure one, which points to the root or intermediate CA that will issue the certificates. This establishes a trust relationship between your CA and AWS IAM by authorizing entities that authenticate using a certificate signed by that CA. These entities can then assume specific IAM roles based on the permissions defined in the associated IAM policies.
  3. IAM Roles: You define an IAM role that specifies the permissions the on-premises nodes need. This role is assumed by the nodes using the certificate-based authentication mechanism.

A trust anchor tells AWS that any entity presenting a certificate signed by the associated Certificate Authority (CA) can request temporary credentials to assume the specified IAM role

Step 1: Create a private CA

You have flexibility in choosing how to create and maintain your Certificate Authority (CA):

  1. AWS Private CA:
    This is AWS’s managed certificate authority service. It integrates seamlessly with IAM Roles Anywhere and allows you to issue and manage your certificates without needing to set up your own CA infrastructure.
  2. Third-Party CA:
    You can use a CA from a trusted third-party provider. The only requirement is that the CA’s root or intermediate certificate must be uploaded to AWS as a Trust Anchor. Examples include CAs like DigiCert, GlobalSign, or your organization’s enterprise CA.
  3. Self-Signed CA:
    If you have a self-signed CA in your on-premises environment or lab setup, you can use it for IAM Roles Anywhere. Simply upload the root certificate of your self-signed CA as the Trust Anchor.

Using AWS Private CA is a convenient option, but using a third-party or self-signed CA allows you to avoid additional costs associated with Amazon Private CA and to leverage existing certificate infrastructure.

Nonetheless, for this guide I will be using AWS Private CA.

However, keep in mind that hefty additional charges apply, so please take a look at the pricing details here before proceeding.

The process is largely similar even if you choose to host your own CA. In that case, you would generate the required certificates locally using tools like OpenSSL, rather than relying on AWS Private CA and AWS Certificate Manager.
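For reference, here is a minimal OpenSSL sketch of what the self-hosted alternative could look like; the file names and subject below are purely illustrative placeholders:

# Generate a private key for the root CA
openssl genrsa -out rootCA.key 4096

# Create a self-signed root CA certificate, valid for 5 years
openssl req -x509 -new -nodes -key rootCA.key -sha256 -days 1825 \
  -subj "/C=GB/O=HomeLab/CN=HomeLab Root CA" -out rootCA.pem

The resulting rootCA.pem is the PEM bundle you would upload as the external certificate bundle when creating the trust anchor in IAM Roles Anywhere.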

A screenshot of AWS Private CA

As a first step, head over to the AWS Management Console and search for AWS Private CA in the search bar. Once there, get started by creating a new Private CA.

During the setup, you will need to specify a Subject Distinguished Name (DN). This includes key details about the CA, such as its name, organization, country, and other identifying information. These attributes help external systems verify and trust certificates issued by this CA.

My private Certificate Authority

Once done, click on Create CA and wait for the status to become Active.

Step 2: Create a trust anchor

We now have to establish trust between AWS and the Certificate Authority we’ve just created. Just go to IAM Roles Anywhere and create a trust anchor.

As you can see, the CA we created is shown

If you choose not to use AWS Private CA, you can click on the external certificate bundle section to upload the PEM-encoded certificate bundle created by your self-hosted CA. Once you’ve selected the Certificate Authority to use, specify a trust anchor name and click the Create a trust anchor button.

Step 3: Generate a certificate for your node

So far, we’ve established a Certificate Authority (CA) linked to our trust anchor. Now, the next step is to create a certificate specifically for our local node. This certificate will be used to authenticate AWS API calls, ensuring that the AWS control plane treats requests from our node as coming from a trusted entity.

Since the root CA of this certificate is associated with our trusted anchor, any certificate we sign using this CA will be recognized by AWS as valid, effectively enabling the node to authenticate securely with AWS services.

The node’s authentication will be tied to an IAM Role, which we will define in the next steps. The permissions granted to this node will depend on the IAM policies that are attached to this role, giving you fine-grained control over what actions the node can perform.

If you’re using AWS Private CA, go to AWS Certificate Manager, click Request a certificate, and then select Request a private certificate. Choose the existing CA you created in the first step to issue your certificate.

Creating a RSA 2048 certificate

If you’re using a self-managed CA, you should create an RSA 2048-bit key pair and generate a Certificate Signing Request (CSR) to be signed by the root CA you previously set up.
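If you go the self-managed route, a minimal sketch of that key pair and CSR, signed against the root CA from the earlier example (names are illustrative), could look like this:

# Generate a 2048-bit RSA private key for the node
openssl genrsa -out server.key 2048

# Create a CSR whose CN matches the node name (here a hypothetical node1.mxp)
openssl req -new -key server.key -subj "/CN=node1.mxp" -out server.csr

# Sign the CSR with the root CA created earlier, producing the node certificate
openssl x509 -req -in server.csr -CA rootCA.pem -CAkey rootCA.key \
  -CAcreateserial -sha256 -days 365 -out server.pem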

The certificate that will be used by node0 to authenticate

Wait for the certificate status to change to Issued — once it’s ready, export the certificate and private key to your local machine. We will need both the certificate and its corresponding private key to configure our on-premises node for authentication with IAM Roles Anywhere.

Exporting the certificate from AWS Certificate Manager

Assign permissions to your certificates

Now that we’ve set up an authentication mechanism for our node to communicate with AWS, the next step is to define the actions that entities authenticated with a certificate issued by our CA are allowed to perform. This involves associating our CA with our EKS cluster and creating an IAM role that specifies these permissions for the certificate holders.

We’ll start by creating the IAM Role, as this role will be linked to our CA when we associate it with the EKS cluster. On the IAM console, choose Roles and create a new role.

Step 1: Specify the trusted entity

On the Select trusted entity page, select a custom trust policy and paste the following JSON

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rolesanywhere.amazonaws.com"
      },
      "Action": [
        "sts:TagSession",
        "sts:SetSourceIdentity"
      ],
      "Condition": {
        "ArnEquals": {
          "aws:SourceArn": "TRUST_ANCHOR_ARN"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "rolesanywhere.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:RoleSessionName": "${aws:PrincipalTag/x509Subject/CN}"
        },
        "ArnEquals": {
          "aws:SourceArn": "TRUST_ANCHOR_ARN"
        }
      }
    }
  ]
}

Replace TRUST_ANCHOR_ARN with the ARN of the trust anchor you created earlier. The condition in this trust policy allows the IAM Roles Anywhere trusted entity to assume the Hybrid Nodes IAM role and exchange temporary IAM credentials only when the role session name matches the Common Name (CN) in the X.509 certificate installed on your hybrid nodes.

Step 2: Grant role permissions

After specifying our CA as a trusted entity, we’ll proceed with attaching the appropriate policies to the IAM role. To allow the hybrid nodes to describe EKS clusters and pull container images from Amazon Elastic Container Registry (ECR), we need to associate the role with the necessary permissions. To do this, select both the EKSDescribeClusterPolicy and AmazonEC2ContainerRegistryPullOnly policies.

Finish by defining a name for the role, such as AmazonEKSHybridNodesRole.

Step 3: Associate the Role with IAM Roles Anywhere

After creating the IAM role that is associated with these two policies and the trusted entity (the CA), the next step is to create an IAM Roles Anywhere profile. This profile acts as a bridge between the on-premises node and AWS, allowing the node to authenticate and assume the IAM role you just created. By creating this profile, you provide the mechanism that associates the certificate issued by your CA with the IAM role, thus establishing trust. Without the profile, the hybrid nodes couldn’t assume the role, because IAM wouldn’t know which certificates are allowed to authenticate under it.

When authenticating the node, you will specify a certificate, a private key, a trust anchor, and this profile. AWS will then return a set of temporary credentials that match the IAM role we created. In short, the profile explicitly states that this role can be invoked by a non-AWS workload.
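To make this concrete, the credential exchange that happens here is essentially what the IAM Roles Anywhere signing helper performs (nodeadm installs it later, as you’ll see in its install log). A hedged sketch of invoking it manually, with the ARNs and paths as placeholders for your own values, looks like this:

# Exchange the X.509 certificate for temporary AWS credentials via IAM Roles Anywhere.
# The ARNs and paths below are placeholders for the values from your own setup.
aws_signing_helper credential-process \
  --certificate /etc/iam/pki/server.pem \
  --private-key /etc/iam/pki/server.key \
  --trust-anchor-arn TRUST_ANCHOR_ARN \
  --profile-arn PROFILE_ARN \
  --role-arn ROLE_ARN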

Creating the Roles Anywhere profile

To create the IAM Roles Anywhere profile, head back to the IAM Roles Anywhere console and go to the Profiles section. Click create a profile and name it EKSHybridNodes. Select the role we just created, and leave the default session policies as they are.

Create a Hybrid EKS Cluster

Great! If you’ve reached this point, it means you’ve successfully set up the authentication mechanism for your node to AWS. Now, we need to create an EKS cluster that can accept hybrid nodes.

Please note that you can’t add an on-prem node to an existing EKS cluster. This is because the new API fields required for hybrid node configuration (such as the remote network configuration) can only be enabled during cluster creation as reported here.

Step 1: Configure cluster

To proceed, head over to the EKS console and click on the create a new cluster button. Select custom configuration and ensure you deselect the new EKS Auto Mode. At this stage, we don’t want AWS to automatically provide compute resources for our cluster, but you can always enable this option later on to incorporate additional compute scenarios like Fargate or Managed Node Groups.

Give your cluster a distinctive name — hybrid, in my case — and create the cluster IAM Role by clicking the Create Recommended Role. Choose how you’d like to grant access to the cluster, and then click next.

EKS Configure cluster specifications

Step 2: Configure networking

In the next section, you’ll need to specify the VPC where your cluster will be deployed and define the subnets where the control plane can place Elastic Network Interfaces (ENIs) to enable communication with the cluster.

Ensure you select the VPC that hosts the termination point for your VPN connection. If you’re planning to deploy internet-facing services, such as Load Balancers (NLB/ALB), include at least two public subnets deployed in two different Availability Zones. Additionally, select at least one private subnet if you need to provide internal LoadBalancers or Ingresses.

When you create a cluster, Amazon EKS automatically generates a security group named eks-cluster-sg-<cluster-name>-<uniqueID>. This Cluster Security Group primarily facilitates communication between the EKS control plane and other resources within your VPC. While you cannot modify its inbound rules, you can restrict its outbound rules if necessary.

To enable communication between your hybrid nodes and the Amazon EKS control plane, you need to associate an additional security group with your cluster. This security group should include rules to allow inbound HTTPS traffic on port 443 from specific sources:

  1. Remote Node CIDR: This allows secure API calls and interactions from your on-premises hybrid nodes. — in my case 192.168.0.0/21
  2. Remote Pod CIDR: Enables secure communication from pods running on hybrid nodes — in my case 192.168.64.0/18
  3. VPC CIDR Range: This allows traffic from VPC endpoints that may be deployed by the cluster within your VPC, such as for private communication with EKS control plane components or other AWS services — in my case 10.0.0.0/16

The cluster security group, open to a wider range

Including the VPC CIDR range ensures that your cluster can interact with internal AWS services and VPC endpoints. This is particularly useful for traffic originating from EKS-managed endpoints like Kubernetes API calls and webhooks, which rely on HTTPS for secure communication.
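If you prefer scripting these rules, the same inbound entries can be added with the AWS CLI; sg-0123456789abcdef0 below is a placeholder for your additional security group’s ID, and the CIDRs are the ones from my setup:

# Allow inbound HTTPS (443) from the remote node, remote pod, and VPC CIDR ranges
for CIDR in 192.168.0.0/21 192.168.64.0/18 10.0.0.0/16; do
  aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 443 --cidr "$CIDR"
done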

Heading back to the EKS configuration, take a look at the hybrid node section, and just specify the Node CIDR block where required.

Insert the Cluster Services IP under the Service IPv4 range

Step 3: Configure add-ons

On the Select add-ons page, you can choose additional add-ons to enhance your cluster’s functionality. However, it’s recommended to keep the default add-ons unless you have specific needs to modify them.

Choose the following add-ons

By default, the wizard selects essential add-ons such as CoreDNS (which handles DNS for your cluster) and kube-proxy (which manages networking rules for pods). Disabling or removing any of these default add-ons can interfere with the proper operation of Kubernetes applications, so proceed with caution if you’re considering modifying them.

Do not select the VPC CNI, as we will be using Cilium for handling pod-to-pod networking.

Step 4: Review and create

Review the settings you’ve selected and create the cluster. In my experience, this process takes approximately 30 minutes.

Well done! You've made great progress in setting up your EKS hybrid node integration. Take a moment to pat yourself on the back for getting this far—it's no small feat! Keep it up!

Connect hybrid nodes

You’ve successfully configured the necessary networking, security, and IAM roles to authenticate and securely manage your hybrid nodes in AWS. We can finally head over to our beloved local node to complete the installation.

Step 1: Copy your certificates to the node

Start by SSH-ing into your instance and creating the /etc/iam/pki folder. Then, navigate to it.

sudo mkdir -p /etc/iam/pki
cd /etc/iam/pki

Here we will save the certificate that we previously exported from Certificate Manager (or created locally if you’re hosting your own CA). These certificates serve as proof of identity when setting up IAM Roles Anywhere. They are issued by your trusted Certificate Authority (CA) and authenticate your on-premises node, verifying that it is authorized to interact with AWS services. This setup eliminates the need to store long-term AWS credentials on your node, following a similar approach to how EC2 instance profiles work for authentication.

We then create two files, server.pem and server.key, where we will place the certificate and the unencrypted private key, respectively.

sudo touch server.pem
sudo touch server.key

As a first step, copy the contents of your certificate body into the server.pem file. Then, take a look at the private key you downloaded before. If it starts with a block similar to the following, it must be decrypted before its contents are saved to server.key.

-----BEGIN ENCRYPTED PRIVATE KEY-----
MIIFKzBVBgkeekiG...

To do this, save it to a file named server-encrypted.key, then run the following command to decrypt it:

sudo openssl rsa -in server-encrypted.key -out server.key

You will be prompted to enter the passphrase you specified when creating the certificate. Once the decryption is complete, you can delete the encrypted key:

rm server-encrypted.key

Step 2: Install nodeadm

We can now install the hybrid nodes CLI, which allows your node to join the cluster and appear on the EKS console.

Move back to your home folder, create a folder for this project and move into it.

cd ~
mkdir eks-hybrid && cd eks-hybrid

Next, download the Hybrid Nodes CLI appropriate for your hardware architecture. Use one of the following links depending on your system, and then make it executable.

For x86_64 hosts:

curl -OL 'https://hybrid-assets.eks.amazonaws.com/releases/latest/bin/linux/amd64/nodeadm'
chmod +x nodeadm

For ARM hosts:

curl -OL 'https://hybrid-assets.eks.amazonaws.com/releases/latest/bin/linux/arm64/nodeadm'
chmod +x nodeadm

We then need to install the hybrid nodes dependencies based on our Kubernetes version. In the same folder, run the following command, making sure you replace K8S_VERSION with the Kubernetes minor version of your Amazon EKS cluster, for example 1.31:

sudo ./nodeadm install K8S_VERSION --credential-provider iam-ra

This process takes approximately two minutes; once done, you will see something similar to this:

node1@node1-mxp:~/eks-hybrid$ sudo ./nodeadm install 1.31 --credential-provider iam-ra
[sudo] password for node1:
{"level":"info","ts":1733733445.574191,"caller":"install/install.go:84","msg":"Creating package manager..."}
{"level":"info","ts":1733733445.574619,"caller":"install/install.go:93","msg":"Setting package manager config","containerd source":"distro"}
{"level":"info","ts":1733733445.574777,"caller":"install/install.go:94","msg":"Configuring package manager. This might take a while..."}
{"level":"info","ts":1733733445.5749362,"caller":"install/install.go:99","msg":"Validating Kubernetes version","kubernetes version":"1.31"}
{"level":"info","ts":1733733445.9190683,"caller":"install/install.go:105","msg":"Using Kubernetes version","kubernetes version":"1.31.2"}
{"level":"info","ts":1733733445.9219065,"caller":"flows/install.go:57","msg":"Installing containerd..."}
{"level":"info","ts":1733733462.7907484,"caller":"flows/install.go:66","msg":"Installing iptables..."}
{"level":"info","ts":1733733467.0780182,"caller":"flows/install.go:73","msg":"Installing AWS signing helper..."}
{"level":"info","ts":1733733471.194168,"caller":"flows/install.go:91","msg":"Installing kubelet..."}
{"level":"info","ts":1733733488.2541497,"caller":"flows/install.go:96","msg":"Installing kubectl..."}
{"level":"info","ts":1733733500.7898283,"caller":"flows/install.go:101","msg":"Installing cni-plugins..."}
{"level":"info","ts":1733733510.397283,"caller":"flows/install.go:106","msg":"Installing image credential provider..."}
{"level":"info","ts":1733733515.6521177,"caller":"flows/install.go:111","msg":"Installing IAM authenticator..."}
{"level":"info","ts":1733733529.8037457,"caller":"flows/install.go:52","msg":"Finishing up install..."}
node1@node1-mxp:~/eks-hybrid$

Step 3: Let the EKS cluster trust the Trust Anchor

We need to return to the Amazon EKS console for our last configuration step. So far, we’ve laid the groundwork for securely integrating hybrid nodes with the EKS cluster control plane:

  • We’ve created a Trust Anchor: the trust anchor serves as the entity that verifies and associates a role with any node authenticated using a certificate issued by our root CA.
  • We have then granted Permissions to the Role: The IAM role associated with the trust anchor was granted the necessary permissions to interact with the EKS cluster, such as describing the cluster and pulling container images from Amazon ECR.
  • Finally, we have allowed External Entities to Assume the Role: By configuring the trust anchor, we ensured that external entities (in this case, your hybrid node) can assume the IAM role when authenticated via the certificates.

However, there’s one final step missing: telling your EKS cluster that it can trust this role to allow cluster access. Without this configuration, the cluster won’t consider the role as a trusted entity to grant access.

Amazon EKS uses a mutual trust model for hybrid nodes. While the role grants cluster access permissions to those who assume it, the cluster itself must also be aware of this access mechanism.

Think of this step as introducing your nodes to your EKS cluster networking party. The trust anchor gave your nodes some funky golden tickets, ensuring they’re verified and ready to mingle. However, the event organizer — the EKS cluster — hasn’t been informed that these golden tickets are valid for entry!

Luckily, it’s pretty easy: open your cluster from the EKS console, go to the Access tab, and select Create access entry.

Specify the Hybrid Nodes IAM Role as the IAM principal, select Hybrid Linux as the type, verify your configuration and confirm the access entry.

Create a new IAM access entry for your cluster

Now we can use the AmazonEKSHybridNodesRole to access the cluster, and we can return to the terminal.
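The same access entry can also be created with the AWS CLI, if you prefer. This is a sketch assuming the HYBRID_LINUX access entry type (which is what the console’s Hybrid Linux option corresponds to), with the account ID left as a placeholder:

# Create an access entry so the cluster trusts the Hybrid Nodes IAM role
aws eks create-access-entry \
  --cluster-name hybrid \
  --principal-arn arn:aws:iam::<account_ID>:role/AmazonEKSHybridNodesRole \
  --type HYBRID_LINUX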

Step 4: Create a nodeadm configuration file

We are almost there! The next step involves creating a configuration file that ties together all the elements we’ve set up: the credentials, trust anchor, IAM Roles Anywhere profile, and the EKS cluster. This file will serve as the blueprint for securely connecting your hybrid nodes to the cluster.

Start by creating and opening a new file named nodeConfig.yaml

vim nodeConfig.yaml

Here’s an example nodeConfig.yaml file for configuring your hybrid node with AWS IAM Roles Anywhere as the credentials provider. Make the following replacements based on your setup:

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: CLUSTER_NAME
    region: AWS_REGION
  hybrid:
    iamRolesAnywhere:
      nodeName: NODE_NAME
      trustAnchorArn: TRUST_ANCHOR_ARN
      profileArn: PROFILE_ARN
      roleArn: ROLE_ARN
      certificatePath: CERTIFICATE_PATH
      privateKeyPath: KEY_PATH

Replace the placeholders in nodeConfig.yaml with the specific values of your environment:

  1. CLUSTER_NAME: Replace with the name of your EKS cluster. For example, hybrid
  2. AWS_REGION: Specify the AWS Region where your EKS cluster is hosted, such as eu-west-2.
  3. NODE_NAME: Ensure this matches the Common Name (CN) of the certificate on your node if your IAM trust policy uses the condition "sts:RoleSessionName": "${aws:PrincipalTag/x509Subject/CN}".
  4. TRUST_ANCHOR_ARN: Replace with the ARN of the trust anchor you configured during the setup.
  5. PROFILE_ARN: Replace with the ARN of the IAM Roles Anywhere profile you created earlier.
  6. ROLE_ARN: Replace with the ARN of your Hybrid Nodes IAM role.
  7. CERTIFICATE_PATH: Specify the absolute path to your node’s certificate file. If not specified, the default is /etc/iam/pki/server.pem.
  8. KEY_PATH: Specify the absolute path to your certificate’s private key. If not specified, the default is /etc/iam/pki/server.key.

Once done, it should look something similar to this:

apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: hybrid
    region: eu-west-2
  hybrid:
    iamRolesAnywhere:
      nodeName: "node1.mxp"
      trustAnchorArn: arn:aws:rolesanywhere:eu-west-2:<account_ID>:trust-anchor/cd75fb33-XXX-35fc9d00a8d5
      profileArn: arn:aws:rolesanywhere:eu-west-2:<account_ID>:profile/14907a4f-XXX-2328800606c2
      roleArn: arn:aws:iam::<account_ID>:role/AmazonEKSHybridNodesRole
      certificatePath: /etc/iam/pki/server.pem
      privateKeyPath: /etc/iam/pki/server.key

To connect your hybrid node to the Amazon EKS cluster, run the following command using the nodeConfig.yaml file you just created:

sudo ./nodeadm init -c file://nodeConfig.yaml

If you followed all the steps correctly, you should expect something similar to this:

node1@node1-mxp:~/eks-hybrid$ sudo ./nodeadm init -c file://nodeConfig.yaml
{"level":"info","ts":1733735211.5004578,"caller":"init/init.go:45","msg":"Checking user is root.."}
{"level":"info","ts":1733735211.5007808,"caller":"init/init.go:59","msg":"Loading installed components"}
{"level":"info","ts":1733735211.5012774,"caller":"node/node.go:13","msg":"Loading configuration..","configSource":"file://nodeConfig.yaml"}
{"level":"info","ts":1733735211.5023167,"caller":"node/node.go:23","msg":"Setting up hybrid node provider..."}
{"level":"info","ts":1733735211.5059185,"caller":"hybrid/validator.go:42","msg":"Validating configuration..."}
{"level":"info","ts":1733735211.5061007,"caller":"flows/init.go:31","msg":"Configuring Aws..."}
{"level":"info","ts":1733735211.5064785,"caller":"hybrid/configenricher.go:18","msg":"Enriching configuration..."}
{"level":"info","ts":1733735211.506586,"caller":"hybrid/configenricher.go:25","msg":"Default options populated","defaults":{"sandboxImage":"602401143452.dkr.ecr.eu-west-2.amazonaws.com/eks/pause:3.5"}}
{"level":"info","ts":1733735212.1288621,"caller":"hybrid/configenricher.go:32","msg":"Cluster details populated","cluster":"...","cidr":"172.16.0.0/16"}}
{"level":"info","ts":1733735212.1292624,"caller":"flows/init.go:41","msg":"Setting up system aspects..."}
{"level":"info","ts":1733735212.129481,"caller":"flows/init.go:44","msg":"Setting up system aspect..","name":"sysctl"}
{"level":"info","ts":1733735212.1501899,"caller":"flows/init.go:48","msg":"Finished setting up system aspect","name":"sysctl"}
{"level":"info","ts":1733735212.1503766,"caller":"flows/init.go:44","msg":"Setting up system aspect..","name":"swap"}
{"level":"info","ts":1733735212.1737316,"caller":"flows/init.go:48","msg":"Finished setting up system aspect","name":"swap"}
{"level":"info","ts":1733735212.1740541,"caller":"flows/init.go:44","msg":"Setting up system aspect..","name":"ports"}
{"level":"info","ts":1733735212.1742733,"caller":"system/ports.go:71","msg":"No firewall enabled on the host. Skipping setting firewall rules..."}
{"level":"info","ts":1733735212.1744494,"caller":"flows/init.go:48","msg":"Finished setting up system aspect","name":"ports"}
{"level":"info","ts":1733735212.174577,"caller":"flows/init.go:52","msg":"Configuring Pre-process daemons..."}
{"level":"info","ts":1733735212.1747005,"caller":"flows/init.go:63","msg":"Configuring daemons..."}
{"level":"info","ts":1733735212.1747599,"caller":"flows/init.go:67","msg":"Configuring daemon...","name":"containerd"}
{"level":"info","ts":1733735212.1748571,"caller":"containerd/config.go:44","msg":"Writing containerd config to file..","path":"/etc/containerd/config.toml"}
{"level":"info","ts":1733735212.175152,"caller":"flows/init.go:71","msg":"Configured daemon","name":"containerd"}
{"level":"info","ts":1733735212.1752634,"caller":"flows/init.go:67","msg":"Configuring daemon...","name":"kubelet"}
{"level":"info","ts":1733735212.2861054,"caller":"kubelet/config.go:377","msg":"Detected kubelet version","version":"v1.31.2"}
{"level":"info","ts":1733735212.2869146,"caller":"kubelet/config.go:466","msg":"Writing kubelet config to file..","path":"/etc/kubernetes/kubelet/config.json"}
{"level":"info","ts":1733735212.3224397,"caller":"flows/init.go:71","msg":"Configured daemon","name":"kubelet"}
{"level":"info","ts":1733735212.322689,"caller":"flows/init.go:79","msg":"Ensuring daemon is running..","name":"containerd"}
{"level":"info","ts":1733735212.3354127,"caller":"flows/init.go:83","msg":"Daemon is running","name":"containerd"}
{"level":"info","ts":1733735212.3354378,"caller":"flows/init.go:85","msg":"Running post-launch tasks..","name":"containerd"}
{"level":"info","ts":1733735212.3354461,"caller":"containerd/sandbox.go:20","msg":"Looking up current sandbox image in containerd config.."}
{"level":"info","ts":1733735212.4233396,"caller":"containerd/sandbox.go:32","msg":"Found sandbox image","image":"602401143452.dkr.ecr.eu-west-2.amazonaws.com/eks/pause:3.5"}
{"level":"info","ts":1733735212.4235258,"caller":"containerd/sandbox.go:34","msg":"Fetching ECR authorization token.."}
{"level":"info","ts":1733735212.641159,"caller":"containerd/sandbox.go:48","msg":"Pulling sandbox image..","image":"602401143452.dkr.ecr.eu-west-2.amazonaws.com/eks/pause:3.5"}
{"level":"info","ts":1733735215.0249465,"caller":"containerd/sandbox.go:53","msg":"Finished pulling sandbox image","image-ref":"sha256:6996f8da07bd405c6xxx041deda57d1d658ec20a78584f9f436c9a3bb7"}
{"level":"info","ts":1733735215.024975,"caller":"flows/init.go:89","msg":"Finished post-launch tasks","name":"containerd"}
{"level":"info","ts":1733735215.0249972,"caller":"flows/init.go:79","msg":"Ensuring daemon is running..","name":"kubelet"}
{"level":"info","ts":1733735215.322568,"caller":"flows/init.go:83","msg":"Daemon is running","name":"kubelet"}
{"level":"info","ts":1733735215.3226068,"caller":"flows/init.go:85","msg":"Running post-launch tasks..","name":"kubelet"}
{"level":"info","ts":1733735215.3226159,"caller":"flows/init.go:89","msg":"Finished post-launch tasks","name":"kubelet"}
node1@node1-mxp:~/eks-hybrid$

If the above command completes successfully, your hybrid node has joined your Amazon EKS cluster. You can confirm this by:

  1. Navigating to the compute tab for your cluster in the Amazon EKS console. Ensure that your IAM principal has the necessary permissions to view the node.
  2. Running the following command in your laptop terminal to check the nodes in your cluster:

kubectl get nodes

Note: Ensure your laptop is authenticated with the AWS CLI and has the appropriate kubectl configuration for your cluster. Click here to learn more about setting this up.
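For completeness, updating your local kubeconfig usually boils down to the following, using my cluster name and region as examples:

# Add or update the kubeconfig entry for the hybrid cluster
aws eks update-kubeconfig --name hybrid --region eu-west-2

# Then list the nodes; the hybrid node should appear (initially NotReady)
kubectl get nodes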

Important Note

Your nodes will show a status of Not Ready. This is expected at this stage because there is no Container Network Interface (CNI) installed or configured on your hybrid nodes yet. The CNI is responsible for managing the network connectivity for your Kubernetes pods, and we will be installing it in the next step.

The node1.mxp was successfully added to the cluster!

Troubleshooting — If your nodes do not appear in the cluster:

If you’re unable to associate your node with EKS, start by debugging the connectivity using the following command:

sudo ./nodeadm debug -c file://nodeConfig.yaml

If the connection is successful, it should return a response similar to the following:

node1@node1-mxp:~/eks-hybrid$ sudo ./nodeadm debug -c file://nodeConfig.yaml
* Validating access to AWS IAM Roles Anywhere API endpoint [Success]
* Validating authentication against AWS [Success]
* Validating access to Kubernetes API endpoint [Success]
* Validating unauthenticated request to Kubernetes API endpoint [Success]
* Validating authenticated request to Kubernetes API endpoint [Success]
* Validating Kubernetes identity matches a Node identity [Success]
* Validating access to Kube-API server through VPC IPs [Success]

If you’re still facing issues, and the debugging utility does not provide enough information:

  • Double-check the configuration in your nodeConfig.yaml file.
  • Review your security group and firewall rules to ensure proper communication between your hybrid node and the Amazon EKS control plane.
  • Consult the Troubleshooting Hybrid Nodes documentation for additional guidance.

Deploy a CNI for the cluster using Cilium

In a Kubernetes cluster, a Container Network Interface (CNI) is used for managing network connectivity between pods, services, and nodes. In an Amazon EKS hybrid setup like ours, where part of the workloads reside on-premises and part in the cloud, we need a common network plane that allows communication across the two different IP ranges we’re using.

Why Amazon VPC CNI Isn’t Supported for Hybrid Nodes

Amazon VPC CNI tightly integrates Kubernetes networking with the Amazon VPC by directly allocating IP addresses from the VPC’s CIDR to pods. While this works well in an all-AWS environment, it falls short in hybrid setups because:

  1. Cross-Network Dependencies: Pods running on on-premises nodes cannot directly acquire IPs from the VPC range (e.g., 10.0.0.0/16 in my case).
  2. Service IP Resolution: Hybrid nodes must resolve Kubernetes service IPs (from the cluster CIDR) and map them to pods, a capability VPC CNI cannot provide in this context.

The Cilium CNI infrastructure

For hybrid nodes, Cilium and Calico are better alternatives. Both CNIs are designed to bridge networking across different environments, making them ideal for setups where local networks interact with Kubernetes pods and services, building what’s known as a mesh network. They allow me to allocate a pod address space (192.168.64.0/18) that sits outside the range adopted by my router (192.168.0.0/21), keeping the two separate and ensuring a smooth interaction between the on-prem and cloud environments. These CNIs help the host “resolve” service IPs and facilitate seamless communication between on-premises infrastructure and Kubernetes pods across different network ranges.

I will be using Cilium for this article, but the setup with Calico would be pretty similar, and you can find it reported here.

Step 1: Install the helm repo

Ensure you’ve installed the Helm CLI on your laptop. If not, follow these instructions to get started. Then, add the Cilium Helm repo:

helm repo add cilium https://helm.cilium.io/

Step 2: Create a value file

We now need to define our pod network in a Helm values file called cilium-values.yaml.

You can use the following template to create a scenario featuring a single remote pod network, which masquerades the source IP address of all pod traffic leaving the cluster to the IP address of the node.

This configuration allows Cilium to run smoothly with Amazon EKS clusters, regardless of whether they have remote pod networks configured. By masquerading the pod traffic, it ensures that traffic originating from the pods will appear to come from the node’s IP address when it leaves the cluster, facilitating seamless communication between on-premises and cloud environments.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: eks.amazonaws.com/compute-type
              operator: In
              values:
                - hybrid
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4MaskSize: 25
    clusterPoolIPv4PodCIDRList:
      - POD_CIDR
operator:
  unmanagedPodWatcher:
    restart: false

Replace POD_CIDR with the CIDR range for your pods. In my case, this would be 192.168.64.0/18. This range defines the IP addresses assigned to the pods within your Kubernetes cluster.

Next, adjust clusterPoolIPv4MaskSize to the size of the CIDR segment you wish to allocate to each node. For instance, if you want each node to handle a /25 subnet, use 25, as in the example above. This determines how many IP addresses will be allocated to each node in the cluster.

Step 3: Install Cilium

To install Cilium on your cluster, you’ll need to replace CILIUM_VERSION with the desired version of Cilium. You can find the latest patch release for a given minor version in the Stable Releases section of the Cilium documentation here. Use the following command to install Cilium:

helm install cilium cilium/cilium \
--version CILIUM_VERSION \
--namespace kube-system \
--values cilium-values.yaml

To verify that Cilium was installed successfully, run the following command: you should see a few Cilium pods popping up, specifically the cilium-operator deployment and a cilium-agent running on each of your hybrid nodes.

kubectl get pods -n kube-system

Additionally, the status of your hybrid nodes should now display as Ready.
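A couple of quick checks to confirm this, assuming the standard k8s-app=cilium label that Cilium applies to its agent pods:

# List the Cilium agent pods and the nodes they are scheduled on
kubectl get pods -n kube-system -l k8s-app=cilium -o wide

# The hybrid node should now report Ready
kubectl get nodes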

The node1.mxp is now configured and in a ready state

Congratulations! You’ve successfully set up an Amazon EKS cluster with hybrid nodes, bridged your on-premises infrastructure to the AWS cloud, and configured a robust networking solution with Cilium. This setup allows your cluster to operate across both on-prem environments and on AWS, leveraging the distributed and replicated Amazon EKS Control Plane.

What’s next?

With your hybrid EKS cluster up and running, you can now explore and experiment at your own pace. For example, I expanded my hybrid setup by adding another node using the same guide, simply generating a new certificate.

My two remote nodes hosted on my Proxmox server

Some ideas that would help you take it further:

  • Implementing mTLS with Istio and Kiali to encrypt communication at the pod-to-pod level and exposing your services via an Istio Gateway.
  • Setting up monitoring with Grafana and Prometheus to track service latency across your nodes and visualize cluster metrics.
  • Integrating with AWS services to securely access RDS, S3, or DynamoDB using IAM Roles for Service Accounts, eliminating the need for embedded credentials.

I’d love to hear what you build — please drop your ideas or creations in the comments below! I’ll also share advanced networking configurations and other cool features soon. Happy Kube-ing! 🚢✨
