Jenkins on EKS: We Run Build Pods on Demand (Here's How)

How we run a production Jenkins on Amazon EKS with build agents that spin up as Kubernetes pods. IRSA, cross-account AWS, Job DSL, the whole setup.

Amjad Syed

We have been running Jenkins on Amazon EKS for a while now, and the setup has gotten to a place where I think it is worth writing down. No static agent nodes, no SSH keys flying around, no clicking through the Jenkins UI to create jobs. Every build spins up a fresh pod, does its thing, and dies. Every job is defined in Git.

This post is a walkthrough of how the pieces fit together. I have stripped out the boring bits and kept the parts I think anyone running Jenkins in 2026 should actually care about.

The Big Picture

Three ideas hold the whole setup up:

  1. Stateless controller, stateful storage. The Jenkins master runs as a single-replica StatefulSet with an EBS-backed PVC. So when the pod restarts or we bump the image version, $JENKINS_HOME is still there.
  2. Ephemeral agents. Every build creates a new pod from a YAML template, runs the build, and the pod gets deleted. The template lives in Git alongside the pipelines.
  3. Jobs as code. Folder hierarchy, parameters, pipeline scripts, all of it is Job DSL Groovy in a repo. A seed job runs the DSL and Jenkins materialises everything else. Nothing is configured through the UI.

Here is what it looks like end to end:

   user ──▶ ingress-nginx ─▶ Service ─▶ Jenkins Master (StatefulSet + PVC)
   (HTTPS)  (internal NLB)                  │
                                            │ JNLP 50000

                                  Build #N → kubectl create pod ──▶ Dynamic Agent Pod
                                                                     ├─ jnlp
                                                                     ├─ docker / kaniko
                                                                     ├─ maven
                                                                     ├─ kubectl
                                                                     ├─ terraform
                                                                     ├─ aws-cli
                                                                     └─ ssh-client

Everything lives in one namespace called jenkins. The controller, the agents, the ConfigMaps, the Secrets. Keeps RBAC small and easy to reason about.

IRSA: No Long-Lived AWS Keys, Ever

The first thing I want to talk about is AWS auth, because honestly this is where most Jenkins-on-Kubernetes setups go sideways. People end up with an AWS access key stored as a Jenkins credential, and then they can't rotate it, they can't audit it, and it ends up on someone's laptop somewhere.

We use IRSA (IAM Roles for Service Accounts). The Jenkins service account is annotated with an IAM role ARN. The EKS pod identity webhook injects a projected token into every pod that uses this service account, and the AWS SDKs pick that token up automatically.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: jenkins-serviceaccount
  namespace: jenkins
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<aws-account-id>:role/jenkins-ci-role

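That is the whole thing on the Kubernetes side. The IAM role's trust policy has to allow the cluster's OIDC provider to assume it for this service account, and beyond that there are no keys, no rotation, no secrets in Jenkins. The base IAM role gets whatever permissions our pipelines need (S3, EKS describe, SSM Parameter Store, etc).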
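For reference, this is roughly what a pod using that service account looks like once the webhook has mutated it. Treat it as a sketch of the documented webhook behaviour, not something you write yourself: the pod name, container name, and image are placeholders, and the volume name and token lifetime can differ depending on cluster configuration.

```yaml
# Sketch of the fields the pod identity webhook adds (assumed defaults).
apiVersion: v1
kind: Pod
metadata:
  name: example-build-pod            # hypothetical name
  namespace: jenkins
spec:
  serviceAccountName: jenkins-serviceaccount
  containers:
    - name: aws-cli                  # placeholder container
      image: amazon/aws-cli          # placeholder image
      env:
        # The AWS SDKs and CLI read these two variables automatically.
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::<aws-account-id>:role/jenkins-ci-role
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      volumeMounts:
        - name: aws-iam-token
          mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
          readOnly: true
  volumes:
    - name: aws-iam-token
      projected:
        sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              expirationSeconds: 86400
              path: token
```

The token path here is the same one the cross-account config file points at with web_identity_token_file.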

Cross-Account With Profile Chaining

The cool part: we deploy into multiple AWS accounts (prod, nonprod, sandbox) from one Jenkins. No shared keys, no creating users in every account. We do it with an AWS config file (~/.aws/config) delivered as a ConfigMap, with two profiles chained together.

apiVersion: v1
kind: ConfigMap
metadata:
  name: awsconfig-configmap
  namespace: jenkins
data:
  config: |-
    [profile prod]
    role_arn = arn:aws:iam::<prod-account-id>:role/jenkins-ci-role
    web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token

    [profile nonprod]
    role_arn = arn:aws:iam::<nonprod-account-id>:role/jenkins-deploy-role
    source_profile = prod
    role_session_name = jenkinssession

The first profile authenticates with IRSA. The second uses the first as a source_profile and calls AssumeRole into a different account, which works as long as the nonprod role's trust policy allows the prod role to assume it. Inside any pipeline, you just run aws --profile nonprod s3 ls and the CLI figures out the chain.

This single config has saved us a lot of pain.

The Jenkins Controller (StatefulSet)

The controller is a single-replica StatefulSet because Jenkins is stateful. $JENKINS_HOME has plugin state, build history, job config, credentials. Lose it and you are starting over.

Rather than dumping the whole manifest, here is what we actually configure and why each bit is there:

Zone-pinned scheduling. EBS volumes are zone-bound, so we pin the pod to the same zone as the PVC using nodeAffinity on topology.kubernetes.io/zone. Without this, after a restart the pod can land in a different AZ and fail to attach the volume. The pod sits there forever with a Multi-Attach error. If multi-AZ availability matters more to you than simplicity, swap EBS for EFS and drop the affinity.

A 100Gi gp3 PVC via volumeClaimTemplates. This is the $JENKINS_HOME mount. Sized for plugin state, build logs, and a reasonable retention window. It grows over time so put it on a graph.

JAVA_OPTS for fast agent provisioning. Out of the box, Jenkins waits a while before spinning up new agents when builds queue up. We pass a few flags to make it more aggressive:

  • NodeProvisioner.initialDelay=0 - don't pause before provisioning
  • NodeProvisioner.MARGIN=50 and MARGIN0=0.85 - scale up sooner
  • kubernetes.websocket.ping.interval=30000 - keep long builds alive when there is an NLB in front. We had agents getting disconnected on multi-hour builds before this was set.

Two container ports. 8080 for the UI/API and 50000 for inbound JNLP from agents. Both exposed via a ClusterIP service. The ingress is internal-only, so the controller never gets a public IP. You hit it through the VPN.
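Put together, the settings above could look something like the sketch below. This is not our full manifest: the resource names, labels, image tag, StorageClass name, and zone placeholder are assumptions for illustration, and only the three NodeProvisioner flags are spelled out with their full hudson.slaves prefix. Add the websocket ping setting under whatever property name your plugin version documents.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: jenkins                        # assumed name
  namespace: jenkins
spec:
  serviceName: jenkins
  replicas: 1
  selector:
    matchLabels: { app: jenkins }
  template:
    metadata:
      labels: { app: jenkins }
    spec:
      serviceAccountName: jenkins-serviceaccount
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values: ["<availability-zone>"]   # zone of the EBS volume
      containers:
        - name: jenkins
          image: jenkins/jenkins:lts   # assumed image; pin a version in practice
          env:
            - name: JAVA_OPTS
              value: >-
                -Dhudson.slaves.NodeProvisioner.initialDelay=0
                -Dhudson.slaves.NodeProvisioner.MARGIN=50
                -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85
          ports:
            - { name: http, containerPort: 8080 }
            - { name: jnlp, containerPort: 50000 }
          volumeMounts:
            - name: jenkins-home
              mountPath: /var/jenkins_home
  volumeClaimTemplates:
    - metadata:
        name: jenkins-home
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3          # assumes a StorageClass with this name exists
        resources:
          requests:
            storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
  name: jenkins
  namespace: jenkins
spec:
  type: ClusterIP
  selector: { app: jenkins }
  ports:
    - { name: http, port: 8080, targetPort: 8080 }
    - { name: jnlp, port: 50000, targetPort: 50000 }
```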

Why The Init Container Matters

This one needs its own paragraph because it trips people up.

Jenkins inside the container runs as UID 1000. But EBS volumes get mounted as root-owned by default. So if you just start Jenkins on a fresh PVC, the Jenkins process can't write to /var/jenkins_home. The pod crashloops with permission errors and the logs are pretty unhelpful about why.

The init container fixes this before Jenkins starts. It runs chown -R 1000:1000 /var/jenkins_home/ and hands the volume over to the Jenkins user so it can actually write.

Init containers run to completion before any normal container starts. So by the time the Jenkins master boots, ownership is fixed. Boring stuff, but skip it and the master never comes up cleanly on a fresh PVC.
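A minimal sketch of such an init container, as a fragment of the controller's pod spec. The container name, the busybox image, and the volume name are assumptions; the chown command and the path are the ones described above.

```yaml
# Fragment of the StatefulSet pod spec (spec.template.spec).
initContainers:
  - name: fix-jenkins-home-ownership   # hypothetical name
    image: busybox:1.36                # assumed image; any image with chown works
    command: ["sh", "-c", "chown -R 1000:1000 /var/jenkins_home/"]
    securityContext:
      runAsUser: 0                     # needs root to change ownership
    volumeMounts:
      - name: jenkins-home             # assumed name; must match the Jenkins home volume
        mountPath: /var/jenkins_home
```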

The Real Magic: Dynamic Build Agents

This is the part I would actually recommend reading if you only have time for one section.

Here is the thing about old-school Jenkins. You have a fixed pool of agent nodes. Each build claims one. Hit your max executor count during a busy hour and builds just sit in the queue. Developers ping you on Slack asking why the deploy hasn't started. You spin up another agent VM. Next week it happens again.

The Jenkins Kubernetes plugin makes that whole problem disappear. It points at the in-cluster API (no creds needed, it uses the controller's service account). For every build, it reads a YAML pod spec from the jobs repo, creates a fresh pod in the cluster, runs the build inside it, and tears the pod down the moment the build finishes.
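Using the controller's service account only works if that account is allowed to manage pods in the namespace. Here is a rough sketch of a namespaced Role and RoleBinding for that. The object names are hypothetical and the verb list is an assumption about what the plugin typically needs, so trim or extend it for your setup.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: jenkins-agent-manager          # hypothetical name
  namespace: jenkins
rules:
  # Create, watch, and delete agent pods, and exec into their containers.
  - apiGroups: [""]
    resources: ["pods", "pods/exec"]
    verbs: ["create", "delete", "get", "list", "patch", "update", "watch"]
  # Read container logs for troubleshooting failed agents.
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list", "watch"]
  # Surface scheduling events in the build log.
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: jenkins-agent-manager
  namespace: jenkins
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: jenkins-agent-manager
subjects:
  - kind: ServiceAccount
    name: jenkins-serviceaccount
    namespace: jenkins
```

Because it is a Role and not a ClusterRole, the permissions stop at the jenkins namespace.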

There is no fixed agent pool. The cluster’s autoscaler is your agent capacity. If 50 builds get triggered at once, 50 pods start in parallel. The only thing that gates concurrency is how many nodes EKS is willing to give you, which honestly comes down to how much you want to pay for compute that hour.

Once we moved to this model, a few things changed:

  • The build queue went to zero. Pretty much always, even during deploy storms. Builds run immediately because a new agent is just a kubectl create pod away.
  • No more snowflake agents. Every build starts from the same template. The agent that ran yesterday is the agent that runs today, byte for byte.
  • Adding a new build tool is a pull request. Edit the pod template, open a PR, get it reviewed. No SSHing into agent VMs, no Puppet, no Ansible runs.
  • Pipelines live in Git too. Every build pulls its Jenkinsfile (or pipeline script) fresh from the repo on each run. So changing build logic is a PR, not a session in the Jenkins UI clicking through fields. Reviewable, auditable, and rolling back a bad change is git revert. The manual effort on Jenkins itself drops to almost nothing.
  • We only pay for the seconds builds actually run. Pods come up, do the work, go away. No idle agents burning compute overnight.

Our pod template has one container per tool family. So instead of one giant fat agent image with maven and terraform and kubectl and the docker CLI and aws-cli all crammed in, we have separate containers and the pipeline picks which one runs each step.

The Pod Template Format

The template is just a regular Kubernetes Pod spec. Here is the overall shape of the file, with the details left out so you can see the skeleton:

apiVersion: v1
kind: Pod
spec:
  serviceAccountName: jenkins-serviceaccount
  containers:
    - name: maven
      image: maven:3.9-eclipse-temurin-17
      command: [cat]
      tty: true
      volumeMounts:
        - ...

    - name: kubectl
      image: ...
      command: [cat]
      tty: true

    # repeat once per tool: node, python, terraform, aws-cli, docker, kaniko, ...

  volumes:
    - name: docker-sock-volume
      hostPath: { path: /var/run/docker.sock }
    - name: aws-config-volume
      configMap: { name: awsconfig-configmap }
    - name: aws-volume
      emptyDir: {}

The only Jenkins-specific trick is command: [cat] and tty: true on each container. The container starts, runs cat against a tty, and sits there waiting on stdin forever. When the pipeline says container('maven') { sh 'mvn package' }, Jenkins execs into that container through the Kubernetes API (the equivalent of kubectl exec) and runs the shell step there. Repeat the same block once per tool you need.

What Goes In The Template

We have one container per build runtime, plus one per “ops tool”. Pick the ones your team actually uses, skip the rest.

Build runtime containers (one per language):

  • maven for Java and Spring Boot - image like maven:3.9-eclipse-temurin-17
  • node for TypeScript, React, Next.js - image like node:20-alpine
  • python for Python services and scripts - image like python:3.12-slim
  • golang for Go services - image like golang:1.22

Ops tool containers (one per deploy or infra task):

  • kubectl (or kubectl + helm bundled) for applying manifests
  • terraform for plan and apply
  • aws-cli for AWS commands
  • docker (Docker-in-Docker or socket-bind) for image builds
  • kaniko for rootless image builds where mounting the socket is not allowed
  • ssh-client for jobs that hop through a bastion

The whole template lives in Git. Changing a tool version is a pull request. Adding a new language is a pull request. Nothing is configured in the Jenkins UI.

Volumes and Mounts

Three volumes do most of the heavy lifting. Worth understanding what each one is for, because this is where the agent template gets confusing the first time you read it:

Docker socket (hostPath: /var/run/docker.sock). Mounted into the docker container only. Lets the container talk to the worker node’s Docker daemon directly, which makes image builds fast (no nested daemon, no extra layers). This only works on nodes that actually run a Docker daemon; the EKS-optimized AMIs have shipped containerd without Docker since Kubernetes 1.24, so it assumes a custom node setup. The trade-off is it couples the agent to the worker node and gives the build root-equivalent access to the host. That is why we also ship Kaniko: same job, no socket. If your security posture rules out the socket, just delete the docker container and run image builds in kaniko.

AWS config ConfigMap. Mounted at /root/.aws/config in every container that touches AWS. This is how the cross-account profile chaining we set up earlier becomes visible inside the build. Every container that mounts this gets the prod and nonprod profiles for free, no further config needed.

AWS scratch volume (emptyDir). Mounted at /root/.aws in the same containers. The AWS SDK caches assumed-role credentials and SSO tokens here. emptyDir means the cache lives only as long as the pod, which is only as long as the build. When the build ends, the pod dies, and the cached credentials disappear with it. That is exactly the property we want.

We also mount SSH keys from a Kubernetes Secret into containers that do git clone over SSH, and a GitHub token Secret into containers that hit the GitHub API.
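The two AWS mounts described above might look like this on a single container. It is a sketch: the container name and image are assumptions, while the volume names and the config key match the template skeleton and the ConfigMap shown earlier. The subPath is one way to place a single file inside a directory that is itself an emptyDir mount.

```yaml
# Fragment: one AWS-facing container from the agent pod template.
- name: aws-cli
  image: amazon/aws-cli              # assumed image
  command: [cat]
  tty: true
  volumeMounts:
    - name: aws-volume               # emptyDir scratch space for cached credentials
      mountPath: /root/.aws
    - name: aws-config-volume        # ConfigMap with the chained profiles
      mountPath: /root/.aws/config
      subPath: config                # mount only the "config" key as a single file
      readOnly: true
```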

One More Thing

The agent pod uses the same service account as the controller. So every container in the build pod inherits the controller's IRSA role. AWS API calls from inside any of them just work. No extra credential plumbing per pipeline.

Picking a Container in the Pipeline

Inside the pipeline, every stage wraps its shell commands in a container('name') { ... } block. Jenkins routes those commands to the right container in the build pod.

A minimal example:

stage('Build') {
    steps {
        container('maven') { sh 'mvn package' }
    }
}
stage('Deploy') {
    steps {
        container('kubectl') { sh 'kubectl apply -f k8s/' }
    }
}

mvn package runs in the maven container. kubectl apply runs in the kubectl container. Same workspace, same pod, just different tool. The pipeline file stays short because all the tooling already exists in the pod.

Specialised Agents With inheritFrom

Sometimes one job needs something weird. A vendor SDK. A Python CLI we wrote in-house. A database driver that does not belong in the main template. We don't want to bloat the base template every time some one-off comes up.

The pattern: write a tiny supplemental pod spec that just adds the extra container, and reference both files in the pipeline. The Kubernetes plugin merges them at build time using inheritFrom.

The supplemental file is normally just a few lines: a single container with an image, the usual command: [cat] keep-alive, plus any ConfigMap or Secret it needs via envFrom. In the pipeline, you reference the base template with inheritFrom and point yamlFile at the supplemental file, and the build runs in the combined pod.

The base template stays clean. The job-specific container brings its own dependencies. Easy to remove if the job goes away.
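As an illustration, a supplemental pod spec along those lines could be as small as this. The container name, image reference, ConfigMap, and Secret are all hypothetical placeholders.

```yaml
# Supplemental pod spec: only the extra container, nothing from the base template.
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: vendor-sdk                       # hypothetical container name
      image: <registry>/vendor-sdk:<tag>     # placeholder image reference
      command: [cat]
      tty: true
      envFrom:
        - configMapRef:
            name: vendor-sdk-config          # hypothetical ConfigMap
        - secretRef:
            name: vendor-sdk-credentials     # hypothetical Secret
```

The pipeline then targets it with container('vendor-sdk') { ... } like any container from the base template.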

Jobs as Code: Job DSL + a Seed Job

If you have spent any time in Jenkins, you know the worst part is reproducing a Jenkins instance. Folder by folder, job by job, parameter by parameter. Painful.

So we don't configure jobs through the UI at all. We use Job DSL to describe every job as Groovy in a Git repo, and a single “seed” job in Jenkins runs the DSL and creates everything else.

Folder layout in the jobs repo:

jenkins-jobs/
├── containerTemplate/
│   └── SlaveContainerTemplate.yaml
├── jenkins-dsl/
│   ├── common/folders.groovy
│   ├── platform/
│   ├── prod/
│   └── nonprod/
└── jenkins-pipelines/
    ├── backups/
    ├── deploys/
    └── platform/

The DSL describes the job object (name, folder, parameters, where to pull the pipeline from). The pipeline scripts describe what the job does. They are pulled fresh from Git on every build.

A typical job-as-code file has four things in it:

  • A pipelineJob block with the folder path and name (e.g. "prod/backups/daily-snapshot")
  • Schedule and retention via triggers { cron(...) } and logRotator(...)
  • Parameters declared with choiceParam, stringParam, booleanParam, or activeChoiceParam (more on that next)
  • An SCM block pointing at the pipeline script in Git: which repo, which branch, which file path

The whole thing is normally 20-30 lines of Groovy. To create the next job, you copy the file, change the names, and commit. The seed job picks it up on its next run and Jenkins materialises the new job in the folder you specified. No clicking through the UI.

Active Choices: Parameters That Know What Is In AWS

This bit is fun. The Active Choices plugin lets parameter values be computed by a Groovy script when the job page loads. So you can have a dropdown of EKS clusters that is queried live from AWS at the moment someone clicks “Build with Parameters”.

Inside an activeChoiceParam block you give it a Groovy script that returns a List<String>. The script can do whatever you want: shell out to aws eks list-clusters, hit the GitHub API for the team’s repos, query SSM for environment names, read from a database. Common shape:

activeChoiceParam('repo_name') {
    groovyScript {
        script('''
            // 1. Grab a credential by ID (e.g. a GitHub bot token)
            // 2. Call the external API (curl, AWS SDK, etc)
            // 3. Filter and shape the response
            // 4. Return a List<String>
        ''')
        fallbackScript("return ['placeholder']")
    }
}

Two things worth doing:

  • Always set a fallbackScript. If the live query breaks (rate limit, network blip, expired credential), the form needs to still load. Otherwise the whole job page hangs and nobody can run anything.
  • Look up Jenkins credentials by ID inside the script using CredentialsProvider.lookupCredentials(...). Never hardcode tokens in the script. The credential ID lives in plain text in the DSL, but the secret stays inside Jenkins.

No more hardcoded dropdowns that go stale. The repo list, the cluster list, the nodegroup list, all live. Saves us from updating Groovy every time someone adds a cluster.

Stuff I Wish Someone Had Told Me

A few operational things we learned the hard way:

Snapshot the EBS volume. $JENKINS_HOME lives on a PVC. We snapshot it on a schedule. Without that, an EBS failure or accidental PVC deletion is a really bad day.
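One possible way to take such a snapshot from inside the cluster is a CSI VolumeSnapshot, sketched below. This is an assumption, not a description of our tooling. It requires the EBS CSI driver and the snapshot controller, and the snapshot class and PVC names are hypothetical. Something else, such as a CronJob or a backup service, still has to create these objects on a schedule.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: jenkins-home-snapshot-example          # hypothetical; use a dated name per run
  namespace: jenkins
spec:
  volumeSnapshotClassName: ebs-csi-snapclass   # assumed VolumeSnapshotClass
  source:
    # StatefulSet PVCs are named <claim template>-<statefulset>-<ordinal>; names assumed.
    persistentVolumeClaimName: jenkins-home-jenkins-0
```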

Set idleMinutes on agent templates per job. Default is zero so the pod gets killed right after the build. For chatty jobs (back-to-back builds), set it to something like 20 so the pod stays warm. For one-shot daily jobs, leave it at zero.

Add resource limits to every container. The example template we showed above intentionally omits them. Do not ship like that. Once you know what each tool actually needs, set requests and limits. A runaway Maven build with no limits can evict other pods on the node.
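For example, the maven container from the template could carry requests and limits like this. The numbers are purely illustrative assumptions; measure real builds before picking yours.

```yaml
# Fragment: the maven container from the pod template with requests and limits added.
- name: maven
  image: maven:3.9-eclipse-temurin-17
  command: [cat]
  tty: true
  resources:
    requests:
      cpu: "1"          # illustrative values only
      memory: 2Gi
    limits:
      cpu: "2"
      memory: 4Gi
```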

External Secrets is worth the effort. We mount Kubernetes Secrets into agents directly in this post. For real workloads, hook up External Secrets Operator with AWS Secrets Manager so rotation does not require redeploying YAML.
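A rough sketch of what that could look like for the GitHub token. The object names and the Secrets Manager key are hypothetical, and it assumes a SecretStore for AWS Secrets Manager has already been configured in the namespace.

```yaml
apiVersion: external-secrets.io/v1        # older operator releases serve v1beta1 instead
kind: ExternalSecret
metadata:
  name: github-token                      # hypothetical name
  namespace: jenkins
spec:
  refreshInterval: 1h                     # how often the operator re-reads the source
  secretStoreRef:
    kind: SecretStore
    name: aws-secrets-manager             # assumed SecretStore, configured separately
  target:
    name: github-token                    # Kubernetes Secret the operator creates
  data:
    - secretKey: token
      remoteRef:
        key: jenkins/github-token         # hypothetical Secrets Manager secret name
```

The agent template keeps mounting a normal Kubernetes Secret; only the source of its value changes.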

Multi-AZ for the controller is a trade-off. Pinning to one zone keeps EBS simple but means an AZ outage takes Jenkins down. If you need HA, move $JENKINS_HOME to EFS and drop the zone affinity. We decided the operational simplicity of EBS was worth the risk for our use case.

Wrap-up

The pieces fit together like this:

  • Kubernetes does scheduling, secrets, RBAC, and pod lifecycle. We don't run agent processes, we let the cluster do it.
  • IRSA + a profile-chain aws/config gives us AWS auth that is both keyless and cross-account.
  • The Kubernetes plugin pod template turns the agent definition into versioned YAML in Git. Changing a tool version is a pull request.
  • Job DSL turns the Jenkins UI into a Git repository. Creating a job is also a pull request.
  • Pipelines stay short because the tooling is already in the pod. Every pipeline is just a series of container('foo') { sh '...' } blocks.

The result is a Jenkins that scales out elastically, holds no state outside $JENKINS_HOME, and has basically nothing configured by hand. The whole setup is reproducible from a fresh EKS cluster in about a day.

If you are running a Jenkins that still has snowflake agent VMs and jobs created through the UI, honestly, this is the pattern I would steal.


Need Help Setting This Up?

Running Jenkins on Kubernetes properly is one of those things that looks simple in a blog post and gets messy fast in real life. IRSA setup, plugin baking, agent template hygiene, Job DSL seed jobs, cross-account access, all of it has sharp edges.

Our team provides Jenkins consulting services to help you:

  • Design and migrate to a Kubernetes-native Jenkins with ephemeral pod agents and no static infrastructure
  • Set up jobs-as-code workflows using Job DSL, pipeline shared libraries, and seed jobs
  • Implement secure AWS access with IRSA and cross-account role chaining (no long-lived keys anywhere)

We have built this for teams across fintech, data engineering, and SaaS, and we know where the broken glass is.

Talk to our engineering team about your Jenkins setup

Published on May 18, 2026
