I want to tell you about the night I stopped debating whether AI would replace DevOps engineers and started understanding what that question actually means.
It was not a conference talk that changed my mind. It was not a blog post or a benchmark or a LinkedIn thread. It was a real crisis: a downed region, a panicking client, and an AI tool sitting open on my screen at a moment when every minute had a cost.
Here is exactly what happened.
## The night the AWS Bahrain region went down
During the recent conflict that impacted the Middle East, the AWS Bahrain region went down. Not degraded. Not slow. Down.
One of our customers ran their entire stack there. Kubernetes clusters, Aurora databases, application servers, storage. All of it sitting in a region that was no longer responding. The phone started ringing within minutes.
In that moment there is no time for meetings. No time for ticket creation. No time for careful deliberation. You make a call and you execute it, and the business is watching every minute that passes.
We made the call: rebuild everything in the AWS Ireland region. Cross-continent. Full stack. As fast as humanly possible.
The plan had four parts, some running in parallel, others strictly ordered. First, copy the latest backups out of Bahrain into Ireland before anything else. Second, rebuild the entire infrastructure from scratch in Ireland using our battle-tested Terraform codebase. Third, deploy the application code into the fresh environment. Fourth, restore the data from those backups and bring everything back to life. Miss a step or get the ordering wrong and you are starting over while the client clock keeps running.
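The ordering constraints in that plan can be sketched as a tiny dependency-driven runner. This is an illustrative model, not our actual tooling: the step names are invented, and the real work behind each step (aws CLI, Terraform, kubectl) is stubbed out with no-op callables.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical dependency graph for the recovery plan:
# backups and infra can start in parallel; the app deploy needs infra;
# the data restore needs both the backups and the infra.
PLAN = {
    "copy_backups": [],
    "rebuild_infra": [],
    "deploy_app": ["rebuild_infra"],
    "restore_data": ["copy_backups", "rebuild_infra"],
}

def run_plan(plan, actions):
    """Start every step immediately; each one blocks until its deps finish."""
    done = {name: threading.Event() for name in plan}
    order, lock = [], threading.Lock()

    def runner(name):
        for dep in plan[name]:
            done[dep].wait()      # block until the prerequisite completes
        actions[name]()           # the real work would go here
        with lock:
            order.append(name)
        done[name].set()

    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        for name in plan:
            pool.submit(runner, name)
    return order

if __name__ == "__main__":
    noop = {name: (lambda: None) for name in PLAN}
    print(run_plan(PLAN, noop))
```

The point of the sketch is the shape of the problem: backups and infrastructure can begin the moment the decision is made, but the restore cannot start until both have finished, and the whole thing fails if any single edge is violated.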
We had years of hardened Terraform modules. Code that had been run, broken, fixed, and proven across multiple client environments. The foundation was solid. What we did not have was time.
I opened Claude Code and got to work.
I fed it our existing Terraform codebase and told it exactly what needed to happen: adapt everything for Ireland, wire up the networking from scratch, ensure the Kubernetes node groups come up clean in the new region, prepare the database layer to receive the restored snapshots, rebuild the storage configuration, make sure every component is ready to receive the application the moment the code deployment lands.
What would normally take a team of engineers the better part of a day was being compressed in real time. Claude Code worked through the Terraform systematically. It caught variable mismatches I would have found twenty minutes later. It generated configuration while I was still thinking through sequencing. It held the full context of our infrastructure across dozens of files simultaneously while I was making rapid judgment calls about what to prioritise and in what order.
The backups were moving. The infrastructure was coming up. The application code was deploying. For a while, everything was tracking.
And then, right in the middle of the data restore, it got stuck.
A network issue. Something in the configuration was blocking connectivity at exactly the point where the restored data needed to start talking to the application layer. The restore was in progress, the application was deployed, the infrastructure was up, and nothing could reach anything else properly.
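When the symptoms look like that, the first sanity check is usually a raw TCP probe: set the application aside and ask whether the app layer can open a socket to the database endpoint at all. A minimal version, with a placeholder hostname rather than the client's real endpoint:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # "db.example.internal" is a stand-in, not a real endpoint;
    # 5432 is the default PostgreSQL port an Aurora cluster might expose.
    print(can_reach("db.example.internal", 5432, timeout=1.0))
```

If a probe like this fails, the problem lives below the application: security groups, route tables, DNS, or subnet configuration, which is exactly the layer this incident came down to.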
Claude Code tried several approaches. None of them worked. It was cycling through fixes without understanding the actual root cause, which is exactly what AI does when it reaches the edge of its pattern-matching ability. It can see the symptoms. It cannot feel the system.
I stepped in. I went directly into the configuration, read through it carefully, and found the issue myself. It was the kind of problem that requires you to understand not just the technology but the specific way this particular infrastructure had been built, the decisions made years earlier, and why certain things were configured the way they were. Context that exists nowhere in any documentation. Context that lives only in the head of someone who has worked with this system over time.
I explained the issue to Claude Code. I told it what was wrong, why it was wrong, and what the correct fix needed to be given the specific constraints of this environment.
It fixed it immediately.
The restore completed. The application came up. The client was back online in Ireland, running on infrastructure that had not existed two hours earlier, with data that had been sitting in a downed region on the other side of the world.
I will not give you a made-up number on how much time we saved because I have never had to do this manually in a live conflict scenario and I am not going to invent a comparison. What I can tell you is that the difference was not marginal. It was the kind of difference that changes whether a business survives a crisis or is still explaining the outage three days later.
## What that night actually proved
The AI did not replace my expertise that night. It required it. At the exact moment it mattered most, Claude Code could not move forward without a human being who understood the system at a level no AI had access to.
Without years of Terraform experience, the codebase would not have been trustworthy enough to use under that kind of pressure. Without deep familiarity with this client’s architecture, I could not have directed Claude Code toward the right outcomes or caught what it was getting wrong. Without the judgment to read the configuration myself, identify the root cause, and explain it clearly, we would still have been stuck while the client waited.
The AI was fast. I was the one who knew where to go.
That experience settled something in me that months of thinking had not fully resolved.
## The honest answer: yes, parts of this profession are already gone
I want to be direct with you, because I think the tech industry is too full of people cheerfully telling you everything is fine when it is not.
AI is already replacing real parts of the DevOps workflow. Writing Kubernetes manifests? Done in seconds. Boilerplate Terraform? Generated. Standard CI/CD configuration? Automated. Common cluster errors? Diagnosed faster than most engineers.
If your value as a DevOps professional is purely in producing YAML files and running kubectl commands, that is a genuinely vulnerable position. Not in five years. Now.
A junior engineer who spent two years learning to do exactly what AI can now do in thirty seconds is facing real market pressure. I would be dishonest to tell you otherwise.
But AI is not killing DevOps. It is killing commodity DevOps.
There are two very different versions of this profession, and people often talk about them as though they are the same thing.
The first version follows runbooks, copies patterns from the internet, sets up pipelines according to documentation, and resolves tickets that have known solutions. This version is process-driven, repeatable, and largely codifiable. AI is very good at codifiable things.
The second version decides which architecture a company should adopt when the tradeoffs are genuinely unclear. It responds to a production incident at 2am when the symptoms are ambiguous and there is no runbook. It identifies a network issue that an AI has been cycling around for twenty minutes. It earns the trust of a leadership team that has to bet millions of dollars on infrastructure it does not fully understand.
That second version requires judgment. And judgment is not a feature you can prompt.
Judgment comes from the scar tissue of things that went wrong and the hard-won pattern recognition that came out of fixing them. AI has none of that. It has text.
## Where the demand is actually going
Every serious AI company runs on Kubernetes. GPU workloads, model serving pipelines, MLOps infrastructure, distributed training jobs. None of that works without people who understand cloud-native architecture at a deep level. The irony is that AI is driving enormous demand for exactly the infrastructure expertise it is supposedly replacing.
Platform engineering is growing. Security engineering is growing. FinOps, developer experience, compliance automation. These disciplines all require human judgment about risk, cost, and consequences that cannot be automated away because the decisions themselves are not deterministic.
The market is not shrinking. It is moving up the value chain. The engineers who will feel the squeeze are those who stayed at the bottom of it.
## What AI can do and what you still need to own
After everything I have described, here is the clearest way I can summarise it:
| What AI can do well | What you still need to own |
|---|---|
| Writing Kubernetes manifests and Helm charts | Responding to an ambiguous production incident at 2am |
| Generating boilerplate Terraform modules | Making architecture decisions with real business tradeoffs |
| Standard CI/CD pipeline configuration | Telling a CTO something they do not want to hear |
| Diagnosing well-known cluster errors | Identifying a network issue Claude could not resolve on its own |
| Cross-region IAM policy generation | Knowing which Terraform module to trust under pressure |
| Adapting existing IaC for a new region | Earning the trust of a team betting millions on your judgment |
| Catching variable mismatches in config files | Deciding when to override AI output and why |
| Holding infrastructure context in working memory | Understanding a client system built over years of experience |
The left column is real and it is growing every month. Engineers who are mostly living in the left column need to start moving right, deliberately and urgently.
## The real question is not about AI
The real risk is not AI. The real risk is staying commodity.
The engineers who will struggle are not those who face AI. They are those who never developed genuine depth, who always relied on process over thinking, who built careers on doing what they were told rather than deciding what should be done.
That was always a fragile position. AI is just making the fragility more visible, faster.
Use AI every single day. Give it the tasks that do not need your thinking and spend your time on the ones that do. But invest relentlessly in the depth that AI cannot reach. The architectural judgment. The client trust. The ability to walk into a broken system at the worst possible moment and know, from hard-earned experience, exactly where to look.
Because on the night the region goes down, when the backups are moving and the infra is rebuilding and the restore is halfway through and the AI gets stuck, the client is not calling for a tool. They are calling for someone who knows what to do.
Make sure that person is you.
Have you been using AI in your infrastructure work? Where has it impressed you and where has it fallen short? I would genuinely like to know.
## Need Expert DevOps Engineers Who Know When AI Is Not Enough?
The story above is not hypothetical. It is how Tasrie IT Services operates every day, combining AI tooling with deep engineering judgment to deliver results that neither can achieve alone.
Our DevOps consulting team brings the expertise that matters when it counts:
- Battle-tested infrastructure code proven across 50+ client environments
- CKA/CKAD/CKS certified engineers who understand Kubernetes at the deepest level
- Same-day crisis response via Slack and Teams, because outages do not wait for business hours
- Same senior engineer from start to finish, no junior rotation, no handoffs
Whether you need disaster recovery architecture, Kubernetes production support, or a team that can rebuild your entire stack in a new region under pressure, we have been there.
Talk to our engineering team about your infrastructure challenges