✍️ BLOG · IT Operations

The Great Lie of Incident Resolution: no, you are not losing €100K for every hour of downtime


Open any recent report about ITSM, attend any webinar organised by a software vendor, or read the latest Gartner Magic Quadrant. It will not take you five minutes to stumble across the terror metric:

"Every hour your systems are down costs you €100,000. Sometimes more than a million."

They repeat it until you are sick of it. They print it in colourful bar charts. Vendor salespeople tell you it over and over before handing you a six-figure quote for their new "magic" Artificial Intelligence tool.

Let us tell the truth for once: if your company were losing €100,000 for every hour a server goes down, you would not be reading this article on LinkedIn. You would be on your yacht in Ibiza.

Unless you are Santander Bank, Amazon in the middle of Black Friday, or the AENA network, having your CRM down for fifteen minutes, your ERP throwing a 500 error, or the intranet gateway failing to load is not going to bankrupt you. It is an inconvenience, yes. It is a service problem, of course. But it is not the financial apocalypse vendors try to sell you in order to shift licences priced like gold.

Enough with the commercial scaremongering.

The real cost: silent, human and monthly

The real problem with technology incidents is not the minute of downtime. The true bleeding in companies — the kind that genuinely costs money every month — is silent, does not make headlines, and takes the shape of payroll and human burnout.

73% of your developers and engineers spend at least half their working day firefighting (State of AI-First Operations Report, 2026)

That is the real cost. It is not lost sales from an abandoned shopping cart. It is development sprints that are never delivered, technical talent earning €60K a year acting as a telephone operator, and an IT team that is burned out, demotivated and on the verge of filing for stress leave.

We are treating incident resolution as if we were a Formula 1 pit crew, but we look more like a neighbourhood garage trying to fix a flat tyre with a fork.

If you truly want to stop burning money and talent, the first step is to stop believing market statistics and start looking at how your operations actually work. In nearly 25 years in this industry, I cannot recall a single company that knew with certainty how much each hour of downtime cost them.

The 4 doses of reality nobody tells you about

Here come four doses of reality about why your incident management is not working, and why no magic tool will save you if you do not change the process.

Dose 1 of 4

You are still measuring MTTR as if it were a trophy

If the system goes down at 3:00 a.m. and no users are working, the business impact is ZERO. But if you drag an engineer out of bed so the graph looks nice, you have just destroyed their productivity for the next day. We must stop timing repairs with a stopwatch and start measuring team burnout.
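
To make the point concrete, here is a minimal sketch that compares classic MTTR with a rough impact measure. The Incident class, the active_users field, the lost_user_minutes proxy and the two sample incidents are all hypothetical, invented for illustration; they are not taken from any real tool or dataset.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    minutes_down: int
    active_users: int  # assumption: you know how many users were trying to work


def mttr_minutes(incidents):
    """Classic MTTR: mean minutes to restore, blind to who was affected."""
    return sum(i.minutes_down for i in incidents) / len(incidents)


def lost_user_minutes(incident):
    """Hypothetical impact proxy: minutes down multiplied by users affected."""
    return incident.minutes_down * incident.active_users


# Made-up sample data for illustration only.
incidents = [
    Incident("ERP down at 03:00", minutes_down=120, active_users=0),
    Incident("CRM down at 11:00", minutes_down=15, active_users=200),
]

average = mttr_minutes(incidents)
impact = {i.name: lost_user_minutes(i) for i in incidents}
```

In this made-up sample the night-time outage dominates the MTTR figure yet affects nobody, while the short daytime outage is the one that actually hurts. A proxy like this still says nothing about team burnout, which would need its own measure, such as out-of-hours pages per engineer.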

Dose 2 of 4

Escalation tiers are a ping-pong match, not a workflow

L1 support has become a simple "human router". The cost of a manually triaged ticket is around €15; automated, it drops to €2. But the real cost is flooding L2 and L3 with noise and routine tasks that require no grey matter.
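
As a back-of-the-envelope sketch, the arithmetic looks like this. The €15 and €2 figures are the approximate ones quoted above; the monthly ticket volume and the 75% automation share are hypothetical numbers you should replace with your own.

```python
MANUAL_COST_EUR = 15    # approximate cost per manually triaged ticket (from the text)
AUTOMATED_COST_EUR = 2  # approximate cost per automatically triaged ticket (from the text)


def monthly_triage_cost(tickets, automated_share):
    """Monthly triage cost in euros; automated_share is a fraction from 0.0 to 1.0."""
    if not 0.0 <= automated_share <= 1.0:
        raise ValueError("automated_share must be between 0 and 1")
    automated = tickets * automated_share
    manual = tickets - automated
    return manual * MANUAL_COST_EUR + automated * AUTOMATED_COST_EUR


tickets_per_month = 1000  # hypothetical volume
all_manual = monthly_triage_cost(tickets_per_month, 0.0)
mostly_automated = monthly_triage_cost(tickets_per_month, 0.75)
savings = all_manual - mostly_automated
```

Under these assumptions the direct saving is €9,750 a month. That figure leaves out the bigger cost named above: the L2 and L3 hours spent on noise.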

Dose 3 of 4

You want AI, but you do not dare let it touch production

AI is fantastic at reading logs, consolidating alerts and giving context to the engineer. But resolving on its own? Absolutely not. 44% of companies prohibit AI from executing remediation steps without a human in the loop. Deploying it on broken infrastructure is like putting an autopilot on a car with a blown engine.

Dose 4 of 4

You do post-mortems just to tick the box

The document gets filled in, the ticket gets closed... and it is never looked at again. Until three months later the same service goes down for exactly the same reason. 100% of IT leaders acknowledge that post-incident learning is vital. Only 48% actually apply it.


True resilience is not "zero downtime"

We have been sold an idea of unattainable operational perfection, based on buying licences at a ridiculous cost to prevent the system from going down for a single second.

The reality I have seen after years of dealing with infrastructure, escalations, client complaints and engineers on the verge of collapse is very different. This is something those of us who ride motorcycles have deeply internalised: there are only two types of riders — those who have fallen and those who will fall.

Systems will go down. Where there is code there are always bugs. The network fails. True resilience is not having a "99.999% Uptime" counter. Real resilience is having a team that trusts its processes.

An automated Level 1 that stops the garbage. A Level 2 supported by AI that gives engineers context instead of taking their job away. A Level 3 that can sleep soundly knowing they will only be woken up if the house is genuinely on fire, not because someone forgot their password.
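
One way to picture that split is a tiny routing rule. This is a sketch under stated assumptions, not a recipe: the categories, the Alert fields and the thresholds are all hypothetical, and a real policy would come from your own service catalogue.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_RESOLVE = "L1 automation"
    L2_QUEUE = "L2 queue, working hours"
    PAGE_L3 = "page L3 on-call now"


@dataclass
class Alert:
    category: str            # e.g. "password_reset", "service_down"
    users_affected: int
    business_critical: bool


# Hypothetical list of routine requests that automation can close on its own.
ROUTINE_CATEGORIES = {"password_reset", "account_unlock", "disk_cleanup"}


def route(alert: Alert) -> Route:
    """Decide who handles an alert: automation, the L2 queue, or an L3 page."""
    if alert.category in ROUTINE_CATEGORIES:
        return Route.AUTO_RESOLVE
    # Only wake someone up when a critical service is down for real users.
    if alert.business_critical and alert.users_affected > 0:
        return Route.PAGE_L3
    return Route.L2_QUEUE
```

With this rule, a forgotten password never reaches a human, and a critical service that fails while nobody is using it waits in the L2 queue instead of waking anyone up.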

You are not Santander Bank, and you do not need to be. You just need to stop buying smoke and start organising the chaos in your IT department, respecting the time, mental health and talent of the people who keep the lights on.


Is your IT department putting out the same fires every week?

We do a rapid diagnosis of your operations management and tell you exactly where the bleeding is. No commitment required.
