Microservices: when they make sense and when they don't
Adopting microservices without the right organisational context is one of the most expensive architectural decisions we have seen in real projects. An 8-engineer team building a SaaS platform as 15 independent microservices does not gain agility. It gains operational overhead, network latency and debugging complexity, without the independent scalability that would justify the architecture.
Microservices are the right answer when business domains are distinct enough to need independent deployment cycles, when scalability requirements vary significantly between components, and when teams are organised so that Conway's law favours separation. Without those conditions, a well-designed modular monolith (a single deployable application split into modules with explicit, enforced boundaries) operates with less friction and lower coordination costs.
The transition from monolith to microservices, when it does make sense, is best done incrementally: extract first the components whose scalability requirements or rate of change differ most from the rest of the system, not the smallest or easiest ones. The most common anti-pattern is to start decomposing where the team feels most technically confident, rather than where the separation generates the most value.
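One way to keep that prioritisation honest is to rank candidates by how far they diverge from the rest of the system rather than by ease. A minimal sketch, assuming hypothetical per-component metrics: deployments per month as a proxy for rate of change, and peak requests per second as a proxy for scaling needs. The scoring rule (larger of the two ratios against the median component) is an illustrative choice, not an established formula.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Component:
    name: str
    deploys_per_month: float  # hypothetical proxy for rate of change
    peak_rps: float  # hypothetical proxy for scaling requirements
    ease: int  # team's perceived ease of extraction, 1 (hard) to 5 (easy)


def extraction_order(components: list[Component]) -> list[str]:
    """Order extraction candidates by divergence from the rest of the system.

    Divergence is the larger of two ratios against the median component:
    rate of change and peak load. Ease only breaks ties.
    """
    med_deploys = median(c.deploys_per_month for c in components)
    med_rps = median(c.peak_rps for c in components)
    if med_deploys <= 0 or med_rps <= 0:
        raise ValueError("metrics must be positive to compute ratios")

    def divergence(c: Component) -> float:
        return max(c.deploys_per_month / med_deploys, c.peak_rps / med_rps)

    # Highest divergence first; among equals, prefer the easier extraction.
    ranked = sorted(components, key=lambda c: (-divergence(c), -c.ease))
    return [c.name for c in ranked]


if __name__ == "__main__":
    candidates = [  # made-up numbers for illustration
        Component("search", deploys_per_month=4, peak_rps=2000, ease=2),
        Component("user-profile", deploys_per_month=1, peak_rps=50, ease=5),
        Component("notifications", deploys_per_month=2, peak_rps=100, ease=5),
        Component("pricing", deploys_per_month=20, peak_rps=150, ease=1),
    ]
    print(extraction_order(candidates))
    # ['search', 'pricing', 'notifications', 'user-profile']
```

In this example the two components the team rated easiest to extract come last, which is the opposite of the order a confidence-driven decomposition would pick.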
Kubernetes in production: the operational reality
Kubernetes solves real container orchestration problems, but introduces an abstraction layer with its own operational complexity. The adoption cost is not just initial learning time — it is the ongoing cost of running a cluster in production: version upgrades, networking management (CNI, service mesh), persistent storage, control plane security, and incident response on a platform with multiple abstraction layers.
For most companies without dedicated platform teams, managed Kubernetes services (EKS, AKS, GKE) with a well-defined operating model are the most sensible path. The question 'should we use Kubernetes?' should be preceded by 'what platform capability do we have, and what do we want to have?'. Many workloads moved to Kubernetes would be better served by Cloud Run, Azure Container Apps or AWS App Runner, which abstract away cluster management while keeping the benefits of containers.
When Kubernetes is the right choice, investment in GitOps (ArgoCD, Flux) to manage cluster state as code is essential. Manually managed clusters accumulate configuration drift that manifests as hard-to-diagnose incidents and version migrations that consume weeks instead of days. Declarative state in git is not an optional practice — it is the difference between an auditable cluster and one that only the team that built it understands.
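At its core, what a GitOps controller does continuously is compare the desired state declared in git with the live state read from the cluster. The following is a simplified Python illustration of that comparison, not ArgoCD's or Flux's actual diffing algorithm. It assumes manifests are plain dicts, compares lists as whole values, and ignores fields the cluster adds on its own (defaults, `status`) by checking only keys declared in git.

```python
from typing import Any


def find_drift(desired: dict[str, Any], live: dict[str, Any], path: str = "") -> list[str]:
    """List fields whose live value differs from the value declared in git.

    Only keys present in `desired` are checked, so fields the cluster adds
    itself (defaults, status, resourceVersion) never count as drift.
    """
    drift: list[str] = []
    for key, want in desired.items():
        here = f"{path}.{key}" if path else key
        if key not in live:
            drift.append(f"{here}: declared in git but missing in cluster")
        elif isinstance(want, dict) and isinstance(live[key], dict):
            drift.extend(find_drift(want, live[key], here))  # recurse into nested objects
        elif live[key] != want:
            drift.append(f"{here}: git={want!r} cluster={live[key]!r}")
    return drift


if __name__ == "__main__":
    desired = {"metadata": {"labels": {"app": "shop"}}, "spec": {"replicas": 3}}
    # Live object after someone ran `kubectl scale --replicas=5` by hand.
    live = {
        "metadata": {"labels": {"app": "shop"}, "resourceVersion": "81234"},
        "spec": {"replicas": 5},
        "status": {"readyReplicas": 5},
    }
    print(find_drift(desired, live))
    # ['spec.replicas: git=3 cluster=5']
```

A manual `kubectl scale` like the one above is exactly the kind of change that, left undetected, turns into the hard-to-diagnose drift described earlier. A GitOps controller flags it (and can revert it) on its next reconciliation.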
DevSecOps: integrating security without slowing delivery
DevSecOps is not adding a vulnerability scanner to the CI/CD pipeline and calling it good security practice. It is a shift in how the team thinks about security: from a final gate that slows deployment to an integrated discipline that catches problems before they reach production.
The pipeline controls with the greatest impact are software composition analysis (SCA) to detect dependencies with known CVEs, static application security testing (SAST) for dangerous code patterns, container image scanning before pushing to the registry, and policy-as-code (OPA/Gatekeeper) to validate Kubernetes manifests against security policies before they reach production. All of these controls must be fast (under 3 minutes), or the team will disable them the first time they slow down an urgent deployment.
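Policy-as-code rules for OPA/Gatekeeper are normally written in Rego. To stay in one language, here is the same kind of manifest check sketched in Python. The three rules (explicit non-`latest` image tag, no privileged containers, mandatory resource limits) are illustrative assumptions rather than a recommended policy set, and `violations` is a hypothetical helper. Because it is a pure function over the manifest, it runs in milliseconds, well inside the time budget above.

```python
def violations(manifest: dict) -> list[str]:
    """Return policy violations for a Deployment-style manifest (dict form)."""
    problems: list[str] = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # Rule 1: explicit, non-latest tag. Only the last path segment is
        # inspected so a registry port ("host:5000/app") is not mistaken for a tag.
        image = container.get("image", "")
        last_segment = image.rsplit("/", 1)[-1]
        if ":" not in last_segment or last_segment.endswith(":latest"):
            problems.append(f"{name}: image needs an explicit, non-latest tag")

        # Rule 2: no privileged containers.
        if container.get("securityContext", {}).get("privileged", False):
            problems.append(f"{name}: privileged containers are not allowed")

        # Rule 3: resource limits must be declared.
        if "limits" not in container.get("resources", {}):
            problems.append(f"{name}: resource limits are required")
    return problems


if __name__ == "__main__":
    bad = {"spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "nginx:latest", "securityContext": {"privileged": True}},
    ]}}}}
    for problem in violations(bad):
        print(problem)
```

In a pipeline, a thin wrapper would load each manifest, call `violations`, and exit non-zero if any list is non-empty, which gives developers the immediate, automated feedback this section argues for.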
The hardest part of DevSecOps is not technical; it is cultural. Development and security teams have historically had different incentives: speed vs. control. Integration works when security stops being the team that says no and becomes the team that enables, providing automated guardrails that give immediate feedback instead of manual reviews that block for weeks. The investment in 'shift left' tooling pays off not only in prevented incidents, but in the relationship between teams.
Cloud-native observability: from logs to correlated signals
In cloud-native architectures, observability is not optional — it is the only way to understand what is happening in a distributed system. A service that fails in production does not have a single clear failure point; it has a chain of latencies, timeouts and errors that propagate between services and are only intelligible with complete distributed traces.
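A trace makes that chain of latencies explicit. As a minimal sketch, assuming spans carry only an id, a parent id, a service name and start/end timestamps (real OpenTelemetry spans carry much more), the code below computes each span's self time: its duration minus the time spent waiting on child calls. That shows where latency actually accumulates rather than where it surfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start_ms: float
    end_ms: float


def _covered(intervals: list[tuple[float, float]]) -> float:
    """Total length of the union of intervals (parallel calls counted once)."""
    total = 0.0
    cur_start: Optional[float] = None
    cur_end: Optional[float] = None
    for start, end in sorted(intervals):
        if end <= start:
            continue
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def self_times(spans: list[Span]) -> dict[str, float]:
    """Span duration minus the time spent waiting on its children."""
    children: dict[str, list[Span]] = {}
    for s in spans:
        if s.parent_id is not None:
            children.setdefault(s.parent_id, []).append(s)
    result: dict[str, float] = {}
    for s in spans:
        # Clip children to the parent's window before merging them.
        kids = [(max(c.start_ms, s.start_ms), min(c.end_ms, s.end_ms))
                for c in children.get(s.span_id, [])]
        result[s.span_id] = (s.end_ms - s.start_ms) - _covered(kids)
    return result


def slowest_service(spans: list[Span]) -> str:
    """Service that accumulates the most self time across the trace."""
    own = self_times(spans)
    per_service: dict[str, float] = {}
    for s in spans:
        per_service[s.service] = per_service.get(s.service, 0.0) + own[s.span_id]
    return max(per_service, key=per_service.get)


if __name__ == "__main__":
    trace = [  # made-up trace: the user sees a slow gateway call
        Span("a", None, "gateway", 0, 1000),
        Span("b", "a", "orders", 10, 990),
        Span("c", "b", "inventory", 20, 120),
        Span("d", "b", "payments", 130, 960),
        Span("e", "d", "fraud", 140, 200),
    ]
    print(slowest_service(trace))  # payments
```

Here the gateway reports a 1000 ms request, but 770 ms of it is self time in `payments` (its own work or uninstrumented calls), which is the kind of answer a log line from the gateway alone cannot give.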
OpenTelemetry has become the de facto standard for observability instrumentation in cloud-native systems. The promise of vendor neutrality is real: instrumenting once and sending telemetry to any backend (Jaeger, Tempo, Datadog, Honeycomb) reduces the observability lock-in that has trapped many teams in costly contracts. Automatic instrumentation for the most common frameworks (Express, FastAPI, Spring Boot, .NET) covers around 80% of the necessary spans with minimal configuration.
The current frontier is signal correlation: automatically relating a degraded trace to the deployment that caused it, to the infrastructure metric that explains it, and to the log that confirms it. Leading observability platforms (Grafana Stack, Honeycomb, Datadog APM) are moving in this direction with ML models that can cut root-cause identification time from hours to minutes. In 2025, the quality of observability in a cloud-native system is as decisive for meeting SLAs as the application architecture.
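The simplest form of deployment correlation is a time-window join, far short of the ML models these platforms use but useful for showing the idea. A minimal sketch, assuming a hypothetical list of deployment events and the set of services that appear on the degraded trace; the 30-minute default lookback is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Deployment:
    service: str
    version: str
    at: datetime


def suspect_deployments(
    deployments: list[Deployment],
    degraded_services: set[str],
    onset: datetime,
    lookback: timedelta = timedelta(minutes=30),
) -> list[Deployment]:
    """Deployments to services on the degraded path within `lookback` of onset.

    Most recent first: the change that landed closest to the degradation is
    the first one to check against metrics and logs.
    """
    window_start = onset - lookback
    hits = [
        d for d in deployments
        if d.service in degraded_services and window_start <= d.at <= onset
    ]
    return sorted(hits, key=lambda d: d.at, reverse=True)


if __name__ == "__main__":
    def at(hh: int, mm: int) -> datetime:
        return datetime(2025, 3, 4, hh, mm)  # made-up timestamps

    history = [
        Deployment("orders", "1.8.0", at(13, 5)),  # outside the 30-minute window
        Deployment("fraud", "0.9.2", at(14, 2)),
        Deployment("payments", "2.3.1", at(14, 12)),
        Deployment("search", "9.0.0", at(14, 15)),  # not on the degraded path
    ]
    path = {"gateway", "orders", "payments", "fraud"}
    for d in suspect_deployments(history, path, onset=at(14, 20)):
        print(d.service, d.version)
    # payments 2.3.1
    # fraud 0.9.2
```

Filtering by the services on the degraded trace is what keeps the unrelated `search` deployment out of the suspect list, even though it landed closest to the onset.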