KEDA: Event-Driven Autoscaling
An interactive deep dive into Kubernetes Event-Driven Autoscaling—architecture, two-phase scaling, trigger configurations, and production operations.
The Problem KEDA Solves
KEDA (Kubernetes Event-Driven Autoscaling) is a CNCF Graduated project that extends Kubernetes with event-driven scaling capabilities, most notably the ability to scale workloads to zero—something native Kubernetes HPA cannot do.
The system operates through a two-phase architecture: the KEDA operator handles the 0↔1 replica transition while delegating 1↔N scaling to Kubernetes' native HPA controller through a custom external metrics adapter. Rather than reimplementing HPA logic, this design maximizes compatibility while minimizing maintenance burden.
Scale to Zero
Eliminate costs for idle workloads by scaling to zero replicas when there's no demand.
70+ Scalers
Connect to Kafka, Prometheus, Azure, AWS, databases, and dozens of other event sources.
HPA Compatible
Works with existing Kubernetes HPA, extending rather than replacing native scaling.
The Three-Component Architecture
KEDA deploys three distinct components in your cluster—the operator, the metrics API server, and the admission webhooks—each serving a specific architectural purpose. Click on any component to explore its role.
Key insight: Only one provider can register the external.metrics.k8s.io API per cluster. If you're running Datadog's cluster agent or another external metrics provider, you'll face silent conflicts.

Since KEDA v2.9, internal communication between the metrics server and operator uses gRPC on port 9666. The operator hosts a gRPC server that maintains scaler connections and caches metrics, while the metrics apiserver acts as a gRPC client. This architecture reduces connection overhead—instead of both components independently connecting to external sources, only the operator maintains those connections.
The Scaling Flow in Action
Watch how KEDA processes events from external sources and triggers scaling decisions through its components and into the Kubernetes control plane.
0 → 1 Scaling (KEDA)
When replicas are at zero, the KEDA operator directly handles activation based on the activationThreshold. This bypasses HPA entirely.
1 → N Scaling (HPA)
Once activated, HPA takes over using the external metrics served by KEDA. Scaling decisions use the standard threshold parameter.
The Two-Phase Scaling Model
The fundamental innovation in KEDA is scale-to-zero capability, implemented through a two-phase scaling model that works around HPA's limitation of requiring at least one replica.
Phase 1: Activation (0↔1)
Managed entirely by the KEDA operator. When your workload has zero replicas, KEDA's polling loop checks triggers at the configured pollingInterval.
The activationThreshold controls this boundary—the workload activates when the metric value exceeds (not equals) this threshold.
Phase 2: Scaling (1↔N)
Delegated to Kubernetes HPA. KEDA creates and manages an HPA resource for each ScaledObject, configuring it with external metrics.
The standard threshold parameter controls HPA's scaling decisions: it sets the target metric value per replica, from which HPA computes the desired replica count.
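The two phases can be sketched as a single decision function. This is an illustrative simplification, not the real controller logic: it uses the standard HPA rule for an external metric with an average-value target, desired = ceil(metric / threshold), and elides cooldownPeriod and stabilization windows.

```python
import math

def desired_replicas(current: int, metric: float,
                     activation_threshold: float, threshold: float,
                     max_replicas: int = 10) -> int:
    """Simplified sketch of KEDA's two-phase scaling decision."""
    if current == 0:
        # Phase 1 (KEDA operator): activate only when the metric
        # strictly exceeds activationThreshold.
        return 1 if metric > activation_threshold else 0
    if metric <= activation_threshold:
        # All triggers inactive: KEDA eventually scales back to zero
        # (only after cooldownPeriod, elided here).
        return 0
    # Phase 2 (HPA): desired = ceil(metric / threshold), where
    # threshold is the target metric value per replica.
    return min(max_replicas, max(1, math.ceil(metric / threshold)))
```

With the thresholds from the example below (activationThreshold 50, threshold 100), a metric value of 450 yields five replicas, while a value of 40 keeps an idle workload at zero.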
Interactive Threshold Simulator
Experiment with activation and scaling thresholds. Drag the slider or click scenario buttons to see how KEDA and HPA coordinate scaling decisions.
Threshold Zones
Replicas: 0
Workload scaled to zero. KEDA is polling triggers at the configured interval.
triggers:
- type: prometheus
metadata:
threshold: "100" # HPA scales when metric > 100/replica
activationThreshold: "50" # KEDA activates when metric > 50

Only after all triggers have been inactive for cooldownPeriod seconds (default 300) will KEDA scale the workload to zero. However, scaling from N replicas down to 1 is controlled by HPA's stabilization window, not KEDA's cooldownPeriod.

Trigger Configurations
Triggers (scalers) are the heart of KEDA—they connect your workloads to external event sources. Each trigger type knows how to query a specific system and return metric values for scaling decisions. Explore common production configurations below.
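As one concrete illustration, a minimal ScaledObject for a Kafka consumer might look like the following sketch. The deployment, topic, and consumer-group names are hypothetical; the trigger metadata fields follow KEDA's Kafka scaler.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-consumer-scaler    # hypothetical name
spec:
  scaleTargetRef:
    name: order-consumer         # hypothetical Deployment
  pollingInterval: 30            # seconds between trigger checks (default 30)
  cooldownPeriod: 300            # seconds of inactivity before scaling to zero (default 300)
  minReplicaCount: 0             # enables scale to zero
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092   # hypothetical address
        consumerGroup: order-consumers
        topic: orders
        lagThreshold: "50"       # target consumer lag per replica
```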
Five Critical Production Gotchas
These are the gotchas that have caused the most debugging sessions and production incidents. Understanding them before deployment will save you significant time.
Pro tip: Before deploying KEDA ScaledObjects to production, run through this checklist and test each scenario in a staging environment. Most production incidents can be prevented by understanding these edge cases.
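One mitigation worth calling out from that checklist: KEDA's fallback field pins a ScaledObject to a known replica count when a scaler keeps failing, instead of leaving scaling frozen at the last observed value. A hedged sketch, with hypothetical names and query:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-scaler               # hypothetical
spec:
  scaleTargetRef:
    name: api-deployment         # hypothetical
  fallback:
    failureThreshold: 3          # consecutive failed scaler checks before fallback applies
    replicas: 5                  # hold this replica count while the trigger is unreachable
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # hypothetical
        query: sum(rate(http_requests_total[2m]))         # hypothetical
        threshold: "100"
```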
Production Deployment & Operations
For production environments, deploy KEDA via Helm with explicit resource requests. KEDA components are lightweight but scale with the number of ScaledObjects.
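The override values in this section can be applied with a standard install from the official kedacore chart repository; chart and namespace names below match the project defaults:

```shell
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda --create-namespace \
  --values values.yaml
```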
operator:
replicaCount: 2 # HA with leader election
resources:
requests:
cpu: 100m
memory: 100Mi
limits:
cpu: 1
memory: 1000Mi
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values: [keda-operator]
topologyKey: kubernetes.io/hostname
metricsServer:
replicaCount: 1 # Cannot exceed 1 (single metrics provider)
prometheus:
operator:
enabled: true
serviceMonitor:
enabled: true

Key Prometheus Metrics
Scaler Health
keda_scaler_active — 1 if active, 0 if inactive
keda_scaler_metrics_value — Current metric value
keda_scaler_detail_errors_total — Error count
System Health
keda_scaled_object_paused — Paused status
keda_scaled_object_errors_total — Errors
keda_internal_scale_loop_latency — Loop latency
- alert: KEDAScalerErrors
expr: rate(keda_scaler_detail_errors_total[5m]) > 0
for: 5m
annotations:
summary: "KEDA scaler {{ $labels.scaler }} experiencing errors"
- alert: KEDAScalingLoopLatency
expr: keda_internal_scale_loop_latency_seconds > 30
for: 5m
annotations:
summary: "KEDA scaling loop delayed beyond threshold"

Compatibility & Troubleshooting
| KEDA Version | Kubernetes Support | Notes |
|---|---|---|
| v2.17 | v1.30 - v1.32 | Latest |
| v2.16 | v1.29 - v1.31 | |
| v2.15 | v1.28 - v1.30 | |
| v2.14 | v1.27 - v1.29 | |
Note: KEDA maintains N-2 Kubernetes version support with a 3-month release cadence. Always check the official documentation before upgrading.
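When scaling misbehaves, a few standard kubectl checks cover most failures: ScaledObject status conditions, operator logs, and the external metrics API registration. Object and namespace names here are hypothetical; the keda namespace matches the default Helm install.

```shell
# ScaledObject status: READY/ACTIVE columns and status conditions
kubectl get scaledobject --all-namespaces
kubectl describe scaledobject my-scaler -n my-namespace

# Operator logs surface trigger and scaler errors
kubectl logs -n keda deployment/keda-operator

# Verify KEDA is the registered external metrics provider
kubectl get apiservice v1beta1.external.metrics.k8s.io
```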
ScaledJobs for Batch Workloads
ScaledJobs create Kubernetes Jobs rather than scaling replica counts, appropriate for batch processing where each job runs to completion.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
name: ml-batch-processor
spec:
jobTargetRef:
parallelism: 1
completions: 1
activeDeadlineSeconds: 600
backoffLimit: 4
template:
spec:
nodeSelector:
nvidia.com/gpu.present: "true"
containers:
- name: ml-worker
image: ml-inference:latest
resources:
limits:
nvidia.com/gpu: "1"
memory: "32Gi"
restartPolicy: Never
pollingInterval: 15
successfulJobsHistoryLimit: 10
failedJobsHistoryLimit: 5
maxReplicaCount: 20
scalingStrategy:
strategy: "accurate" # Best for queues without locked messages
triggers:
- type: rabbitmq
metadata:
queueName: ml-inference-queue
mode: QueueLength
value: "1"

default — maxScale - runningJobCount
accurate — maxScale - pendingJobCount (best for queues)
eager — immediately scales to max when queue is non-zero
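The three strategies reduce to simple arithmetic. The sketch below is a simplification of the controller logic, using the formulas listed above; maxScale stands for the work-derived cap (queue length divided by value, bounded by maxReplicaCount).

```python
def jobs_to_create(strategy: str, max_scale: int,
                   running: int, pending: int) -> int:
    """Simplified sketch of ScaledJob scaling strategies.

    max_scale: cap derived from queue length and maxReplicaCount
    running:   Jobs currently executing
    pending:   Jobs created but not yet running
    """
    if strategy == "default":
        n = max_scale - running
    elif strategy == "accurate":
        # Assumes queued messages are not locked by running jobs,
        # so only jobs that haven't picked up work are subtracted.
        n = max_scale - pending
    elif strategy == "eager":
        # Scale straight to the maximum whenever there is any work.
        n = max_scale
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(0, n)
```

For example, with a cap of 10, 4 running Jobs, and 2 pending Jobs, default creates 6 new Jobs, accurate creates 8, and eager creates 10.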
Conclusion
KEDA's architecture reflects thoughtful engineering decisions—delegating 1↔N scaling to HPA rather than reimplementing it, using gRPC for efficient internal communication, and providing a clean separation between activation (0↔1) and scaling (1↔N) concerns.
For production deployment, focus on understanding the two-phase model, configuring HPA behavior for scale-down control, implementing fallback configurations, and monitoring scaler health through the exposed Prometheus metrics. The scale-to-zero capability combined with event-driven triggers makes KEDA particularly valuable for batch processing, ML inference workloads, and cost-sensitive environments where idle resources should be eliminated entirely.
Resources
An interactive visualization of KEDA architecture and operations
← Back to Articles