
KEDA: Event-Driven Autoscaling

An interactive deep dive into Kubernetes Event-Driven Autoscaling—architecture, two-phase scaling, trigger configurations, and production operations.

The Problem KEDA Solves

KEDA (Kubernetes Event-Driven Autoscaling) is a CNCF Graduated project that extends Kubernetes with event-driven scaling capabilities, most notably the ability to scale workloads to zero—something native Kubernetes HPA cannot do.

The system uses a two-phase architecture: the KEDA operator handles the 0↔1 replica transition, while 1↔N scaling is delegated to Kubernetes' native HPA controller through a custom external metrics adapter. Reusing HPA rather than reimplementing its logic is a pragmatic choice that maximizes compatibility while minimizing maintenance burden.

🎯

Scale to Zero

Eliminate costs for idle workloads by scaling to zero replicas when there's no demand.

📊

70+ Scalers

Connect to Kafka, Prometheus, Azure, AWS, databases, and dozens of other event sources.

HPA Compatible

Works with existing Kubernetes HPA, extending rather than replacing native scaling.

The Three-Component Architecture

KEDA deploys three distinct components in your cluster, each serving a specific architectural purpose. Click on any component to explore its role.

[Architecture diagram: the keda namespace contains the keda-operator (controller runtime, gRPC :9666), the metrics-apiserver (External Metrics API, :443), and the admission-webhooks (validation, :9443). The operator connects over gRPC to 70+ scalers (Kafka, Prometheus, Azure, ...) and watches ScaledObjects. The native HPA controller handles 1↔N scaling, reading metrics from the metrics apiserver; ScaledObjects and generated HPAs are stored in etcd; target workloads (Deployments, StatefulSets, Pods) are scaled accordingly.]



Critical Constraint: Kubernetes only supports one external metrics provider per cluster. If you're running Datadog's cluster agent or another provider that registers for external.metrics.k8s.io, you'll face silent conflicts.

Since KEDA v2.9, internal communication between the metrics server and operator uses gRPC on port 9666. The operator hosts a gRPC server that maintains scaler connections and caches metrics, while the metrics apiserver acts as a gRPC client. This architecture reduces connection overhead—instead of both components independently connecting to external sources, only the operator maintains those connections.

The Scaling Flow in Action

Watch how KEDA processes events from external sources and triggers scaling decisions through its components and into the Kubernetes control plane.

The flow moves from external systems, through KEDA, into the Kubernetes control plane:

  1. 📨 Event Source (Kafka, Prometheus, etc.): events or metrics accumulate in the external system.
  2. 🔍 Scaler: queries the event source for current metric values.
  3. 🧠 KEDA Operator: evaluates the values against the configured thresholds.
  4. 📊 Metrics Server: serves the values to HPA via the external metrics API.
  5. ⚖️ HPA Controller: calculates the desired replica count.
  6. 🚀 Deployment: pods are scaled to match.

0 → 1 Scaling (KEDA)

When replicas are at zero, the KEDA operator directly handles activation based on the activationThreshold. This bypasses HPA entirely.

1 → N Scaling (HPA)

Once activated, HPA takes over using the external metrics served by KEDA. Scaling decisions use the standard threshold parameter.

The Two-Phase Scaling Model

The fundamental innovation in KEDA is scale-to-zero capability, implemented through a two-phase scaling model that works around HPA's limitation of requiring at least one replica.

Phase 1: Activation (0↔1)

Managed entirely by the KEDA operator. When your workload has zero replicas, KEDA's polling loop checks triggers at the configured pollingInterval.

The activationThreshold controls this boundary—the workload activates when the metric value exceeds (not equals) this threshold.

Phase 2: Scaling (1↔N)

Delegated to Kubernetes HPA. KEDA creates and manages an HPA resource for each ScaledObject, configuring it with external metrics.

The standard threshold parameter controls HPA's scaling decisions, determining how many replicas are needed based on the target per replica.
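Putting both phases together, a minimal ScaledObject might look like the following sketch (the workload name, Prometheus address, and query are illustrative placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler            # hypothetical name
spec:
  scaleTargetRef:
    name: worker                 # hypothetical target Deployment
  minReplicaCount: 0             # enables scale-to-zero (Phase 1, KEDA)
  maxReplicaCount: 10            # upper bound enforced by the generated HPA
  pollingInterval: 30            # seconds between trigger checks while at zero
  cooldownPeriod: 300            # wait before scaling back down to zero
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # placeholder
        query: sum(rate(http_requests_total[2m]))         # illustrative query
        threshold: "100"           # Phase 2: HPA targets 100 per replica
        activationThreshold: "50"  # Phase 1: KEDA activates above 50
```

With this configuration, KEDA wakes the Deployment once the query result exceeds 50, and the generated HPA then sizes it at roughly one replica per 100 units of the metric.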

Threshold Zones: A Worked Example

Consider a configuration with activationThreshold: 50 and threshold: 100. The current metric value falls into one of three zones, and each zone is owned by a different component:

  • Metric at or below 50 — Scale to 0. The workload stays at (or returns to) zero replicas. KEDA keeps polling triggers at the configured interval.
  • Metric above 50, at or below 100 — 0→1 (KEDA). The operator activates the workload to one replica; HPA is not yet involved.
  • Metric above 100 — 1→N (HPA). HPA scales out, targeting roughly one replica per 100 units of the metric.

ScaledObject configuration
triggers:
  - type: prometheus
    metadata:
      threshold: "100"           # HPA scales when metric > 100 per replica
      activationThreshold: "50"  # KEDA activates when metric > 50
Cooldown Gotcha: The cooldownPeriod only applies when scaling to zero. When all triggers become inactive, KEDA starts a countdown timer. Only after cooldownPeriod seconds (default 300) will it scale to zero. However, scaling from N replicas down to 1 is controlled by HPA's stabilization window, not KEDA's cooldownPeriod.
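The N→1 scale-down behavior can be tuned through the HPA configuration that KEDA passes through under spec.advanced — a sketch with illustrative values:

```yaml
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300  # HPA waits 5m before dropping replicas
          policies:
            - type: Percent
              value: 50                    # remove at most 50% of replicas
              periodSeconds: 60            # per minute
  cooldownPeriod: 600                      # separate timer; only governs the final step to zero
```

Note the two independent knobs: stabilizationWindowSeconds governs N→1, while cooldownPeriod governs 1→0.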

Trigger Configurations

Triggers (scalers) are the heart of KEDA—they connect your workloads to external event sources. Each trigger type knows how to query a specific system and return metric values for scaling decisions. Explore common production configurations below.
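As one concrete example, a Kafka trigger scales on consumer-group lag; the broker address, group, and topic names below are placeholders:

```yaml
triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.svc:9092   # placeholder broker address
      consumerGroup: order-processors    # placeholder consumer group
      topic: orders                      # placeholder topic
      lagThreshold: "50"                 # target lag per replica (HPA phase)
      activationLagThreshold: "10"       # lag required to wake from zero (KEDA phase)
```

The same two-phase pattern recurs across scaler types: a scaling threshold consumed by HPA, plus an activation threshold consumed by the operator.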


Five Critical Production Gotchas

These are the gotchas that have caused the most debugging sessions and production incidents. Understanding them before deployment will save you significant time.

Pro tip: Before deploying KEDA ScaledObjects to production, run through this checklist and test each scenario in a staging environment. Most production incidents can be prevented by understanding these edge cases.
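One safeguard worth exercising in staging is the fallback block, which pins a replica count when a scaler repeatedly fails to return metrics (values and the Prometheus details below are illustrative):

```yaml
spec:
  fallback:
    failureThreshold: 3   # consecutive failed metric reads before fallback kicks in
    replicas: 5           # hold this replica count while the scaler is unreachable
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # placeholder
        query: sum(rate(jobs_pending[1m]))                # illustrative query
        threshold: "20"
```

Without a fallback, a scaler outage can leave the workload frozen at its last-known replica count, which may be zero.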

Production Deployment & Operations

For production environments, deploy KEDA via Helm with explicit resource requests. KEDA components are lightweight but scale with the number of ScaledObjects.

Production Helm values
operator:
  replicaCount: 2  # HA with leader election
  resources:
    requests:
      cpu: 100m
      memory: 100Mi
    limits:
      cpu: 1
      memory: 1000Mi
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values: [keda-operator]
          topologyKey: kubernetes.io/hostname

metricsServer:
  replicaCount: 1  # Cannot exceed 1 (single metrics provider)

prometheus:
  operator:
    enabled: true
    serviceMonitor:
      enabled: true
HA Limitation: While you can run 2 operator replicas with leader election for failover, the metrics server effectively runs as a singleton due to the external metrics API constraint.

Key Prometheus Metrics

Scaler Health

  • keda_scaler_active — 1 if active, 0 if inactive
  • keda_scaler_metrics_value — Current metric value
  • keda_scaler_detail_errors_total — Error count

System Health

  • keda_scaled_object_paused — Paused status
  • keda_scaled_object_errors_total — Errors
  • keda_internal_scale_loop_latency — Loop latency
Essential alerting rules
- alert: KEDAScalerErrors
  expr: rate(keda_scaler_detail_errors_total[5m]) > 0
  for: 5m
  annotations:
    summary: "KEDA scaler {{ $labels.scaler }} experiencing errors"

- alert: KEDAScalingLoopLatency
  expr: keda_internal_scale_loop_latency_seconds > 30
  for: 5m
  annotations:
    summary: "KEDA scaling loop delayed beyond threshold"

Compatibility & Troubleshooting

v2.17 (latest) — Kubernetes v1.30 - v1.32
  • Latest stable release
  • Enhanced Prometheus scaler performance
  • New scalers added

v2.16 — Kubernetes v1.29 - v1.31
  • Improved fallback reliability
  • gRPC optimizations

v2.15 — Kubernetes v1.28 - v1.30
  • Workload Identity improvements
  • Strict trigger validation
  • ⚠️ Removed Azure Pod Identity support
  • ⚠️ Empty triggers: [] now fails validation

v2.14 — Kubernetes v1.27 - v1.29
  • New scalers for cloud platforms
  • ⚠️ Removed metricName from trigger metadata
  • ⚠️ Use trigger.name instead

Note: KEDA maintains N-2 Kubernetes version support with a 3-month release cadence. Always check the official documentation before upgrading.

ScaledJobs for Batch Workloads

ScaledJobs create Kubernetes Jobs rather than scaling replica counts, appropriate for batch processing where each job runs to completion.

ScaledJob for ML batch processing
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: ml-batch-processor
spec:
  jobTargetRef:
    parallelism: 1
    completions: 1
    activeDeadlineSeconds: 600
    backoffLimit: 4
    template:
      spec:
        nodeSelector:
          nvidia.com/gpu.present: "true"
        containers:
        - name: ml-worker
          image: ml-inference:latest
          resources:
            limits:
              nvidia.com/gpu: "1"
              memory: "32Gi"
        restartPolicy: Never

  pollingInterval: 15
  successfulJobsHistoryLimit: 10
  failedJobsHistoryLimit: 5
  maxReplicaCount: 20

  scalingStrategy:
    strategy: "accurate"  # Best for queues without locked messages

  triggers:
  - type: rabbitmq
    metadata:
      queueName: ml-inference-queue
      mode: QueueLength
      value: "1"

The three scaling strategies differ in how they count existing jobs:

  • default — maxScale - runningJobCount
  • accurate — maxScale - pendingJobCount (best for queues)
  • eager — immediately scales to max when the queue is non-zero

Conclusion

KEDA's architecture reflects thoughtful engineering decisions—delegating 1↔N scaling to HPA rather than reimplementing it, using gRPC for efficient internal communication, and providing a clean separation between activation (0↔1) and scaling (1↔N) concerns.

For production deployment, focus on understanding the two-phase model, configuring HPA behavior for scale-down control, implementing fallback configurations, and monitoring scaler health through the exposed Prometheus metrics. The scale-to-zero capability combined with event-driven triggers makes KEDA particularly valuable for batch processing, ML inference workloads, and cost-sensitive environments where idle resources should be eliminated entirely.

KEDA: Deep Technical Architecture and Production Operations Guide | ASleekGeek