KEDA: Event-Driven Autoscaling
An interactive deep dive into Kubernetes Event-Driven Autoscaling—architecture, two-phase scaling, trigger configurations, and production operations.
The Problem KEDA Solves
KEDA (Kubernetes Event-Driven Autoscaling) is a CNCF Graduated project that extends Kubernetes with event-driven scaling capabilities, most notably the ability to scale workloads to zero—something native Kubernetes HPA cannot do.
The system operates through a two-phase architecture: the KEDA operator handles the 0↔1 replica transition while delegating 1↔N scaling to Kubernetes' native HPA controller through a custom external metrics adapter. Rather than reimplementing HPA logic, this design maximizes compatibility while minimizing maintenance burden.
Scale to Zero
Eliminate costs for idle workloads by scaling to zero replicas when there's no demand.
70+ Scalers
Connect to Kafka, Prometheus, Azure, AWS, databases, and dozens of other event sources.
HPA Compatible
Works with existing Kubernetes HPA, extending rather than replacing native scaling.
The Three-Component Architecture
KEDA deploys three distinct components in your cluster—the operator, the metrics API server, and the admission webhooks—each serving a specific architectural purpose. Click on any component to explore its role.
Key insight: Only one provider can register the external.metrics.k8s.io API per cluster. If you're running Datadog's cluster agent or another external metrics provider, you'll face silent conflicts.

Since KEDA v2.9, internal communication between the metrics server and operator uses gRPC on port 9666. The operator hosts a gRPC server that maintains scaler connections and caches metrics, while the metrics apiserver acts as a gRPC client. This architecture reduces connection overhead—instead of both components independently connecting to external sources, only the operator maintains those connections.
The Scaling Flow in Action
Watch how KEDA processes events from external sources and triggers scaling decisions through its components and into the Kubernetes control plane.
0 → 1 Scaling (KEDA)
When replicas are at zero, the KEDA operator directly handles activation based on the activationThreshold. This bypasses HPA entirely.
1 → N Scaling (HPA)
Once activated, HPA takes over using the external metrics served by KEDA. Scaling decisions use the standard threshold parameter.
The Two-Phase Scaling Model
The fundamental innovation in KEDA is scale-to-zero capability, implemented through a two-phase scaling model that works around HPA's limitation of requiring at least one replica.
Phase 1: Activation (0↔1)
Managed entirely by the KEDA operator. When your workload has zero replicas, KEDA's polling loop checks triggers at the configured pollingInterval.
The activationThreshold controls this boundary—the workload activates when the metric value exceeds (not equals) this threshold.
Phase 2: Scaling (1↔N)
Delegated to Kubernetes HPA. KEDA creates and manages an HPA resource for each ScaledObject, configuring it with external metrics.
The standard threshold parameter controls HPA's scaling decisions: it sets the target metric value per replica, from which HPA computes the desired replica count.
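The two phases can be sketched as a single decision function. This is an illustrative simplification, not the real controller logic: it uses the standard HPA rule for an external metric with an average-value target, desired = ceil(metric / threshold), and elides cooldownPeriod and stabilization windows.

```python
import math

def desired_replicas(current: int, metric: float,
                     activation_threshold: float, threshold: float,
                     max_replicas: int = 10) -> int:
    """Simplified sketch of KEDA's two-phase scaling decision."""
    if current == 0:
        # Phase 1 (KEDA operator): activate only when the metric
        # strictly exceeds activationThreshold.
        return 1 if metric > activation_threshold else 0
    if metric <= activation_threshold:
        # All triggers inactive: KEDA eventually scales back to zero
        # (only after cooldownPeriod, elided here).
        return 0
    # Phase 2 (HPA): desired = ceil(metric / threshold), where
    # threshold is the target metric value per replica.
    return min(max_replicas, max(1, math.ceil(metric / threshold)))
```

With the thresholds from the example below (activationThreshold 50, threshold 100), a metric value of 450 yields five replicas, while a value of 40 keeps an idle workload at zero.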
Interactive Threshold Simulator
Experiment with activation and scaling thresholds. Drag the slider or click scenario buttons to see how KEDA and HPA coordinate scaling decisions.
Threshold Zones
Replicas: 0
Workload scaled to zero. KEDA is polling triggers at the configured interval.
triggers:
- type: prometheus
metadata:
threshold: "100" # HPA scales when metric > 100/replica
activationThreshold: "50" # KEDA activates when metric > 50

Only after all triggers have been inactive for cooldownPeriod seconds (default 300) will KEDA scale the workload to zero. However, scaling from N replicas down to 1 is controlled by HPA's stabilization window, not KEDA's cooldownPeriod.

Trigger Configurations
Triggers (scalers) are the heart of KEDA—they connect your workloads to external event sources. Each trigger type knows how to query a specific system and return metric values for scaling decisions. Explore common production configurations below.
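As one concrete illustration, a minimal ScaledObject for a Kafka consumer might look like the following sketch. The deployment, topic, and consumer-group names are hypothetical; the trigger metadata fields follow KEDA's Kafka scaler.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-consumer-scaler    # hypothetical name
spec:
  scaleTargetRef:
    name: order-consumer         # hypothetical Deployment
  pollingInterval: 30            # seconds between trigger checks (default 30)
  cooldownPeriod: 300            # seconds of inactivity before scaling to zero (default 300)
  minReplicaCount: 0             # enables scale to zero
  maxReplicaCount: 10
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092   # hypothetical address
        consumerGroup: order-consumers
        topic: orders
        lagThreshold: "50"       # target consumer lag per replica
```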
Five Critical Production Gotchas
These are the gotchas that have caused the most debugging sessions and production incidents. Understanding them before deployment will save you significant time.
Pro tip: Before deploying KEDA ScaledObjects to production, run through this checklist and test each scenario in a staging environment. Most production incidents can be prevented by understanding these edge cases.
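One mitigation worth calling out from that checklist: KEDA's fallback field pins a ScaledObject to a known replica count when a scaler keeps failing, instead of leaving scaling frozen at the last observed value. A hedged sketch, with hypothetical names and query:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-scaler               # hypothetical
spec:
  scaleTargetRef:
    name: api-deployment         # hypothetical
  fallback:
    failureThreshold: 3          # consecutive failed scaler checks before fallback applies
    replicas: 5                  # hold this replica count while the trigger is unreachable
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090  # hypothetical
        query: sum(rate(http_requests_total[2m]))         # hypothetical
        threshold: "100"
```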
Production Deployment & Operations
For production environments, deploy KEDA via Helm with explicit resource requests. KEDA components are lightweight but scale with the number of ScaledObjects.
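The override values in this section can be applied with a standard install from the official kedacore chart repository; chart and namespace names below match the project defaults:

```shell
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda --create-namespace \
  --values values.yaml
```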
operator:
replicaCount: 2 # HA with leader election
resources:
requests:
cpu: 100m
memory: 100Mi
limits:
cpu: 1
memory: 1000Mi
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values: [keda-operator]
topologyKey: kubernetes.io/hostname
metricsServer:
replicaCount: 1 # Cannot exceed 1 (single metrics provider)
prometheus:
operator:
enabled: true
serviceMonitor:
enabled: true

Key Prometheus Metrics
Scaler Health
keda_scaler_active — 1 if active, 0 if inactive
keda_scaler_metrics_value — Current metric value
keda_scaler_detail_errors_total — Error count
System Health
keda_scaled_object_paused — Paused status
keda_scaled_object_errors_total — Errors
keda_internal_scale_loop_latency — Loop latency
- alert: KEDAScalerErrors
expr: rate(keda_scaler_detail_errors_total[5m]) > 0
for: 5m
annotations:
summary: "KEDA scaler {{ $labels.scaler }} experiencing errors"
- alert: KEDAScalingLoopLatency
expr: keda_internal_scale_loop_latency_seconds > 30
for: 5m
annotations:
summary: "KEDA scaling loop delayed beyond threshold"

Compatibility & Troubleshooting
| KEDA Version | Kubernetes Support | Notes |
|---|---|---|
| v2.17 | v1.30 - v1.32 | Latest |
| v2.16 | v1.29 - v1.31 | |
| v2.15 | v1.28 - v1.30 | |
| v2.14 | v1.27 - v1.29 | |
Note: KEDA maintains N-2 Kubernetes version support with a 3-month release cadence. Always check the official documentation before upgrading.
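When scaling misbehaves, a few standard kubectl checks cover most failures: ScaledObject status conditions, operator logs, and the external metrics API registration. Object and namespace names here are hypothetical; the keda namespace matches the default Helm install.

```shell
# ScaledObject status: READY/ACTIVE columns and status conditions
kubectl get scaledobject --all-namespaces
kubectl describe scaledobject my-scaler -n my-namespace

# Operator logs surface trigger and scaler errors
kubectl logs -n keda deployment/keda-operator

# Verify KEDA is the registered external metrics provider
kubectl get apiservice v1beta1.external.metrics.k8s.io
```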
ScaledJobs for Batch Workloads
ScaledJobs create Kubernetes Jobs rather than scaling replica counts, appropriate for batch processing where each job runs to completion.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
name: ml-batch-processor
spec:
jobTargetRef:
parallelism: 1
completions: 1
activeDeadlineSeconds: 600
backoffLimit: 4
template:
spec:
nodeSelector:
nvidia.com/gpu.present: "true"
containers:
- name: ml-worker
image: ml-inference:latest
resources:
limits:
nvidia.com/gpu: "1"
memory: "32Gi"
restartPolicy: Never
pollingInterval: 15
successfulJobsHistoryLimit: 10
failedJobsHistoryLimit: 5
maxReplicaCount: 20
scalingStrategy:
strategy: "accurate" # Best for queues without locked messages
triggers:
- type: rabbitmq
metadata:
queueName: ml-inference-queue
mode: QueueLength
value: "1"

default — maxScale - runningJobCount
accurate — maxScale - pendingJobCount (best for queues)
eager — immediately scales to max when queue is non-zero
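The three strategies reduce to simple arithmetic. The sketch below is a simplification of the controller logic, using the formulas listed above; maxScale stands for the work-derived cap (queue length divided by value, bounded by maxReplicaCount).

```python
def jobs_to_create(strategy: str, max_scale: int,
                   running: int, pending: int) -> int:
    """Simplified sketch of ScaledJob scaling strategies.

    max_scale: cap derived from queue length and maxReplicaCount
    running:   Jobs currently executing
    pending:   Jobs created but not yet running
    """
    if strategy == "default":
        n = max_scale - running
    elif strategy == "accurate":
        # Assumes queued messages are not locked by running jobs,
        # so only jobs that haven't picked up work are subtracted.
        n = max_scale - pending
    elif strategy == "eager":
        # Scale straight to the maximum whenever there is any work.
        n = max_scale
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(0, n)
```

For example, with a cap of 10, 4 running Jobs, and 2 pending Jobs, default creates 6 new Jobs, accurate creates 8, and eager creates 10.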
Conclusion
KEDA's architecture reflects thoughtful engineering decisions—delegating 1↔N scaling to HPA rather than reimplementing it, using gRPC for efficient internal communication, and providing a clean separation between activation (0↔1) and scaling (1↔N) concerns.
For production deployment, focus on understanding the two-phase model, configuring HPA behavior for scale-down control, implementing fallback configurations, and monitoring scaler health through the exposed Prometheus metrics. The scale-to-zero capability combined with event-driven triggers makes KEDA particularly valuable for batch processing, ML inference workloads, and cost-sensitive environments where idle resources should be eliminated entirely.
Resources
An interactive visualization of KEDA architecture and operations
← Back to Articles