Dynamic Resource Allocation: What Changes When Devices Become First-Class

asleekgeek · 9 min read

Device-plugin vs DRA scheduling paths — the allocation model shifts from opaque integer counters to structured, scheduler-visible claims.

For years, the device-plugin model has been how Kubernetes surfaces GPUs to workloads. A DaemonSet on each node registers an integer counter (say, nvidia.com/gpu: 4), and the scheduler deducts from that count when a pod is placed. The model is simple to implement and simple to understand, but at scale it is demonstrably insufficient. Four structural ceilings are baked into it by design, not by accident.
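The two objects below are a minimal sketch of everything the scheduler sees under the device-plugin model: an integer in the Node's allocatable list and an integer in the Pod's limits. The node name and values are illustrative and do not come from a real cluster.

```yaml
# Excerpt of a Node object (illustrative values). The device plugin's
# only contribution to scheduling is this integer in status.allocatable.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-a            # hypothetical node name
status:
  allocatable:
    nvidia.com/gpu: "4"
---
# A Pod asks for devices by count only. Extended resources go in limits;
# if requests is also set, it must equal limits.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-worker
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        limits:
          nvidia.com/gpu: 1   # the scheduler subtracts 1 from the node's 4
```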

Dynamic Resource Allocation (DRA) is the Kubernetes mechanism designed to replace that model. It reached stable (GA) in Kubernetes v1.34 — released August 2025 — under the resource.k8s.io/v1 API group, enabled by default with no feature gate required [1].

This article walks through why DRA exists, what the API looks like, how the scheduling path changes, and which parts of the ecosystem have caught up. If you have read the earlier articles in this series on gang scheduling and GPU sharing mechanisms, this is where those pieces connect to the Kubernetes roadmap.

The four ceilings of the device-plugin model

Understanding what DRA replaces is a prerequisite for understanding what it does. The device-plugin model hits four hard limits.

1. Integer-only requests

Extended resources must be whole integers. A pod requests nvidia.com/gpu: 1 or nvidia.com/gpu: 2 — never 0.5 or a memory-bound fraction. Workarounds like MIG slices or time-slicing expose fractional capacity by pre-advertising named integer resources (e.g. nvidia.com/mig-3g.40gb), but the granularity is fixed at configuration time. Workloads cannot negotiate at scheduling time — they must fit one of the pre-cut profiles.
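Here is a minimal sketch of such a request. It assumes NVIDIA's device plugin runs with its "mixed" MIG strategy, which advertises each configured profile as its own named resource. The pod and container names are illustrative.

```yaml
# Pod asking for one pre-cut MIG slice, advertised as a named integer resource.
apiVersion: v1
kind: Pod
metadata:
  name: inference-mig
spec:
  containers:
    - name: server
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        limits:
          # The profile was fixed when the node was configured. A workload that
          # needs, say, 30 GB cannot ask for that amount, only for a listed profile.
          nvidia.com/mig-3g.40gb: 1
```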

2. No topology in the request

A pod requesting two GPUs cannot express "give me two GPUs connected by NVLink on the same node." That constraint must be encoded in node labels and affinity rules authored by the workload owner — who has to know the topology of the cluster ahead of time. The device plugin has no path to communicate fabric topology to the scheduler before allocation decisions are made.
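For contrast, DRA (introduced below) lets the request carry the constraint itself. This is a minimal sketch that assumes the resource.k8s.io/v1 API and the nvidia-gpu DeviceClass defined later in this article. The attribute gpu.nvidia.com/nvlinkDomain is hypothetical. The constraint only works if the driver actually publishes an attribute describing NVLink membership, so check your driver's ResourceSlices before relying on one.

```yaml
# Two GPUs in one claim, constrained to share one attribute value.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: paired-gpus
  namespace: team-research
spec:
  devices:
    requests:
      - name: gpus
        exactly:
          deviceClassName: nvidia-gpu
          allocationMode: ExactCount
          count: 2
    constraints:
      # Every device allocated for "gpus" must have this attribute with the
      # same value. HYPOTHETICAL attribute name, used here for illustration only.
      - requests: ["gpus"]
        matchAttribute: gpu.nvidia.com/nvlinkDomain
```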

3. Opaque allocation

The scheduler sees integer counts in Allocatable but not which physical devices are currently free. Cluster Autoscaler cannot simulate device-plugin allocation to decide whether a pending pod would fit on a hypothetical new node — it has to guess or assume. KEP-4381's motivation section states explicitly that a goal of structured parameters is to allow the scheduler to "handle and Cluster Autoscaler to simulate claim allocation themselves without relying on a third-party driver" [2].

4. No dynamic repartitioning

Changing a MIG profile requires draining the node, reconfiguring the partition, and restarting the plugin. Partitioning is static relative to the scheduling loop. A workload that needs a 2g.20gb slice on a node currently partitioned into 1g.10gb slices cannot be accommodated at scheduling time — the cluster admin must proactively match partition strategy to anticipated workload shapes, or lose utilisation.
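On clusters managed by NVIDIA's GPU Operator, the layout is typically chosen per node through the nvidia.com/mig.config label, which the operator's MIG manager watches. This sketch assumes the MIG manager is deployed and that the named profile exists in its configuration for your GPU model (all-1g.10gb is an A100 80GB-style layout). The node name is illustrative.

```yaml
# Node label read by the GPU Operator's MIG manager (excerpt).
# Changing the value (for example to all-2g.20gb) triggers a reconfiguration
# that generally requires GPU workloads on the node to be stopped first.
# The scheduler takes no part in this step.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-a          # hypothetical node name
  labels:
    nvidia.com/mig.config: all-1g.10gb
```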

A short history of two KEPs

The DRA initiative produced two distinct designs. KEP-3063 introduced the original "control-plane controller" model (alpha in Kubernetes 1.26): a third-party driver handled all claim allocation via API callbacks, keeping allocation logic entirely opaque to the scheduler. This preserved Cluster Autoscaler's inability to simulate allocations — the same structural problem as device plugins, moved one layer up. KEP-3063 was withdrawn as the primary path in Kubernetes 1.32 [3].

KEP-4381 ("structured parameters") reversed the roles: the driver publishes structured capability data into ResourceSlice objects in the API server, and the scheduler itself performs allocation using CEL expressions from the claim. No driver callback at scheduling time — the scheduler can reason about devices entirely from first-party API objects. KEP-4381 shipped beta (v1beta1) in Kubernetes 1.32, added v1beta2 in 1.33, and reached stable as resource.k8s.io/v1 in Kubernetes 1.34 [2]. When documentation refers to "DRA" today, it means the KEP-4381 model.
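The sketch below is a trimmed example of what such structured capability data looks like in a v1 ResourceSlice. The node name, object name, and values are illustrative. The attribute and capacity names (type, productName, memory) are assumptions modelled on what the NVIDIA driver publishes. Run `kubectl get resourceslices -o yaml` on your own cluster to see the real set.

```yaml
# Excerpt of a ResourceSlice as a driver might publish it (illustrative).
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: gpu-node-a-gpu.nvidia.com-abc12   # hypothetical generated name
spec:
  driver: gpu.nvidia.com
  nodeName: gpu-node-a                    # devices are local to this node
  pool:
    name: gpu-node-a
    generation: 1
    resourceSliceCount: 1
  devices:
    - name: gpu-0
      attributes:                         # typed values the scheduler can match in CEL
        type:
          string: gpu
        productName:
          string: NVIDIA A100-SXM4-80GB
      capacity:                           # quantities, compared with quantity() in CEL
        memory:
          value: 80Gi
```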

The DRA object model: four API types

Four types compose the resource.k8s.io/v1 surface, each with a distinct role in the allocation lifecycle:

  • ResourceSlice — created and maintained by the DRA driver (typically a DaemonSet on each node). Describes the devices a node offers: their attributes (GPU model, memory, NVLink topology, MIG capability), driver reference, and pool membership. The scheduler reads ResourceSlice objects directly — no callback to a driver is needed to decide whether a device fits a claim.
  • DeviceClass — cluster-level object authored by the admin or installed by the driver. Acts as a selector: "any GPU managed by the gpu.nvidia.com driver". Workloads reference a DeviceClass to avoid hard-coding driver names into job manifests.
  • ResourceClaim — the actual allocation request. Specifies what is needed (e.g. "one GPU with >= 40 GB memory from the nvidia-gpu DeviceClass") using CEL selector expressions. The scheduler resolves the claim against available ResourceSlices and records which device was allocated. A ResourceClaim can also be shared: every pod that references the same claim uses the same allocated device(s), which for a node-local GPU means those pods must run on the same node. Distributed training workers that each need their own GPU should use a ResourceClaimTemplate instead.
  • ResourceClaimTemplate — a pod-level template that generates a per-pod ResourceClaim automatically, mirroring the PersistentVolumeClaim template pattern. Used when each pod needs its own exclusive device allocation.
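The sketch below covers the per-pod case with a minimal ResourceClaimTemplate. It assumes the nvidia-gpu DeviceClass shown below and the team-research namespace used throughout; the template and pod names are illustrative. Each Pod that references the template gets its own generated ResourceClaim, which is cleaned up when the Pod is deleted.

```yaml
# Template: one exclusive GPU claim generated per Pod.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
  namespace: team-research
spec:
  spec:                      # the spec of each generated ResourceClaim
    devices:
      requests:
        - name: gpu
          exactly:
            deviceClassName: nvidia-gpu
---
# Pod referencing the template rather than a named claim.
apiVersion: v1
kind: Pod
metadata:
  name: training-worker-0
  namespace: team-research
spec:
  resourceClaims:
    - name: gpu-claim
      resourceClaimTemplateName: single-gpu
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        claims:
          - name: gpu-claim
```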

A minimal DeviceClass, a ResourceClaim, and a Pod that consumes the claim look like this:

device-class.yaml
# resource.k8s.io/v1 — Kubernetes >= 1.34 (DRA GA, enabled by default)
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: nvidia-gpu
spec:
  selectors:
    - cel:
        expression: "device.driver == 'gpu.nvidia.com'"
resource-claim.yaml
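# ResourceClaim: one whole GPU with at least 40 GiB of memory (selected by capacity, not a pre-cut profile)
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: training-gpu
  namespace: team-research
spec:
  devices:
    requests:
      - name: gpu
        exactly:                     # resource.k8s.io/v1 nests a single-class request under "exactly"
          deviceClassName: nvidia-gpu
          selectors:
            - cel:
                # memory is published as a capacity (a quantity), so compare it with quantity()
                expression: "device.capacity['gpu.nvidia.com'].memory.compareTo(quantity('40Gi')) >= 0"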
pod-with-claim.yaml
# Pod referencing the claim — note spec.resourceClaims, not resources.limits
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
  namespace: team-research   # must match the ResourceClaim's namespace
spec:
  resourceClaims:
    - name: gpu-claim
      resourceClaimName: training-gpu
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        claims:
          - name: gpu-claim
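More than one Pod can use a claim referenced by name, like training-gpu above. The sketch below adds a second, hypothetical Pod that shares it. Both Pods then use the same allocated device, so they must land on the node that holds it. Whether concurrent access works well depends on the driver's sharing configuration.

```yaml
# A second Pod sharing the training-gpu claim (hypothetical name and image use).
apiVersion: v1
kind: Pod
metadata:
  name: training-sidecar
  namespace: team-research
spec:
  resourceClaims:
    - name: gpu-claim
      resourceClaimName: training-gpu   # same claim, same device as training-worker
  containers:
    - name: monitor
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        claims:
          - name: gpu-claim
```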

How the scheduling path changes: device plugin vs DRA

The two models differ most at the point where the scheduler decides whether a pod fits a node. In the device-plugin model that decision is made with incomplete information; in DRA it is made with full structured visibility.

Device-plugin path (current):

  1. Node device plugin reports integer count to kubelet.
  2. Kubelet reflects count in Node.Status.Allocatable.
  3. Scheduler subtracts from integer counter during filter/score. No device attributes visible.
  4. Kubelet calls device plugin Allocate() at pod start; plugin selects specific device(s) — scheduler never learns which physical device was used.

DRA path (Kubernetes >= 1.34):

  1. DRA driver publishes ResourceSlice objects (typically one or more per node, each listing that node's devices) with full attribute data into the API server.
  2. Workload author creates a ResourceClaim with CEL selectors expressing actual needs (memory, topology, MIG capability).
  3. Scheduler evaluates CEL expressions against ResourceSlice data inline, with no driver round-trip, and records the specific device(s) chosen in the ResourceClaim's status.allocation.
  4. DRA kubelet plugin reads the resolved ResourceClaim and performs device setup (CDI annotations, environment variables) at pod start.
  5. Cluster Autoscaler can simulate step 3 against virtual ResourceSlice projections — enabling correct scale-out decisions for pending DRA workloads.
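To make step 3 concrete, here is an illustrative excerpt of the training-gpu claim after allocation. The device, pool, and node names and the UID are placeholders. The field layout follows the resource.k8s.io/v1 ResourceClaim status.

```yaml
# ResourceClaim status after scheduling (excerpt, illustrative values).
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: training-gpu
  namespace: team-research
status:
  allocation:
    devices:
      results:                 # the exact device chosen, not a decremented count
        - request: gpu
          driver: gpu.nvidia.com
          pool: gpu-node-a
          device: gpu-0
    nodeSelector:              # pins consumers to the node that holds the device
      nodeSelectorTerms:
        - matchFields:
            - key: metadata.name
              operator: In
              values: ["gpu-node-a"]
  reservedFor:                 # Pods currently using the claim
    - resource: pods
      name: training-worker
      uid: 00000000-0000-0000-0000-000000000000   # placeholder UID
```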

The structural improvement is that allocation decisions and the data needed to make them live in the same API layer. The scheduler is no longer reasoning about a proxy (an integer counter) for a physical reality it cannot see.

What is stable in v1.34, what is still maturing

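DRA has a layered maturity model. The core API (ResourceSlice, DeviceClass, ResourceClaim, ResourceClaimTemplate) stabilised in v1.34, and subsequent releases graduated additional capabilities. Per the v1.36 release highlights (April 2026), Admin Access and Prioritized List reached GA, while Partitionable Devices, Device Taints, and Consumable Capacity reached beta [4].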

The NVIDIA DRA driver: what is supported today

NVIDIA's DRA driver for Kubernetes is hosted at github.com/NVIDIA/k8s-dra-driver-gpu. The driver has two components: a ResourceSlice publisher (DaemonSet per node) that writes device attributes into the API server, and a kubelet plugin that handles device setup at pod start via CDI rather than the legacy device-path mounting used by device plugins [7].

The driver README is explicit: "While some GPU allocation features can be tried out, they are not yet officially supported." The kubelet plugin is disabled by default even when the driver is bundled with recent GPU Operator versions. The GPU Operator's Helm values must explicitly enable the DRA DaemonSet.

One feature that is officially supported in the NVIDIA DRA driver: the ComputeDomain feature for Multi-Node NVLink (MNNVL) topologies — configurations where multiple nodes are connected via NVLink fabric (e.g. GB200 NVL72 racks). This is the first production use-case for DRA on NVIDIA hardware because the device-plugin model has no mechanism to represent cross-node fabric membership at all [7].
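The sketch below shows how a workload opts into a ComputeDomain. It is modelled on the examples in NVIDIA's driver documentation at the time of writing. The resource.nvidia.com/v1beta1 group/version and the field names come from those examples and may change between driver releases; the object names are illustrative. The driver creates the ResourceClaimTemplate named in spec.channel, and worker Pods reference it. In this sketch the GPUs themselves are still requested through the device plugin, because general DRA GPU allocation is not yet officially supported.

```yaml
# ComputeDomain spanning two nodes (field names per NVIDIA's examples; may change).
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: nvl-domain
  namespace: team-research
spec:
  numNodes: 2
  channel:
    resourceClaimTemplate:
      name: nvl-domain-channel   # template the driver creates for workers to reference
---
# Worker Pod joining the domain through a claim generated from that template.
apiVersion: v1
kind: Pod
metadata:
  name: mnnvl-worker-0
  namespace: team-research
spec:
  resourceClaims:
    - name: nvl-channel
      resourceClaimTemplateName: nvl-domain-channel
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        limits:
          nvidia.com/gpu: 4      # GPUs still via the device plugin (coexistence)
        claims:
          - name: nvl-channel
```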

CDI (Container Device Interface) is a prerequisite. DRA relies on CDI for device injection into container namespaces rather than legacy device-path bind-mounts. CDI must be enabled in the container runtime configuration, and the NVIDIA driver version must be >= 580. Verify CDI readiness against your GPU Operator release notes before enabling the DRA kubelet plugin — CDI configuration support varies across Kubernetes distributions and operator versions.

Coexistence: DRA and device plugin on the same cluster

DRA and the device plugin can run simultaneously on the same cluster — and on the same node. A node can advertise nvidia.com/gpu via the device plugin (for existing workloads using resources.limits) while also running the DRA kubelet plugin (for workloads using resourceClaims). Migration is per-workload, not per-cluster, which allows incremental adoption.

The integrations that break without modification during migration are predictable:

  • Gang-scheduling queue managers (e.g. Volcano PodGroup) count resources.limits[nvidia.com/gpu]; DRA claims in spec.resourceClaims are invisible to the quota engine — queue quotas will be inaccurate until the scheduler adds DRA awareness.
  • Quota-aware admission controllers that model GPU resources as extended resources (e.g. Kueue's ResourceFlavor) need the DRAExtendedResources feature gate enabled and tuned for DRA claims — the alpha feature gate is disabled by default in v1.34.
  • Helm charts with resources.limits["nvidia.com/gpu"] must be updated to use resourceClaims — the two request styles are not interchangeable.
  • nvidia-smi inside containers works with CDI device injection, but requires NVIDIA Driver >= 580 and CDI enabled in containerd. Legacy device-path mounting used by the device plugin is not used by the DRA driver.

Where DRA sits in the scheduling stack

DRA operates at the hardware-abstraction layer — the same layer as the device plugin. It replaces the device plugin's role in advertising and allocating devices; it does not replace quota-aware admission (e.g. Kueue) or gang scheduling (e.g. Volcano). The scheduling stack described in the earlier articles in this series remains valid under DRA: a quota controller admits workloads; a gang scheduler coordinates multi-pod placement; DRA resolves and reserves the actual device. The integration seams need updating as the ecosystem catches up — see the caveats in the previous section — but the architectural separation of concerns is unchanged.

The one place DRA genuinely changes the stack is at the Cluster Autoscaler. Autoscaler scale-out decisions for GPU workloads have historically required heuristics or custom expanders because device-plugin allocation was opaque. With ResourceSlice data available in the API, Autoscaler gains the same structured visibility the scheduler has — enabling correct bin-packing simulation when deciding whether to provision a new GPU node.

When to adopt: a pragmatic guide

DRA is GA in the API, but ecosystem readiness is not uniform. A useful frame for adoption decisions:

Hold: cluster on Kubernetes < 1.34, or production queue accounting depends on Volcano/Kueue

If your cluster runs a Kubernetes version earlier than 1.34, or manages GPU workloads through Volcano queues or Kueue quotas, stay on the device-plugin path. The stable nvidia.com/gpu extended-resource path is mature, and the observability integration (DCGM exporter → Prometheus) works cleanly with it. Introducing DRA before the queue managers gain DRA awareness also creates quota-accounting blind spots that are hard to debug.

Pilot: cluster on Kubernetes >= 1.34, evaluating NVLink topologies or inference bin-packing

Pilot DRA when your cluster runs Kubernetes >= 1.34 and one of the following applies:

  • You are evaluating multi-node NVLink topologies (e.g. GB200 NVL72) where DRA's ComputeDomain feature — the first officially supported NVIDIA DRA feature — addresses a gap the device plugin cannot fill.
  • You are running inference workloads where memory-bound fractional requests would meaningfully improve bin-packing over fixed MIG profiles, and your cluster runs Kubernetes >= 1.36, where Partitionable Devices is beta.
  • You are building a new workload class with no existing Helm-chart debt and can design for resourceClaims from the start.

In any pilot: run the device plugin in parallel on the same nodes, monitor quota accounting gaps in your queue manager, and verify CDI configuration on your container runtime before enabling the DRA kubelet plugin.

Honest caveats: what DRA does not yet do

  • No on-demand MIG repartitioning at claim time. DRA advertises current MIG profiles in ResourceSlice, but dynamically creating a new MIG partition in response to an incoming claim is not part of the stable v1.34 API. Partitionable Devices (KEP-4815) is the mechanism that will eventually enable this; it was beta (enabled by default) in v1.36. MIG profiles still must be pre-configured at the node level for the stable path [5].
  • GPU allocation is not yet production-supported by NVIDIA. The NVIDIA DRA driver README states explicitly that general GPU allocation features are not yet officially supported. ComputeDomain for MNNVL is the exception. Monitor the driver's release notes — not blog posts — before committing production workloads [7].
  • Queue manager integration is partial. Kueue's DRA integration (DRAExtendedResources) was alpha and disabled by default in v1.34. Volcano's PodGroup quota accounting does not count DRA claims in the current release. Running DRA workloads through these queue managers without explicit DRA integration produces quota accounting gaps.
  • DCGM observability is not DRA-aware. DCGM exporter reports per-device GPU utilisation metrics regardless of how the device was allocated — device plugin or DRA. However, correlating a DCGM metric to a specific ResourceClaim (rather than to a pod via extended-resource labels) requires updated dashboards and metric labelling. The existing Prometheus integration works, but claim-level attribution needs manual work.

References

  1. Kubernetes Blog — "Kubernetes v1.34: DRA has graduated to GA" (September 2025).
  2. Kubernetes SIG-Node — KEP-4381: DRA Structured Parameters README (motivation, design, graduation criteria). kubernetes/enhancements on GitHub.
  3. Kubernetes SIG-Node — KEP-3063: Dynamic Resource Allocation (control-plane controller model, withdrawn as primary path in 1.32). kubernetes/enhancements issue tracker.
  4. Kubernetes Blog — "Kubernetes v1.36 Release" (April 2026). DRA Admin Access GA, Prioritized List GA, Partitionable Devices beta, Device Taints beta, Consumable Capacity beta noted in release highlights.
  5. Kubernetes SIG-Node — KEP-4815: Partitionable Devices (alpha in 1.33, beta in 1.36). kubernetes/enhancements issue tracker.
  6. Kueue — "Dynamic Resource Allocation" concepts page. kueue.sigs.k8s.io. (DRAExtendedResources feature gate, integration architecture.)
  7. NVIDIA — k8s-dra-driver-gpu repository. GitHub. (ComputeDomain official support; GPU allocation not yet officially supported; kubelet plugin disabled by default.)
  8. Kubernetes Documentation — "Dynamic Resource Allocation" concept page (updated for v1 API in Kubernetes 1.34).

Tags

#dra #kubernetes #gpu #series:ai-platform-mlops #series-order/30

About the Author

asleekgeek

Senior Developer, Architect, DevOps

Owner and main author "ASleekGeek website" #husband #father #software-developer #geek #reader-of-all-things #food-lover #mufc-fan #aspiring-guitarist
