AI Platform Engineering & MLOps · Part 30 of 34

Dynamic Resource Allocation: What Changes When Devices Become First-Class

DRA reached GA in Kubernetes v1.34. Here is why it exists, what the resource.k8s.io API looks like, and the honest adoption caveats for today’s clusters.


Device Plugin (top) vs DRA (bottom) — the key difference is that the DRA scheduler evaluates CEL expressions against structured ResourceSlice data rather than an opaque integer counter.

For five years, the device-plugin model has been how Kubernetes surfaces GPUs to workloads. A DaemonSet on each node registers an integer counter (say, nvidia.com/gpu: 4), and the scheduler deducts from that count when a pod is placed. Simple to implement, simple to understand, and, at scale, demonstrably insufficient. Four structural ceilings are baked into the model by design, not accident.

Dynamic Resource Allocation (DRA) is the Kubernetes mechanism designed to replace that model. It reached stable (GA) in Kubernetes v1.34, released August 2025, under the resource.k8s.io/v1 API group, enabled by default with no feature gate required [1].

This article walks through why DRA exists, what the API looks like, how the scheduling path changes, and which parts of the ecosystem have caught up. If you have read the earlier articles in this series on gang scheduling and GPU sharing mechanisms, this is where those pieces connect to the Kubernetes roadmap.

The four ceilings of the device-plugin model

Understanding what DRA replaces is a prerequisite for understanding what it does. The device-plugin model hits four hard limits.

1. Integer-only requests

Extended resources must be whole integers. A pod requests nvidia.com/gpu: 1 or nvidia.com/gpu: 2 — never 0.5 or a memory-bound fraction. Workarounds like MIG slices or time-slicing expose fractional capacity by pre-advertising named integer resources (e.g. nvidia.com/mig-3g.40gb), but the granularity is fixed at configuration time. Workloads cannot negotiate at scheduling time — they must fit one of the pre-cut profiles.
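
As a concrete illustration of the integer ceiling, a device-plugin request might look like the sketch below. The pod names are placeholders, and the MIG resource name assumes the node was pre-configured to advertise that profile as its own extended resource.

```yaml
# Device-plugin style: GPUs are requested as whole integers only.
apiVersion: v1
kind: Pod
metadata:
  name: whole-gpu-example          # hypothetical name
spec:
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        limits:
          nvidia.com/gpu: 1        # a fractional value such as 0.5 is rejected
---
# A MIG slice is still an integer count of a pre-cut, named profile.
apiVersion: v1
kind: Pod
metadata:
  name: mig-slice-example          # hypothetical name
spec:
  containers:
    - name: inference
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        limits:
          nvidia.com/mig-3g.40gb: 1   # only schedulable if a node advertises it
```

In both cases the workload picks from whatever resource names the admin has already configured; nothing in the request describes the memory or compute it actually needs.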

2. No topology in the request

A pod requesting two GPUs cannot express “give me two GPUs connected by NVLink on the same node.” That constraint must be encoded in node labels and affinity rules authored by the workload owner, who has to know the topology of the cluster ahead of time. The device plugin has no path to communicate fabric topology to the scheduler before allocation decisions are made.
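
The label-and-affinity workaround described above could look roughly like this. The label key and value are hypothetical; an admin would have to define and maintain them per node.

```yaml
# Device-plugin workaround: topology lives in node labels, not in the request.
apiVersion: v1
kind: Pod
metadata:
  name: nvlink-pair-example        # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: example.com/gpu-interconnect   # hypothetical admin-managed label
                operator: In
                values: ["nvlink"]
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        limits:
          nvidia.com/gpu: 2        # which two GPUs is decided later by the plugin
```

The affinity rule narrows the choice of node, but it says nothing about which two GPUs on that node are handed to the container.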

3. Opaque allocation

The scheduler sees integer counts in Allocatable but not which physical devices are currently free. Cluster Autoscaler cannot simulate device-plugin allocation to decide whether a pending pod would fit on a hypothetical new node; it has to guess or assume. KEP-4381's motivation section states explicitly that a goal of structured parameters is to allow the scheduler to “handle and Cluster Autoscaler to simulate claim allocation themselves without relying on a third-party driver” [2].
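
To make the opacity concrete, this is roughly the only GPU information a Node object carries under the device-plugin model. The values are illustrative.

```yaml
# Excerpt of a Node object as the scheduler and autoscaler see it.
status:
  capacity:
    nvidia.com/gpu: "4"
  allocatable:
    nvidia.com/gpu: "4"   # a count only: no device identity, memory size, or topology
```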

4. No dynamic repartitioning

Changing a MIG profile requires draining the node, reconfiguring the partition, and restarting the plugin. Partitioning is static relative to the scheduling loop. A workload that needs a 2g.20gb slice on a node currently partitioned into 1g.10gb slices cannot be accommodated at scheduling time: the cluster admin must proactively match partition strategy to anticipated workload shapes, or lose utilisation.

A short history of two KEPs

The DRA initiative produced two distinct designs. KEP-3063 introduced the original “control-plane controller” model (alpha in Kubernetes 1.26): a third-party driver handled all claim allocation via API callbacks, keeping allocation logic entirely opaque to the scheduler. This preserved Cluster Autoscaler's inability to simulate allocations — the same structural problem as device plugins, moved one layer up. KEP-3063 was withdrawn as the primary path in Kubernetes 1.32 [3].

KEP-4381 (“structured parameters”) reversed the roles: the driver publishes structured capability data into ResourceSlice objects in the API server, and the scheduler itself performs allocation using CEL expressions from the claim. There is no driver callback at scheduling time, so the scheduler can reason about devices entirely from first-party API objects. KEP-4381 shipped beta (v1beta1) in Kubernetes 1.32, added v1beta2 in 1.33, and reached stable as resource.k8s.io/v1 in Kubernetes 1.34 [2]. When documentation refers to “DRA” today, it means the KEP-4381 model.

The DRA object model: four API types

Four types compose the resource.k8s.io/v1 surface, each with a distinct role in the allocation lifecycle:

ResourceSlice — created and maintained by the DRA driver (typically a DaemonSet on each node). Describes the devices a node offers: their attributes (GPU model, memory, NVLink topology, MIG capability), driver reference, and pool membership. The scheduler reads ResourceSlice objects directly — no callback to a driver is needed to decide whether a device fits a claim.
DeviceClass — cluster-level object authored by the admin or installed by the driver. Acts as a selector: “any GPU managed by the gpu.nvidia.com driver”. Workloads reference a DeviceClass to avoid hard-coding driver names into job manifests.
ResourceClaim — the actual allocation request. Specifies what is needed (e.g. “one GPU with >= 40 GB memory from the nvidia-gpu DeviceClass”) using CEL selector expressions. The scheduler resolves the claim against available ResourceSlices and marks which device is reserved. A ResourceClaim can be shared across pods in a workload — for example, all workers in a distributed training job can reference the same claim.
ResourceClaimTemplate — a pod-level template that generates a per-pod ResourceClaim automatically, mirroring the PersistentVolumeClaim template pattern. Used when each pod needs its own exclusive device allocation.
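
To make the first of these types less abstract, here is a sketch of what a driver-published ResourceSlice might contain. The object, node, and attribute names are illustrative assumptions; the real attribute and capacity names are defined by each driver.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceSlice
metadata:
  name: node-a-gpu-slice            # hypothetical; drivers normally generate this
spec:
  driver: gpu.nvidia.com
  nodeName: node-a                  # hypothetical node
  pool:
    name: node-a
    generation: 1
    resourceSliceCount: 1
  devices:
    - name: gpu-0
      attributes:
        productName:                # illustrative attribute name
          string: "NVIDIA A100-SXM4-80GB"
        migCapable:                 # illustrative attribute name
          bool: true
      capacity:
        memory:
          value: 80Gi
```

Attributes are typed values (string, bool, int, version) and capacities are quantities, which is what lets a CEL selector compare them without asking the driver.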

A minimal DeviceClass and ResourceClaim look like this:

device-class.yaml

# resource.k8s.io/v1 — Kubernetes >= 1.34 (DRA GA, enabled by default)
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: nvidia-gpu
spec:
  selectors:
    - cel:
        expression: "device.driver == 'gpu.nvidia.com'"

resource-claim.yaml

# ResourceClaim: selects a device by memory capacity rather than by integer count
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: training-gpu
  namespace: team-research
spec:
  devices:
    requests:
      - name: gpu
        # resource.k8s.io/v1 nests the class and selectors under "exactly"
        exactly:
          deviceClassName: nvidia-gpu
          selectors:
            - cel:
                # Assumes the driver publishes memory as a capacity (a quantity)
                expression: "device.capacity['gpu.nvidia.com'].memory.compareTo(quantity('40Gi')) >= 0"

pod-with-claim.yaml

# Pod referencing the claim — note spec.resourceClaims, not resources.limits
apiVersion: v1
kind: Pod
metadata:
  name: training-worker
  namespace: team-research   # must match the namespace of the ResourceClaim
spec:
  resourceClaims:
    - name: gpu-claim
      resourceClaimName: training-gpu
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:25.03-py3
      resources:
        claims:
          - name: gpu-claim
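
The pod above references one pre-created claim by name. When each replica needs its own device, a ResourceClaimTemplate can be referenced instead. The following is a minimal sketch under that assumption; the template and Deployment names are hypothetical, and the request uses the exactly wrapper of the resource.k8s.io/v1 schema.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu                 # hypothetical name
  namespace: team-research
spec:
  spec:                            # the ResourceClaim spec generated for each pod
    devices:
      requests:
        - name: gpu
          exactly:
            deviceClassName: nvidia-gpu
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference                  # hypothetical name
  namespace: team-research
spec:
  replicas: 2
  selector:
    matchLabels: {app: inference}
  template:
    metadata:
      labels: {app: inference}
    spec:
      resourceClaims:
        - name: gpu-claim
          resourceClaimTemplateName: single-gpu   # one generated claim per replica
      containers:
        - name: server
          image: nvcr.io/nvidia/pytorch:25.03-py3
          resources:
            claims:
              - name: gpu-claim
```

The generated claims are tied to the lifetime of their pod, so scaling the Deployment down releases the corresponding devices.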

How the scheduling path changes: device plugin vs DRA

The two models differ most at the point where the scheduler decides whether a pod fits a node. In the device-plugin model that decision is made with incomplete information; in DRA it is made with full structured visibility.

Device-plugin path (current)

1. Node device plugin reports integer count to kubelet.
2. Kubelet reflects count in Node.Status.Allocatable.
3. Scheduler subtracts from integer counter during filter/score. No device attributes visible.
4. Kubelet calls device plugin Allocate() at pod start; plugin selects specific device(s), and the scheduler never learns which physical device was used.

DRA path (Kubernetes >= 1.34)

1. DRA driver publishes ResourceSlice objects (per node, per device) with full attribute data into the API server.
2. Workload author creates a ResourceClaim with CEL selectors expressing actual needs (memory, topology, MIG capability).
3. Scheduler evaluates CEL expressions against ResourceSlice data inline, with no driver round-trip. Marks the specific device reserved in the ResourceClaim status.
4. DRA kubelet plugin reads the resolved ResourceClaim and performs device setup (CDI annotations, environment variables) at pod start.
5. Cluster Autoscaler can simulate step 3 against virtual ResourceSlice projections, enabling correct scale-out decisions for pending DRA workloads.

The structural improvement is that allocation decisions and the data needed to make them live in the same API layer. The scheduler is no longer reasoning about a proxy (an integer counter) for a physical reality it cannot see.
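
What "marks the specific device reserved" means in practice can be seen in the claim's status after scheduling. The sketch below shows the general shape; the pool, device, and node names and the UID are placeholders.

```yaml
# Illustrative view of an allocated claim (status is written by the control plane).
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: training-gpu
  namespace: team-research
status:
  allocation:
    devices:
      results:
        - request: gpu               # matches spec.devices.requests[].name
          driver: gpu.nvidia.com
          pool: node-a               # hypothetical pool name
          device: gpu-0              # hypothetical device name
    nodeSelector:                    # pins consumers to the node that owns the device
      nodeSelectorTerms:
        - matchFields:
            - key: metadata.name
              operator: In
              values: ["node-a"]
  reservedFor:
    - resource: pods
      name: training-worker
      uid: 00000000-0000-0000-0000-000000000000   # placeholder
```

Unlike the device-plugin path, the identity of the chosen device is recorded in an API object that any controller with read access can inspect.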

The step-through below walks through each stage of the DRA request lifecycle and contrasts it with the device-plugin equivalent at that exact point.

DRA Request Flow vs Device Plugin: step by step

Step 1 of 5: Workload request

DRA (resource.k8s.io/v1): Pod spec includes spec.resourceClaims referencing a ResourceClaim with CEL selectors, e.g. "memory >= 40 GiB from nvidia-gpu DeviceClass". No hard-coded device path. The CEL selector is memory-bound, not integer-bound.

Device plugin (current model): Pod spec includes resources.limits["nvidia.com/gpu"]: 1, a whole integer. No memory constraint is possible. The device is selected by the plugin, not the scheduler. Integer only, with granularity fixed at configuration time.

What is stable in v1.34, what is still maturing

DRA has a layered maturity model. The core API stabilised in v1.34; subsequent releases graduated additional capabilities. The following reflects the Kubernetes v1.34 baseline and notes what v1.36 (released April 2026) moved to beta or GA [4]:

Capability	Status in v1.34	Status in v1.36
ResourceClaim, ResourceClaimTemplate, DeviceClass, ResourceSlice core objects	GA (stable)	GA (stable)
Basic CEL device selectors; per-claim and shared-claim allocation	GA (stable)	GA (stable)
ResourceClaim reference in pod specs	GA (stable)	GA (stable)
DRAConsumableCapacity (capacity-aware scheduling)	Alpha	Beta
KEP-4815 Partitionable Devices (sub-partition request at claim time)	Alpha	Beta (enabled by default)
Device Taints (taint individual devices in ResourceSlice)	Alpha	Beta
Admin Access (privileged claim for device-admin workloads)	Beta	GA
Prioritized List (ordered device preference in a single claim)	Beta	GA
Kueue DRAExtendedResources feature gate	Alpha (disabled by default)	Alpha (check release notes)
Volcano PodGroup DRA awareness	Not present	Not present

Ecosystem integration (alpha or in progress as of v1.34): the Kueue DRAExtendedResources feature gate (alpha, disabled by default in v1.34) provides quota-aware scheduling of DRA workloads [6]. Volcano PodGroup DRA awareness is not yet present in the current Volcano release, so queue quota tracking is inaccurate for DRA claims.
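
As an illustration of the Prioritized List row in the table above, a claim can list alternatives in order of preference. This is a sketch only: the claim and subrequest names are hypothetical, the memory capacity name is an assumption about what the driver publishes, and whether the feature is enabled depends on your cluster version.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: flexible-gpu               # hypothetical name
spec:
  devices:
    requests:
      - name: gpu
        firstAvailable:            # tried in order; the first alternative that fits wins
          - name: large
            deviceClassName: nvidia-gpu
            selectors:
              - cel:
                  expression: "device.capacity['gpu.nvidia.com'].memory.compareTo(quantity('80Gi')) >= 0"
          - name: pair
            deviceClassName: nvidia-gpu
            allocationMode: ExactCount
            count: 2
```

Here the workload prefers one large device and falls back to two of any size from the same class.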

The NVIDIA DRA driver: what is supported today

NVIDIA's DRA driver for Kubernetes is hosted at github.com/NVIDIA/k8s-dra-driver-gpu. The driver has two components: a ResourceSlice publisher (DaemonSet per node) that writes device attributes into the API server, and a kubelet plugin that handles device setup at pod start via CDI rather than the legacy device-path mounting used by device plugins [7].

The driver README is explicit: “While some GPU allocation features can be tried out, they are not yet officially supported.” The kubelet plugin is disabled by default even when the driver is bundled with recent GPU Operator versions. The GPU Operator's Helm values must explicitly enable the DRA DaemonSet.

One feature that is officially supported in the NVIDIA DRA driver: the ComputeDomain feature for Multi-Node NVLink (MNNVL) topologies — configurations where multiple nodes are connected via NVLink fabric (e.g. GB200 NVL72 racks). This is the first production use-case for DRA on NVIDIA hardware because the device-plugin model has no mechanism to represent cross-node fabric membership at all [7].

CDI (Container Device Interface) is a prerequisite. DRA relies on CDI for device injection into container namespaces rather than legacy device-path bind-mounts. CDI must be enabled in the container runtime configuration, and the NVIDIA driver version must be >= 580. Verify CDI readiness against your GPU Operator release notes before enabling the DRA kubelet plugin, because CDI configuration support varies across Kubernetes distributions and operator versions.
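
For readers who have not seen one, a CDI spec is a small declarative file that tells the runtime which edits to apply to a container for a named device. The sketch below shows the general shape only; the kind, device name, and paths are illustrative assumptions, and real specs are generated by the driver or vendor tooling rather than written by hand.

```yaml
# Illustrative CDI spec (general shape only).
cdiVersion: "0.6.0"
kind: vendor.example.com/gpu        # hypothetical vendor/class pair
devices:
  - name: gpu-0
    containerEdits:
      deviceNodes:
        - path: /dev/nvidia0        # device node injected for this device
containerEdits:                     # edits applied for every device in this spec
  deviceNodes:
    - path: /dev/nvidiactl
```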

Coexistence: DRA and device plugin on the same cluster

DRA and the device plugin can run simultaneously on the same cluster, and even on the same node. A node can advertise nvidia.com/gpu via the device plugin (for existing workloads using resources.limits) while also running the DRA kubelet plugin (for workloads using resourceClaims). Migration is per-workload, not per-cluster, which allows incremental adoption.

The integrations that break without modification during migration are predictable:

Gang-scheduling queue managers (e.g. Volcano PodGroup) count resources.limits[nvidia.com/gpu]; DRA claims in spec.resourceClaims are invisible to the quota engine — queue quotas will be inaccurate until the scheduler adds DRA awareness.
Quota-aware admission controllers that model GPU resources as extended resources (e.g. Kueue's ResourceFlavor) need the DRAExtendedResources feature gate enabled and tuned for DRA claims — the alpha feature gate is disabled by default in v1.34.
Helm charts with resources.limits["nvidia.com/gpu"] must be updated to use resourceClaims — the two request styles are not interchangeable.
nvidia-smi inside containers works with CDI device injection, but requires NVIDIA Driver >= 580 and CDI enabled in containerd. Legacy device-path mounting used by the device plugin is not used by the DRA driver.

Where DRA sits in the scheduling stack

DRA operates at the hardware-abstraction layer, the same layer as the device plugin. It replaces the device plugin's role in advertising and allocating devices; it does not replace quota-aware admission (e.g. Kueue) or gang scheduling (e.g. Volcano). The scheduling stack described in the earlier articles in this series remains valid under DRA: a quota controller admits workloads; a gang scheduler coordinates multi-pod placement; DRA resolves and reserves the actual device. The integration seams need updating as the ecosystem catches up (see the caveats in the previous section), but the architectural separation of concerns is unchanged.

The one place DRA genuinely changes the stack is at the Cluster Autoscaler. Autoscaler scale-out decisions for GPU workloads have historically required heuristics or custom expanders because device-plugin allocation was opaque. With ResourceSlice data available in the API, Autoscaler gains the same structured visibility the scheduler has, enabling correct bin-packing simulation when deciding whether to provision a new GPU node.

When to adopt: a pragmatic guide

DRA is GA in the API, but ecosystem readiness is not uniform. A useful frame for adoption decisions:

Hold: cluster on Kubernetes < 1.34, or production queue accounting depends on Volcano/Kueue

If your cluster runs Kubernetes 1.31 or 1.32 with GPU workloads managed through Volcano queues and Kueue quotas, stay on the device-plugin path. The stable nvidia.com/gpu extended-resource path is mature, the observability integration (DCGM exporter → Prometheus) works cleanly with it, and introducing DRA before the queue managers gain DRA awareness creates quota accounting blind spots that are hard to debug.

Pilot: cluster on Kubernetes >= 1.34, evaluating NVLink topologies or inference bin-packing

Pilot DRA when your cluster runs Kubernetes >= 1.34 and one of the following applies:

You are evaluating multi-node NVLink topologies (e.g. GB200 NVL72) where DRA's ComputeDomain feature, the first officially supported NVIDIA DRA feature, addresses a gap the device plugin cannot fill.
You are running inference workloads where memory-bound fractional requests would meaningfully improve bin-packing over fixed MIG profiles, and the Partitionable Devices beta in Kubernetes >= 1.36 is available to you.
You are building a new workload class with no existing Helm-chart debt and can design for resourceClaims from the start.

In any pilot: run the device plugin in parallel on the same nodes, monitor quota accounting gaps in your queue manager, and verify CDI configuration on your container runtime before enabling the DRA kubelet plugin.

The explorer below shows what changes and what stays the same at each layer of the GPU scheduling stack when you adopt DRA.

Before / After DRA: layer-by-layer comparison

Before: extended resources on Node

Device plugin DaemonSet writes integer count to Node.Status.Allocatable
Pod requests resources.limits["nvidia.com/gpu"]: 1
No structured attribute data visible to the scheduler
No per-device audit trail in Kubernetes API

After DRA: resource.k8s.io/v1 objects

DRA driver DaemonSet writes ResourceSlice (per device, per node) with full attributes
DeviceClass selects devices by driver name or capability expression
ResourceClaim specifies requirements as CEL expressions (memory, topology)
ResourceClaimTemplate generates per-pod claims automatically
Allocation status recorded in ResourceClaim, auditable in the API

Note: Helm charts using resources.limits must be migrated to resourceClaims; the two styles are not interchangeable.
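
The comparison above mentions topology as something a claim can express. One way to sketch this is a constraint that requires all devices in a request to share the same value for some attribute. The attribute name below is hypothetical; whether a driver publishes an attribute suitable for NVLink grouping is an assumption to check against that driver's ResourceSlice output.

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: paired-gpus                # hypothetical name
spec:
  devices:
    requests:
      - name: gpus
        exactly:
          deviceClassName: nvidia-gpu
          allocationMode: ExactCount
          count: 2
    constraints:
      - requests: ["gpus"]
        # Both allocated devices must report the same value for this attribute.
        matchAttribute: "gpu.example.com/nvlinkDomain"   # hypothetical attribute
```

The workload states the relationship it needs between devices, and the scheduler finds a pair that satisfies it, without the author knowing which nodes or GPUs qualify.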

Honest caveats: what DRA does not yet do

No on-demand MIG repartitioning at claim time

DRA advertises current MIG profiles in ResourceSlice, but dynamically creating a new MIG partition in response to an incoming claim is not part of the stable v1.34 API. Partitionable Devices (KEP-4815) is the mechanism that will eventually enable this; it reached beta (enabled by default) in v1.36. MIG profiles still must be pre-configured at the node level for the stable path [5].

GPU allocation is not yet production-supported by NVIDIA

The NVIDIA DRA driver README states explicitly that general GPU allocation features are not yet officially supported. ComputeDomain for MNNVL is the exception. Monitor the driver's release notes, not blog posts, before committing production workloads [7].

Queue manager integration is partial

Kueue's DRA integration (DRAExtendedResources) was alpha and disabled by default in v1.34. Volcano's PodGroup quota accounting does not count DRA claims in the current release. Running DRA workloads through these queue managers without explicit DRA integration produces quota accounting gaps.

DCGM observability is not DRA-aware

DCGM exporter reports per-device GPU utilisation metrics regardless of how the device was allocated, whether by device plugin or DRA. However, correlating a DCGM metric to a specific ResourceClaim (rather than to a pod via extended-resource labels) requires updated dashboards and metric labelling. The existing Prometheus integration works, but claim-level attribution needs manual work.

References

[1] Kubernetes Blog — “Kubernetes v1.34: DRA has graduated to GA” (September 2025). kubernetes.io/blog
[2] Kubernetes SIG-Node — KEP-4381: DRA Structured Parameters README (motivation, design, graduation criteria). kubernetes/enhancements on GitHub. github.com/kubernetes/enhancements
[3] Kubernetes SIG-Node — KEP-3063: Dynamic Resource Allocation (control-plane controller model, withdrawn as primary path in 1.32). kubernetes/enhancements issue tracker. github.com/kubernetes/enhancements/issues/3063
[4] Kubernetes Blog — “Kubernetes v1.36 Release” (April 2026). DRA Admin Access GA, Prioritized List GA, Partitionable Devices beta, Device Taints beta, Consumable Capacity beta noted in release highlights. kubernetes.io/blog
[5] Kubernetes SIG-Node — KEP-4815: Partitionable Devices (alpha in 1.33, beta in 1.36). kubernetes/enhancements issue tracker. github.com/kubernetes/enhancements/issues/4815
[6] Kueue — “Dynamic Resource Allocation” concepts page. kueue.sigs.k8s.io. (DRAExtendedResources feature gate, integration architecture.) kueue.sigs.k8s.io
[7] NVIDIA — k8s-dra-driver-gpu repository. GitHub. (ComputeDomain official support; GPU allocation not yet officially supported; kubelet plugin disabled by default.) github.com/NVIDIA/k8s-dra-driver-gpu
[8] Kubernetes Documentation — “Dynamic Resource Allocation” concept page (updated for v1 API in Kubernetes 1.34). kubernetes.io/docs
