MIG configuration — profile selection, fragmentation, and the partition contract

asleekgeek · 9 min read

MIG partitions a single GPU into hardware-isolated instances with independent SM, memory, and cache allocations.

Multi-Instance GPU (MIG) is the only GPU-sharing mechanism that provides true hardware isolation: each partition gets its own SMs, its own memory bandwidth, and its own L2 cache. That isolation is not free. Profiles are configured statically, the geometry of which profiles can coexist on a single card is strictly constrained, and moving a GPU in or out of MIG mode disrupts every workload on that node. This article is the operational guide for choosing profiles, avoiding fragmentation, managing strategy transitions, and observing per-partition health. For the decision between MIG, MPS, time-slicing, and fractional GPU mechanisms, see the GPU sharing decision tree (article 27 in this series).

MIG is supported on A100 (both 40 GB and 80 GB variants), H100, and the newer B200 and GB200 hardware. On GPU generations without MIG hardware support — T4, A10, L4 — time-slicing or MPS are the only multi-tenant options. Verify your GPU SKU in the NVIDIA MIG User Guide before designing a partitioning policy around MIG.

Profile geometry and the slice model

NVIDIA publishes the full profile table for each GPU SKU in the MIG User Guide [1]. Profile names encode both compute fraction and memory allocation: the pattern is Xg.YYgb, where X is the number of compute (SM) slices and YY is the framebuffer in gigabytes. Understanding the slice model is prerequisite to understanding fragmentation.

The A100 has 7 compute slices and 8 memory slices. Profile allocation consumes fixed multiples of these slices: on the 80 GB card, a 1g.10gb consumes 1 compute slice and 1 memory slice, while a 3g.40gb consumes 3 compute slices and 4 memory slices. Because each 1g instance takes a single memory slice, a layout of seven 1g instances leaves one of the 8 memory slices unallocated. Profiles must pack without overlap: you cannot create an instance that spans a slice already claimed by another [1, 2].
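
As a small illustration of the Xg.YYgb naming pattern, the sketch below splits a profile name into its compute-slice count and framebuffer size. The `MigProfile` type and `parse_profile` function are hypothetical helpers written for this article, not part of any NVIDIA tool, and the pattern only covers the name shapes that appear here (including the `+me` suffix).

```python
import re
from typing import NamedTuple


class MigProfile(NamedTuple):
    compute_slices: int   # the X in Xg.YYgb
    framebuffer_gb: int   # the YY in Xg.YYgb
    media_engine: bool    # True for "+me" variants


# Matches names such as "1g.10gb", "3g.40gb" and "1g.10gb+me".
_PROFILE_RE = re.compile(r"^(\d+)g\.(\d+)gb(\+me)?$")


def parse_profile(name: str) -> MigProfile:
    """Split a MIG profile name into its numeric parts."""
    match = _PROFILE_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognised MIG profile name: {name!r}")
    return MigProfile(
        compute_slices=int(match.group(1)),
        framebuffer_gb=int(match.group(2)),
        media_engine=match.group(3) is not None,
    )


if __name__ == "__main__":
    for name in ("1g.10gb", "3g.40gb", "1g.10gb+me"):
        print(name, parse_profile(name))
```

The name gives the compute-slice count directly, but not the memory-slice count: a 3g.40gb occupies 4 memory slices, so slice costs still have to come from the profile table.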

A100 40 GB profile table

The A100 40 GB has 8 memory slices of 5 GB each and 7 compute slices (108 total SMs). Representative profiles:

A100 40 GB — representative MIG profiles
Profile      Framebuffer  SMs (of 108)  Max instances  Typical workload
1g.5gb       5 GB         14            7              Tiny inference, embedding servers
2g.10gb      10 GB        28            3              Small inference, 7B model at int8
3g.20gb      20 GB        42            2              Mid-size inference or LoRA fine-tuning
4g.20gb      20 GB        56            1              Wider compute, same memory as 3g
7g.40gb      40 GB        98            1              Full GPU — training, largest inference

A100 80 GB profile table

A100 80 GB — representative MIG profiles
Profile        Framebuffer  SMs (of 108)  Max instances  Typical workload
1g.10gb        10 GB        14            7              Tiny inference, feature extractors
2g.20gb        20 GB        28            3              7B LLM at fp16, batch ≤ 8
3g.40gb        40 GB        42            2              13B LLM at int8; LoRA fine-tuning
4g.40gb        40 GB        56            1              Wider compute, same memory as 3g
7g.80gb        80 GB        98            1              Full GPU — distributed training
1g.10gb+me     10 GB        14            1              Media-engine variant

H100 SXM 80 GB profile table

H100 SXM shares the same slice geometry as the A100 80 GB (7 compute slices, 8 memory slices) but increases total SM count to 132. Profile names are identical in shape to the A100 80 GB table. Per-instance throughput is substantially higher due to the newer architecture and higher memory bandwidth; the H100 architecture whitepaper [3] gives detailed comparative bandwidth figures. The MIG User Guide [1] lists exact SM allocations per profile, which may vary by driver revision.

H100 SXM 80 GB — representative MIG profiles
Profile      Framebuffer  SMs (of 132)  Max instances
1g.10gb      10 GB        16            7
2g.20gb      20 GB        32            3
3g.40gb      40 GB        48            2
4g.40gb      40 GB        64            1
7g.80gb      80 GB        112           1

H100 PCIe (80 GB) supports the same profile set but has lower inter-GPU bandwidth than the SXM variant. For NVLink-dependent distributed training, prefer H100 SXM nodes, and check the MIG User Guide [1] for peer-to-peer restrictions between MIG instances before planning any multi-instance job.

B200 and GB200

The B200 and GB200 GPUs support MIG with up to 7 instances. The 1g.23gb profile provides 7 equal partitions per GPU, confirmed in the GPU Operator MIG manager ConfigMap assets [4]. The all-1g.23gb configuration in the GPU Operator deploys 7 such instances per GPU on GB200 hardware. The full B200 profile list is maintained in the NVIDIA MIG User Guide [1].

Profile selection — the two-variable decision

Profile selection reduces to two independent constraints: framebuffer headroom and SM width. Get either one wrong and you pay in OOM crashes or throughput loss respectively.

Framebuffer headroom is non-negotiable. Model weights plus KV cache plus activations must fit entirely in the instance's framebuffer. A 7B parameter model in fp16 requires roughly 14 GB for weights alone, meaning a 1g.10gb partition will OOM before serving a single request. A 2g.20gb gives 6 GB headroom for KV cache. Practical sizing target: provision framebuffer at approximately 1.5× the peak resident model footprint measured with DCGM_FI_DEV_FB_USED under production-representative load.
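
The arithmetic behind those numbers can be sketched in a few lines. This is a rough estimate only: the bytes-per-parameter table and the helper names are assumptions made for illustration, and the calculation covers weights alone, not KV cache or activations.

```python
# Approximate storage cost per parameter for common precisions (assumed values).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gb(params_billions: float, dtype: str) -> float:
    """Approximate weight footprint in GB, taking 1 GB as 1e9 bytes."""
    return params_billions * BYTES_PER_PARAM[dtype]


def headroom_gb(params_billions: float, dtype: str, framebuffer_gb: float) -> float:
    """Framebuffer left after loading weights; negative means the weights do not fit."""
    return framebuffer_gb - weights_gb(params_billions, dtype)


def target_framebuffer_gb(peak_resident_gb: float, factor: float = 1.5) -> float:
    """Sizing target: measured peak resident footprint times a safety factor."""
    return peak_resident_gb * factor


if __name__ == "__main__":
    print("7B fp16 weights (GB):", weights_gb(7, "fp16"))
    print("headroom on 1g.10gb (GB):", headroom_gb(7, "fp16", 10))
    print("headroom on 2g.20gb (GB):", headroom_gb(7, "fp16", 20))
    print("target for a 12 GB measured peak (GB):", target_framebuffer_gb(12))
```

In practice the `peak_resident_gb` input should come from a measurement such as DCGM_FI_DEV_FB_USED under representative load, not from the weight estimate.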

SM width determines throughput. An inference server handling batched requests benefits from a wider partition (3g.40gb or 4g.40gb). A small embedding endpoint serving single requests wastes SMs on a 3g.40gb; a 1g.10gb is sufficient and releases capacity for other tenants.

Concrete profile recipe per workload type (A100 80 GB basis):

Profile selection guide — A100 80 GB
Workload                                  Profile      Reasoning
Embedding / feature extraction (batch=1)  1g.10gb      Memory abundant; SM width wasted on batch=1
7B LLM inference fp16, batch ≤ 8          2g.20gb      20 GB framebuffer + KV cache headroom
13B LLM inference int8, batch ≤ 16        3g.40gb      ~13 GB weights + activation headroom
70B LLM inference int4, batch ≤ 4         7g.80gb      Full GPU; quantised 70B just fits
LoRA fine-tuning, 7B model                3g.40gb      Gradient checkpointing needs headroom
Distributed training (DDP, ZeRO)          7g.80gb      Never share; training saturates the GPU

Avoid running distributed training frameworks (e.g. PyTorch DDP, DeepSpeed ZeRO) on a fractional MIG instance. The reduced SM count and restricted inter-GPU communication typically cause significant throughput degradation that batch-size tuning alone is unlikely to recover. Training jobs should land on whole-GPU nodes, or on 7g.80gb profiles when MIG mode is mandatory for operational reasons.

Packing geometry and the fragmentation trap

MIG instances are allocated from a fixed pool of memory slices and compute slices per GPU. Because profiles consume fixed multiples of these slices, not all profile combinations fit on one card. The NVIDIA MIG User Guide §MIG Instance Placement [1] defines the valid placements. The fragmentation trap emerges from one combination almost every team encounters first.

The classic stranded-capacity scenario on A100 80 GB: two 3g.40gb instances each consume 4 of 8 memory slices (40 GB each) and 3 of 7 compute slices. Running two of them exhausts all 8 memory slices and uses 6 of 7 compute slices. The remaining 1 compute slice cannot host any additional instance because all memory is already allocated. The 2g.20gb profile requires 2 memory slices — impossible to fit. The 1g.10gb profile requires 1 memory slice — also impossible. One-seventh of the compute capacity is stranded and cannot be reclaimed without evicting and re-partitioning. Multiple sources independently document this geometry behaviour [1, 2].

Valid packing combinations for the A100 80 GB (representative):

A100 80 GB — valid packing combinations
Combination                  Instances  Memory slices  Compute slices  Notes
7 × 1g.10gb                  7          7 of 8         7 of 7          Fully efficient
3 × 2g.20gb + 1 × 1g.10gb    4          7 of 8         7 of 7          Fully efficient
2 × 3g.40gb + 1 × 1g.10gb    n/a        9 of 8         n/a             INVALID: exceeds memory budget
2 × 3g.40gb                  2          8 of 8         6 of 7          1 compute slice stranded
1 × 4g.40gb + 1 × 3g.40gb    2          8 of 8         7 of 7          Fully efficient
1 × 7g.80gb                  1          8 of 8         7 of 7          Full GPU, no sharing
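
The slice budgets in this table can be checked mechanically. The sketch below is a budget check only, using the per-profile slice costs described in this article for the A100 80 GB. It does not model the placement positions defined in the MIG User Guide, so a layout that passes here may still be rejected by the driver. `SLICE_COST`, `Packing`, and `check_packing` are hypothetical names.

```python
from typing import NamedTuple

COMPUTE_SLICES = 7
MEMORY_SLICES = 8

# (compute slices, memory slices) consumed per instance on an A100 80 GB.
SLICE_COST = {
    "1g.10gb": (1, 1),
    "2g.20gb": (2, 2),
    "3g.40gb": (3, 4),
    "4g.40gb": (4, 4),
    "7g.80gb": (7, 8),
}


class Packing(NamedTuple):
    compute_used: int
    memory_used: int
    fits: bool
    stranded_compute: int  # compute slices left over once memory is exhausted


def check_packing(layout: dict) -> Packing:
    """Sum slice usage for a {profile: instance_count} layout and compare to the budget."""
    compute = sum(SLICE_COST[p][0] * n for p, n in layout.items())
    memory = sum(SLICE_COST[p][1] * n for p, n in layout.items())
    fits = compute <= COMPUTE_SLICES and memory <= MEMORY_SLICES
    stranded = 0
    if fits and memory == MEMORY_SLICES:
        # No memory remains, so leftover compute slices cannot host an instance.
        stranded = COMPUTE_SLICES - compute
    return Packing(compute, memory, fits, stranded)


if __name__ == "__main__":
    for layout in (
        {"1g.10gb": 7},
        {"3g.40gb": 2},
        {"3g.40gb": 2, "1g.10gb": 1},
        {"4g.40gb": 1, "3g.40gb": 1},
    ):
        print(layout, check_packing(layout))
```

Running a check like this against a proposed ConfigMap before rollout catches over-budget layouts early; the authoritative answer on placement still comes from the MIG User Guide [1].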

The operational lesson is to standardise on a single profile per node pool where feasible. A pool of 7 × 1g.10gb instances is predictable, schedules cleanly, and never produces fragmentation. A pool of mixed profiles requires manual geometry tracking and will strand capacity as workloads arrive and depart in uncontrolled order.

When mixed profiles are genuinely required — for example, one tenant needs a 3g.40gb for a 13B model while another needs 1g.10gb for an embedding endpoint on the same node — use mig.strategy=mixed and configure explicit scheduler labels (e.g. Kueue ResourceFlavor) so the scheduler can distinguish nodes by their active geometry. Without scheduler-side labels, jobs will fail to bind to nodes because the resource type (nvidia.com/mig-3g.40gb vs nvidia.com/mig-1g.10gb) does not auto-match.

Strategy modes and the GPU Operator MIG manager

The NVIDIA GPU Operator includes a MIG manager component that reconciles the GPU's partition state against a desired configuration expressed as a Kubernetes ConfigMap. Two strategy modes are exposed via the mig.strategy Helm value [5].

Single strategy

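mig.strategy=single partitions every GPU on the node into instances of the same profile. The device plugin advertises those instances under the standard nvidia.com/gpu resource name, so pods request them exactly as they would a whole GPU, and node labels such as nvidia.com/gpu.product identify which profile backs the resource. This is the operationally safest mode: no fragmentation risk, homogeneous scheduling, and straightforward quota configuration.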

GPU Operator Helm values — single strategy
mig:
  strategy: single

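# MIG manager settings: the ConfigMap to read, and the named configuration
# to apply when a node carries no nvidia.com/mig.config label
migManager:
  config:
    name: mig-parted-config
    default: "all-1g.10gb"

mig-parted-config ConfigMap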
apiVersion: v1
kind: ConfigMap
metadata:
  name: mig-parted-config
  namespace: gpu-operator
data:
  config.yaml: |
    version: v1
    mig-configs:
      all-1g.10gb:
        - devices: all
          mig-enabled: true
          mig-devices:
            "1g.10gb": 7
      all-3g.40gb:
        - devices: all
          mig-enabled: true
          mig-devices:
            "3g.40gb": 2

Mixed strategy

mig.strategy=mixed exposes multiple resource types simultaneously from one node. Pods must request the exact MIG resource type they need:

Pod spec — requesting a specific MIG partition
resources:
  limits:
    nvidia.com/mig-3g.40gb: "1"

Mixed mode increases operational complexity. The scheduler must track per-GPU geometry state, and fragmentation can accumulate as workloads with different profile requirements arrive in an unfavorable order. Reserve mixed strategy for node pools where the profile mix is known, stable, and modelled in advance.
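
When pod specs are generated by tooling, the resource name for a mixed-strategy request can be derived from the profile name. The following is a minimal sketch with hypothetical helper names. It deliberately handles only plain Xg.YYgb names, because the resource naming for variants such as `+me` is not covered in this article.

```python
import json
import re

# Plain profile names only, e.g. "1g.10gb" or "3g.40gb".
_SIMPLE_PROFILE = re.compile(r"^\d+g\.\d+gb$")


def mixed_resource_name(profile: str) -> str:
    """Resource name a pod requests for this profile under mig.strategy=mixed."""
    if not _SIMPLE_PROFILE.match(profile):
        raise ValueError(f"unsupported profile name for this sketch: {profile!r}")
    return f"nvidia.com/mig-{profile}"


def resource_limits(profile: str, count: int = 1) -> dict:
    """Build the `resources.limits` fragment of a container spec."""
    return {"resources": {"limits": {mixed_resource_name(profile): str(count)}}}


if __name__ == "__main__":
    # Prints the same fragment as the pod spec shown above, in JSON form.
    print(json.dumps(resource_limits("3g.40gb"), indent=2))
```

Centralising the name construction in one function keeps job templates from drifting when a node pool changes profile.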

Strategy transitions and the drain requirement

The NVIDIA GPU Operator documentation explicitly states that the node must be free (drained) of GPU workloads before any reconfiguration is triggered [5]. The MIG manager automates only the reconfiguration itself: it watches the nvidia.com/mig.config node label and applies the matching configuration when that label changes. Cordoning the node and draining GPU workloads before relabelling is your responsibility.

MIG reconfiguration — cordon, drain, relabel
# 1. Cordon the node so no new GPU pods are scheduled
kubectl cordon <gpu-node>

# 2. Drain the node (this evicts every non-DaemonSet pod, not only GPU workloads)
kubectl drain <gpu-node> --ignore-daemonsets --delete-emptydir-data

# 3. Apply the new MIG ConfigMap (if the profile set is changing)
kubectl apply -f mig-parted-config.yaml

# 4. Relabel the node — this triggers the MIG manager
kubectl label node <gpu-node> nvidia.com/mig.config=all-3g.40gb --overwrite

# 5. Watch the MIG manager pod logs to confirm reconciliation
#    (adjust the label selector if your deployment names the pods differently)
kubectl logs -n gpu-operator -l app=nvidia-mig-manager -f

# 6. Uncordon once the manager completes
kubectl uncordon <gpu-node>

Expect a node-level GPU outage of two to five minutes during reconfiguration. MIG strategy transitions are disruptive by design — they require the driver to reset GPU state. Factor this into change-management windows and ensure workloads running on the node have checkpoint or retry capabilities before you trigger a transition.
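
Automation around the transition needs a signal for when uncordoning is safe. The sketch below interprets a node's labels, assuming the MIG manager reports progress through a `nvidia.com/mig.config.state` label with values such as `pending`, `success`, and `failed`. That label name and its values are an assumption here and should be confirmed against the GPU Operator documentation [5] for your version. `reconfig_status` is a hypothetical helper.

```python
CONFIG_LABEL = "nvidia.com/mig.config"
STATE_LABEL = "nvidia.com/mig.config.state"  # assumed label; verify for your operator version


def reconfig_status(labels: dict, desired: str) -> str:
    """Classify a node's MIG reconfiguration progress from its labels.

    Returns one of: "not-requested", "in-progress", "failed", "ready-to-uncordon".
    """
    if labels.get(CONFIG_LABEL) != desired:
        # The node has not been relabelled with the desired configuration yet.
        return "not-requested"
    state = labels.get(STATE_LABEL, "unknown")
    if state == "success":
        return "ready-to-uncordon"
    if state == "failed":
        return "failed"
    # Any other value (for example "pending") is treated as still running.
    return "in-progress"


if __name__ == "__main__":
    node_labels = {CONFIG_LABEL: "all-3g.40gb", STATE_LABEL: "pending"}
    print(reconfig_status(node_labels, "all-3g.40gb"))
```

The labels dictionary would typically come from `kubectl get node <gpu-node> -o json`. Polling it with a timeout, and refusing to uncordon on `failed`, keeps a broken reconfiguration from receiving new pods.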

Per-partition observability with DCGM

DCGM (Data Center GPU Manager) exposes MIG-aware metrics when MIG is enabled. Each MIG instance appears as a distinct entity in the metrics stream, identified by a GPU instance ID and tagged with the profile name. MIG support was introduced in the dcgm-exporter v2.4.0-rc.2 release and is documented in the NVIDIA DCGM User Guide MIG field ID support section [6].

DCGM labels each metric series with gpu (physical GPU index), GPU_I_ID (MIG instance ID), and GPU_I_PROFILE (profile name, e.g. 1g.10gb). Key metrics for per-partition observability:

Key DCGM metrics for MIG partitions
Metric                           Description                              Scope
DCGM_FI_DEV_GPU_UTIL             SM utilisation (%)                       Per-instance
DCGM_FI_PROF_GR_ENGINE_ACTIVE    Graphics/compute engine active fraction  Per-instance (higher precision)
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE  Tensor core active fraction              Per-instance
DCGM_FI_DEV_FB_USED              Framebuffer in use (MiB)                 Per-instance
DCGM_FI_DEV_FB_FREE              Framebuffer free (MiB)                   Per-instance
DCGM_FI_DEV_POWER_USAGE          Power draw (W)                           GPU-level only
DCGM_FI_DEV_XID_ERRORS           Most recent XID error code               GPU-level only

Note that DCGM_FI_PROF_GR_ENGINE_ACTIVE provides higher precision per-instance utilisation data than DCGM_FI_DEV_GPU_UTIL in MIG mode [6]. For tenant utilisation dashboards, prefer GR_ENGINE_ACTIVE as the primary signal.

The following Prometheus recording rule aggregates per-profile utilisation across a node pool. It uses the Prometheus Operator rule format; substitute the equivalent for your own metrics pipeline if you use a different backend:

PrometheusRule — per-MIG-profile utilisation summary
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mig-utilisation-summary
  namespace: gpu-operator
spec:
  groups:
    - name: mig.utilisation
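      rules:
        # Assumes a `node` label is attached at scrape time (for example via
        # relabelling); switch to `Hostname` if that is the label your series carry.
        - record: mig_instance:gr_engine_active:avg5m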
          expr: |
            avg by (node, GPU_I_PROFILE) (
              avg_over_time(DCGM_FI_PROF_GR_ENGINE_ACTIVE{GPU_I_PROFILE!=""}[5m])
            )
        - record: mig_instance:fb_used_mib:max5m
          expr: |
            max by (node, GPU_I_PROFILE) (
              max_over_time(DCGM_FI_DEV_FB_USED{GPU_I_PROFILE!=""}[5m])
            )
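
To make the aggregation concrete, the sketch below reproduces in plain Python what the second rule computes: the highest framebuffer reading per (node, profile) pair across all samples in a window, skipping series without a MIG profile label. The sample structure and the function name are assumptions for illustration; this is not how Prometheus evaluates the rule internally.

```python
def max_by_node_profile(samples):
    """Return {(node, profile): max value} over (labels, value) samples.

    Samples whose GPU_I_PROFILE label is missing or empty are ignored,
    mirroring the GPU_I_PROFILE!="" selector in the rule.
    """
    result = {}
    for labels, value in samples:
        profile = labels.get("GPU_I_PROFILE", "")
        if not profile:
            continue
        key = (labels.get("node"), profile)
        if key not in result or value > result[key]:
            result[key] = value
    return result


if __name__ == "__main__":
    # Hypothetical DCGM_FI_DEV_FB_USED readings (MiB) collected over five minutes.
    window = [
        ({"node": "n1", "GPU_I_ID": "7", "GPU_I_PROFILE": "1g.10gb"}, 4096.0),
        ({"node": "n1", "GPU_I_ID": "8", "GPU_I_PROFILE": "1g.10gb"}, 6144.0),
        ({"node": "n1", "GPU_I_ID": "1", "GPU_I_PROFILE": "3g.40gb"}, 30000.0),
        ({"node": "n1", "GPU_I_PROFILE": ""}, 9999.0),  # non-MIG series, skipped
    ]
    print(max_by_node_profile(window))
```

Because the result is grouped by profile and not by instance, one busy instance sets the value for every instance of that profile on the node. Add GPU_I_ID to the grouping if per-tenant figures are needed.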

GPU Feature Discovery (GFD) propagates the active MIG strategy and profile counts into node labels, making them queryable by the scheduler and visible in node inspection:

Inspect GFD MIG labels on a node
# Confirm GFD labels reflect the current MIG configuration
kubectl get node <gpu-node> -o json \
  | jq '.metadata.labels | with_entries(select(.key | startswith("nvidia.com/mig")))'

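# Illustrative output for a single GPU partitioned as 7 × 1g.10gb under the
# mixed strategy (the label set and memory values vary by strategy, GPU, and driver):
# {
#   "nvidia.com/mig.capable": "true",
#   "nvidia.com/mig.strategy": "mixed",
#   "nvidia.com/mig-1g.10gb.count": "7",
#   "nvidia.com/mig-1g.10gb.memory": "10240"
# }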

Driver upgrades and profile pinning

NVIDIA occasionally adds new MIG profiles in driver releases. For example, the 1g.10gb+me media-engine variant appeared in later Ampere driver revisions. A profile referenced in the MIG ConfigMap that does not exist in the active driver binary causes the MIG manager pod to enter CrashLoopBackOff. Pin the GPU Operator chart version explicitly in your Helm values and validate the profile table against the GPU Operator release notes [7] before upgrading.

GPU Operator Helm values — pin operator version
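# Pin the operator image tag explicitly; do not rely on a floating tag.
# The chart itself is pinned at install time with `helm upgrade --version <chart-version>`.
operator:
  version: "v24.9.2"   # illustrative; validate against current release notes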

Profile geometry is preserved across driver upgrades — re-partitioning is not required unless the new driver introduces a changed or renamed profile. After a node OS upgrade, the MIG manager re-reads the ConfigMap and verifies the partition state, re-applying it only if the GPU state has drifted from the desired configuration.
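
A pre-upgrade check for the CrashLoopBackOff scenario described above can be sketched as a comparison between the profiles named in the ConfigMap and the profiles the target driver is known to support. The `supported` set must come from the release notes or the MIG User Guide; the values below are placeholders, and `unknown_profiles` is a hypothetical helper operating on the already-parsed `mig-configs` mapping.

```python
def unknown_profiles(mig_configs: dict, supported: set) -> dict:
    """Map each named configuration to the profiles it uses that are not supported.

    `mig_configs` follows the shape of the `mig-configs` section shown earlier:
    {config_name: [{"devices": ..., "mig-enabled": ..., "mig-devices": {profile: count}}]}
    """
    problems = {}
    for name, entries in mig_configs.items():
        missing = sorted(
            {
                profile
                for entry in entries
                for profile in entry.get("mig-devices", {})
                if profile not in supported
            }
        )
        if missing:
            problems[name] = missing
    return problems


if __name__ == "__main__":
    configs = {
        "all-1g.10gb": [{"devices": "all", "mig-enabled": True, "mig-devices": {"1g.10gb": 7}}],
        "media": [{"devices": "all", "mig-enabled": True, "mig-devices": {"1g.10gb+me": 1}}],
    }
    # Placeholder list; take the real one from the documentation for the target driver.
    supported_by_target_driver = {"1g.10gb", "2g.20gb", "3g.40gb", "4g.40gb", "7g.80gb"}
    print(unknown_profiles(configs, supported_by_target_driver))
```

An empty result means every referenced profile is in the supported set. Anything else is worth resolving before the node label is changed.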

References

  1. NVIDIA — Multi-Instance GPU User Guide (r580). NVIDIA Corporation, 2024. Covers supported profiles, instance placement constraints, and slice geometry for A100, H100, B200.
  2. NVIDIA Technical Blog — Getting the Most Out of the A100 GPU with MIG. NVIDIA Developer Blog, 2020. Describes the 7-compute-slice / 8-memory-slice model and per-profile allocation mechanics.
  3. NVIDIA — H100 Tensor Core GPU Architecture Whitepaper. NVIDIA Corporation, 2022. H100 SM counts, memory geometry, and NVLink bandwidth figures.
  4. NVIDIA GPU Operator — MIG manager ConfigMap (source, GitHub). NVIDIA Corporation. Source of truth for valid profile names and the all-1g.23gb B200/GB200 configuration.
  5. NVIDIA GPU Operator — MIG Manager Documentation. NVIDIA Corporation, 2024. MIG manager ConfigMap schema, strategy modes, drain requirement, and the node-labelling protocol.
  6. NVIDIA DCGM User Guide — MIG Field ID Support. NVIDIA Corporation, 2024. Full list of DCGM metrics available per MIG instance, including GPU_I_ID and GPU_I_PROFILE label semantics.
  7. NVIDIA GPU Operator — Release Notes. NVIDIA Corporation, 2024. Version-to-profile compatibility table; required reading before any GPU Operator upgrade.
