AI Platform Engineering & MLOps Series · Part 1 of 34

What is MLOps in 2026?

A defensible working definition — not a tools list, but a discipline. Four reference framings, three points of real disagreement, and a vocabulary the rest of this series builds on.


Lifecycle stages: Frame, Data, Train, Eval, Registry, Serve, Monitor, Retrain

MLOps is one of those terms that gets defined by the tool vendor selling you something, the job description of the role you just opened, and the conference talk you attended last quarter — and all three definitions disagree. That disagreement is not noise. It reflects real, unresolved questions about where software engineering ends and machine learning operations begins. This article takes a position on those questions, defends it against the four most authoritative reference definitions in the field, and closes with the vocabulary this series uses going forward.

If you arrived here wanting a one-sentence answer for a planning meeting: MLOps is the discipline of treating machine-learning models as production software backed by data. The rest of this article unpacks why that sentence is load-bearing and where four major reference frameworks agree and disagree with it.

A working definition

This series defaults to the following definition: MLOps is the discipline of treating machine-learning models as production software backed by data. Three commitments fall out of that sentence, and a team that skips any one of them is doing something adjacent to MLOps but not MLOps itself.

Production software: models live behind versioned interfaces, ship through code review and CI pipelines, run on the same cluster substrate as the rest of the platform, and observe the same service-level objectives as any other service. The word “production” is not aspirational here; it means the model is taking real traffic and there is a human on call for when it misbehaves.

Backed by data — the artifact under management is the model plus the data that produced it, not just the weights file. Lineage from raw data through feature engineering, through the training run, to the deployed version is non-negotiable. Without it you cannot reproduce a model, audit it, or safely roll it back. Sculley et al.'s 2015 NeurIPS paper identified data dependencies as one of the primary sources of hidden maintenance cost in ML systems — a finding that every subsequent maturity framework has absorbed [1].
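As a minimal sketch of what such lineage could look like, assuming each artifact is identified by a content hash: the `LineageRecord` type, its field names, and the sample byte strings are hypothetical and not taken from any particular tool.

```python
import hashlib
from dataclasses import dataclass


def content_hash(payload: bytes) -> str:
    """Return a short, stable identifier for an artifact's bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record linking a deployed model to what produced it."""
    raw_data: str       # hash of the raw data snapshot
    features: str       # hash of the engineered feature set
    training_code: str  # hash of the training code
    model: str          # hash of the resulting weights

    def is_reproducible_from(self, raw_data: bytes, training_code: bytes) -> bool:
        """True only if the given inputs match the ones recorded for this model."""
        return (content_hash(raw_data) == self.raw_data
                and content_hash(training_code) == self.training_code)


# Stand-in artifacts; real ones would be files or object-store blobs.
raw = b"user_id,amount\n1,9.99\n"
feats = b"user_id,amount_log\n1,2.30\n"
code = b"def train(df): ..."
weights = b"\x00\x01\x02"

record = LineageRecord(
    raw_data=content_hash(raw),
    features=content_hash(feats),
    training_code=content_hash(code),
    model=content_hash(weights),
)
```

With a record like this, an audit or rollback can at least confirm that the data and code on hand are the ones the deployed weights came from, which is the property the paragraph above treats as non-negotiable.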

Maintained — models degrade. The input distribution shifts, the world changes, the labelling assumptions drift. The MLOps lifecycle explicitly includes retraining, evaluation, rollback, and retirement — not just first deploy. A team that ships a model and moves on to the next project is doing data science with a deployment step, not MLOps.

Practical Implication: If your team ships models and has no automated retraining loop, you are doing the first half of MLOps. The second half — monitoring, drift detection, retraining trigger, evaluation gate, promotion — is where most teams discover they have been accumulating configuration debt silently.
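The monitoring-to-trigger step of that second half can be sketched in a few lines. This is an illustration under stated assumptions, not a recommended detector: drift is measured as the shift of the live mean in units of the reference standard deviation, and the threshold of 2.0, the function names, and the sample windows are all hypothetical.

```python
from statistics import mean, stdev


def drift_score(reference: list[float], live: list[float]) -> float:
    """Shift of the live mean, measured in reference standard deviations."""
    return abs(mean(live) - mean(reference)) / stdev(reference)


def retraining_needed(reference: list[float], live: list[float],
                      threshold: float = 2.0) -> bool:
    """Fire the retraining trigger when drift exceeds the assumed threshold."""
    return drift_score(reference, live) > threshold


# Feature values seen at training time versus two windows of live traffic.
training_window = [10.0, 11.0, 9.0, 10.0, 10.0, 12.0, 8.0]
stable_window = [10.0, 11.0, 10.0]
shifted_window = [15.0, 16.0, 14.0]
```

Here `retraining_needed(training_window, shifted_window)` fires while the stable window does not. The detector matters less than the fact that the trigger is code with an owner rather than a dashboard someone is expected to watch.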

The four reference definitions

Four sources carry enough authority that any working definition must account for them. They agree on the spine and disagree at the edges — the disagreements are where the interesting engineering decisions live.

Google: CI/CD/CT

Google's Architecture Center document frames MLOps as DevOps principles applied to ML systems, with one structural addition: Continuous Training (CT) alongside the familiar CI and CD [2]. In Google's model, CI is no longer only about testing code — it tests and validates data, data schemas, and models. CD is no longer about a single service — it deploys a training pipeline that then produces a model prediction service. CT is described as “a new property, unique to ML systems, that's concerned with automatically retraining and serving the models.” The same document notes that only a small fraction of a real-world ML system is actually ML code; the surrounding elements — data pipelines, configuration, monitoring, serving infrastructure — are the majority of the system. This observation traces directly back to the Sculley et al. paper [1].
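To make "CI validates data schemas" concrete, here is a small sketch of a schema check that could run as a pipeline test. The expected schema, the column names, and the `schema_violations` helper are assumptions made for illustration; Google's document does not prescribe this implementation.

```python
# Hypothetical schema for a training table: column name -> expected Python type.
EXPECTED_SCHEMA = {"user_id": int, "amount": float, "country": str}


def schema_violations(rows: list[dict], schema: dict[str, type]) -> list[str]:
    """Return a human-readable list of schema problems; empty means valid."""
    problems = []
    for i, row in enumerate(rows):
        # Report missing and unexpected columns first, in sorted order.
        for col in sorted(schema.keys() - row.keys()):
            problems.append(f"row {i}: missing column '{col}'")
        for col in sorted(row.keys() - schema.keys()):
            problems.append(f"row {i}: unexpected column '{col}'")
        # Then check the type of every expected column that is present.
        for col, expected in schema.items():
            if col in row and not isinstance(row[col], expected):
                actual = type(row[col]).__name__
                problems.append(
                    f"row {i}: '{col}' should be {expected.__name__}, got {actual}"
                )
    return problems


good_batch = [{"user_id": 1, "amount": 9.99, "country": "DE"}]
bad_batch = [{"user_id": "1", "amount": 9.99}]
```

A CI job would fail the build when `schema_violations` returns anything for a new data batch, in the same way it fails on a broken unit test.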

Google describes three maturity levels: Level 0 (fully manual — data scientists train models in notebooks, hand off to engineers, deploy via ad hoc scripts), Level 1 (ML pipeline automation — the full training pipeline is deployed and triggered automatically), and Level 2 (CI/CD pipeline automation — pipeline components themselves are built, tested, and deployed via automated pipelines). This is a tighter and less prescriptive model than Microsoft's.

Microsoft: the five-level maturity model

Microsoft's Azure Architecture Center publishes an MLOps maturity model with five levels: Level 0 (No MLOps), Level 1 (DevOps but no MLOps), Level 2 (Automated Training), Level 3 (Automated Model Deployment), and Level 4 (Full MLOps Automated Operations) [3]. The model explicitly assesses three dimensions at each level: people and culture, processes and structures, and objects and technology. This framing is deliberately organisational rather than purely tooling-focused: a team can have an experiment tracker and a pipeline orchestrator running and still be at Level 0 if the people, processes, and governance are not in place. Microsoft's intended use is as a self-assessment guide for estimating the scope of work when adopting MLOps, not as a deployment checklist.
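One way to read the "tools alone do not raise your level" point is that the weakest dimension caps the overall result. The sketch below encodes that reading. The minimum rule is an assumption of this example, not something Microsoft's model states as a formula, and the function and variable names are hypothetical.

```python
from enum import IntEnum


class Level(IntEnum):
    """The five levels named in Microsoft's maturity model."""
    NO_MLOPS = 0
    DEVOPS_NO_MLOPS = 1
    AUTOMATED_TRAINING = 2
    AUTOMATED_MODEL_DEPLOYMENT = 3
    FULL_AUTOMATED_OPERATIONS = 4


def overall_level(people_and_culture: Level,
                  processes_and_structures: Level,
                  objects_and_technology: Level) -> Level:
    """Assumed rule: the weakest of the three dimensions caps the overall level."""
    return min(people_and_culture, processes_and_structures, objects_and_technology)


# A team with strong tooling but no supporting people or process practices.
tool_heavy_team = overall_level(
    people_and_culture=Level.NO_MLOPS,
    processes_and_structures=Level.NO_MLOPS,
    objects_and_technology=Level.AUTOMATED_MODEL_DEPLOYMENT,
)
```

Under this assumed rule, `tool_heavy_team` evaluates to `Level.NO_MLOPS` despite the Level 3 tooling score.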

The Hidden Technical Debt paper: what MLOps is trying to fix

Sculley et al.'s 2015 NeurIPS paper does not define MLOps — the term was not yet current — but it names the failure modes MLOps now exists to fix [1]. The paper's core argument is that while ML offers a powerful toolkit for building complex prediction systems quickly, it is dangerous to treat those quick wins as free. Using the software engineering lens of technical debt, the authors identify ML-specific risk factors that accumulate silently: boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, configuration debt, and changes in the external world. Any MLOps definition that does not address these failure modes is incomplete.

This paper is load-bearing for the whole field. The concepts of data dependencies, configuration debt, and hidden feedback loops — now standard vocabulary in MLOps planning conversations — originate here. Reading it before adopting any MLOps tooling is not optional.

CD Foundation MLOps SIG: extension of DevOps

The CD Foundation's MLOps SIG defines MLOps as “the extension of the DevOps methodology to include Machine Learning and Data Science assets as first class citizens within the DevOps ecology” [4]. This is the most explicitly artifact-centric definition: ML models and datasets are not special exceptions to DevOps practice but full participants in it — versioned, tested, promoted through environments, and governed like any other production artifact. The SIG roadmap also makes a distinction worth preserving: MLOps is not AIOps. “AIOps often means an application of AI technologies to Ops data with sometimes unclear aims” — a separate concern this series addresses in its own article.

The Thoughtworks CD4ML formulation, closely related to the CD Foundation position, defines the practice as: “a software engineering approach in which a cross-functional team produces machine learning applications based on code, data, and models in small and safe increments that can be reproduced and reliably released at any time” [5]. The operative phrase is “code, data, and models”: the three versioned artifact types that make ML delivery structurally different from conventional software delivery.

The explorer below shows which of the eight lifecycle stages each reference framework explicitly covers, making the overlaps and gaps concrete. Only the Google view is captured here.

Definition Lens Explorer

Google: CI/CD/CT (Google Cloud Architecture Center, 2020 / 2024)

Extends DevOps with Continuous Training (CT), the third loop unique to ML. Notes that ML code is a small fraction of a real-world ML system; the surrounding data pipelines, configuration, serving infrastructure, and monitoring are the majority.

Unique contribution: Continuous Training (CT), meaning automated re-training pipelines as a first-class delivery concern, not a future project.

Lifecycle coverage for this lens:

› Problem Framing: not covered
› Data Preparation: covered
› Training: covered
› Evaluation: covered
› Registry: covered
› Serving: not covered
› Monitoring: covered
› Retraining: covered

6 of 8 stages explicitly addressed by this framework.

Where the definitions diverge — and this series' position on each

The four framings converge on a shared core: ML systems need versioning beyond just code, models need a retraining loop, and production ML needs observability that classical DevOps does not provide. They diverge on three specific points where the engineering decisions are non-trivial.

Divergence 1: Is feature engineering in scope?

Google's framework and the CD Foundation include feature engineering in the MLOps perimeter. Microsoft's maturity model leaves it to a separate DataOps function. The pragmatic resolution is: it depends on whether you operate a feature store. If you do, the feature store is operational infrastructure with its own availability requirements and training-serving parity guarantees, so it sits squarely inside MLOps. If you do not, feature engineering is executed as part of the training pipeline and the boundary with DataOps is the raw data snapshot the training job receives. This series treats feature stores as MLOps infrastructure when they are present and DataOps output when they are not.
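Training-serving parity is easiest to see as a check that the offline and online paths compute identical features for the same input. The sketch below assumes a single hypothetical feature function and a deliberately skewed online reimplementation. None of these names come from a real feature store API.

```python
import math
from typing import Callable

FeatureFn = Callable[[float], dict[str, float]]


def amount_features(amount: float) -> dict[str, float]:
    """Reference feature logic, as used by the offline training path."""
    return {"amount_log": math.log1p(amount), "is_large": float(amount > 100.0)}


def skewed_online_features(amount: float) -> dict[str, float]:
    """A reimplementation that drifted: log instead of log1p."""
    return {"amount_log": math.log(amount), "is_large": float(amount > 100.0)}


def parity_mismatches(amounts: list[float], offline: FeatureFn,
                      online: FeatureFn, tol: float = 1e-9) -> list[int]:
    """Return indices of inputs where the two paths disagree on any feature."""
    mismatched = []
    for i, amount in enumerate(amounts):
        batch, live = offline(amount), online(amount)
        if any(abs(batch[name] - live[name]) > tol for name in batch):
            mismatched.append(i)
    return mismatched


sample_amounts = [10.0, 250.0]
```

Sharing one function between both paths makes the mismatch list empty by construction. That is the guarantee a feature store is meant to provide, and the reason it counts as operational infrastructure.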

Divergence 2: Where does MLOps end and ModelOps begin?

Some practitioners use “ModelOps” for the post-deployment half of the lifecycle — monitoring, drift detection, retraining triggers, rollback. Others treat it as a synonym for MLOps. This series treats MLOps as covering the full lifecycle and ModelOps as a sub-concern within it. The practical reason is that teams which separate training-side and serving-side ownership under different discipline names consistently encounter a leaking seam between training and serving — the organisation optimises each half independently and no one owns the retraining loop that connects them. Keeping the term unified keeps the accountability unified.

Divergence 3: Is LLMOps a subset or a successor?

Google and Microsoft both treat LLMOps as an extension of MLOps, adding LLM-specific concerns on top of a shared foundation. The CD Foundation has begun publishing separate LLMOps guidance. Synthesising the peer-reviewed literature on MLOps-to-LLMOps transitions [6] with the major reference frameworks, LLMOps adds four operating concerns not found in classical MLOps — prompt and tool versioning, agent runtime management, LLM-judge evaluation suites, and semantic drift monitoring — built on a foundation that is largely unchanged. This series takes the extension position. LLMOps is addressed in a later part; this article establishes the foundation it extends.

Practical Implication: The most expensive divergence in practice is Divergence 2. Teams that split ModelOps from MLOps organisationally almost always end up with a gap at the retraining seam — nobody explicitly owns triggering retraining when monitoring detects drift. This gap is where silent model degradation lives.

Boundaries: MLOps vs DevOps, DataOps, AIOps, and LLMOps

Adjacent disciplines cause more confusion among new practitioners than any definitional gap. The boundaries below are sharp by design — the cost of a fuzzy boundary is diffused accountability.

DevOps ships application code. MLOps inherits every DevOps practice — versioning, CI/CD, observability, incident response — and adds the model artifact and its training data as first-class versioned things. A DevOps team that “also does ML” without versioning data and without a retraining loop is doing software engineering applied to ML artifacts, not MLOps.

DataOps owns the data pipelines that feed everything downstream — analytics, BI, and ML. MLOps consumes DataOps output. The practical boundary is the feature store or the training data snapshot: upstream is DataOps territory; downstream is MLOps. The SLAs differ across the boundary — analytics tolerates hours of staleness; online inference often does not. Where the same engineers wear both hats, the boundary still matters because the failure modes are different.
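The difference in staleness tolerance across that boundary can be expressed as a per-consumer freshness budget. The budgets below (six hours for analytics, five minutes for online inference) and the `is_fresh` helper are illustrative assumptions, not figures from any of the reference frameworks.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budgets for two consumers of the same dataset.
MAX_STALENESS = {
    "analytics": timedelta(hours=6),
    "online_inference": timedelta(minutes=5),
}


def is_fresh(last_updated: datetime, consumer: str, now: datetime) -> bool:
    """True if the data is recent enough for the given consumer's budget."""
    return now - last_updated <= MAX_STALENESS[consumer]


now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
updated_an_hour_ago = now - timedelta(hours=1)
```

The same hour-old table passes for `"analytics"` and fails for `"online_inference"`, which is why the failure modes differ even when one team owns both sides.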

AIOps is the application of AI or ML techniques to IT operations data — anomaly detection on logs, automated root cause analysis, predictive capacity planning. It is not a superset of MLOps; it is a consumer of ML infrastructure. The CD Foundation MLOps SIG explicitly flags the confusion: “MLOps is not to be confused with 'AIOps'” [4]. The two disciplines share tooling at the infrastructure layer but have orthogonal goals.

LLMOps is MLOps extended to systems where the model is a large language model and the application is the orchestration built around it. The classical MLOps lifecycle — experiment tracking, model registry, serving, monitoring, retraining — applies in full; LLMOps adds concerns that have no classical equivalent (prompt versioning, agent runtime state, retrieval pipeline observability, KV cache management). This series covers LLMOps as a dedicated topic in a later part; every principle established here carries forward.

The map below shows what lives in each discipline region and how it relates to MLOps. Only the MLOps Core region is captured here.

Boundary Mapper

MLOps Core

The full eight-stage ML lifecycle: experiment tracking, model registry, serving, monitoring, and automated retraining. Models live behind versioned interfaces, ship through CI pipelines, and observe the same SLOs as any other service.

What lives here:

› Training pipeline automation
› Model registry with lifecycle states
› Canary deployments for models
› Drift detection and retraining triggers
› Data lineage from raw data to deployed model

The eight-stage lifecycle

This series models the lifecycle as a closed loop of eight stages. The loop structure is the point: a system that runs straight from training to serving, without closing back to retraining, is not MLOps — it is a one-shot batch job with a deployment step. The failure to close the loop is among the most common reasons ML systems degrade silently in production [2].

The eight stages in order: Problem Framing (translate a business objective into a measurable ML problem); Data Preparation (sourcing, cleaning, labelling, versioning, validation); Training (experiments, hyperparameter tuning, run tracking); Evaluation (held-out test, slice analysis, fairness checks, model card); Registry (artifact versioning, lifecycle state, promotion gate); Serving (inference endpoints, canary deployment, rollback); Monitoring (data drift, prediction distribution, business metric correlation); and Retraining (automated trigger, fresh data ingestion, evaluation gate, promotion, rollout). Subsequent articles in this series address each stage in depth.
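The closed-loop property can be checked mechanically if the lifecycle is written down as a transition map. In the sketch below the stage names come from the list above, but the choice to have Retraining re-enter at Data Preparation (because retraining starts with fresh data ingestion) is an assumption of this example, as are the `closes_loop` helper and the `ONE_SHOT` variant.

```python
# Each stage maps to the stage that follows it.
LIFECYCLE = {
    "Problem Framing": "Data Preparation",
    "Data Preparation": "Training",
    "Training": "Evaluation",
    "Evaluation": "Registry",
    "Registry": "Serving",
    "Serving": "Monitoring",
    "Monitoring": "Retraining",
    "Retraining": "Data Preparation",  # assumed re-entry point
}

# A pipeline that stops at Serving: nothing follows it.
ONE_SHOT = {stage: nxt for stage, nxt in LIFECYCLE.items()
            if stage not in ("Serving", "Monitoring", "Retraining")}


def closes_loop(transitions: dict[str, str], start: str = "Training") -> bool:
    """True if following the transitions from `start` eventually returns to it."""
    seen = set()
    current = transitions.get(start)
    while current is not None and current not in seen:
        if current == start:
            return True
        seen.add(current)
        current = transitions.get(current)
    return False
```

`closes_loop(LIFECYCLE)` holds, while `closes_loop(ONE_SHOT)` does not: the one-shot pipeline reaches Serving and has nowhere to go.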

Three stages are most commonly skipped by teams that call themselves MLOps: Evaluation (treated as running a notebook once rather than a repeatable automated gate), Registry (model artifacts promoted by convention rather than by an enforced pipeline), and Retraining (treated as a future concern until a production degradation makes it urgent). Each skip creates exactly the failure modes Sculley et al. described — undeclared consumers, configuration debt, hidden feedback loops that degrade silently [1].
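An enforced promotion path, as opposed to promotion by convention, might look like the following sketch. The lifecycle states, the metric names, the thresholds, and the `promote` function are hypothetical; real registries expose their own state models.

```python
from dataclasses import dataclass

# Hypothetical registry states and the transitions allowed between them.
ALLOWED = {
    "candidate": {"staging", "archived"},
    "staging": {"production", "archived"},
    "production": {"archived"},
    "archived": set(),
}

# Hypothetical gate: overall accuracy plus the worst-performing data slice.
THRESHOLDS = {"accuracy": 0.90, "worst_slice_accuracy": 0.80}


@dataclass
class ModelVersion:
    name: str
    metrics: dict[str, float]
    state: str = "candidate"


def passes_gate(metrics: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Every thresholded metric must be present and meet its minimum."""
    return all(metrics.get(key, float("-inf")) >= minimum
               for key, minimum in thresholds.items())


def promote(model: ModelVersion, target: str,
            thresholds: dict[str, float] = THRESHOLDS) -> None:
    """Move a model to `target`, refusing skipped states and failed gates."""
    if target not in ALLOWED[model.state]:
        raise ValueError(f"cannot move {model.name} from {model.state} to {target}")
    if target != "archived" and not passes_gate(model.metrics, thresholds):
        raise ValueError(f"{model.name} failed the evaluation gate")
    model.state = target
```

A model with strong headline accuracy but a weak slice is rejected, and a candidate cannot jump straight to production. Both rules live in the pipeline rather than in a team habit.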

1. Problem Framing: Translate business objective into measurable ML problem
2. Data Preparation: Source, clean, label, version, validate
3. Training: Experiments, hyperparameter tuning, run tracking
4. Evaluation: Held-out test, slice analysis, fairness, model card
5. Registry: Artifact versioning, lifecycle state, promotion gate
6. Serving: Inference endpoints, canary deployment, rollback
7. Monitoring: Data drift, prediction distribution, business metrics
8. Retraining: Trigger, fresh data, evaluation gate, promotion, rollout

Three anti-patterns worth naming

MLOps as a tools list. “We have an experiment tracker, a pipeline orchestrator, and a model registry” is an inventory, not a discipline. MLOps is the practice of using those tools to maintain models in production; the discipline shows up in the runbooks and the on-call rotation, not in the Helm releases. Microsoft's maturity model is explicit on this point: a team can still sit at Level 0 with a full tool stack if the people, processes, and governance are not in place [3].

MLOps as DevOps with a notebook. A pipeline that bakes a model artifact into a container image and ships it as an application deployment has no data versioning, no lineage, and no retraining loop. It is deployment-as-a-service for models, not MLOps. The absence of data lineage means the model cannot be audited, reproduced, or rolled back cleanly.

MLOps as a parallel platform. Standing up a separate Kubernetes cluster “for ML” because the central platform team is a bottleneck is a short-term gain and a long-term tax. The team that runs its own cluster reinvents networking, identity, observability, and supply chain security — all already solved adjacent to them. The better pattern, which the next article in this series examines, is an AI Platform team that extends the existing platform with ML primitives rather than forking it.

Practical Implication: The “parallel platform” anti-pattern is especially common when ML teams are growing fast and the central platform team has a long backlog. The short-term velocity gain is real but the long-term cost — two security perimeters, two identity domains, two observability stacks — compounds silently until a compliance or incident event forces the consolidation anyway.

What this series uses going forward

For the remaining 33 articles in this series, the following definitions hold:

› MLOps: the discipline of treating machine-learning models as production software backed by data, covering the full eight-stage lifecycle from problem framing through retraining.
› Maturity: assessed on two complementary scales, Microsoft's five-level model for organisational capability (people, process, technology) and Google's three-level model for pipeline automation depth. Both are used because they measure different things.
› DataOps: upstream of MLOps; the boundary is the training data snapshot or feature store.
› ModelOps: the post-deployment half of MLOps; not a separate discipline in this series.
› LLMOps: an extension of MLOps for LLM-based systems; the shared foundation is established here, and the four genuinely new concerns are addressed in a dedicated later part.
› AIOps: a distinct discipline (the application of AI to IT operations data) and not a synonym for MLOps.

The next article in this series examines who owns MLOps at the team level — the AI Platform team, what it is responsible for, and where its accountability boundary sits with the central platform team, the MLOps engineer, and the data organisation.

References

[1] D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison. “Hidden Technical Debt in Machine Learning Systems.” Advances in Neural Information Processing Systems 28 (NeurIPS), 2015.
[2] Google Cloud Architecture Center. “MLOps: Continuous delivery and automation pipelines in machine learning.” Google Cloud Documentation, 2020 (updated 2024).
[3] Microsoft Azure Architecture Center. “MLOps maturity model.” Microsoft Learn, 2022 (updated 2024).
[4] CD Foundation MLOps SIG. “MLOps Roadmap 2022.” Continuous Delivery Foundation, 2022.
[5] Danilo Sato, Arif Wider, Christoph Windheuser. “Continuous Delivery for Machine Learning.” Thoughtworks / martinfowler.com, 2019 (updated 2023).
[6] “Transitioning from MLOps to LLMOps: Navigating the Unique Challenges of Large Language Models.” Information (MDPI), vol. 16, no. 2, 2025.
