AI Platform Engineering & MLOps · Part XV of 34

Governance and lineage — answering the four questions a regulator will ask

Model cards, lineage chains, and audit trails mapped to the four questions every regulator asks: what is it, where did it come from, who approved it, and is what you’re serving what you signed?


Once a model registry is in place and a curation policy controls what enters it, the next discipline is governance and lineage — the audit infrastructure that lets you answer the four questions a serious customer or regulator will eventually ask: what is this model, where did it come from, who approved it, and is the artifact you are serving the artifact you signed?

This article maps each question to a concrete artifact, names the five lineage edges that must be captured, and explains why governance is engineering work — not a compliance checkbox filled in at audit time. It is the fifteenth article in the AI Platform Engineering & MLOps series; it assumes you have read the preceding articles on the model registry, curation policy, and repository patterns.

The four regulator questions

Framing governance as four questions keeps the work grounded. Abstract governance programmes drift into documentation theatre; the four questions do not.

1. What is this model? Answered by the model card: purpose, training data, evaluation results, known failure modes, intended use, restrictions, and licence.
2. Where did it come from? Answered by the lineage chain: raw data version → dataset version → training-run hash → model artifact hash. Every link must be queryable.
3. Who approved it? Answered by the audit trail: the curation-gate approval record, the named approver, the evidence attached (eval results, security review, legal sign-off where applicable), and the timestamp.
4. Is what you are serving what you signed? Answered by signature verification at admission: the serving runtime must verify the artifact’s cryptographic signature before the serving pod starts, confirming the artifact has not been tampered with between registration and deployment.

A regulated environment (financial services, healthcare, public sector) requires all four to be answerable at any time, for any model, including retired ones. An unregulated team can choose the depth of implementation — but the four questions remain a useful design target regardless.

Model cards — structured disclosure for question one

The model card concept was formalised by Mitchell et al. in Model Cards for Model Reporting (FAT* 2019), which proposed short documents accompanying trained models to communicate their performance characteristics across culturally, demographically, or phenotypically distinct groups. The paper’s core insight is that a model without documented limitations will be applied beyond those limits — and that the absence of disclosure is itself a design decision with consequences. [1]

A model card answering question one must include, at minimum:

Name, version, and registry URI — the pointer to the artifact.
Owner — the team and named individual accountable for this model.
Purpose — a specific problem statement, not a generic capability label. “Score loan applications for credit-default risk” passes; “classify text” does not.
Training data — dataset name and version, date range covered, known biases or gaps. Linked to the data catalogue. The companion pattern here is Datasheets for Datasets (Gebru et al., CACM 2021), which applies the same disclosure discipline to the dataset side. [2]
Evaluation results — offline suite results against a defined baseline; slice-level results by cohort where relevant.
Intended use and out-of-scope use — what this model will reliably do, and what it will not. The out-of-scope field is often the most practically useful.
Known failure modes — what red-teaming or past incidents revealed.
Restrictions — licence terms for imported weights, data-residency requirements, customer-segment limitations.
Approvals — who signed off at the curation gate, security review, and legal review (where applicable).

The curation gate should enforce specificity on the purpose and intended-use fields before promoting a model to staging. A model card that is a copy-pasted template with generic entries provides no governance value; the gate should reject it.
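One way to make the gate's specificity check concrete is sketched below. It is a minimal illustration, not a prescribed schema: the `ModelCard` field names, the `GENERIC_PHRASES` list, and the five-word threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field

# Hypothetical list of labels too generic to pass the gate.
GENERIC_PHRASES = {"classify text", "text classification", "general purpose", "tbd", "n/a"}


@dataclass
class ModelCard:
    name: str
    version: str
    registry_uri: str
    owner: str
    purpose: str
    training_data: str
    evaluation_results: str
    intended_use: str
    out_of_scope_use: str
    known_failure_modes: str
    restrictions: str
    approvals: list[str] = field(default_factory=list)


def gate_problems(card: ModelCard, min_words: int = 5) -> list[str]:
    """Return reasons to reject the card; an empty list means it may be promoted."""
    problems = []
    # Every field must be filled in.
    for name, value in vars(card).items():
        if not value:
            problems.append(f"{name} is empty")
    # Purpose and intended use must be specific, not a generic capability label.
    for name in ("purpose", "intended_use"):
        text = getattr(card, name).strip().lower()
        if text in GENERIC_PHRASES or len(text.split()) < min_words:
            problems.append(f"{name} is too generic: {text!r}")
    return problems
```

With this sketch, a purpose of “Score loan applications for credit-default risk” passes, while “classify text” is rejected. A word count is a crude proxy for specificity; a real gate would likely pair it with human review.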

Lineage — the five edges that must be captured

A lineage chain answers question two. It is a directed graph with five edges that must each be explicitly recorded and queryable. The absence of any one edge means the chain is broken and the audit question cannot be answered.

1. Raw data source version: the data-catalogue reference for the warehouse table, object-store path, or streaming topic that fed the training job. Versioned by snapshot timestamp or partition key.
2. Dataset version: the materialised training dataset, such as a frozen Parquet snapshot, a versioned DVC commit, or a named dataset in a feature store. This is the edge most often left unrecorded and the one that breaks the chain in impact-analysis scenarios.
3. Training-run hash: the code commit SHA, the hyperparameter set, the environment specification, and the platform job identifier. Tracking and metadata stores record this natively: MLflow as a Run with logged parameters and artifacts, MLMD as an Execution with attached Artifacts.
4. Model artifact hash: the registered version in the registry, identified by the file digest in object storage, the OCI content-addressable digest, or the registry version record. This is the artifact identifier that the signing step attests.
5. Deployed-version selector: which serving-runtime version is currently using this artifact, derived from the GitOps controller’s desired-state history. This is the edge that closes the loop between the registry and the running system.

Each edge supports two traversal directions:

Forward (impact analysis): “Dataset version v17 has a data-quality issue. Which training runs used it? Which model artifacts were produced? Which serving versions are currently live with those artifacts?”
Backward (audit): “The model serving requests at 02:00 UTC yesterday — what dataset trained it, what code commit produced it, who approved the promotion, when?”
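The two traversal directions can be sketched as a walk over recorded edges. The node identifiers below are illustrative only, and a production system would query a lineage store rather than an in-memory list; the point is that both questions are the same graph walk run in opposite directions.

```python
from collections import defaultdict

# Each edge is (upstream node, downstream node). Identifiers are illustrative only.
EDGES = [
    ("raw:loans@2024-01-01", "dataset:v17"),
    ("dataset:v17", "run:a1b2c3"),
    ("run:a1b2c3", "artifact:sha256-1111"),
    ("artifact:sha256-1111", "serving:v2.3.1"),
    ("dataset:v17", "run:d4e5f6"),
    ("run:d4e5f6", "artifact:sha256-2222"),
]


def build_index(edges):
    """Index the edges by source (downstream lookup) and by target (upstream lookup)."""
    downstream, upstream = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        downstream[src].append(dst)
        upstream[dst].append(src)
    return downstream, upstream


def walk(start, index):
    """Return every node reachable from start, in breadth-first order."""
    seen, queue = [], [start]
    while queue:
        node = queue.pop(0)
        for nxt in index.get(node, []):
            if nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


DOWNSTREAM, UPSTREAM = build_index(EDGES)
impact = walk("dataset:v17", DOWNSTREAM)   # forward: impact analysis
audit = walk("serving:v2.3.1", UPSTREAM)   # backward: audit
```

If the edge from `dataset:v17` to `run:a1b2c3` is never recorded, the backward walk from `serving:v2.3.1` stops at the training run, which is exactly the broken-chain case: the audit cannot name the dataset.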

The tracer below walks the five-edge chain in both directions — and lets you break the dataset-version edge to see exactly where an audit fails.

Interactive component: Lineage Graph Tracer. It walks the five-edge lineage chain in either traversal direction; each step shows the record that must exist at that node and the regulator question it answers. For the backward (audit) scenario, the walk runs: Deployed-version selector → Model artifact hash → Training-run hash → Dataset version → Raw data source.

Lineage standards — OpenLineage and MLMD

Two open standards are worth anchoring on. OpenLineage defines a vendor-neutral event schema with three core entities: Job (a process that consumes or produces datasets), Run (an execution of a Job, with a start event and a terminal event), and Dataset (an abstract data resource with a stable identifier). Facets extend each entity with domain-specific metadata — schema, data-quality assertions, column-level lineage. [3]

ML Metadata (MLMD), the lineage library from the TFX project, models the same problem with three types: Artifact (a file or object), Execution (a component run), and Event (the relationship between an execution and its input/output artifacts). MLMD supports pluggable storage backends such as SQLite and MySQL, and is designed to be embedded in ML pipeline orchestrators. [4]

Both models converge on the same insight: lineage is a property of events and relationships, not of the artifacts themselves. Recording the artifact alone — “model v2.3.1 exists” — is not lineage. Lineage requires recording what produced the artifact, what that execution consumed, and what the artifact in turn feeds. The practical recommendation is to emit OpenLineage events from your training pipeline and store execution metadata in a registry-native store (e.g., an MLflow-compatible backend or MLMD), with a reconciliation step that cross-links the two.
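To show what "lineage as events" looks like in practice, here is a minimal sketch that builds OpenLineage-style run events as plain dictionaries. The namespaces, job name, dataset names, and producer URI are hypothetical, and the sketch omits the `schemaURL` field and all facets that a real event would carry; in practice the events would be sent through an OpenLineage client rather than built by hand.

```python
import json
import uuid
from datetime import datetime, timezone


def run_event(event_type, run_id, job_name, inputs=(), outputs=()):
    """Build an OpenLineage-style run event as a plain dict."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/training-pipeline",  # hypothetical producer URI
        "run": {"runId": run_id},
        "job": {"namespace": "ml-platform", "name": job_name},  # hypothetical namespace
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


# One Run of one Job: a start event, then a terminal event with the same runId.
run_id = str(uuid.uuid4())
training_data = [("warehouse", "loans_training.v17")]
start = run_event("START", run_id, "train-credit-default-scorer", inputs=training_data)
complete = run_event(
    "COMPLETE",
    run_id,
    "train-credit-default-scorer",
    inputs=training_data,
    outputs=[("registry", "credit-default-scorer")],
)
payload = json.dumps(complete, indent=2)  # what would be sent to a lineage backend
```

The shared `runId` is what lets a lineage backend join the two events into one execution, and the input and output dataset entries are what make the dataset-version edge queryable later.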

The audit trail — answering questions three and four

Question three (who approved it?) and question four (is what you are serving what you signed?) are answered by different but complementary mechanisms.

Approval records

Every state transition in the registry (from sandbox to staging, from staging to production) must generate an immutable record containing: the actor (the named approver’s identity, not a service account), the timestamp, the evidence links (eval results URL, security-review ticket, model-card version), and the target state. The registry’s built-in transition log is a reasonable starting point; production-regulated environments typically supplement it with an append-only audit log shipped to a tamper-evident store with a retention period that outlives the model’s production lifetime.
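One common way to make an append-only log tamper-evident is a hash chain, where each record commits to the hash of its predecessor. The sketch below illustrates that idea with the four record fields named above; it is an assumption about how such a log could be built, not a description of any particular registry's transition log, and the field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder predecessor hash for the first record


def _digest(body: dict) -> str:
    # Canonical JSON so the same record always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_approval(log: list, actor: str, target_state: str, evidence: list) -> dict:
    """Append an approval record that commits to the previous record's hash."""
    body = {
        "actor": actor,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "target_state": target_state,
        "evidence": evidence,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify_chain(log: list) -> bool:
    """Recompute every hash and check each record points at its predecessor."""
    prev = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev or record["hash"] != _digest(body):
            return False
        prev = record["hash"]
    return True
```

Editing or deleting any earlier record changes its hash and breaks every later link, so `verify_chain` returns `False`. A hash chain detects tampering but does not prevent it; storage-level controls such as object lock remain necessary.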

The EU AI Act, Article 12, requires that high-risk AI systems technically allow for the automatic recording of events (logs) over the system’s lifetime. The retention obligations sit in Article 19 (providers) and Article 26(6) (deployers): logs must be kept for a period appropriate to the intended purpose, of at least six months, unless otherwise provided by applicable Union or national law. [5] Article 11 requires technical documentation sufficient for a conformity assessment. Together these requirements make the audit trail a design constraint, not an optional layer.

Signature verification — closing question four

Question four asks whether the artifact being served right now is exactly the artifact that was reviewed and signed during curation. Without signature verification at admission, the answer is “probably” — and “probably” is not acceptable for a regulated deployment.

The signing step happens at the moment of registry promotion: the training pipeline or promotion workflow signs the artifact using a tool such as Cosign (from the Sigstore project) and stores the signature alongside the artifact, either in the same OCI registry next to the artifact it signs, or as a sidecar attestation in non-OCI registries. At deploy time, the cluster’s admission controller verifies the signature before the serving pod is permitted to pull the artifact. If the signature is absent or invalid, the pod does not start.

The OCI-artifact registry pattern has an advantage here: OCI content-addressed digests (sha256:...) are intrinsic to the storage format, and admission controllers with OCI awareness can verify both the digest and the cosign signature in a single policy. Non-OCI registries (MLflow’s default artifact store, for instance) require an additional layer — typically an init-container or a pre-deploy hook — to perform the equivalent check.
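For the non-OCI case, the digest half of that check can be sketched as a small script run from an init-container or pre-deploy hook. This is a minimal illustration under stated assumptions: the function names are hypothetical, the expected digest is assumed to come from the signed registry record, and verifying the signature over that digest would still be delegated to a signing tool such as Cosign.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return "sha256:" + digest.hexdigest()


def artifact_matches(path: Path, signed_digest: str) -> bool:
    """Compare the artifact on disk with the digest recorded at signing time."""
    return sha256_of(path) == signed_digest


def require_match(path: Path, signed_digest: str) -> None:
    """Exit non-zero on mismatch so the init-container fails and the pod never starts."""
    if not artifact_matches(path, signed_digest):
        raise SystemExit(f"digest mismatch for {path}: refusing to start")
```

Failing the init-container is what turns the check into an admission decision: the serving container only starts once `require_match` has returned without raising.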

The supply-chain depth and SBOM implications of this signing pattern are covered in Article 16 of this series (SBOM and signing for ML). This article establishes why question four matters and names the mechanism; the next article goes deeper on the mechanics and the SLSA framework.

Regulatory context — NIST AI RMF and EU AI Act

The NIST AI Risk Management Framework (AI RMF 1.0, January 2023) structures AI governance across four functions: GOVERN, MAP, MEASURE, and MANAGE. GOVERN is the cross-cutting function: it defines the organisational accountability, policy, and documentation practices that the other three functions depend on. [6] The framework is voluntary, but GOVERN 1.1 calls for legal and regulatory requirements involving AI to be understood, managed, and documented, which maps directly onto the model-card and audit-trail disciplines described here.

The EU AI Act adds two articles relevant to model governance. Article 11 (Technical documentation) requires that high-risk AI systems be accompanied by documentation sufficient for a conformity assessment; this maps to the model card. Article 12 (Record-keeping) requires that systems support automatic logging throughout their lifetime; this maps to the audit trail and the lineage chain. Neither article prescribes specific tooling; both set evidence requirements that the governance disciplines in this article are designed to satisfy.

Two additional regulatory concerns surface in environments with strong data-sovereignty requirements:

Data residency — some training data must not leave a geographic region. The lineage chain must record the data-source region; registry replication must respect residency boundaries.
Right to erasure — if a subject requests deletion of their data, the lineage chain answers which models trained on data that included that subject. The policy response (retrain, accept the residual risk, retire) is an organisational decision; the lineage chain makes the question answerable. Without the dataset-version edge in the chain, the question cannot be answered reliably.
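The data-residency rule can be expressed as a check the registry runs before replicating a model. The sketch below is illustrative only: the zone names, region names, and the `ALLOWED_REPLICAS` policy table are hypothetical, and the data-source regions are assumed to come from the lineage chain.

```python
# Hypothetical policy: which registry regions each data-residency zone may replicate to.
ALLOWED_REPLICAS = {
    "eu": {"eu-west", "eu-central"},
    "us": {"us-east", "us-west"},
}


def replication_allowed(source_zones, target_region):
    """A model may replicate only where every one of its data sources permits."""
    if not source_zones:
        # No region recorded in the lineage chain: fail closed rather than guess.
        return False
    return all(target_region in ALLOWED_REPLICAS.get(zone, set()) for zone in source_zones)
```

Failing closed when the lineage chain has no recorded region is a deliberate choice in this sketch: a missing edge should block replication, not silently permit it.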

Common failure modes

Model cards that no one reads. A card filled with generic entries — “intended use: text classification” — satisfies no one. Gate enforcement must require specificity. If the curation approver cannot evaluate the card without asking follow-up questions, the card is not complete.
Broken lineage at the dataset edge. The raw-data-to-dataset-version link is the most fragile because it is often manual. Wire it into the training-pipeline template — automated tagging on pipeline start — so it is captured without relying on engineers to remember.
Signature verification at the wrong layer. Verifying the container image but not the model artifact leaves a gap: the image is clean but the artifact it pulls at runtime may not be. Both layers need coverage.
Audit logs co-located with operational logs. When the operational log store is rotated or purged, the audit history disappears with it. Audit logs need separate storage, longer retention, and — in regulated environments — tamper-evident options such as object lock.
Treating governance as a compliance exercise rather than an engineering discipline. The lineage query “which serving versions are using the artifact trained on the bad dataset?” is equally useful for a data-quality incident and a regulatory audit. Teams that build governance for the regulator often end up with evidence that is technically present but not operationally useful — queries that take days, not minutes.

Score your own platform against the four questions — the checklist below maps every artifact from this article to the regulator question it answers.

Interactive component: Audit Readiness Scorer. Tick the artifacts your platform has today. A regulator question only counts as answerable when every artifact under it is in place; none of the four mechanisms can substitute for the others. The checklist covers 19 artifacts:

Q1 · What is this model? Mechanism: model card (7 artifacts).
Q2 · Where did it come from? Mechanism: lineage chain (5 artifacts).
Q3 · Who approved it? Mechanism: audit trail (4 artifacts).
Q4 · Is what you serve what you signed? Mechanism: signature verification (3 artifacts).
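The scoring rule is all-or-nothing per question, which a few lines can capture. The sketch below is an assumption about how such a scorer could work, not the component's actual implementation; the artifact names paraphrase this article and cover only two of the four questions.

```python
def answerable(checklist):
    """Return, per question, whether every artifact under it is in place."""
    # An empty artifact list counts as a gap rather than a pass.
    return {q: bool(items) and all(items.values()) for q, items in checklist.items()}


# Artifact names below paraphrase this article; the real checklist may differ.
checklist = {
    "Q2 lineage chain": {
        "raw data source version": True,
        "dataset version": False,
        "training-run hash": True,
        "model artifact hash": True,
        "deployed-version selector": True,
    },
    "Q3 audit trail": {
        "named approver": True,
        "timestamp": True,
        "evidence links": True,
        "target state": True,
    },
}

result = answerable(checklist)
```

Here the lineage question is not answerable because the dataset-version edge is missing, even though four of its five artifacts are in place; partial credit would hide exactly the gap an audit exposes.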

Cross-series connections

This article sits inside Part 4 of the series (Serving, registry & lineage). The preceding articles (12–14) establish the registry, curation policy, and repository patterns that governance instruments sit on top of. The question answered here — “is the served artifact the signed artifact?” — opens directly into the supply-chain security theme of Part 8. Article 16 (SBOM and signing for ML) takes the signing mechanics further: SLSA levels, Software Bills of Materials, and end-to-end verification from artifact creation to admission-controller enforcement.

If you are reading this article because a third-party auditor is incoming and you need to understand the evidence gap, the summary is: the model card answers the “what” question in documents; the lineage chain answers the “from where” question in queryable metadata; the audit trail answers the “who approved” question with immutable records; and the signed artifact answers the “is it intact” question cryptographically. None of these can substitute for the others. See Article 16 for the supply-chain depth behind that last question.

References

[1] Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D., & Gebru, T. (2019). Model Cards for Model Reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* ’19), ACM. arXiv:1810.03993.
[2] Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2021). Datasheets for Datasets. Communications of the ACM, 64(12), 86–92. DOI:10.1145/3458723.
[3] OpenLineage Project. OpenLineage Specification — Object Model. openlineage.io. Retrieved June 2026.
[4] Google / TFX Team. ML Metadata (MLMD) — TFX Guide. TensorFlow Extended documentation. Retrieved June 2026.
[5] European Parliament and Council. (2024). EU Artificial Intelligence Act, Article 12 — Record-Keeping. Official Journal of the European Union.
[6] National Institute of Standards and Technology. (2023). Artificial Intelligence Risk Management Framework (AI RMF 1.0). NIST AI 100-1. January 2023.
