coopvec-non-optimal-matrix-layout

Status: stub. The full-length analysis is queued for a v1.0.x patch release per ADR 0018, section 5, criterion #6. The companion rule page at docs/rules/coopvec-non-optimal-matrix-layout.md contains the canonical detection logic and GPU reasoning.

TL;DR

Cooperative Vectors (SM 6.9) target the tensor-core / matrix-engine hardware on each IHV: NVIDIA Ada Lovelace's tensor cores, AMD RDNA 3/4's WMMA path, and Intel Xe-HPG's XMX engines. Each engine prefers a vendor-specific weight layout so the matrix-element fetch hits the engine's native swizzle pattern in a single transaction. The HLSL spec exposes two opaque enums, MATRIX_LAYOUT_INFERENCING_OPTIMAL and MATRIX_LAYOUT_TRAINING_OPTIMAL, that the driver maps to its hardware-preferred layout at upload time via the D3D12_LINEAR_ALGEBRA_MATRIX_LAYOUT_CONVERT API.

What the rule fires on

A dx::linalg::MatrixMul, dx::linalg::MatrixVectorMul, or dx::linalg::OuterProductAccumulate call whose matrix-layout enum argument is not one of the IHV-optimal layouts (MATRIX_LAYOUT_INFERENCING_OPTIMAL for the inference path, MATRIX_LAYOUT_TRAINING_OPTIMAL for the training path). The rule walks the matrix-handle constant-fold chain, identifies the layout enum at the call site, and fires when a row-major or column-major layout is used for an inference matrix.

See the What it detects section of the rule page for the full pattern definition.
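The firing condition described above can be sketched as a small predicate. This is only an illustrative model, not the rule's actual implementation: the `MatrixLayout` enum, the `should_fire` function, and the row-major / column-major enum spellings are hypothetical names introduced here, and the choice to stay silent when the layout cannot be constant-folded is an assumption. The canonical logic remains on the rule page.

```python
from enum import Enum
from typing import Optional


class MatrixLayout(Enum):
    # The two *_OPTIMAL names come from the text above; the row-major and
    # column-major spellings are assumed for illustration only.
    ROW_MAJOR = "MATRIX_LAYOUT_ROW_MAJOR"
    COLUMN_MAJOR = "MATRIX_LAYOUT_COLUMN_MAJOR"
    INFERENCING_OPTIMAL = "MATRIX_LAYOUT_INFERENCING_OPTIMAL"
    TRAINING_OPTIMAL = "MATRIX_LAYOUT_TRAINING_OPTIMAL"


# Calls the rule inspects, as listed in this section.
CHECKED_CALLS = {"MatrixMul", "MatrixVectorMul", "OuterProductAccumulate"}

# Layouts the driver maps to its hardware-preferred swizzle.
OPTIMAL_LAYOUTS = {MatrixLayout.INFERENCING_OPTIMAL, MatrixLayout.TRAINING_OPTIMAL}


def should_fire(call_name: str, layout: Optional[MatrixLayout]) -> bool:
    """Return True when a checked call uses a non-optimal layout.

    `layout` is the enum recovered at the call site; None models the case
    where it could not be resolved to a constant (assumed: do not fire).
    """
    if call_name not in CHECKED_CALLS:
        return False
    if layout is None:
        return False
    return layout not in OPTIMAL_LAYOUTS


if __name__ == "__main__":
    for layout in MatrixLayout:
        print(layout.value, should_fire("MatrixVectorMul", layout))
```

Under these assumptions, a `MatrixVectorMul` call with a row-major or column-major layout is flagged, while either opaque optimal layout passes.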

Why it matters

The full GPU-mechanism analysis lives in the Why it matters on a GPU section of the companion rule page.

Examples

The bad / good code snippets are kept canonical on the rule page; see coopvec-non-optimal-matrix-layout.md -> Examples.
