
linalg-matrix-non-optimal-layout

Status: stub. The full-length analysis is queued for a v1.0.x patch release per ADR 0018, section 5, criterion #6. The companion rule page at docs/rules/linalg-matrix-non-optimal-layout.md contains the canonical detection logic and GPU reasoning.

TL;DR

SM 6.10 promotes vector::CooperativeVector to linalg::Matrix. Drivers route OPTIMAL-tagged matrices to the on-chip matrix engine without a per-element swizzle; the matrix-engine fetcher (NVIDIA Blackwell 5th-gen Tensor Cores for FP4/FP6, NVIDIA Hopper for FP8, AMD RDNA 4's 2nd-gen AI accelerator, Intel Xe2 XMX) is gated on the OPTIMAL layout. A row-major or column-major declaration instead triggers a per-element swizzle on every fetch, which costs 2-4x in throughput on every IHV's matrix engine.

What the rule fires on

A linalg::*Mul / OuterProductAccumulate call (SM 6.10, proposal 0035) whose matrix-layout enum argument is MATRIX_LAYOUT_ROW_MAJOR or MATRIX_LAYOUT_COLUMN_MAJOR instead of the IHV-preferred MATRIX_LAYOUT_INFERENCING_OPTIMAL / MATRIX_LAYOUT_TRAINING_OPTIMAL. The rule activates only on SM 6.10+ targets and is the SM 6.10 successor to coopvec-non-optimal-matrix-layout.

See the What it detects section of the rule page for the full pattern definition.
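
As a rough illustration of the pattern described above, here is a minimal text-based sketch in Python. It is not the linter's canonical detection logic, which lives on the rule page and presumably works on a parsed shader rather than on raw text. The function name `find_non_optimal_layouts`, the `shader_model` tuple, and the `linalg::FooMul` call shape in the sample are all hypothetical; only the layout enum names and `OuterProductAccumulate` are taken from this page.

```python
import re

# Layout enum names taken from the rule description on this page.
NON_OPTIMAL = {"MATRIX_LAYOUT_ROW_MAJOR", "MATRIX_LAYOUT_COLUMN_MAJOR"}

# Matches "linalg::<anything>Mul" or "linalg::OuterProductAccumulate".
CALL_RE = re.compile(r"linalg::(\w*Mul|OuterProductAccumulate)\b")
LAYOUT_RE = re.compile(r"\bMATRIX_LAYOUT_\w+\b")


def find_non_optimal_layouts(source, shader_model=(6, 10)):
    """Return (line, call, layout) tuples for non-optimal layout arguments.

    Assumption: a call statement ends at the next ';', so the layout enum is
    searched for anywhere between the call name and that ';'.
    """
    # The rule only activates on SM 6.10+ targets.
    if shader_model < (6, 10):
        return []
    findings = []
    for call in CALL_RE.finditer(source):
        stop = source.find(";", call.end())
        if stop == -1:
            stop = len(source)
        for layout in LAYOUT_RE.finditer(source, call.start(), stop):
            if layout.group(0) in NON_OPTIMAL:
                line = source.count("\n", 0, layout.start()) + 1
                findings.append((line, call.group(0), layout.group(0)))
    return findings


# Placeholder shader text: the call names and argument order below are
# illustrative only and do not reproduce the real SM 6.10 signatures.
SAMPLE = """
linalg::FooMul(weights, MATRIX_LAYOUT_ROW_MAJOR, input);
linalg::FooMul(weights, MATRIX_LAYOUT_INFERENCING_OPTIMAL, input);
linalg::OuterProductAccumulate(a, b, grads, MATRIX_LAYOUT_COLUMN_MAJOR);
"""

if __name__ == "__main__":
    for line, call, layout in find_non_optimal_layouts(SAMPLE):
        print(f"line {line}: {call} uses {layout}")
```

A plain-text scan like this would also flag matches inside comments or disabled preprocessor branches, which is one reason to treat it only as a way to read the rule's intent.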

Why it matters

The full GPU-mechanism analysis lives in the Why it matters on a GPU section of the companion rule page.

Examples

The bad / good code snippets are kept canonical on the rule page; see linalg-matrix-non-optimal-layout.md -> Examples.

See also


This is a v1.0-ship stub. Full analysis pending; tracking-issue link TBD.

© 2026 NelCit — Apache-2.0 (code), CC-BY-4.0 (docs).