
coopvec-stride-mismatch

Status: stub. The full-length analysis is queued for a v1.0.x patch release per ADR 0018, section 5, criterion #6. The companion rule page at docs/rules/coopvec-stride-mismatch.md contains the canonical detection logic and the GPU-level reasoning.

TL;DR

When a cooperative-vector call uses a generic row-major or column-major layout, the matrix engine on each IHV (Ada tensor cores, RDNA 3/4 WMMA, Xe-HPG XMX) walks the source buffer using the stride argument as the byte advance between consecutive rows (or columns, for column-major). The engine trusts that stride to be the natural one for the matrix shape and component type. If it is not, the engine reads bytes from the wrong row or from outside the matrix entirely, and produces NaN-laced or zero results. No error is signalled at runtime, because the tensor engine has no notion of buffer bounds beyond what the stride tells it.
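To make the failure mode concrete, the Python sketch below models a row-major fetch as plain byte-offset arithmetic: row r is read starting at offset + r * stride. It is an illustrative model under stated assumptions, not a description of how any IHV implements its fetcher. The helper name fetch_row_major and the choice of float32 components are hypothetical, picked for readability.

```python
import struct
from typing import List

FLOAT32_SIZE = 4  # bytes per float32 component (assumed component type)


def fetch_row_major(buf: bytes, offset: int, rows: int, cols: int,
                    stride: int) -> List[List[float]]:
    """Hypothetical model of a row-major float32 matrix fetch.

    Row r is read from byte offset `offset + r * stride`. The stride is
    trusted blindly: nothing checks that the bytes read belong to the matrix.
    """
    return [
        list(struct.unpack_from(f"<{cols}f", buf, offset + r * stride))
        for r in range(rows)
    ]


# A 2x3 float32 matrix followed by unrelated neighbouring data (the 99.0s).
values = [1.0, 2.0, 3.0,
          4.0, 5.0, 6.0,
          99.0, 99.0, 99.0, 99.0]
buf = struct.pack(f"<{len(values)}f", *values)

natural = 3 * FLOAT32_SIZE  # cols * sizeof(float32) = 12 bytes

print(fetch_row_major(buf, 0, 2, 3, natural))  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(fetch_row_major(buf, 0, 2, 3, 16))       # [[1.0, 2.0, 3.0], [5.0, 6.0, 99.0]]
print(fetch_row_major(buf, 0, 2, 3, 8))        # [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
```

With a 16-byte stride, the second row silently picks up a value from the neighbouring data. With an 8-byte stride, the two rows overlap. Neither case raises an error in this model.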

What the rule fires on

A cooperative-vector call that takes a matrix operand (MatrixMul, MatrixVectorMul, OuterProductAccumulate) fires the rule when its constant-folded stride argument does not equal the natural stride implied by the matrix dimensions and component type. That natural stride is cols * sizeof(component) for a row-major layout and rows * sizeof(component) for a column-major layout. The SM 6.9 cooperative-vector specification requires the stride to match the matrix layout exactly when the layout enum is not OPTIMAL. A mismatch produces undefined behaviour because the matrix-engine fetcher advances the wrong number of bytes per row (or column).
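The comparison described above can be sketched in a few lines of Python. This is a minimal sketch, not the rule's canonical implementation, which lives on the companion rule page. The Layout enum, the component-type keys, and the function names are hypothetical stand-ins rather than spec or tool identifiers. It also assumes that a stride which could not be constant-folded is passed as None and does not fire the rule.

```python
from enum import Enum
from typing import Optional


class Layout(Enum):
    # Hypothetical stand-ins for the cooperative-vector layout enum.
    ROW_MAJOR = "row_major"
    COLUMN_MAJOR = "column_major"
    OPTIMAL = "optimal"  # stride is not checked for optimal layouts


# Byte sizes of a few component types (keys are illustrative names).
COMPONENT_SIZE = {"fp16": 2, "fp32": 4, "int8": 1}


def natural_stride(rows: int, cols: int, component: str,
                   layout: Layout) -> Optional[int]:
    """Natural byte stride for a rows x cols matrix, or None if not checked."""
    size = COMPONENT_SIZE[component]
    if layout is Layout.ROW_MAJOR:
        return cols * size  # each row holds `cols` components
    if layout is Layout.COLUMN_MAJOR:
        return rows * size  # each column holds `rows` components
    return None  # optimal layouts: nothing to compare against


def stride_mismatch(rows: int, cols: int, component: str, layout: Layout,
                    stride: Optional[int]) -> bool:
    """True if the rule would fire: folded stride != natural stride."""
    if stride is None:  # assumption: non-constant strides are not flagged
        return False
    expected = natural_stride(rows, cols, component, layout)
    return expected is not None and stride != expected


# A 64x32 fp16 matrix: row-major needs 64 bytes, column-major needs 128.
print(stride_mismatch(64, 32, "fp16", Layout.ROW_MAJOR, 64))     # False
print(stride_mismatch(64, 32, "fp16", Layout.COLUMN_MAJOR, 64))  # True
```

The second call models a common slip: passing the row-major stride while declaring a column-major layout.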

See the What it detects section of the rule page for the full pattern definition.

Why it matters

The full GPU-mechanism analysis lives in the Why it matters on a GPU section of the companion rule page.

Examples

The bad / good code snippets are kept canonical on the rule page; see coopvec-stride-mismatch.md -> Examples.

See also


This is a v1.0-ship stub. Full analysis pending; track issue link TBD.

© 2026 NelCit — Apache-2.0 (code), CC-BY-4.0 (docs).