coopvec-base-offset-misaligned

Status: stub. The full-length analysis is queued for a v1.0.x patch release per ADR 0018, section 5, criterion #6. The companion rule page at docs/rules/coopvec-base-offset-misaligned.md contains the canonical detection logic and GPU reasoning.

TL;DR

Tensor / matrix engines on every IHV require their source operands to be aligned because the engine's load unit is wired for vector-width transactions. NVIDIA Ada Lovelace's tensor cores fetch operands in 128-bit-aligned groups; AMD RDNA 3/4 WMMA loads through a 128-bit-aligned scalar path; Intel Xe-HPG XMX engines align to the SIMD width. A misaligned base offset either splits the fetch into two transactions (cutting throughput in half) or, on stricter implementations, faults the load. The cooperative-vector spec classifies the latter as undefined behaviour, which gives IHVs the freedom to fail loudly.
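The transaction-splitting effect can be illustrated with a little arithmetic. The sketch below is not part of the rule's implementation; it assumes a simplified model in which the load unit fetches fixed 16-byte (128-bit) aligned windows, and the helper name transactions_touched is hypothetical.

```python
def transactions_touched(offset: int, size: int, width: int = 16) -> int:
    """Count the aligned `width`-byte windows covered by a fetch.

    Simplified model (an assumption for illustration): the load unit can
    only read whole windows that start at multiples of `width`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    first_window = offset // width
    last_window = (offset + size - 1) // width
    return last_window - first_window + 1


# An aligned 16-byte fetch fits in a single window.
aligned = transactions_touched(offset=0, size=16)
# The same fetch starting 8 bytes later straddles two windows.
misaligned = transactions_touched(offset=8, size=16)
print(aligned, misaligned)
```

Under this model the misaligned fetch needs twice as many transactions for the same payload, which is the halved-throughput case described above. Real hardware may behave differently, including faulting outright.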

What the rule fires on

A cooperative-vector matrix-load call (MatrixMul, MatrixVectorMul, OuterProductAccumulate) whose constant-folded offset argument is not aligned to the cooperative-vector spec's mandated alignment for the chosen component type and layout (typically 16 bytes for float / FP16 / BF16 paths, 64 bytes for the OPTIMAL layouts on most IHVs). The rule walks the constant-fold chain on the offset argument and the alignment annotation surfaced via Slang reflection, then fires on misalignment.
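As a rough illustration of the check described above, the sketch below tests an already constant-folded byte offset against a required alignment. It is not the linter's actual code: the function names and layout strings are hypothetical, and the 16-byte and 64-byte values are only the "typical" figures quoted above, not values read from Slang reflection.

```python
# Typical alignments quoted in the text (assumptions, not spec-derived).
DEFAULT_ALIGNMENT_BYTES = 16   # float / FP16 / BF16 paths
OPTIMAL_ALIGNMENT_BYTES = 64   # OPTIMAL layouts on most IHVs


def required_alignment(layout: str) -> int:
    """Return the assumed alignment for a layout name (hypothetical names)."""
    if layout.lower().endswith("optimal"):
        return OPTIMAL_ALIGNMENT_BYTES
    return DEFAULT_ALIGNMENT_BYTES


def is_misaligned(offset_bytes: int, layout: str) -> bool:
    """True when the constant-folded offset violates the assumed alignment."""
    return offset_bytes % required_alignment(layout) != 0


# Offset 24 is not a multiple of 16, so a row-major load would be flagged.
print(is_misaligned(24, "row_major"))
# Offset 32 satisfies 16-byte alignment but not the 64-byte OPTIMAL one.
print(is_misaligned(32, "row_major"), is_misaligned(32, "mul_optimal"))
```

In the real rule the required alignment would come from the alignment annotation and the component type rather than from a hard-coded table; the table here only keeps the example self-contained.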

See the What it detects section of the rule page for the full pattern definition.

Why it matters

The full GPU-mechanism analysis lives in the Why it matters on a GPU section of the companion rule page.

Examples

The bad / good code snippets are kept canonical on the rule page; see coopvec-base-offset-misaligned.md -> Examples.
