linalg-matrix-element-type-mismatch

Status: shipped (Phase 8) — see CHANGELOG.

What it detects

A linalg::*Mul chain that pairs a low-precision matrix element type (e.g. COMPONENT_TYPE_FLOAT16, COMPONENT_TYPE_FLOAT_E4M3) with a higher-precision accumulator (COMPONENT_TYPE_FLOAT32 or COMPONENT_TYPE_FLOAT64) without an explicit conversion. The rule activates only on SM 6.10+ targets.
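
The condition described above can be sketched as a small predicate. This is an illustrative model only, not the linter's actual implementation: the `MulCall` record, the `is_mismatch` function, and the two type sets are hypothetical names introduced here, and the sets contain only the types this page mentions (the real rule may cover more).

```python
from dataclasses import dataclass

# Component types named in the rule description. The description says "e.g.",
# so these sets are an assumption, not an exhaustive list.
NARROW_TYPES = {"COMPONENT_TYPE_FLOAT16", "COMPONENT_TYPE_FLOAT_E4M3"}
WIDE_ACCUMULATORS = {"COMPONENT_TYPE_FLOAT32", "COMPONENT_TYPE_FLOAT64"}

# The rule only activates on SM 6.10+. A (major, minor) tuple is used so that
# 6.10 compares greater than 6.9, which a float would get wrong.
MIN_SHADER_MODEL = (6, 10)


@dataclass(frozen=True)
class MulCall:
    """Hypothetical summary of one linalg::*Mul call site."""
    matrix_type: str
    accumulator_type: str
    has_explicit_conversion: bool = False


def is_mismatch(call: MulCall, shader_model: tuple[int, int]) -> bool:
    """Return True when the call would be flagged under the stated condition."""
    if shader_model < MIN_SHADER_MODEL:
        return False  # rule is inactive below SM 6.10
    if call.has_explicit_conversion:
        return False  # the author already spelled out the conversion
    return (call.matrix_type in NARROW_TYPES
            and call.accumulator_type in WIDE_ACCUMULATORS)
```

Under these assumptions, a FLOAT16 matrix with a FLOAT32 accumulator on an SM 6.10 target is flagged, while the same call on SM 6.9, or a FLOAT16/FLOAT16 pairing, is not.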

Why it matters on a GPU

When the types differ, the matrix-engine fetcher silently widens each matrix element to the accumulator's precision. That per-element conversion costs throughput on every IHV's matrix engine (Blackwell 5th-gen Tensor Cores, RDNA 4 AI accelerator, Xe2 XMX, Hopper Tensor Cores). The conversion never appears in the source, so a call that looks free in code is still paid for at the fetcher.

Examples

Bad

hlsl
linalg::MatrixVectorMul(COMPONENT_TYPE_FLOAT16, COMPONENT_TYPE_FLOAT32, ...);

Good

hlsl
linalg::MatrixVectorMul(COMPONENT_TYPE_FLOAT16, COMPONENT_TYPE_FLOAT16, ...);

Options

none

Fix availability

suggestion — The intended type chain is application-specific.

See also

  • HLSL Specs proposal 0035
  • Companion blog post: not yet published

© 2026 NelCit — Apache-2.0 (code), CC-BY-4.0 (docs).