Skip to content

wavereadlaneat-constant-non-zero-portability

Status: shipped (Phase 3) — see CHANGELOG.

(via ADR 0011)

What it detects

A WaveReadLaneAt(x, K) call where K is a compile-time non-zero constant in an entry point that lacks a [WaveSize(N)] (or [WaveSize(min, max)]) attribute pinning the wave size. The detector folds the second argument to a constant if possible, reads the entry's [WaveSize] attribute via reflection, and fires when the constant is >= 32 (potentially out-of-range on a 32-wide wave) or, more tightly, when the constant is >= min(supported_wave_sizes_on_target). The companion AST-only rule wavereadlaneat-constant-zero-to-readfirst (Phase 2) handles the K == 0 case; this rule handles K != 0.

Why it matters on a GPU

WaveReadLaneAt(x, K) returns the value of x from lane K within the current wave. The valid range of K is [0, WaveGetLaneCount() - 1]. On AMD RDNA 1/2/3 the hardware supports both 32-wide and 64-wide waves; the driver picks one per-PSO based on hints and register pressure. On NVIDIA Turing and Ada Lovelace the wave is always 32 lanes. On Intel Xe-HPG the SIMD width is 8, 16, or 32 lanes depending on the compiler's choice. A WaveReadLaneAt(x, 47) is valid on RDNA wave64, undefined on RDNA wave32 / Ada (lane index out of range), and undefined on Xe-HPG wave16 / wave32. The HLSL spec says out-of-range lane indices produce undefined results — in practice the hardware returns garbage or zero depending on the IHV.

Without [WaveSize], the developer cannot rely on the wave size at any specific call site, but a constant K = 47 baked into the source unmistakably implies a wave64 assumption. The bug is that the assumption is invisible to readers — the constant looks innocuous — and the failure mode is silent: the shader runs to completion on wave32 and produces wrong values, with no compiler warning and no runtime fault.

The fix is one of: pin the wave size with [WaveSize(64)] if the algorithm requires wave64, refactor the call to a wave-size-agnostic alternative (WaveActive* reductions don't carry a lane index), or constrain the constant to K < 32 so it works on every supported wave size. The rule surfaces the choice; the author owns the resolution.

Examples

Bad

hlsl
// No [WaveSize]; constant lane index implies wave64 but isn't guaranteed.
[numthreads(64, 1, 1)]
void cs_unpinned(uint gi : SV_GroupIndex) {
    float v = GroupshareInput[gi];
    // K = 47 is OOB on wave32 (Ada, RDNA wave32 mode, Xe-HPG wave32).
    float broadcast = WaveReadLaneAt(v, 47);
    Output[gi] = broadcast;
}

Good

hlsl
// Pin the wave size to match the assumed lane index range.
[WaveSize(64)]
[numthreads(64, 1, 1)]
void cs_pinned64(uint gi : SV_GroupIndex) {
    float v = GroupshareInput[gi];
    float broadcast = WaveReadLaneAt(v, 47);  // valid on wave64
    Output[gi] = broadcast;
}

// Or constrain K to a portable range and let any wave size run.
[numthreads(64, 1, 1)]
void cs_portable(uint gi : SV_GroupIndex) {
    float v = GroupshareInput[gi];
    float broadcast = WaveReadLaneAt(v, 7);  // valid on every supported size
    Output[gi] = broadcast;
}

Options

none

Fix availability

suggestion — Pinning wave size or refactoring the lane index has structural consequences. The diagnostic identifies the constant-K vs wave-size mismatch; the author chooses the resolution.

See also


Edit this page

© 2026 NelCit, CC-BY-4.0.

© 2026 NelCit — Apache-2.0 (code), CC-BY-4.0 (docs).