wavesize-32-on-xe2-misses-simd16

Status: shipped (Phase 8) — see CHANGELOG.

What it detects

A compute, mesh, or amplification kernel pinned to [WaveSize(32)] while the [experimental] target = "xe2" config gate is active.
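As an illustration, a kernel shaped like the following would be expected to trigger the rule when the xe2 target is enabled. This is a minimal sketch: the entry point name `CSMain`, the buffer `Data`, and the thread-group size are hypothetical and not taken from the rule's test suite.

```hlsl
// Hypothetical example: resource and entry point names are illustrative only.
RWStructuredBuffer<float> Data : register(u0);

// Pinned to a wave size of 32; this is the pattern the rule reports
// when the experimental target is set to "xe2".
[WaveSize(32)]
[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    // The body does not depend on the wave width, so nothing here requires 32 lanes.
    Data[dtid.x] = Data[dtid.x] * 2.0f;
}
```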

Why it matters on a GPU

On Intel Xe2 / Battlemage, native SIMD16 execution saves one address-generation cycle per dispatch compared with SIMD32. Per Chips and Cheese's Battlemage architecture deep-dive, pinning a kernel to SIMD32 forgoes that native efficiency: the hardware can issue SIMD16 in the same throughput tier as SIMD32, but with lower address-generation latency.

Options

none. Activated only under [experimental] target = "xe2".

Fix availability

suggestion — Use [WaveSize(16)] or [WaveSize(16, 32)] if the kernel's lane utilisation allows.
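A sketch of the suggested change, assuming the kernel's logic does not rely on a fixed 32-lane wave. The entry point `CSMain` and buffer `Data` are hypothetical names. The two-argument range form of WaveSize is, to the best of our knowledge, a Shader Model 6.8 feature, while the single-value form needs Shader Model 6.6, so check the toolchain before adopting it.

```hlsl
// Hypothetical example: resource and entry point names are illustrative only.
RWStructuredBuffer<float> Data : register(u0);

// Range form: allows the driver to pick any supported wave size from 16 to 32,
// so Xe2 can run the kernel at SIMD16 while other hardware can still use 32.
// Use [WaveSize(16)] instead to pin the kernel to 16 lanes outright.
[WaveSize(16, 32)]
[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    // Wave-width-agnostic body: per-thread work only, no fixed lane-count assumptions.
    Data[dtid.x] = Data[dtid.x] * 2.0f;
}
```

The range form is the more portable choice if the same shader also ships to hardware that prefers 32-wide waves. Any code that hard-codes a lane count (for example, shared-memory layouts sized for 32 lanes) should query WaveGetLaneCount() instead before the attribute is relaxed.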
