wavesize-32-on-xe2-misses-simd16

Status: stub. The full-length analysis is queued for a v1.0.x patch release per ADR 0018, section 5, criterion #6. The companion rule page at docs/rules/wavesize-32-on-xe2-misses-simd16.md contains the canonical detection logic and GPU reasoning.

TL;DR

On Intel Xe2 (Battlemage), native SIMD16 execution saves one address-generation cycle per dispatch compared with SIMD32. Per Chips and Cheese's Battlemage architecture deep-dive, kernels pinned to SIMD32 leave that native efficiency unused on Xe2: the hardware can issue SIMD16 in the same throughput tier as SIMD32 but with lower address-generation latency.

What the rule fires on

A compute, mesh, or amplification kernel pinned to [WaveSize(32)] under the [experimental.target = xe2] config gate.

See the "What it detects" section of the rule page for the full pattern definition.
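
For orientation only, here is a minimal sketch of the kind of kernel this description covers, assuming a compute shader. The entry-point name CSMain, the gOutput buffer, and the 64-thread group size are hypothetical and chosen for illustration; the canonical snippets remain on the rule page.

```hlsl
// Hypothetical kernel for illustration; names and sizes are assumptions,
// not taken from the rule page.
RWStructuredBuffer<float> gOutput : register(u0);

[WaveSize(32)]          // pins the kernel to 32-lane waves (Shader Model 6.6+)
[numthreads(64, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID)
{
    // Any wave intrinsic will do here; this one sums 1.0 across active lanes.
    gOutput[dtid.x] = WaveActiveSum(1.0f);
}
```

With the [experimental.target = xe2] gate enabled, the [WaveSize(32)] attribute on a kernel like this is the pattern described above.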

Why it matters

The full GPU-mechanism analysis lives in the "Why it matters on a GPU" section of the companion rule page.

Examples

The bad / good code snippets are kept canonical on the rule page; see wavesize-32-on-xe2-misses-simd16.md -> Examples.
