wave64-on-rdna4-compute-misses-dynamic-vgpr

Status: shipped (Phase 8) — see CHANGELOG.

What it detects

Any compute entry point declared [WaveSize(64)] (or [WaveSize(64, 64)]) when the config sets [experimental] target = "rdna4".

Why it matters on a GPU

According to AMD's Hot Chips 2025 RDNA 4 deck and the Chips and Cheese RDNA 4 write-up, the new dynamic-VGPR allocation mode is wave32-only: the per-wave s_alloc_vgpr instruction works only at the wave32 lane width. Wave64 compute on RDNA 4 therefore stays on static allocation and silently misses the per-block occupancy gain that dynamic-VGPR mode provides over the static allocation on RDNA 3.

Examples

Bad

hlsl
[WaveSize(64)]
[numthreads(64, 1, 1)]
void cs_main() {}

Good

hlsl
[WaveSize(32)]
[numthreads(64, 1, 1)]
void cs_main() {}

Options

None. The rule is active only under [experimental] target = "rdna4".

Fix availability

suggestion: the choice of wave size is workload-dependent.
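
One reason the wave size cannot be swapped mechanically is that wave intrinsics change scope with it: in a 64-thread group, WaveActiveSum covers the whole group at wave64 but only half of it at wave32. The sketch below is a hypothetical kernel (gInput, gOutput, gsPartial and cs_sum are illustrative names, not part of this rule or its test suite) showing one way to write a group-wide sum that queries the lane count instead of assuming 64. It assumes threads are assigned to waves in SV_GroupIndex order, which is the usual layout for 1D groups but is not something HLSL guarantees.

```hlsl
// Hypothetical example: all resource and function names are illustrative.
StructuredBuffer<float>   gInput  : register(t0);
RWStructuredBuffer<float> gOutput : register(u0);

#define GROUP_SIZE 64

// Sized for the smallest wave this kernel may run at
// (32 lanes -> at most 2 waves per 64-thread group).
groupshared float gsPartial[GROUP_SIZE / 32];

[WaveSize(32)]
[numthreads(GROUP_SIZE, 1, 1)]
void cs_sum(uint3 dtid : SV_DispatchThreadID,
            uint  gi   : SV_GroupIndex,
            uint3 gid  : SV_GroupID)
{
    // Query the lane count rather than hard-coding 64.
    uint laneCount = WaveGetLaneCount();
    // Assumption: waves are packed in SV_GroupIndex order.
    uint waveIndex = gi / laneCount;
    uint waveCount = GROUP_SIZE / laneCount;

    // Sums only the lanes of the current wave, not the whole group.
    float waveSum = WaveActiveSum(gInput[dtid.x]);

    // One lane per wave publishes that wave's partial sum.
    if (WaveIsFirstLane())
        gsPartial[waveIndex] = waveSum;

    GroupMemoryBarrierWithGroupSync();

    // A single thread combines the per-wave partials for the group.
    if (gi == 0)
    {
        float total = 0.0f;
        for (uint w = 0; w < waveCount; ++w)
            total += gsPartial[w];
        gOutput[gid.x] = total;
    }
}
```

A kernel that instead wrote the result of WaveActiveSum directly as the group total would be correct at wave64 and produce a half-group sum at wave32, which is the kind of behavior change that needs a human decision.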

See also

  • AMD Hot Chips 2025 RDNA 4 deck
  • Chips and Cheese: RDNA 4 dynamic VGPR

© 2026 NelCit — Apache-2.0 (code), CC-BY-4.0 (docs).