
divergent-buffer-index-on-uniform-resource

Status: shipped (Phase 4) — see CHANGELOG.

(via ADR 0011)

What it detects

An indexed buffer access buf[i] on a resource binding that is itself uniform across the wave, where buf is a Buffer, StructuredBuffer, or ByteAddressBuffer (via the equivalent Load call) or an array member of a ConstantBuffer<T>, and i is a wave-divergent expression. "Uniform binding" means buf is referenced through a single descriptor known at compile time, not through a heap or array index wrapped in NonUniformResourceIndex(). The hazard is the index, not the resource. The rule fires when the divergence analysis can prove that the index varies across the wave (typical sources: SV_DispatchThreadID, per-lane loaded values, results of WaveReadLaneAt when the lane index is itself per-lane) while the resource itself is bound uniformly.
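The following compute shader is a minimal sketch of the cases described above, assuming the rule behaves as stated. The names g_Lut, g_Out, Params, g_LutSlot, and cs_main are hypothetical, and g_Lut is assumed to hold at least 256 entries. The comments mark where the rule is expected to fire.

```hlsl
// Hypothetical declarations: every resource here is a single, uniformly bound descriptor.
StructuredBuffer<float4>   g_Lut : register(t0);  // assumed to hold >= 256 entries
RWStructuredBuffer<float4> g_Out : register(u0);

cbuffer Params : register(b0) {
    uint g_LutSlot;  // same value for every lane, so wave-uniform
};

[numthreads(64, 1, 1)]
void cs_main(uint3 tid : SV_DispatchThreadID) {
    // Uniform resource + uniform index: eligible for the scalar path. Rule silent.
    float4 a = g_Lut[g_LutSlot];

    // Uniform resource + divergent index: tid.x differs per lane. Rule fires.
    float4 b = g_Lut[tid.x & 255];

    // Index forced to be wave-uniform first. Rule silent, but note the semantics
    // change: every lane now reads the entry chosen by the first active lane.
    float4 c = g_Lut[WaveReadLaneFirst(tid.x & 255)];

    g_Out[tid.x] = a + b + c;
}
```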

Why it matters on a GPU

Modern GPUs split memory paths between the scalar / constant cache and the vector L1. On AMD RDNA 2/3, a buffer load with a uniform resource and a uniform index is issued as a scalar load through the K$ (scalar cache), returning one value to the SGPR file at a fraction of the cost of a vector load. A load with a uniform resource and a divergent index is forced onto the vector path: each of the 32 (wave32) or 64 (wave64) lanes supplies its own address, and the hardware services up to one L1 transaction per lane, coalescing only lanes that hit the same cache line. On NVIDIA Ada the constant-cache fast path requires that both the resource and the offset be uniform; a divergent offset spills to the global L1 / L2 path and serialises by cache line. On Intel Xe-HPG, the constant-buffer fast path likewise requires uniform offsets, and the divergent case falls through to the data-port path, which serialises across distinct cache lines.
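To put rough numbers on "serialises by cache line", here is a back-of-the-envelope count under loudly stated assumptions: wave64 (N = 64 lanes), a hypothetical 16-byte element (s), a 128-byte cache line (L), and a table that starts on a line boundary. Line sizes differ between vendors and cache levels, so treat L as a placeholder rather than a hardware fact.

```latex
\begin{aligned}
\text{uniform index:}\quad & 1 \text{ scalar load} \\
\text{divergent, contiguous } (i = \text{lane}):\quad & \left\lceil \frac{N \cdot s}{L} \right\rceil = \left\lceil \frac{64 \cdot 16}{128} \right\rceil = 8 \text{ lines} \\
\text{divergent, scattered:}\quad & \text{up to } N = 64 \text{ lines}
\end{aligned}
```

Even the best divergent case pays for several line fetches where the uniform case issues one scalar load; the scattered case, typical of per-pixel material IDs, approaches one line per lane.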

The pattern is most painful for ConstantBuffer<T> and small StructuredBuffer accesses that the author intended as constant-time table lookups: a uniform binding seems to "promise" constant-cache behaviour, but the divergent index breaks that promise. The result is a memory-bound kernel whose cost profilers attribute to the L1 (which then looks contended), when the real cause is that a divergent index forced the access off the scalar path. The fix is one of: (a) restructure the data so the indexed value is wave-uniform (often by hoisting the load out of the divergent code), (b) make the index wave-uniform with WaveReadLaneFirst(i), which is only correct if every active lane already holds the same value, or (c) move the table to a view designed for the vector cache, such as a typed Buffer<float4> or a StructuredBuffer<T>, so the access pattern matches the binding.
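When the lanes do not all agree, fix (b) on its own is incorrect. A widely used generalisation is a scalarisation ("waterfall") loop that serves one distinct index per iteration. This is a swapped-in technique, not something the rule suggests or rewrites automatically. The sketch below assumes a hypothetical MaterialParams layout, a placeholder shade(), and a hypothetical g_Materials binding.

```hlsl
// Hypothetical material layout and binding, for illustration only.
struct MaterialParams { float4 albedo; float4 misc; };
StructuredBuffer<MaterialParams> g_Materials : register(t0);

// Placeholder standing in for the real shading function.
float4 shade(float3 worldPos, MaterialParams p) { return p.albedo; }

float4 ps_main_scalarised(float3 worldPos : POSITION,
                          nointerpolation uint matId : MAT_ID) : SV_Target {
    float4 result = 0;
    // Each iteration picks the matId of the first still-active lane. The lanes that
    // share it do the load with a wave-uniform index and leave the loop; the rest
    // iterate again. The loop runs once per distinct matId in the wave.
    for (;;) {
        uint uniformId = WaveReadLaneFirst(matId);  // wave-uniform by construction
        if (matId == uniformId) {
            result = shade(worldPos, g_Materials[uniformId]);  // uniform index
            break;
        }
    }
    return result;
}
```

The loop only pays off when a wave typically sees few distinct indices. If shade() samples textures with implicit derivatives, calling it inside this divergent loop can produce incorrect derivatives, so hoist such sampling out of the loop.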

The rule shares the uniformity machinery with the locked wave-active-all-equal-precheck rule (per ADR 0011). The diagnostic distinguishes between "definitely divergent" and "could be divergent" indices and fires only on the former to keep the false-positive rate manageable. On accesses whose resource index is wrapped in NonUniformResourceIndex() the rule is silent; that case is handled by non-uniform-resource-index.
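The boundary between this rule and non-uniform-resource-index comes down to what is divergent: the descriptor or the element. The sketch below contrasts the two, assuming the ownership described above. All names are hypothetical.

```hlsl
// Hypothetical bindings: a resource array and a single descriptor.
StructuredBuffer<float4>   g_Tables[8] : register(t0, space1);  // resource array
StructuredBuffer<float4>   g_Lut       : register(t0);          // single descriptor
RWStructuredBuffer<float4> g_Out       : register(u0);

[numthreads(64, 1, 1)]
void cs_main(uint3 tid : SV_DispatchThreadID) {
    uint tableId = tid.x & 7;  // per-lane, so divergent
    uint elem    = tid.x;      // per-lane, so divergent

    // Divergent descriptor selection, marked with NonUniformResourceIndex:
    // this rule stays silent and non-uniform-resource-index owns the access.
    float4 a = g_Tables[NonUniformResourceIndex(tableId)][elem];

    // Single, uniformly bound descriptor with a divergent element index:
    // this is the case divergent-buffer-index-on-uniform-resource reports.
    float4 b = g_Lut[elem];

    g_Out[tid.x] = a + b;
}
```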

Examples

Bad

```hlsl
struct MaterialTable { MaterialParams entries[64]; };
ConstantBuffer<MaterialTable> g_MaterialTable : register(b0);  // single, uniform binding

float4 ps_main(float3 worldPos : POSITION, nointerpolation uint matId : MAT_ID) : SV_Target {
    // matId varies per pixel: a divergent index into a uniformly bound constant buffer.
    // Forces every lane onto the vector cache path and loses the K$ fast path
    // that the ConstantBuffer view was bound for.
    MaterialParams p = g_MaterialTable.entries[matId];
    return shade(worldPos, p);
}
```

Good

```hlsl
// Move the table to a structured buffer view designed for divergent vector loads,
// which routes through the vector L1 instead of the K$ and stops paying for a
// fast path the access pattern cannot use.
StructuredBuffer<MaterialParams> g_MaterialTable : register(t0);

float4 ps_main(float3 worldPos : POSITION, nointerpolation uint matId : MAT_ID) : SV_Target {
    MaterialParams p = g_MaterialTable[matId];
    return shade(worldPos, p);
}

// Or, if the algorithm permits, reduce to a wave-uniform index first:
float4 ps_main_uniform(float3 worldPos : POSITION, nointerpolation uint matId : MAT_ID) : SV_Target {
    uint uniformMatId = WaveReadLaneFirst(matId);  // correct only if all active lanes agree
    MaterialParams p = g_MaterialTable[uniformMatId];
    return shade(worldPos, p);
}
```

Options

none

Fix availability

suggestion: three valid fixes exist (changing the resource view, hoisting and uniformising the index, or accepting the cost). The diagnostic identifies the divergent index expression and the uniform binding so the author can choose; no automated rewrite is offered.

See also



© 2026 NelCit, CC-BY-4.0.

© 2026 NelCit — Apache-2.0 (code), CC-BY-4.0 (docs).