sample-use-no-interleave

Status: shipped (Phase 8) — see CHANGELOG.

What it detects

A Texture.Sample*() call whose result is consumed within the next 3 statements with no independent compute between the sample and its first use. By default, the heuristic slides a 3-statement window over the enclosing compound block.
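The windowed check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the rule's actual implementation: the function name `find_sample_use_no_interleave` and both regexes are hypothetical, statements are assumed to be pre-split strings with comments already stripped, and "compute" is assumed to mean any statement containing an arithmetic operator or a call. A real implementation would more likely walk a parsed syntax tree than match text.

```python
import re

# Hypothetical patterns for this sketch only.
# Matches `name = something.Sample...(` and captures the result variable.
SAMPLE_RE = re.compile(r"\b(\w+)\s*=\s*\w+\.Sample\w*\s*\(")
# Assumption: an arithmetic operator or a function call counts as compute.
ARITH_RE = re.compile(r"[-+*/]|\w+\s*\(")


def uses(stmt, name):
    """Return True if `name` appears as a whole identifier in `stmt`."""
    return re.search(rf"\b{re.escape(name)}\b", stmt) is not None


def find_sample_use_no_interleave(statements, window=3):
    """Return (sample_index, use_index) pairs for flagged samples.

    A sample is flagged when its result is first used within `window`
    statements and no compute statement sits between sample and use.
    """
    findings = []
    for i, stmt in enumerate(statements):
        m = SAMPLE_RE.search(stmt)
        if m is None:
            continue
        result = m.group(1)
        saw_compute = False
        # Slide over at most `window` statements after the sample.
        for j in range(i + 1, min(i + 1 + window, len(statements))):
            nxt = statements[j]
            if uses(nxt, result):
                if not saw_compute:
                    findings.append((i, j))
                break  # only the first use matters
            if ARITH_RE.search(nxt):
                saw_compute = True
    return findings


bad = ["float4 c = tex.Sample(ss, uv);", "return c * 2.0;"]
good = [
    "float4 c = tex.Sample(ss, uv);",
    "float ax = a * b;",
    "float bx = ax + c0;",
    "return c * (ax + bx);",
]
print(find_sample_use_no_interleave(bad))   # [(0, 1)]
print(find_sample_use_no_interleave(good))  # []
```

Under these assumptions, a bare declaration between the sample and its use does not count as interleaved work, and a first use that falls outside the window is not reported at all.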

Why it matters on a GPU

NVIDIA Nsight surfaces this pattern as "Warp Stalled by L1 Long Scoreboard". When a texture fetch misses the L1 cache, the result takes roughly 150-300 cycles to arrive on Turing/Ada/Blackwell and roughly 120-200 cycles on RDNA 2-4. If the shader consumes the result immediately, the warp (wavefront on AMD) stalls for that whole period. Interleaving compute or other independent work between the sample and its first use lets the scheduler hide the latency.

Examples

Bad

```hlsl
float4 c = tex.Sample(ss, uv);
return c * 2.0; // immediate use
```

Good

```hlsl
float4 c = tex.Sample(ss, uv);
float ax = a * b;        // independent compute
float bx = ax + c0;
return c * (ax + bx);    // hidden under sample latency
```

Options

none

Fix availability

suggestion. The rule is hardware-specific and heuristic, so it may produce false positives.

© 2026 NelCit — Apache-2.0 (code), CC-BY-4.0 (docs).