redundant-saturate
Status: shipped (Phase 2) — see CHANGELOG.
What it detects
Calls of the form saturate(saturate(x)) where the outer saturate is applied to an expression already guaranteed to be in [0, 1] because it is itself a saturate call. The rule matches both the direct nested form — saturate(saturate(expr)) — and the split-variable form where a saturate result is stored in an intermediate variable and then passed to a second saturate (see lines 8-11 of tests/fixtures/phase2/redundant.hlsl). It does not fire when the argument to the outer saturate could originate from any source other than a prior saturate.
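The matching conditions can be sketched in C++ over a toy expression model. This is not shader-clippy's implementation. The Expr type, the Bindings map, and the helpers var, call, is_saturated and redundant_saturate are hypothetical. The sketch also assumes that every local variable is initialised once and never reassigned, as in the fixture examples. A variable counts as saturated only if its initialiser is, transitively, a saturate call. Any other source, including an unknown variable, keeps the rule silent.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy expression model; not shader-clippy's real syntax tree.
struct Expr {
    enum class Kind { Var, Call } kind;
    std::string name;            // variable name, or callee name for a Call
    std::shared_ptr<Expr> arg;   // the single argument of a Call (null for Var)
};
using ExprPtr = std::shared_ptr<Expr>;

// Local variable -> initialiser expression. Assumes each local is assigned
// exactly once and bindings are acyclic, as in the fixture examples.
using Bindings = std::map<std::string, ExprPtr>;

ExprPtr var(std::string name) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Var, std::move(name), nullptr});
}

ExprPtr call(std::string callee, ExprPtr arg) {
    return std::make_shared<Expr>(Expr{Expr::Kind::Call, std::move(callee), std::move(arg)});
}

bool is_saturate_call(const ExprPtr& e) {
    return e && e->kind == Expr::Kind::Call && e->name == "saturate";
}

// True only when e is provably the result of a saturate call: either a direct
// saturate(...) (nested form) or a variable initialised from one (split form).
bool is_saturated(const ExprPtr& e, const Bindings& env) {
    if (is_saturate_call(e)) return true;
    if (e && e->kind == Expr::Kind::Var) {
        auto it = env.find(e->name);
        return it != env.end() && is_saturated(it->second, env);
    }
    return false;  // any other source: the rule does not fire
}

// The rule fires on an outer saturate whose argument is already saturated.
bool redundant_saturate(const ExprPtr& e, const Bindings& env) {
    return is_saturate_call(e) && is_saturated(e->arg, env);
}
```

In this model, saturate(saturate(c)) and saturate(a) with a bound to saturate(c) both make redundant_saturate return true. By contrast, saturate(c) and saturate(abs(c)) do not.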
Why it matters on a GPU
On AMD RDNA, RDNA 2, and RDNA 3 hardware, saturate is not an independent instruction. It is an output modifier bit (_clamp) that is folded into whichever ALU instruction produces the value (an ADD, MUL, MAD, FMA, or similar). The compiler sets the bit on the last instruction that writes the register, and the hardware enforces the [0, 1] clamp during writeback at no additional cycle cost.
When the source is saturate(saturate(x)), the inner saturate can still be lowered to a free _clamp modifier on the instruction that produces x. The outer saturate cannot be folded into that same instruction, because its input is already a distinct value held in a VGPR. Unless the compiler recognises and drops the redundancy, it must emit a separate ALU instruction to clamp the already-clamped value. This is typically v_max_f32 followed by v_min_f32, or a single v_med3_f32. On NVIDIA Turing and Ada Lovelace the situation is analogous. .sat is an instruction modifier, but a .sat applied to a value that an earlier instruction has already saturated cannot be folded into that earlier instruction. It therefore materialises as a separate FP32 clamp sequence. On Intel Xe-HPG the pattern similarly costs one extra ALU operation.
The fix removes the outer call entirely. This replaces a two-instruction clamp sequence (or one wasted ALU op) with the zero-cost output modifier that the inner saturate already provides. Some pixel shaders process HDR accumulation buffers and tone-map per channel, calling saturate at several levels of the call graph. In those shaders, the redundant clamps add up to a measurable increase in instruction count.
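The C++ functions below illustrate what the separate clamp computes. They model the two lowerings mentioned above on scalar floats:

- a max followed by a min, which takes two operations;
- a median of three against 0 and 1, which takes one.

These are plain arithmetic models written for this page, not the ISA semantics. The NaN behaviour of the real v_max_f32, v_min_f32 and v_med3_f32 instructions is not modelled, so only finite inputs are considered.

```cpp
#include <algorithm>

// Illustrative scalar models only; finite inputs assumed (NaN not modelled).

// Models v_max_f32 followed by v_min_f32: two ALU operations.
float clamp_max_min(float x) {
    float t = std::max(x, 0.0f);  // first op: enforce the lower bound
    return std::min(t, 1.0f);     // second op: enforce the upper bound
}

// Median of three values, the operation v_med3_f32 performs in one step.
float med3(float a, float b, float c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Models v_med3_f32 x, 0, 1: a single-operation clamp to [0, 1].
float clamp_med3(float x) {
    return med3(x, 0.0f, 1.0f);
}
```

For finite inputs both functions agree with saturate. When the input has already been saturated, either one is pure overhead, because it returns its input unchanged.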
Examples
Bad
// From tests/fixtures/phase2/redundant.hlsl, lines 3-6
// HIT(redundant-saturate): saturate is idempotent.
float3 nested_saturate(float3 c) {
    return saturate(saturate(c));
}
// From tests/fixtures/phase2/redundant.hlsl, lines 8-11
// HIT(redundant-saturate): saturate of an already-saturated value.
float3 nested_saturate_split(float3 c) {
    float3 a = saturate(c);
    return saturate(a);
}
Good
// After the machine-applicable fix (outer saturate dropped):
float3 nested_saturate(float3 c) {
    return saturate(c);
}
float3 nested_saturate_split(float3 c) {
    float3 a = saturate(c);
    return a;
}
Options
none
Fix availability
machine-applicable. The fix is a pure textual substitution with no observable semantic change. saturate is idempotent: for any x, saturate(saturate(x)) == saturate(x). shader-clippy fix applies it automatically.
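The idempotence argument is short. saturate always returns a value in [0, 1], and saturate is the identity on [0, 1], so applying it a second time cannot change the result. The identity holds per component, so it carries over to float3.

The C++ sketch below spot-checks this on a scalar model of saturate. The model is an assumption made for this page. It uses std::fmax and std::fmin, which return the non-NaN operand when exactly one operand is NaN. A NaN input therefore maps to 0, which matches the Direct3D rule that saturate(NaN) is 0. A finite sweep like this is a sanity check, not a proof.

```cpp
#include <cmath>
#include <limits>

// Scalar model of HLSL saturate (assumption: fmax/fmin NaN semantics stand in
// for the Direct3D rule saturate(NaN) == 0).
float saturate(float x) {
    return std::fmin(std::fmax(x, 0.0f), 1.0f);
}

// Does applying saturate twice give the same value as applying it once?
bool saturate_is_idempotent_at(float x) {
    return saturate(saturate(x)) == saturate(x);
}

// Spot-checks edge cases (infinities, signed zero, NaN) and a sweep of
// ordinary values in [-2, 2].
bool saturate_is_idempotent() {
    const float inf = std::numeric_limits<float>::infinity();
    const float nan = std::numeric_limits<float>::quiet_NaN();
    const float edge[] = {-inf, -1.0f, -0.0f, 0.0f, 0.5f, 1.0f, 1.0001f, inf, nan};
    for (float x : edge) {
        if (!saturate_is_idempotent_at(x)) return false;
    }
    for (int i = -2000; i <= 2000; ++i) {
        if (!saturate_is_idempotent_at(i * 0.001f)) return false;
    }
    return true;
}
```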
See also
- Related rule: clamp01-to-saturate, which replaces clamp(x, 0.0, 1.0) with saturate(x)
- Related rule: redundant-abs, which drops abs around expressions proven non-negative, including saturate output
- HLSL intrinsic reference: saturate in the DirectX HLSL Intrinsics documentation
- Companion blog post: saturate-redundancy overview
© 2026 NelCit, CC-BY-4.0.