clamp01-to-saturate

Status: shipped (Phase 2) — see CHANGELOG.

What it detects

Calls to clamp(x, 0.0, 1.0) — or the equivalent integer-literal form clamp(x, 0, 1) — where both the lower and upper bounds are compile-time constants equal to exactly 0.0 and 1.0 respectively, after type coercion. The rule fires on scalar, vector, and matrix operands. It does not fire when either bound is a variable, a constant buffer field, or any expression that is not a numeric literal, even if the value happens to evaluate to 0 or 1 at runtime.
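The non-firing cases described above can be sketched as follows. This is a hypothetical illustration rather than an excerpt from the test fixtures: the cbuffer `ClampParams`, its fields `g_lo` and `g_hi`, the constant `kOne`, and all function names are invented for the example.

```hlsl
// Hypothetical constant buffer; the field names are assumptions for this sketch.
cbuffer ClampParams {
    float g_lo;
    float g_hi;
};

// Hypothetical named constant. It is not a numeric literal at the call site.
static const float kOne = 1.0;

// Upper bound is a function parameter: not flagged.
float clamp_variable_bound(float x, float upper) {
    return clamp(x, 0.0, upper);
}

// Both bounds are constant buffer fields: not flagged,
// even if the application always uploads 0 and 1.
float clamp_cbuffer_bounds(float x) {
    return clamp(x, g_lo, g_hi);
}

// Upper bound is a named constant rather than a literal: per the
// description above, this should not be flagged either.
float clamp_named_constant(float x) {
    return clamp(x, 0.0, kOne);
}

// Literal bounds, but not the [0, 1] range: not flagged.
float clamp_other_range(float x) {
    return clamp(x, 0.0, 2.0);
}
```

If the bounds in one of these functions are known to be 0 and 1, writing them as literals at the call site is what lets the rule see the pattern.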

Why it matters on a GPU

clamp(x, 0.0, 1.0) is defined by the HLSL specification as min(max(x, 0.0), 1.0). Without further optimisation, the compiler lowers this to two separate ALU instructions: a max (v_max_f32 on RDNA, FMNMX in its max form on NVIDIA) followed by a min (v_min_f32 on RDNA, FMNMX in its min form on NVIDIA). Each instruction occupies a VALU issue slot, and the min cannot issue until the max result is available, so even when the two ops are scheduled back-to-back they form a two-instruction dependency chain.

saturate(x), by contrast, lowers to the _clamp output modifier on AMD RDNA, RDNA 2, and RDNA 3: the hardware clamps the result of whatever instruction produced x during register writeback, consuming no extra VALU cycles. On NVIDIA Turing and Ada Lovelace, the .sat instruction modifier plays the same role: the clamp is applied as the producing instruction commits its result, not by a separate instruction. When x is itself the result of an ALU instruction that accepts the modifier, the clamp(x, 0.0, 1.0) form therefore pays two ALU instructions where saturate pays none. When x comes straight from a load or an interpolated input, saturate still costs one instruction to carry the modifier. On Intel Xe-HPG, the compiler similarly has a dedicated saturate modifier path that a min(max(...)) sequence may not fold into, depending on context.
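To make the two shapes concrete, here is a minimal sketch at the source level. The function names are hypothetical, and the comments describe what a backend may do, not guaranteed output: the actual instruction selection depends on the compiler, its optimisation settings, and the target.

```hlsl
// What clamp(x, 0.0, 1.0) computes when written out by hand:
// a max followed by a dependent min.
float clamp_expanded(float x) {
    return min(max(x, 0.0), 1.0);
}

// saturate applied directly to an arithmetic result. Because the value
// being clamped is produced by a multiply-add, a backend that supports an
// output clamp modifier may attach the clamp to that instruction instead
// of emitting separate min/max ops.
float scale_bias_saturated(float x, float scale, float bias) {
    return saturate(x * scale + bias);
}
```

Inspecting the generated ISA for the target of interest is the only reliable way to confirm whether the fold happened in a given shader.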

In practice, the clamp(x, 0.0, 1.0) form appears frequently in shader code ported from OpenGL/GLSL (where clamp is the canonical idiom), in code generated by material graph compilers, and in hand-authored PBR shaders where authors are unfamiliar with saturate. In a fragment shader that tone-maps an HDR value per channel — three clamp calls per pixel, at 4K resolution, 60 Hz — the difference is roughly 3 × 2 = 6 wasted VALU instructions per pixel thread per frame versus 3 × 0 with saturate. The absolute saving per invocation is small but scales linearly with wave count.

Examples

Bad

```hlsl
// From tests/fixtures/phase2/redundant.hlsl, lines 14-16
// HIT(clamp01-to-saturate): clamp(x, 0, 1) is saturate.
float clamp_zero_one(float x) {
    return clamp(x, 0, 1);
}

// From tests/fixtures/phase2/redundant.hlsl, lines 18-21
// HIT(clamp01-to-saturate): clamp(x, 0.0, 1.0) is saturate.
float clamp_zero_one_explicit(float x) {
    return clamp(x, 0.0, 1.0);
}

// Vector form also triggers the rule:
float3 tone_map(float3 hdr) {
    return clamp(hdr / (1.0 + hdr), 0.0, 1.0);
}
```

Good

```hlsl
// After machine-applicable fix:
float clamp_zero_one(float x) {
    return saturate(x);
}

float clamp_zero_one_explicit(float x) {
    return saturate(x);
}

float3 tone_map(float3 hdr) {
    return saturate(hdr / (1.0 + hdr));
}
```

Options

none

Fix availability

machine-applicable: Replacing clamp(x, 0.0, 1.0) with saturate(x) is a pure textual substitution. The two forms are semantically identical for all finite and non-finite float inputs, including NaN: under Direct3D's NaN rules, min and max return the non-NaN operand and saturate maps NaN to 0, so both forms return 0. shader-clippy fix applies it automatically.
