pow-const-squared

What it detects

Calls to pow(x, 2.0) — or pow(x, 2) — where the exponent is a literal integer or floating-point constant equal to 2. The rule fires on any expression tree whose second argument is a numeric literal that evaluates to exactly 2.0 after type coercion. It does not fire when the exponent is a variable, a constant buffer field, or any non-literal expression.
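As a rough illustration of the matching rule described above, the sketch below marks which calls would and would not be flagged. The function name, the constant buffer, and its field g_exponent are hypothetical and exist only to show each case.

```hlsl
// Hypothetical constant buffer, used only to show the non-literal case.
cbuffer Params : register(b0) {
    float g_exponent;
};

float detection_cases(float x, float n) {
    float a = pow(x, 2.0);         // flagged: float literal equal to 2
    float b = pow(x, 2);           // flagged: integer literal that coerces to 2.0
    float c = pow(x, n);           // not flagged: exponent is a variable
    float d = pow(x, g_exponent);  // not flagged: exponent is a constant buffer field
    float e = pow(x, 3.0);         // not flagged: literal, but not equal to 2
    return a + b + c + d + e;
}
```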

Why it matters on a GPU

pow(x, n) is not compiled to repeated multiplication. On all current GPU hardware (AMD RDNA/RDNA2/RDNA3, NVIDIA Turing/Ada Lovelace, Intel Xe-HPG) the pow intrinsic lowers to a transcendental sequence: exp2(n * log2(x)). That sequence occupies a transcendental (or TALU) execution slot, which typically runs at one-quarter the throughput of the standard ALU. On RDNA 3, for example, v_log_f32 and v_exp_f32 each issue at 1/4 of the VALU rate, so a single pow(x, 2.0) costs roughly 8 full-rate cycles for the two transcendentals, plus one for the intermediate multiply. That cost is identical to pow(x, 37.5).
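The lowering described above can be written out by hand to make the operation count visible. This is only a conceptual sketch with hypothetical function names; the actual instruction sequence is chosen by the driver compiler and may differ.

```hlsl
// Conceptual expansion of pow(x, n): one log2, one multiply, one exp2.
float pow_expanded(float x, float n) {
    float l = log2(x);   // transcendental op (quarter rate on the hardware discussed above)
    float s = n * l;     // ordinary full-rate multiply
    return exp2(s);      // transcendental op (quarter rate)
}

// The exponent-2 case written directly: a single ordinary multiply.
float square(float x) {
    return x * x;
}
```

The expanded form does the same work whatever the value of n, which is why an exponent of 2 gets no discount on this path.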

x * x is a single VALU multiply. On every architecture listed above, an FP32 multiply issues at full VALU rate: one cycle per instruction (throughput, not latency). The absolute cost reduction per invocation is small, but shaders run millions of times per frame. In a PBR fragment shader, the Schlick Fresnel approximation pow(1 - NdotV, 5) is evaluated per-pixel, per-material, often in a deferred lighting pass that covers the entire G-buffer. Even the exponent-2 sub-term (1 - NdotV) * (1 - NdotV) — the first step in a hand-unrolled Schlick — appears as pow(x, 2.0) frequently in shader code generated by offline material compilers or ported from reference implementations.

Beyond raw throughput, log2(x) is undefined for x < 0 (at x = 0 it is negative infinity, which still yields 0 after exp2), so pow(x, 2.0) introduces a latent NaN risk for negative inputs that x * x does not. Replacing the call also removes a dependency on the transcendental pipeline, which on some architectures shares execution resources with sin, cos, exp, and rcp, reducing contention when those intrinsics appear in the same basic block.
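A small sketch of the negative-input case, assuming the exp2/log2 lowering described earlier. The function names are hypothetical.

```hlsl
// d is negative whenever a < b.
float squared_diff_pow(float a, float b) {
    float d = a - b;
    return pow(d, 2.0);   // log2 of a negative d is NaN, so the result can be NaN
}

float squared_diff_mul(float a, float b) {
    float d = a - b;
    return d * d;         // non-negative for any finite d, no NaN introduced
}
```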

Examples

Bad

hlsl

// From tests/fixtures/phase2/math.hlsl — HIT(pow-to-mul)
float pow_squared(float x) {
    return pow(x, 2.0);
}

// Common in Schlick Fresnel implementations:
float3 fresnel_schlick(float n_dot_v, float3 f0) {
    // pow(x, 5.0) is outside this rule's exponent-2 match (see
    // pow-integer-decomposition), but the x^2 sub-expression of an unrolled
    // version is often written as pow(x, 2.0) separately.
    float k = pow(1.0 - n_dot_v, 5.0);
    return f0 + (1.0 - f0) * k;
}

Good

hlsl

// After machine-applicable fix:
float pow_squared(float x) {
    return x * x;
}

// Schlick with the squaring unrolled manually (or via pow-integer-decomposition fix):
float3 fresnel_schlick(float n_dot_v, float3 f0) {
    float v  = 1.0 - n_dot_v;
    float v2 = v * v;
    float k  = v2 * v2 * v;   // (1 - NdotV)^5
    return f0 + (1.0 - f0) * k;
}

Options

none — this rule has no configurable thresholds. To silence it on a specific call site, use inline suppression:

hlsl

// shader-clippy: allow(pow-const-squared)
float k = pow(x, 2.0);

To silence it project-wide, add to .shader-clippy.toml:

toml

[rules]
pow-const-squared = "allow"

Fix availability

machine-applicable: replacing pow(x, 2.0) with x * x is a direct local substitution. The result matches pow, up to rounding, for all finite positive values of x; the NaN behaviour for x < 0 is actually improved (fewer NaN sources). shader-clippy fix applies it without human confirmation.
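When the first argument is a compound expression rather than a plain variable, a hand-applied rewrite has two things to watch: operator precedence and duplicated evaluation. The sketch below is hand-written and is not output from shader-clippy; how the tool itself handles these cases is not covered here, and the function names are hypothetical.

```hlsl
// Before: the argument is a compound expression.
float falloff_before(float3 p, float radius) {
    return pow(length(p) - radius, 2.0);
}

// After: hoist the argument into a local so it is evaluated once and
// the multiplication cannot bind to only part of the expression.
float falloff_after(float3 p, float radius) {
    float d = length(p) - radius;
    return d * d;
}
```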
