pow-const-squared
What it detects
Calls to pow(x, 2.0) or pow(x, 2), where the exponent is an integer or floating-point literal equal to 2. The rule fires on any pow call whose second argument is a numeric literal that evaluates to exactly 2.0 after type coercion. It does not fire when the exponent is a variable, a constant buffer field, or any other non-literal expression.
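To make that boundary concrete, here is a small sketch of calls on either side of it. The constant buffer MaterialParams, the field g_exponent, and the function names are hypothetical and exist only for this illustration.

```hlsl
// Hypothetical constant buffer, used only to show a non-literal exponent.
cbuffer MaterialParams : register(b0) {
    float g_exponent;
};

// Flagged: the exponent is a floating-point literal equal to 2.
float fires_float_literal(float x) {
    return pow(x, 2.0);
}

// Flagged: the integer literal 2 coerces to exactly 2.0.
float fires_int_literal(float x) {
    return pow(x, 2);
}

// Not flagged: the exponent is a local variable, even though it holds 2.
float skipped_variable(float x) {
    float n = 2.0;
    return pow(x, n);
}

// Not flagged: the exponent comes from a constant buffer field.
float skipped_cbuffer(float x) {
    return pow(x, g_exponent);
}
```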
Why it matters on a GPU
pow(x, n) is not compiled to repeated multiplication. On all current GPU hardware (AMD RDNA/RDNA2/RDNA3, NVIDIA Turing/Ada Lovelace, Intel Xe-HPG), the pow intrinsic lowers to a transcendental sequence: exp2(n * log2(x)). The log2 and exp2 steps each occupy a transcendental (or TALU) execution slot, which typically runs at one-quarter the throughput of the standard ALU. On RDNA 3, for example, v_log_f32 and v_exp_f32 each issue at 1/4 of the VALU rate, so a single pow(x, 2.0) costs roughly 9 full-rate cycles of issue bandwidth (two quarter-rate transcendentals plus one multiply), identical to pow(x, 37.5).
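The lowering described above can be written out as a conceptual sketch in HLSL. The function names pow_lowered and square are hypothetical; a real compiler emits the equivalent machine instructions rather than these intrinsic calls.

```hlsl
// Conceptual expansion of pow(x, n): exp2(n * log2(x)).
float pow_lowered(float x, float n) {
    float l = log2(x);   // transcendental op
    float p = n * l;     // ordinary full-rate multiply
    return exp2(p);      // transcendental op
}

// The exponent-2 case written directly: one full-rate multiply.
float square(float x) {
    return x * x;
}
```

Note that pow_lowered does the same amount of work whatever the value of n, which is why a literal exponent of 2 gets no discount.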
x * x is a single VALU multiply. On every architecture listed above, an FP32 multiply issues at full VALU rate: one cycle per instruction (throughput, not latency). The absolute cost reduction per invocation is small, but shaders run millions of times per frame. In a PBR fragment shader, the Schlick Fresnel approximation pow(1 - NdotV, 5) is evaluated per-pixel, per-material, often in a deferred lighting pass that covers the entire G-buffer. Even the exponent-2 sub-term (1 - NdotV) * (1 - NdotV) — the first step in a hand-unrolled Schlick — appears as pow(x, 2.0) frequently in shader code generated by offline material compilers or ported from reference implementations.
Beyond raw throughput, log2(x) is NaN for x < 0, so pow(x, 2.0) introduces a latent NaN risk that x * x does not: a negative base yields NaN where the multiply yields the correct positive square. (At x = 0, log2 returns negative infinity and exp2 maps it back to 0, so zero itself is safe.) Replacing the call also removes a dependency on the transcendental pipeline, which on some architectures shares execution resources with sin, cos, exp, and rcp, reducing contention when those intrinsics appear in the same basic block.
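A minimal sketch of how the NaN risk can surface in practice, assuming pow follows its documented behaviour for a negative base. The falloff functions and the assumption that uv lies in [0, 1] are hypothetical.

```hlsl
// Hypothetical vignette-style falloff; uv is assumed to be in [0, 1].
float falloff_pow(float2 uv) {
    float d = uv.x - 0.5;        // negative on the left half of the screen
    return 1.0 - pow(d, 2.0);    // NaN wherever d < 0
}

// Same falloff with the square written as a multiply.
float falloff_mul(float2 uv) {
    float d = uv.x - 0.5;
    return 1.0 - d * d;          // well defined for every finite d
}
```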
Examples
Bad
// From tests/fixtures/phase2/math.hlsl — HIT(pow-to-mul)
float pow_squared(float x) {
    return pow(x, 2.0);
}

// Common in Schlick Fresnel implementations:
float3 fresnel_schlick(float n_dot_v, float3 f0) {
    // HIT(pow-to-mul): pow(x, 5.0) can be unrolled by repeated squaring;
    // the x^2 step is often written as pow(x, 2.0) separately.
    float k = pow(1.0 - n_dot_v, 5.0);
    return f0 + (1.0 - f0) * k;
}

Good
// After machine-applicable fix:
float pow_squared(float x) {
    return x * x;
}

// Schlick with the squaring unrolled manually (or via pow-integer-decomposition fix):
float3 fresnel_schlick(float n_dot_v, float3 f0) {
    float v = 1.0 - n_dot_v;
    float v2 = v * v;
    float k = v2 * v2 * v; // (1 - NdotV)^5
    return f0 + (1.0 - f0) * k;
}

Options
none — this rule has no configurable thresholds. To silence it on a specific call site, use inline suppression:
// shader-clippy: allow(pow-to-mul)
float k = pow(x, 2.0);

To silence it project-wide, add to .shader-clippy.toml:
[rules]
pow-to-mul = "allow"

Fix availability
machine-applicable. Replacing pow(x, 2.0) with x * x is a local, mechanical rewrite. The result agrees with the original for all finite positive values of x up to floating-point rounding (x * x is a single correctly rounded multiply, whereas pow goes through approximate log2 and exp2), and the behaviour for x < 0 improves: the NaN is replaced by the correct square. shader-clippy fix applies it without human confirmation.
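When the base is a compound expression, a correct rewrite has to keep it grouped. The sketch below shows the shape such a rewrite must take; it is an illustration with hypothetical function names, not necessarily the exact text that shader-clippy fix emits.

```hlsl
// Before: the base is a compound expression.
float dist_sq_before(float a, float b) {
    return pow(a - b, 2.0);
}

// After: parentheses keep the subtraction grouped on both sides.
// Writing a - b * a - b instead would change the meaning.
float dist_sq_after(float a, float b) {
    return (a - b) * (a - b);
}

// Hand-written alternative: evaluate the base once and reuse it.
float dist_sq_temp(float a, float b) {
    float d = a - b;
    return d * d;
}
```

The temporary-variable form is worth considering when the base is expensive to compute or appears in code where evaluating it twice would be undesirable.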
See also
- Related rule: pow-integer-decomposition, which generalises to pow(x, 3.0), pow(x, 4.0), pow(x, 5.0)
- Related rule: pow-base-two-to-exp2, which handles pow(2.0, x) → exp2(x)
- HLSL intrinsic reference: pow, exp2, log2 in the DirectX HLSL Intrinsics documentation
- Companion blog post: Where the cycles go: pow(x, 2.0)