redundant-precision-cast

Status: stub. The full-length analysis is queued for a v1.0.x patch release per ADR 0018, section 5, criterion #6. The companion rule page at docs/rules/redundant-precision-cast.md contains the canonical detection logic and GPU reasoning.

TL;DR

Each type conversion — v_cvt_f32_i32, v_cvt_i32_f32, v_cvt_f32_f16, v_cvt_f16_f32 on RDNA; the equivalent FCONV/I2F/F2I family on Turing and Xe-HPG — is a real ALU instruction. Pairs of such instructions in a round-trip pattern consume two instruction-issue slots and two VGPR reads/writes. On RDNA 3, conversion instructions execute in the VALU pipeline at full throughput, so a two-instruction round-trip costs two cycles of VALU occupancy per lane — identical in cost to two FP32 multiplies — for zero arithmetic progress.

What the rule fires on

Nested cast expressions that form precision-degrading or no-op round-trips. Three specific patterns are detected; see the What it detects section of the rule page for the full pattern definition.
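As a rough illustration of what "precision-degrading" and "no-op" mean for a round-trip, the sketch below uses host-side C++ rather than shader code. The helper names are hypothetical, and the two round-trips shown are generic examples, not the rule's three patterns (those are defined on the rule page). It assumes IEEE-754 binary32 `float` and the default round-to-nearest-even mode.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: float -> int -> float.
// The inner cast truncates toward zero, so the fractional part is lost
// and the outer cast cannot restore it (precision-degrading round-trip).
inline float roundtrip_float_int_float(float x) {
    return static_cast<float>(static_cast<std::int32_t>(x));
}

// Hypothetical helper: int -> float -> int.
// For |i| <= 2^24 every integer is exactly representable in binary32,
// so the pair of casts returns the input unchanged (no-op round-trip).
// Above 2^24 the inner cast rounds, and the round-trip becomes lossy.
inline std::int32_t roundtrip_int_float_int(std::int32_t i) {
    return static_cast<std::int32_t>(static_cast<float>(i));
}

// Small self-check of the behaviour described in the comments above.
inline void check_roundtrips() {
    assert(roundtrip_float_int_float(2.75f) == 2.0f);    // fraction dropped
    assert(roundtrip_float_int_float(-2.75f) == -2.0f);  // truncates toward zero
    assert(roundtrip_int_float_int(12345) == 12345);     // exact, so a no-op
}
```

In both cases the two conversions are still emitted and executed as separate instructions unless the compiler can prove the pair is removable. That is the cost the TL;DR describes.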

Why it matters

The full GPU-mechanism analysis lives in the Why it matters on a GPU section of the companion rule page.

Examples

The bad / good code snippets are kept canonical on the rule page; see redundant-precision-cast.md -> Examples.

See also


This is a v1.0-ship stub. Full analysis pending; track issue link TBD.

TODO: category-overview missing for misc; linked overview is the closest sibling.

© 2026 NelCit — Apache-2.0 (code), CC-BY-4.0 (docs).