
mesh-output-decl-exceeds-256

Status: shipped (Phase 3) — see CHANGELOG.

(via ADR 0007)

What it detects

A mesh-shader entry point whose out vertices or out indices array declaration exceeds 256 elements. The D3D12 mesh shader specification caps both per-group output declarations at 256: at most 256 vertices and 256 primitives. The out indices array holds one entry per primitive, so its length is the primitive count. On any function annotated [shader("mesh")], the rule constant-folds the array-size expressions on the out vertices and out indices parameters. It fires when either folded size exceeds 256.
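The detection step can be sketched as follows. This is a minimal sketch, not the rule's actual implementation. It assumes the linter already has each array-size expression as source text, plus a table of known compile-time constants. It borrows Python's ast module to parse simple integer arithmetic: literals, named constants, +, -, *, / and <<. The names fold_array_size and check_mesh_outputs are hypothetical. When the sketch cannot fold an expression, it stays silent. That is a conservative choice that may not match the real rule.

```python
import ast

MESH_OUTPUT_CAP = 256  # D3D12 per-group cap on output vertices and on output primitives


def fold_array_size(expr, constants):
    """Fold an integer array-size expression such as "2 * MAX_VERTS".

    Returns the folded int, or None when the expression cannot be folded
    (unknown identifier, unsupported operator, division by zero).
    """
    def fold(node):
        # Integer literal (bool is a subclass of int, so require exactly int).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        # Named compile-time constant, e.g. a static const uint (hypothetical table).
        if isinstance(node, ast.Name):
            return constants.get(node.id)
        if isinstance(node, ast.BinOp):
            lhs, rhs = fold(node.left), fold(node.right)
            if lhs is None or rhs is None:
                return None
            if isinstance(node.op, ast.Add):
                return lhs + rhs
            if isinstance(node.op, ast.Sub):
                return lhs - rhs
            if isinstance(node.op, ast.Mult):
                return lhs * rhs
            if isinstance(node.op, ast.LShift) and rhs >= 0:
                return lhs << rhs
            if isinstance(node.op, ast.Div) and rhs != 0:
                # HLSL integer division truncates toward zero.
                q = abs(lhs) // abs(rhs)
                return q if (lhs < 0) == (rhs < 0) else -q
        return None

    return fold(ast.parse(expr, mode="eval").body)


def check_mesh_outputs(is_mesh_entry, vertices_expr, indices_expr, constants):
    """Return (parameter kind, folded size) for each output declaration over the cap."""
    if not is_mesh_entry:  # only [shader("mesh")] entry points are checked
        return []
    findings = []
    for kind, expr in (("vertices", vertices_expr), ("indices", indices_expr)):
        size = fold_array_size(expr, constants)
        if size is not None and size > MESH_OUTPUT_CAP:
            findings.append((kind, size))
    return findings
```

In this sketch, a size of exactly 256 passes, because the cap is inclusive. For example, 2 * MAX_VERTS with MAX_VERTS = 128 is accepted.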

Why it matters on a GPU

Mesh-shader output buffers live in a fixed per-group slot that the pipeline allocates when the workgroup launches. On NVIDIA Turing and Ada Lovelace, the per-group output region is sized for 256 vertices and 256 primitives multiplied by the configured per-vertex output stride. On AMD RDNA 2/3, the mesh shader writes through a primitive-shader pipeline that uses an LDS-resident output region carved at the same cap. Intel Xe-HPG (Arc Alchemist) and Xe2 (Battlemage) implement the same 256/256 ceiling as part of their mesh-pipeline conformance to the D3D12 spec.

Declaring more than 256 of either is a hard failure at pipeline creation. Mesh-shader PSOs are built from a pipeline state stream via ID3D12Device2::CreatePipelineState, and that call returns E_INVALIDARG. As with mesh-numthreads-over-128, catching this at lint time replaces a runtime "your PSO won't compile" error with a source-located diagnostic. The diagnostic reports the actual declared count, so the author can see how far the declaration overshoots the cap.
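The sketch below shows one way to assemble a source-located diagnostic that names the dimension and the overshoot. The message layout, the Finding type and format_diagnostic are all hypothetical. Only the rule name and the 256 cap come from this page.

```python
from dataclasses import dataclass

MESH_OUTPUT_CAP = 256  # D3D12 cap on declared output vertices and primitives


@dataclass
class Finding:
    path: str
    line: int
    column: int
    kind: str      # "vertices" or "indices" (the out-parameter qualifier)
    declared: int  # folded array size


def format_diagnostic(f: Finding) -> str:
    # An out indices array holds one entry per primitive, so report primitives.
    noun = "vertices" if f.kind == "vertices" else "primitives"
    overshoot = f.declared - MESH_OUTPUT_CAP
    return (
        f"{f.path}:{f.line}:{f.column}: error: mesh-output-decl-exceeds-256: "
        f"out {f.kind} declares {f.declared} {noun}, "
        f"{overshoot} over the D3D12 cap of {MESH_OUTPUT_CAP}"
    )
```

Reporting the overshoot alongside the raw count tells the author at a glance whether a small trim will do or the meshlet shape needs rethinking.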

The fix is to shrink the meshlet. Typical production meshlets target 64 vertices / 124 primitives (the meshoptimizer convention) or 128 vertices / 128 primitives (the NVIDIA-recommended starting point). Larger nominal output capacities almost never improve culling effectiveness, and they waste output-region memory. On RDNA 2/3 that region lives in LDS, so a larger per-group allocation lowers wave occupancy.
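To make the memory cost concrete, this back-of-the-envelope sketch compares the declared output footprint of the two recommended shapes with a shape at the full 256/256 cap. It is a simplified model under stated assumptions:

- a 32-byte vertex (for example, two float4 attributes);
- 12 bytes per primitive for one uint3 of indices;
- no per-primitive attributes;
- no vendor padding or alignment.

Real allocations differ by IHV and driver.

```python
# Assumed strides for illustration; substitute your real per-vertex and per-primitive output sizes.
VERTEX_STRIDE_BYTES = 32     # e.g. float4 position + one float4 attribute
PRIMITIVE_STRIDE_BYTES = 12  # one uint3 of triangle indices, no per-primitive attributes


def output_footprint_bytes(max_vertices, max_primitives,
                           vertex_stride=VERTEX_STRIDE_BYTES,
                           primitive_stride=PRIMITIVE_STRIDE_BYTES):
    """Declared output size for one mesh-shader group under this simplified model."""
    return max_vertices * vertex_stride + max_primitives * primitive_stride


for shape in [(64, 124), (128, 128), (256, 256)]:
    print(shape, output_footprint_bytes(*shape))
```

Under these assumptions the loop prints 3536, 5632 and 11264 bytes respectively. In this model, declaring the full 256/256 capacity reserves about three times the output space of a 64/124 meshlet.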

Examples

Bad

```hlsl
// 512 vertices: over the 256 cap. PSO creation fails.
struct Vertex
{
    float4 pos : SV_Position; // minimal illustrative vertex layout
};

[shader("mesh")]
[numthreads(64, 1, 1)]
[outputtopology("triangle")]
void main(uint tid : SV_GroupThreadID,
          out vertices Vertex verts[512], // ERROR: exceeds 256 cap
          out indices  uint3  tris[124])
{
    /* ... */
}
```

Good

```hlsl
// 64 verts / 124 prims: the meshoptimizer convention. Well supported on
// every IHV, and leaves enough LDS headroom for high wave occupancy.
struct Vertex
{
    float4 pos : SV_Position; // minimal illustrative vertex layout
};

[shader("mesh")]
[numthreads(64, 1, 1)]
[outputtopology("triangle")]
void main(uint tid : SV_GroupThreadID,
          out vertices Vertex verts[64],
          out indices  uint3  tris[124])
{
    /* ... */
}
```

Options

none

Fix availability

none. Shrinking the declared output counts changes how meshlets are packed, which is a content-side decision. The diagnostic names the offending dimension(s) so the author can pick a new shape.

See also



© 2026 NelCit, CC-BY-4.0.

© 2026 NelCit — Apache-2.0 (code), CC-BY-4.0 (docs).