Query Tuning Report – SQLStorm.RankedPosts (QueryHash 0xCB06A3190DEA642C) – 2026-05-17

Tuning Goal: Query Tuner
Server: RockyPC  |  Version: SQL Server 2022 (16.0.1180.1) Developer Edition (64-bit), Enterprise engine  |  Database: SQLStorm
Estimated Plan Cost: 201.56 Critical  |  Observed runtime: ~40s, 2,105 rows

Executive Summary

The query has an estimated subtree cost of 201.56 (Critical band). The dominant cost is concentrated in a single operator subtree: the Nested Loops with a LIKE '%' + t.TagName + '%' predicate against dbo.Posts.Tags, contributing ~193 of the 201 cost units (≈96%). The remaining plan (the user/votes/badges aggregation) is well-shaped and costs only ~7 units.

Additionally, the final LEFT JOIN RankedPosts rp ON tp.UserId = rp.PostId joins UserId to PostId, which is almost certainly a logic bug — those ID domains do not match. This means the RankedPosts CTE is computed at enormous cost (~104K rows × 1,232 tags via a non-sargable wildcard scan) only to produce essentially incidental matches in the final result. Fixing this single defect is the largest available win.

  1. Rewrite the Posts–Tags relationship to remove the leading-wildcard LIKE join (use STRING_SPLIT or a normalized PostTags table). Critical
  2. Correct the final join predicatetp.UserId = rp.PostId is semantically wrong; join on OwnerUserId or remove the join. Critical
  3. Add targeted indexes on Posts(OwnerUserId,PostTypeId), Votes(PostId) INCLUDE(VoteTypeId), and Badges(UserId). High
  4. Pre-aggregate Votes and Badges per user before joining, to remove the cross-product effect feeding COUNT(DISTINCT). Medium
  5. Remove unnecessary SELECT DISTINCT after the rewrite to avoid the final Hash Aggregate / Sort spill risk. Low

Top Findings (Evidence)

Detailed Prioritized Recommendations

1. Eliminate the LIKE '%' + TagName + '%' join on Posts.Tags Critical

Why: Posts.Tags is a delimited string column (e.g. <sql-server><tuning>). Joining to Tags.TagName with a double-wildcard LIKE is non-sargable: the optimizer cannot seek, must scan all 246,672 posts, and re-evaluates the predicate per tag (1,232 tags) using a Table Spool. This is the root cost.

Before (current):

FROM Posts p
JOIN Tags t ON p.Tags LIKE '%' + t.TagName + '%'
WHERE p.PostTypeId = 1

After — Option A: use STRING_SPLIT with the < > delimiter (SQL 2022 supports ordinal):

FROM Posts p
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(p.Tags,'>',''),'<','|'), '|') s
JOIN  Tags t ON t.TagName = s.value
WHERE p.PostTypeId = 1
  AND s.value <> '';

After — Option B (best): normalize via a PostTags bridge table (one-time materialization). This converts the LIKE pattern join into a hash/merge join on integer keys.

Confidence: High on the impact (≥80% cost reduction for this subtree). Risk: Option B requires a one-time backfill and a maintenance trigger or scheduled rebuild; Option A is zero-schema but still scans Posts once.

2. Correct the final join predicate Critical

LEFT JOIN RankedPosts rp ON tp.UserId = rp.PostId joins user identifiers to post identifiers. This is almost certainly unintended. Two valid interpretations:

Recommended rewrite (latest post per user):

WITH UserLatestPost AS (
    SELECT p.OwnerUserId, p.Id AS PostId, p.Title, p.CreationDate,
           ROW_NUMBER() OVER (PARTITION BY p.OwnerUserId ORDER BY p.CreationDate DESC) AS rn
    FROM Posts p
    WHERE p.PostTypeId = 1
)
SELECT ... FROM TopUsers tp
LEFT JOIN UserLatestPost rp ON rp.OwnerUserId = tp.UserId AND rp.rn = 1;

This removes Tags from the plan entirely if tag context is not actually required by the report. Confidence: High.

3. Pre-aggregate Votes & Badges before joining Medium

The current shape multiplies rows: Users ⨝ Posts ⨝ Votes ⨝ Badges produces ≈1.7M rows feeding a Hash Aggregate with COUNT(DISTINCT). Pre-aggregating to one row per user keeps cardinality low and eliminates the need for DISTINCT.

WITH PostAgg AS (
    SELECT p.OwnerUserId, COUNT(*) AS TotalPosts
    FROM Posts p WHERE p.OwnerUserId IS NOT NULL
    GROUP BY p.OwnerUserId
),
VoteAgg AS (
    SELECT p.OwnerUserId,
           SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END) AS UpVotes,
           SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END) AS DownVotes
    FROM Posts p
    JOIN Votes v ON v.PostId = p.Id
    GROUP BY p.OwnerUserId
),
BadgeAgg AS (
    SELECT b.UserId, COUNT(*) AS BadgeCount
    FROM Badges b GROUP BY b.UserId
)
SELECT ...
FROM Users u
LEFT JOIN PostAgg  pa ON pa.OwnerUserId = u.Id
LEFT JOIN VoteAgg  va ON va.OwnerUserId = u.Id
LEFT JOIN BadgeAgg ba ON ba.UserId      = u.Id
WHERE pa.TotalPosts > 10;

Confidence: High. Expected reduction of intermediate row volume by ~10×.

4. Targeted supporting indexes High

The following indexes directly serve the rewritten query and align with DMV missing-index hints.

IndexEst. StorageRead BenefitWrite OverheadMaintenance Notes / Risk
IX_Posts_OwnerUserId_PostTypeId_INC_CreationDate_Title ~28–34 MB ~55–70% Low–Medium (Posts is moderate write) Supports PostAgg, UserLatestPost CTE, and Posts→Users join. PAGE compression recommended. Use ONLINE = ON.
IX_Votes_PostId_INC_VoteTypeId ~16–20 MB ~40–55% Medium (Votes is write-heavy) Replaces full Votes scan (926K rows). Watch insert hotspots; PAGE compression.
IX_Badges_UserId ~6–8 MB ~30–45% Low Tiny; high benefit for BadgeAgg. Aligns with DMV impact 21.41%.
IX_Posts_PostTypeId_INC_OwnerUserId_CreationDate_Title (only if Option A LIKE rewrite retained) ~26–32 MB ~25–35% (filter only) Low–Medium Replaces the 399 MB clustered scan with a narrower scan; does not eliminate LIKE cost.

Risk assessment: Posts and Votes are large; each new index adds insert/update cost and a small extra log volume. Build with ONLINE = ON, DATA_COMPRESSION = PAGE. Combined estimated added storage is ~60–90 MB — acceptable on this database.

5. Remove SELECT DISTINCT and the trailing ORDER BY on a left-joined column Low

Once joins are corrected, DISTINCT is no longer needed and the final Hash Match Aggregate (NodeId 2, est. 67,958 rows) plus Sort (NodeId 1) can be removed. Also, ORDER BY tp.UserRank, rp.CreationDate DESC orders by a column that can be NULL from the LEFT JOIN — keep UserRank only or add NULLS LAST semantics via CASE if needed.

6. Statistics & parameter considerations Low

Table-Specific Insights

TableRowsObserved UsageInsight
dbo.Posts740,016 (399 MB)170 scans, 20 seeksDominated by scans; large row size (avg 2051 bytes). Highest-leverage table for new indexes.
dbo.Votes926,084 (12 MB)93 scans, 0 seeksZero seeks indicates no non-clustered support on PostId or UserId. Strong DMV hints confirm.
dbo.Users534,386 (44 MB)120 scans, 36 seeksUsed as the driving aggregation key — well-served by clustered PK once Posts/Votes/Badges are indexed.
dbo.Badges439,352 (8 MB)43 scans, 0 seeksSmall and write-light — safe to add UserId index.
dbo.Tags1,2324 scansTiny; only problematic because of how it's joined (LIKE %tag%).

Validation Steps

  1. Capture a baseline: SET STATISTICS IO ON; SET STATISTICS TIME ON; then run the original query, save outputs.
  2. Apply scripts in order: indexes first, then rewritten query.
  3. Re-run with STATISTICS IO/TIME and compare logical reads on Posts/Votes, CPU time, and elapsed time. Target: >5× reduction in elapsed time, >10× reduction in logical reads on Posts.
  4. Inspect the new actual plan for absence of the Nested Loops + LIKE predicate and absence of Table Spool.
  5. Verify result correctness by comparing row counts and a sample of UserRank, DisplayName, TotalPosts values against the original (or document intended semantic change for the join fix).

Scripts

Script 1 — Baseline measurement (run before changes)

USE [SQLStorm];
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SET NOCOUNT ON;
GO
-- Run the original query here and capture STATISTICS IO/TIME output and the actual plan.
-- (Paste the original query in this batch when executing the baseline.)
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

Script 2 — Create supporting indexes (Recommendation #4)

USE [SQLStorm];
GO
-- Posts: support OwnerUserId joins, PostTypeId filter, and latest-post lookup.
CREATE NONCLUSTERED INDEX [IX_Posts_OwnerUserId_PostTypeId_INC_CreationDate_Title]
ON [dbo].[Posts] ([OwnerUserId], [PostTypeId])
INCLUDE ([CreationDate], [Title])
WITH (ONLINE = ON, DATA_COMPRESSION = PAGE, FILLFACTOR = 90, SORT_IN_TEMPDB = ON);
GO

-- Votes: support PostId join with VoteTypeId aggregation.
CREATE NONCLUSTERED INDEX [IX_Votes_PostId_INC_VoteTypeId]
ON [dbo].[Votes] ([PostId])
INCLUDE ([VoteTypeId])
WITH (ONLINE = ON, DATA_COMPRESSION = PAGE, FILLFACTOR = 95, SORT_IN_TEMPDB = ON);
GO

-- Badges: support UserId aggregation.
CREATE NONCLUSTERED INDEX [IX_Badges_UserId]
ON [dbo].[Badges] ([UserId])
WITH (ONLINE = ON, DATA_COMPRESSION = PAGE, FILLFACTOR = 95, SORT_IN_TEMPDB = ON);
GO

-- Optional: Posts secondary covering index for PostTypeId-only scans (keep only if Option A LIKE rewrite is retained).
-- CREATE NONCLUSTERED INDEX [IX_Posts_PostTypeId_INC_OwnerUserId_CreationDate_Title]
-- ON [dbo].[Posts] ([PostTypeId])
-- INCLUDE ([OwnerUserId], [CreationDate], [Title])
-- WITH (ONLINE = ON, DATA_COMPRESSION = PAGE, FILLFACTOR = 90, SORT_IN_TEMPDB = ON);
-- GO

Script 3 — Rewritten query (Recommendations #1, #2, #3, #5)

USE [SQLStorm];
GO
-- Rewritten version: pre-aggregates, removes LIKE join, fixes UserId/PostId mismatch by
-- joining latest post per user via OwnerUserId. Adjust if the original intent was latest-per-tag.
WITH UserLatestPost AS (
    SELECT  p.OwnerUserId,
            p.Id            AS PostId,
            p.Title,
            p.CreationDate,
            ROW_NUMBER() OVER (PARTITION BY p.OwnerUserId
                               ORDER BY p.CreationDate DESC) AS rn
    FROM    dbo.Posts p
    WHERE   p.PostTypeId  = 1
      AND   p.OwnerUserId IS NOT NULL
),
PostAgg AS (
    SELECT  p.OwnerUserId, COUNT(*) AS TotalPosts
    FROM    dbo.Posts p
    WHERE   p.OwnerUserId IS NOT NULL
    GROUP BY p.OwnerUserId
),
VoteAgg AS (
    SELECT  p.OwnerUserId,
            SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END) AS UpVotes,
            SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END) AS DownVotes
    FROM    dbo.Posts p
    JOIN    dbo.Votes v ON v.PostId = p.Id
    GROUP BY p.OwnerUserId
),
BadgeAgg AS (
    SELECT  b.UserId, COUNT(*) AS BadgeCount
    FROM    dbo.Badges b
    GROUP BY b.UserId
),
TopUsers AS (
    SELECT  u.Id          AS UserId,
            u.DisplayName,
            pa.TotalPosts,
            ISNULL(va.UpVotes,   0) AS UpVotes,
            ISNULL(va.DownVotes, 0) AS DownVotes,
            ISNULL(ba.BadgeCount,0) AS BadgeCount,
            DENSE_RANK() OVER (ORDER BY pa.TotalPosts DESC,
                                       ISNULL(va.UpVotes,0) - ISNULL(va.DownVotes,0) DESC) AS UserRank
    FROM    dbo.Users u
    JOIN    PostAgg   pa ON pa.OwnerUserId = u.Id
    LEFT JOIN VoteAgg  va ON va.OwnerUserId = u.Id
    LEFT JOIN BadgeAgg ba ON ba.UserId      = u.Id
    WHERE   pa.TotalPosts > 10
)
SELECT  tp.UserRank,
        tp.DisplayName,
        tp.TotalPosts,
        tp.UpVotes,
        tp.DownVotes,
        tp.BadgeCount,
        rp.Title        AS LatestPostTitle,
        rp.CreationDate AS LatestPostDate,
        CASE WHEN rp.rn = 1 THEN 'Latest' ELSE 'Older' END AS PostStatus
FROM    TopUsers tp
LEFT JOIN UserLatestPost rp
       ON rp.OwnerUserId = tp.UserId
      AND rp.rn = 1
ORDER BY tp.UserRank;
GO

Script 4 — Alternative rewrite preserving Tags semantics via STRING_SPLIT (Recommendation #1, Option A)

USE [SQLStorm];
GO
-- Use this variant only if the original "latest post per Tag" semantics are required.
-- Replaces the non-sargable LIKE join with STRING_SPLIT against the Tags column.
WITH RankedPosts AS (
    SELECT  p.Id           AS PostId,
            p.Title,
            p.CreationDate,
            t.Id            AS TagId,
            t.TagName,
            ROW_NUMBER() OVER (PARTITION BY t.Id ORDER BY p.CreationDate DESC) AS rn
    FROM    dbo.Posts p
    CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(p.Tags, '>', ''), '<', '|'), '|') s
    JOIN    dbo.Tags t ON t.TagName = s.value
    WHERE   p.PostTypeId = 1
      AND   s.value <> ''
)
SELECT TagId, TagName, PostId, Title, CreationDate
FROM   RankedPosts
WHERE  rn = 1
ORDER BY TagName;
GO

Script 5 — Optional: enable async auto-update stats (Recommendation #6)

USE [master];
GO
ALTER DATABASE [SQLStorm] SET AUTO_UPDATE_STATISTICS_ASYNC ON WITH NO_WAIT;
GO

Script 6 — Post-change validation

USE [SQLStorm];
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SET NOCOUNT ON;
GO
-- Re-run the rewritten query (Script 3) here and capture STATISTICS IO/TIME + actual plan.
-- Compare logical reads on Posts/Votes/Badges and elapsed time against the baseline (Script 1).
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

Script 7 — Rollback (drop new indexes if needed)

USE [SQLStorm];
GO
IF EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'IX_Posts_OwnerUserId_PostTypeId_INC_CreationDate_Title' AND object_id = OBJECT_ID('dbo.Posts'))
    DROP INDEX [IX_Posts_OwnerUserId_PostTypeId_INC_CreationDate_Title] ON [dbo].[Posts];
GO
IF EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'IX_Votes_PostId_INC_VoteTypeId' AND object_id = OBJECT_ID('dbo.Votes'))
    DROP INDEX [IX_Votes_PostId_INC_VoteTypeId] ON [dbo].[Votes];
GO
IF EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'IX_Badges_UserId' AND object_id = OBJECT_ID('dbo.Badges'))
    DROP INDEX [IX_Badges_UserId] ON [dbo].[Badges];
GO