Query Tuning Report – SQLStorm.RankedPosts (QueryHash 0xCB06A3190DEA642C) – 2026-05-17

Tuning Goal: Query Tuner
Server: RockyPC | Version: SQL Server 2022 (16.0.1180.1) Developer Edition (64-bit), Enterprise engine | Database: SQLStorm
Estimated Plan Cost: 201.56 Critical | Observed runtime: ~40s, 2,105 rows

Executive Summary

The query has an estimated subtree cost of 201.56 (Critical band). The dominant cost is concentrated in a single operator subtree: the Nested Loops with a LIKE '%' + t.TagName + '%' predicate against dbo.Posts.Tags, contributing ~193 of the 201 cost units (≈96%). The remaining plan (the user/votes/badges aggregation) is well-shaped and costs only ~7 units.

Additionally, the final LEFT JOIN RankedPosts rp ON tp.UserId = rp.PostId joins UserId to PostId, which is almost certainly a logic bug — those ID domains do not match. This means the RankedPosts CTE is computed at enormous cost (~104K rows × 1,232 tags via a non-sargable wildcard scan) only to produce essentially incidental matches in the final result. Fixing this single defect is the largest available win.

Rewrite the Posts–Tags relationship to remove the leading-wildcard LIKE join (use STRING_SPLIT or a normalized PostTags table). Critical
Correct the final join predicate — tp.UserId = rp.PostId is semantically wrong; join on OwnerUserId or remove the join. Critical
Add targeted indexes on Posts(OwnerUserId,PostTypeId), Votes(PostId) INCLUDE(VoteTypeId), and Badges(UserId). High
Pre-aggregate Votes and Badges per user before joining, to remove the cross-product effect feeding COUNT(DISTINCT). Medium
Remove unnecessary SELECT DISTINCT after the rewrite to avoid the final Hash Aggregate / Sort spill risk. Low

Top Findings (Evidence)

Non-sargable LIKE drives 96% of cost.
Nested Loops (NodeId 8) EstimateCPU="67.2275", subtree cost 193.282; predicate Posts.Tags LIKE '%' + Tags.TagName + '%' with a Table Spool (NodeId 10) rewound 104,435 times.
Wrong-domain join.
Query text: LEFT JOIN RankedPosts rp ON tp.UserId = rp.PostId. UserId is a dbo.Users.Id domain (267,193 distinct); PostId belongs to dbo.Posts.Id (246,672 distinct). Overlap of value ranges is coincidental.
Posts full scan.
Clustered Index Scan on PK__Posts__3214EC07AF21AC3E (NodeId 9) reads EstimatedRowsRead="246672", AvgRowSize 2051, table size 399 MB. No usable index for the PostTypeId=1 predicate or for the tag join.
COUNT(DISTINCT) over a join cross-product.
Hash Match Aggregate (NodeId 18) ingests ~1,700,990 rows because Posts ⨝ Votes ⨝ Badges multiplies row counts before COUNT(DISTINCT p.Id) / COUNT(DISTINCT b.Id) deduplicate.
Memory grant pressure risk.
SerialDesiredMemory="1224792" KB requested; multiple Sort/Hash operators with batch-mode and MemoryFractions across the plan; tempdb spill possible at higher concurrency.
Missing-index DMV hints align with rewrite targets.
DMV: Votes(PostId) INCLUDE(VoteTypeId) impact 19.36%; Posts(OwnerUserId) impact 13.58%; Badges(UserId) impact 21.41%.

Detailed Prioritized Recommendations

1. Eliminate the `LIKE '%' + TagName + '%'` join on Posts.Tags Critical

Why: Posts.Tags is a delimited string column (e.g. <sql-server><tuning>). Joining to Tags.TagName with a double-wildcard LIKE is non-sargable: the optimizer cannot seek, must scan all 246,672 posts, and re-evaluates the predicate per tag (1,232 tags) using a Table Spool. This is the root cost.

Before (current):

FROM Posts p
JOIN Tags t ON p.Tags LIKE '%' + t.TagName + '%'
WHERE p.PostTypeId = 1

After — Option A: use STRING_SPLIT with the < > delimiter (SQL 2022 supports ordinal):

FROM Posts p
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(p.Tags,'>',''),'<','|'), '|') s
JOIN  Tags t ON t.TagName = s.value
WHERE p.PostTypeId = 1
  AND s.value <> '';

After — Option B (best): normalize via a PostTags bridge table (one-time materialization). This converts the LIKE pattern join into a hash/merge join on integer keys.

Confidence: High on the impact (≥80% cost reduction for this subtree). Risk: Option B requires a one-time backfill and a maintenance trigger or scheduled rebuild; Option A is zero-schema but still scans Posts once.

2. Correct the final join predicate Critical

LEFT JOIN RankedPosts rp ON tp.UserId = rp.PostId joins user identifiers to post identifiers. This is almost certainly unintended. Two valid interpretations:

Latest post owned by the user: include OwnerUserId in RankedPosts and join tp.UserId = rp.OwnerUserId AND rp.rn = 1.
Latest post per tag (independent list): remove the join entirely and return the latest-per-tag list separately.

Recommended rewrite (latest post per user):

WITH UserLatestPost AS (
    SELECT p.OwnerUserId, p.Id AS PostId, p.Title, p.CreationDate,
           ROW_NUMBER() OVER (PARTITION BY p.OwnerUserId ORDER BY p.CreationDate DESC) AS rn
    FROM Posts p
    WHERE p.PostTypeId = 1
)
SELECT ... FROM TopUsers tp
LEFT JOIN UserLatestPost rp ON rp.OwnerUserId = tp.UserId AND rp.rn = 1;

This removes Tags from the plan entirely if tag context is not actually required by the report. Confidence: High.

3. Pre-aggregate Votes & Badges before joining Medium

The current shape multiplies rows: Users ⨝ Posts ⨝ Votes ⨝ Badges produces ≈1.7M rows feeding a Hash Aggregate with COUNT(DISTINCT). Pre-aggregating to one row per user keeps cardinality low and eliminates the need for DISTINCT.

WITH PostAgg AS (
    SELECT p.OwnerUserId, COUNT(*) AS TotalPosts
    FROM Posts p WHERE p.OwnerUserId IS NOT NULL
    GROUP BY p.OwnerUserId
),
VoteAgg AS (
    SELECT p.OwnerUserId,
           SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END) AS UpVotes,
           SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END) AS DownVotes
    FROM Posts p
    JOIN Votes v ON v.PostId = p.Id
    GROUP BY p.OwnerUserId
),
BadgeAgg AS (
    SELECT b.UserId, COUNT(*) AS BadgeCount
    FROM Badges b GROUP BY b.UserId
)
SELECT ...
FROM Users u
LEFT JOIN PostAgg  pa ON pa.OwnerUserId = u.Id
LEFT JOIN VoteAgg  va ON va.OwnerUserId = u.Id
LEFT JOIN BadgeAgg ba ON ba.UserId      = u.Id
WHERE pa.TotalPosts > 10;

Confidence: High. Expected reduction of intermediate row volume by ~10×.

4. Targeted supporting indexes High

The following indexes directly serve the rewritten query and align with DMV missing-index hints.

Index	Est. Storage	Read Benefit	Write Overhead	Maintenance Notes / Risk
`IX_Posts_OwnerUserId_PostTypeId_INC_CreationDate_Title`	~28–34 MB	~55–70%	Low–Medium (Posts is moderate write)	Supports PostAgg, UserLatestPost CTE, and Posts→Users join. PAGE compression recommended. Use `ONLINE = ON`.
`IX_Votes_PostId_INC_VoteTypeId`	~16–20 MB	~40–55%	Medium (Votes is write-heavy)	Replaces full Votes scan (926K rows). Watch insert hotspots; PAGE compression.
`IX_Badges_UserId`	~6–8 MB	~30–45%	Low	Tiny; high benefit for BadgeAgg. Aligns with DMV impact 21.41%.
`IX_Posts_PostTypeId_INC_OwnerUserId_CreationDate_Title` (only if Option A LIKE rewrite retained)	~26–32 MB	~25–35% (filter only)	Low–Medium	Replaces the 399 MB clustered scan with a narrower scan; does not eliminate LIKE cost.

Risk assessment: Posts and Votes are large; each new index adds insert/update cost and a small extra log volume. Build with ONLINE = ON, DATA_COMPRESSION = PAGE. Combined estimated added storage is ~60–90 MB — acceptable on this database.

5. Remove `SELECT DISTINCT` and the trailing ORDER BY on a left-joined column Low

Once joins are corrected, DISTINCT is no longer needed and the final Hash Match Aggregate (NodeId 2, est. 67,958 rows) plus Sort (NodeId 1) can be removed. Also, ORDER BY tp.UserRank, rp.CreationDate DESC orders by a column that can be NULL from the LEFT JOIN — keep UserRank only or add NULLS LAST semantics via CASE if needed.

6. Statistics & parameter considerations Low

All referenced stats are 100% sampled and current (latest update 2026-03-19). No statistics action required immediately.
The query has no parameters — Parameter Sensitive Plan (PSP) optimization and parameter sniffing are not applicable. Do not introduce a plan guide.
Consider enabling AUTO_UPDATE_STATISTICS_ASYNC on SQLStorm to avoid compile-time stalls on large tables (current: Disabled).
SQL Server 2022 Batch Mode on Rowstore is already engaged (BatchModeOnRowStoreUsed="true") — keep CE 160.

Table-Specific Insights

Table	Rows	Observed Usage	Insight
dbo.Posts	740,016 (399 MB)	170 scans, 20 seeks	Dominated by scans; large row size (avg 2051 bytes). Highest-leverage table for new indexes.
dbo.Votes	926,084 (12 MB)	93 scans, 0 seeks	Zero seeks indicates no non-clustered support on `PostId` or `UserId`. Strong DMV hints confirm.
dbo.Users	534,386 (44 MB)	120 scans, 36 seeks	Used as the driving aggregation key — well-served by clustered PK once Posts/Votes/Badges are indexed.
dbo.Badges	439,352 (8 MB)	43 scans, 0 seeks	Small and write-light — safe to add `UserId` index.
dbo.Tags	1,232	4 scans	Tiny; only problematic because of how it's joined (LIKE %tag%).

Validation Steps

Capture a baseline: SET STATISTICS IO ON; SET STATISTICS TIME ON; then run the original query, save outputs.
Apply scripts in order: indexes first, then rewritten query.
Re-run with STATISTICS IO/TIME and compare logical reads on Posts/Votes, CPU time, and elapsed time. Target: >5× reduction in elapsed time, >10× reduction in logical reads on Posts.
Inspect the new actual plan for absence of the Nested Loops + LIKE predicate and absence of Table Spool.
Verify result correctness by comparing row counts and a sample of UserRank, DisplayName, TotalPosts values against the original (or document intended semantic change for the join fix).

Scripts

Script 1 — Baseline measurement (run before changes)

USE [SQLStorm];
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SET NOCOUNT ON;
GO
-- Run the original query here and capture STATISTICS IO/TIME output and the actual plan.
-- (Paste the original query in this batch when executing the baseline.)
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

Script 2 — Create supporting indexes (Recommendation #4)

USE [SQLStorm];
GO
-- Posts: support OwnerUserId joins, PostTypeId filter, and latest-post lookup.
CREATE NONCLUSTERED INDEX [IX_Posts_OwnerUserId_PostTypeId_INC_CreationDate_Title]
ON [dbo].[Posts] ([OwnerUserId], [PostTypeId])
INCLUDE ([CreationDate], [Title])
WITH (ONLINE = ON, DATA_COMPRESSION = PAGE, FILLFACTOR = 90, SORT_IN_TEMPDB = ON);
GO

-- Votes: support PostId join with VoteTypeId aggregation.
CREATE NONCLUSTERED INDEX [IX_Votes_PostId_INC_VoteTypeId]
ON [dbo].[Votes] ([PostId])
INCLUDE ([VoteTypeId])
WITH (ONLINE = ON, DATA_COMPRESSION = PAGE, FILLFACTOR = 95, SORT_IN_TEMPDB = ON);
GO

-- Badges: support UserId aggregation.
CREATE NONCLUSTERED INDEX [IX_Badges_UserId]
ON [dbo].[Badges] ([UserId])
WITH (ONLINE = ON, DATA_COMPRESSION = PAGE, FILLFACTOR = 95, SORT_IN_TEMPDB = ON);
GO

-- Optional: Posts secondary covering index for PostTypeId-only scans (keep only if Option A LIKE rewrite is retained).
-- CREATE NONCLUSTERED INDEX [IX_Posts_PostTypeId_INC_OwnerUserId_CreationDate_Title]
-- ON [dbo].[Posts] ([PostTypeId])
-- INCLUDE ([OwnerUserId], [CreationDate], [Title])
-- WITH (ONLINE = ON, DATA_COMPRESSION = PAGE, FILLFACTOR = 90, SORT_IN_TEMPDB = ON);
-- GO

Script 3 — Rewritten query (Recommendations #1, #2, #3, #5)

USE [SQLStorm];
GO
-- Rewritten version: pre-aggregates, removes LIKE join, fixes UserId/PostId mismatch by
-- joining latest post per user via OwnerUserId. Adjust if the original intent was latest-per-tag.
WITH UserLatestPost AS (
    SELECT  p.OwnerUserId,
            p.Id            AS PostId,
            p.Title,
            p.CreationDate,
            ROW_NUMBER() OVER (PARTITION BY p.OwnerUserId
                               ORDER BY p.CreationDate DESC) AS rn
    FROM    dbo.Posts p
    WHERE   p.PostTypeId  = 1
      AND   p.OwnerUserId IS NOT NULL
),
PostAgg AS (
    SELECT  p.OwnerUserId, COUNT(*) AS TotalPosts
    FROM    dbo.Posts p
    WHERE   p.OwnerUserId IS NOT NULL
    GROUP BY p.OwnerUserId
),
VoteAgg AS (
    SELECT  p.OwnerUserId,
            SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END) AS UpVotes,
            SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END) AS DownVotes
    FROM    dbo.Posts p
    JOIN    dbo.Votes v ON v.PostId = p.Id
    GROUP BY p.OwnerUserId
),
BadgeAgg AS (
    SELECT  b.UserId, COUNT(*) AS BadgeCount
    FROM    dbo.Badges b
    GROUP BY b.UserId
),
TopUsers AS (
    SELECT  u.Id          AS UserId,
            u.DisplayName,
            pa.TotalPosts,
            ISNULL(va.UpVotes,   0) AS UpVotes,
            ISNULL(va.DownVotes, 0) AS DownVotes,
            ISNULL(ba.BadgeCount,0) AS BadgeCount,
            DENSE_RANK() OVER (ORDER BY pa.TotalPosts DESC,
                                       ISNULL(va.UpVotes,0) - ISNULL(va.DownVotes,0) DESC) AS UserRank
    FROM    dbo.Users u
    JOIN    PostAgg   pa ON pa.OwnerUserId = u.Id
    LEFT JOIN VoteAgg  va ON va.OwnerUserId = u.Id
    LEFT JOIN BadgeAgg ba ON ba.UserId      = u.Id
    WHERE   pa.TotalPosts > 10
)
SELECT  tp.UserRank,
        tp.DisplayName,
        tp.TotalPosts,
        tp.UpVotes,
        tp.DownVotes,
        tp.BadgeCount,
        rp.Title        AS LatestPostTitle,
        rp.CreationDate AS LatestPostDate,
        CASE WHEN rp.rn = 1 THEN 'Latest' ELSE 'Older' END AS PostStatus
FROM    TopUsers tp
LEFT JOIN UserLatestPost rp
       ON rp.OwnerUserId = tp.UserId
      AND rp.rn = 1
ORDER BY tp.UserRank;
GO

Script 4 — Alternative rewrite preserving Tags semantics via STRING_SPLIT (Recommendation #1, Option A)

USE [SQLStorm];
GO
-- Use this variant only if the original "latest post per Tag" semantics are required.
-- Replaces the non-sargable LIKE join with STRING_SPLIT against the Tags column.
WITH RankedPosts AS (
    SELECT  p.Id           AS PostId,
            p.Title,
            p.CreationDate,
            t.Id            AS TagId,
            t.TagName,
            ROW_NUMBER() OVER (PARTITION BY t.Id ORDER BY p.CreationDate DESC) AS rn
    FROM    dbo.Posts p
    CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(p.Tags, '>', ''), '<', '|'), '|') s
    JOIN    dbo.Tags t ON t.TagName = s.value
    WHERE   p.PostTypeId = 1
      AND   s.value <> ''
)
SELECT TagId, TagName, PostId, Title, CreationDate
FROM   RankedPosts
WHERE  rn = 1
ORDER BY TagName;
GO

Script 5 — Optional: enable async auto-update stats (Recommendation #6)

USE [master];
GO
ALTER DATABASE [SQLStorm] SET AUTO_UPDATE_STATISTICS_ASYNC ON WITH NO_WAIT;
GO

Script 6 — Post-change validation

USE [SQLStorm];
GO
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SET NOCOUNT ON;
GO
-- Re-run the rewritten query (Script 3) here and capture STATISTICS IO/TIME + actual plan.
-- Compare logical reads on Posts/Votes/Badges and elapsed time against the baseline (Script 1).
GO
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO

Script 7 — Rollback (drop new indexes if needed)

USE [SQLStorm];
GO
IF EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'IX_Posts_OwnerUserId_PostTypeId_INC_CreationDate_Title' AND object_id = OBJECT_ID('dbo.Posts'))
    DROP INDEX [IX_Posts_OwnerUserId_PostTypeId_INC_CreationDate_Title] ON [dbo].[Posts];
GO
IF EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'IX_Votes_PostId_INC_VoteTypeId' AND object_id = OBJECT_ID('dbo.Votes'))
    DROP INDEX [IX_Votes_PostId_INC_VoteTypeId] ON [dbo].[Votes];
GO
IF EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'IX_Badges_UserId' AND object_id = OBJECT_ID('dbo.Badges'))
    DROP INDEX [IX_Badges_UserId] ON [dbo].[Badges];
GO