AI SQL Tuner Recommendations

Tuning Goal: Query Tuner — Stored Procedure sp00060
Server: RockyPC
Database: SQLStorm
Version: SQL Server 2022 (16.0.1175.1)
Edition: Developer Edition (Enterprise features)

Executive Summary

199.87
Estimated Cost
Critical
Severity (≥100)
5
Issues Found
~5.6 GB
Est. Memory Grant
100%
Stats Sampled

This query has an estimated subtree cost of 199.87, placing it in the CRITICAL category. The dominant bottleneck is the Nested Loops with LIKE '%…%' join between Posts (246K rows) and Tags (1,232 rows), which effectively produces a cross join with per-row LIKE evaluation costing ~193 out of 199.87 (96.7%) of total plan cost. Additionally, the query contains a logical bug in the final LEFT JOIN that joins on the wrong column, and produces an inflated intermediate row set of 1.7M rows from a Cartesian-like expansion via Badges. The requested memory grant of ~5.6 GB will very likely spill to tempdb on most systems.

Cost Overview

Plan ComponentEst. Cost% of TotalSeverity
Nested Loops (Posts × Tags LIKE)193.2896.7%Critical
UserStats CTE (Hash Aggregates + Joins)5.472.7%Low
Final Hash Join + Sort1.120.6%Low

Top Priorities

  1. Critical Fix the LIKE '%…%' cross join pattern — The join p.Tags LIKE '%' + t.TagName + '%' forces a Nested Loops scan of 246K Posts × 1,232 Tags = ~304M comparisons. Replace with a relational tag-to-post mapping table or use full-text search.
  2. Critical Fix the logical bug in the final JOINLEFT JOIN RankedPosts rp ON tp.UserId = rp.PostId incorrectly joins UserId to PostId. This should be ON tp.UserId = rp.PostOwnerUserId (or an appropriate column). The current join produces meaningless results.
  3. High Eliminate Cartesian expansion from Badges — The UserStats CTE joins Users → Posts → Votes → Badges with LEFT JOINs and COUNT(DISTINCT), producing 1.7M intermediate rows. Pre-aggregate each dimension separately before joining.
  4. Medium Add a covering index on Posts for PostTypeId filter — The clustered index scan of the entire 399MB Posts table could be narrowed with an index on PostTypeId that includes the necessary columns.
  5. Low Monitor memory grant — SerialDesiredMemory is 5.8GB; on real execution this will almost certainly spill to tempdb.

Detailed Prioritized Recommendations

1. Eliminate the LIKE '%…%' Cross Join Pattern

Critical Confidence: 98%

The LIKE '%' + t.TagName + '%' join predicate between Posts and Tags is the single largest cost driver. It is a non-SARGable predicate applied inside a Nested Loops join. The optimizer scans all 246,672 Posts rows (filtered to ~104K by PostTypeId=1), then for each row, evaluates the LIKE against all 1,232 Tags via a Lazy Spool. This results in approximately 128 million string comparisons.

Evidence: NodeId=8 Nested Loops Inner Join, EstimateRows=104,436, EstimateCPU=67.23, EstimatedTotalSubtreeCost=193.28. NodeId=10 Table Spool (Lazy Spool) with EstimateRewinds=104,435. Predicate: [Posts].[Tags] LIKE [Expr1020].

Option A: Create a PostTags Bridge Table (Recommended)

Since StackOverflow-style Posts store tags in a delimited string like <c#><.net><linq>, the most effective fix is to normalize this into a relational mapping table:

-- One-time ETL to create a proper many-to-many relationship
CREATE TABLE dbo.PostTags (
    PostId   INT NOT NULL,
    TagId    INT NOT NULL,
    CONSTRAINT PK_PostTags PRIMARY KEY CLUSTERED (PostId, TagId),
    CONSTRAINT FK_PostTags_Posts FOREIGN KEY (PostId) REFERENCES dbo.Posts(Id),
    CONSTRAINT FK_PostTags_Tags  FOREIGN KEY (TagId)  REFERENCES dbo.Tags(Id)
);

-- Create reverse index for tag-centric lookups
CREATE NONCLUSTERED INDEX IX_PostTags_TagId
ON dbo.PostTags (TagId, PostId)
WITH (DATA_COMPRESSION = PAGE);

-- Populate (one-time)
INSERT INTO dbo.PostTags (PostId, TagId)
SELECT DISTINCT p.Id, t.Id
FROM dbo.Posts p
CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(p.Tags, '<', ''), '>', ' '), ' ') s
INNER JOIN dbo.Tags t ON t.TagName = LTRIM(RTRIM(s.value))
WHERE s.value <> '' AND p.Tags IS NOT NULL;
MetricValue
estimatedStorageBytes~7 MB (PK) + ~7 MB (IX) = ~14 MB total
expectedReadBenefitPercent95–99% cost reduction for tag-related joins
expectedWriteOverheadPercentNegligible for reads; inserts/updates to Posts need ETL sync
maintenanceNotesRequires trigger or application-layer logic to keep in sync with Posts.Tags column
Risk: Requires a schema change and ongoing maintenance to keep PostTags synchronized. However, this eliminates the fundamental anti-pattern and transforms O(n×m) string comparison into O(n) hash/merge joins. If Posts.Tags column is not being updated frequently (index usage shows 0 updates), risk is minimal.

Option B: Use Full-Text Search (Alternative)

If schema changes are not feasible, create a full-text index on Posts.Tags and use CONTAINS or FREETEXT. This is less optimal than a bridge table but eliminates the LIKE cross join.

Rewritten RankedPosts CTE (using bridge table):

-- BEFORE (current — ~193 cost units)
FROM Posts p
JOIN Tags t ON p.Tags LIKE '%' + t.TagName + '%'
WHERE p.PostTypeId = 1

-- AFTER (with PostTags bridge — estimated <2 cost units)
FROM Posts p
INNER JOIN dbo.PostTags pt ON p.Id = pt.PostId
INNER JOIN Tags t ON pt.TagId = t.Id
WHERE p.PostTypeId = 1

2. Fix the Logical Bug: Final JOIN on Wrong Column

Critical Confidence: 99%

The final SELECT joins TopUsers tp LEFT JOIN RankedPosts rp ON tp.UserId = rp.PostId. This joins a User's ID to a Post's ID — these are different entities and will produce semantically incorrect results. The RankedPosts CTE selects p.Id AS PostId but does not expose the Post's OwnerUserId. The intent appears to be joining on the user who authored the post.

Evidence: NodeId=4 Hash Match Right Outer Join — HashKeysBuild references [Posts].[Id], HashKeysProbe references [Users].[Id]. The CTE output PostId is p.Id (Post primary key), not a user identifier.

Fix:

-- BEFORE (buggy)
WITH RankedPosts AS (
    SELECT 
        p.Id AS PostId,
        p.Title,
        p.CreationDate,
        ROW_NUMBER() OVER (PARTITION BY t.Id ORDER BY p.CreationDate DESC) AS rn,
        t.TagName
    FROM Posts p
    JOIN Tags t ON p.Tags LIKE '%' + t.TagName + '%'
    WHERE p.PostTypeId = 1
)
...
LEFT JOIN RankedPosts rp ON tp.UserId = rp.PostId  -- BUG: UserId ≠ PostId

-- AFTER (corrected — add OwnerUserId to CTE)
WITH RankedPosts AS (
    SELECT 
        p.Id AS PostId,
        p.OwnerUserId,              -- ADD THIS
        p.Title,
        p.CreationDate,
        ROW_NUMBER() OVER (PARTITION BY t.Id ORDER BY p.CreationDate DESC) AS rn,
        t.TagName
    FROM Posts p
    INNER JOIN dbo.PostTags pt ON p.Id = pt.PostId
    INNER JOIN Tags t ON pt.TagId = t.Id
    WHERE p.PostTypeId = 1
)
...
LEFT JOIN RankedPosts rp ON tp.UserId = rp.OwnerUserId  -- FIXED
Risk: This is a correctness fix, not a performance-only change. However, fixing this may significantly change the result set cardinality and thus also impact performance (likely reducing rows since the current join is semi-random).

3. Eliminate Cartesian Expansion in UserStats CTE

High Confidence: 95%

The UserStats CTE performs Users LEFT JOIN Posts LEFT JOIN Votes LEFT JOIN Badges in a single pass, producing a Cartesian-like expansion to 1.7M intermediate rows before aggregating with COUNT(DISTINCT). A user with 10 posts, 50 votes across those posts, and 5 badges would contribute 10 × 50 × 5 = 2,500 rows instead of just needing 3 pre-aggregated lookups.

Evidence: NodeId=19 Compute Scalar, EstimateRows=1,700,990. NodeId=18 Hash Match Aggregate, EstimateRows=177,639. The row explosion from 267K users to 1.7M rows demonstrates the Cartesian multiplication effect.

Rewrite: Pre-aggregate Each Dimension

-- BEFORE (Cartesian explosion to 1.7M rows)
UserStats AS (
    SELECT u.Id AS UserId, u.DisplayName,
        COUNT(DISTINCT p.Id) AS TotalPosts,
        SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END) AS UpVotes,
        SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END) AS DownVotes,
        COUNT(DISTINCT b.Id) AS BadgeCount
    FROM Users u
    LEFT JOIN Posts p ON u.Id = p.OwnerUserId
    LEFT JOIN Votes v ON p.Id = v.PostId
    LEFT JOIN Badges b ON u.Id = b.UserId
    GROUP BY u.Id, u.DisplayName
)

-- AFTER (pre-aggregate, ~300K rows max per subquery)
PostCounts AS (
    SELECT OwnerUserId, COUNT(*) AS TotalPosts
    FROM Posts
    GROUP BY OwnerUserId
),
VoteCounts AS (
    SELECT p.OwnerUserId,
        SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END) AS UpVotes,
        SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END) AS DownVotes
    FROM Posts p
    INNER JOIN Votes v ON p.Id = v.PostId
    WHERE v.VoteTypeId IN (2, 3)
    GROUP BY p.OwnerUserId
),
BadgeCounts AS (
    SELECT UserId, COUNT(*) AS BadgeCount
    FROM Badges
    GROUP BY UserId
),
UserStats AS (
    SELECT 
        u.Id AS UserId,
        u.DisplayName,
        ISNULL(pc.TotalPosts, 0) AS TotalPosts,
        ISNULL(vc.UpVotes, 0) AS UpVotes,
        ISNULL(vc.DownVotes, 0) AS DownVotes,
        ISNULL(bc.BadgeCount, 0) AS BadgeCount
    FROM Users u
    LEFT JOIN PostCounts pc ON u.Id = pc.OwnerUserId
    LEFT JOIN VoteCounts vc ON u.Id = vc.OwnerUserId
    LEFT JOIN BadgeCounts bc ON u.Id = bc.UserId
)

This eliminates COUNT(DISTINCT) and reduces intermediate rows from 1.7M to approximately 267K (one per user).

Risk: Minimal. This is a semantically equivalent rewrite that reduces memory grant requirements and eliminates Cartesian inflation. The pre-aggregation pattern is standard best practice.

4. Complete Query Rewrite

Medium Confidence: 90%

Combining all fixes from recommendations 1–3, here is the complete rewritten stored procedure:

ALTER PROCEDURE sp00060
AS
BEGIN
    SET NOCOUNT ON;

    WITH PostCounts AS (
        SELECT OwnerUserId, COUNT(*) AS TotalPosts
        FROM dbo.Posts
        WHERE OwnerUserId IS NOT NULL
        GROUP BY OwnerUserId
        HAVING COUNT(*) > 10   -- Push filter early
    ),
    VoteCounts AS (
        SELECT p.OwnerUserId,
            SUM(CASE WHEN v.VoteTypeId = 2 THEN 1 ELSE 0 END) AS UpVotes,
            SUM(CASE WHEN v.VoteTypeId = 3 THEN 1 ELSE 0 END) AS DownVotes
        FROM dbo.Posts p
        INNER JOIN dbo.Votes v ON p.Id = v.PostId
        WHERE v.VoteTypeId IN (2, 3)
          AND p.OwnerUserId IS NOT NULL
        GROUP BY p.OwnerUserId
    ),
    BadgeCounts AS (
        SELECT UserId, COUNT(*) AS BadgeCount
        FROM dbo.Badges
        GROUP BY UserId
    ),
    TopUsers AS (
        SELECT 
            u.Id AS UserId,
            u.DisplayName,
            pc.TotalPosts,
            ISNULL(vc.UpVotes, 0) AS UpVotes,
            ISNULL(vc.DownVotes, 0) AS DownVotes,
            ISNULL(bc.BadgeCount, 0) AS BadgeCount,
            DENSE_RANK() OVER (
                ORDER BY pc.TotalPosts DESC, 
                         ISNULL(vc.UpVotes, 0) - ISNULL(vc.DownVotes, 0) DESC
            ) AS UserRank
        FROM dbo.Users u
        INNER JOIN PostCounts pc ON u.Id = pc.OwnerUserId
        LEFT JOIN VoteCounts vc ON u.Id = vc.OwnerUserId
        LEFT JOIN BadgeCounts bc ON u.Id = bc.UserId
    ),
    RankedPosts AS (
        SELECT 
            p.Id AS PostId,
            p.OwnerUserId,
            p.Title,
            p.CreationDate,
            ROW_NUMBER() OVER (
                PARTITION BY pt.TagId 
                ORDER BY p.CreationDate DESC
            ) AS rn,
            t.TagName
        FROM dbo.Posts p
        INNER JOIN dbo.PostTags pt ON p.Id = pt.PostId
        INNER JOIN dbo.Tags t ON pt.TagId = t.Id
        WHERE p.PostTypeId = 1
    )
    SELECT 
        tp.UserRank,
        tp.DisplayName,
        tp.TotalPosts,
        tp.UpVotes,
        tp.DownVotes,
        tp.BadgeCount,
        rp.Title AS LatestPostTitle,
        rp.CreationDate AS LatestPostDate,
        CASE 
            WHEN rp.rn = 1 THEN 'Latest'
            ELSE 'Older'
        END AS PostStatus
    FROM TopUsers tp
    LEFT JOIN RankedPosts rp ON tp.UserId = rp.OwnerUserId
    ORDER BY tp.UserRank, rp.CreationDate DESC;
END;

Expected impact: Estimated cost reduction from ~200 to <10. Memory grant reduction from ~5.6GB to <200MB. Execution time improvement of 90%+.

5. Index Recommendations

5a. PostTags Bridge Table Indexes (if created per Rec #1)

Already specified in Recommendation 1. The PK on (PostId, TagId) and the secondary index on (TagId, PostId) are essential.

5b. Index on Posts for PostTypeId with Covering Columns

Currently the plan scans the full 399MB clustered index on Posts to filter PostTypeId = 1 (returning ~104K of 246K rows, a 42% selectivity). With the rewrite, an index becomes more beneficial since the bridge table join changes the access pattern.

CREATE NONCLUSTERED INDEX IX_Posts_PostTypeId_Covering
ON dbo.Posts (PostTypeId, CreationDate DESC)
INCLUDE (OwnerUserId, Title, Tags)
WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
MetricValue
estimatedStorageBytes~25 MB (with PAGE compression; Posts avg Tags=13.5 bytes, Title=24 bytes)
expectedReadBenefitPercent60–75% for RankedPosts CTE; avoids 399MB clustered scan
expectedWriteOverheadPercent3–5% on Posts INSERT/UPDATE (already has 2 NC indexes)
maintenanceNotesPAGE compression recommended; ONLINE build available on Enterprise/Developer. Consider dropping if PostTags bridge eliminates need for Tags column in this index.
Evidence: NodeId=9 Clustered Index Scan on PK__Posts, EstimateIO=2.32, reads all 246,672 rows to filter PostTypeId=1 returning 104,436. Column cardinality shows PostTypeId has only 5 distinct values.
Risk: Low. Posts table already has 2 nonclustered indexes. This adds ~25MB. If using the PostTags bridge table, consider a simpler version without the Tags INCLUDE column.

5c. Filtered Index on Votes for VoteTypeId IN (2, 3)

The rewritten VoteCounts CTE filters on VoteTypeId IN (2, 3). A filtered index avoids scanning all 926K votes (most of which are type 1 = AcceptedByOriginator, type 5 = Favorite, etc.).

CREATE NONCLUSTERED INDEX IX_Votes_PostId_UpDownVotes
ON dbo.Votes (PostId)
INCLUDE (VoteTypeId)
WHERE VoteTypeId IN (2, 3)
WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
MetricValue
estimatedStorageBytes~3–5 MB (filtered; upvotes + downvotes are a subset of 926K)
expectedReadBenefitPercent40–60% for VoteCounts CTE; reads only relevant vote types
expectedWriteOverheadPercent1–2% (only maintained for VoteTypeId 2 or 3 inserts)
maintenanceNotesFiltered indexes not usable with parameterized queries unless OPTION(RECOMPILE) or the filter matches. Safe here since values are hardcoded in the SP.
Evidence: NodeId=21 Clustered Index Scan on PK__Votes, EstimateRows=926,084, EstimateIO=1.12. Currently reads all votes when only VoteTypeId 2 and 3 are needed.
Risk: Very low. Filtered index is small (~3–5MB) and only maintained on relevant inserts. Already have IX_Votes_PostId_VoteTypeId which covers this query, but a filtered version is smaller.

5d. Consider Dropping Unused Indexes

All index usage stats show 0 seeks, 0 scans, 0 lookups across every index. This suggests either the server was recently restarted, statistics were just rebuilt, or these indexes are genuinely unused. After the server has been running with production workload for a sufficient period, review and consider dropping:

6. Statistics Maintenance

Low Confidence: 95%

Statistics are in excellent shape:

Recommendation: No immediate action needed. Consider enabling AUTO_UPDATE_STATISTICS_ASYNC to avoid query compile delays on large tables:

ALTER DATABASE [SQLStorm] SET AUTO_UPDATE_STATISTICS_ASYNC ON;
Evidence: All 15 StatisticsInfo entries in the plan show SamplingPercent=100, ModificationCount=0. Cardinality estimates appear reasonable for the given data (e.g., Posts estimate of 104,436 for PostTypeId=1 out of 246,672 total).

7. Parameter Sniffing Considerations

Low Confidence: 90%

This stored procedure takes no parameters, so traditional parameter sniffing is not a concern. The query uses only literal constants (PostTypeId = 1, VoteTypeId = 2, VoteTypeId = 3, TotalPosts > 10). The plan will always be compiled with the same values.

However, note that ARITHABORT is set to false in the plan's StatementSetOptions. This can cause the plan to not be shared with SSMS sessions (which default to ARITHABORT ON), potentially leading to two cached plans. Consider adding SET ARITHABORT ON to the stored procedure.

ALTER PROCEDURE sp00060
AS
BEGIN
    SET NOCOUNT ON;
    SET ARITHABORT ON;  -- Ensure plan sharing with SSMS
    ...

SQL Server 2022's Parameter Sensitive Plan (PSP) optimization and DOP feedback features are available on this instance but are not applicable since there are no parameters and the query runs parallel (Branches=1 indicates a parallel plan).

Table-Specific Insights

Table Rows Size Indexes Key Observations
Posts 246,672
(1.23M total incl. all types)
399 MB (CI)
+ 23 MB (NC)
3 Largest table. Tags column (avg 13.5 bytes) stores delimited tag names — the root cause of the LIKE anti-pattern. PostTypeId has only 5 distinct values; PostTypeId=1 selects ~42% of rows. Title has ~104K distinct values (avg 24 bytes).
Votes 926,084 12 MB (CI)
+ 31 MB (NC)
4 Most indexes by count. PostId has ~243K distinct values. VoteTypeId has 13 distinct values. The NC indexes total 31MB — more than 2.5× the clustered index, suggesting potential over-indexing. Only VoteTypeId 2 and 3 are needed by this query.
Badges 439,352 8 MB (CI)
+ 5 MB (NC)
2 154K distinct UserIds means avg ~2.85 badges per user. IX_Badges_UserId includes Class column but this query only needs UserId and Id. The existing index is adequate.
Users 267,193 45 MB (CI) 1 Only clustered index exists. DisplayName avg 10.2 bytes, 227K distinct values. No additional indexes needed for this query pattern since Users is the dimension table being probed via hash joins.
Tags 1,232 0.1 MB 1 Tiny table, no compression (consider PAGE compression though benefit is negligible at 0.1MB). TagName avg 10.6 bytes. The entire table fits in a single data page extent — scan is appropriate.

Validation Steps

After implementing the recommended changes, validate performance improvements by comparing before/after metrics:

-- Run BEFORE and AFTER applying changes
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

EXEC sp00060;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Key Metrics to Compare

Additional Validation

-- Check for tempdb spills in actual plan
SET STATISTICS XML ON;
EXEC sp00060;
SET STATISTICS XML OFF;
-- Look for SpillToTempDb="1" in Sort and Hash operators

-- Monitor memory grants
SELECT 
    session_id, 
    requested_memory_kb, 
    granted_memory_kb, 
    used_memory_kb,
    max_used_memory_kb
FROM sys.dm_exec_query_memory_grants
WHERE session_id = @@SPID;

Verify PostTags Bridge Table Accuracy

-- Spot-check: compare LIKE-based results with bridge table results
SELECT TOP 100 p.Id, p.Tags, t.TagName
FROM dbo.Posts p
INNER JOIN dbo.PostTags pt ON p.Id = pt.PostId
INNER JOIN dbo.Tags t ON pt.TagId = t.Id
WHERE p.PostTypeId = 1
ORDER BY p.Id;

-- Count comparison
SELECT COUNT(*) FROM dbo.PostTags;
-- Should approximate the ~104K rows estimated by the LIKE join