July 30, 2025

Deep Dive into MySQL Semijoin Transformation

This article explores MySQL's semijoin transformation, optimizing IN and EXISTS subqueries for better performance.

Deep Dive into MySQL Semijoin Transformation

1. Application Scenario

Semijoin transformation optimizes queries with IN or EXISTS subqueries, common in scenarios like retrieving courses with enrolled students.

Example:​

  • Tables:​class (courses) and roster (enrollments).
  • Query Goal:​​ List all courses with at least one enrollment.

Traditional JOIN (Inefficient):​

SELECT DISTINCT class.class_num, class.class_name 
FROM class 
INNER JOIN roster ON class.class_num = roster.class_num;

  • Issue:​​ Duplicates due to multiple enrollments per course.

Optimized IN Subquery (Semijoin):​

SELECT class_num, class_name 
FROM class 
WHERE class_num IN (SELECT class_num FROM roster);

  • Execution Plan:​​ Uses Nested loop semijoin, outperforming the JOIN approach.

2. What is Semijoin Transformation?​

A query optimization technique converting IN/EXISTS subqueries into efficient joins.

Key Strategies (MySQL 8.0.16+):​

  1. Table Pullout:​​ Merges subquery tables into the main query if possible.
  2. Duplicate Weedout:​​ Removes duplicates via temporary tables.
  3. First Match:​​ Stops scanning after the first match per outer row.
  4. Loose Scan:​​ Skips duplicate index values using index jumps.
  5. Materialization:​​ Precomputes subquery results for fast lookup.

Supported Since:​

  • IN/EXISTS: MySQL 8.0.16.
  • NOT IN/NOT EXISTS: MySQL 8.0.17 (as anti-semijoins).

Control Parameters:​

-- Check if enabled (default: ON)
SELECT @@optimizer_switch LIKE '%semijoin=on%';

-- Disable globally
SET GLOBAL optimizer_switch = 'semijoin=off';

3. How Semijoin Works

Optimization Process:​

  1. Query Preparation:​​ Converts IN (SELECT) to semijoin syntax.
  2. Cost-Based Selection:​​ Chooses the best strategy (e.g., FirstMatch).
  3. Execution:​​ Applies the selected strategy (e.g., index lookup).

Example Execution Plan:​

EXPLAIN ANALYZE 
SELECT class_num, class_name FROM class 
WHERE class_num IN (SELECT class_num FROM roster);

  • Output:​​ Shows Nested loop semijoin with FirstMatch strategy.

4. Additional Optimizations (MySQL 8.0.41+)​

Conclusion:​
Semijoin transformation significantly boosts query performance for subqueries. Use EXPLAIN ANALYZE to verify optimizations and adjust optimizer_switch settings if needed.

You will get best features of ChatDBA