Deep Dive into MySQL Semijoin Transformation
1. Application Scenario
Semijoin transformation optimizes queries with IN or EXISTS subqueries, common in scenarios like retrieving courses with enrolled students.
Example:
- Tables:
class(courses) androster(enrollments). - Query Goal: List all courses with at least one enrollment.
Traditional JOIN (Inefficient):
SELECT DISTINCT class.class_num, class.class_name
FROM class
INNER JOIN roster ON class.class_num = roster.class_num;
- Issue: Duplicates due to multiple enrollments per course.
Optimized IN Subquery (Semijoin):
SELECT class_num, class_name
FROM class
WHERE class_num IN (SELECT class_num FROM roster);
- Execution Plan: Uses
Nested loop semijoin, outperforming the JOIN approach.
2. What is Semijoin Transformation?
A query optimization technique converting IN/EXISTS subqueries into efficient joins.
Key Strategies (MySQL 8.0.16+):
- Table Pullout: Merges subquery tables into the main query if possible.
- Duplicate Weedout: Removes duplicates via temporary tables.
- First Match: Stops scanning after the first match per outer row.
- Loose Scan: Skips duplicate index values using index jumps.
- Materialization: Precomputes subquery results for fast lookup.
Supported Since:
IN/EXISTS: MySQL 8.0.16.NOT IN/NOT EXISTS: MySQL 8.0.17 (as anti-semijoins).
Control Parameters:
-- Check if enabled (default: ON)
SELECT @@optimizer_switch LIKE '%semijoin=on%';
-- Disable globally
SET GLOBAL optimizer_switch = 'semijoin=off';
3. How Semijoin Works
Optimization Process:
- Query Preparation: Converts
IN (SELECT)to semijoin syntax. - Cost-Based Selection: Chooses the best strategy (e.g.,
FirstMatch). - Execution: Applies the selected strategy (e.g., index lookup).
Example Execution Plan:
EXPLAIN ANALYZE
SELECT class_num, class_name FROM class
WHERE class_num IN (SELECT class_num FROM roster);
- Output: Shows
Nested loop semijoinwithFirstMatchstrategy.
4. Additional Optimizations (MySQL 8.0.41+)
- Group Skip Scan: Reduces rows scanned by leveraging index grouping.
- Reference: Improving Semi-join Performance in MySQL.
Conclusion:
Semijoin transformation significantly boosts query performance for subqueries. Use EXPLAIN ANALYZE to verify optimizations and adjust optimizer_switch settings if needed.

%20(2048%20x%201000%20%E5%83%8F%E7%B4%A0)%20(3).png)

%20(2048%20x%201000%20%E5%83%8F%E7%B4%A0)%20(2).png)
