Deep Dive into MySQL Semijoin Transformation
1. Application Scenario
Semijoin transformation optimizes queries with IN
or EXISTS
subqueries, common in scenarios like retrieving courses with enrolled students.
Example:
- Tables:
class
(courses) androster
(enrollments). - Query Goal: List all courses with at least one enrollment.
Traditional JOIN (Inefficient):
SELECT DISTINCT class.class_num, class.class_name
FROM class
INNER JOIN roster ON class.class_num = roster.class_num;
- Issue: Duplicates due to multiple enrollments per course.
Optimized IN
Subquery (Semijoin):
SELECT class_num, class_name
FROM class
WHERE class_num IN (SELECT class_num FROM roster);
- Execution Plan: Uses
Nested loop semijoin
, outperforming the JOIN approach.
2. What is Semijoin Transformation?
A query optimization technique converting IN
/EXISTS
subqueries into efficient joins.
Key Strategies (MySQL 8.0.16+):
- Table Pullout: Merges subquery tables into the main query if possible.
- Duplicate Weedout: Removes duplicates via temporary tables.
- First Match: Stops scanning after the first match per outer row.
- Loose Scan: Skips duplicate index values using index jumps.
- Materialization: Precomputes subquery results for fast lookup.
Supported Since:
IN
/EXISTS
: MySQL 8.0.16.NOT IN
/NOT EXISTS
: MySQL 8.0.17 (as anti-semijoins).
Control Parameters:
-- Check if enabled (default: ON)
SELECT @@optimizer_switch LIKE '%semijoin=on%';
-- Disable globally
SET GLOBAL optimizer_switch = 'semijoin=off';
3. How Semijoin Works
Optimization Process:
- Query Preparation: Converts
IN (SELECT)
to semijoin syntax. - Cost-Based Selection: Chooses the best strategy (e.g.,
FirstMatch
). - Execution: Applies the selected strategy (e.g., index lookup).
Example Execution Plan:
EXPLAIN ANALYZE
SELECT class_num, class_name FROM class
WHERE class_num IN (SELECT class_num FROM roster);
- Output: Shows
Nested loop semijoin
withFirstMatch
strategy.
4. Additional Optimizations (MySQL 8.0.41+)
- Group Skip Scan: Reduces rows scanned by leveraging index grouping.
- Reference: Improving Semi-join Performance in MySQL.
Conclusion:
Semijoin transformation significantly boosts query performance for subqueries. Use EXPLAIN ANALYZE
to verify optimizations and adjust optimizer_switch
settings if needed.