During a recent technical sharing event hosted by Maliu Study, a participant uncovered a subtle yet critical MySQL subquery behavior. The issue arises when referencing a non-existent column in a subquery under specific conditions, leading to unexpected results.
Reproduction Steps
Create tables t1_0807
and t2_0807
:
CREATE TABLE `t1_0807` (
`id` int NOT NULL AUTO_INCREMENT,
`a` int DEFAULT NULL,
`b` varchar(10),
PRIMARY KEY (`id`),
KEY `idx_a` (`a`)
) ENGINE=InnoDB;
CREATE TABLE `t2_0807` (
`id` int NOT NULL AUTO_INCREMENT,
`c` int DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT INTO t1_0807(a,b) VALUES (1,'one'),(2,'two'),(3,NULL);
INSERT INTO t2_0807(c) VALUES (1),(2),(3);
Execute the problematic query:
SELECT * FROM t1_0807 WHERE b IN (SELECT b FROM t2_0807);
Despite t2_0807
lacking a b
column, the query returns rows. This occurs because MySQL’s optimizer rewrites the subquery into a semi-join operation, altering its original intent.
Root Cause Analysis
The execution plan reveals the optimizer transformed the query into:
SELECT /* ... */
FROM t1_0807 SEMI JOIN (t2_0807)
WHERE t1_0807.b = t1_0807.b;
This effectively becomes WHERE TRUE
, causing the subquery to always evaluate as valid.
Key Takeaways
- Avoid Column Name Ambiguity: Ensure subqueries reference columns from the correct tables.
- Validate Queries Before Production: Always test subqueries in isolation to confirm behavior.
- Check Optimizer Rewrites: Use
SHOW WARNINGS;
post-query to analyze transformations.
This case highlights the importance of understanding MySQL’s optimizer behavior. Developers who bypassed preliminary checks risk unintended data modifications, as warned in the original forum discussion.
For deeper insights into database performance and query optimization, explore related articles on index utilization and execution plan analysis.