1. Core Problem: Performance Disparity in LIMIT offset, size
1.1 Execution Mechanism Differences (e.g., LIMIT 100000000, 10 vs. LIMIT 10)
MySQL employs a two-layer architecture:
- Server Layer: Handles SQL parsing, optimization, and execution scheduling.
- Storage Engine Layer (e.g., InnoDB): Manages physical data I/O.
For LIMIT offset, size:
- The storage engine must read and discard the first offset rows before returning size rows.
- Problem: LIMIT 100000000, 10 requires reading and discarding 100 million rows, causing severe I/O, memory, and CPU waste.
- Solution: LIMIT 10 reads only 10 rows directly → superior performance.
2. Deep Pagination (Large Offset) Optimization Strategies
Core Challenge: Large offsets force MySQL to scan massive amounts of irrelevant data.
Strategy 1: Keyset Pagination (Cursor-based)
SELECT * FROM table
WHERE id > #{last_id} -- Last ID from previous page
ORDER BY id
LIMIT 10;
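Usage sketch with a hypothetical comments table (the concrete ids are made up): each request passes the largest id of the previous page as the next cursor, so no rows are skipped server-side.

-- Page 1: no cursor yet; suppose the last row returned has id = 10
SELECT * FROM comments ORDER BY id LIMIT 10;

-- Page 2: use that id as the cursor
SELECT * FROM comments WHERE id > 10 ORDER BY id LIMIT 10;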
- Use Cases: Infinite scroll (e.g., TikTok feeds, comment sections).
- Pros: Minimal I/O, no offset calculation.
- Cons: Cannot jump to arbitrary pages; requires a unique, ordered column (such as the primary key) to serve as the cursor.
Strategy 2: Subquery Optimization (Leveraging Secondary Indexes)
Original Slow Query:
SELECT * FROM A
WHERE name = 'user_longyu'
ORDER BY id
LIMIT 100000000, 10;
Optimized Version:
SELECT * FROM A
WHERE name = 'user_longyu'
AND id >= (
SELECT id FROM A
WHERE name = 'user_longyu'
ORDER BY id
LIMIT 100000000, 1 -- Fetch starting ID via lightweight index
)
ORDER BY id
LIMIT 10;
- Use Cases: Filtered pagination with indexed columns (e.g., name).
- How It Works:
  - Use the secondary index on name to find the boundary ID (reduced I/O).
  - Fetch the full rows via the primary key.
- Alternative JOIN Syntax:
SELECT A.* FROM A
JOIN (
SELECT id FROM A
WHERE name = 'user_longyu'
ORDER BY id
LIMIT 10 OFFSET 100000000
) AS t ON A.id = t.id;
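Both variants rely on the subquery being answerable from a secondary index alone; a sketch of the assumed index (the name idx_a_name is hypothetical):

CREATE INDEX idx_a_name ON A (name);
-- In InnoDB, every secondary-index entry also stores the primary key,
-- so SELECT id FROM A WHERE name = 'user_longyu' ORDER BY id LIMIT ...
-- is a covering-index scan: the boundary id is located without reading any full rows.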
Strategy 3: Search Engine Integration (e.g., Elasticsearch)
- Use Cases: Complex filters + high-concurrency deep pagination (e.g., e-commerce search).
- Warning:
  - Elasticsearch has its own deep pagination limits (from + size).
  - Requires scroll or search_after for deep result sets.
Additional Optimization: Sharding Middleware
- Route queries by shard key (e.g., user_id % 100) to avoid cross-shard scans.
- Tools: ShardingSphere, MyCat (intelligent pagination routing).
Key Takeaways:
- Avoid large offset values in LIMIT: cost grows in proportion to the offset, because every skipped row is still read.
- Prefer keyset pagination for sequential access.
- Use indexed subqueries to minimize full-table scans.
- Elasticsearch for search-heavy workloads (with pagination safeguards).
- Sharding optimizes distributed deep queries.
Performance Impact:
