PostgreSQL Bloom Extension
The bloom index in PostgreSQL is implemented via the built-in bloom
extension. It accelerates multi-column equality queries using a probabilistic Bloom filter structure, trading minor precision loss for high-speed performance. Key features:
- Multi-column indexing: Supports indexing multiple columns simultaneously.
- Efficient filtering: Ideal for high-cardinality columns (values with low duplication).
- Space optimization: More storage-efficient than traditional B-Tree indexes.
- Probabilistic results: May yield false positives but never false negatives.
- Equality-only: Does not support range queries.
Use Cases
- Multi-column conditional filtering (e.g.,
WHERE col1=x AND col2=y
). - Wide-table analytics (hundreds of columns) with unpredictable query conditions.
Usage
1. Install Extension:
CREATE EXTENSION bloom;
2. Create Index:
CREATE INDEX idx_name ON table_name USING bloom (col1, col2, ...)
WITH (length=..., col1=..., col2=...);
length
: Signature bit length (default 80
, max 4096
; higher reduces false positives).
col1
,col2
, ...: Bits per column (default2
, max4095
; higher reduces false positives).
Example
1. Create a test table with 10M rows:
CREATE TABLE tbloom AS
SELECT
(random() * 1000000)::int AS i1,
(random() * 1000000)::int AS i2,
(random() * 1000000)::int AS i3,
(random() * 1000000)::int AS i4,
(random() * 1000000)::int AS i5,
(random() * 1000000)::int AS i6
FROM generate_series(1, 10000000);
2. Sequential Scan (Baseline):
EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
-- Execution Time: 1648.901 ms
3. B-Tree Index Comparison:
CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6); -- 386 MB
EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
-- Execution Time: 876.106 ms
4. Bloom Index Performance:
CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6); -- 153 MB
EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
-- Execution Time: 115.951 ms (3.5x faster than B-Tree)
Limitations
- ❌ No updates: Rebuild index for write-heavy scenarios.
- ❌ False positives: Tune
length
/column bits to balance precision. - ❌ Equality-only: No range queries, sorting (
ORDER BY
), or uniqueness (UNIQUE
). - ❌ Limited types: Only supports
int4
andtext
(e.g., fails fortimestamp
):
CREATE INDEX bloom_idx ON t4 USING bloom (a, b, c); -- ERROR: Unsupported type
Conclusion
Bloom indexes excel in analytical workloads with multi-column equality filters, offering space savings and speed gains over B-Tree for targeted use cases. Choose based on query patterns and data characteristics.