PostgreSQL Aggregation Implementation Explained
Aggregation is a fundamental feature in SQL, and PostgreSQL provides a rich set of aggregate functions such as COUNT, SUM, AVG, MAX, and MIN. This article analyzes its implementation principles.
Aggregation Execution Plan Example
First, let’s prepare some data:
-- Create table
postgres=# create table t1(id int, name varchar(20), course varchar(20), score int);
CREATE TABLE
-- Insert data
postgres=# select * from t1;
id | name | course | score
----+--------+---------+-------
1 | qupeng | math | 90
2 | qupeng | english | 60
3 | yangwei| math | 69
4 | yangwei| english | 80
5 | lijing | math | 88
6 | lijing | english | 80
(6 rows)
-- Aggregation query (no grouping)
postgres=# explain select sum(score) from t1;
QUERY PLAN
----------------------------------------------------
Aggregate (cost=1.07..1.08 rows=1 width=8)
-> Seq Scan on t1 (cost=0.00..1.06 rows=6 width=4)
(2 rows)
-- FILTER condition filtering (allows filtering before aggregation)
postgres=# explain select sum(score) FILTER (WHERE course = 'math') from t1;
QUERY PLAN
----------------------------------------------------
Aggregate (cost=1.09..1.10 rows=1 width=8)
-> Seq Scan on t1 (cost=0.00..1.06 rows=6 width=62)
(2 rows)
HashAggregate vs. GroupAggregate
For SUM aggregation without grouping, PostgreSQL uses a hash-based approach. Hashing is intuitive—each GROUP BY key maps to a hash value, enabling efficient grouping before summation.
Example: Summing scores by name:
postgres=# select name, sum(score) from t1 group by name;
name | sum
--------+-----
yangwei| 149
qupeng | 150
lijing | 168
(3 rows)
postgres=# explain select name, sum(score) from t1 group by name;
QUERY PLAN
----------------------------------------------------
HashAggregate (cost=17.95..19.95 rows=200 width=66)
Group Key: name
-> Seq Scan on t1 (cost=0.00..15.30 rows=530 width=62)
(3 rows)
When an index exists on the GROUP BY column (e.g., name), PostgreSQL may opt for GroupAggregate, which sorts the data first and then groups it. This is more efficient if the column is indexed or pre-sorted.
Example with indexing:
postgres=# create index idx_t1_name on t1 (name);
CREATE INDEX
postgres=# explain select name, sum(score) from t1 group by name order by name;
QUERY PLAN
----------------------------------------------------
GroupAggregate (cost=1.14..1.24 rows=6 width=66)
Group Key: name
-> Sort (cost=1.14..1.15 rows=6 width=62)
Sort Key: name
-> Seq Scan on t1 (cost=0.00..1.06 rows=6 width=62)
(5 rows)
Aggregation Function Execution Steps
PostgreSQL abstracts aggregation into three steps:
- Initialization: Sets initial conditions (e.g.,
0forSUM,(0,0)forAVG). - Transformation: Calls
transfuncfor each input tuple (e.g.,sum += new_valueforSUM). - Finalization: Applies
finalfuncif defined (e.g.,sum/countforAVG).
Example: AVG implementation in pg_aggregate:
postgres=# select * from pg_aggregate where aggfnoid = 2101;
-[ RECORD 1 ]--+------------------------------------
aggfnoid | pg_catalog.avg
aggkind | n
aggnumdirectargs | 0
aggtransfn | int4_avg_accum
aggfinalfn | int8_avg
...
agginitval | {0,0} -- Initial state: {sum, count}
The AVG Function uses an intermediate struct:
typedef struct Int8TransTypeData {
int64 count;
int64 sum;
} Int8TransTypeData;
Datum int4_avg_accum(PG_FUNCTION_ARGS) {
// Updates sum and count
}
Datum int8_avg(PG_FUNCTION_ARGS) {
// Computes sum/count
}
Parallel Aggregation
PostgreSQL supports parallel aggregation, where:
- Worker processes scan and locally aggregate data.
- Main process combines partial results via
combinefunc.
Example execution plan:
postgres=# explain select sum(amount) from bigtable where amount > 100;
QUERY PLAN
----------------------------------------------------
Finalize Aggregate (cost=10981.05..10981.06 rows=1 width=8)
-> Gather (cost=10980.83..10981.04 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=9980.83..9980.84 rows=1 width=8)
-> Parallel Seq Scan on bigtable (cost=0.00..9633.59 rows=138896 width=4)
Filter: (amount > 100)
(6 rows)
Source Code Analysis
The core function ExecAgg handles aggregation:
- HashAggregate: Uses
agg_fill_hash_tableto build a hash table, thenagg_retrieve_hash_tableto compute results. - GroupAggregate: Relies on sorted data, grouping tuples sequentially.
Key functions:
advance_aggregates: Updates aggregation state.finalize_aggregates: Appliesfinalfuncif needed.

%20(2048%20x%201000%20%E5%83%8F%E7%B4%A0)%20(3).png)

%20(2048%20x%201000%20%E5%83%8F%E7%B4%A0)%20(2).png)
