July 30, 2025

PostgreSQL Aggregation Implementation Explained

Explore PostgreSQL aggregation mechanics, including HashAggregate, GroupAggregate, and parallel aggregation techniques with examples.

PostgreSQL Aggregation Implementation Explained

Aggregation is a fundamental feature in SQL, and PostgreSQL provides a rich set of aggregate functions such as COUNT, SUM, AVG, MAX, and MIN. This article analyzes its implementation principles.

Aggregation Execution Plan Example

First, let’s prepare some data:

-- Create table  
postgres=# create table t1(id int, name varchar(20), course varchar(20), score int);  
CREATE TABLE  

-- Insert data  
postgres=# select * from t1;  
 id | name   | course  | score  
----+--------+---------+-------  
  1 | qupeng | math    |    90  
  2 | qupeng | english |    60  
  3 | yangwei| math    |    69  
  4 | yangwei| english |    80  
  5 | lijing | math    |    88  
  6 | lijing | english |    80  
(6 rows)  

-- Aggregation query (no grouping)  
postgres=# explain select sum(score) from t1;  
 QUERY PLAN  
----------------------------------------------------  
 Aggregate  (cost=1.07..1.08 rows=1 width=8)  
   ->  Seq Scan on t1  (cost=0.00..1.06 rows=6 width=4)  
(2 rows)  

-- FILTER condition filtering (allows filtering before aggregation)  
postgres=# explain select sum(score) FILTER (WHERE course = 'math') from t1;  
 QUERY PLAN  
----------------------------------------------------  
 Aggregate  (cost=1.09..1.10 rows=1 width=8)  
   ->  Seq Scan on t1  (cost=0.00..1.06 rows=6 width=62)  
(2 rows)  

HashAggregate vs. GroupAggregate

For SUM aggregation without grouping, PostgreSQL uses a hash-based approach. Hashing is intuitive—each GROUP BY key maps to a hash value, enabling efficient grouping before summation.

Example: Summing scores by name:

postgres=# select name, sum(score) from t1 group by name;  
 name   | sum  
--------+-----  
 yangwei| 149  
 qupeng | 150  
 lijing | 168  
(3 rows)  

postgres=# explain select name, sum(score) from t1 group by name;  
 QUERY PLAN  
----------------------------------------------------  
 HashAggregate  (cost=17.95..19.95 rows=200 width=66)  
   Group Key: name  
   ->  Seq Scan on t1  (cost=0.00..15.30 rows=530 width=62)  
(3 rows)  

When an index exists on the GROUP BY column (e.g., name), PostgreSQL may opt for GroupAggregate, which sorts the data first and then groups it. This is more efficient if the column is indexed or pre-sorted.

Example with indexing:

postgres=# create index idx_t1_name on t1 (name);  
CREATE INDEX  

postgres=# explain select name, sum(score) from t1 group by name order by name;  
 QUERY PLAN  
----------------------------------------------------  
 GroupAggregate  (cost=1.14..1.24 rows=6 width=66)  
   Group Key: name  
   ->  Sort  (cost=1.14..1.15 rows=6 width=62)  
         Sort Key: name  
         ->  Seq Scan on t1  (cost=0.00..1.06 rows=6 width=62)  
(5 rows)  

Aggregation Function Execution Steps

PostgreSQL abstracts aggregation into three steps:

  1. Initialization: Sets initial conditions (e.g., 0 for SUM, (0,0) for AVG).
  2. Transformation: Calls transfunc for each input tuple (e.g., sum += new_value for SUM).
  3. Finalization: Applies finalfunc if defined (e.g., sum/count for AVG).

Example: AVG implementation in pg_aggregate:

postgres=# select * from pg_aggregate where aggfnoid = 2101;  
-[ RECORD 1 ]--+------------------------------------  
aggfnoid       | pg_catalog.avg  
aggkind        | n  
aggnumdirectargs | 0  
aggtransfn     | int4_avg_accum  
aggfinalfn     | int8_avg  
...  
agginitval     | {0,0}  -- Initial state: {sum, count}  

The AVG Function uses an intermediate struct:

typedef struct Int8TransTypeData {  
    int64 count;  
    int64 sum;  
} Int8TransTypeData;  

Datum int4_avg_accum(PG_FUNCTION_ARGS) {  
    // Updates sum and count  
}  

Datum int8_avg(PG_FUNCTION_ARGS) {  
    // Computes sum/count  
}  

Parallel Aggregation

PostgreSQL supports parallel aggregation, where:

  1. Worker processes​ scan and locally aggregate data.
  2. Main process​ combines partial results via combinefunc.

Example execution plan:

postgres=# explain select sum(amount) from bigtable where amount > 100;  
 QUERY PLAN  
----------------------------------------------------  
 Finalize Aggregate  (cost=10981.05..10981.06 rows=1 width=8)  
   ->  Gather  (cost=10980.83..10981.04 rows=2 width=8)  
         Workers Planned: 2  
         ->  Partial Aggregate  (cost=9980.83..9980.84 rows=1 width=8)  
               ->  Parallel Seq Scan on bigtable  (cost=0.00..9633.59 rows=138896 width=4)  
                     Filter: (amount > 100)  
(6 rows)  

Source Code Analysis

The core function ExecAgg handles aggregation:

  • HashAggregate: Uses agg_fill_hash_table to build a hash table, then agg_retrieve_hash_table to compute results.
  • GroupAggregate: Relies on sorted data, grouping tuples sequentially.

Key functions:

  • advance_aggregates: Updates aggregation state.
  • finalize_aggregates: Applies finalfunc if needed.

You will get best features of ChatDBA