Skip to main content

Partitioning

Why?

Partitioning lets the framework update the summary table incrementally. Without partitioning, processing a source table with billions of records might probably take days or even weeks to finish.

Because an RDBMS is ACID compliant, it will also need to store both old and new records of the source table while the processing is ongoing, which will take up a lot of space and may cause fragmentation.

How It Works

The following table shows how the records are partitioned using the hypothetical IntegerPartition with 1-2-3-4-5-6 bits of partitioning width. The leftmost column indicates the level. Other cells are the partitions of that level. Numbers in the cells indicate the partitioning key range that are rolled-up in the partition.

L60-63
L50-3132-63
L40-1516-3132-4748-63
L30-78-1516-2324-3132-3940-4748-5556-63
L20-34-78-1112-1516-1920-2324-2728-3132-3536-3940-4344-4748-5152-5556-5960-63
L10-12-34-56-78-910-1112-1314-1516-1718-1920-2122-2324-2526-2728-2930-3132-3334-3536-3738-3940-4142-4344-4546-4748-4950-5152-5354-5556-5758-5960-6162-63

Example of Partitioning

With the above partitioning scheme above, now suppose our source entity has the following records:

IDValue
82
93
106
117
1210
135
144
151

The framework will roll up those records into the following level 1 partitions:

LevelKeySum of Value
L185
L11013
L11215
L1145

Then, in turn those level 1 partitions will be rolled up into level 2 partitions:

LevelKeySum of Value
L2818
L21220

Then, again, those level 2 partitions will be rolled up into level 3 partitions:

LevelKeySum of Value
L3838

And so forth until we reach the top level.

info

In the summary table, a partition is identified by its level and key.

Queries

If we currently have 21 records already rolled-up, these are the partition that we will have. If we were to perform a query, the framework will union the records from the highlighted partitions to get the result:

L6
L5
L40-15
L30-78-15
L20-34-78-1112-1516-19
L10-12-34-56-78-910-1112-1314-1516-1718-1920-21

Updates

Now suppose the framework knows that the record #11 has been updated. It will refresh the partition L1 10-11. Afterward, it will refresh its parent partition, which is the partition L2 8-11, and so on until it reaches the top level.

These are the partitions that will get updated when that happens:

L6
L5
L40-15
L30-78-15
L20-34-78-1112-1516-19
L10-12-34-56-78-910-1112-1314-1516-1718-1920-21

In the Real World

Having a partitioning scheme with 1-bit width as above is useful for explanation, but will be pretty inefficient. Our default integer partition is using 11-22-33-44-55 bits of partitioning width. This means that the first level contains the data rolled-up from up to 2048 records, the second level contains 4 millions records that are rolled-up from the first level, and so on.