Partitioning

Why?

Partitioning lets the framework update the summary table incrementally. Without partitioning, processing a source table with billions of records might probably take days or even weeks to finish.

Because an RDBMS is ACID compliant, it will also need to store both old and new records of the source table while the processing is ongoing, which will take up a lot of space and may cause fragmentation.

How It Works

The following table shows how the records are partitioned using the hypothetical IntegerPartition with 1-2-3-4-5-6 bits of partitioning width. The leftmost column indicates the level. Other cells are the partitions of that level. Numbers in the cells indicate the partitioning key range that are rolled-up in the partition.

L6	0-63
L5	0-31																32-63
L4	0-15								16-31								32-47								48-63
L3	0-7				8-15				16-23				24-31				32-39				40-47				48-55				56-63
L2	0-3		4-7		8-11		12-15		16-19		20-23		24-27		28-31		32-35		36-39		40-43		44-47		48-51		52-55		56-59		60-63
L1	0-1	2-3	4-5	6-7	8-9	10-11	12-13	14-15	16-17	18-19	20-21	22-23	24-25	26-27	28-29	30-31	32-33	34-35	36-37	38-39	40-41	42-43	44-45	46-47	48-49	50-51	52-53	54-55	56-57	58-59	60-61	62-63

Example of Partitioning

With the above partitioning scheme above, now suppose our source entity has the following records:

ID	Value
8	2
9	3
10	6
11	7
12	10
13	5
14	4
15	1

The framework will roll up those records into the following level 1 partitions:

Level	Key	Sum of Value
L1	8	5
L1	10	13
L1	12	15
L1	14	5

Then, in turn those level 1 partitions will be rolled up into level 2 partitions:

Level	Key	Sum of Value
L2	8	18
L2	12	20

Then, again, those level 2 partitions will be rolled up into level 3 partitions:

Level	Key	Sum of Value
L3	8	38

And so forth until we reach the top level.

info

In the summary table, a partition is identified by its level and key. With our integer partition, the key is the same as the first possible source ID in the partition.

Queries

If we currently have 21 records already rolled-up, these are the partition that we will have. If we were to perform a query, the framework will union the records from the highlighted partitions to get the result:

L6
L5
L4	0-15
L3	0-7				8-15
L2	0-3		4-7		8-11		12-15		16-19
L1	0-1	2-3	4-5	6-7	8-9	10-11	12-13	14-15	16-17	18-19	20-21

Updates

Now suppose the framework knows that the record #11 has been updated. It will refresh the partition L1 10-11. Afterward, it will refresh its parent partition, which is the partition L2 8-11, and so on until it reaches the top level.

These are the partitions that will get updated when that happens:

L6
L5
L4	0-15
L3	0-7				8-15
L2	0-3		4-7		8-11		12-15		16-19
L1	0-1	2-3	4-5	6-7	8-9	10-11	12-13	14-15	16-17	18-19	20-21

In the Real World

Having a partitioning scheme with 1-bit width as above is useful for explanation, but will be pretty inefficient. Our default integer partition is using 11-22-33-44-55 bits of partitioning width. This means that the first level contains the data rolled-up from up to 2048 records, the second level contains 4 millions records that are rolled-up from the first level, and so on.

Why?​

How It Works​

Example of Partitioning​

Queries​

Updates​

In the Real World​