Partitioning
Why?
Partitioning lets the framework update the summary table incrementally. Without partitioning, processing a source table with billions of records might probably take days or even weeks to finish.
Because an RDBMS is ACID compliant, it will also need to store both old and new records of the source table while the processing is ongoing, which will take up a lot of space and may cause fragmentation.
How It Works
The following table shows how the records are partitioned using the hypothetical
IntegerPartition
with 1-2-3-4-5-6 bits of partitioning width. The leftmost
column indicates the level. Other cells are the partitions of that level.
Numbers in the cells indicate the partitioning key range that are rolled-up in
the partition.
L6 | 0-63 | |||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
L5 | 0-31 | 32-63 | ||||||||||||||||||||||||||||||
L4 | 0-15 | 16-31 | 32-47 | 48-63 | ||||||||||||||||||||||||||||
L3 | 0-7 | 8-15 | 16-23 | 24-31 | 32-39 | 40-47 | 48-55 | 56-63 | ||||||||||||||||||||||||
L2 | 0-3 | 4-7 | 8-11 | 12-15 | 16-19 | 20-23 | 24-27 | 28-31 | 32-35 | 36-39 | 40-43 | 44-47 | 48-51 | 52-55 | 56-59 | 60-63 | ||||||||||||||||
L1 | 0-1 | 2-3 | 4-5 | 6-7 | 8-9 | 10-11 | 12-13 | 14-15 | 16-17 | 18-19 | 20-21 | 22-23 | 24-25 | 26-27 | 28-29 | 30-31 | 32-33 | 34-35 | 36-37 | 38-39 | 40-41 | 42-43 | 44-45 | 46-47 | 48-49 | 50-51 | 52-53 | 54-55 | 56-57 | 58-59 | 60-61 | 62-63 |
Example of Partitioning
With the above partitioning scheme above, now suppose our source entity has the following records:
ID | Value |
---|---|
8 | 2 |
9 | 3 |
10 | 6 |
11 | 7 |
12 | 10 |
13 | 5 |
14 | 4 |
15 | 1 |
The framework will roll up those records into the following level 1 partitions:
Level | Key | Sum of Value |
---|---|---|
L1 | 8 | 5 |
L1 | 10 | 13 |
L1 | 12 | 15 |
L1 | 14 | 5 |
Then, in turn those level 1 partitions will be rolled up into level 2 partitions:
Level | Key | Sum of Value |
---|---|---|
L2 | 8 | 18 |
L2 | 12 | 20 |
Then, again, those level 2 partitions will be rolled up into level 3 partitions:
Level | Key | Sum of Value |
---|---|---|
L3 | 8 | 38 |
And so forth until we reach the top level.
In the summary table, a partition is identified by its level and key.
Queries
If we currently have 21 records already rolled-up, these are the partition that we will have. If we were to perform a query, the framework will union the records from the highlighted partitions to get the result:
L6 | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
L5 | |||||||||||
L4 | 0-15 | ||||||||||
L3 | 0-7 | 8-15 | |||||||||
L2 | 0-3 | 4-7 | 8-11 | 12-15 | 16-19 | ||||||
L1 | 0-1 | 2-3 | 4-5 | 6-7 | 8-9 | 10-11 | 12-13 | 14-15 | 16-17 | 18-19 | 20-21 |
Updates
Now suppose the framework knows that the record #11 has been updated. It will refresh the partition L1 10-11. Afterward, it will refresh its parent partition, which is the partition L2 8-11, and so on until it reaches the top level.
These are the partitions that will get updated when that happens:
L6 | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
L5 | |||||||||||
L4 | 0-15 | ||||||||||
L3 | 0-7 | 8-15 | |||||||||
L2 | 0-3 | 4-7 | 8-11 | 12-15 | 16-19 | ||||||
L1 | 0-1 | 2-3 | 4-5 | 6-7 | 8-9 | 10-11 | 12-13 | 14-15 | 16-17 | 18-19 | 20-21 |
In the Real World
Having a partitioning scheme with 1-bit width as above is useful for explanation, but will be pretty inefficient. Our default integer partition is using 11-22-33-44-55 bits of partitioning width. This means that the first level contains the data rolled-up from up to 2048 records, the second level contains 4 millions records that are rolled-up from the first level, and so on.