Partitioning
For performance and to facilitate incremental updates, the summary table is
partitioned according to the property tagged with the `#[Analytics\Partition]`
attribute. This property must be a Doctrine embeddable that implements
`Partition`.
The term "partitioning" used here is different from partitioning in the database sense, although it might be possible to use the summary partitioning key as the database partition key as well.
Best Practices, or TL;DR
If the source entity uses an auto-incrementing integer primary key, use this partitioning scheme:
```php
<?php

use Doctrine\ORM\Mapping as ORM;
use Rekalogika\Analytics\Attribute as Analytics;
use Rekalogika\Analytics\Model\Summary;
use Rekalogika\Analytics\Partition\DefaultIntegerPartition;
use Rekalogika\Analytics\ValueResolver\PropertyValueResolver;

class YourSummary extends Summary
{
    // Partition by the source entity's auto-incrementing `id` property
    #[ORM\Embedded()]
    #[Analytics\Partition(new PropertyValueResolver('id'))]
    private DefaultIntegerPartition $partition;
}
```
If your source entity uses UUIDv7 (or ULID) as the primary key, use this partitioning scheme:
```php
<?php

use Doctrine\ORM\Mapping as ORM;
use Rekalogika\Analytics\Attribute as Analytics;
use Rekalogika\Analytics\Model\Partition\UuidV7IntegerPartition;
use Rekalogika\Analytics\Model\Summary;
use Rekalogika\Analytics\ValueResolver\UuidToTruncatedIntegerResolver;

class YourSummary extends Summary
{
    // Partition by the source entity's UUIDv7 `id`, truncated to an integer
    #[ORM\Embedded()]
    #[Analytics\Partition(new UuidToTruncatedIntegerResolver('id'))]
    private UuidV7IntegerPartition $partition;
}
```
Concepts
A property of the source entity is designated as the partitioning key, which is used to partition the data. The key is usually the primary key of the source entity, but it does not have to be. The key must be monotonically increasing (a newer record's key is never smaller than an older record's key), but it does not have to be unique.
Partitioning is divided into levels, which are indicated by a number. Each level consists of multiple partitions of the same length, one after the other. Partitions at a lower level are shorter, i.e. they cover a smaller range of keys, than partitions at a higher level.
A partition is identified by its level and its key. Every partition, except those at the lowest level, consists of several partitions of the level directly below it.
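To illustrate how a partition is made up of partitions of the level below, here is a minimal sketch assuming integer keys and partition widths measured in bits (so a partition covers 2^bits consecutive keys), as in the integer strategies on this page. `childPartitions()` is a hypothetical helper written for illustration, not part of the library.

```php
<?php
// Hypothetical helper, not a library function: lists the partitions of the
// lower level (each $lowerBits wide) that make up the partition starting at
// $start on a level whose partitions are $bits wide.
function childPartitions(int $start, int $bits, int $lowerBits): array
{
    $children = [];
    $childSize = 1 << $lowerBits; // keys covered by one lower-level partition
    for ($s = $start; $s < $start + (1 << $bits); $s += $childSize) {
        $children[] = [$s, $s + $childSize - 1]; // inclusive key range
    }
    return $children;
}

// With 3-bit and 2-bit levels, the partition 8-15 is made of 8-11 and 12-15.
$children = childPartitions(8, 3, 2); // [[8, 11], [12, 15]]

// With 22-bit and 11-bit levels, one partition holds 2048 lower partitions.
$count = count(childPartitions(0, 22, 11)); // 2048
```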
Records from the source entity are assigned to a lowest-level partition according to their partitioning key and rolled up into that partition. Eventually, the lowest level accumulates enough partitions, and these are in turn rolled up into the next higher-level partition, and so on, until the highest level is reached.
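The roll-up can be sketched as follows, assuming an additive measure (a plain sum) and bit-width partitions. `rollUp()` is a hypothetical function, not the framework's code; it only shows that the lowest level is computed from source records while every higher level is computed from the level below it.

```php
<?php
// Hypothetical sketch of the roll-up idea; not the framework's actual code.
// Groups items (key => sum) into partitions $bits wide, keyed by start key.
function rollUp(array $items, int $bits): array
{
    $partitions = [];
    foreach ($items as $key => $sum) {
        $start = ($key >> $bits) << $bits; // start key of the enclosing partition
        $partitions[$start] = ($partitions[$start] ?? 0) + $sum;
    }
    ksort($partitions);
    return $partitions;
}

// Source records: partitioning key => measure to be summed.
$records = [1 => 10, 2 => 5, 3 => 7, 4 => 1, 5 => 2, 6 => 4, 7 => 3];

$level1 = rollUp($records, 1); // from source records: [0 => 10, 2 => 12, 4 => 3, 6 => 7]
$level2 = rollUp($level1, 2);  // from level 1 only:   [0 => 22, 4 => 10]
$level3 = rollUp($level2, 3);  // from level 2 only:   [0 => 32]
```

The sketch does not track whether a partition is complete; it simply rolls up whatever it is given.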
If new source entities are added, they are rolled up into the newest lowest-level partition, and the framework does not need to reprocess the entire summary table.
If changes are detected in old records, the affected lowest-level partition is marked as dirty. The framework reprocesses the dirty partition, then marks the higher-level partition containing it as dirty, and so on, until the change bubbles up to the highest level. Again, the framework does not need to reprocess the entire summary table.
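Because each level has exactly one partition containing a given key, a change marks one partition per level as dirty. A minimal sketch using the hypothetical 1-2-3-4-5-6 bit scheme from the How Partitioning Works section; `dirtyPartitions()` is an illustrative helper, not the framework's API.

```php
<?php
// Hypothetical sketch, not the framework's code: when the source record with
// partitioning key $key changes, every partition containing that key, from
// the lowest level up to the highest, becomes dirty.
function dirtyPartitions(int $key, array $levelBits): array
{
    $dirty = [];
    foreach ($levelBits as $level => $bits) { // lowest level first
        $size = 1 << $bits;
        $start = intdiv($key, $size) * $size;
        $dirty[] = sprintf('L%d %d-%d', $level, $start, $start + $size - 1);
    }
    return $dirty;
}

// Levels 1 to 6 with widths of 1 to 6 bits; the record with key 13 changes.
$dirty = dirtyPartitions(13, [1 => 1, 2 => 2, 3 => 3, 4 => 4, 5 => 5, 6 => 6]);
// ['L1 12-13', 'L2 12-15', 'L3 8-15', 'L4 0-15', 'L5 0-31', 'L6 0-63']
```

All sibling partitions, for example L1 14-15 or L3 0-7, keep their already rolled-up values.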
Available Partitioning Strategies
DefaultIntegerPartition
Suitable for partitioning auto-incrementing integer primary keys. It uses partition widths of 11, 22, 33, 44, and 55 bits. An 11-bit partition covers 2048 (2^11) consecutive keys, so it aggregates up to 2048 records.
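The arithmetic behind these widths can be checked with a short sketch, assuming 64-bit PHP integers. `partitionStart()` is a hypothetical helper for illustration, not part of `DefaultIntegerPartition`.

```php
<?php
// Illustrative arithmetic for the 11/22/33/44/55-bit levels of
// DefaultIntegerPartition; assumes 64-bit PHP integers. Not library code.
function partitionStart(int $id, int $bits): int
{
    $size = 1 << $bits; // number of consecutive keys in one partition
    return intdiv($id, $size) * $size;
}

foreach ([11, 22, 33, 44, 55] as $bits) {
    $size = 1 << $bits;
    $start = partitionStart(1_000_000, $bits);
    printf("%2d bits: %d keys per partition; id 1000000 is in %d-%d\n",
        $bits, $size, $start, $start + $size - 1);
}
```

With an auto-incrementing `id`, a new lowest-level partition therefore starts every 2048 records, e.g. id 1000000 falls into the 11-bit partition 999424-1001471.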
UuidV7IntegerPartition
Suitable for partitioning UUIDv7 (or ULID) primary keys. It should be coupled
with a `UuidToTruncatedIntegerResolver` value resolver, which truncates the
128-bit UUID to a 48-bit integer.
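A minimal sketch of what truncating a UUID to its first 48 bits means. `truncateUuidTo48Bits()` is a hypothetical function written for illustration, not the actual implementation of `UuidToTruncatedIntegerResolver`; the input is the example UUIDv7 value given in RFC 9562.

```php
<?php
// Hypothetical sketch, NOT the library's UuidToTruncatedIntegerResolver:
// takes the first 48 bits (12 hex digits) of a UUID string as an integer.
// For UUIDv7 these bits are the Unix timestamp in milliseconds.
function truncateUuidTo48Bits(string $uuid): int
{
    $hex = str_replace('-', '', $uuid);
    return (int) hexdec(substr($hex, 0, 12)); // 48 bits fit in a 64-bit int
}

// Example UUIDv7 value from RFC 9562
$ms = truncateUuidTo48Bits('017f22e2-79b0-7cc3-98c4-dc0c0c07398f');
echo $ms, "\n";                                      // 1645557742000
echo gmdate('Y-m-d H:i:s', intdiv($ms, 1000)), "\n"; // 2022-02-22 19:22:22
```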
UUIDv7 stores a Unix timestamp in milliseconds in its first 48 bits, so the width of each level corresponds to the following time interval:
- 22 bits: about 1.165 hours (70 minutes)
- 27 bits: about 1.55 days
- 32 bits: about 49.7 days
- 37 bits: about 4.36 years
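These intervals follow from the fact that a level N bits wide over a millisecond timestamp spans 2^N milliseconds. A quick, illustrative way to check the arithmetic (`levelSpanMs()` is our own helper, not library code):

```php
<?php
// Worked arithmetic: a level $bits wide over a millisecond timestamp spans
// 2 ** $bits milliseconds. Prints that span in hours, days and years.
function levelSpanMs(int $bits): int
{
    return 1 << $bits;
}

foreach ([22, 27, 32, 37] as $bits) {
    $ms = levelSpanMs($bits);
    printf("%d bits: %.2f hours = %.2f days = %.2f years\n",
        $bits, $ms / 3_600_000, $ms / 86_400_000, $ms / (86_400_000 * 365.25));
}
```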
Custom Integer Partition
You can create a custom integer partition by extending `IntegerPartition`.
Custom Non-Integer Partition
You might be able to create your own non-integer partition by implementing the
`Partition` interface, but this is currently untested and unsupported.
How Partitioning Works
The following table shows how records are partitioned using a hypothetical
`IntegerPartition` with partition widths of 1, 2, 3, 4, 5, and 6 bits, so an L1
partition covers 2 keys and an L6 partition covers 64 keys. The first column
indicates the level, followed by the partitions of that level. The numbers
indicate the range of partitioning keys rolled up into each partition.
| Level | Partitions |
|---|---|
| L6 | 0-63 |
| L5 | 0-31, 32-63 |
| L4 | 0-15, 16-31, 32-47, 48-63 |
| L3 | 0-7, 8-15, 16-23, 24-31, 32-39, 40-47, 48-55, 56-63 |
| L2 | 0-3, 4-7, 8-11, 12-15, 16-19, 20-23, 24-27, 28-31, 32-35, 36-39, 40-43, 44-47, 48-51, 52-55, 56-59, 60-63 |
| L1 | 0-1, 2-3, 4-5, 6-7, 8-9, 10-11, 12-13, 14-15, 16-17, 18-19, 20-21, 22-23, 24-25, 26-27, 28-29, 30-31, 32-33, 34-35, 36-37, 38-39, 40-41, 42-43, 44-45, 46-47, 48-49, 50-51, 52-53, 54-55, 56-57, 58-59, 60-61, 62-63 |
If 21 records have already been rolled up, we will have the following partitions. If we were to perform a query, the framework would combine the largest complete partitions, L4 0-15, L2 16-19, and L1 20-21, to get the result:
| Level | Partitions |
|---|---|
| L6 | (none yet) |
| L5 | (none yet) |
| L4 | **0-15** |
| L3 | 0-7, 8-15 |
| L2 | 0-3, 4-7, 8-11, 12-15, **16-19** |
| L1 | 0-1, 2-3, 4-5, 6-7, 8-9, 10-11, 12-13, 14-15, 16-17, 18-19, **20-21** |
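The partitions combined by the query can be found with a greedy search: starting from the first key, repeatedly take the widest partition that starts there and is fully rolled up. This is a minimal sketch under the assumptions that keys start at 0 and the last rolled-up partition is complete; `coveringPartitions()` is our own illustration, not the framework's query planner.

```php
<?php
// Hypothetical sketch, not the framework's query planner: given that all keys
// from 0 to $lastKey have been rolled up, pick the largest complete partitions
// that together cover 0..$lastKey without overlapping.
function coveringPartitions(int $lastKey, array $levelBits): array
{
    krsort($levelBits); // try the highest level (widest partitions) first
    $cover = [];
    $start = 0;
    while ($start <= $lastKey) {
        foreach ($levelBits as $level => $bits) {
            $size = 1 << $bits;
            // usable only if aligned to this level and fully rolled up
            if ($start % $size === 0 && $start + $size - 1 <= $lastKey) {
                $cover[] = sprintf('L%d %d-%d', $level, $start, $start + $size - 1);
                $start += $size;
                continue 2; // continue the while loop from the new start key
            }
        }
        break; // remaining keys do not fill even a lowest-level partition
    }
    return $cover;
}

$levels = [1 => 1, 2 => 2, 3 => 3, 4 => 4, 5 => 5, 6 => 6];
$cover = coveringPartitions(21, $levels); // ['L4 0-15', 'L2 16-19', 'L1 20-21']
```

Taking the widest partitions first keeps the number of rows the query has to combine small: three partitions here instead of eleven L1 partitions.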