Iterating Large Collections
Problem
Using `foreach()` directly on a large collection will trigger the full initialization of the collection, which can cause an out-of-memory error.
Doctrine's `Collection` does provide a `slice()` method to paginate the collection. However, it uses offset pagination, which has these drawbacks:
- With a large collection, it becomes slower and slower the further you move from the start.
- If the underlying data changes while you are iterating, the result set drifts, and the iteration will miss or duplicate records. It effectively works only on static data.
Additionally, `slice()` is rather low-level: you need to supply the iteration logic yourself.
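To make the first drawback concrete, here is a rough SQL sketch of what offset pagination amounts to (the table and column names are illustrative, not part of this package):

```sql
-- Page 1: the database reads 1000 rows.
SELECT * FROM post ORDER BY id LIMIT 1000 OFFSET 0;

-- Page 501: the database must still walk past the first 500000 rows
-- before it can return the 1000 rows you asked for.
SELECT * FROM post ORDER BY id LIMIT 1000 OFFSET 500000;
```

If rows are inserted or deleted before the 500000th position between two such queries, the page boundary shifts, and records are skipped or repeated.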
Solution
`foreach()`-ing a collection from this package is subject to the `$softLimit` and `$hardLimit` checks described in the Potential Out-of-Memory Handling section. It will stop the iteration before it becomes an out-of-memory problem.
All of our classes implement the higher-level `PageableInterface` from our rekalogika/rekapager-contracts package, which adds keyset pagination to the underlying data. Unlike offset pagination, keyset pagination does not have the drawbacks described above.
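For comparison, here is a hedged sketch of the kind of keyset-style query that such pagination performs under the hood (again with illustrative names; the actual queries are generated by the package):

```sql
-- Remember the last id of the previous page, then continue after it.
-- With an index on id, the database can seek directly to the boundary,
-- so the cost does not grow with the page number.
SELECT * FROM post WHERE id > :lastSeenId ORDER BY id LIMIT 1000;
```

Because each page is anchored to the last seen key rather than a row count, inserts and deletes before the boundary cannot shift the window, so the iteration neither skips nor duplicates rows.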
To iterate over a large collection, you can simply do this:
```php
use Doctrine\ORM\EntityManagerInterface;
use Rekalogika\Rekapager\PageableInterface;

/** @var EntityManagerInterface $entityManager */

// $collection is any collection object from this package
foreach ($collection->withItemsPerPage(1000)->getPages() as $page) {
    foreach ($page as $entity) {
        // Do something with the $entity
    }

    // Do something after each page here.
    // With Doctrine, you'd usually want to flush() and clear() here:
    $entityManager->flush(); // if required
    $entityManager->clear();
}
```
There is no need to create ad-hoc queries every time you need to perform safe iteration over a large collection.
For more information about batch processing using `PageableInterface`, see Batch Processing.