How and when a generator can reduce memory usage


Recently I noticed some misunderstanding regarding memory usage with generators. Several people approached me with the question, "I've got a huge array, how can I save memory with generators?". At first I wondered why people would use a generator for this when there are more conventional methods, such as reading from the database row by row. And then I noticed the phrasing on the manual page:

"A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate".

which, although technically correct, doesn't clearly state the limitations of this method.
Finally, I came across an article that went as far as bluntly telling you that

"if you are building huge arrays in your application which cause memory issues on the server, so yield suits your case".

That sounds quite exciting at first glance but, unfortunately, is even further from being practical.

Let's see how and when we can reduce the memory usage with generators.

First of all, if we need full array functionality, such as random access, no generator-based solution is possible at all. This is why the manual page doesn't say "array" but just "a set of data". This is important, and sadly, not every article explaining generators follows this strict phrasing.
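To illustrate the point about random access, here is a minimal sketch. The numbers() generator is a hypothetical helper made up for this example. A generator can be walked through with foreach, but it cannot be indexed; the only way to get random access back is to collect everything with iterator_to_array(), which builds the full array in memory again.

```php
<?php
// numbers() is a hypothetical generator used only for this illustration:
// it produces the integers 1..$n on the fly, one at a time.
function numbers(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield $i;
    }
}

$gen = numbers(10);

// A Generator is Traversable, so foreach works on it, but it does not
// implement ArrayAccess, so it cannot be indexed like an array.
var_dump($gen instanceof Traversable);  // bool(true)
var_dump($gen instanceof ArrayAccess);  // bool(false)

$randomAccessFailed = false;
try {
    $value = $gen[5];                   // attempt random access on a generator
} catch (Error $e) {
    $randomAccessFailed = true;         // PHP throws an Error here
}

// Random access is only possible after materializing every value,
// which brings back exactly the memory cost we wanted to avoid.
$array = iterator_to_array(numbers(10));
echo $array[5], PHP_EOL;                // 6
```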

And of course generators don't add any magical functionality in regard to memory usage. Whenever memory usage can be reduced, the same can be achieved by other means, using more traditional tools. I have to emphasize: it is not a generator that reduces memory usage, but the good old loop that returns one value at a time. You can always have it without any generators, just as shown in the first example below.

Well, what's all this stuff about memory and generators then? Let's see:

Imagine we have a file consisting of 1M lines with numbers that need to be used in some calculation. Of course, we can write a regular while loop to keep memory usage low:

$sum = 0;
$handle = fopen("file.txt", "r");
while (($num = fgets($handle)) !== false) {
    $sum += $num; // here goes our calculation
}
fclose($handle);

Fast, clean, low memory usage. No generators involved.
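For a version of this loop that runs on its own, here is a minimal sketch. It assumes a temporary sample file filled with the numbers 1 to 1000 instead of the real 1M-line "file.txt", and it casts each line to int explicitly, since fgets() returns the line including its trailing newline. Only the current line is ever held in memory.

```php
<?php
// Create a small sample file so the sketch is self-contained;
// the temporary file and its 1000 numbers are made up for this example.
$filename = tempnam(sys_get_temp_dir(), 'nums');
file_put_contents($filename, implode("\n", range(1, 1000)) . "\n");

$sum = 0;
$handle = fopen($filename, "r");
while (($line = fgets($handle)) !== false) {
    $sum += (int)$line; // only the current line is held in memory
}
fclose($handle);
unlink($filename);      // clean up the temporary file

echo $sum, PHP_EOL;     // 500500
```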

But imagine we've got an existing function that already does the required calculation but it accepts an array which is then iterated over using foreach:

function calculate($array) {
    $result = 0;
    foreach ($array as $item) {
        $result += $item;
    }
    return $result;
}

When using traditional tools, we are bound to waste a lot of memory:

$sum = calculate(file("file.txt"));

as we inevitably read the entire file contents into an array.

And only here do generators come to the rescue!

Using a generator, we can create a substitute for the file() function that doesn't read the entire contents into memory but instead returns the lines one by one:

function filerator($filename) {
    $handle = fopen($filename, "r");
    while (($line = fgets($handle)) !== false) {
        yield $line;
    }
    fclose($handle);
}

and now we can use this generator with the calculate() function, keeping the memory footprint low:

$sum = calculate(filerator("file.txt"));
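Putting the pieces together, here is a minimal runnable sketch that feeds the same unchanged calculate() first with file() and then with filerator(), and checks that both give the same result. Assumptions: a temporary sample file stands in for "file.txt", and an (int) cast is added inside calculate() to avoid notices about the trailing newline on older PHP versions. As a design choice, filerator() closes its handle in a finally block, which PHP also runs if the generator is destroyed before it is exhausted.

```php
<?php
// calculate() as in the article, with an (int) cast added for clean input.
function calculate($array)
{
    $result = 0;
    foreach ($array as $item) {
        $result += (int)$item;
    }
    return $result;
}

// filerator() yields one line at a time instead of returning an array.
function filerator($filename)
{
    $handle = fopen($filename, "r");
    try {
        while (($line = fgets($handle)) !== false) {
            yield $line;
        }
    } finally {
        fclose($handle); // runs when the loop ends or the generator is destroyed
    }
}

// Made-up sample data so the sketch runs on its own.
$filename = tempnam(sys_get_temp_dir(), 'nums');
file_put_contents($filename, implode("\n", range(1, 1000)) . "\n");

$fromArray     = calculate(file($filename));      // whole file loaded as an array
$fromGenerator = calculate(filerator($filename)); // one line in memory at a time

unlink($filename);

echo $fromArray, ' ', $fromGenerator, PHP_EOL;    // 500500 500500
```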

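So now we can tell that a generator can help with memory usage only when the following conditions are met:

- the data will only be iterated over sequentially, with no random access;
- the data can be produced one item at a time, either read from an external source (such as a file or a database) or generated on the fly;
- there is existing code that expects an array but only iterates over it using foreach, so a generator can be passed in its place.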

Now you can see where all the confusion comes from: "allows" doesn't mean "guarantees". Many people take "without needing to build an array in memory" for granted, overlooking its limitations.

If you look at the example on the manual page, it perfectly meets the conditions listed above: the result of the xrange() function is never accessed randomly but only iterated over using foreach, and the data can be produced one by one, generated on the fly!
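A minimal sketch in the spirit of that manual example follows. It is a simplified, ascending-only variant of xrange(), not the manual's exact code. Each number is computed when foreach asks for it, and the consumer never accesses anything by index, so no array of the whole range is ever built.

```php
<?php
// Simplified, ascending-only take on the manual's xrange():
// values are computed on the fly instead of being stored in an array.
function xrange(int $start, int $limit, int $step = 1): Generator
{
    if ($step <= 0) {
        // Note: like any generator body, this check only runs once iteration starts.
        throw new InvalidArgumentException('Step must be positive');
    }
    for ($i = $start; $i <= $limit; $i += $step) {
        yield $i;
    }
}

// The consumer only iterates sequentially; no random access is needed.
$sum = 0;
foreach (xrange(1, 9, 2) as $number) {
    $sum += $number;
}
echo $sum, PHP_EOL; // 25 (1 + 3 + 5 + 7 + 9)
```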

Now you can see the real benefit of generators: they can provide a low-memory-footprint substitute wherever iteration over an array is expected.
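One caveat on the consuming side, shown as a minimal sketch with hypothetical function names (calculateIterable(), calculateArray() and numbers() are made up for this example): the substitution only works if the function does not insist on a real array. A parameter declared as `array` rejects a generator with a TypeError, while the `iterable` type (available since PHP 7.1) accepts both arrays and generators.

```php
<?php
// Accepts arrays and any Traversable, including generators.
function calculateIterable(iterable $data): int
{
    $result = 0;
    foreach ($data as $item) {
        $result += $item;
    }
    return $result;
}

// Insists on a real array, so a generator cannot be passed in.
function calculateArray(array $data): int
{
    return array_sum($data);
}

// Hypothetical generator producing 1..$n on the fly.
function numbers(int $n): Generator
{
    for ($i = 1; $i <= $n; $i++) {
        yield $i;
    }
}

echo calculateIterable([1, 2, 3]), PHP_EOL;  // 6
echo calculateIterable(numbers(3)), PHP_EOL; // 6

$typeErrorThrown = false;
try {
    calculateArray(numbers(3)); // a Generator object is never coerced to an array
} catch (TypeError $e) {
    $typeErrorThrown = true;
}
```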

What a generator really can do is help us write nicer code or reuse existing code with reduced memory consumption. Let's see how.

