How and when a generator can reduce the memory usage
Recently I noticed some misunderstanding regarding memory usage with generators. Several people approached me with the question, "I've got a huge array, how can I save memory with generators?". At first I wondered why people would use a generator for this when there are more conventional methods, you know - reading from the database line by line, and such. And then I noticed the phrasing on the manual page:
"A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate".
which, although technically correct, doesn't clearly state the limitations of this method.
Finally, I came across an article that went as far as bluntly telling you that
"if you are building huge arrays in your application which cause memory issues on the server, so yield suits your case".
Which sounds quite exciting at first glance but, unfortunately, is even further from being practical.
Let's see how and when we can reduce the memory usage with generators.
First of all, if we need the full array functionality, such as random access, no generator-based solution is even possible. This is why the manual page doesn't say "array" but just a "set of data". This is important, and sadly, not every article explaining generators follows this strict phrasing.
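A quick illustration (a sketch of mine, not from the manual): a generator can only be iterated forward, so trying to use it like an array fails, and materializing it into an array defeats the whole purpose:

function numbers() {
    yield 1;
    yield 2;
    yield 3;
}

$gen = numbers();
// echo $gen[1];                 // fatal error: cannot use a Generator object as an array
$all = iterator_to_array($gen);  // works, but builds the whole array in memory anyway
echo $all[1];                    // 2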
And of course generators don't add any magical functionality in regard to memory usage. When memory usage can be reduced, it can be achieved by other means, using more traditional tools. I have to emphasize: it is not a generator that reduces the memory usage, but a good old loop that returns one value at a time. You can always have it without any generators, just as shown in the first example below.
Well, what's all this stuff about memory and generators then? Let's see:
Imagine we have a file consisting of 1M lines with numbers that need to be used in some calculation. Of course we can write a regular while loop to keep the memory usage low:
$sum = 0;
$handle = fopen("file.txt", "r");
while (($num = fgets($handle)) !== false) {
    $sum += $num; // here goes our calculation
}
fclose($handle);
Fast, clean, low memory usage. No generators involved.
But imagine we've got an existing function that already does the required calculation, but it accepts an array, which is then iterated over using foreach:
function calculate($array) {
    $result = 0;
    foreach ($array as $item) {
        $result += $item;
    }
    return $result;
}
When using traditional tools, we are bound to waste a lot of memory
$sum = calculate(file("file.txt"));
as we inevitably read the entire file contents into an array.
And this is where generators come to the rescue!
Using a generator, we can create a substitute for the file() function that doesn't read the entire contents into memory but instead returns the lines one by one:
function filerator($filename) {
    $handle = fopen($filename, "r");
    while (($line = fgets($handle)) !== false) {
        yield $line;
    }
    fclose($handle);
}
and now we can use this generator with the calculate() function, keeping the memory footprint low:
$sum = calculate(filerator("file.txt"));
So now we can say that a generator can help with the memory usage only when the following conditions are met (a counter-example follows the list):
- if we don't need the full array functionality, such as random access
- if, for some reason, we have to iterate over the data set using foreach
- if the data for such a set can be produced one by one, either generated on the fly or taken from an external source, such as a file or a database
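When these conditions aren't met, a generator won't help. For example (a sketch of mine, not from the original text): if the data has to be sorted, the whole set must be in memory at once, so the generator's benefit is lost - we end up building the full array anyway:

$lines = iterator_to_array(filerator("file.txt")); // the entire file is in memory again
sort($lines);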
Now you can see where all the confusion comes from: "allows" doesn't mean "guarantees". Many people take "without needing to build an array in memory" for granted, overlooking the limitations.
If you take a look at the example on the manual page, it perfectly falls under the conditions listed above: the result of the xrange() function is never accessed randomly but only iterated over using foreach, and the data can be produced one by one, generated on the fly!
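For reference, a simplified version of such an xrange() generator could look like this (the actual manual example also handles descending ranges and validates the step):

function xrange($start, $limit, $step = 1) {
    for ($i = $start; $i <= $limit; $i += $step) {
        yield $i;
    }
}

foreach (xrange(1, 1000000) as $number) {
    // only one number exists in memory at any given moment
}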
Now you can see the real benefit of generators: they can provide a low-memory interface where an array to be iterated over is expected.
What a generator really can do is help us write nicer code or reuse existing code with reduced memory consumption. Let's see how.
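For instance (a hypothetical sketch of mine; the helper name, query, and the unbuffered-query tweak are assumptions, not part of the original article), the very same calculate() function can be fed from a database, one row at a time:

function numbersFromDb(PDO $pdo) {
    // for MySQL, switch off result buffering so rows aren't all loaded at once
    $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
    $stmt = $pdo->query("SELECT num FROM numbers");
    while (($num = $stmt->fetchColumn()) !== false) {
        yield $num;
    }
}

$sum = calculate(numbersFromDb($pdo));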