IEnumerable Performance Gotcha
CSharp
Consider the following C# code:
using System;
using System.Collections.Generic;
using System.Linq;

class Program {
    static void Main(string[] args) {
        // Select is lazy: the lambda runs only when the sequence is enumerated
        IEnumerable<int> numbers = Enumerable.Range(1, 5).Select(n => {
            Console.WriteLine($"Performing costly computation on {n}");
            return n;
        });
        for (int i = 0; i < 2; i++) {
            foreach (var n in numbers) {
                // Select for each number executes here
            }
            Console.WriteLine("\n-------------------\n");
        }
    }
}
Here we have an IEnumerable<int> whose Select performs an expensive operation on each element. This sequence is then iterated over twice. At first glance this doesn’t appear to be a big deal, so let’s look at the output from this code:
Performing costly computation on 1
Performing costly computation on 2
Performing costly computation on 3
Performing costly computation on 4
Performing costly computation on 5
-------------------
Performing costly computation on 1
Performing costly computation on 2
Performing costly computation on 3
Performing costly computation on 4
Performing costly computation on 5
-------------------
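From the output we can see that every costly computation is performed twice, once per foreach loop. Select uses deferred execution, so its lambda runs each time the sequence is enumerated, not once when numbers is assigned. Enumerating over the same IEnumerable more than once can easily lead to inefficient code.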
Java
Now let’s look at how we might implement this using Java Streams:
import java.util.stream.IntStream;

public class Main {
    public static void main(String[] args) {
        IntStream numbers = IntStream.range(1, 6).map(n -> {
            System.out.println(String.format("Performing costly computation on %s", n));
            return n;
        });
        for (int i = 0; i < 2; i++) {
            numbers.forEach(n -> {
                // map for each number executes here (first pass only)
            });
        }
    }
}
The output from executing this code is as follows:
Performing costly computation on 1
Performing costly computation on 2
Performing costly computation on 3
Performing costly computation on 4
Performing costly computation on 5
Exception in thread "main" java.lang.IllegalStateException: stream has already been operated upon or closed
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:229)
at java.util.stream.IntPipeline.forEach(IntPipeline.java:404)
at com.jkspad.blog.Main.main(Main.java:41)
Process finished with exit code 1
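See what happened there? A Java Stream can only be consumed once, so the second call to forEach throws an IllegalStateException instead of silently repeating the costly map step. Java Streams just saved us from a whole lot of performance pain :)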
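If we actually need to walk the results twice in Java, one option is to run a terminal operation once and keep its output. The sketch below is illustrative and not part of the original example: IntStream.toArray() evaluates the pipeline a single time, much like ToArray() in C#, and the computations counter is a hypothetical addition used only to make the number of map calls visible.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class Main {
    // Hypothetical counter, added only to show how often the costly step runs.
    static final AtomicInteger computations = new AtomicInteger();

    public static void main(String[] args) {
        // toArray() is a terminal operation: the map lambda runs once per element
        // and the results are stored in a plain int[].
        int[] numbers = IntStream.range(1, 6).map(n -> {
            computations.incrementAndGet();
            System.out.println(String.format("Performing costly computation on %s", n));
            return n;
        }).toArray();

        for (int i = 0; i < 2; i++) {
            for (int n : numbers) {
                // reads the stored values; map is not run again
            }
            System.out.println("\n-------------------\n");
        }
        System.out.println("Computations: " + computations.get()); // prints 5
    }
}
```

Unlike the stream, the array can be iterated as many times as we like without re-running map.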
A More Realistic CSharp Example
In real-life code, things are rarely as straightforward as the example above. What is more likely to happen is that we iterate multiple times over an IEnumerable provided by an external library. Each iteration then risks a performance hit, not only when each element is accessed but, more likely, when IEnumerable.GetEnumerator() is called. Every foreach calls it, so it runs once per iteration of the outer for loop, and that call could be extremely expensive. For example, it could be performing complex queries on a back-end database.
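The same trap exists for a plain Java Iterable (as opposed to a Stream): every enhanced for loop calls iterator() afresh, just as every C# foreach calls GetEnumerator(). A minimal Java sketch of the scenario, where ExpensiveQuery and queryCount are hypothetical names standing in for a library-provided sequence whose iterator() would run a back-end query:

```java
import java.util.Iterator;
import java.util.List;

public class Main {
    // Hypothetical stand-in for a sequence returned by an external library.
    // Each call to iterator() simulates an expensive back-end query.
    static class ExpensiveQuery implements Iterable<Integer> {
        int queryCount = 0;

        @Override
        public Iterator<Integer> iterator() {
            queryCount++;
            System.out.println("Running expensive query #" + queryCount);
            return List.of(1, 2, 3, 4, 5).iterator();
        }
    }

    public static void main(String[] args) {
        ExpensiveQuery results = new ExpensiveQuery();
        for (int i = 0; i < 2; i++) {
            // The enhanced for loop calls results.iterator() each time it starts.
            for (int n : results) {
                // consume the element
            }
        }
        System.out.println("Queries run: " + results.queryCount); // prints 2
    }
}
```

Unlike a Stream, nothing here stops the second pass, so the "query" silently runs twice.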
The Fix
The fix is fairly trivial, but we need to remember it, or rely on our IDE to give us helpful hints. We simply need to force the deferred query to execute by materializing its results into an array with ToArray(), and then we can iterate over that array until the cows come home:
using System;
using System.Collections.Generic;
using System.Linq;

class Program {
    static void Main(string[] args) {
        IEnumerable<int> numbers = Enumerable.Range(1, 5).Select(n => {
            Console.WriteLine($"Performing costly computation on {n}");
            return n;
        }).ToArray(); // <= FIX: Select runs once here, results are stored
        for (int i = 0; i < 2; i++) {
            foreach (var n in numbers) {
                // iterates the stored array; Select does not run again
            }
            Console.WriteLine("\n-------------------\n");
        }
    }
}
The output for the fixed code is:
Performing costly computation on 1
Performing costly computation on 2
Performing costly computation on 3
Performing costly computation on 4
Performing costly computation on 5
-------------------
-------------------