Haskell’s foldl and foldr Explained

If you want to (or have to) learn Haskell, you can’t avoid the two functions foldl and foldr. They are two of the most powerful tools in Haskell’s toolbox. However, they continue to confuse beginners and experts alike, so here’s my take on making these functions a little more accessible.

What do foldl and foldr do? They both fold a list to a single value, albeit in different ways.¹ Folding, in this context, is synonymous to reducing, a term you might already know if you’re familiar with Python. If you never heard of any of these terms, think of the addition or, put differently, the operator + which reduces a list of numbers with length two to a single value: the sum of the two operands (that is, the result of the addition). Using foldl and/or foldr, you can implement every other function. In fact, many functions use foldl or foldr internally—that’s how they are implemented in the first place! Both foldl and foldr are so mighty, you could merely think of them as extremely powerful low-level functions used by those functions you normally use. You could wonder why you should bother understanding low-level code and learning the interna of your high-level abstractions, when you can also use the high-level abstractions themselves. But that would do those functions injustice. When you really understand foldl and foldr, you can use these mighty functions to implement your own high-level abstractions and harness their power to make your Haskell code so much better and faster.

Disclaimer: I myself am only just beginning to learn Haskell, so there’s no guarantee that everything I state is actually correct.

Comparison Between Haskell and Python

If you know Python, Python’s reduce function would be the analogon to a slight variation of Haskell’s foldl. I will now briefly go off at a tangent because I presume that you a more proficient in Python than you are in Haskell. I figure that laying a groundwork first and building upon it will help you understand Haskell better. If you’re not interested in this tangent, skip the rest of this section and jump directly to this section.

Let’s examine an example. In Python, the function call sum([1,2,3,4,5]) of course calculates the value 15. You could achieve the same goal writing the Python code

reduce(lambda x, y: x+y, [1,2,3,4,5])

instead, which calculates ((((1+2)+3)+4)+5) which is also 15. “Hold on a minute. How exactly does this work?” First, lambda x, y: x+y creates a function that simply adds two values x and y. Let’s name this function add_two_values. Now we provide this function add_two_values as an argument to reduce. That is, we call reduce(add_two_values, [1,2,3,4,5]). This is what’s known as higher-order functions. One function serves as parameter of a different function. After all, we could also want to calculate the product 1*2*3*4*5=120 and provide reduce with a self-defined function named multiply_two_values. In this case, however, we want to calculate the sum and thus provide our function add_two_values. According to the documentation, internally, a variable—let’s call it result—would be initiated with the first list item. That is, the first step is to say result=1, except that this internal variable doesn’t really have a name. Then, in a for-loop we would add list item for list item to that internal variable. We begin with 1+2 which equals 3, then continue with 3+3 which equals 6, and so on, until we return result=15. This is almost like foldl in Haskell works.

foldl and foldl'

Why do I keep saying almost and slight variation? Because foldl doesn’t immediately evaluate the intermediate results. That is, it doesn’t add 1+2 to 3 and then 3+3, et cetera, like Python’s reduce does. Instead, foldl keeps 1+2 as 1+2. Adding 3 to that would then generate (1+2)+3 instead of 3+3. The foldl function builds the so-called thunk chain until we get ((((1+2)+3)+4)+5) (instead of (10+5)). This is because of a thing in Haskell called lazy evaluation. And in contrast to Python’s reduce function, foldl first has to build the entire thunk chain before it can start reducing that thunk chain. This means that we have to allocate memory on the heap for all elements of the list until the thunk chain is finally established. Only then can we start reducing the thunk chain. Whereas Python’s reduce and Haskell’s foldl' use an accumulator in order to immediately reduce these expressions. More on that in a second. Haskell’s equivalent to the way Python evaluates that function call would be foldl' (notice the apostrophe). In Haskell, we use the variable name together with an ' at the end when the implementation is so similar and only such a slight variation, that this new function doesn’t deserve a name of its own. In Haskell, foldl' is way more efficient than foldl because you don’t have to first build up a huge thunk chain before you can finally start reducing the expression. As I said, with foldl you have to first allocate memory on the heap for every single list item until you have the finished thunk chain. The heap is limited only by the amount of RAM your computer has available and the swap. Basically, for large lists Haskell will eat all your RAM and for very large lists, your computer will become very slow because the content of your exhausted RAM has to be shoveled to the relatively slow SSD. Only when the thunk chain is built in its entirety can you start reducing it. The entire thunk chain in our example is ((((1+2)+3)+4)+5).

But wait! It get’s worse. The foldl function now has to work its way from the outside (((...)+5)) to the inside ((1+2)) again. That is, before you can reduce something something something plus 5, you first have to know what that something is. This is akin to mathematics where you can’t know the result of $x+5$ if you never come to know what $x$ is. Thus, Haskell first has to examine what that unknown something is and before it can do that, it first has to push the value 5 to the stack (that is, save that value for later). Now it’s the same all over again, only this time with the value 4. That 4 too gets pushed to the stack. We repeat that same procedure and push every value we encounter to the stack for the time being until we finally arrive at 1+2. Finally the + operator now has two known operands instead of an unknown something plus a known value. Finally the + operator can do it’s job and actually calculate a result. Once we’ve arrived at this point, we can walk forward again, retrieve those values saved on the stack as we go and finally determine the ultimate result, in the same fashion reduce and foldl' would have done it from the getgo.

Similarly to how you can’t peel an onion from the inside out, or how you can’t open a matryoshka doll before you open the doll in which it is contained, you have to start at the outermost layer. Once you arrived at the innermost layer of the thunk chain (here (1+2)) and pushed all other operands (here, 5, 4, 3, in that order) to the stack in the meantime,

((((1+2)+3)+4)+5)

becomes

(((3+3)+4)+5)

becomes

((6+4)+5)

becomes

(10+5)

becomes

(15)

The catch is … if we arrived at that point 1+2 at all. Another problem we’d encounter with foldl is that for very large lists, we’d have to push many, many values to the stack before we reach the innermost expression 1+2. This can lead to a stack overflow before we even get to 1+2. That means two things:

not only did Haskell eat all our RAM and thereby made our computer very sluggish while building the thunk chain in the first place
when we traverse the thunk chain from its end back to its beginning in order to evaluate/reduce the chain, we sometimes have to save so many items for later that the entire Haskell program crashes due to a stack overflow

As we now know, foldl' is much better than the regular foldl. So much so, that in reality, the decision is not between foldr and foldl but between foldr and foldl'. We usually don’t use foldl at all for this reason. However, foldl' also has its disadvantage. Lazy evaluation is one of Haskell’s most powerful and elegant features, and foldl' can’t make use of it. However … although Haskell is extremely smart and although lazy evaluation is an extremely smart feature, in some very rare cases Haskell can be too smart for its own good. This is one of those rare cases. With foldl' we tell Haskell to not use lazy evaluation.

If you want to know whether it’s bad to give up lazy evaluation or what the implications are, check out this site. Example: let’s define a weird multiplication operator and name it ?.

_ ? 0 = 0
x ? y = x*y

The important takeaway is that

0 ? undefined hits an exception because you can’t multiply with y if y equals undefined
undefined ? 0, however, returns 0 because we said that in case the second operand is 0, we return 0, no matter what the first operand is

Now consider the following example:

foldl' (?) 1 [2,3,undefined,5,0]
== foldl' (?) 6 [undefined,5,0]

We start with foldl' (?) 1 [2,3,undefined,5,0] and, after some number of steps, we arrive at foldl' (?) 6 [undefined,5,0].

This leads to an exception, since the next step would be to multiply 6 with undefined.

In contrast, the regular foldl wouldn’t even try to evaluate the undefined value and would just skip it until the entire thunk chain is built, or rather, until the time has come that Haskell absolutely needs to evaluate undefined (that’s the power of lazy evaluation):

foldl (?) 1 [2,3,undefined,5,0]
== ((((1 ? 2) ? 3) ? undefined) ? 5) ? 0

As soon as the complete thunk chain is established, we can start reducing it. And the very first step in this reduction is to apply our operator ? to something and 0. Since the second operand of ? is 0, we return 0. (Notice that while with the addition, we had to push 0 to the stack and save it for later when we knew what that something was, with ? we don’t need to push 0 to the stack. The implementation of ? is clear: when the second operand is 0, return 0. We don’t need to first know what that something on the left side of ? is. Thus we never evaluate undefined because we never have to and foldl never hits an exception. That’s lazy evaluation.)

Therefore, if you thought of some clever trick that needs to make use of lazy evaluation in order to work, use foldl, otherwise use foldl'.

foldl and foldr

Well, this was a long prelude (Ha! See what I did there?) to foldl and foldr. We wanted to talk Haskell, not Python. We wanted to talk the difference between foldl and foldr. You would surely like to hear easy ways to memorize their differences. And we will do that. Nonetheless, I think now that we laid the groundwork and have a very basic understanding of foldl, we will have a much easier time grasping the things to come.

Differences Between foldl and foldr

How exactly does foldl in Haskell work? In Python’s reduce, you can provide an optional initializer value. In Haskell, this initial value is non-optional. In Python, when you provide an initializer (for example, 100), the result variable will not be initialized with the first list item but with that initial value. The call reduce(add_two_values, [1,2,3,4,5], 100) would return 115 instead of 15. Similarly, reduce(add_two_values, [1,2,3,4,5], 0) is equivalent to reduce(add_two_values, [1,2,3,4,5]) (that is, to omitting the initial value). In Haskell, being a functional language by design, we don’t need a lambda expression to express the function add_two_values. We can simply use the + operator. Therefore, in Haskell notation, the equivalent to our sum implementation written in Python as reduce(add_two_values, [1,2,3,4,5]) (apart from that foldl vs foldl' thing) would be:

foldl (+) 0 [1,2,3,4,5]

Now, what’s the difference between this and foldr, i.e.

foldr (+) 0 [1,2,3,4,5]

They both calculate 15. And since it’s only addition which is an associative and commutative operation, in this case it makes absolutely no matter whether we use foldl or foldr. However, there are functions (for example, subtraction or division) where the distinction between foldl and foldr is crucial. More on that in a moment.

Here’s one very simple way to memorize the difference between foldl and foldr. My professor would most certainly not approve of it, but between you and me it’s okay to be a little informal and toss the formalities aside to simplify things. With that being said, imagine it this way: no matter whether you examine foldl or foldr, write down the individual list elements in their original order.

With foldl you then place the initial value (here 0) at the left-most position and fill the spaces between the numbers with left-to-right arrows.
With foldr you then place the initial value (here 0) at the right-most position and fill the spaces between the numbers with right-to-left arrows.

This gives you the following ordering:

foldl: 0→1→2→3→4
foldr: 1←2←3←4←0

In both cases, in accordance with the direction of the arrows, the initial value (here 0) comes before the list element that is first viewed.

In other words:

With foldl you go through the list (here [1,2,3,4]) from left to right.
With foldr you come at the list from the right side, but rather than thinking like a Westerner and writing the list elements down in their reversed order as 4 3 2 1 to conform them to your usual way of thinking you must think in a right-to-left fashion, like an Arabic or Hebrew person, and write down the list elements in their original order and only reverse mentally the directions of imagined arrows.

Now put brackets around those two numbers that come first (be mindful, first does not mean left-most here). When you then treat this bracketed expression as one single value, you can think of it as always bracketing the first two values. You will then get this:

foldl: ((((0→1)→2)→3)→4)
foldr: (1←(2←(3←(4←0))))

Now replace the imaginary arrow with your function and you have the infix notation (for example, 5 `mod` 2 rather than mod 5 2 or, similarly, 5 + 2 rather than (+) 5 2). For our example, this will give you:

foldl (+) 0 [1,2,3,4] == ((((0+1)+2)+3)+4)

foldr (+) 0 [1,2,3,4] == (1+(2+(3+(4+0))))

Why the Differentiation Is So Important

Now, why and when is this important? As I said, for simple calculations that are associative as well as commutative such as addition or multiplication, there’s no mathematical difference. The result is the same, regardless whether we use left-to-right associativity or right-to-left associativity (and reposition the initial value). However, consider the subtraction or, in this case, the division:

foldl (/) 0 [1,2,3,4] == ((((0 / 1) / 2) / 3) / 4)

Nothing surprising here, foldl returns 0.0. Now comes the interesting part:

foldr (/) 0 [1,2,3,4] == (1 / (2 / (3 / (4 / 0))))

Division by zero! 😱 What’s even more surprising: this doesn’t cause an error. Instead, the result of 4/0 is treated as Infinity. And therefore, the result of that function call depends on the length of the list, since in the following steps you alternatingly divide by Infinity or 0.0. Wow! The above foldr with an even number of list elements returns 0.0 whereas foldr with an odd number of list elements returns Infinity … two completely opposite results!

A Simple Reordering

The infix notation is familiar. We’re used to it since first grade. Unfortunately, functions in Haskell typically don’t use the infix notation but the prefix notation. So let’s rewrite the notation in the above example evaluations to make it more general and align it with how code would look like in production. To keep it clear, let’s also replace (+) with addTwoValues or, more generally, f. Otherwise, the many ( and ) characters introduced by (+) would only unnecessarily add visual clutter.

foldl f 0 [1,2,3,4] == (f (f (f (f 0 1) 2) 3) 4)

foldr f 0 [1,2,3,4] == (f 1 (f 2 (f 3 (f 4 0))))

This basically really only is a reordering of the above.

How to Memorize foldl and foldr

Now that we have these examples clear before our eyes, we can ponder an alternative way to memorize folding/reducing which is probably more formally correct and would more likely be accepted by my professor.

foldl

With foldl, you walk through the list from left to right.

The left argument is the result of the previous iteration. In the first iteration, it’s the initial value.
The right argument is the current list element.

The result of the current iteration becomes the left argument of the next iteration.

In our example, we start with the left-most list element (that is, 1). The “result” of the (non-existing) iteration prior to the first iteration is the initial value (that is, 0). This means:

f 0 1

becomes

f (f 0 1) 2

becomes

f (f (f 0 1) 2) 3

becomes

f (f (f (f 0 1) 2) 3) 4

where f here equals to (+).

foldr

With foldr, you walk through the list from right to left.

The left argument is the current list element.
The right argument is the result of the previous iteration. In the first iteration, it’s the initial value.

The result of the current iteration becomes the right argument of the next iteration.

We start with the right-most list element (that is, 4). The “result” of the (non-existing) iteration prior to the first iteration is the initial value (that is, 0). This means:

f 4 0

becomes

f 3 (f 4 0)

becomes

f 2 (f 3 (f 4 0))

becomes

f 2 (f 3 (f 4 0))

becomes

f 1 (f 2 (f 3 (f 4 0)))

where f here equals to (+).

Verification

If we compare this approach to the other approach presented earlier, we can see that the resulting structures of the evaluated expressions are identical.

Final Words

I hope you now have a better understanding of foldl and foldr in Haskell. If you could learn something from this article, consider showing some appreciation for the time I invested to write this article. Leave a comment and tell me what you think, or contact me on Twitter. If you follow me, you will also be notified when I publish my next guide. Thanks, and be well.

More precisely, not just lists but any data structure of the type Foldable (see documentation). ↩︎