Trey Hunner

Python/Django programming, team training

Hire Me For Training

The Iterator Protocol: How for Loops Work in Python

| Comments

We’re interviewing for a job and our interviewer has asked us to remove all for loops from a block of code. They then mentioned something about iterators and cackled maniacally while rapping their fingers on the table. We’re nervous and frustrated about being assigned this ridiculous task, but we’ll try our best.

To understand how to loop without a for loop, we’ll need to discover what makes for loops tick.

We’re about to learn how for loops work in Python. Along the way we’ll need to learn about iterables, iterators, and the iterator protocol. Let’s loop. āžæ

Looping with indexes: a failed attempt

We might initially try to remove our for loops by using a traditional looping idiom from the world of C: looping with indexes.

1
2
3
4
5
colors = ["red", "green", "blue", "purple"]
i = 0
while i < len(colors):
    print(colors[i])
    i += 1

This works on lists, but it fails on sets:

1
2
3
4
5
6
7
8
9
>>> colors = {"red", "green", "blue", "purple"}
>>> i = 0
>>> while i < len(colors):
...     print(colors[i])
...     i += 1
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: 'set' object does not support indexing

This approach only works on sequences, which are data types that have indexes from 0 to one less than their length. Lists, strings, and tuples are sequences. Dictionaries, sets, and many other iterables are not sequences.

We’ve been instructed to implement a looping construct that works on all iterables, not just sequences.

Iterables: what are they?

In the Python world, an iterable is any object that you can loop over with a for loop.

Iterables are not always indexable, they don’t always have lengths, and they’re not always finite.

Here’s an infinite iterable which provides every multiple of 5 as you loop over it:

1
2
from itertools import count
multiples_of_five = count(step=5)

When we were using for loops, we could have looped over the beginning of this iterable like this:

1
2
3
4
for n in multiples_of_five:
    if n > 100:
        break
    print(n)

If we removed the break condition from that for loop, it would go on printing forever.

So iterables can be infinitely long: which means that we can’t always convert an iterable to a list (or any other sequence) before we loop over it. We need to somehow ask our iterable for each item of our iterable individually, the same way our for loop works.

Iterables & Iterators

Okay we’ve defined iterable, but how do iterables actually work in Python?

All iterables can be passed to the built-in iter function to get an iterator from them.

1
2
3
4
5
6
>>> iter(['some', 'list'])
<list_iterator object at 0x7f227ad51128>
>>> iter({'some', 'set'})
<set_iterator object at 0x7f227ad32b40>
>>> iter('some string')
<str_iterator object at 0x7f227ad51240>

That’s an interesting fact but… what’s an iterator?

Iterators have exactly one job: return the “next” item in our iterable. They’re sort of like tally counters, but they don’t have a reset button and instead of giving the next number they give the next item in our iterable.

You can get an iterator from any iterable:

1
>>> iterator = iter('hi')

And iterators can be passed to next to get their next item:

1
2
3
4
5
6
7
8
>>> next(iterator)
'h'
>>> next(iterator)
'i'
>>> next(iterator)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

So iterators can be passed to the built-in next function to get the next item from them and if there is no next item (because we reached the end), a StopIteration exception will be raised.

Iterators are also iterables

So calling iter on an iterable gives us an iterator. And calling next on an iterator gives us the next item or raises a StopIteration exception if there aren’t any more items.

There’s actually a bit more to it than that though. You can pass iterators to the built-in iter function to get themselves back. That means that iterators are also iterables.

1
2
3
4
>>> iterator = iter('hi')
>>> iterator2 = iter(iterator)
>>> iterator is iterator2
True

That fact leads to some interesting consequences that we don’t have time to go into right now. We’ll save that discussion for a future learning adventure…

The Iterator Protocol

The iterator protocol is a fancy term meaning “how iterables actually work in Python”.

Let’s redefine iterables from Python’s perspective.

Iterables:

  1. Can be passed to the iter function to get an iterator for them.
  2. There is no 2. That’s really all that’s needed to be an iterable.

Iterators:

  1. Can be passed to the next function which gives their next item or raises StopIteration
  2. Return themselves when passed to the iter function.

The inverse of these statements should also hold true. Which means:

  1. Anything that can be passed to iter without an error is an iterable.
  2. Anything that can be passed to next without an error (except for StopIteration) is an iterator.
  3. Anything that returns itself when passed to iter is an iterator.

Looping with iterators

With what we’ve learned about iterables and iterators, we should now be able to recreate a for loop without actually using a for loop.

This while loop manually loops over some iterable, printing out each item as it goes:

1
2
3
4
5
6
7
8
9
def print_each(iterable):
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # Iterator exhausted: stop the loop
        else:
            print(item)

We can call this function with any iterable and it will loop over it:

1
2
3
4
>>> print_each({1, 2, 3})
1
2
3

The above function is essentially the same as this one which uses a for loop:

1
2
3
def print_each(iterable):
    for item in iterable:
        print(item)

This for loop is automatically doing what we were doing manually: calling iter to get an iterator and then calling next over and over until a StopIteration exception is raised.

The iterator protocol is used by for loops, tuple unpacking, and all built-in functions that work on generic iterables. Using the iterator protocol (either manually or automatically) is the only universal way to loop over any iterable in Python.

For loops: more complex than they seem

We’re now ready to complete the very silly task our interviewer assigned to us. We’ll remove all for loops from our code by manually using iter and next to loop over iterables. What did we learn in exploring this task?

Everything you can loop over is an iterable. Looping over iterables works via getting an iterator from an iterable and then repeatedly asking the iterator for the next item.

The way iterators and iterables work is called the iterator protocol. List comprehensions, tuple unpacking, for loops, and all other forms of iteration rely on the iterator protocol.

I’ll explore iterators more in future articles. For now know that iterators are hiding behind the scenes of all iteration in Python.

Learn more through weekly Python chats šŸŽ

Like my teaching style? Want to learn more? Sign up for attend my Weekly Python Chat events so I can answer your questions about Python, programming, and life in general.

Comments