I wrote an article sometime ago on the iterator protocol that powers Python’s for
loops.
One thing I left out of that article was how to make your own iterators.
In this article I’m going to discuss why you’d want to make your own iterators and then show you how to do so.
What is an iterator?
First let’s quickly address what an iterator is. For a much more detailed explanation, consider watching my Loop Better talk or reading the article based on the talk.
An iterable is anything you’re able to loop over.
An iterator is the object that does the actual iterating.
You can get an iterator from any iterable by calling the built-in iter
function on the iterable.
1 2 3 |
|
You can use the built-in next
function on an iterator to get the next item from it (you’ll get a StopIteration
exception if there are no more items).
1 2 3 4 5 6 |
|
There’s one more rule about iterators that makes everything interesting: iterators are also iterables and their iterator is themselves. I explain the consequences of that more fully in that Loop Better talk I mentioned above.
Why make an iterator?
Iterators allow you to make an iterable that computes its items as it goes. Which means that you can make iterables that are lazy, in that they don’t determine what their next item is until you ask them for it.
Using an iterator instead of a list, set, or another iterable data structure can sometimes allow us to save memory.
For example, we can use itertools.repeat
to create an iterable that provides 100 million 4
’s to us:
1 2 |
|
This iterator takes up 56 bytes of memory on my machine:
1 2 3 |
|
An equivalent list of 100 million 4
’s takes up many megabytes of memory:
1 2 3 4 |
|
While iterators can save memory, they can also save time. For example if you wanted to print out just the first line of a 10 gigabyte log file, you could do this:
1 2 |
|
File objects in Python are implemented as iterators.
As you loop over a file, data is read into memory one line at a time.
If we instead used the readlines
method to store all lines in memory, we might run out of system memory.
So iterators can save us memory, but iterators can sometimes save us time also.
Additionally, iterators have abilities that other iterables don’t. For example, the laziness of iterators can be used to make iterables that have an unknown length. In fact, you can even make infinitely long iterators.
For example, the itertools.count
utility will give us an iterator that will provide every number from 0
upward as we loop over it:
1 2 3 4 5 6 7 8 |
|
That itertools.count
object is essentially an infinitely long iterable.
And it’s implemented as an iterator.
Making an iterator: the object-oriented way
So we’ve seen that iterators can save us memory, save us CPU time, and unlock new abilities to us.
Let’s make our own iterators.
We’ll start be re-inventing the itertools.count
iterator object.
Here’s an iterator implemented using a class:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
This class has an initializer that initializes our current number to 0
(or whatever is passed in as the start
).
The things that make this class usable as an iterator are the __iter__
and __next__
methods.
When an object is passed to the str
built-in function, its __str__
method is called.
When an object is passed to the len
built-in function, its __len__
method is called.
1 2 3 4 5 |
|
Calling the built-in iter
function on an object will attempt to call its __iter__
method.
Calling the built-in next
function on an object will attempt to call its __next__
method.
The iter
function is supposed to return an iterator.
So our __iter__
function must return an iterator.
But our object is an iterator, so should return ourself.
Therefore our Count
object returns self
from its __iter__
method because it is its own iterator.
The next
function is supposed to return the next item in our iterator or raise a StopIteration
exception when there are no more items.
We’re returning the current number and incrementing the number so it’ll be larger during the next __next__
call.
We can manually loop over our Count
iterator class like this:
1 2 3 4 5 |
|
We could also loop over our Count
object like using a for
loop, as with any other iterable:
1 2 3 4 5 6 7 |
|
This object-oriented approach to making an iterator is cool, but it’s not the usual way that Python programmers make iterators. Usually when we want an iterator, we make a generator.
Generators: the easy way to make an iterator
The easiest ways to make our own iterators in Python is to create a generator.
There are two ways to make generators in Python.
Given this list of numbers:
1
|
|
We can make a generator that will lazily provide us with all the squares of these numbers like this:
1 2 3 4 5 |
|
Or we can make the same generator like this:
1
|
|
The first one is called a generator function and the second one is called a generator expression.
Both of these generator objects work the same way.
They both have a type of generator
and they’re both iterators that provide squares of the numbers in our numbers list.
1 2 3 4 5 6 |
|
We’re going to talk about both of these approaches to making a generator, but first let’s talk about terminology.
The word “generator” is used in quite a few ways in Python:
- A generator, also called a generator object, is an iterator whose type is
generator
- A generator function is a special syntax that allows us to make a function which returns a generator object when we call it
- A generator expression is a comprehension-like syntax that allows you to create a generator object inline
With that terminology out of the way, let’s take a look at each one of these things individually. We’ll look at generator functions first.
Generator functions
Generator functions are distinguished from plain old functions by the fact that they have one or more yield
statements.
Normally when you call a function, its code is executed:
1 2 3 4 5 6 7 8 |
|
But if the function has a yield
statement in it, it isn’t a typical function anymore.
It’s now a generator function, meaning it will return a generator object when called.
That generator object can be looped over to execute it until a yield
statement is hit:
1 2 3 4 5 6 7 8 9 10 11 |
|
The mere presence of a yield
statement turns a function into a generator function.
If you see a function and there’s a yield
, you’re working with a different animal.
It’s a bit odd, but that’s the way generator functions work.
Okay let’s look at a real example of a generator function.
We’ll make a generator function that does the same thing as our Count
iterator class we made earlier.
1 2 3 4 5 |
|
Just like our Count
iterator class, we can manually loop over the generator we get back from calling count
:
1 2 3 4 5 |
|
And we can loop over this generator object using a for
loop, just like before:
1 2 3 4 5 6 7 |
|
But this function is considerably shorter than our Count
class we created before.
Generator expressions
Generator expressions are a list comprehension-like syntax that allow us to make a generator object.
Let’s say we have a list comprehension that filters empty lines from a file and strips newlines from the end:
1 2 3 4 5 |
|
We could create a generator instead of a list, by turning the square brackets of that comprehension into parenthesis:
1 2 3 4 5 |
|
Just as our list comprehension gave us a list back, our generator expression gives us a generator object back:
1 2 3 4 5 6 |
|
Generator expressions use a shorter inline syntax compared to generator functions. They’re not as powerful though.
If you can write your generator function in this form:
1 2 3 4 |
|
Then you can replace it with a generator expression:
1 2 3 4 5 6 |
|
If you *can’t write your generator function in that form, then you can’t create a generator expression to replace it.
Note that we’ve changed the example we’re using because we can’t use a generator expression for our previous example (our example that re-implements itertools.count
).
Generator expressions vs generator functions
You can think of generator expressions as the list comprehensions of the generator world.
If you’re not familiar with list comprehensions, I recommend reading my article on list comprehensions in Python.
I note in that article that you can copy-paste your way from a for
loop to a list comprehension.
You can also copy-paste your way from a generator function to a function that returns a generator expression:
Generator expressions are to generator functions as list comprehensions are to a simple for
loop with an append and a condition.
Generator expressions are so similar to comprehensions, that you might even be tempted to say generator comprehension instead of generator expression. That’s not technically the correct name, but if you say it everyone will know what you’re talking about. Ned Batchelder actually proposed that we should all start calling generator expressions generator comprehensions and I tend to agree that this would be a clearer name.
So what’s the best way to make an iterator?
To make an iterator you could create an iterator class, a generator function, or a generator expression. Which way is the best way though?
Generator expressions are very succinct, but they’re not nearly as flexible as generator functions. Generator functions are flexible, but if you need to attach extra methods or attributes to your iterator object, you’ll probably need to switch to using an iterator class.
I’d recommend reaching for generator expressions the same way you reach for list comprehensions. If you’re doing a simple mapping or filtering operation, a generator expression is a great solution. If you’re doing something a bit more sophisticated, you’ll likely need a generator function.
I’d recommend using generator functions the same way you’d use for
loops that append to a list.
Everywhere you’d see an append
method, you’d often see a yield
statement instead.
And I’d say that you should almost never create an iterator class. If you find you need an iterator class, try to write a generator function that does what you need and see how it compares to your iterator class.
Generators can help when making iterables too
You’ll see iterator classes in the wild, but there’s rarely a good opportunity to write your own.
While it’s rare to create your own iterator class, it’s not as unusual to make your own iterable class.
And iterable classes require a __iter__
method which returns an iterator.
Since generators are the easy way to make an iterator, we can use a generator function or a generator expression to create our __iter__
methods.
For example here’s an iterable that provides x-y coordinates:
1 2 3 4 5 6 |
|
Note that our Point
class here creates an iterable when called (not an iterator).
That means our __iter__
method must return an iterator.
The easiest way to create an iterator is by making a generator function, so that’s just what we did.
We stuck yield
in our __iter__
to make it into a generator function and now our Point
class can be looped over, just like any other iterable.
1 2 3 4 5 6 |
|
Generator functions are a natural fit for creating __iter__
methods on your iterable classes.
Generators are the way to make iterators
Dictionaries are the typical way to make a mapping in Python. Functions are the typical way to make a callable object in Python. Likewise, generators are the typical way to make an iterator in Python.
So when you’re thinking “it sure would be nice to implement an iterable that lazily computes things as it’s looped over,” think of iterators.
And when you’re considering how to create your own iterator, think of generator functions and generator expressions.
Practice making an iterator right now
You won’t learn new Python skills by reading, you’ll learn them by writing code.
If you’d like to practice making an iterator right now, sign up for Python Morsels using the form below and I’ll immediately give you an exercise to practice making an iterator.