Sometimes the Pythonic way to solve a problem changes over time. As Python has evolved, so has the Pythonic way to count list items.
Let’s look at different techniques for counting the number of times things appear in a list. While analyzing these techniques, we will only be looking at code style. We’ll worry about performance later.
We will need some historical context to understand these different techniques. Fortunately we live in the __future__
and we have a time machine. Let’s jump in our DeLorean and head to 1997.
if Statement
It’s January 1, 1997 and we’re using Python 1.4. We have a list of colors and we’d love to know how many times each color occurs in this list. Let’s use a dictionary!
1 2 3 4 5 6 7 |
|
Note: we’re not using +=
because augmented assignment won’t be added until Python 2.0 and we’re not using the c in color_counts
idiom because that won’t be invented until Python 2.2!
After running this we’ll see that our color_counts
dictionary now contains the counts of each color in our list:
1 2 |
|
That was pretty simple. We just looped through each color, checked if it was in the dictionary, added the color if it wasn’t, and incremented the count if it was.
We could also write this as:
1 2 3 4 5 6 |
|
This might be a little slower on sparse lists (lists with lots of non-repeating colors) because it executes two statements instead of one, but we’re not worried about performance, we’re worried about code style. After some thought, we decide to stick with this new version.
try Block
It’s January 2, 1997 and we’re still using Python 1.4. We woke up this morning with a sudden realization: our code is practicing “Look Before You Leap” (LBYL) when we should be practicing “Easier to Ask Forgiveness, Than Permission” (EAFP) because EAFP is more Pythonic. Let’s refactor our code to use a try-except block:
1 2 3 4 5 6 7 |
|
Now our code attempts to increment the count for each color and if the color isn’t in the dictionary, a KeyError
will be raised and we will instead set the color count to 1 for the color.
get Method
It’s January 1, 1998 and we’ve upgraded to Python 1.5. We’ve decided to refactor our code to use the new get
method on dictionaries:
1 2 3 4 |
|
Now our code loops through each color, gets the current count for the color from the dictionary, defaulting this count to 0
, adds 1
to the count, and sets the dictionary key to this new value.
It’s cool that this is all one line of code, but we’re not entirely sure if this is more Pythonic. We decide this might be too clever so we revert this change.
setdefault
It’s January 1, 2001 and we’re now using Python 2.0! We’ve heard that dictionaries have a setdefault
method now and we decide to refactor our code to use this new method. We also decide to use the new +=
augmented assignment operator:
1 2 3 4 5 |
|
The setdefault
method is being called on every loop, regardless of whether it’s needed, but this does seem a little more readable. We decide that this is more Pythonic than our previous solutions and commit our change.
fromkeys
It’s January 1, 2004 and we’re using Python 2.3. We’ve heard about a new fromkeys
class method on dictionaries for constructing dictionaries from a list of keys. We refactor our code to use this new method:
1 2 3 4 |
|
This creates a new dictionary using our colors as keys, with all values set to 0
initially. This allows us to increment each key without worrying whether it has been set. We’ve removed the need for any checking or exception handling which seems like an improvement. We decide to keep this change.
Comprehension and set
It’s January 1, 2005 and we’re using Python 2.4. We realize that we could solve our counting problem using sets (released in Python 2.3 and made into a built-in in 2.4) and list comprehensions (released in Python 2.0). After further thought, we remember that generator expressions were also just released in Python 2.4 and we decide to use one of those instead of a list comprehension:
1 2 |
|
Note: we didn’t use a dictionary comprehension because those won’t be invented until Python 2.7.
This works. It’s one line of code. But is it Pythonic?
We remember the Zen of Python, which started in a python-list email thread and was snuck into Python 2.2.1. We type import this
at our REPL:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
|
Our code is more complex (O(n2) instead of O(n)), less beautiful, and less readable. That change was a fun experiment, but this one-line solution is less Pythonic than what we already had. We decide to revert this change.
defaultdict
It’s January 1, 2007 and we’re using Python 2.5. We’ve just found out that defaultdict
is in the standard library now. This should allow us to set 0
as the default value in our dictionary. Let’s refactor our code to count using a defaultdict
instead:
1 2 3 4 5 |
|
That for
loop is so simple now! This is almost certainly more Pythonic.
We realize that our color_counts
variable does act differently, however it does inherit from dict
and supports all the same mapping functionality.
1 2 |
|
Instead of converting color_counts
to a dict
, we’ll assume the rest of our code practices duck typing and leave this dict-like object as-is.
Counter
It’s January 1, 2011 and we’re using Python 2.7. We’ve been told that our defaultdict
code is no longer the most Pythonic way to count colors. A Counter
class was included in the standard library in Python 2.7 and it does all of the work for us!
1 2 3 |
|
Could this get any simpler? This must be the most Pythonic way.
Like defaultdict
, this returns a dict-like object (a dict
subclass actually), which should be good enough for our purposes, so we’ll stick with it.
1 2 |
|
After thought: Performance
Notice that we didn’t focus on efficiency for these solutions. Most of these solutions have the same time complexity (O(n)
in big O notation) but runtimes could vary based on the Python implementation.
While performance isn’t our main concern, I did measure the run-times on CPython 3.5.0 (you can measure performance yourself here). It’s interesting to see how each implementation changes in relative efficiency based on the density of color names in the list.
Conclusion
Per the Zen of Python, “there should be one– and preferably only one– obvious way to do it”. This is an aspirational message. There isn’t always one obvious way to do it. The “obvious” way can vary by time, need, and level of expertise.
“Pythonic” is a relative term.
Related Resources
- import this and the Zen of Python: Zen of Python trivia borrowed from this post
- Permission or Forgiveness: Alex Martelli discusses Grace Hopper’s EAFP
- Python How To: Group and Count with Dictionaries: while writing this post, I discovered this related article
Credits
Thanks to Brian Schrader and David Lord for proof-reading this post and Micah Denbraver for actually testing out these solutions on the correct versions of Python.