Sometimes the Pythonic way to solve a problem changes over time. As Python has evolved, so has the Pythonic way to count list items.
Let’s look at different techniques for counting the number of times things appear in a list. While analyzing these techniques, we will only be looking at code style. We’ll worry about performance later.
We will need some historical context to understand these different techniques. Fortunately we live in the
__future__ and we have a time machine. Let’s jump in our DeLorean and head to 1997.
It’s January 1, 1997 and we’re using Python 1.4. We have a list of colors and we’d love to know how many times each color occurs in this list. Let’s use a dictionary!
1 2 3 4 5 6 7
After running this we’ll see that our
color_counts dictionary now contains the counts of each color in our list:
That was pretty simple. We just looped through each color, checked if it was in the dictionary, added the color if it wasn’t, and incremented the count if it was.
We could also write this as:
1 2 3 4 5 6
This might be a little slower on sparse lists (lists with lots of non-repeating colors) because it executes two statements instead of one, but we’re not worried about performance, we’re worried about code style. After some thought, we decide to stick with this new version.
It’s January 2, 1997 and we’re still using Python 1.4. We woke up this morning with a sudden realization: our code is practicing “Look Before You Leap” (LBYL) when we should be practicing “Easier to Ask Forgiveness, Than Permission” (EAFP) because EAFP is more Pythonic. Let’s refactor our code to use a try-except block:
1 2 3 4 5 6 7
Now our code attempts to increment the count for each color and if the color isn’t in the dictionary, a
KeyError will be raised and we will instead set the color count to 1 for the color.
It’s January 1, 1998 and we’ve upgraded to Python 1.5. We’ve decided to refactor our code to use the new
get method on dictionaries:
1 2 3 4
Now our code loops through each color, gets the current count for the color from the dictionary, defaulting this count to
1 to the count, and sets the dictionary key to this new value.
It’s cool that this is all one line of code, but we’re not entirely sure if this is more Pythonic. We decide this might be too clever so we revert this change.
It’s January 1, 2001 and we’re now using Python 2.0! We’ve heard that dictionaries have a
setdefault method now and we decide to refactor our code to use this new method. We also decide to use the new
+= augmented assignment operator:
1 2 3 4 5
setdefault method is being called on every loop, regardless of whether it’s needed, but this does seem a little more readable. We decide that this is more Pythonic than our previous solutions and commit our change.
It’s January 1, 2004 and we’re using Python 2.3. We’ve heard about a new
fromkeys class method on dictionaries for constructing dictionaries from a list of keys. We refactor our code to use this new method:
1 2 3 4
This creates a new dictionary using our colors as keys, with all values set to
0 initially. This allows us to increment each key without worrying whether it has been set. We’ve removed the need for any checking or exception handling which seems like an improvement. We decide to keep this change.
Comprehension and set
It’s January 1, 2005 and we’re using Python 2.4. We realize that we could solve our counting problem using sets (released in Python 2.3 and made into a built-in in 2.4) and list comprehensions (released in Python 2.0). After further thought, we remember that generator expressions were also just released in Python 2.4 and we decide to use one of those instead of a list comprehension:
Note: we didn’t use a dictionary comprehension because those won’t be invented until Python 2.7.
This works. It’s one line of code. But is it Pythonic?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Our code is more complex (O(n2) instead of O(n)), less beautiful, and less readable. That change was a fun experiment, but this one-line solution is less Pythonic than what we already had. We decide to revert this change.
It’s January 1, 2007 and we’re using Python 2.5. We’ve just found out that
defaultdict is in the standard library now. This should allow us to set
0 as the default value in our dictionary. Let’s refactor our code to count using a
1 2 3 4 5
for loop is so simple now! This is almost certainly more Pythonic.
We realize that our
color_counts variable does act differently, however it does inherit from
dict and supports all the same mapping functionality.
Instead of converting
color_counts to a
dict, we’ll assume the rest of our code practices duck typing and leave this dict-like object as-is.
It’s January 1, 2011 and we’re using Python 2.7. We’ve been told that our
defaultdict code is no longer the most Pythonic way to count colors. A
Counter class was included in the standard library in Python 2.7 and it does all of the work for us!
1 2 3
Could this get any simpler? This must be the most Pythonic way.
defaultdict, this returns a dict-like object (a
dict subclass actually), which should be good enough for our purposes, so we’ll stick with it.
After thought: Performance
Notice that we didn’t focus on efficiency for these solutions. Most of these solutions have the same time complexity (
O(n) in big O notation) but runtimes could vary based on the Python implementation.
While performance isn’t our main concern, I did measure the run-times on CPython 3.5.0 (you can measure performance yourself here). It’s interesting to see how each implementation changes in relative efficiency based on the density of color names in the list.
Per the Zen of Python, “there should be one– and preferably only one– obvious way to do it”. This is an aspirational message. There isn’t always one obvious way to do it. The “obvious” way can vary by time, need, and level of expertise.
“Pythonic” is a relative term.
- import this and the Zen of Python: Zen of Python trivia borrowed from this post
- Permission or Forgiveness: Alex Martelli discusses Grace Hopper’s EAFP
- Python How To: Group and Count with Dictionaries: while writing this post, I discovered this related article