I’ve created dozens of Python Morsels since I started it last year. At this point at least 10 of these exercises involve making a custom collection: often a dict-like, list-like or set-like class.
Since each Python Morsels solutions email involves a walk-through of many ways to solve the same problem, I’ve solved each of these in many ways.
I’ve solved these:
- manually with
__dunder__
methods - with the abstract base classes in collections.abc
- with collections.UserDict and collections.UserList
- by inheriting from
list
,dict
, andset
directly
While creating and solving many exercises involving custom collections, I’ve realized that inheriting from list
, dict
, and set
is often subtly painful.
I’m writing this article to explain why I often don’t recommend inheriting from these built-in classes in Python.
My examples will focus on dict
and list
since those are likely more commonly sub-classed.
Making a custom dictionary
We’d like to make a dictionary that’s bi-directional. When a key-value pair is added, the key maps to the value but the value also maps to the key.
There will always be an even number of elements in this dictionary.
And if d[k] == v
is True
then d[v] == k
will always be True
also.
We could try to implement this by customizing deletion and setting of key-value pairs.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Here we’re ensuring that:
- deleting keys will delete their corresponding values as well
- whenever we set a new value for
k
, that any existing value will be removed properly - whenever we set a key-value pair, that the corresponding value-key pair will be set too
Setting and deleting items from this bi-directional dictionary seems to work as we’d expect:
1 2 3 4 5 6 7 |
|
But calling the update
method on this dictionary leads to odd behavior:
1 2 3 4 5 |
|
Adding 9: 7
should have removed 7: 6
and 6: 7
and adding 8: 2
should have removed 3: 8
and 8: 3
.
We could fix this with a custom update
method:
1 2 3 4 5 |
|
But calling the initializer doesn’t work either:
1 2 3 |
|
So we’ll make a custom initializer that calls update
:
1 2 |
|
But pop
doesn’t work:
1 2 3 4 5 6 7 8 |
|
And neither does setdefault
:
1 2 3 4 5 |
|
The problem is the pop
method doesn’t actually call __delitem__
and the setdefault
method doesn’t actually call __setitem__
.
If we wanted to fix this problem, we have to completely re-implement pop
and setdefault
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
This is all very tedious though.
When inheriting from dict
to create a custom dictionary, we’d expect update
and __init__
would call __setitem__
and pop
and setdefault
would call __delitem__
.
But they don’t!
Likewise, get
and pop
don’t call __getitem__
, as you might expect they would.
Lists and sets have the same problem
The list
and set
classes have similar problems to the dict
class.
Let’s take a look at an example.
We’ll make a custom list that inherits from the list
constructor and overrides the behavior of __delitem__
, __iter__
, and __eq__
.
This list will customize __delitem__
to not actually delete an item but to instead leave a “hole” where that item used to be.
The __iter__
and __eq__
methods will skip over this hole when comparing two HoleList
classes as “equal”.
This class is a bit nonsensical (no it’s not a Python Morsels exercise fortunately), but we’re focused less on the class itself and more on the issue with inheriting from list
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
|
Unrelated Aside: if you’re curious about that object()
thing, I explain why it’s useful in my article about sentinel values in Python.
If we make two HoleList
objects and delete items from them such that they have the same non-hole items:
1 2 3 4 5 6 |
|
We’ll see that they’re equal:
1 2 3 4 5 6 7 8 |
|
But if we then ask them whether they’re not equal we’ll see that they’re both equal and not equal:
1 2 3 4 5 6 7 8 9 10 |
|
Normally in Python 3, overriding __eq__
would customize the behavior of both equality (==
) and inequality (!=
) checks.
But not for list
or dict
: they define both __eq__
and __ne__
methods which means we need to override both.
1 2 |
|
Dictionaries suffer from this same problem: __ne__
exists which means we need to be careful to override both __eq__
and __ne__
when inheriting from them.
Also like dictionaries, the remove
and pop
methods on lists don’t call __delitem__
:
1 2 3 4 5 6 7 8 9 |
|
We could again fix these issues by re-implementing the remove
and pop
methods:
1 2 3 4 5 6 7 |
|
But this is a pain. And who knows whether we’re done?
Every time we customize a bit of core functionality on a list
or dict
subclass, we’ll need to make sure we customize other methods that also include exactly the same functionality (but which don’t delegate to the method we overrode).
Why did the Python developers do this?
From my understanding, the built-in list
, dict
, and set
types have in-lined a lot of code for performance.
Essentially, they’ve copy-pasted the same code between many different functions to avoid extra function calls and make things a tiny bit faster.
I haven’t found a reference online that explains why this decision was made and what the consequences of the alternatives to this choice were.
But I mostly trust that this was done for my benefit as a Python developer.
If dict
and list
weren’t faster this way, why would the core developers have chosen this odd implementation?
What’s the alternative to inheriting from list and dict?
So inheriting from list
to make a custom list was painful and inheriting from dict
to create a custom dictionary was painful.
What’s the alternative?
How can we create a custom dictionary-like object that doesn’t inherit from the built-in dict
?
There are a few ways to create custom dictionaries:
- Fully embrace duck typing: figure out everything you need for your data structure to be
dict
-like and create a completely custom class (that walks and quacks like adict
) - Inherit from a helper class that’ll point us in the right direction and tell us which methods our object needs to be
dict
-like - Find a more extensible re-implementation of
dict
and inherit from it instead
We’re going to skip over the first approach: reimplementing everything from scratch will take a while and Python has some helpers that’ll make things easier.
We’re going to take a look at those helpers, first the ones that point us in the right direction (2 above) and then the ones that act as full dict
-replacements (3 above).
Abstract base classes: they’ll help you quack like a duck
Python’s collections.abc module includes abstract base classes that can help us implement some of the common protocols (interfaces as Java calls them) seen in Python.
We’re trying to make a dictionary-like object. Dictionaries are mutable mappings. A dictionary-like object is a mapping. That word “mapping” comes from “hash map”, which is what many other programming languages call this kind of data structure.
So we want to make a mutable mapping.
The collections.abc
module provides an abstract base class for that: MutableMapping
!
If we inherit from this abstract base class, we’ll see that we’re required to implement certain methods for it to work:
1 2 3 4 5 6 7 8 |
|
The MutableMapping
class requires us to say how getting, deleting, and setting items works, how iterating works, and how we get the length of our dictionary.
But once we do that, we’ll get the pop
, clear
, update
, and setdefault
methods for free!
Here’s a re-implementation of TwoWayDict
using the MutableMapping
abstract base class:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
|
Unlike dict
, these update
and setdefault
methods will call our __setitem__
method and the pop
and clear
methods will call our __delitem__
method.
Abstract base classes might make you think we’re leaving the wonderful land of Python duck typing behind for some sort of strongly-typed OOP land. But abstract base classes actually enhance duck typing. Inheriting from abstract base classes helps us be better ducks. We don’t have to worry about whether we’ve implemented all the behaviors that make a mutable mapping because the abstract base class will yell at us if we forgot to specify some essential behavior.
The HoleList
class we made before would need to inherit from the MutableSequence
abstract base class.
A custom set-like class would probably inherit from the MutableSet
abstract base class.
UserList/UserDict: lists and dictionaries that are actually extensible
When using the collection ABCs, Mapping
, Sequence
, Set
(and their mutable children) you’ll often find yourself creating a wrapper around an existing data structure.
If you’re implementing a dictionary-like object, using a dictionary under the hood makes things easier: the same applies for lists and sets.
Python actually includes two even higher level helpers for creating list-like and dictionary-like classes which wrap around list
and dict
objects.
These two classes live in the collections module as UserList and UserDict.
Here’s a re-implementation of TwoWayDict
that inherits from UserDict
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
|
You may notice something interesting about the above code.
That code looks extremely similar to the code we originally wrote (the first version that had lots of bugs) when attempting to inherit from dict
:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
The __setitem__
method is identical, but the __delitem__
method has some small differences.
It might seem from these two code blocks that UserDict
just a better dict
.
That’s not quite right though: UserDict
isn’t a dict
replacement so much as a dict
wrapper.
The UserDict
class implements the interface that dictionaries are supposed to have, but it wraps around an actual dict
object under-the-hood.
Here’s another way we could have written the above UserDict
code, without any super
calls:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
|
Both of these methods reference self.data
, which we didn’t define.
The UserDict
class initializer makes a dictionary which it stores in self.data
.
All of the methods on this dictionary-like UserDict
class wrap around this self.data
dictionary.
UserList
works the same way, except its data
attribute wraps around a list
object.
If we want to customize one of the dict
or list
methods of these classes, we can just override it and change what it does.
You can think of UserDict
and UserList
as wrapper classes.
When we inherit from these classes, we’re wrapping around a data
attribute which we proxy all our method lookups to.
In fancy OOP speak, we might consider UserDict
and UserList
to be adapter classes.
So should I use abstract base classes or UserDict and UserList?
The UserList
and UserDict
classes were originally created long before the abstract base classes in collections.abc
.
UserList
and UserDict
have been around (in some form at least) since before Python 2.0 was even released, but the collections.abc
abstract base classes have only been around since Python 2.6.
The UserList
and UserDict
classes are for when you want something that acts almost identically to a list or a dictionary but you want to customize just a little bit of functionality.
The abstract base classes in collections.abc
are useful when you want something that’s a sequence or a mapping but is different enough from a list or a dictionary that you really should be making your own custom class.
Does inheriting from list and dict ever make sense?
Inheriting from list
and dict
isn’t always bad.
For example, here’s a perfectly functional version of a DefaultDict
(which acts a little differently from collections.defaultdict
):
1 2 3 4 5 6 |
|
This DefaultDict
uses the __missing__
method to act as you’d expect:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
There’s no problem with inheriting from dict
here because we’re not overriding functionality that lives in many different places.
If you’re changing functionality that’s limited to a single method or adding your own custom method, it’s probably worth inheriting from list
or dict
directly.
But if your change will require duplicating the same functionality in multiple places (as is often the case), consider reaching for one of the alternatives.
When making a custom list or dictionary, remember you have options
When creating your own set-like, list-like, or dictionary-like object, think carefully about how you need your object to work.
If you need to change some core functionality, inheriting from list
, dict
, or set
will be painful and I’d recommend against it.
If you’re making a variation of list
or dict
and need to customize just a little bit of core functionality, consider inheriting from collections.UserList
or collections.UserDict
.
In general, if you’re making something custom, you’ll often want to reach for the abstract base classes in collections.abc
.
For example if you’re making a slightly more custom sequence or mapping (think collections.deque
, range
, and maybe collections.Counter
) you’ll want MutableSequence
or MutableMapping
.
And if you’re making a custom set-like object, your only options are collections.abc.Set
or collections.abc.MutableSet
(there is no UserSet
).
We don’t need to create our own data structures very often in Python.
When you do need to create your own custom collections, wrapping around a data structure is a great idea.
Remember the collections
and collections.abc
modules when you need them!
You don’t learn by putting information into your head
You don’t learn by putting information into your head, you learn by trying to retrieve information from your head.
This knowledge about inheriting from list
and dict
and the collections.abc
classes and collections.UserList
and collections.UserDict
isn’t going to stick unless you try to apply it!
If you use the below form to sign up for Python Morsels, the first exercise you see when you sign up will involve creating your own custom mapping or sequence (it’ll be a surprise which one). After that first exercise, I’ll send you one exercise every week for the next month. By default they’ll be intermediate-level exercises, though you can change your skill level after you sign up.
If you’d rather get more beginner-friendly exercises, use the Python Morsels sign up form on the right side of this page instead.