Case 1: Month class
Let's take a look at an example of a friendly class.
Comparability and Orderability
>>> m = Month(2018, 6)
>>> m
Month(year=2018, month=6)
>>> m == Month(2018, 6)
True
>>> m != Month(2018, 6)
False
>>> m < Month(2018, 12)
True
>>> m > Month(2019, 1)
False
>>> m <= Month(2018, 12)
True
>>> m >= Month(2018, 6)
True
class Month:
def __init__(self, year, month):
pass
def __repr__(self):
pass
def __eq__(self, other):
pass
def __lt__(self, other):
pass
def __gt__(self, other):
pass
def __le__(self, other):
pass
def __ge__(self, other):
pass
- If you want a class in Python that **(click)** accepts attributes, you'll probably want **(click)** an initializer
- If you want it to have **(click)** a helpful string representation, you'll also want **(click)** a **dunder repr** method
- Dunder, by the way, stands for double underscore and dunder methods are special methods that provide a contract between us and Python
- But what if you also wanted your objects to be **(click)** comparable to other objects... using **(click)** equality and **(click)** inequality?
- In that case you'll want **(click)** a **dunder eq** method.
- If you wanted to go further, by allowing your classes to be **(click)** sortable, you'll need to implement **(click)** the less than operator with **(click)** **dunder lt**, **(click)** the greater than operator with **(click)** **dunder gt**, the **(click)** less than or equal to operator with **(click)** **dunder le**, and **(click)** the greater than or equal to operator with **(click)** **dunder ge**
- After all this, our class is going to look pretty big
class Month:
    def __init__(self, year, month):
        self.year, self.month = year, month
    def __repr__(self):
        return f"Month(year={self.year}, month={self.month})"
    def __eq__(self, other):
        if not isinstance(other, Month):
            return NotImplemented
        return (self.year, self.month) == (other.year, other.month)
    def __lt__(self, other):
        if not isinstance(other, Month):
            return NotImplemented
        return (self.year, self.month) < (other.year, other.month)
    def __gt__(self, other):
        if not isinstance(other, Month):
            return NotImplemented
        return (self.year, self.month) > (other.year, other.month)
    def __le__(self, other):
        if not isinstance(other, Month):
            return NotImplemented
        return (self.year, self.month) <= (other.year, other.month)
    def __ge__(self, other):
        if not isinstance(other, Month):
            return NotImplemented
        return (self.year, self.month) >= (other.year, other.month)
- You might notice there's a lot of code duplication here
- All five of those **(click)** comparison dunder methods are pretty much the same method
- The only real difference is **(click)** the operator they use
- There is a helper in the standard library to make this class a bit shorter
from functools import total_ordering
@total_ordering
class Month:
def __init__(self, year, month):
self.year, self.month = year, month
def __repr__(self):
return f"Month(year={self.year}, month={self.month})"
def __eq__(self, other):
if not isinstance(other, Month):
return NotImplemented
return (self.year, self.month) == (other.year, other.month)
def __lt__(self, other):
if not isinstance(other, Month):
return NotImplemented
return (self.year, self.month) < (other.year, other.month)
- It's called `total_ordering` and it lives in the `functools` module
- It allows us to implement just 2 comparison operators and it'll implement the others automatically
- Here we're implementing equality with `__eq__` and less than with `__lt__`
- But there's still a lot of boilerplate code needed for this class
- We're going to take a look at an even better way to make this class later, but first we're going to take a look at another example
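As a quick check of what `total_ordering` fills in, here is a minimal sketch that reuses the `Month` class from the slide. The variable names (`months`, `earliest`, and so on) are made up for illustration; only `__eq__` and `__lt__` are written by hand, and the other operators come from the decorator.

```python
from functools import total_ordering


@total_ordering
class Month:
    def __init__(self, year, month):
        self.year, self.month = year, month

    def __repr__(self):
        return f"Month(year={self.year}, month={self.month})"

    def __eq__(self, other):
        if not isinstance(other, Month):
            return NotImplemented
        return (self.year, self.month) == (other.year, other.month)

    def __lt__(self, other):
        if not isinstance(other, Month):
            return NotImplemented
        return (self.year, self.month) < (other.year, other.month)


months = [Month(2019, 10), Month(2018, 6), Month(2018, 1)]
earliest = min(months)       # min() and sorted() only need __lt__
in_order = sorted(months)

# None of these three operators were defined above; total_ordering derived them.
less_or_equal = Month(2018, 6) <= Month(2018, 6)
greater = Month(2019, 1) > Month(2018, 12)
greater_or_equal = Month(2018, 1) >= Month(2018, 6)
```

Sorting and `min` would work with `__lt__` alone; the decorator matters for the explicit `<=`, `>`, and `>=` comparisons.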
Case 2: Point class
- Let's take a look at one more example
- This time we're going to look at a class that represents a three-dimensional point
Iterability and Immutability
>>> p = Point(1, 2, 3)
>>> p
Point(x=1, y=2, z=3)
>>> p == Point(1, 2, 3)
True
>>> x, y, z = p
>>> x
1
>>> p.x = 4
Traceback (most recent call last):
File "<stdin>", line 1
File "<string>", line 3
AttributeError: object is immutable
>>> {Point(1, 2, 3), Point(1, 2, 3)}
{Point(x=1, y=2, z=3)}
class Point:
def __init__(self, x, y, z):
pass
def __repr__(self):
pass
def __eq__(self, other):
pass
def __iter__(self):
pass
def __setattr__(self, name, value):
pass
def __hash__(self):
pass
- Just as in our last example, we're going to want our class to **(click)** accept arguments, **(click)** have a nice string representation, and **(click)** allow itself to be compared to other `Point` objects
- But what if we also want to allow our class **(click)** to be unpacked into three coordinate variables using multiple assignment?
- Meaning we can take our point and **(click)** unpack it into `x`, `y`, and `z` values
- Multiple assignment requires that our class **(click)** be iterable
- We'll need a **(click)** dunder iter method for that
- *(pause)*
- We might also want our class to be **(click)** immutable, meaning **(click)** we can't override any of the attributes after we've created a `Point` object
- For this we need to make a **(click)** custom dunder setattr method, which is a bit of an advanced thing to do but it works
- And if our class is immutable it should be **(click)** safe to use it in dictionary keys and sets
- But we'll need **(click)** a dunder hash method for that
- So after all of this work, our `Point` class ends up looking something like this...
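class Point:
    def __init__(self, x, y, z):
        # Plain "self.x = x" would hit our own __setattr__ and raise,
        # so the initializer goes through object.__setattr__ instead
        object.__setattr__(self, "x", x)
        object.__setattr__(self, "y", y)
        object.__setattr__(self, "z", z)
    def __repr__(self):
        return f"Point(x={self.x}, y={self.y}, z={self.z})"
    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return tuple(self) == tuple(other)
    def __iter__(self):
        yield from (self.x, self.y, self.z)
    def __setattr__(self, attribute, value):
        raise AttributeError("object is immutable")
    def __delattr__(self, attribute):
        # __delattr__ receives no value, so it needs its own signature
        raise AttributeError("object is immutable")
    def __hash__(self):
        return hash(tuple(self))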
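The `__iter__` method in this class is a generator function, which is what makes unpacking work. Here is a minimal sketch of that one idea in isolation; the `Coordinates` class is hypothetical and exists only for this illustration.

```python
class Coordinates:
    """Hypothetical two-field class used only to illustrate __iter__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __iter__(self):
        # A function containing `yield` is a generator function: calling it
        # returns an iterator that produces each yielded value in turn.
        yield self.x
        yield self.y


point = Coordinates(3, 4)
x, y = point                              # multiple assignment iterates over the object
as_tuple = tuple(point)                   # so does the tuple constructor
doubled = [value * 2 for value in point]  # and so does a comprehension
```

Anything that loops over an object calls `__iter__` behind the scenes, so one method covers unpacking, `tuple()`, and `for` loops.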
- This class does quite a bit without too many lines of code
- Some of this is probably new to you
- If you haven't seen `yield` before, you can google generators and iterators and how to make them on your own
- This is basically **code copy-paste time** here. Instead of memorizing all this, we can always copy-paste this class format when we need it.
- So while a lot of these things might be *new* to you, this code is pretty succinct for how much it does. But this is pretty much all boilerplate code. We shouldn't have to type this out, or copy-paste it whenever we need code that works this way.
- Fortunately, there is a tool in the Python standard library that we can use to help us
NamedTuple
- It's a helper class called `NamedTuple` and it's built into Python 3
from typing import NamedTuple
class Point(NamedTuple):
x: float
y: float
z: float
>>> p = Point(1, 2, 3)
>>> p
Point(x=1, y=2, z=3)
>>> p == Point(1, 2, 3)
True
>>> p.x = 4
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: cannot set attribute
>>>
- `NamedTuple` lives in the `typing` module **(click)**
- To create a new `NamedTuple` class, you inherit from `NamedTuple` and **(click)** use type hints to define attributes on the class
- The **(click)** class we get from this has **(click)** a friendly string representation
- **(click)** Nice default comparisons
- And **(click)** they're immutable, just like tuples
- By the way, if you've seen `namedtuple` before but the inheritance syntax I'm using looks a little unfamiliar that's because there's another way to use `namedtuple` that existed in Python 2 which was a bit more awkward
- This way of using it I'm showing here was inspired by `attrs` and works in Python 3.6 and above, since it relies on variable annotations
- So out-of-the-box, `NamedTuple` pretty much gives us all the things we were looking for in our `Point` class
- But using `NamedTuple` to implement our `Point` class has some downsides
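For comparison, here is a minimal sketch of the older functional form next to the class form. The names `PointA` and `PointB` are invented here so both versions can live in one snippet.

```python
from collections import namedtuple
from typing import NamedTuple

# Older functional form: the type name and field names are passed as strings,
# and there is no place for type hints.
PointA = namedtuple("Point", ["x", "y", "z"])


# Class form: fields are declared with type hints in the class body.
class PointB(NamedTuple):
    x: float
    y: float
    z: float


a = PointA(1, 2, 3)
b = PointB(1, 2, 3)

fields_a = a._fields
fields_b = b._fields
both_are_tuples = isinstance(a, tuple) and isinstance(b, tuple)
```

Both forms produce a tuple subclass with the same fields; the class form mostly changes how the declaration reads.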
Named tuples are tuples
>>> Point(1, 2, 3) < Point(4, 5, 6)
True
>>> Point(1, 2, 3) + Point(4, 5, 6)
(1, 2, 3, 4, 5, 6)
>>> Point(1, 2, 3) * 2
(1, 2, 3, 1, 2, 3)
>>> len(Point(1, 2, 3))
3
- Named tuples inherit from **(click)** tuples
- Which means they inherit all the features that tuples have
- So they can be **(click)** ordered... which can be useful but it doesn't really make sense for our `Point` objects
- They can also be **(click)** added to each other... but the object that comes back from this operation **(click)** *might* not be what you'd expect
- The same is true for multiplication **(click)**
- You can multiply namedtuples by integers, but the **(click)** thing you get back is a little odd
- Also they have **(click)** a length... which also doesn't necessarily make sense in all cases
- So named tuples can be useful, but you sometimes have to be careful while using them
from typing import NamedTuple
class Point(NamedTuple):
    x: float
    y: float
    z: float
    def __lt__(self, other): raise TypeError
    def __le__(self, other): raise TypeError
    def __gt__(self, other): raise TypeError
    def __ge__(self, other): raise TypeError
    def __add__(self, other): raise TypeError
    def __mul__(self, other): raise TypeError
    def __rmul__(self, other): raise TypeError
    def __len__(self): raise TypeError
- You *can* try to override a bunch of methods to fix this issue, but that's *really* awkward
- Finding and removing features that you *don't* want is often trickier than *adding* features that you do want
- So don't use `NamedTuple` unless you're trying to make something that **acts like a tuple**
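One more consequence of "acts like a tuple" is worth seeing in code: named tuples compare equal to plain tuples, and to other named tuple types with the same values. This is a minimal sketch; the `Color` class is hypothetical and added only to make the point.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: float
    y: float
    z: float


class Color(NamedTuple):  # hypothetical second type, for illustration only
    red: int
    green: int
    blue: int


point = Point(1, 2, 3)
color = Color(1, 2, 3)

equal_to_plain_tuple = point == (1, 2, 3)
equal_to_other_type = point == color

# Equal values also hash the same, so all three collapse into one set entry.
distinct_entries = len({point, color, (1, 2, 3)})
```

If a point and a color should never be interchangeable, that is another hint that a tuple-based class is the wrong starting point.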
attrs
- One alternative to named tuples is `attrs`
- `attrs` is a third-party library that makes it easier to create classes with certain common features
$ pip install attrs
import attr
@attr.s(auto_attribs=True)
class Point:
x: float
y: float
z: float
>>> p = Point(1, 2, 3)
>>> p
Point(x=1, y=2, z=3)
>>> p == Point(1, 2, 3)
True
>>> p < Point(4, 5, 6)
True
- You can get attrs by **(click)** pip-installing it
- **(click)** After that you can use it by **(click)** importing attr (without the s, which is a little odd)
- And then **(click)** decorating your class with the `attr.s` decorator
- That `auto_attribs` argument allows us to use **(click)** type hints to define our attributes
- By default, **(click)** `attrs` classes get **(click)** a nice string representation
- And **(click)** they can be logically compared to other objects of the same type
- But **(click)** they can also be *ordered* the same way named tuples can, which again is a bit odd in this case
- Our `Point` objects can be ordered and they aren't iterable or immutable yet
- If we want our `Point` class to work just like the one we made *manually* before, we could do this
import attr
@attr.s(auto_attribs=True, cmp=False, frozen=True)
class Point:
    x: float
    y: float
    z: float
    def __iter__(self):
        yield from (self.x, self.y, self.z)
    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return tuple(self) == tuple(other)
    def __hash__(self):
        return hash(tuple(self))
- Here we've **(click)** frozen our object so the attributes can't be modified
- We've also added **(click)** a dunder iter method to make our object work with multiple assignment
- And we've **(click)** disabled comparisons so we don't get automatic ordering, but that means we have to do a lot of extra work by...
- implementing **(click)** a dunder eq method
- and implementing **(click)** a dunder hash method
- So the `attrs` library is more powerful than named tuples, but it can take a bit of playing with sometimes
- Plus it's a third party library which means you'll need to install it whenever you want to use it
- But there's an `attrs`-like library that's actually bundled with Python now
dataclasses
- The `dataclasses` module is essentially a simplified version of the `attrs` library that's built into the Python standard library
from dataclasses import dataclass
@dataclass
class Point:
x: float
y: float
z: float
>>> p = Point(1, 2, 3)
>>> p
Point(x=1, y=2, z=3)
>>> p == Point(1, 2, 3)
True
>>> p < Point(4, 5, 6)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'Point' and 'Point'
- To use dataclasses, you'll **(click)** import `dataclass` from the `dataclasses` library
- And you'll **(click)** decorate your class, the same way you can with `attrs`
- The syntax relies on **(click)** type hinting, just like `typing.NamedTuple` and `attrs` do
- This **(click)** `Point` class has **(click)** a nice string representation and **(click)** it supports comparisons
- But **(click)** it doesn't support ordering... and that's perfect because we don't want our class to be orderable
- But this `Point` class doesn't work quite the way we want it to yet
from dataclasses import dataclass
@dataclass
class Point:
x: float
y: float
z: float
>>> p = Point(1, 2, 3)
>>> x, y, z = p
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot unpack non-iterable Point object
>>> p.x = 4
>>> p
Point(x=4, y=2, z=3)
- If we try to use this point with multiple assignment **(click)**, we'll get an error **(click)**, because our `Point` class is not iterable so we can't unpack into 3 variables
- Also if we **(click)** try to assign to the `x` attribute in our class
- It **(click)** works... which isn't good because our class is *supposed to be* **immutable**, meaning we shouldn't be able to change the attributes in our `Point` objects
- We're expecting to get an error here instead
- So we need to make our `Point` class iterable and immutable
from dataclasses import dataclass
@dataclass
class Point:
x: float
y: float
z: float
def __iter__(self):
yield from (self.x, self.y, self.z)
- To make our class iterable, we can add **(click)** a dunder iter method
- Which you could copy-paste from the Point class that we made manually earlier, before we looked at namedtuples and attrs and dataclasses
- But instead of hard-coding **(click)** self.x, self.y, and self.z, you could instead copy-paste this code
- *(pause)*
from dataclasses import dataclass, astuple
@dataclass(frozen=True)
class Point:
x: float
y: float
z: float
def __iter__(self):
yield from astuple(self)
>>> p = Point(1, 2, 3)
>>> x, y, z = p
>>> x
1
>>> y
2
- The `dataclasses` library has an `astuple` helper that allows us to get a tuple that contains our object's attributes without manually specifying each of the attributes. Which is kind of nifty.
- *(pause)*
- So our **(click)** `Point` class is iterable now, meaning we can use **(click)** multiple assignment with it
- Which is great **(click)**
- But it's not immutable yet
- To make it immutable, we can set the **(click)** `frozen` argument when using our `dataclass` decorator
from dataclasses import dataclass, astuple
@dataclass(frozen=True)
class Point:
x: float
y: float
z: float
def __iter__(self):
yield from astuple(self)
>>> p = Point(1, 2, 3)
>>> p.x = 4
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 3, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'x'
- Now that we've frozen our `Point` class, if we **(click)** try to assign an attribute on a point object, **(click)**, we'll get an error
- Which is just what we're looking for
- So our class is now iterable, immutable, comparable, and it has a nice representation. But it's *not* orderable, it *doesn't* have a length, and it *doesn't* have other features that don't make sense either
- So using dataclasses, we've made a **friendly class** that has just the features that we want
- All in less than 10 lines of code. I think this is pretty cool.
dataclasses
Built-in to standard library (in Python 3.7)
Available as a third-party library
While attrs supports both Python 2 and Python 3, dataclasses only work on Python 3
dataclasses is simpler, but less feature-rich than attrs
- When I told you before that dataclasses are **(click)** included in the Python standard library, I meant that **(click)** it's included with Python 3.7
- Which was just released pretty recently, so you're probably not using it yet
- It's also **(click)** available as a third-party library though, so you can pip-install the backport on Python 3.6. If you're on Python 2, you're out of luck.
- If you need Python 2 support **(click)** or if you need **(click)** more features than dataclasses provide, you might want to reach for the `attrs` library instead
- The dataclasses library was partially inspired by attrs, so many of the design decisions made by each are similar
Friendly Class Recipes
- Let's take a look at some common uses for data classes
- I'm intending this section to act as sort of a recipe book for how to make different types of friendly classes
- I don't mean it to be an infomercial for data classes
- But it might be an infomercial for data classes
What you get out of the box
from dataclasses import dataclass
@dataclass
class Point:
x: float
y: float
z: float
>>> p = Point(1, 2, 3)
>>> p
Point(x=1, y=2, z=3)
>>> p == Point(1, 2, 3)
True
- Without any other customization, data classes give you:
- a good string representation for your object
- and they allow you to compare objects to each other with the equality and inequality operators
- That's what data classes give you without any customization
- And that might be all you need
Immutability
from dataclasses import dataclass
@dataclass(frozen=True)
class Point:
x: float
y: float
z: float
>>> p = Point(1, 2, 3)
>>> p.x = 4
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 3, in __setattr__
dataclasses.FrozenInstanceError: cannot assign to field 'x'
- But you can also make your data classes immutable by setting **(click)** frozen to True
- This also makes our classes hashable, if that's a thing you care about
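Here is a minimal sketch of what that hashability buys us. `MutablePoint` is a hypothetical non-frozen counterpart added for contrast, and the variable names are made up for this example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: float
    y: float
    z: float


@dataclass
class MutablePoint:  # hypothetical non-frozen counterpart, for contrast
    x: float


origin = Point(0, 0, 0)
labels = {origin: "origin"}                 # usable as a dictionary key
unique = {Point(1, 2, 3), Point(1, 2, 3)}   # equal points collapse in a set

try:
    origin.x = 1
except FrozenInstanceError:
    assignment_blocked = True
else:
    assignment_blocked = False

# A default (non-frozen) data class defines __eq__ but gets no usable __hash__.
try:
    hash(MutablePoint(1))
except TypeError:
    mutable_is_hashable = False
else:
    mutable_is_hashable = True
```

The two behaviours go together: Python only generates a hash for us once the fields can no longer change underneath it.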
Iterability
from dataclasses import dataclass, astuple
@dataclass
class Point:
x: float
y: float
z: float
def __iter__(self):
yield from astuple(self)
>>> p = Point(1, 2, 3)
>>> x, y, z = p
>>> tuple(p)
(1, 2, 3)
- And something we've seen that's unrelated to data classes, but certainly works with them, is iterability
- We can make a `__iter__` method on any class to make our class iterable
- That `astuple` helper just makes it *a tiny bit* easier with data classes
- *(pause)*
- One thing I haven't shown you is how to make orderable data classes...
Orderability
from dataclasses import dataclass
@dataclass(order=True)
class Month:
year: int
month: int
>>> Month(2019, 12) < Month(2020, 1)
True
>>> months = [Month(2018, 6), Month(2018, 1), Month(2019, 10)]
>>> print(*sorted(months), sep='\n')
Month(year=2018, month=1)
Month(year=2018, month=6)
Month(year=2019, month=10)
- Before we talked about named tuples, attrs, *or* data classes, we made a `Month` class which was orderable
- Meaning we could use comparison operators, like less than, to ask whether one month object was less than another
- To get this to work as we'd expect, we can set `order` to True **(click)** when we decorate our data class
- This makes our class order items lexicographically, which is just the fancy name for the way that Python orders tuples, lists, strings, and pretty much every other type of orderable collection
- One more thing I'd like to discuss that's not at all data class specific is...
Friendly operations
from dataclasses import dataclass, astuple
@dataclass
class Vector:
x: float
y: float
z: float
def __iter__(self):
yield from astuple(self)
def __add__(self, other):
return Vector(*(a + b for a, b in zip(self, other)))
def __sub__(self, other):
return Vector(*(a - b for a, b in zip(self, other)))
>>> Vector(1, 2, 3) + Vector(4, 5, 6)
Vector(x=5, y=7, z=9)
>>> Vector(4, 5, 6) - Vector(1, 2, 3)
Vector(x=3, y=3, z=3)
- Dunder methods!
- One of the secrets to making a friendly class is embracing dunder methods
- This is a class that implements `__iter__`, `__add__` and `__sub__`.
- The `__add__` and `__sub__` methods make addition **(click)** and subtraction **(click)** work on our class
- And that `__iter__` method makes the implementation of those other two methods a bit simpler
- Sometimes people call dunder methods *magic methods*
- Dunder methods are **not** magical and they're not scary: dunder methods can make classes friendlier
- *(pause)*
- Before we wrap up, I'd like you to consider...
Should I make a class?
- Whether you even need classes
class Matrix:
"""Turn a string into a matrix-like thing."""
def __init__(self, string):
self.string = string
@property
def rows(self):
return [
[float(x) for x in row.split()]
for row in self.string.splitlines()
]
@property
def columns(self):
return [
list(column)
for column in zip(*self.rows)
]
>>> matrix = Matrix("9 8 7\n5 3 2\n6 6 7")
>>> matrix.rows
[[9.0, 8.0, 7.0], [5.0, 3.0, 2.0], [6.0, 6.0, 7.0]]
>>> matrix.columns
[[9.0, 5.0, 6.0], [8.0, 3.0, 6.0], [7.0, 2.0, 7.0]]
- This is a class that represents a matrix **(click)**
- This class accepts **(click)** a string that represents a matrix of numbers, separated by spaces and newline characters
- When we construct a `Matrix` from our class, that `matrix` object will have **(click)** a rows attribute that gives us back the rows in our matrix
- And it also has **(click)** a columns attribute that gives us back a *transposed* version of those rows
- *This class* **does not** need to exist
- I say that, because we could *replace* this class...
def matrix_from_string(string):
"""Convert rows of numbers to list of lists."""
return [
[float(x) for x in row.split()]
for row in string.splitlines()
]
def transpose(matrix):
"""Return a transposed version of given list of lists."""
return [
list(column)
for column in zip(*matrix)
]
>>> matrix = matrix_from_string("9 8 7\n5 3 2\n6 6 7")
>>> matrix
[[9.0, 8.0, 7.0], [5.0, 3.0, 2.0], [6.0, 6.0, 7.0]]
>>> transpose(matrix)
[[9.0, 5.0, 6.0], [8.0, 3.0, 6.0], [7.0, 2.0, 7.0]]
- With two functions:
- matrix_from_string **(click)** and transpose **(click)**
- We can call that `matrix_from_string` function the **(click)** same way we called our `Matrix` class constructor before
- But instead of giving us a `matrix` object back, **(click)** it gives us back a list of lists of numbers, just like our `rows` attribute did on our `matrix` object
- If we pass that list of lists to the `transpose` function **(click)**, we'll get back the transpose of that matrix, just like we did when we used the `columns` attribute on the matrix that we had before
- *(pause)*
- You **do not** always need classes
- Friendly classes are great, but the friendliest code sometimes doesn't have classes at all
Advanced Recipes
- So I have even more recipes, but they're kind of advanced and I'm going to skip over these slides entirely because I don't have time to go through them
- I will tweet out a link to the slides later in case you're interested in them though
- **[skip this section]**
Metadata on fields!
from dataclasses import dataclass, field, fields
@dataclass
class Point:
x: float = field(metadata={'iter': True})
y: float = field(metadata={'iter': True})
z: float = field(metadata={'iter': True})
color: str
def __iter__(self):
return (
getattr(self, field.name)
for field in fields(self)
if field.metadata.get('iter')
)
>>> x, y, z = Point(1, 2, 3, color='red')
>>> (x, y, z)
(1, 2, 3)
- The dataclasses library has a `field` function that can be used to store metadata about each of our fields
- metadata allows you to attach information to your class fields and kind of extend the functionality of dataclasses
- Here we're making **(click)** some of our attributes iterable and **(click)** some of them not
- **(click)** These `field` objects can be used for lots of other things
- One example is compare, which can be used to make some fields non-comparable, which could be handy in some cases
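To see what that metadata looks like from the outside, here is a minimal sketch that inspects it with `fields()`. The `"black"` default on `color` is an assumption added so the snippet stays short; it is not part of the slide.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Point:
    x: float = field(metadata={"iter": True})
    y: float = field(metadata={"iter": True})
    z: float = field(metadata={"iter": True})
    color: str = "black"  # assumed default, added for this sketch


# fields() accepts the class itself as well as an instance.
iterable_names = [f.name for f in fields(Point) if f.metadata.get("iter")]

# Fields declared without metadata get an empty read-only mapping.
color_metadata = dict(fields(Point)[3].metadata)
```

The `dataclasses` module never interprets the metadata itself; it only stores it, so keys like `"iter"` mean whatever our own code decides they mean.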
Non-comparable fields!
from dataclasses import dataclass, field, fields
@dataclass
class Point:
x: float = field(metadata={'iter': True})
y: float = field(metadata={'iter': True})
z: float = field(metadata={'iter': True})
color: str = field(compare=False)
def __iter__(self):
return (
getattr(self, field.name)
for field in fields(self)
if field.metadata.get('iter')
)
>>> p1 = Point(1, 2, 3, color='red')
>>> p2 = Point(1, 2, 3, color='blue')
>>> p1 == p2
True
- Adding `compare=False` to a dataclass field **(click)** removes that field from comparisons
- So if we have **(click)** two points that have the same coordinates but one is red and one is blue
- And we ask whether these two points are equal **(click)**
- We'll see that **(click)** they are because we're ignoring the color when we compare them
Dynamic default values!
from dataclasses import dataclass, field, fields
import random
def random_color(): return random.choice(['purple', 'blue', 'red'])
@dataclass
class Point:
x: float = field(metadata={'iter': True})
y: float = field(metadata={'iter': True})
z: float = field(metadata={'iter': True})
color: str = field(default_factory=random_color, compare=False)
def __iter__(self):
...
>>> p1 = Point(1, 2, 3)
>>> p1.color
'blue'
- `default_factory` is another advanced feature of data class fields
- With `default_factory` we can specify a function that chooses a random color
- So that when we **(click)** create a point without a color specified **(click)**, it'll call that `random_color` function to randomly choose a color for us
- Alright just one more advanced feature of data classes...
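Random colors aside, a very common use of `default_factory` is giving each instance its own mutable default. This is a minimal sketch; the `Polygon` and `Broken` classes are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Polygon:  # hypothetical class, for illustration only
    name: str
    points: list = field(default_factory=list)  # list() is called per instance


first = Polygon("triangle")
second = Polygon("square")
first.points.append((0, 0))  # only changes the first polygon's list

# Writing the default as a bare list is rejected when the class is created.
try:
    @dataclass
    class Broken:
        points: list = []
except ValueError:
    bare_list_rejected = True
else:
    bare_list_rejected = False
```

Because the factory runs once per instance, the two polygons never share a list.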
Auto-created fields
from dataclasses import dataclass, field, fields
import random
def random_color(): return random.choice(['purple', 'blue', 'red'])
@dataclass
class Point:
x: float = field(metadata={'iter': True})
y: float = field(metadata={'iter': True})
z: float = field(metadata={'iter': True})
color: str = field(compare=False, init=False)
def __post_init__(self):
self.color = random_color()
def __iter__(self):
...
>>> point = Point(1, 2, 3)
>>> point.color
'purple'
>>> point = Point(1, 2, 3, color='blue')
TypeError: __init__() got an unexpected keyword argument 'color'
- Auto-created fields
- We've already made a data class that sets **(click)** a dynamic default value for **(click)** our `color` attribute
- This class goes a step further by disallowing that color attribute from being specified manually: **(click)** it gives us an error if we try to pass it in
- This works by setting **(click)** `init=False` on our dataclass `field` and creating a custom **(click)** dunder post_init method, which is a dataclass-specific dunder method that gets called by dunder init
- **(click)** These are all pretty complex things I just flashed by and if you actually need any of those features, you can consult the dataclasses documentation on your own
- Before we wrap up, I'd like you to consider...