Creating performance tests for Python Morsels exercises is a frequent annoyance
I loathe writing automated tests for performance-related exercises because they’re always flaky. How flaky depends on the exercise, what I’m testing, and the time variability inherent in the particular Python features that a learner might use.
I came up with a solution for flaky tests recently, but it also makes my tests less readable. I then came up with a tool to improve the readability, but that has its own trade-offs.
The code I eventually came up with is a beautiful Python monstrosity.
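It looks something like this (an excerpt from the final version of one of these tests, with `time_solution` and the list fixtures standing in as placeholder names):

```python
@attempt_n_times(10)
def _():
    nonlocal micro_time, tiny_time          # assign to the enclosing test's variables
    micro_time = time_solution(micro_list)  # time the user's code on a very small list
    tiny_time = time_solution(tiny_list)    # ...and on a slightly larger one
    assert tiny_time < micro_time * n       # the larger list shouldn't be much slower
```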
I’ll explain what that code does, but first let’s talk about why it’s needed.
The flaky performance tests
My flaky performance tests initially looked like this:
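A sketch of the shape of that test (the `time_solution` helper and the list fixtures are placeholders for the real Python Morsels test utilities):

```python
# time_solution(some_list) stands in for the real helper that times
# the learner's function on the given list
def test_performance():
    n, m = 4, 3  # the tolerance factors discussed below

    # Block 1: a very small list vs. a slightly larger one
    micro_time = time_solution(micro_list)    # e.g. 10 items
    tiny_time = time_solution(tiny_list)      # e.g. 100 items
    assert tiny_time < micro_time * n

    # Block 2: an even larger list
    small_time = time_solution(small_list)    # e.g. 1,000 items
    assert small_time < tiny_time * m

    # Block 3: larger still
    medium_time = time_solution(medium_list)  # e.g. 10,000 items
    assert medium_time < small_time * m
```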
The first block times the user’s function on a very small list and on a slightly larger list, and then asserts that the slightly larger list didn’t take too much longer to process. The next two blocks run the same code on even larger lists and make further assertions about the relative times the code took to run.
Together, these checks roughly approximate the time complexity of the user’s code.
Running performance checks in a loop
These performance checks need to:
- Predictably fail for inefficient solutions
- Predictably pass for efficient solutions
- Run fast (within just a few seconds) even when the code is inefficient
- Avoid the use of `threading`, because they’ll be running on WebAssembly in the browser
- Run consistently on pretty much any computer
These 5 requirements together have caused me countless headaches. I get the tests passing reliably, but then they don’t always fail when they should. I get the tests both failing and passing when they should, but then they’re too slow. And so on…
Notice the `n` and `m` factors in the above assertions:
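For example, from the sketch above:

```python
assert tiny_time < micro_time * n
assert small_time < tiny_time * m
```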
If `n` and `m` are too big, we’ll get false positives (tests passing when they should fail).
If `n` and `m` are too small, we’ll get false negatives (tests failing when they should pass).
To avoid both Type I and Type II errors, I decided to keep `n` and `m` small but attempt the assertion block multiple times.
Here’s the (far less flaky) revised code:
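Continuing the same sketch:

```python
def test_performance():
    n, m = 4, 3

    # Block 1: compare a very small list to a slightly larger one,
    # retrying up to 10 times before treating a failure as real
    for attempts_left in reversed(range(10)):
        try:
            micro_time = time_solution(micro_list)
            tiny_time = time_solution(tiny_list)
            assert tiny_time < micro_time * n
            break                      # all assertions passed: stop retrying
        except AssertionError:
            if not attempts_left:
                raise                  # final attempt failed: report it

    # Block 2: an even larger list
    for attempts_left in reversed(range(10)):
        try:
            small_time = time_solution(small_list)
            assert small_time < tiny_time * m
            break
        except AssertionError:
            if not attempts_left:
                raise

    # Block 3: larger still
    for attempts_left in reversed(range(10)):
        try:
            medium_time = time_solution(medium_list)
            assert medium_time < small_time * m
            break
        except AssertionError:
            if not attempts_left:
                raise
```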
The `for` loop runs the code multiple times, the `break` statement stops the code as soon as the assertions all pass, and the `except` and `if` ensure that any assertion errors are suppressed until/unless we’re on the final iteration of the loop.
Let’s call this a `for`-`try`-`break`-`except`-`if`-`raise` pattern.
It’s an absurdly verbose name, befitting absurdly verbose code.
This `for`-`try`-`break`-`except`-`if`-`raise` pattern works pretty well!
But it’s not pretty.
Like many programmers, I believe that Don’t Repeat Yourself (DRY) need not apply to tests. Tests are allowed to be repetitive if the verbosity improves readability.
But there is so much noise in that code! I decided that removing some noise might improve readability. So I devised a helper utility to reduce the repetition.
In search of a solution
While pondering the repetitive noise in this code, I wondered what Python features I could use to abstract away this `for`-`try`-`break`-`except`-`if`-`raise` pattern.
Could I make a context manager and use a `with` block?
That might help with the `try`-`except`, but context managers can’t run their code block multiple times, so that wouldn’t help with the `for` and the `break`.
So a context manager is out.
Could I abstract this away into a looping helper by implementing a generator function?
We are looping, after all, and a loop over a generator can `break` early.
But a generator function can’t catch an exception that’s raised within the body of the loop that’s iterating over it.
So a generator function wouldn’t work either.
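To see why, here’s a hypothetical `attempts` generator: the assertion runs in the caller’s loop body, so the generator’s `except` clause never gets a chance to run:

```python
def attempts(n):
    for attempts_left in reversed(range(n)):
        try:
            yield                  # control returns to the caller's loop body here
        except AssertionError:     # never triggered: exceptions raised in the
            if not attempts_left:  # loop body aren't thrown into the generator
                raise

for _ in attempts(10):
    assert perform_checks()  # hypothetical checks; a failure here propagates
                             # straight up and the generator never sees it
```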
What about a decorator? 🤔
Context managers and decorators both sandwich a block of code. But decorators sandwich functions and they have the power to run the same function repeatedly. A decorator might work!
Here’s a decorator that will run a given function up to 10 times (until no `AssertionError` is raised):
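A sketch of such a decorator:

```python
from functools import wraps

def attempt_10_times(function):
    """Retry the function until its assertions pass (at most 10 attempts)."""
    @wraps(function)
    def wrapper(*args, **kwargs):
        for attempts_left in reversed(range(10)):
            try:
                return function(*args, **kwargs)
            except AssertionError:
                if not attempts_left:
                    raise  # out of attempts: let the failure propagate
    return wrapper
```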
To use this decorator, we would need to define a function and then call that function:
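Continuing the sketch:

```python
@attempt_10_times
def run_performance_checks():
    micro_time = time_solution(micro_list)
    tiny_time = time_solution(tiny_list)
    assert tiny_time < micro_time * n

run_performance_checks()
```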
This isn’t quite good enough though…
- We need a pattern to run code N times (not necessarily exactly 10)
- We reference the variables defined in each block in later blocks, so `micro_time` and `tiny_time` will need to be available outside that function
- We need this function to run just one time, right after it’s defined… could we do that automatically?
All 3 of these problems are solvable:
- We need a decorator that accepts arguments
- We need to use the rarely seen `nonlocal` statement
- We could have the decorator automatically call the decorated function
The final weird decorator
Here’s the decorator I ended up with:
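In sketch form (the structure follows the description below; details like the exact attempt counting are illustrative):

```python
def attempt_n_times(n):
    """Attempt the decorated function up to n times, until its assertions pass.

    This isn't a typical decorator: it calls the decorated function
    immediately instead of returning a wrapper around it.
    """
    def decorator(function):
        # No wrapper function here: we just run the code right away
        for attempts_left in reversed(range(n)):
            try:
                return function()  # the decorated name ends up bound to this return value
            except AssertionError:
                if not attempts_left:
                    raise          # final attempt failed: report the assertion error
    return decorator
```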
This decorator accepts an `n` argument, which determines the maximum number of times the decorated function should be called.
The decorator then calls the function repeatedly, in a `for` loop and a `try`-`except` block.
As soon as one of those calls finishes without raising an `AssertionError`, the looping stops.
The weirdest part about this decorator is that it calls the decorated function.
Note that the `decorator` function doesn’t define a `wrapper` function within itself… it just runs code right away!
The resulting beautiful Python monstrosity
Here’s the final refactored test code:
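Continuing the same sketch:

```python
def test_performance():
    n, m = 4, 3
    micro_time = tiny_time = small_time = medium_time = None  # bindings for nonlocal below

    @attempt_n_times(10)
    def _():
        nonlocal micro_time, tiny_time
        micro_time = time_solution(micro_list)
        tiny_time = time_solution(tiny_list)
        assert tiny_time < micro_time * n

    @attempt_n_times(10)
    def _():
        nonlocal small_time
        small_time = time_solution(small_list)
        assert small_time < tiny_time * m

    @attempt_n_times(10)
    def _():
        nonlocal medium_time
        medium_time = time_solution(medium_list)
        assert medium_time < small_time * m
```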
The `attempt_n_times` decorator immediately calls the function it decorates.
Each function is defined and immediately called one or more times, in a `try`-`except` block within a loop.
That’s why we’ve named these functions with the throwaway `_` name: we don’t care about the name of a function we’re never going to refer to again.
Also note the use of the `nonlocal` statement.
Each function in Python has its own scope, and all assignments bind names in the local scope by default.
That `nonlocal` statement makes those assignments target the enclosing function’s scope instead.
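A minimal illustration:

```python
def outer():
    timing = None
    def measure():
        nonlocal timing  # without this line, "timing = 1.23" would create
        timing = 1.23    # a brand-new local variable inside measure()
    measure()
    return timing        # 1.23, because measure() assigned to outer()'s variable

print(outer())  # 1.23
```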
Compare the above code to the code just before this refactor:
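Here’s the earlier sketch again, for comparison:

```python
def test_performance():
    n, m = 4, 3

    # Block 1: retry up to 10 times before treating a failure as real
    for attempts_left in reversed(range(10)):
        try:
            micro_time = time_solution(micro_list)
            tiny_time = time_solution(tiny_list)
            assert tiny_time < micro_time * n
            break
        except AssertionError:
            if not attempts_left:
                raise

    # Block 2
    for attempts_left in reversed(range(10)):
        try:
            small_time = time_solution(small_list)
            assert small_time < tiny_time * m
            break
        except AssertionError:
            if not attempts_left:
                raise

    # Block 3
    for attempts_left in reversed(range(10)):
        try:
            medium_time = time_solution(medium_list)
            assert medium_time < small_time * m
            break
        except AssertionError:
            if not attempts_left:
                raise
```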
I find the refactored version easier to skim.
But that `attempt_n_times` decorator does abuse the decorator syntax.
Decorators aren’t meant to call the function they’re decorating.
Is this misuse of decorators worth it?
Is this worth it?
Decorators aren’t supposed to immediately call the function they decorate. But there’s nothing stopping them from doing so. I feel that I’ve traded “normal code” for a beautiful monstrosity that’s easier to skim at a glance.
The `attempt_n_times` decorator is pretending to be a block-level tool; it uses a function because there’s no other way to invent such a tool in Python.
I think abstracting away the `for`-`try`-`break`-`except`-`if`-`raise` pattern was worth it, even though I ended up abusing Python’s decorator syntax in the process.
What do you think?
Was that `attempt_n_times` abstraction worth it?