July 2, 2025
Let’s assume we have written a program that analyzes genome data.
It runs fine on our computer! 😎
Now we want to share it with other people.
Easy! Just send the instructions:
Dang.
❯ genome_analyzer their_input_data.fasta
Traceback (most recent call last):
  File "__main__.py", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "genome_analyzer/core.py", line 5, in main
    compute_inverse_ratio()
    ~~~~~~~~~~~~~~~~~~~~~^^
  File "genome_analyzer/core.py", line 15, in compute_inverse_ratio
    return total/count
           ~~~~~^~~~~~
ZeroDivisionError: division by zero
We only tested with our own data. 🤦
Or worse:
❯ genome_analyzer my_input_data.fasta
Traceback (most recent call last):
  File "__main__.py", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "genome_analyzer/core.py", line 8, in main
    reference = reference.read_text()
  File "Lib/pathlib/_local.py", line 546, in read_text
    return PathBase.read_text(self, encoding, errors, newline)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: No such file or directory: 'gene_markers.csv'
Some file is missing. 🤔
Avoid nasty surprises with automated testing.
On your machine and anywhere else.
❯ pytest
========== test session starts ===================
rootdir: GenomeAnalyzer
configfile: pyproject.toml
testpaths: tests
collected 2 items
test_read_reference PASSED [1/2]
test_compute_inverse_ratio PASSED [2/2]
========== 2 passed in 0.03s =====================
For the user:
For the developer:
Given-When-Then [Clean Code, 2008]
(Clean-up: Remove side-effects, reset to original state.)
Arrange-Act-Assert [Test Driven Development, 2002]
Same thing, different name (give or take).
solvers.py:
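A minimal sketch of what this module might contain, assuming the linear-equation solver (solving k·x + d = 0 for x) that the tests below exercise:

def solve_linear_equation(k: float, d: float) -> float:
    """Return the solution x of k*x + d = 0."""
    return -d / k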
test_solver.py:
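Likewise, a sketch of the test file, structured along Given-When-Then (Arrange-Act-Assert):

from solvers import solve_linear_equation

def test_solve_linear_equation():
    # Given (Arrange): coefficients with an exactly representable solution.
    k, d = 2.0, -4.0
    # When (Act): solve the equation.
    result = solve_linear_equation(k=k, d=d)
    # Then (Assert): 2*x - 4 = 0 has the solution x = 2.
    assert result == 2.0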
Tests may fail in non-obvious ways!
End-to-end test: a.k.a. system test, integration test.
Ideal starting point!
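An end-to-end test can simply run the whole program the way a user would. A sketch, assuming a small input file ships with the repository; the expected output is an assumption:

import subprocess

def test_end_to_end():
    # Run the installed command on a known input (path is an assumption).
    result = subprocess.run(
        ['genome_analyzer', 'tests/data/small_input.fasta'],
        capture_output=True,
        text=True,
    )
    assert result.returncode == 0
    assert 'ratio' in result.stdout  # Hypothetical expected output.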
Write at least one unit test for every function.
Advanced features: for example, parametrized tests with edge-case values (0, nan, "") and checks on captured output (.stdout), in Python as well as in R.
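One such feature in pytest is parametrization, which runs one test over many inputs. A sketch with illustrative values for the solve_linear_equation example above:

import pytest

from solvers import solve_linear_equation

@pytest.mark.parametrize(
    'k, d, expected',
    [
        (1.0, 0.0, 0.0),    # Zero intercept.
        (2.0, -4.0, 2.0),   # Integer solution.
        (-0.5, 1.0, 2.0),   # Negative gradient.
    ],
)
def test_solve_linear_equation_parametrized(k, d, expected):
    assert solve_linear_equation(k=k, d=d) == pytest.approx(expected)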
Write your tests FIRST. [Clean Code]
Prepare for 1000s of tests.
Write tests as soon as possible!
Even better:
Write tests first and only then the implementation!
But…
solvers.py:
test_solver.py:
Run test(s):
❯ pytest
========= test session starts ===================
collected 1 item
test_solver.py . [100%]
========= 1 passed in 0.04s =====================
Add a test for division by zero to test_solver.py.
from solvers import solve_linear_equation
from pytest import raises

def test_raises_for_gradient_zero():
    with raises(ZeroDivisionError):
        solve_linear_equation(k=0, d=5)
❯ pytest
========= test session starts ===================
collected 2 items
test_solver.py .. [100%]
========= 2 passed in 0.04s =====================
Let’s add another test, same as the first,
just with other (non-zero) values.
def test_solve_linear_equation_other_values():
    result = solve_linear_equation(k=9, d=2.7)
    assert result == -0.3
Should work, right?
–2.7 / 9 = –0.3 ✔
❯ pytest
========= test session starts ===================
collected 3 items
test_solver.py ..F [100%]
========= FAILURES ==============================
____ test_solve_linear_equation_other_values ____
    def test_solve_linear_equation_other_values():
        result = solve_linear_equation(k=9, d=2.7)
>       assert result == -0.3
E       assert -0.30000000000000004 == -0.3

test_solver.py:21: AssertionError
========= 1 failed, 2 passed in 0.18s ============
Compare floating-point numbers with a tolerance.
In Python, we can (for example) use math.isclose().
from math import isclose

def test_solve_linear_equation_other_values():
    result = solve_linear_equation(k=9, d=2.7)
    assert isclose(result, -0.3)
Other test frameworks (in other programming languages) typically have a convenience function to “assert almost equal”.
pytest also has approx() (from pytest import approx):
assert result == approx(-0.3)
test_my_package.py (extract):
from io import StringIO
from pathlib import Path

from pandas import read_csv

def test_workflow():
    # Given
    config_file = Path('test_config.yaml')
    input_file = Path('test_input.csv')
    output_file = Path('test_result.csv')
    dataframe = read_csv(
        StringIO("""
        a b c
        17.1 33.9 88.2
        0.0 2.5 1.0
        """),
        sep=r'\s+',
    )
    dataframe.to_csv(input_file, index=False)
    with config_file.open('w') as stream:
        stream.write(f'input_file: {input_file}\n')
        stream.write(f'output_file: {output_file}\n')
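The extract stops after the Given step. A hedged sketch of how the When, Then, and Clean-up steps could continue inside test_workflow(), assuming a hypothetical run_workflow(config_file) entry point in my_package:

    # When
    run_workflow(config_file)  # Hypothetical entry point; import it from my_package.

    # Then
    assert output_file.exists()
    result = read_csv(output_file)
    assert len(result) == 2  # Assumption: one output row per input row.

    # Clean-up
    for file in (config_file, input_file, output_file):
        file.unlink(missing_ok=True)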
You will face some common challenges.
I don’t know how to get started.
Break the vicious circle by adding end-to-end tests.
I can’t test hidden/private code.
I can’t test every combination of inputs.
Test until you are confident that your code works.
I don’t understand what a function does.
Unit testing can help.
The function has 20 parameters.
Time to refactor!
The function needs to read a file.
There are several options …
Create an input file for testing purposes.
from reader import read_file

TEST_FILE = "./data/example.txt"  # Contains "abc".

def test_returns_read_string_twice():
    assert read_file(TEST_FILE) == 'abcabc'
Downsides:
Create the file as needed:
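For example, with pytest's tmp_path fixture (a sketch; read_file is the function from above):

from reader import read_file

def test_returns_read_string_twice_tmp_path(tmp_path):
    # tmp_path is a fresh temporary directory provided by pytest.
    test_file = tmp_path / 'example.txt'
    test_file.write_text('abc')
    assert read_file(str(test_file)) == 'abcabc'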
“Mock” the function that reads the file.
from unittest import mock
from reader import read_file
def test_returns_read_string_twice_mocked():
    with mock.patch('builtins.open', mock.mock_open(read_data='abc')):
        assert read_file('any_file.txt') == 'abcabc'  # Path is irrelevant: open() is mocked.
This uses mock from Python's built-in unittest framework.
Most frameworks, including pytest, have something similar.
In R: the mockery package and with_mock.
The function writes to a file.
Let it. But write into a temp directory (see the sketch after this list).
The function accesses a database.
Same strategies: provide a test database or mock it.
The function creates an image.
Tricky. Avoid as much as possible.
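A sketch of the temp-directory approach for a function that writes a file, assuming a hypothetical write_report(path) function under test:

from report import write_report  # Hypothetical module and function.

def test_writes_report(tmp_path):
    # Let the function write, but only into pytest's temporary directory.
    output_file = tmp_path / 'report.txt'
    write_report(output_file)
    assert output_file.exists()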
GitHub Actions
.github/workflows/python-ci.yml
name: Test commit
on: [push, pull_request, workflow_dispatch]

jobs:
  test:
    strategy:
      matrix:
        python: ["3.10", "3.13"]
    runs-on: ubuntu-latest
    steps:
      - name: Check out code.
        uses: actions/checkout@v4
      - name: Install Python ${{ matrix.python }}.
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}
      - name: Install project.
        run: pip install .
      - name: Run test suite.
        run: pytest
Build Pipelines
.gitlab-ci.yml
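A minimal sketch of an equivalent GitLab configuration, assuming the official python Docker images and the same pip/pytest invocation as above:

test:
  parallel:
    matrix:
      - PYTHON_VERSION: ["3.10", "3.13"]
  image: python:${PYTHON_VERSION}
  script:
    - pip install .
    - pytest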
Always test your code!
(At least with one end-to-end test.)