Automated Testing

July 2, 2025

Introduction

Motivation

Let’s assume we have written a program that analyzes genome data.

It is:

  • Available in a Git repository.
  • Properly packaged to be installable.
 git clone https://github.com/sib/GenomeAnalyzer.git
 pip install GenomeAnalyzer
 genome_analyzer my_genome_data.fasta
Analyzed all the data. Whoo-hoo!

Runs fine on our computer! 😎

Now we want to share it with other people.
Easy! Just send the instructions:

 git clone https://github.com/sib/GenomeAnalyzer.git
 pip install GenomeAnalyzer
 genome_analyzer your_genome_data.fasta

Dang.

❯ genome_analyzer their_input_data.fasta
Traceback (most recent call last):
  File "__main__.py", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "genome_analyzer/core.py", line 5, in main
    compute_inverse_ratio()
    ~~~~~~~~~~~~~~~~~~~~~^^
  File "genome_analyzer/core.py", line 15, in compute_inverse_ratio
    return total/count
           ~~~~~^~~~~~
ZeroDivisionError: division by zero

We only tested with our own data. 🤦

Or worse:

❯ genome_analyzer my_input_data.fasta
Traceback (most recent call last):
  File "__main__.py", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "genome_analyzer/core.py", line 8, in main
    reference = reference.read_text()
  File "Lib/pathlib/_local.py", line 546, in read_text
    return PathBase.read_text(self, encoding, errors, newline)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: No such file or directory: 'gene_markers.csv'

Some file is missing. 🤔

Tests!

Avoid nasty surprises with automated testing.
On your machine and anywhere else.

❯ pytest
========== test session starts ===================
rootdir: GenomeAnalyzer
configfile: pyproject.toml
testpaths: tests
collected 2 items

test_read_reference          PASSED          [1/2]
test_compute_inverse_ratio   PASSED          [2/2]

========== 2 passed in 0.03s =====================

Why tests are important

  • For the user:

    • Instill confidence in the project.
    • Promise stability of output results.
  • For the developer:

    • Make bug reporting easier.
    • Find bugs before users do.
    • Safety harness when refactoring.
    • Make (future) changes faster.

Testing Basics

Pattern

Given-When-Then [Clean Code, 2008]

  • Given: Set up environment, provide input data.
  • When: Execute what you want to test.
  • Then: Compare actual result with expected result.

(Clean-up: Remove side-effects, reset to original state.)

Pattern

Arrange-Act-Assert [Test Driven Development, 2002]

  • Arrange: Set up environment, provide input data.
  • Act: Execute what you want to test.
  • Assert: Compare actual result with expected result.

Same thing, different name (give or take).

Example

solver.py:

def solve_linear_equation(k, d):
    """Solves k * x + d = 0 for x."""
    return -d / k

Test

test_solver.py:

def test_solve_linear_equation():

    # GIVEN
    nonzero_gradient = 2
    nonzero_offset   = 5

    # WHEN
    result = solve_linear_equation(nonzero_gradient, nonzero_offset)

    # THEN
    expected_result = -2.5
    assert result == expected_result
>>> from test_solver import test_solve_linear_equation
>>> test_solve_linear_equation()
>>>

Note

Tests may fail in non-obvious ways!

def test_solve_linear_equation():
    nonzero_gradient = 9
    nonzero_offset   = 2.7

    result = solve_linear_equation(nonzero_gradient, nonzero_offset)

    expected_result = -0.3
    assert result == expected_result
>>> test_solve_linear_equation()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in test_solve_linear_equation
AssertionError

Test Classification

Classification

  • End-to-end Test
  • Unit Test
  • Integration Test
  • Regression Test

End-to-end Test

A.k.a. system test; sometimes also (confusingly) called integration test.

  • Tests the complete program.
  • From the user’s perspective.
  • Okay if slow.
  • As there will be few of them.

Ideal starting point!

Unit Test

  • Tests a “single unit”, such as a function or method.
  • Takes the developer’s perspective.
  • There may be many (hundreds).
  • They should/will be fast.

Write at least one unit test for every function.

Integration Test

  • Tests combinations and interactions of components.
  • Somewhere between end-to-end and unit test.
  • Mostly relevant for complex applications.

Regression Test

  • Designed to avoid “regression” of undesired behavior.
  • Important for refactoring.
  • Often extensive testing of full application range.
  • Covers a large spectrum of the input space.
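
A regression test simply pins behavior that is already known to be good, so later changes cannot silently alter it. A sketch with a hypothetical normalize() function (not from the genome example):

```python
def normalize(values):
    """Scale values so they sum to one."""
    total = sum(values)
    return [value / total for value in values]

def test_normalize_regression():
    # Expected output recorded from a run that was verified by hand.
    # If a refactoring changes it, this test flags the regression.
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]
```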

Testing Frameworks

Example (as before)

def solve_linear_equation(k, d):
    """Solves k * x + d = 0 for x."""
    return -d / k
def test_solve_linear_equation():
    nonzero_gradient = 2
    nonzero_offset   = 5

    result = solve_linear_equation(nonzero_gradient, nonzero_offset)

    expected_result = -2.5
    assert result == expected_result

Challenges

  • Separation of tests and your actual code.
  • There may be hundreds of tests.
  • We would also like some tooling that:
    • Handles entire arrays, dataframes/tables, etc.
    • Tests floating-point numbers are “almost equal”.
    • Tests errors are raised when they should be.
    • … and more.

Testing Frameworks

  • Find and run all tests in the code base.
  • Output concise report of passing tests.
  • Provide insight into failing tests.
  • Convenience methods for comparing expectation with actual event (“assertions”).

Testing Frameworks

Advanced features:

  • Convenience functions for parameterized tests.
  • Helpers that heuristically generate corner cases based on data type (e.g. using 0, nan, "").
  • Coverage report:
    Which lines were executed by the test suite?
  • Fixtures such as capturing stdout.
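
In pytest, a parameterized test looks like this: one test function, several input/expectation pairs. A sketch reusing the solver example from above (pytest.mark.parametrize and math.isclose are real APIs):

```python
import math
import pytest

def solve_linear_equation(k, d):
    """Solves k * x + d = 0 for x."""
    return -d / k

# The test function runs once per parameter tuple.
@pytest.mark.parametrize("k, d, expected", [
    (2, 5, -2.5),
    (9, 2.7, -0.3),
    (-1, 4, 4.0),
])
def test_solve_linear_equation(k, d, expected):
    assert math.isclose(solve_linear_equation(k, d), expected)
```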

Testing Frameworks

Python: pytest, unittest (standard library)

R: testthat

Best Testing Practices

Good Tests

Write your tests FIRST. [Clean Code]

  • F ast
  • I solated
  • R epeatable
  • S elf-validating
  • T imely

Fast

Prepare for 1000s of tests.

  • Aim at a few milliseconds for unit tests,
    a few seconds tops for end-to-end tests.
  • We will run all tests often.
  • We will write many of them.

Isolated

  • independent of each other
  • independent of behaviour tested elsewhere
  • independent of machine
  • independent of external data/network

Repeatable

  • Same result every time we run it.
  • No random input, fix “seeds”.
  • No side-effects.
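
“Fix seeds” means seeding the random number generator explicitly, so the “random” input is identical on every run. A stdlib sketch with a hypothetical noisy_sample() function:

```python
import random

def noisy_sample(rng):
    """Hypothetical function that depends on random input."""
    return rng.random()

def test_noisy_sample_is_repeatable():
    # Same seed, same sequence: the test result never varies.
    assert noisy_sample(random.Random(42)) == noisy_sample(random.Random(42))
```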

Self-Validating

  • The test itself asserts that the result was as expected.
  • No manual work involved.
  • No need to look at results if the test succeeds.

Timely

Write tests as soon as possible!

  • You still know why you wrote the code that way.
  • Corner cases are fresh on your mind.
  • You are manually testing your code anyway.

Even better:
Write tests first and only then the implementation!

Best Practices

  • Keep input data minimal and obvious.
  • Mind the resource consumption: CPU, memory.
  • Don’t forget to test side-effects: changed data, logs, warnings.
  • If possible, factor out common test inputs into a “test setup”.
  • Take some time to consider corner cases.
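
In pytest, a shared “test setup” is typically written as a fixture. A sketch with hypothetical data and a hypothetical helper:

```python
import pytest

@pytest.fixture
def sample_records():
    # Minimal, obvious input data shared by several tests.
    return [{"gene": "BRCA1", "count": 3}, {"gene": "TP53", "count": 0}]

def count_expressed(records):
    """Hypothetical helper under test."""
    return sum(1 for record in records if record["count"] > 0)

def test_count_expressed(sample_records):
    # pytest injects the fixture's return value as the argument.
    assert count_expressed(sample_records) == 1
```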

But…

  • Don’t test third-party code, focus on yours.

Development

  • Writing code and writing tests goes hand-in-hand.
  • Software development is an iterative process.
  • Develop tests alongside your application, not afterwards.

Unit Tests

Python example with pytest

  • We recommend pytest for testing Python code.
  • An alternative is unittest from the standard library.
  • It is similar to JUnit in Java, and thus less “pythonic”.
  • There is a testing framework for every programming language.

Example (same still)

solvers.py:

def solve_linear_equation(k, d):
    """Solves k * x + d = 0 for x."""
    return -d / k

Test Case

test_solvers.py:

from solvers import solve_linear_equation

def test_solve_linear_equation():
    """Checks solution for k = 2 and d = 5"""
    result = solve_linear_equation(k=2, d=5)
    assert result == -2.5

Test Runner

Run test(s):

❯ pytest
========= test session starts ===================
collected 1 item

test_solvers.py .                          [100%]

========= 1 passed in 0.04s =====================

Corner Cases

Add a test for division by zero to test_solvers.py.

from solvers import solve_linear_equation
from pytest  import raises

def test_raises_for_gradient_zero():
    with raises(ZeroDivisionError):
        solve_linear_equation(k=0, d=5)
❯ pytest
========= test session starts ===================
collected 2 items

test_solvers.py ..                         [100%]

========= 2 passed in 0.04s =====================

Corner Cases

Let’s add another test, same as the first,
just with other (non-zero) values.

def test_solve_linear_equation_other_values():
    result = solve_linear_equation(k=9, d=2.7)
    assert result == -0.3

Should work, right?
-2.7 / 9 = -0.3 ✔

Corner Cases

❯ pytest
========= test session starts ===================
collected 3 items

test_solvers.py ..F                         [100%]

========= FAILURES ==============================
____ test_solve_linear_equation_other_values ____

    def test_solve_linear_equation_other_values():
        result = solve_linear_equation(k=9, d=2.7)
>       assert result == -0.3
E       assert -0.30000000000000004 == -0.3
test_solvers.py:21: AssertionError
========= 1 failed, 2 passed in 0.18s ============

Corner Cases

Compare floating-point numbers with a tolerance.
In Python, we can (for example) use math.isclose().

from math import isclose

def test_solve_linear_equation_other_values():
    result = solve_linear_equation(k=9, d=2.7)
    assert isclose(result, -0.3)

Other test frameworks (in other programming languages) typically have a convenience function to “assert almost equal”.

pytest also has approx(): assert result == approx(-0.3).

End-to-end Test

End-to-end Test

  • Testing the software from a user’s perspective.
  • We should have one for each workflow / use case.
  • Parameter space coverage is not the focus,
    that’s where unit tests come in.
  • Input data should be as small as possible,
    but not trivial.

Workflow

my_package.py (extract)

from pandas import read_csv, read_excel
def workflow(config_file=None):
    config  = read_config(config_file)
    data    = read_data(config)
    results = process(data, config)
    write_result(results, config)
def read_data(config):
    input_path = config['input_file']
    if input_path.endswith('.csv'):
        return read_csv(input_path)
    elif input_path.endswith('.xlsx'):
        return read_excel(input_path)
    else:
        raise NotImplementedError('Input file has unknown format.')

Given

from pandas  import read_csv
from io      import StringIO
from pathlib import Path
def test_workflow():
    # Given
    config_file = Path('test_config.yaml')
    input_file  = Path('test_input.csv')
    output_file = Path('test_result.csv')
    dataframe = read_csv(
        StringIO("""
            a     b     c
            17.1  33.9  88.2
            0.0   2.5   1.0
        """),
        sep=r'\s+',
    )
    dataframe.to_csv(input_file, index=False)
    with config_file.open('w') as stream:
        stream.write(f'input_file:  {input_file}\n')
        stream.write(f'output_file: {output_file}\n')

When - Then

import my_package
def test_workflow():
    # Given


    # When
    my_package.workflow(config_file)

    # Then
    assert_expectations(test_result)   # Make sensible assertions.

    # Cleanup
    config_file.unlink()
    input_file.unlink()
    output_file.unlink()

Real-World Testing

Challenges

You will be facing common challenges.

  • I don’t know how to get started.
  • I don’t understand what a function does.
  • I can’t test hidden/private code.
  • I can’t test every combination of inputs.
  • The function has 20 parameters, so 2^20 tests?
  • The function reads from a file or database.
  • The function writes to a file or database.

Getting Started

I don’t know how to get started.

  • Testing requires refactoring.
  • Refactoring requires tests.

Break the vicious circle by adding end-to-end tests.

Private Code

I can’t test hidden/private code.

  • No problem, neither can anybody else.
  • Test what users can do with your code.

Lean Testing

I can’t test every combination of inputs.

Test until you are confident that your code works.

  • Test the use cases the unit was designed for.
  • Test “boundaries” and special cases.
  • Add tests when you discover bugs.
  • “Code coverage” tools can help.

No Documentation

I don’t understand what a function does.

Unit testing can help.

  • Write a test based on your assumption.
  • Look at the actual output.
  • Fix the test.
  • Repeat.
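
This loop is sometimes called a “characterization test”: the assertion starts as a guess and is then corrected to match the observed behavior. A stdlib sketch using Python’s built-in round():

```python
def test_round_characterization():
    # First guess was round(0.5) == 1; the actual output showed
    # banker's rounding, so the test was fixed to document reality.
    assert round(0.5) == 0
    assert round(1.5) == 2
    assert round(2.5) == 2
```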

Complex Interface

The function has 20 parameters.

Time to refactor!

  • Add regression tests first.
  • Try to extract atomic units and test them.
    • 20 one-boolean-parameter-functions: 40 tests.
    • One 20-boolean-parameter-function: ~1M tests.
  • Add integration tests for the (important) use cases.

Interoperation

The function needs to read a file.

def read_file(path):
    with open(path, "r") as file:
        x = file.read()
        return x + x

There are several options …

File fixture in repository

Create input file for testing purposes.

from reader import read_file

TEST_FILE = "./data/example.txt"    # Contains "abc".

def test_returns_read_string_twice():
    assert read_file(TEST_FILE) == 'abcabc'

Downsides:

  • More clutter in repository.
  • Content not obvious in test code.

Create a temporary file

Create the file as needed:

from reader import read_file
import tempfile
def setup_module():
    global temp_dir, test_file
    temp_dir  = tempfile.TemporaryDirectory()
    test_file = temp_dir.name + '/data.txt'
    with open(test_file, 'w') as stream:
        stream.write('abc')
def test_returns_read_string_twice():
    assert read_file(test_file) == 'abcabc'

def teardown_module():
    temp_dir.cleanup()
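
With pytest, the setup/teardown pair above can shrink to the tmp_path fixture, which hands each test a fresh temporary directory and removes it afterwards. A sketch (read_file repeated inline so the example is self-contained):

```python
def read_file(path):
    with open(path, "r") as file:
        x = file.read()
        return x + x

def test_returns_read_string_twice(tmp_path):
    # tmp_path is a pathlib.Path injected by pytest; no manual cleanup.
    test_file = tmp_path / "data.txt"
    test_file.write_text("abc")
    assert read_file(test_file) == "abcabc"
```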

Mocking

“Mock” the function that reads the file.

from unittest import mock
from reader   import read_file


def test_returns_read_string_twice_mocked():
    with mock.patch('builtins.open', mock.mock_open(read_data='abc')):
        # The path is irrelevant: the patched open() never touches disk.
        assert read_file('any_path.txt') == 'abcabc'

This uses mock from the standard library’s unittest framework.
Most frameworks, including pytest, have something similar.

Mocking in R

mockery package, with_mock

library(testthat)
library(mockery)

m <- mock(1)
f <- function (x) summary(x)
with_mock(f = m, {
  expect_equal(f(iris), 1)
})

Side Effects

The function writes to a file.

def write_function(path, content):
    with open(path, "w") as file:
        file.write(content)

Let it. But …

  • Make sure to clean up afterwards.
  • Or make it write the file to the temp directory.
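
A sketch of the “temp directory” option, again using pytest’s tmp_path fixture:

```python
def write_function(path, content):
    with open(path, "w") as file:
        file.write(content)

def test_writes_content(tmp_path):
    # The side effect lands in a throw-away directory, not the project.
    target = tmp_path / "output.txt"
    write_function(target, "abc")
    assert target.read_text() == "abc"
```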

Side Effects

The function accesses a database.

Same strategies: Provide test database or mock.
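
For SQL code, an in-memory SQLite database makes a cheap, isolated test database. A sketch with a hypothetical count_genes() query function:

```python
import sqlite3

def count_genes(connection):
    """Hypothetical query function under test."""
    return connection.execute("SELECT COUNT(*) FROM genes").fetchone()[0]

def test_count_genes():
    # ":memory:" creates a throw-away database that vanishes on close.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE genes (name TEXT)")
    connection.executemany("INSERT INTO genes VALUES (?)",
                           [("BRCA1",), ("TP53",)])
    assert count_genes(connection) == 2
    connection.close()
```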

Images

The function creates an image.

import pandas as pd

def plotting_function(data):
    pd.DataFrame(data).plot()

Tricky. Avoid as much as possible.

  • Image renders are not pixel-perfect across platforms.
  • You’d have to compare two images with a tolerance.
    (The Matplotlib test suite does that.)
  • Minimalistic approach: Check for no failure.
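
The minimalistic approach amounts to a “smoke test”: call the function and assert only that it does not raise. A sketch with a hypothetical stand-in for the plotting code (so the example does not depend on pandas or matplotlib):

```python
def render_plot(data):
    """Hypothetical stand-in for a function that draws a figure."""
    if not data:
        raise ValueError("nothing to plot")

def test_render_plot_smoke():
    # No pixel comparison -- passing means "the call completed".
    render_plot({"a": [1, 2], "b": [3, 4]})
```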

Continuous Integration

Ultimate Testing

  • Fully automated:
    Runs on every commit in your repository.
  • Fully independent:
    Always starts from a clean slate.
  • Notifications:
    Sends an email in case of failure.

GitHub

GitHub Actions

  • Free (for open-source projects)
  • Easy setup: many prebuilt templates (“actions”)

GitHub - Example

.github/workflows/python-ci.yml

name: Test commit
on: [push, pull_request, workflow_dispatch]

jobs:
  test:
    strategy:
      matrix:
        python: ["3.10", "3.13"]
    runs-on: ubuntu-latest

    steps:
    - name: Check out code.
      uses: actions/checkout@v4
    - name: Install Python ${{ matrix.python }}.
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python }}
    - name: Install project.
      run:  pip install .
    - name: Run test suite.
      run:  pytest

GitLab

Build Pipelines

  • Free for personal projects
  • Limit on user number and computation time
  • Generic template, good to start from

GitLab - Example

.gitlab-ci.yml

image: python:latest

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - venv/

before_script:
  - python -m venv venv/
  - source venv/bin/activate
  - pip install pytest

test:
  script:
    - pytest

CI Servers

  • For example: Jenkins.
  • Dedicated local server for CI:
    lint, test, deploy, report, …
  • Hooked to Repository: commit notification.
  • More manual work for setup and maintenance.
  • Provides plenty of additional features
    e.g., graphic coverage reports.

Conclusion

What for?

  • Code does what it’s supposed to.
  • And keeps doing so going forward.
  • We want results to be reproducible, always!

Additional Benefits

  • Confidence: Puts your mind at ease.
  • Trust:
    • Invites others to use your code.
    • Facilitates working in a team.
  • Also documents how to use your code (to an extent).
  • Accelerates further development.
  • Makes you write better code.

Take-Home Message

Always test your code!

(At least with one end-to-end test.)

Acknowledgements

  • Slides:
  • Authors:
    • Uwe Schmitt
    • Manuel Weberndorfer
    • Emanuel Schmid
    • John Hennig