Automated Testing

July 2, 2025

Introduction

Motivation

Let’s assume we have written a program that analyzes genome data.

It is:

  • Available in a Git repository.
  • Properly packaged to be installable.
 git clone https://github.com/sib/GenomeAnalyzer.git
 pip install GenomeAnalyzer
 genome_analyzer my_genome_data.fasta
Analyzed all the data. Whoo-hoo!

Runs fine on our computer! 😎

Now we want to share it with other people.
Easy! Just send the instructions:

 git clone https://github.com/sib/GenomeAnalyzer.git
 pip install GenomeAnalyzer
 genome_analyzer your_genome_data.fasta

Dang.

❯ genome_analyzer their_input_data.fasta
Traceback (most recent call last):
  File "__main__.py", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "genome_analyzer/core.py", line 5, in main
    compute_inverse_ratio()
    ~~~~~~~~~~~~~~~~~~~~~^^
  File "genome_analyzer/core.py", line 15, in compute_inverse_ratio
    return total/count
           ~~~~~^~~~~~
ZeroDivisionError: division by zero

We only tested with our own data. 🤦

Or worse:

❯ genome_analyzer my_input_data.fasta
Traceback (most recent call last):
  File "__main__.py", line 10, in <module>
    sys.exit(main())
             ~~~~^^
  File "genome_analyzer/core.py", line 8, in main
    reference = reference.read_text()
  File "Lib/pathlib/_local.py", line 546, in read_text
    return PathBase.read_text(self, encoding, errors, newline)
           ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: No such file or directory: 'gene_markers.csv'

Some file is missing. 🤔

Tests!

Avoid nasty surprises with automated testing.
On your machine and anywhere else.

❯ pytest
========== test session starts ===================
rootdir: GenomeAnalyzer
configfile: pyproject.toml
testpaths: tests
collected 2 items

test_read_reference          PASSED          [1/2]
test_compute_inverse_ratio   PASSED          [2/2]

========== 2 passed in 0.03s =====================

Why tests are important

  • For the user:

    • Instill confidence in the project.
    • Promise stability of output results.
  • For the developer:

    • Make bug reporting easier.
    • Find bugs before users do.
    • Safety harness when refactoring.
    • Make (future) changes faster.

Testing Basics

Pattern

Given-When-Then [Clean Code, 2008]

  • Given: Set up environment, provide input data.
  • When: Execute what you want to test.
  • Then: Compare actual result with expected result.

(Clean-up: Remove side-effects, reset to original state.)

Pattern

Arrange-Act-Assert [Test Driven Development, 2002]

  • Arrange: Set up environment, provide input data.
  • Act: Execute what you want to test.
  • Assert: Compare actual result with expected result.

Same thing, different name (give or take).

Example

solver.py:

def solve_linear_equation(k, d):
    """Solves k * x + d = 0 for x."""
    return -d / k

Test

test_solver.py:

def test_solve_linear_equation():

    # GIVEN
    nonzero_gradient = 2
    nonzero_offset   = 5

    # WHEN
    result = solve_linear_equation(nonzero_gradient, nonzero_offset)

    # THEN
    expected_result = -2.5
    assert result == expected_result
>>> from test_solver import test_solve_linear_equation
>>> test_solve_linear_equation()
>>>

Note

Tests may fail in non-obvious ways!

def test_solve_linear_equation():
    nonzero_gradient = 9
    nonzero_offset   = 2.7

    result = solve_linear_equation(nonzero_gradient, nonzero_offset)

    expected_result = -0.3
    assert result == expected_result
>>> test_solve_linear_equation()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in test_solve_linear_equation
AssertionError

Test Classification

Classification

  • End-to-end Test
  • Unit Test
  • Integration Test
  • Regression Test

End-to-end Test

A.k.a. system test; sometimes also (confusingly) called integration test.

  • Tests the complete program.
  • From the user’s perspective.
  • Okay if slow.
  • As there will be few of them.

Ideal starting point!

Unit Test

  • Tests a “single unit”, such as a function or method.
  • Takes the developer’s perspective.
  • There may be many (hundreds).
  • They should/will be fast.

Write at least one unit test for every function.

Integration Test

  • Tests combinations and interactions of components.
  • Somewhere between end-to-end and unit test.
  • Mostly relevant for complex applications.

Regression Test

  • Designed to avoid “regression” of undesired behavior.
  • Important for refactoring.
  • Often extensive testing of full application range.
  • Covers a large spectrum of the input space.
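
A regression test simply pins behavior that is already known to be good, so later changes cannot silently alter it. A sketch with a hypothetical normalize() function (not from the genome example):

```python
def normalize(values):
    """Scale values so they sum to one."""
    total = sum(values)
    return [value / total for value in values]

def test_normalize_regression():
    # Expected output recorded from a run that was verified by hand.
    # If a refactoring changes it, this test flags the regression.
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]
```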

Testing Frameworks

Example (as before)

def solve_linear_equation(k, d):
    """Solves k * x + d = 0 for x."""
    return -d / k
def test_solve_linear_equation():
    nonzero_gradient = 2
    nonzero_offset   = 5

    result = solve_linear_equation(nonzero_gradient, nonzero_offset)

    expected_result = -2.5
    assert result == expected_result

Challenges

  • Separation of tests and your actual code.
  • There may be hundreds of tests.
  • We would also like some tooling that:
    • Handles entire arrays, dataframes/tables, etc.
    • Tests floating-point numbers are “almost equal”.
    • Tests errors are raised when they should be.
    • … and more.

Testing Frameworks

  • Find and run all tests in the code base.
  • Output concise report of passing tests.
  • Provide insight into failing tests.
  • Convenience methods for comparing expectation with actual event (“assertions”).

Testing Frameworks

Advanced features:

  • Convenience functions for parameterized tests.
  • Helpers that heuristically generate corner cases based on data type (e.g. using 0, nan, "").
  • Coverage report:
    Which lines were executed by the test suite?
  • Fixtures such as capturing stdout.
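
In pytest, a parameterized test looks like this: one test function, several input/expectation pairs. A sketch reusing the solver example from above (pytest.mark.parametrize and math.isclose are real APIs):

```python
import math
import pytest

def solve_linear_equation(k, d):
    """Solves k * x + d = 0 for x."""
    return -d / k

# The test function runs once per parameter tuple.
@pytest.mark.parametrize("k, d, expected", [
    (2, 5, -2.5),
    (9, 2.7, -0.3),
    (-1, 4, 4.0),
])
def test_solve_linear_equation(k, d, expected):
    assert math.isclose(solve_linear_equation(k, d), expected)
```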

Testing Frameworks

Python: pytest, unittest (standard library)

R: testthat

Best Testing Practices

Good Tests

Write your tests FIRST. [Clean Code]

  • F ast
  • I solated
  • R epeatable
  • S elf-validating
  • T imely

Fast

Prepare for 1000s of tests.

  • Aim at a few milliseconds for unit tests,
    a few seconds tops for end-to-end tests.
  • We will run all tests often.
  • We will write many of them.

Isolated

  • independent of each other
  • independent of behaviour tested elsewhere
  • independent of machine
  • independent of external data/network

Repeatable

  • Same result every time we run it.
  • No random input, fix “seeds”.
  • No side-effects.
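
“Fix seeds” means seeding the random number generator explicitly, so the “random” input is identical on every run. A stdlib sketch with a hypothetical noisy_sample() function:

```python
import random

def noisy_sample(rng):
    """Hypothetical function that depends on random input."""
    return rng.random()

def test_noisy_sample_is_repeatable():
    # Same seed, same sequence: the test result never varies.
    assert noisy_sample(random.Random(42)) == noisy_sample(random.Random(42))
```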

Self-Validating

  • The test itself asserts that the result was as expected.
  • No manual work involved.
  • No need to look at results if the test succeeds.

Timely

Write tests as soon as possible!

  • You still know why you wrote the code that way.
  • Corner cases are fresh on your mind.
  • You are manually testing your code anyway.

Even better:
Write tests first and only then the implementation!

Best Practices

  • Keep input data minimal and obvious.
  • Mind the resource consumption: CPU, memory.
  • Don’t forget to test side-effects: changed data, logs, warnings.
  • If possible, factor out common test inputs into a “test setup”.
  • Take some time to consider corner cases.
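
In pytest, a shared “test setup” is typically written as a fixture. A sketch with hypothetical data and a hypothetical helper:

```python
import pytest

@pytest.fixture
def sample_records():
    # Minimal, obvious input data shared by several tests.
    return [{"gene": "BRCA1", "count": 3}, {"gene": "TP53", "count": 0}]

def count_expressed(records):
    """Hypothetical helper under test."""
    return sum(1 for record in records if record["count"] > 0)

def test_count_expressed(sample_records):
    # pytest injects the fixture's return value as the argument.
    assert count_expressed(sample_records) == 1
```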

But…

  • Don’t test third-party code, focus on yours.

Development

  • Writing code and writing tests goes hand-in-hand.
  • Software development is an iterative process.
  • Develop tests alongside your application, not afterwards.

Unit Tests

Python example with pytest

  • We recommend pytest for testing Python code.
  • An alternative is unittest from the standard library.
  • It is similar to JUnit in Java, and thus less “pythonic”.
  • There is a testing framework for every programming language.

Example (same still)

solvers.py:

def solve_linear_equation(k, d):
    """Solves k * x + d = 0 for x."""
    return -d / k

Test Case

test_solvers.py:

from solvers import solve_linear_equation

def test_solve_linear_equation():
    """Checks solution for k = 2 and d = 5"""
    result = solve_linear_equation(k=2, d=5)
    assert result == -2.5

Test Runner

Run test(s):

❯ pytest
========= test session starts ===================
collected 1 item

test_solvers.py .                          [100%]

========= 1 passed in 0.04s =====================

Corner Cases

Add a test for division by zero to test_solvers.py.

from solvers import solve_linear_equation
from pytest  import raises

def test_raises_for_gradient_zero():
    with raises(ZeroDivisionError):
        solve_linear_equation(k=0, d=5)
❯ pytest
========= test session starts ===================
collected 2 items

test_solvers.py ..                         [100%]

========= 2 passed in 0.04s =====================

Corner Cases

Let’s add another test, same as the first,
just with other (non-zero) values.

def test_solve_linear_equation_other_values():
    result = solve_linear_equation(k=9, d=2.7)
    assert result == -0.3

Should work, right?
-2.7 / 9 = -0.3 ✔

Corner Cases

❯ pytest
========= test session starts ===================
collected 3 items

test_solvers.py ..F                         [100%]

========= FAILURES ==============================
____ test_solve_linear_equation_other_values ____

    def test_solve_linear_equation_other_values():
        result = solve_linear_equation(k=9, d=2.7)
>       assert result == -0.3
E       assert -0.30000000000000004 == -0.3
test_solvers.py:21: AssertionError
========= 1 failed, 2 passed in 0.18s ============

Corner Cases

Compare floating-point numbers with a tolerance.
In Python, we can (for example) use math.isclose().

from math import isclose

def test_solve_linear_equation_other_values():
    result = solve_linear_equation(k=9, d=2.7)
    assert isclose(result, -0.3)

Other test frameworks (in other programming languages) typically have a convenience function to “assert almost equal”.

pytest also has approx(): assert result == approx(-0.3).

End-to-end Test

End-to-end Test

  • Testing the software from a user’s perspective.
  • We should have one for each workflow / use case.
  • Parameter space coverage is not the focus,
    that’s where unit tests come in.
  • Input data should be as small as possible,
    but not trivial.

Workflow

my_package.py (extract)

from pandas import read_csv, read_excel
def workflow(config_file=None):
    config  = read_config(config_file)
    data    = read_data(config)
    results = process(data, config)
    write_result(results, config)
def read_data(config):
    input_path = config['input_file']
    if input_path.endswith('.csv'):
        return read_csv(input_path)
    elif input_path.endswith('.xlsx'):
        return read_excel(input_path)
    else:
        raise NotImplementedError('Input file has unknown format.')

Given

from pandas  import read_csv
from io      import StringIO
from pathlib import Path
def test_workflow():
    # Given
    config_file = Path('test_config.yaml')
    input_file  = Path('test_input.csv')
    output_file = Path('test_result.csv')
    dataframe = read_csv(
        StringIO("""
            a     b     c
            17.1  33.9  88.2
            0.0   2.5   1.0
        """),
        sep=r'\s+',
    )
    dataframe.to_csv(input_file, index=False)
    with config_file.open('w') as stream:
        stream.write(f'input_file:  {input_file}\n')
        stream.write(f'output_file: {output_file}\n')

When - Then

import my_package
def test_workflow():
    # Given


    # When
    my_package.workflow(config_file)

    # Then
    assert_expectations(test_result)   # Make sensible assertions.

    # Cleanup
    config_file.unlink()
    input_file.unlink()
    output_file.unlink()

Real-World Testing

Challenges

You will be facing common challenges.

  • I don’t know how to get started.
  • I don’t understand what a function does.
  • I can’t test hidden/private code.
  • I can’t test every combination of inputs.
  • The function has 20 parameters, so 2^20 tests?
  • The function reads from a file or database.
  • The function writes to a file or database.

Getting Started

I don’t know how to get started.

  • Testing requires refactoring.
  • Refactoring requires tests.

Break the vicious circle by adding end-to-end tests.

Private Code

I can’t test hidden/private code.

  • No problem, neither can anybody else.
  • Test what users can do with your code.

Lean Testing

I can’t test every combination of inputs.

Test until you are confident that your code works.

  • Test the use cases the unit was designed for.
  • Test “boundaries” and special cases.
  • Add tests when you discover bugs.
  • “Code coverage” tools can help.

No Documentation

I don’t understand what a function does.

Unit testing can help.

  • Write a test based on your assumption.
  • Look at the actual output.
  • Fix the test.
  • Repeat.
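
This loop is sometimes called a “characterization test”: the assertion starts as a guess and is then corrected to match the observed behavior. A stdlib sketch using Python’s built-in round():

```python
def test_round_characterization():
    # First guess was round(0.5) == 1; the actual output showed
    # banker's rounding, so the test was fixed to document reality.
    assert round(0.5) == 0
    assert round(1.5) == 2
    assert round(2.5) == 2
```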

Complex Interface

The function has 20 parameters.

Time to refactor!

  • Add regression tests first.
  • Try to extract atomic units and test them.
    • 20 one-boolean-parameter-functions: 40 tests.
    • One 20-boolean-parameter-function: ~1M tests.
  • Add integration tests for the (important) use cases.

Interoperation

The function needs to read a file.

def read_file(path):
    with open(path, "r") as file:
        x = file.read()
        return x + x

There are several options …

File fixture in repository

Create input file for testing purposes.

from reader import read_file

TEST_FILE = "./data/example.txt"    # Contains "abc".

def test_returns_read_string_twice():
    assert read_file(TEST_FILE) == 'abcabc'

Downsides:

  • More clutter in repository.
  • Content not obvious in test code.

Create a temporary file

Create the file as needed:

from reader import read_file
import tempfile
def setup_module():
    global temp_dir, test_file
    temp_dir  = tempfile.TemporaryDirectory()
    test_file = temp_dir.name + '/data.txt'
    with open(test_file, 'w') as stream:
        stream.write('abc')
def test_returns_read_string_twice():
    assert read_file(test_file) == 'abcabc'

def teardown_module():
    temp_dir.cleanup()
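
With pytest, the setup/teardown pair above can shrink to the tmp_path fixture, which hands each test a fresh temporary directory and removes it afterwards. A sketch (read_file repeated inline so the example is self-contained):

```python
def read_file(path):
    with open(path, "r") as file:
        x = file.read()
        return x + x

def test_returns_read_string_twice(tmp_path):
    # tmp_path is a pathlib.Path injected by pytest; no manual cleanup.
    test_file = tmp_path / "data.txt"
    test_file.write_text("abc")
    assert read_file(test_file) == "abcabc"
```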

Mocking

“Mock” the function that reads the file.

from unittest import mock
from reader   import read_file


def test_returns_read_string_twice_mocked():
    with mock.patch('builtins.open', mock.mock_open(read_data='abc')):
        # The path is irrelevant: the patched open() never touches disk.
        assert read_file('any_path.txt') == 'abcabc'

This uses mock from the standard library’s unittest framework.
Most frameworks, including pytest, have something similar.

Mocking in R

mockery package, with_mock

library(testthat)
library(mockery)

m <- mock(1)
f <- function (x) summary(x)
with_mock(f = m, {
  expect_equal(f(iris), 1)
})

Side Effects

The function writes to a file.

def write_function(path, content):
    with open(path, "w") as file:
        file.write(content)

Let it. But …

  • Make sure to clean up afterwards.
  • Or make it write the file to the temp directory.
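
A sketch of the “temp directory” option, again using pytest’s tmp_path fixture:

```python
def write_function(path, content):
    with open(path, "w") as file:
        file.write(content)

def test_writes_content(tmp_path):
    # The side effect lands in a throw-away directory, not the project.
    target = tmp_path / "output.txt"
    write_function(target, "abc")
    assert target.read_text() == "abc"
```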

Side Effects

The function accesses a database.

Same strategies: Provide test database or mock.
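
For SQL code, an in-memory SQLite database makes a cheap, isolated test database. A sketch with a hypothetical count_genes() query function:

```python
import sqlite3

def count_genes(connection):
    """Hypothetical query function under test."""
    return connection.execute("SELECT COUNT(*) FROM genes").fetchone()[0]

def test_count_genes():
    # ":memory:" creates a throw-away database that vanishes on close.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE genes (name TEXT)")
    connection.executemany("INSERT INTO genes VALUES (?)",
                           [("BRCA1",), ("TP53",)])
    assert count_genes(connection) == 2
    connection.close()
```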

Images

The function creates an image.

import pandas as pd

def plotting_function(data):
    pd.DataFrame(data).plot()

Tricky. Avoid as much as possible.

  • Image renders are not pixel-perfect across platforms.
  • You’d have to compare two images with a tolerance.
    (The Matplotlib test suite does that.)
  • Minimalistic approach: Check for no failure.
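
The minimalistic approach amounts to a “smoke test”: call the function and assert only that it does not raise. A sketch with a hypothetical stand-in for the plotting code (so the example does not depend on pandas or matplotlib):

```python
def render_plot(data):
    """Hypothetical stand-in for a function that draws a figure."""
    if not data:
        raise ValueError("nothing to plot")

def test_render_plot_smoke():
    # No pixel comparison -- passing means "the call completed".
    render_plot({"a": [1, 2], "b": [3, 4]})
```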

Continuous Integration

Ultimate Testing

  • Fully automated:
    Runs on every commit in your repository.
  • Fully independent:
    Always starts from a clean slate.
  • Notifications:
    Sends an email in case of failure.

GitHub

GitHub Actions

  • Free (for open-source projects)
  • Easy setup: many prebuilt templates (“actions”)

GitHub - Example

.github/workflows/python-ci.yml

name: Test commit
on: [push, pull_request, workflow_dispatch]

jobs:
  test:
    strategy:
      matrix:
        python: ["3.10", "3.13"]
    runs-on: ubuntu-latest

    steps:
    - name: Check out code.
      uses: actions/checkout@v4
    - name: Install Python ${{ matrix.python }}.
      uses: actions/setup-python@v5
      with:
        python-version: ${{ matrix.python }}
    - name: Install project.
      run:  pip install .
    - name: Run test suite.
      run:  pytest

GitLab

Build Pipelines

  • Free for personal projects
  • Limit on user number and computation time
  • Generic template, good to start from

GitLab - Example

.gitlab-ci.yml

image: python:latest

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - venv/

before_script:
  - python -m venv venv/
  - source venv/bin/activate
  - pip install pytest

test:
  script:
    - pytest

CI Servers

  • For example: Jenkins.
  • Dedicated local server for CI:
    lint, test, deploy, report, …
  • Hooked to Repository: commit notification.
  • More manual work for setup and maintenance.
  • Provides plenty of additional features
    e.g., graphic coverage reports.

Conclusion

What for?

  • Code does what it’s supposed to.
  • And keeps doing so going forward.
  • We want results to be reproducible, always!

Additional Benefits

  • Confidence: Puts your mind at ease.
  • Trust:
    • Invites others to use your code.
    • Facilitates working in a team.
  • Also documents how to use your code (to an extent).
  • Accelerates further development.
  • Makes you write better code.

Take-Home Message

Always test your code!

(At least with one end-to-end test.)

Acknowledgements

  • Slides:
  • Authors:
    • Uwe Schmitt
    • Manuel Weberndorfer
    • Emanuel Schmid
    • John Hennig