Checking and Testing Code#

Learning Objectives#

  • Understand the importance and limitations of software testing

  • Recognize different types of testing such as system-level and defect testing

  • Learn how to implement and interpret assert statements in Python code

  • Understand when and why to use assert statements

  • Understand the concept of unit testing and Test Driven Development (TDD)

  • Write and execute unit tests using the unittest module

  • Learn the purpose and implementation of fixtures and mocks in testing

  • Apply these concepts to test code that depends on external resources

  • Understand the role of code linting and type checking

  • Use tools like Flake8 and mypy to ensure code quality and adherence to style guidelines

Testing software#

Although we cannot prove our code is free of defects or bugs, we can, and should, establish that it behaves as intended.

System level testing#

Once we have completed our software, we should ensure that it works as intended. This is referred to as validation testing. Typically, this will involve taking a sample of input data and ensuring that the output of our software is as expected for the given input.

This validation testing will tell us if the overall system runs and produces valid results. Such tests should be repeated when changes are made to the software to ensure the changes have not introduced errors.
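For example, a simple validation test might run the analysis on a known input and compare the result against the expected output (run_analysis and the values here are hypothetical stand-ins):

def run_analysis(values):
    # Stand-in for the real analysis pipeline
    return sum(v * 2 for v in values)

# A sample input and the output we know to be correct
sample_input = [1, 2, 3]
expected_output = 12

if run_analysis(sample_input) == expected_output:
    print("Validation passed")
else:
    print("Validation FAILED")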

Changes that can impact your software are diverse and include:

  • new Python language releases

  • upgrades to imported libraries

  • operating system updates.

Defect testing#

In a research environment it is often the case that there is no explicit specification for the software we create.

By specification we mean something like:

  • Written statement of user requirements - typically “user stories”

  • Functional requirements - e.g. what file formats are to be supported

  • Non-functional requirements - e.g. subject data must be encrypted.

Discussion#

If you don’t have a specification for your software, how might you establish suitable tests to find and resolve defects?

Assert statement#

The built-in Python assert statement looks like this -

# Try modifying this code to deliberately fail the assert statements

def my_add_two(a):
    return a + 2.0

assert my_add_two(1) == 3
# Better to include a message in case of failure
assert my_add_two(3) == 5, f"my_add_two(3) failed with {my_add_two(3)}, expected 5"

When to use assert#

assert should never be used to modify control flow.

Assertions allow you to verify that parts of your program are correct, but they are only executed if the built-in constant __debug__ is True. Although __debug__ is usually True, this is not guaranteed.

# This is approximately what the assert statement does

def my_assert(condition, message):
    if __debug__ and not condition:
        raise AssertionError(message)

my_assert(my_add_two(1) == 3, "my_add_two(1) failed")
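In particular, running Python with the -O (optimise) flag sets __debug__ to False, so all assert statements are skipped:

python -O my_script.py    # runs with __debug__ == False; asserts are not executed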

Why might we want different behaviour from our assert statements?#

What would you want your assert statements to do?#

Unit-tests#

Unit-tests are small tests that verify the behaviour of individual functions and classes.

Unit-tests are typically run within a testing framework or test-runner that automates testing, often inside our IDE.

Test Driven Development (TDD)#

TDD is an approach to software design, not a form of software testing. TDD uses unit-tests to drive the design of software, especially when the design is created incrementally, as with Agile.
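As a sketch of the TDD red-green-refactor cycle (my_multiply is a hypothetical function we are designing):

import unittest

# 1. Red: write a test for behaviour that does not exist yet, and watch it fail
class TestMyMultiply(unittest.TestCase):
    def test_my_multiply(self):
        self.assertEqual(my_multiply(2, 3), 6)

# 2. Green: write the simplest code that makes the test pass
def my_multiply(a, b):
    return a * b

# 3. Refactor: improve the implementation, re-running the test after each change

# unittest.main(argv=[''], exit=False)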

Refactoring#

Whether or not you adopt TDD, refactoring - changing the implementation of your code without changing its behaviour - is something you are certain to do, if only to remove print statements or to rename variables.

Refactoring code without appropriate tests can easily introduce new errors.

unittest#

Python 3 distributions include the unittest module. See https://docs.python.org/3/library/unittest.html

import unittest

class TestMyAddTwo(unittest.TestCase):
    def test_my_add_two(self):
        self.assertEqual(my_add_two(1), 3)
    def test_my_add_two_3(self):
        self.assertEqual(my_add_two(3), 5)

# unittest.main(argv=[''], exit=False)

Fixtures and mocks#

Ideally each unit of code should be tested independently.

Why is this?#

However, there are situations where testing code might require data read from a file or a database connection. If only one test requires this external data, then opening the file and reading the data will be part of the test. If several tests require this data, then we use a fixture.

# Module level fixture setup and teardown
def setUpModule():
    global sample_data
    sample_data = open("data/rows.txt", "r")

def tearDownModule():
    sample_data.close()

# unittest.main(argv=[''], exit=False)
# Class level fixture setup and teardown
class TestMyAddTwo(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.sample_data = open("data/rows.txt", "r")

    @classmethod
    def tearDownClass(cls):
        cls.sample_data.close()

    def test_file_parsing(self):
        for line in self.sample_data:
            # Do something with the line
            pass

Mocks#

Mock and MagicMock objects create all attributes and methods as you access them and store details of how they have been used. You can configure them to specify return values or limit what attributes are available.

See https://docs.python.org/3/library/unittest.mock.html
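For example, a MagicMock standing in for a database connection (all names here are illustrative) accepts any call, can be given a return value, and records how it was called:

from unittest.mock import MagicMock

database = MagicMock()
database.fetch_rows.return_value = [1, 2, 3]   # configure a return value

rows = database.fetch_rows("SELECT * FROM results")
print(rows)                                    # [1, 2, 3]

# The mock remembers how it was used
database.fetch_rows.assert_called_once_with("SELECT * FROM results")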

How can we test a function that does not return a value?#

def show_results():
    arr = [1, 2, 3]
    print(arr)
    print()

show_results()
[1, 2, 3]

Here is a possible test#

from unittest.mock import MagicMock

class TestShowResults(unittest.TestCase):
    def setUp(self):
        global print
        print = MagicMock()
    def tearDown(self):
        global print
        print = __builtins__.print
    def test_show_results(self):
        show_results()
        self.assertEqual(print.call_count, 2)

# unittest.main(argv=[''], exit=False)
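An alternative, and generally safer, approach is unittest.mock.patch, which replaces the built-in print only for the duration of the test and restores it automatically:

from unittest.mock import patch

class TestShowResultsWithPatch(unittest.TestCase):
    def test_show_results(self):
        with patch("builtins.print") as mock_print:
            show_results()
        self.assertEqual(mock_print.call_count, 2)

# unittest.main(argv=[''], exit=False)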

How does this test work?#

‘Linting’ code with Flake8 and mypy#

There are various tools that can analyse Python code and suggest fixes or improvements without running the code.

These are ‘static code checkers’ or ‘linters’ - because they help you remove fluff!

mypy#

We saw mypy briefly before. It uses type hints to find type errors in your code, and can even be configured to require type hints if desired.

https://mypy.readthedocs.io/en/stable/getting_started.html
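As a quick illustration, consider a file (a hypothetical example.py) where a call does not match the type hints:

# example.py
def double(n: int) -> int:
    return n * 2

double("hello")  # mypy flags this: the argument is a str, not an int

Running mypy example.py reports the incompatible argument type without executing the code.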

Flake8#

Flake8 runs a variety of checks on your Python scripts, and can be used with IDEs such as VS Code to help you write clearer, more readable code. The optional, but highly recommended, style guide for Python is PEP 8.

https://peps.python.org/pep-0008/

https://flake8.pycqa.org/en/latest/index.html
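As a quick illustration, flake8 reports each violation with a code (a hypothetical untidy.py):

# untidy.py
import os   # flake8 reports F401: 'os' imported but unused
x=1         # flake8 reports E225: missing whitespace around operator

Running flake8 untidy.py lists each violation with its file, line number, and code.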

Test coverage#

Coverage.py works in three phases:#

  • Execution: Coverage.py runs your code, and monitors it to see what lines were executed.

  • Analysis: Coverage.py examines your code to determine what lines could have run.

  • Reporting: Coverage.py combines the results of execution and analysis to produce a coverage number and an indication of missing execution.

See https://coverage.readthedocs.io/en/7.5.3/api.html
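A typical command-line workflow covers all three phases, assuming your tests run under pytest:

coverage run -m pytest    # execution: run the test suite, recording which lines run
coverage report -m        # analysis and reporting: show coverage, listing missed lines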

Pytest#

Introduction to pytest#

Pytest is another popular tool for testing in Python which makes it easier to write and run tests.

Pytest uses file and function naming conventions to discover tests. You will rarely need to run a test directly, as the framework will find and run your tests for you whenever you re-run it after modifying your code.

Pytest is a package that you install into your environment from conda or PyPI. For example:

pip install pytest

You should then create a directory called tests to hold your tests.

mkdir tests

Within tests, you can create Python scripts containing tests - e.g. example_test.py.
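By default, pytest collects files named test_*.py or *_test.py and runs the functions in them whose names begin with test_. A minimal tests/example_test.py might look like this, reusing the my_add_two function from earlier:

# tests/example_test.py
def my_add_two(a):
    return a + 2.0

def test_my_add_two():
    # pytest uses plain assert statements rather than assertEqual
    assert my_add_two(1) == 3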

Example test: comparing csv files#

There are a wide variety of applications for testing. Below is an example of a test where we run some code to generate a .csv file, and then confirm that the results are as expected.

This could come in handy if you have produced code for a model, and are concerned that others running the model on a different machine could be getting slightly different results.

In this example, we start the .py file by importing:

  • pytest (to run the tests)

  • pandas (to manage the csv files)

  • Our model (imagining a scenario where we have a file model_code.py inside a folder scripts/, which is a sister folder to tests/)

  • tempfile (to save our model results to a temporary directory)

  • os (to construct the paths to the .csv files)

import os
import pytest
import pandas as pd
from scripts import model_code
import tempfile

We then provide file paths at the start of our script:

EXP_FOLDER = 'exp_results'
TEMP_FOLDER = tempfile.mkdtemp()

Assuming we might be running the model with two different parameters (so producing two .csv files, and comparing each of those), we can use pytest.mark.parametrize to run the same test with two different inputs. First, we can define our parameters for the model:

parameters = [
    {
        'arrivals': 100,
        'file': 'result100.csv'
    },
    {
        'arrivals': 150,
        'file': 'result150.csv'
    }
]

For the file paths, we should set these up as fixtures:

@pytest.fixture
def exp_folder():
    return EXP_FOLDER


@pytest.fixture
def temp_folder():
    return TEMP_FOLDER

We can then write our test function, and use the file names from parameters as the ID for each test:

@pytest.mark.parametrize('param', parameters,
                         ids=[d['file'] for d in parameters])
def test_equal_df(param, temp_folder, exp_folder):
    '''
    Test that model results are consistent with the expected
    results (which are saved in the EXP_FOLDER)
    '''
    # Run the model (assuming the function has inputs for our parameter
    # dictionary and for a save location for the .csv file - the
    # save_path keyword is an assumed name)
    model_code.main(**param, save_path=temp_folder)

    # Import the test and expected results, building each path from the
    # folder name and the filename in the parameter dictionary
    test_result = pd.read_csv(os.path.join(temp_folder, param['file']))
    exp_result = pd.read_csv(os.path.join(exp_folder, param['file']))

    # Check that the dataframes are equal
    pd.testing.assert_frame_equal(test_result, exp_result)

With our .py file now complete, we can run our tests from the terminal. Ensure you are located in the parent folder of tests/, then run the command:

pytest

If your tests take a long time to run, you may want to explore parallelising them. You can install pytest-xdist, which is a package that parallelises your pytests using multiple CPUs. With this package installed, you can run the command:

pytest -n auto

Coverage#

https://pypi.org/project/pytest-cov/
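pytest-cov adds coverage measurement to a pytest run. For example, to measure coverage of the scripts folder from the earlier example:

pip install pytest-cov
pytest --cov=scripts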

pytest-notebook#

See https://pytest-notebook.readthedocs.io/en/latest/

Resources#

See the testing section of https://alan-turing-institute.github.io/rse-course/html/module01_introduction_to_python/index.html

Testing Practical Exercise#

Python 3 distributions include the unittest module. See https://docs.python.org/3/library/unittest.html

Here is the example included in the Python documentation.
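import unittest

class TestStringMethods(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())

    def test_split(self):
        s = 'hello world'
        self.assertEqual(s.split(), ['hello', 'world'])
        # check that s.split fails when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)

if __name__ == '__main__':
    unittest.main()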

Exercise 1#

Using the above as a template, create a test class for the Upper class we used earlier.

class Upper(str):
    def __new__(cls, text=""):
        return super().__new__(cls, text.upper())

Important#

What should (and can) be tested?

See https://docs.python.org/3/library/unittest.html

Exercise 2#

Design a new capability for the class using TDD.

Here are some suggestions -

  • Do not allow strings without at least one letter

  • Only allow strings that begin with a letter

  • Limit the length of the string to 10 characters.
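For instance, a TDD starting point for the third suggestion could be a deliberately failing test (raising ValueError is just one possible design decision):

class TestUpperLengthLimit(unittest.TestCase):
    def test_rejects_strings_over_10_characters(self):
        # This fails until Upper is extended to enforce the limit
        with self.assertRaises(ValueError):
            Upper("ABCDEFGHIJK")  # 11 characters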