Checking and Testing Code#
Learning Objectives#
Understand the importance and limitations of software testing
Recognize different types of testing such as system-level and defect testing
Learn how to implement and interpret assert statements in Python code
Understand when and why to use assert statements
Understand the concept of unit testing and Test Driven Development (TDD)
Write and execute unit tests using the unittest module
Learn the purpose and implementation of fixtures and mocks in testing
Apply these concepts to test code that depends on external resources
Understand the role of code linting and type checking
Use tools like Flake8 and mypy to ensure code quality and adherence to style guidelines
Testing software#
Although we cannot prove our code is free of defects or bugs, we can, and should, establish that it behaves as intended.
System level testing#
Once we have completed our software, we should ensure that it works as intended. This is referred to as validation testing. Typically, this will involve taking a sample of input data and ensuring that the output of our software is as expected for the given input.
This validation testing will tell us if the overall system runs and produces valid results. Such tests should be repeated when changes are made to the software to ensure the changes have not introduced errors.
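As a sketch, a minimal validation test runs the software on a known input and compares the result against a previously verified output. The function and data here are purely illustrative:

```python
# Illustrative validation check: the function and data are hypothetical
def run_analysis(values):
    """Stand-in for the software under test."""
    return sum(values) / len(values)

sample_input = [2.0, 4.0, 6.0]
expected_output = 4.0  # result verified by hand on a trusted run

result = run_analysis(sample_input)
if result == expected_output:
    print("Validation passed")
else:
    print(f"Validation FAILED: got {result}, expected {expected_output}")
```

Re-running this check after any change to the code, libraries, or operating system gives early warning that behaviour has drifted.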
Changes that can impact your software are diverse and include:
new Python language releases
upgrades to imported libraries
operating system updates.
Defect testing#
In a research environment it is often the case that there is no explicit specification for the software we create.
By specification we mean something like:
Written statement of user requirements - typically “user stories”
Functional requirements - e.g. what file formats are to be supported
Non-functional requirements - e.g. subject data must be encrypted.
Discussion#
If you don’t have a specification for your software, how might you establish suitable tests to find and resolve defects?
Assert statement#
The built-in Python assert statement looks like this -
# Try modifying this code to deliberately fail the assert statements
def my_add_two(a):
    return a + 2.0
assert my_add_two(1) == 3
# Better to include a message in case of failure
assert my_add_two(3) == 5, f"my_add_two(3) failed with {my_add_two(3)}, expected 5"
When to use assert#
assert should never be used to modify control flow.
Assertions allow you to verify that parts of your program are correct, but are only applied if the internal constant __debug__ is True. Although __debug__ is usually set to True, it is not guaranteed.
# This is approximately what the assert statement does
def my_assert(condition, message):
    if __debug__ and not condition:
        raise AssertionError(message)

my_assert(my_add_two(1) == 3, "my_add_two(1) failed")
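One reason __debug__ is not guaranteed: running Python with the -O (optimise) option sets __debug__ to False and strips assert statements from the compiled code entirely. A quick demonstration using subprocesses:

```python
import subprocess
import sys

# This snippet raises AssertionError normally, but the assert vanishes under -O
snippet = "assert False, 'boom'\nprint('asserts were stripped')"

# Normal run: the assert fires and the process exits with an error
normal = subprocess.run([sys.executable, "-c", snippet],
                        capture_output=True, text=True)

# Optimised run: __debug__ is False, so the assert is removed
optimised = subprocess.run([sys.executable, "-O", "-c", snippet],
                           capture_output=True, text=True)

print(normal.returncode)         # non-zero: AssertionError was raised
print(optimised.stdout.strip())  # 'asserts were stripped'
```

This is why assertions must never guard behaviour your program relies on: under -O, the check simply does not happen.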
Why might we want different behaviour from our assert statements?#
What would you want your assert statements to do?#
Unit-tests#
Unit-tests are small tests that test the behaviours of our functions and classes.
Unit-tests are typically run within a testing framework or test-runner that automates testing, often inside our IDE.
Test Driven Development (TDD)#
TDD is an approach to software design, not software testing. TDD uses unit-tests to create a software design, especially when the design is created incrementally, as with Agile.
Refactoring#
Whether or not you adopt TDD, refactoring - changing the implementation of your code without changing its behaviour - is something that you are certain to do, if only to remove print statements or change the names of variables.
Refactoring code without appropriate tests can easily introduce new errors.
unittest#
Python 3 distributions include the unittest module. See https://docs.python.org/3/library/unittest.html
import unittest

class TestMyAddTwo(unittest.TestCase):
    def test_my_add_two(self):
        self.assertEqual(my_add_two(1), 3)

    def test_my_add_two_3(self):
        self.assertEqual(my_add_two(3), 5)
# unittest.main(argv=[''], exit=False)
Fixtures and mocks#
Ideally each unit of code should be tested independently.
Why is this?#
However, there are situations where testing code might require data read from a file or a database connection. If only one test requires this external data, then opening the file and reading the data will be part of the test. If several tests require this data, then we use a fixture.
# Module level fixture setup and teardown
def setUpModule():
    global sample_data
    sample_data = open("data/rows.txt", "r")

def tearDownModule():
    sample_data.close()
# unittest.main(argv=[''], exit=False)
# Class level fixture setup and teardown
class TestMyAddTwo(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.sample_data = open("data/rows.txt", "r")

    @classmethod
    def tearDownClass(cls):
        cls.sample_data.close()

    def test_file_parsing(self):
        for line in self.sample_data:
            # Do something with the line
            pass
Mocks#
Mock and MagicMock objects create all attributes and methods as you access them and store details of how they have been used. You can configure them to specify return values or limit what attributes are available.
See https://docs.python.org/3/library/unittest.mock.html
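For example, a MagicMock can stand in for a database connection, so tests never touch a real database. The method name fetch_rows and the query below are made up for illustration:

```python
from unittest.mock import MagicMock

# A mock standing in for a real database connection (hypothetical API)
db = MagicMock()
db.fetch_rows.return_value = [(1, "a"), (2, "b")]

# Code under test can call the mock as if it were the real object
rows = db.fetch_rows("SELECT * FROM samples")

print(rows)  # [(1, 'a'), (2, 'b')]

# The mock records how it was used, so we can make assertions about that
db.fetch_rows.assert_called_once_with("SELECT * FROM samples")
```

Because the mock records every call, the test can check both the value returned and the way the dependency was used.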
How can we test a function that does not return a value?#
def show_results():
    arr = [1, 2, 3]
    print(arr)
    print()
show_results()
[1, 2, 3]
Here is a possible test#
from unittest.mock import MagicMock
class TestShowResults(unittest.TestCase):
    def setUp(self):
        global print
        print = MagicMock()

    def tearDown(self):
        global print
        print = __builtins__.print

    def test_show_results(self):
        show_results()
        self.assertEqual(print.call_count, 2)
# unittest.main(argv=[''], exit=False)
How does this test work?#
‘Linting’ code with Flake8 and mypy#
There are various tools that can analyse Python code and suggest fixes or improvements without running the code.
These are ‘static code checkers’ or ‘linters’ - because they help you remove fluff!
mypy#
We saw mypy briefly before. It is used to find mistakes in type hints, and can even be used to enforce type hints if desired.
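For instance, the call below runs without error at runtime, but mypy would flag it as inconsistent with the annotation (the exact message wording varies between mypy versions):

```python
def add_two(x: int) -> int:
    return x + 2

# mypy reports something like:
#   error: Argument 1 to "add_two" has incompatible type "float"; expected "int"
# Python itself runs this happily and returns 3.5
result = add_two(1.5)
print(result)
```

This is the value of static checking: the mistake is caught by reading the code, before it ever causes a subtle bug at runtime.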
Flake8#
Flake8 runs a variety of checks on your Python scripts, and can be used with IDEs such as VS Code to help you write clearer, more readable code. The optional, but highly recommended, style guide for Python is PEP 8.
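As a small illustration, the code below runs correctly but breaks several PEP 8 conventions that Flake8 would report (the codes shown are standard pyflakes/pycodestyle codes):

```python
import os  # Flake8: F401 'os' imported but unused

def mean(values):
    total=sum(values)  # Flake8: E225 missing whitespace around operator
    return total / len(values)

print( mean([1, 2, 3]) )  # Flake8: E201/E202 whitespace inside brackets
```

None of these issues stop the code from working, but fixing them makes the code easier for others (and future you) to read.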
Test coverage#
Coverage.py works in three phases:#
Execution: Coverage.py runs your code, and monitors it to see what lines were executed.
Analysis: Coverage.py examines your code to determine what lines could have run.
Reporting: Coverage.py combines the results of execution and analysis to produce a coverage number and an indication of missing execution.
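In practice (assuming Coverage.py is installed from PyPI, where the package is named coverage), the three phases map onto a short command sequence:

```shell
# Install Coverage.py
pip install coverage

# Execution phase: run the test suite under coverage monitoring
coverage run -m pytest

# Analysis + reporting: summarise which lines ran and which were missed
coverage report -m

# Optionally, produce a browsable HTML report in htmlcov/
coverage html
```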
Pytest#
Introduction to pytest#
Pytest is another popular tool for testing in Python which makes it easier to write and run tests.
Pytest uses file and function naming conventions to discover tests. You will rarely need to run a test directly, as the framework will find and run tests for you when you modify your code.
Pytest is a package that you install into your environment from conda or PyPI. For example:
pip install pytest
You should then create a directory called tests containing your tests.
mkdir tests
Within tests, you can create Python scripts containing tests - e.g. example_test.py.
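Pytest discovers files named test_*.py or *_test.py and, within them, functions whose names start with test_. A minimal tests/example_test.py could look like this (the function under test is illustrative; in practice you would import your own code):

```python
# Contents of tests/example_test.py (illustrative)

def add_two(a):
    """A trivial function to test; normally imported from your own module."""
    return a + 2

def test_add_two():
    # Plain assert statements are all pytest needs
    assert add_two(1) == 3

def test_add_two_negative():
    assert add_two(-2) == 0
```

Running pytest from the parent directory finds and runs both test functions automatically.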
Example test: comparing csv files#
There are a wide variety of applications for testing. Below is an example of a test where we are running some code to generate a .csv file, and then confirming that the results are as expected.
This could come in handy if you have produced code for a model, and are concerned that others running the model on a different machine could be getting slightly different results.
In this example, we start the .py file by importing:
- pytest (to run the tests)
- pandas (to manage the csv files)
- Our model (imagining a scenario where we have a file model_code inside a folder scripts/, which is a sister folder to tests/)
- tempfile (to save our model results to a temporary directory)
import pytest
import pandas as pd
from scripts import model_code
import tempfile
We then provide file paths at the start of our script:
EXP_FOLDER = 'exp_results'
TEMP_FOLDER = tempfile.mkdtemp()
Assuming we might be running the model with two different parameters (so producing two .csv files, and comparing each of those), we can use parametrize to run the same test with two different inputs. First, we can define our parameters for the model:
parameters = [
    {
        'arrivals': 100,
        'file': 'result100.csv'
    },
    {
        'arrivals': 150,
        'file': 'result150.csv'
    }
]
For the file paths, we should set these up as fixtures:
@pytest.fixture
def exp_folder():
    return EXP_FOLDER

@pytest.fixture
def temp_folder():
    return TEMP_FOLDER
We can then write our test function, and use the file names from parameters as the ID for each test:
@pytest.mark.parametrize('param', parameters,
                         ids=[d['file'] for d in parameters])
def test_equal_df(param, temp_folder, exp_folder):
    '''
    Test that model results are consistent with the expected
    results (which are saved in the EXP_FOLDER)
    '''
    # Run the model (assuming the function accepts our parameter
    # dictionary plus a keyword giving a save location for the .csv file;
    # the keyword name 'save_folder' is illustrative)
    model_code.main(**param, save_folder=temp_folder)

    # Import the test and expected results (import_xls is assumed to be a
    # helper defined elsewhere that reads a file from a given folder; we
    # use the filename from the parameter dictionary)
    test_result = import_xls(temp_folder, param['file'])
    exp_result = import_xls(exp_folder, param['file'])

    # Check that the dataframes are equal
    pd.testing.assert_frame_equal(test_result, exp_result)
With our .py file now complete, we can run our tests from the terminal. Ensuring you are located in the parent folder to tests/, run the command:
pytest
If your tests take a long time to run, you may want to explore parallelising them. You can install pytest-xdist, a package that parallelises your pytest runs across multiple CPUs. With this package installed, you can run the command:
pytest -n auto
Coverage#
pytest-notebook#
Resources#
See the testing section of https://alan-turing-institute.github.io/rse-course/html/module01_introduction_to_python/index.html
Testing Practical Exercise#
Python 3 distributions include the unittest module. See https://docs.python.org/3/library/unittest.html
Here is the example included in the Python documentation.
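For reference, the example from the unittest documentation looks like this (reproduced from the linked page; check the page itself for the current version):

```python
import unittest

class TestStringMethods(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())

    def test_split(self):
        s = 'hello world'
        self.assertEqual(s.split(), ['hello', 'world'])
        # check that s.split fails when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)

# In a notebook, run with:
# unittest.main(argv=[''], exit=False)
```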
Exercise 1#
Using the above as a template, create a test class for the Upper class we used earlier.
class Upper(str):
    def __new__(cls, text=""):
        return super().__new__(cls, text.upper())
Important#
What should (and can) be tested?
Exercise 2#
Design a new capability for the class using TDD.
Here are some suggestions -
Do not allow strings without at least one letter
Only allow strings that begin with a letter
Limit the length of the string to 10 characters.