Connected Component Analysis#

Questions

  • How to extract separate objects from an image and describe these objects quantitatively.

Objectives

  • Understand the term object in the context of images.

  • Learn about pixel connectivity.

  • Learn how Connected Component Analysis (CCA) works.

  • Use CCA to produce an image that highlights every object in a different colour.

  • Characterise each object with numbers that describe its appearance.

Objects#

In the Thresholding episode we have covered dividing an image into foreground and background pixels. In the shapes example image, we considered the coloured shapes as foreground objects on a white background.

Original shapes image

In thresholding we went from the original image to this version:

Mask created by thresholding

Here, we created a mask that only highlights the parts of the image that we find interesting, the objects. All objects have pixel value of True while the background pixels are False.

By looking at the mask image, one can count the objects that are present in the image (7). But how did we actually do that, how did we decide which lump of pixels constitutes a single object?

Pixel Neighborhoods#

In order to decide which pixels belong to the same object, one can exploit their neighborhood: pixels that are directly next to each other and belong to the foreground class can be considered to belong to the same object.

Let’s discuss the concept of pixel neighborhoods in more detail. Consider the following mask “image” with 8 rows, and 8 columns. For the purpose of illustration, the digit 0 is used to represent background pixels, and the letter X is used to represent object pixels (foreground).

0 0 0 0 0 0 0 0
0 X X 0 0 0 0 0
0 X X 0 0 0 0 0
0 0 0 X X X 0 0
0 0 0 X X X X 0
0 0 0 0 0 0 0 0

The pixels are organised in a rectangular grid. In order to understand pixel neighborhoods we will introduce the concept of “jumps” between pixels. The jumps follow two rules: First rule is that one jump is only allowed along the column, or the row. Diagonal jumps are not allowed. So, from a centre pixel, denoted with o, only the pixels indicated with a 1 are reachable:

- 1 -
1 o 1
- 1 -

The pixels on the diagonal (from o) are not reachable with a single jump, which is denoted by the -. The pixels reachable with a single jump form the 1-jump neighborhood.

The second rule states that in a sequence of jumps, one may only jump in row and column direction once -> they have to be orthogonal. An example of a sequence of orthogonal jumps is shown below. Starting from o the first jump goes along the row to the right. The second jump then goes along the column direction up. After this, the sequence cannot be continued as a jump has already been made in both row and column direction.

- - 2
- o 1
- - -

All pixels reachable with one, or two jumps form the 2-jump neighborhood. The grid below illustrates the pixels reachable from the centre pixel o with a single jump, highlighted with a 1, and the pixels reachable with 2 jumps with a 2.

2 1 2
1 o 1
2 1 2

We want to revisit our example image mask from above and apply the two different neighborhood rules. With a single jump connectivity for each pixel, we get two resulting objects, highlighted in the image with A’s and B’s.

0 0 0 0 0 0 0 0
0 A A 0 0 0 0 0
0 A A 0 0 0 0 0
0 0 0 B B B 0 0
0 0 0 B B B B 0
0 0 0 0 0 0 0 0

In the 1-jump version, only pixels that have direct neighbors along rows or columns are considered connected. Diagonal connections are not included in the 1-jump neighborhood. With two jumps, however, we only get a single object A because pixels are also considered connected along the diagonals.

0 0 0 0 0 0 0 0
0 A A 0 0 0 0 0
0 A A 0 0 0 0 0
0 0 0 A A A 0 0
0 0 0 A A A A 0
0 0 0 0 0 0 0 0

Exercise: Object counting (optional, not included in timing)#

How many objects with 1 orthogonal jump, how many with 2 orthogonal jumps?

0 0 0 0 0 0 0 0
0 X 0 0 0 X X 0
0 0 X 0 0 0 0 0
0 X 0 X X X 0 0
0 X 0 X X 0 0 0
0 0 0 0 0 0 0 0

1 jump:

a) 1 b) 5 c) 2

2 jumps:

a) 2 b) 3 c) 5

Jumps and neighborhoods

We have just introduced how you can reach different neighboring pixels by performing one or more orthogonal jumps. We have used the terms 1-jump and 2-jump neighborhood. There is also a different way of referring to these neighborhoods: the 4- and 8-neighborhood. With a single jump you can reach four pixels from a given starting pixel. Hence, the 1-jump neighborhood corresponds to the 4-neighborhood. When two orthogonal jumps are allowed, eight pixels can be reached, so the 2-jump neighborhood corresponds to the 8-neighborhood.

Connected Component Analysis#

In order to find the objects in an image, we want to employ an operation that is called Connected Component Analysis (CCA). This operation takes a binary image as an input. Usually, the False value in this image is associated with background pixels, and the True value indicates foreground, or object pixels. Such an image can be produced, e.g., with thresholding. Given a thresholded image, the connected component analysis produces a new labeled image with integer pixel values. Pixels with the same value, belong to the same object. scikit-image provides connected component analysis in the function ski.measure.label(). Let us add this function to the already familiar steps of thresholding an image.

First, import the packages needed for this episode:

import imageio.v3 as iio
import ipympl
import matplotlib.pyplot as plt
import numpy as np
import skimage as ski

%matplotlib widget

In this episode, we will use the ski.measure.label function to perform the CCA.

Next, we define a reusable Python function connected_components:

def connected_components(filename, sigma=1.0, t=0.5, connectivity=2):
    # load the image
    image = iio.imread(filename)
    # convert the image to grayscale
    gray_image = ski.color.rgb2gray(image)
    # denoise the image with a Gaussian filter
    blurred_image = ski.filters.gaussian(gray_image, sigma=sigma)
    # mask the image according to threshold
    binary_mask = blurred_image < t
    # perform connected component analysis
    labeled_image, count = ski.measure.label(binary_mask,
                                             connectivity=connectivity, return_num=True)
    return labeled_image, count

The first four lines of code are familiar from the Thresholding episode.

Then we call the ski.measure.label function. This function has one positional argument where we pass the binary_mask, i.e., the binary image to work on. With the optional argument connectivity, we specify the neighborhood in units of orthogonal jumps. For example, by setting connectivity=2 we will consider the 2-jump neighborhood introduced above. The function returns a labeled_image where each pixel has a unique value corresponding to the object it belongs to. In addition, we pass the optional parameter return_num=True to return the maximum label index as count.

Optional parameters and return values

The optional parameter return_num changes the data type that is returned by the function ski.measure.label. The number of labels is only returned if return_num is True. Otherwise, the function only returns the labeled image. This means that we have to pay attention when assigning the return value to a variable. If we omit the optional parameter return_num or pass return_num=False, we can call the function as

labeled_image = ski.measure.label(binary_mask)

If we pass return_num=True, the function returns a tuple and we can assign it as

labeled_image, count = ski.measure.label(binary_mask, return_num=True)

If we used the same assignment as in the first case, the variable labeled_image would become a tuple, in which labeled_image[0] is the image and labeled_image[1] is the number of labels. This could cause confusion if we assume that labeled_image only contains the image and pass it to other functions. If you get an AttributeError: 'tuple' object has no attribute 'shape' or similar, check if you have assigned the return values consistently with the optional parameters.

We can call the above function connected_components and display the labeled image like so:

labeled_image, count = connected_components(filename="data/shapes-01.jpg", sigma=2.0, t=0.9, connectivity=2)

fig, ax = plt.subplots()
ax.imshow(labeled_image)
ax.set_axis_off();

Suppressing outputs in Jupyter Notebooks

We just used ax.set_axis_off(); to hide the axis from the image for a visually cleaner figure. The semicolon is added to suppress the output(s) of the statement, in this case the axis limits. This is specific to Jupyter Notebooks.

We can use the function ski.color.label2rgb() to convert the 32-bit grayscale labeled image to standard RGB colour (recall that we already used the ski.color.rgb2gray() function to convert to grayscale). With ski.color.label2rgb(), all objects are coloured according to a list of colours that can be customised. We can use the following commands to convert and show the image:

# convert the label image to color image
colored_label_image = ski.color.label2rgb(labeled_image, bg_label=0)

fig, ax = plt.subplots()
ax.imshow(colored_label_image)
ax.set_axis_off();

Exercise: How many objects are in that image (15 min)#

Now, it is your turn to practice. Using the function connected_components, find two ways of printing out the number of objects found in the image.

What number of objects would you expect to get?

How does changing the sigma and threshold values influence the result?

You might wonder why the connected component analysis with sigma=2.0, and threshold=0.9 finds 11 objects, whereas we would expect only 7 objects. Where are the four additional objects? With a bit of detective work, we can spot some small objects in the image, for example, near the left border.

shapes-01.jpg mask detail

For us it is clear that these small spots are artifacts and not objects we are interested in. But how can we tell the computer? One way to calibrate the algorithm is to adjust the parameters for blurring (sigma) and thresholding (t), but you may have noticed during the above exercise that it is quite hard to find a combination that produces the right output number. In some cases, background noise gets picked up as an object. And with other parameters, some of the foreground objects get broken up or disappear completely. Therefore, we need other criteria to describe desired properties of the objects that are found.

Morphometrics - Describe object features with numbers#

Morphometrics is concerned with the quantitative analysis of objects and considers properties such as size and shape. For the example of the images with the shapes, our intuition tells us that the objects should be of a certain size or area. So we could use a minimum area as a criterion for when an object should be detected. To apply such a criterion, we need a way to calculate the area of objects found by connected components. Recall how we determined the root mass in the Thresholding episode by counting the pixels in the binary mask. But here we want to calculate the area of several objects in the labeled image. The scikit-image library provides the function ski.measure.regionprops to measure the properties of labeled regions. It returns a list of RegionProperties that describe each connected region in the images. The properties can be accessed using the attributes of the RegionProperties data type. Here we will use the properties "area" and "label". You can explore the scikit-image documentation to learn about other properties available.

We can get a list of areas of the labeled objects as follows:

# compute object features and extract object areas
object_features = ski.measure.regionprops(labeled_image)
object_areas = [objf["area"] for objf in object_features]
object_areas
[318539.0,
 1.0,
 523207.0,
 496622.0,
 517330.0,
 143.0,
 256215.0,
 1.0,
 69.0,
 338787.0,
 265767.0]

Exercise: Plot a histogram of the object area distribution (10 min)#

Similar to how we determined a “good” threshold in the Thresholding episode, it is often helpful to inspect the histogram of an object property. For example, we want to look at the distribution of the object areas.

  1. Create and examine a histogram of the object areas obtained with ski.measure.regionprops.

  2. What does the histogram tell you about the objects?

Exercise: Filter objects by area (10 min)#

Now we would like to use a minimum area criterion to obtain a more accurate count of the objects in the image.

  1. Find a way to calculate the number of objects by only counting objects above a certain area.

Using functions from NumPy and other Python packages

Functions from Python packages such as NumPy are often more efficient and require less code to write. It is a good idea to browse the reference pages of numpy and skimage to look for an available function that can solve a given task.

Exercise: Remove small objects (20 min)#

We might also want to exclude (mask) the small objects when plotting the labeled image.

  1. Enhance the connected_components function such that it automatically removes objects that are below a certain area that is passed to the function as an optional parameter.

Exercise: Colour objects by area (optional, not included in timing)#

Finally, we would like to display the image with the objects coloured according to the magnitude of their area. In practice, this can be used with other properties to give visual cues of the object properties.

Advanced indexing in NumPy

You may have noticed that in the solution, we have used the labeled_image to index the array object_areas. This is an example of advanced indexing in NumPy. The result is an array of the same shape as the labeled_image whose pixel values are selected from object_areas according to the object label. Hence the objects will be colored by area when the result is displayed. Note that advanced indexing with an integer array works slightly different than the indexing with a Boolean array that we have used for masking. While Boolean array indexing returns only the entries corresponding to the True values of the index, integer array indexing returns an array with the same shape as the index. You can read more about advanced indexing in the NumPy documentation.

Summary Quiz#