This is the first post of a series on image processing tools. It starts with the way images make their transition to numerical representations, and gradually exposes how these representations, and functions of them, play a key role in computer vision problems.
Pause for a second and think: at what point did images and pictures become authentic gatecrashers of the big-data party? The generic answer is closely tied to the present-day variety of technological tools and devices that create this kind of data (e.g. smartphones). Suddenly, human beings with cameras and phones have become walking factories for image data. This abundance of images on the web, together with the development of tools to process them in a correct and reliable way, has become an important input factor in vision-based artificial intelligence applications.
Leveraging useful and abundant data is one of the golden premises of machine learning algorithms. Of course, this does not imply that we simply take 10,000 random images from the internet and consider that sample a data set that solves a particular problem. For example, imagine you want to train an image classifier to discern whether an image is of a cat or a dog. Here, the canonical supervised machine learning (ML) approach requires the compilation of many examples of both the dog and cat classes, with associated “dog” and “cat” labels. These classes are then mapped to numbers (e.g. 0 for “cat” and 1 for “dog”), and the supervised learning algorithm learns how to map each image to its number.
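As a minimal sketch of this label encoding step (the label names and the mapping here are just illustrative, not a fixed convention):

```python
# Hypothetical labels for four training images.
labels = ["cat", "dog", "dog", "cat"]

# The class-to-number mapping described above.
class_to_idx = {"cat": 0, "dog": 1}

# Integer targets the supervised classifier will learn to predict.
targets = [class_to_idx[label] for label in labels]
print(targets)  # [0, 1, 1, 0]
```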
Other tools to apply image processing methods come from computer vision (CV) and deep learning (DL). The former covers the task of extracting and pre-processing as much information as possible from images and videos. The latter exploits the recent development of convolutional neural networks (CNNs), in particular in the domains of image classification, segmentation and object detection. In this series I will make heavy use of two tools for image processing: OpenCV and the scikit-image Python library.
How do you move from the picture of a cat to the algorithmic application of image operations on it? As a first step, we need to read images as numerical entities. Here, we use the Python wrapper for the OpenCV library. We can read a file 'image.jpg' as a numerical array (img) with just three lines of code:
```python
import cv2

# Read the file into a numerical (NumPy) array of pixel values.
img = cv2.imread('image.jpg')
print("The size of the image array is:", img.shape)
```
The first lesson here is that colour images are loaded as numerical arrays with three dimensions: height, width and channels. The first two simply represent the rectangular size of the image (in pixels). The third dimension, known as the “number of channels”, holds the three colour channels red (R), green (G) and blue (B)1 (note that OpenCV actually stores them in the reversed BGR order, which is why its conversion functions carry names like COLOR_BGR2GRAY).
The second lesson is that images are represented as arrays of integers in the interval [0, 255]. In fact, in the RGB colour model2 all colours are encoded as a combination of contributions from the (R, G, B) channels, each ranging from zero to 255. Cooking a colour in RGB is like saying, “aggregate this amount of red with this amount of green, and then finish with this amount of blue”. For example, the colour labeled as “cyan” has the RGB ingredients (0, 255, 255). The recipe for black is (0, 0, 0) and the one for white is (255, 255, 255). The numerical upper limit of 255 comes from the maximum value that can be encoded with 8 bits. In fact, with 2^8 binary configurations there are 256 potential values (ranging from 0 to 255).
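We can verify these two lessons directly by building a tiny synthetic image with NumPy (a hand-made 1×3 “image” rather than a loaded photo) holding the three example colours above:

```python
import numpy as np

# A 1x3 "image": one cyan, one black and one white pixel,
# encoded as 8-bit unsigned integers in RGB channel order.
pixels = np.array([[[0, 255, 255],      # cyan:  no red, full green, full blue
                    [0, 0, 0],          # black: no contribution from any channel
                    [255, 255, 255]]],  # white: full contribution from all channels
                  dtype=np.uint8)

print(pixels.shape)  # (1, 3, 3): height, width, channels
print(pixels.dtype)  # uint8: integer values in [0, 255]
```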
The RGB colour scheme is just a recipe to map a numeric code to the property that we (as humans) identify as visual “colour”. With these two simple starting rules, our image now lives in the numerical space, and a number of mathematical operations can be applied to it.
Once we know how to load an image into a numerical array and understand what the colour scheme represents, we can proceed to understand fundamental image operations, with particular emphasis on modifying colour structure and local morphology: for example, introducing artificial blurring or detecting parts of the image with prominent edges.
The following table lists a few examples of image operations that will be presented in more depth in subsequent posts of this series. Examples of the graphical output of these functions are displayed in the figure below.
| Action | Operation | OpenCV function + arguments |
|---|---|---|
| Greyscale | Map a 3D array in RGB format to a 2D array in grey scale | cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) |
| Greyscale Inverted | Invert the grey colour scale | cv2.bitwise_not(img_gray) |
| Gaussian Blurring | Convolve the 2D image with a Gaussian filter | cv2.GaussianBlur(img_gray, ksize, sigma) |
| Edge detector (Canny3) | Detect edges from the gradients’ magnitude over the greyscale encoding of the input image | cv2.Canny(img_gray, min_val, max_val) |
| Histogram Equalization | Redistribute the contrast via the histogram of greyscale intensities | cv2.equalizeHist(img_gray) |
| Threshold | Apply a floor/ceiling to all greyscale values according to a given threshold | cv2.threshold(img_gray, thresh, max_val, cv2.THRESH_BINARY) |
In the following figure we summarise the graphical output of the image operations applied to a real image of a marathon bib number:
The next post in this series will show how to combine these image operations to extract useful features from images containing alphanumeric characters (e.g. the individual digits’ values).
Header image: courtesy of Beata Ratuszniak.
Thanks to Anju Curumsing and Rohan Liston for reviewing this post