
Image Data Labelling and Annotation - Everything you need to know


Data labelling is an essential step in a supervised machine learning task. "Garbage in, garbage out" is a phrase commonly used in the machine learning community, meaning that the quality of the training data determines the quality of the model. The same is true for the annotations used for data labelling. If you show a child a tomato and say it's a potato, the next time the child sees a tomato, it is very likely that they will classify it as a potato. A machine learning model learns in a similar way, by looking at examples, so the result of the model depends on the labels we feed in during its training phase.

Data labelling is a task that requires a lot of manual work. If you can find a good open dataset for your project that is already labelled, luck is on your side! But most of the time, this is not the case: it is very likely that you will have to go through the process of data annotation yourself.

In this post, we will look at the types of image annotation, commonly used annotation formats, and some tools that you can use for image data labelling.

Image Annotation Types

Before jumping into image annotation, it is useful to know about the different annotation types that exist so that you can pick the right type for your use-case.

Here are a few different types of annotations:

Bounding boxes: Bounding boxes are the most commonly used type of annotation in computer vision. They are rectangular boxes used to define the location of the target object, and can be determined by the x- and y-axis coordinates of the upper-left corner and the x- and y-axis coordinates of the lower-right corner of the rectangle. Bounding boxes are generally used in object detection and localization tasks.

Bounding box for detected cars (Original Photo by Patricia Jekki on Unsplash)

Bounding boxes are usually represented either by two coordinates, (x1, y1) and (x2, y2), or by one coordinate (x1, y1) together with the width (w) and height (h) of the bounding box (see image below).

Bounding Box showing co-ordinates x1, y1, x2, y2, width (w) and height (h) (Photo by an_vision on Unsplash)
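
A minimal Python sketch of converting between these two representations (the function names and pixel values are illustrative, not from any particular library):

```python
# Convert between the two common bounding-box representations.
# Coordinates are in pixels; values below are made up for illustration.

def corners_to_xywh(x1, y1, x2, y2):
    """(x1, y1, x2, y2) -> (x1, y1, width, height)."""
    return x1, y1, x2 - x1, y2 - y1

def xywh_to_corners(x1, y1, w, h):
    """(x1, y1, width, height) -> (x1, y1, x2, y2)."""
    return x1, y1, x1 + w, y1 + h

print(corners_to_xywh(10, 20, 110, 220))   # (10, 20, 100, 200)
print(xywh_to_corners(10, 20, 100, 200))   # (10, 20, 110, 220)
```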

Polygonal Segmentation: Objects are not always rectangular in shape. With this idea, polygonal segmentation is another type of data annotation where complex polygons are used instead of rectangles to define the shape and location of the object in a much more precise way.

Polygonal segmentation of images from COCO dataset (Source)
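
As an illustration, COCO-style JSON stores each polygon as a flat list of alternating x and y pixel coordinates. The short Python sketch below (with made-up values) shows that layout and how to group it back into (x, y) vertices:

```python
# A polygon annotation is just an ordered list of vertex coordinates.
# COCO stores it as a flat list [x1, y1, x2, y2, ...]; values here are made up.
polygon = [125.1, 310.4, 140.0, 295.2, 180.7, 300.9, 175.3, 340.8, 130.6, 338.1]

# Group the flat list into (x, y) vertex pairs for easier handling.
vertices = list(zip(polygon[0::2], polygon[1::2]))
print(vertices)  # [(125.1, 310.4), (140.0, 295.2), ...]
```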

Semantic Segmentation: Semantic segmentation is a pixel-wise annotation, where every pixel in the image is assigned to a class. These classes could be pedestrian, car, bus, road, sidewalk, etc., and each pixel carries semantic meaning.

Semantic segmentation is primarily used in cases where environmental context is very important. For example, it is used in self-driving cars and robotics so that the models can understand the environment they are operating in.

Semantic segmentation of images from Cityscapes Dataset (Source)
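
To make the idea of a pixel-wise label concrete, here is a toy Python/NumPy sketch in which a tiny mask assigns a (made-up) class ID to every pixel:

```python
import numpy as np

# A semantic-segmentation label is a mask with one class ID per pixel.
# Toy 4x6 mask with made-up IDs: 0 = road, 1 = sidewalk, 2 = car.
mask = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 2, 2],
    [1, 1, 0, 0, 2, 2],
    [1, 1, 0, 0, 0, 0],
])

print(mask.shape)                       # same height and width as the image
print(np.unique(mask))                  # classes present: [0 1 2]
print((mask == 2).sum(), "car pixels")  # pixel count per class
```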

3D cuboids: 3D cuboids are similar to bounding boxes with additional depth information about the object. Thus, with 3D cuboids you can get a 3D representation of the object, allowing systems to distinguish features like volume and position in a 3D space.

A use-case of 3D cuboids is in self-driving cars, where the system can use the depth information to measure the distance of objects from the car.

3D Cuboid annotation on image (Original Photo by Jose Carbajal on Unsplash)

Key-Point and Landmark: Key-point and landmark annotation is used to detect small objects and shape variations by creating dots across the image. This type of annotation is useful for detecting facial features, facial expressions, emotions, human body parts, and poses.

Key-point annotation examples from COCO dataset (Source)
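
For example, COCO-style person keypoints are stored as a flat list of (x, y, visibility) triplets. The small Python sketch below uses made-up coordinates and only three of the 17 person keypoints:

```python
# COCO-style keypoints: a flat list of (x, y, visibility) triplets, where
# visibility is 0 = not labelled, 1 = labelled but not visible, 2 = visible.
keypoint_names = ["nose", "left_eye", "right_eye"]
keypoints = [230, 120, 2,   # nose
             220, 110, 2,   # left_eye
             240, 110, 1]   # right_eye (labelled but occluded)

for name, (x, y, v) in zip(keypoint_names,
                           zip(keypoints[0::3], keypoints[1::3], keypoints[2::3])):
    print(name, (x, y), "visible" if v == 2 else "not visible")
```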

Lines and Splines: As the name suggests, this type of annotation is created by using lines and splines. It is commonly used in autonomous vehicles for lane detection and recognition.

Line annotation on road (Original Photo by Karsten Würth on Unsplash)

Image Annotation Formats

There is no single standard format when it comes to image annotation. Below are a few commonly used annotation formats:

COCO: COCO has five annotation types: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning. The annotations are stored using JSON.

For object detection, COCO uses the following format:
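
A trimmed sketch of that structure, written here as a Python dict for readability (the field names follow the COCO spec; the file name, IDs, and coordinates are made up):

```python
import json

# Trimmed COCO object-detection annotation; on disk it is a single .json file.
coco_annotation = {
    "info": {"description": "example dataset", "version": "1.0"},
    "images": [
        {"id": 1, "file_name": "000000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 3,
            "bbox": [100.0, 150.0, 80.0, 60.0],  # [x, y, width, height]
            "area": 4800.0,
            "segmentation": [[100, 150, 180, 150, 180, 210, 100, 210]],
            "iscrowd": 0,
        },
    ],
    "categories": [
        {"id": 3, "name": "car", "supercategory": "vehicle"},
    ],
}

print(json.dumps(coco_annotation, indent=2))  # what the .json file would contain
```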

Pascal VOC: Pascal VOC stores annotations in an XML file. Below is an example of a Pascal VOC annotation file for object detection.
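
Here is a minimal sketch of what such an XML file looks like (the file name, object class, and box coordinates are made up); it is embedded in a Python string only so it can be parsed and checked:

```python
import xml.etree.ElementTree as ET

# A minimal Pascal VOC annotation. On disk this is a standalone .xml file
# that shares its base name with the image it describes.
voc_xml = """<annotation>
    <folder>images</folder>
    <filename>000000001.jpg</filename>
    <size>
        <width>640</width>
        <height>480</height>
        <depth>3</depth>
    </size>
    <object>
        <name>car</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>100</xmin>
            <ymin>150</ymin>
            <xmax>180</xmax>
            <ymax>210</ymax>
        </bndbox>
    </object>
</annotation>"""

root = ET.fromstring(voc_xml)
box = root.find("object/bndbox")
print(root.find("object/name").text,
      [int(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")])
```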

YOLO: In the YOLO labelling format, a .txt file with the same name is created for each image file in the same directory. Each .txt file contains the annotations for the corresponding image file: the object class followed by the object's centre coordinates, width, and height, all normalised to the image dimensions.

For each object, a new line is created.

Below is an example of annotation in YOLO format where the image contains two different objects.
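
A sketch of such a label file, written out from Python (the class IDs, coordinates, and file name are made up for illustration):

```python
# YOLO-format labels for an image containing two objects.
# Each line: <class_id> <x_center> <y_center> <width> <height>,
# with all four values normalised by the image width and height (0-1).
yolo_labels = "\n".join([
    "0 0.534 0.621 0.210 0.185",   # e.g. class 0 = car
    "2 0.128 0.447 0.065 0.242",   # e.g. class 2 = person
])

# The .txt file shares the image's base name, e.g. 000000001.jpg -> 000000001.txt.
with open("000000001.txt", "w") as f:
    f.write(yolo_labels + "\n")
```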

Image Annotation Tools

Here is a list of tools that you can use for annotating images:

1. MakeSense.AI

2. LabelImg

3. VGG image annotator

4. LabelMe


6. RectLabel


In this post, we covered what data annotation/labelling is and why it is important for machine learning. We looked at six different types of image annotation: bounding boxes, polygonal segmentation, semantic segmentation, 3D cuboids, key-point and landmark, and lines and splines; and three different annotation formats: COCO, Pascal VOC, and YOLO. We also listed a few image annotation tools that are available.

In the next post, we will cover how to annotate image data in detail.