Overview

In machine learning, data labeling (also called data annotation) is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful, informative labels that provide context so that a machine learning model can learn from it.

Computer Vision and Audio

For a classification problem, organize your dataset according to the following structure, with one folder per category inside each split:

├── train
│   ├── category 1
│   │   ├── 1.jpg
│   │   └── 2.jpg
│   └── category 2
│       ├── 1.jpg
│       └── 2.jpg
└── valid
    ├── category 1
    │   ├── 1.jpg
    │   └── 2.jpg
    └── category 2
        ├── 1.jpg
        └── 2.jpg

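In PyTorch, the ImageFolder dataset class from torchvision can be used to label your data automatically: each image receives the label of the folder it sits in. In TensorFlow, a similar utility function, image_dataset_from_directory, can be used in the same way.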
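To make the folder-to-label idea concrete, here is a minimal standard-library sketch of how labels can be derived from a directory layout like the one above. It is not the torchvision or TensorFlow implementation; the names `label_from_folders`, `make_demo_dataset`, and `IMAGE_EXTENSIONS` are hypothetical and exist only for illustration. It assumes class names are the sub-directory names, sorted alphabetically and mapped to integer indices.

```python
from pathlib import Path
import tempfile

# Assumption: which file suffixes count as images; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def label_from_folders(root):
    """Return (samples, class_to_idx) for a root laid out as root/<class>/<image>."""
    root = Path(root)
    # Class names are the sub-directory names, sorted so indices are stable.
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = []
    for name in classes:
        for path in sorted((root / name).iterdir()):
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                # Each image inherits the integer label of its parent folder.
                samples.append((path, class_to_idx[name]))
    return samples, class_to_idx


def make_demo_dataset(root):
    """Create a tiny hypothetical train/ split with empty placeholder files."""
    for category in ("category 1", "category 2"):
        folder = Path(root) / "train" / category
        folder.mkdir(parents=True)
        for filename in ("1.jpg", "2.jpg"):
            (folder / filename).touch()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_demo_dataset(tmp)
        samples, class_to_idx = label_from_folders(Path(tmp) / "train")
        print(class_to_idx)
        for path, label in samples:
            print(path.relative_to(tmp), label)
```

With torchvision installed, `torchvision.datasets.ImageFolder(root="train")` exposes a comparable mapping through its `classes` and `class_to_idx` attributes, and the dataset returned by `tf.keras.utils.image_dataset_from_directory("train")` lists its inferred classes in `class_names`.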

For object detection problems, folder names are not enough because every object needs its own bounding box, so we can use annotation tools such as label-studio, labelImg, labelme, etc.
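Once boxes are drawn, the annotations have to be read back into code. As one illustration, labelImg can save annotations as Pascal VOC style XML; the sketch below parses such a file with the standard library. The sample annotation (file name, class name, and coordinates) is made up, and `parse_voc` is a hypothetical helper, not part of any of the tools mentioned above. Other tools and export formats will need a different parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC style; the file name, class name,
# and coordinates are invented for illustration.
SAMPLE_XML = """<annotation>
  <filename>1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <bndbox><xmin>48</xmin><ymin>60</ymin><xmax>320</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (filename, [(class_name, (xmin, ymin, xmax, ymax)), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    # One <object> element per annotated bounding box.
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.findtext("name"), coords))
    return root.findtext("filename"), boxes


if __name__ == "__main__":
    filename, boxes = parse_voc(SAMPLE_XML)
    print(filename, boxes)
```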