Overview
In machine learning, data labeling, or data annotation is the process of identifying raw data (images, text files, videos, etc.) and adding one or more meaningful and informative labels to provide context so that a machine learning model can learn from it.
Computer vision and Audio
For classification problem, organize your dataset according to the following structure:
├── train
│ ├── category 1
| ├── 1.jpg
│ ├── 2.jpg
│ ├── category 2
| ├── 1.jpg
│ ├── 2.jpg
├── valid
│ ├── category 1
| ├── 1.jpg
│ ├── 2.jpg
│ ├── category 2
| ├── 1.jpg
│ ├── 2.jpg
In PyTorch, ImageFolder
can be use to automatically label your data. In Tensorflow, similar class, image_dataset_from_directory
can also be use.
For object detection problem, we can use several tool such as label-studio, labelImg, labelme, etc.