Computer vision is one of the most exciting fields in artificial intelligence and machine learning.
It enables machines to “see” and interpret visual data much as humans do, often with greater speed and consistency.
From face recognition to self-driving cars, its applications span a wide range of industries.
In this blog, we’ll break down what computer vision is, how it works, and where it's used.
Definition: Computer vision (CV) is a field of artificial intelligence and computer science that enables computer systems to extract meaningful information from visual data such as images and videos.
Applications of Computer Vision
✓ Facial recognition: It enables a system to detect, recognise and verify faces in digital images or videos. A common everyday use is unlocking a smartphone.
✓ Face filters: Using the camera feed, the algorithm identifies the person's facial features and overlays the selected filter on them. This technology is widely used by influencers on social media platforms such as Instagram and Snapchat.
✓ Image-based search: The user uploads an image, and Google's search engine shows relevant images from the web. No text query needs to be typed.
✓ Self-driving cars: Computer vision is a fundamental technology behind autonomous vehicles. It is used to identify objects such as other vehicles and pedestrians, to support navigation, and to monitor the surrounding environment at the same time.
✓ Medical imaging: It helps doctors analyse X-rays, CT scans, MRIs and ultrasound images, for example by detecting abnormalities, quantifying disease progression and identifying anatomical structures. (Important)
✓ Google Translate app: The app uses computer vision to translate text captured by the device's camera in real time. It identifies and extracts the text characters from the image, which are then translated into the desired language.
Computer vision tasks
Classification
✓ A computer vision task for a single object.
✓ The computer looks at an image and assigns it to the class it belongs to, e.g. "cat" or "dog".
✓ It requires training a model to recognise patterns and features within images so that it can assign them to the correct class.
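To make "training a model to recognise patterns" concrete, here is a minimal sketch of a nearest-centroid classifier on tiny, hypothetical 3 × 3 grayscale images. It is a toy stand-in, not how production classifiers (usually deep neural networks) work: "training" just averages the pixel values of each class, and classification picks the class whose average is closest. The names `flatten`, `train`, `classify` and the example images are made up for illustration.

```python
# Toy image classifier: nearest centroid on raw pixel values.
# Images are hypothetical 3x3 grayscale grids (0 = black, 255 = white).

def flatten(image):
    """Turn a 2D grid of pixel values into a flat list of features."""
    return [value for row in image for value in row]

def train(examples):
    """Compute the mean feature vector (centroid) of each class.

    examples: list of (image, label) pairs.
    """
    sums, counts = {}, {}
    for image, label in examples:
        features = flatten(image)
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(model, image):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    features = flatten(image)

    def distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))

    return min(model, key=lambda label: distance(model[label]))


vertical = [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
horizontal = [[0, 0, 0], [255, 255, 255], [0, 0, 0]]
model = train([(vertical, "vertical line"), (horizontal, "horizontal line")])

noisy = [[0, 200, 10], [0, 230, 0], [20, 255, 0]]  # a slightly noisy vertical line
print(classify(model, noisy))  # vertical line
```

Real classifiers learn far richer features than raw pixel averages, but the shape of the task is the same: image in, single class label out.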
Classification + Localisation
✓ A computer vision task for a single object.
✓ It involves identifying which object is present in the image and, at the same time, locating where that object is in the image, usually with a bounding box.
✓ It combines the classification and localisation tasks on the same input image.
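The output of classification + localisation can be pictured as a label plus a box. The sketch below assumes we already have a binary mask of the object (1 = object pixel, 0 = background) and computes the tightest bounding box around it; the label is passed in directly as a hypothetical stand-in for a classifier's prediction. `bounding_box` and `classify_and_localise` are illustrative names, not a library API.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the nonzero pixels, or None if there are none.

    `mask` is a non-empty 2D list where 1 marks object pixels and 0 marks background.
    Coordinates are inclusive row/column indices.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (rows[0], cols[0], rows[-1], cols[-1])


def classify_and_localise(mask, label):
    """Pair a class label with the object's bounding box.

    `label` is a hypothetical stand-in for what a trained classifier would predict.
    """
    return {"label": label, "box": bounding_box(mask)}


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(classify_and_localise(mask, "cat"))
# {'label': 'cat', 'box': (1, 1, 2, 3)}
```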
Object Detection
✓ A computer vision task for multiple objects.
✓ It locates and classifies multiple objects within an image, i.e. it identifies what each object is and where it is in the image.
✓ It draws a bounding box around each detected object.
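An object detector's output is a list of (label, box) pairs, one per object. In a real system a trained model predicts these; in the sketch below they are hard-coded, hypothetical detections, and the code simply draws each box outline onto a small text "image" so the result is visible. `draw_boxes` is an illustrative name.

```python
def draw_boxes(height, width, detections):
    """Draw each detection's bounding box outline on a blank text canvas.

    detections: list of (label, (top, left, bottom, right)) with inclusive coordinates.
    Each outline is drawn with the first letter of its label; '.' is background.
    """
    canvas = [["." for _ in range(width)] for _ in range(height)]
    for label, (top, left, bottom, right) in detections:
        mark = label[0]
        for col in range(left, right + 1):   # top and bottom edges
            canvas[top][col] = mark
            canvas[bottom][col] = mark
        for row in range(top, bottom + 1):   # left and right edges
            canvas[row][left] = mark
            canvas[row][right] = mark
    return ["".join(row) for row in canvas]


# Hypothetical detector output for a 5 x 10 image containing a cat and a dog.
detections = [("cat", (0, 0, 2, 3)), ("dog", (1, 5, 4, 8))]
rendered = draw_boxes(5, 10, detections)
for line in rendered:
    print(line)
# cccc......
# c..c.dddd.
# cccc.d..d.
# .....d..d.
# .....dddd.
```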
Instance segmentation
✓ A computer vision task for multiple objects.
✓ It divides the image into regions at the pixel level, assigning each pixel to the object it belongs to.
✓ Instead of only a rectangular box, it produces the exact shape or outline (mask) of each object.
✓ It distinguishes separate objects of the same class, e.g. two dogs in one image are segmented as two distinct instances.
Note: The picture of the dog is missing from the instance segmentation example due to a technical issue.
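The key difference from object detection is that instance segmentation labels individual pixels and keeps separate objects apart. Real systems use trained neural networks to predict these masks; this sketch assumes a binary "dog" mask is already given and swaps in classical connected-component labelling (4-connectivity, breadth-first search) to give each separate blob its own instance ID. `label_instances` and the mask are hypothetical.

```python
from collections import deque

def label_instances(mask):
    """Give each 4-connected blob of 1s in `mask` its own instance ID (1, 2, ...).

    Returns (labels, count), where `labels` is a grid the same size as `mask`
    and 0 means background.
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and labels[r][c] == 0:
                count += 1                      # found a new, unlabelled object
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # flood-fill the rest of this object
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Both blobs are "dog" pixels, but they belong to two different dogs.
dog_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
labels, count = label_instances(dog_mask)
print(count)  # 2
for row in labels:
    print(row)
# [1, 1, 0, 0]
# [1, 0, 0, 2]
# [0, 0, 2, 2]
```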
Basics of Images
Pixels
✓ Pixels are the smallest units of a digital image.
✓ The word "pixel" stands for "picture element".
✓ They represent individual points of light or colour and are arranged in a rectangular grid.
✓ Each pixel is characterised by its position within the image (row and column) and its colour or intensity value.
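The grid idea maps naturally onto a 2D list: the row and column indices give a pixel's position, and the stored number is its intensity. The tiny image below is hypothetical, and `pixel_at` is an illustrative helper.

```python
# A hypothetical 3 x 4 grayscale image: 3 rows (height) by 4 columns (width).
# Each number is one pixel's intensity (0 = black, 255 = white).
image = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [255, 255, 0, 0],
]

height = len(image)     # number of rows
width = len(image[0])   # number of columns

def pixel_at(image, row, col):
    """Return the value of the pixel at position (row, col)."""
    return image[row][col]

print(height, width)          # 3 4
print(pixel_at(image, 0, 3))  # 255 (top-right pixel)
print(pixel_at(image, 1, 2))  # 160
```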
Resolution
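✓ Resolution refers to the number of pixels contained in an image.
✓ It is expressed as width × height in pixels, e.g. 1920 × 1080.
✓ A higher-resolution image contains more pixels and can therefore capture finer detail; a lower-resolution image looks blurrier or blockier, especially when enlarged.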
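Because resolution is width × height, the total pixel count grows quickly: doubling both dimensions quadruples the number of pixels. A small sketch (the helper name `describe_resolution` is illustrative) that computes the count and megapixels for a few common resolutions:

```python
def describe_resolution(width, height):
    """Return (total pixels, megapixels) for an image of width x height pixels."""
    total = width * height
    return total, total / 1_000_000


for name, (w, h) in [("VGA", (640, 480)), ("Full HD", (1920, 1080)), ("4K UHD", (3840, 2160))]:
    total, mp = describe_resolution(w, h)
    print(f"{name}: {w} x {h} = {total:,} pixels ({mp:.1f} MP)")
# VGA: 640 x 480 = 307,200 pixels (0.3 MP)
# Full HD: 1920 x 1080 = 2,073,600 pixels (2.1 MP)
# 4K UHD: 3840 x 2160 = 8,294,400 pixels (8.3 MP)
```

4K UHD has exactly four times as many pixels as Full HD, since both its width and height are doubled.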
Pixel value: Refers to the numerical representation of the colour or intensity of a pixel. For example, in an 8-bit grayscale image, 0 is black and 255 is white.
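With 8 bits per value, a pixel value is an integer from 0 to 255 (2⁸ − 1). Many machine learning pipelines rescale these values to the range 0.0 to 1.0 before feeding them to a model; the sketch below shows that rescaling under the assumption of integer-valued pixels. `normalise` is an illustrative name, not a library function.

```python
def normalise(value, bit_depth=8):
    """Scale an integer pixel value to the range 0.0-1.0.

    For 8-bit images the valid range is 0..255 (2**8 - 1).
    """
    max_value = 2 ** bit_depth - 1
    if not 0 <= value <= max_value:
        raise ValueError(f"pixel value {value} outside 0..{max_value}")
    return value / max_value


print(normalise(0), normalise(255))                   # 0.0 1.0
# An RGB pixel is just three such values, one per channel.
print([round(normalise(v), 3) for v in (255, 128, 0)])  # [1.0, 0.502, 0.0]
```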
Difference between grayscale images and RGB images
| Grayscale images | RGB images |
|---|---|
| Black-and-white image made of shades of gray | Colorful image |
| Each pixel contains only a shade of gray | Each pixel contains values for the Red, Green and Blue channels |
| Each pixel has a single intensity value (brightness) | Each pixel has three intensity values (one for each color channel) |
| A single channel; no separate color channels | Three color channels (Red, Green, Blue) |
| Typically uses 8 bits per pixel (256 gray levels, 0–255) | Typically uses 24 bits per pixel (8 bits per channel × 3) |
| Used for simplicity, image processing, medical imaging, etc. | Used in digital photography, videos and color display systems |