Idea explained: SIFT

Dulmini Hettiarachchi
3 min readFeb 27, 2021

--

SIFT stands for Scale Invariant Feature Transform

Why?

To be able to detect key points even if they are in different scales (scale invariant). For example, a simple corner detector which detects a corner is unable to do so in a zoomed in image using the same sampling window.

Figure 1: Corner is not detected in the zoomed image, it has become an edge with the scale change

How?

By creating a feature vector which includes a locally distinct point (key point) and a descriptor that describes the key point’s surrounding.

How to find key points? — use Difference of Gaussian (DoG) approach

○ Take the image and blur it with a gaussian blur at different magnitudes. This will give a set of images with different levels of blurs; Image slightly blurred, more blurred so on… (Left stack of Figure 2)

○ Then subtract those blurred images from the adjacent images (DoG)

○ Stack those difference images on top of each other and look for extreme points (Right stack of Figure 2 and then Figure 3)

○ This is performed in an image pyramid — images are differently scaled in size. Notice sets of different scaled octaves in Figure 2.

○ This enable to find key points which are invariant to scale changes

Figure 2: Image is repeatedly convolved with Gaussians to produce the set of scale space images shown on the left. Adjacent Gaussian images are subtracted to produce the difference-of-Gaussian images on the right [1]
Figure 3: Detecting key points by using difference-of-Gaussian images [1]

How to compute the descriptor vector?

○ Take the local neighborhood around a key point (16x16 block)

○ Compute the gradient for each pixel (orientation and magnitude)

○ Divide the block into smaller sub blocks ( 16, 4x4 blocks)

○ For each block compute gradient direction histogram over 8 directions

○ Concatenate the histograms to obtain a 128 (16*8) dimensional feature vector

Figure 5: SIFT descriptor illustration (Source: Gil’s CV Blog)

Descriptor creation illustrated

Source: Gil’s CV Blog

Step 1: Warping the region around the keypoint
Step 2: Dividing to squares and calculating orientation
Step 3: Calculating histograms of gradient orientation
Step 4: Concatenating histograms from different squares

SIFT features are shown to be invariant to image rotation and scale and robust across a substantial range of affine distortion, addition of noise, and change in illumination.

Note: Writing this while learning, as a self note. If there are any mistakes or comments, please let me know. Thank you!

References

[1] Distinctive Image Features from Scale-Invariant Keypoints

[2] Gil’s CV Blog

[3] SIFT — 5 Minutes with Cyrill

--

--