
DOKUZ EYLÜL UNIVERSITY GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES

FREQUENCY DOMAIN TECHNIQUES FOR

MOTION ESTIMATION

by

Metehan ALTUNLU

February, 2011
İZMİR


FREQUENCY DOMAIN TECHNIQUES FOR

MOTION ESTIMATION

A Thesis Submitted to the

Graduate School of Natural and Applied Sciences of Dokuz Eylül University in Partial Fulfillment of the Requirements for the Degree of Master of Science

in Electrical and Electronics Engineering

by

Metehan ALTUNLU

February, 2011
İZMİR



We have read the thesis entitled “FREQUENCY DOMAIN TECHNIQUES FOR MOTION ESTIMATION” completed by METEHAN ALTUNLU under the supervision of ASST. PROF. DR. HALDUN SARNEL and we certify that in our opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.

___________________________ ___________________________ Asst. Prof. Dr. Haldun SARNEL

Supervisor

___________________________ ___________________________ ___________________________ ___________________________ (Jury Member) (Jury Member)

___________________________ ___________________________

Prof. Dr. Mustafa SABUNCU
Director


ACKNOWLEDGEMENTS

I would like to thank my advisor Asst. Prof. Dr. Haldun Sarnel for his valuable guidance and support during the theory, application and conclusion periods of this thesis. I do not think this work could have been completed in due course without his help and guidance.

I would also like to thank my family, who have always believed in what I believe and encouraged me to fulfill my dreams.


ABSTRACT

Motion estimation is defined as the search for the best motion vector, i.e., the displacement between the coordinates of a block in the current frame and those of the most similar block in the previous frame. It has been extensively studied by both computer vision and human vision researchers for more than 30 years. The output of motion estimation is used in various areas such as video compression, segmentation, pattern tracking and camera motion stabilization.

Motion estimation techniques could be classified into four main categories: techniques based on spatio-temporal differentials, techniques based on matching, Fourier techniques and techniques using feature extraction.

In this thesis, four frequency domain techniques, namely phase correlation, gradient correlation, statistically robust correlation, and robust correlation, were examined. To detect local motion, the image was divided into small blocks and the blocks at the same location in consecutive frames were converted into the frequency domain. Then, one of the correlation methods (phase correlation, cross correlation, etc.) was applied in the frequency domain. After the inverse Fourier transform, the single largest peak and the five largest peaks were searched for in the correlation function, and the performance of the techniques was compared using the peak signal to noise ratio (PSNR) of the displacement field difference as a measure of success.

Keywords: Motion Estimation, Frequency Domain, Cross Correlation, Phase Correlation, Robustness, Gaussian Noise, Motion Vector, Gradient


ÖZ

Hareket kestirimi, bir önceki resimdeki bloğun kordinatının şimdiki resimdeki en benzer bloğa göre yer değiştirmesini temsil eden en doğru hareket vektörünü aramak olarak tanımlanır. 30 yıldan fazla bir süredir bu konu bilgisayar görüsü ve insan görüsü araştırmacıları tarafından yoğun olarak çalışılmaktadır. Hareket kestiriminin çıktıları video sıkıştırma, patern takibi, bölütleme ve kamera hareketi dengeleme gibi birçok alanda kullanılır.

Hareket kestirimi teknikleri dört ana kategoride sınıflandırılabilir: uzay-zamansal diferansiyel kullanan teknikler, eşleme teknikleri, Fourier teknikleri ve özellik çıkarımı kullanan teknikler.

Bu tezde faz korelasyon, gradyan korelasyon, istatistiksel gürbüz korelasyon ve gürbüz korelasyon olarak dört frekans ortamı tekniği incelendi. Bölgesel hareketi anlayabilmek için resim küçük bloklara bölündü ve ardışık resimlerde aynı bölgedeki bloklar frekans ortamına çevrildi. Sonra sonuca korelasyon metodlarından biri (faz korelasyon, çapraz korelasyon, vs…) uygulandı. Ters Fourier dönüşümü yapıldıktan sonra, korelasyon fonksiyonunda bir tepe noktası ve en büyük beş tepe noktaları arandı ve bunların birbirinden performans kıyaslaması, yer değiştirme alan farkının tepe sinyal gürültü oranını (PSNR) başarı ölçütü olarak ele alarak yapıldı.

Anahtar Sözcükler: Hareket Kestirimi, Frekans Ortamı, Çapraz Korelasyon, Faz Korelasyonu, Gürbüzlük, Gauss Gürültüsü, Hareket Vektörü, Gradyan


CONTENTS

THESIS EXAMINATION RESULT FORM
ACKNOWLEDGEMENTS
ABSTRACT
ÖZ

CHAPTER ONE – INTRODUCTION

1.1 Introduction
1.2 Types of Motion
1.3 Literature Review
1.4 Comparison of Motion Measurement Techniques
1.4.1 Techniques Based on Spatio-Temporal Differentials
1.4.2 Techniques Based on Matching
1.4.3 Fourier Techniques
1.4.4 Techniques Using Feature Extraction
1.5 Thesis Scope
1.6 Outline

CHAPTER TWO – MOTION ESTIMATION FUNDAMENTALS

2.1 Linear Transformations
2.1.1 Scaling
2.1.2 Rotation
2.1.3 Translation
2.2 Image Noise
2.3 Image Similarity Measures
2.4 Parameters of Motion Estimation
2.4.1 Robustness and Insensitivity to Luminance Variations
2.4.2 Accuracy
2.4.3 Speed and Computational Efficiency

CHAPTER THREE – FREQUENCY DOMAIN TECHNIQUES

3.1 Fourier Transform
3.2 Discrete Fourier Transform
3.3 Fast Fourier Transform
3.4 Frequency Domain Techniques
3.4.1 Phase Correlation Technique
3.4.1.1 Stages of Phase Correlation Technique
3.4.1.2 Steps of Phase Correlation Technique
3.4.1.3 Properties of Phase Correlation Technique
3.4.1.4 Undesirable Effects of Phase Correlation Technique
3.4.2 Gradient Correlation in the Frequency Domain
3.4.2.1 Steps of Gradient Correlation in the Frequency Domain
3.4.3 Statistically Robust Correlation
3.4.3.1 Steps of Statistically Robust Correlation
3.4.4 Robust Correlation
3.5 Comparison Criteria

CHAPTER FOUR – IMPLEMENTATION

4.1 Implemented Methods
4.1.1 Phase Correlation
4.1.2 Gradient Correlation Application
4.1.3 Statistically Robust Correlation Application
4.1.4 Robust Correlation Application

CHAPTER FIVE – EXPERIMENTS AND RESULTS

5.1 Description of Experiments
5.2 Results
5.2.1 FlowerGarden Sequence Results
5.2.2 Schefflera Sequence Results
5.2.3 Mequon Sequence Results
5.2.4 Grove Sequence Results
5.2.5 Grove2 Sequence Results
5.2.6 Grove3 Sequence Results
5.2.7 Hydrangea Sequence Results
5.2.8 Urban Sequence Results

CHAPTER SIX – CONCLUSION

REFERENCES


CHAPTER ONE

INTRODUCTION

1.1 Introduction

Motion estimation, which refers to image-plane motion (2-D motion) or object motion (3-D motion) estimation, is one of the fundamental problems in digital image processing. Because time-varying images are 2-D projections of 3-D scenes, 2-D motion refers to the projection of the 3-D motion onto the image plane. We wish to estimate the 2-D motion (instantaneous velocity or displacement) field from the time-varying images. However, 2-D velocity or displacement fields may not always be observable, for several reasons. Instead, what we observe is the so-called “apparent” motion (optical flow or correspondence) field.

The correspondence (optical flow) field is, in general, different from the 2-D displacement (2-D velocity) field due to:

• Lack of sufficient spatial image gradient

There must be sufficient gray-level (color) variation within the moving region for the actual motion to be observable. For example, the motion of a circle with uniform intensity rotating about its center generates no optical flow, thus is unobservable.

• Changes in external illumination

An observable optical flow may not always correspond to an actual motion. For example, if the external illumination varies from frame to frame, then an optical flow will be observed even though there is no motion. Therefore, changes in external illumination impair the estimation of the actual 2-D motion field. (Tekalp, 1995)

The 2-D displacement and velocity fields are projections of the respective 3-D fields onto the image plane, whereas the correspondence and optical flow fields are the displacement and velocity functions perceived from the time-varying image intensity pattern. Since we can only observe the optical flow and correspondence fields, we assume they are the same as the 2-D motion fields in the remainder of this thesis.

1.2 Types of Motion

In any image sequence, two kinds of motion can be, in general, distinguished. The first kind results from the motion of the camera (e.g., panning, zooming, tilting and/or a more complex combination of these basic components); this kind of motion is generally referred to as global motion, where the region of support for motion representation consists of the entire image frame. The other kind of motion results from the translational displacements and transformations (e.g., scaling, rotations, and deformations) of individual objects composing the scene; this is referred to as local motion. The region of support for local motion consists of small areas (rectangular blocks or even a single pixel) within an image.

Most motion estimation techniques make no distinction between global and local motion; global motion is taken into account only implicitly through local estimates. However, it is usually advantageous to process global and local motion separately. Global Motion Estimation (GME) assumes that all the pixels in the image or block change uniformly, whereas Local Motion Estimation (LME) estimates the motion of moving objects within a frame or block. These objects are not sufficiently represented by global motion, so local motion estimation is also needed.

Motions are often modeled by parametric transformations of two-dimensional images. Motion estimation is the process of determining motion vectors that describe the transformation from one image to another. There are many image processing applications in which knowledge of the speed and direction of the movement of all parts of the image would be very useful. These applications include video compression, segmentation, pattern tracking, camera motion stabilization, de-interlacing, etc.


Image properties, such as intensity or color, have a very high correlation in the direction of motion. They do not change significantly when tracked in the image (the color of a car does not change as the car moves across the image). This can be used for the removal of temporal video redundancy; in an ideal situation only the first image and the subsequent motion have to be transmitted.

Most motion measurement techniques discussed in the literature fall into four categories: techniques based on spatio-temporal differentials, techniques based on matching, Fourier techniques, techniques using feature extraction.

1.3 Literature Review

According to P. C. Shenolikar and S. P. Narote (2009), motion estimation is defined as the search for the best motion vector, i.e., the displacement between a block in the current frame and the most similar block in the previous frame. In the literature, various motion estimation techniques have been proposed that have superior performance for specific motions. Existing motion estimation algorithms are generally divided into three groups: differential, frequential and primitive matching techniques (Bouchafa, Aubert, Bouzar, 1998).

The block-matching algorithm (BMA) is dominant and more suitable for a simple hardware realization because of its regularity and simplicity (Chen, Chen, Chiueh, Lee, 1995). Despite its increasing popularity, block-matching motion estimation has its drawbacks. Most of them are due to the assumption that the motion of the moving objects within a block is a uniform translation which can be approximated by a displacement vector. In reality, motion is a complex combination of translation and rotation, which is impossible to estimate using this technique. In order to cope with rotation as well as other nonlinear deformations, a general approach to block-matching motion estimation was proposed (Seferidis, Ghanbari, 1992). In this method, blocks of pixels in the previous frame are nonlinearly transformed to represent the deformation of pixel blocks due to complex motion. Such a transformation requires additional operations, which increase the computational load of the motion compensation process, but the improved prediction reduces the bit rate, especially when complicated motion exists in the scene (Seferidis, Ghanbari, 1994).

The idea of a feature-based motion estimator is to concentrate on a small set of feature points for which the motion can be determined reliably. In the past, a large number of feature point detectors have been proposed, and this topic is still an active area of research (Chun, Yi, Bo, Yan, 2010).

Global motion estimation based on feature matching corresponds closely to the characteristics of human vision. Typical features are extracted and used to estimate the model parameters. In general, typical features are line intersections, corners, points of locally maximum curvature of contour lines, centers of windows having locally maximum curvature, and centers of gravity of closed-boundary regions (Huang, Chi, Hsieh, Wen-Shyong, 2004). It is important to solve the problems of feature stabilization and location accuracy, and many methods have been used to obtain stable and accurate features. In one approach, features are extracted according to the pixel gradient, and local motion is then reduced by a block-based method (Li, Shengrong, Zhiming, 2007). In the article of Lei, Shiwei, Yaquin, Ning and Liang (2008), the Laplacian of Gaussian edge detection method is applied for feature extraction, and local motion is rejected using a residual histogram.

Motion estimation methods based on differential techniques have proved very useful in the context of video analysis, but have limited use in classical video compression because, though accurate, the dense motion vector field they produce requires too much coding resource and computational effort (Cagnazzo, Maugey, Popescu, 2009).


1.4 Comparison of Motion Measurement Techniques

1.4.1 Techniques Based on Spatio-Temporal Differentials

These techniques are based on the assumption that the intensity variation across a field is a linear function of displacement. This is equivalent to assuming that the displacement to be measured is small compared to the wavelength of the highest image frequency component present.

The luminance difference between corresponding pixels in successive frames is calculated and summed over a block. The difference between adjacent pixels is also summed, in both the horizontal and vertical directions. The ratio between the frame difference and the horizontal and vertical element differences gives the horizontal and vertical shifts, respectively, in units of pixels per frame.

Although such techniques work well for sub-pixel shifts, they fail for larger movements. It is possible to apply such methods recursively, by displacing the latest input picture by an amount corresponding to the estimated shift based on previous measurements. This can help the measurement converge to the correct value, although convergence can be slow and in some cases does not occur at all. Some recursive techniques update the displacement estimate on a pixel-by-pixel basis, and some reset this ‘running estimate’ at what is thought to be an object edge.
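The ratio-based estimate described above can be sketched as a least-squares fit of the linearized brightness-constancy equation over a block. The thesis implementations are in MATLAB; this illustrative version uses Python/NumPy, and the function name and test pattern are invented for the example:

```python
import numpy as np

def differential_shift(prev, curr):
    """Least-squares solution of Ix*dx + Iy*dy = -(curr - prev) over a block."""
    Ix = np.gradient(prev, axis=1)   # horizontal element differences
    Iy = np.gradient(prev, axis=0)   # vertical element differences
    It = curr - prev                 # frame difference
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    sol, *_ = np.linalg.lstsq(A, -It.ravel(), rcond=None)
    return sol  # (dx, dy) in pixels per frame

# Smooth test pattern translated by a known sub-pixel amount
y, x = np.mgrid[0:64, 0:64]
prev = np.sin(2 * np.pi * x / 32) + np.cos(2 * np.pi * y / 32)
curr = np.sin(2 * np.pi * (x - 0.3) / 32) + np.cos(2 * np.pi * (y - 0.2) / 32)
dx, dy = differential_shift(prev, curr)
print(f"dx={dx:.2f}, dy={dy:.2f}")
```

Consistent with the discussion above, the same sketch fails once the true displacement approaches the wavelength of the finest image detail, which is why recursive refinement is needed for larger shifts.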

Even when these refinements are incorporated, spatio-temporal differential techniques tend not to work particularly well. They have the advantage of being relatively simple to implement, although once a number of recursive refinements have been included, the complexity can increase substantially. They are prone to failing in areas containing significant movement, and this makes them unsuitable for many applications, since accurate motion measurement is most important in those areas (Thomas, 1987).


1.4.2 Techniques Based on Matching

This class of techniques works by dividing the picture into small blocks and summing the mean square difference (or a similar function) between each pixel of corresponding blocks in adjacent fields. This calculation is performed with several different spatial offsets between the blocks, and the offset that gives the minimum error is taken as the motion vector for that block. The way in which trial offsets are chosen varies from method to method.
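The full-search variant of this procedure can be sketched as follows (illustrative Python/NumPy rather than the thesis's MATLAB; the `block_match` helper and the toy frames are invented for the example):

```python
import numpy as np

def block_match(prev, curr, top, left, bsize=8, search=4):
    """Full search: return the (dy, dx) offset into prev that minimises MSE."""
    block = curr[top:top + bsize, left:left + bsize]
    best, best_off = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + bsize > prev.shape[0] or c + bsize > prev.shape[1]:
                continue  # candidate block would fall outside the frame
            cand = prev[r:r + bsize, c:c + bsize]
            err = np.mean((block - cand) ** 2)  # mean square difference
            if err < best:
                best, best_off = err, (dy, dx)
    return best_off

rng = np.random.default_rng(0)
prev = rng.random((32, 32))
curr = np.roll(prev, shift=(2, 3), axis=(0, 1))  # content moves down 2, right 3
print(block_match(prev, curr, 8, 8))             # (-2, -3)
```

The offset points from the current block back into the previous frame, so a scene shift of (+2, +3) yields the vector (-2, -3); the cost of testing every offset in the window is what motivates the faster search patterns mentioned above.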

Although this class of techniques can generate a large number of motion vectors, this is not always an advantage. In some implementations, it may be useful to limit the vectors actually used to those that occur most frequently in the picture. This reduces the number of incorrect vector assignments that occur (Thomas, 1987).

1.4.3 Fourier Techniques

This class of techniques has been used in the past for image registration problems. In this application, the technique involves correlating two images by first performing a two-dimensional Fourier transform on each image, multiplying each frequency component of one transform by the complex conjugate of the corresponding component of the other, and performing a reverse Fourier transform on the resulting array. The result is an array of numbers (a ‘correlation surface’) which will have a peak at the coordinates corresponding to the shift between the two frames.

Not only does the use of Fourier transforms reduce the amount of calculation required compared to performing a correlation in the spatial domain, but it also enables filtering to be performed on the correlation surface. In particular, the sharpness of the peak can be significantly increased by normalizing the amplitude of each frequency component prior to performing the reverse transform. This technique is referred to as phase correlation.
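The normalize-then-inverse-transform recipe described above can be sketched as follows (illustrative Python/NumPy rather than the thesis's MATLAB; the helper name and random test images are invented):

```python
import numpy as np

def phase_correlation(f, g):
    """Estimate the (cyclic) shift s such that g is f translated by s."""
    F = np.fft.fft2(f)
    G = np.fft.fft2(g)
    cross = G * np.conj(F)            # cross-power spectrum
    cross /= np.abs(cross) + 1e-12    # normalise: keep only the phase
    surface = np.real(np.fft.ifft2(cross))  # correlation surface
    peak = np.unravel_index(np.argmax(surface), surface.shape)
    return tuple(int(p) for p in peak)

rng = np.random.default_rng(1)
f = rng.random((64, 64))
g = np.roll(f, shift=(5, 12), axis=(0, 1))  # shift f down 5, right 12
print(phase_correlation(f, g))              # (5, 12)
```

Normalizing each component to unit magnitude is exactly what sharpens the peak into a near-impulse; omitting that division gives ordinary cross correlation, whose peak is much broader.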


This class of techniques can measure very large shifts (many tens of pixels) to an accuracy better than a tenth of a pixel, by interpolating the correlation surface. However, as it stands, the method is only capable of measuring global motion; any slight rotation of the picture can reduce the height of the correlation peak significantly.

Although it is possible to obtain even greater accuracy by using other image registration algorithms, these are generally not as robust and do not have as good noise immunity as correlation-based techniques (Thomas, 1987).

1.4.4 Techniques Using Feature Extraction

This class of techniques is often applied to problems such as the determination of the three-dimensional structure of a scene from a number of photographs taken from different locations. The basis of these methods is to identify particular features in the scene (often edges or corners of the objects), and follow the movement of these features from one picture to the next. This provides motion information at various points in the picture, and an interpolation process is used to assign motion vectors to the remaining picture areas.

This class of techniques is useful for specialized scene analysis tasks, but is not often used to measure motion in more general scenes. As these techniques rely on the extraction of particular features from the scene (such as edges), they can fail to measure the correct velocity in picture areas that do not contain such features. They have been applied to more general scenes, but this failing is apparent in the results presented. An example of the type of picture material that would probably cause such techniques to fail is a moving area containing fine detail, such as a horizontal camera pan across grass (Thomas, 1987).


1.5 Thesis Scope

In this thesis, four Frequency Domain Techniques, namely Phase Correlation, Gradient Correlation, Statistically Robust Correlation and Robust Correlation, are studied. Their performance and implementation differences are compared and tabulated. Although Phase Correlation is the most common of these methods, the other methods are also used and give satisfactory results. To compare the techniques’ computational cost and performance, all the source code was implemented in the MATLAB environment and results were obtained for 8 sample images in the noiseless, 14 dB noise and 20 dB noise cases. The resulting motion vectors are superimposed on the images for both the noiseless and the 20 dB noisy case in order to visualize the results.
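The PSNR-of-displacement-field-difference measure used for the comparisons can be sketched as follows; the peak value and the toy 2x2 fields below are assumptions of this illustration (the thesis does not specify them here), and the sketch is Python/NumPy rather than the thesis's MATLAB:

```python
import numpy as np

def displacement_psnr(estimated, ground_truth, peak):
    """PSNR (dB) between an estimated and a reference displacement field."""
    mse = np.mean((estimated - ground_truth) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

gt  = np.array([[1.0, 0.0], [2.0, -1.0]])   # hypothetical true vectors
est = np.array([[1.0, 0.5], [2.0, -1.0]])   # estimate with one wrong component
val = displacement_psnr(est, gt, peak=2.0)
print(round(val, 2))  # 18.06
```

Higher values indicate that the estimated field is closer to the reference, which is how the four techniques are ranked in Chapter Five.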

1.6 Outline

This thesis is presented in six chapters. Chapter 1 presents this introduction. Chapter 2 introduces fundamentals of motion estimation. Chapter 3 shows frequency domain techniques. Chapter 4 presents the implementation. Chapter 5 presents experiments and results. Chapter 6 presents the conclusion.


CHAPTER TWO

MOTION ESTIMATION FUNDAMENTALS

2.1 Linear Transformations

In a digital image coordinate system, the origin is generally at the upper left corner of the image, as shown in Fig 2.1. The notation adopted for the horizontal and vertical axes may or may not be the same as that for matrix indices, which denote the vertical axis by the first index and the horizontal axis by the second. Units are integer increments of the grid indices.

Figure 2.1 Image coordinate system.

A two-dimensional (2D) geometric transformation mathematically maps points from one 2D space to another. A point with coordinates (x, y) is mapped to coordinates (x′, y′) in the transformed space. A 2D linear transformation has several parameters, commonly including a scaling, a rotation and a translation in 2D space.

2.1.1 Scaling

Scaling a coordinate means multiplying each of its components by a scalar. Uniform scaling means this scalar is the same for all components (Fig.2.2) whereas in non-uniform scaling, different scalars per component are used (Fig.2.3).


Figure 2.2 Uniform scaling.

Figure 2.3 Non-uniform scaling.

The scaling operation is:

$x' = ax$ (Equation 2.1)

$y' = by$ (Equation 2.2)

or in matrix form:

$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ (Equation 2.3)


2.1.2 Rotation

Rotation means moving the coordinates in a circular motion as shown in Fig.2.4.

Figure 2.4 Rotation.

Rotation operation is expressed by

$x' = x\cos\theta - y\sin\theta$ (Equation 2.4)

$y' = x\sin\theta + y\cos\theta$ (Equation 2.5)

or in matrix form:

$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$ (Equation 2.6)

Even though sinθ and cosθ are nonlinear functions of θ, x′ and y′ are linear combinations of x and y.

2.1.3 Translation

The most basic transformation is the translation. The formal definition of a translation is that "every point of an image is moved by the same distance (tx,ty) in the same direction to form the transformed image."


Figure 2.5 Translation.

Translation operation is given by

      + + =       ′ ′ y x t y t x y x Equation 2.7

Conventionally, a geometric transformation can consist of a combination of translation, rotation, and uniform scaling; it is therefore also called a rotation-scale-translation (RST) transformation. Consider two functions f and g, each representing a gray-level image, related by a four-parameter geometric transformation that maps each point $(x_g, y_g)$ in g to a corresponding point $(x_f, y_f)$ in f according to the following matrix equation:

$\begin{bmatrix} x_f \\ y_f \\ 1 \end{bmatrix} = \begin{bmatrix} \alpha\cos\theta & \alpha\sin\theta & -t_x \\ -\alpha\sin\theta & \alpha\cos\theta & -t_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_g \\ y_g \\ 1 \end{bmatrix}$ (Equation 2.8)

Equivalently, for any pixel (x, y) it is true that:

$g(x, y) = f\big(\alpha(x\cos\theta + y\sin\theta) - t_x,\ \alpha(-x\sin\theta + y\cos\theta) - t_y\big)$ (Equation 2.9)

where $(t_x, t_y)$ are the translations, α is the uniform scale factor, and θ is the rotation angle.
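As a worked example, the scaled rotation of Equations 2.4-2.6 combined with a translation can be applied to a pair of points. This sketch uses the forward-mapping convention (point mapped to α·R(θ)·point + t), which differs in sign convention from Equation 2.8; the Python/NumPy helper name is invented:

```python
import numpy as np

def rst(points, alpha, theta, t):
    """Apply a rotation-scale-translation to an array of (x, y) points."""
    c, s = np.cos(theta), np.sin(theta)
    M = alpha * np.array([[c, -s], [s, c]])  # scaled rotation (Eq. 2.4-2.6 form)
    return points @ M.T + t

pts = np.array([[1.0, 0.0], [0.0, 1.0]])
out = rst(pts, alpha=2.0, theta=np.pi / 2, t=np.array([1.0, 1.0]))
print(np.round(out, 6))  # [[ 1.  3.] [-1.  1.]]
```

For example, (1, 0) is rotated by 90 degrees to (0, 1), scaled to (0, 2), and translated to (1, 3); motion estimation in later chapters recovers the translational part of such mappings.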


2.2 Image Noise

Images may contain some visual noise which reduces image quality and is especially significant when the objects being imaged are small and have relatively low contrast. This variation is usually random and has no particular pattern. The presence of noise gives an image a mottled, grainy, textured, or snowy appearance and the most significant factor is that noise can cover and reduce the visibility of certain features within the image. The loss of visibility is especially significant for low-contrast objects.

Noise having a Gaussian-like distribution is very often encountered in acquired data. Gaussian noise is noise whose probability distribution function has a bell-shaped curve. Gaussian noise is characterized by adding to each image pixel a value drawn from a zero-mean Gaussian distribution (Vijaykumar, Vanathi, Kanagasabapathy, 2009). Other common noises are salt-and-pepper noise, quantization noise and, in coherent light situations, speckle noise.

Another important phenomenon is white noise. It is a random signal with a flat power spectral density. In other words, the signal contains equal power within a fixed bandwidth at any center frequency. The noise throughout this thesis is modeled as Additive White Gaussian Noise (AWGN), where all the image pixels deviate from their original values following the Gaussian curve.
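Generating AWGN at a prescribed SNR, as in the 14 dB and 20 dB cases used later in the thesis, can be sketched as follows; this Python/NumPy helper and its signal-power scaling convention are assumptions of the illustration, not the thesis's MATLAB code:

```python
import numpy as np

def add_awgn(image, snr_db, rng):
    """Add zero-mean white Gaussian noise scaled to a target SNR in dB."""
    signal_power = np.mean(image ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), image.shape)
    return image + noise

rng = np.random.default_rng(0)
img = rng.random((256, 256))
noisy = add_awgn(img, snr_db=20, rng=rng)

# Measure the SNR actually achieved
measured = 10 * np.log10(np.mean(img ** 2) / np.mean((noisy - img) ** 2))
print(round(measured, 1))
```

Because the noise is drawn independently per pixel with a flat spectrum, it is both white and Gaussian in the sense described above.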

Let f(x, y) denote an image. The image is composed of a desired noiseless image g(x, y) and a noise component q(x, y) in additive form:

$f(x, y) = g(x, y) + q(x, y)$ (Equation 2.10)


There are two important parameters in noise analysis: variance and standard deviation. Variance describes how far noise values lie from the mean (µ) and is formulated by:

$\sigma^2 = \frac{\sum_{y=0}^{M-1}\sum_{x=0}^{N-1} \big(f(x,y) - \mu\big)^2}{M \cdot N - 1}$ (Equation 2.11)

where

$\mu = \frac{\sum_{y=0}^{M-1}\sum_{x=0}^{N-1} f(x,y)}{M \cdot N}$ (Equation 2.12)

The standard deviation (σ) shows how much variation there is from the mean value, and is calculated as the square root of the variance:

$\sigma = \sqrt{\sigma^2}$ (Equation 2.13)
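Equations 2.11-2.13 can be checked on a small example (Python/NumPy sketch; the 2x2 test image is arbitrary):

```python
import numpy as np

f = np.array([[1.0, 2.0], [3.0, 6.0]])  # toy M x N image
M, N = f.shape

mu = f.sum() / (M * N)                       # Equation 2.12
var = ((f - mu) ** 2).sum() / (M * N - 1)    # Equation 2.11 (sample variance)
sigma = np.sqrt(var)                         # Equation 2.13

print(mu, round(var, 4), round(sigma, 4))    # 3.0 4.6667 2.1602
```

Note the M*N - 1 denominator makes this the sample (unbiased) variance, matching Equation 2.11.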

2.3 Image Similarity Measures

An image similarity measure quantifies the degree of similarity between intensity patterns in two images. The choice of an image similarity measure depends on the modality of the images. Common examples of image similarity measures include Sum of Absolute Differences (SAD) and Normalized Cross Correlation (NCC).

Let f(x, y) and g(x, y) denote two images. The similarity measures SAD and NCC are computed for these images by the following equations:

$SAD = \sum_{y=0}^{M-1}\sum_{x=0}^{N-1} \left| f(x,y) - g(x,y) \right|$ (Equation 2.14)

$NCC = \frac{\sum_{y=0}^{M-1}\sum_{x=0}^{N-1} f(x,y)\, g(x,y)}{\sqrt{\sum_{y=0}^{M-1}\sum_{x=0}^{N-1} f(x,y)^2 \cdot \sum_{y=0}^{M-1}\sum_{x=0}^{N-1} g(x,y)^2}}$ (Equation 2.15)

Since the SAD method depends on the difference of the two images, it is affected by increasing noise strength more than NCC is.
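The two measures can be checked numerically as follows (Python/NumPy sketch; the square root in the NCC denominator follows the standard normalized cross-correlation definition, and the 2x2 test image is arbitrary):

```python
import numpy as np

def sad(f, g):
    """Sum of absolute differences (Equation 2.14)."""
    return np.abs(f - g).sum()

def ncc(f, g):
    """Normalized cross correlation (Equation 2.15)."""
    num = (f * g).sum()
    den = np.sqrt((f ** 2).sum() * (g ** 2).sum())
    return num / den

f = np.array([[1.0, 2.0], [3.0, 4.0]])
print(sad(f, f), ncc(f, f))                   # identical images: SAD 0, NCC 1
print(sad(f, f + 1), round(ncc(f, f + 1), 4))
```

Adding a constant offset drives SAD up linearly while NCC stays close to 1, which illustrates why NCC degrades more gracefully as noise strength grows.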

2.4 Parameters of Motion Estimation

2.4.1 Robustness and Insensitivity to Luminance Variations

Statistical inferences are based only in part upon the observations. An equally important base is formed by prior assumptions about the underlying situation. Even in the simplest cases, there are explicit or implicit assumptions about randomness and independence, about distributional models, perhaps prior distributions for some unknown parameters, and so on.

These assumptions are not supposed to be exactly true. As in every other branch of applied mathematics, such rationalizations or simplifications are vital, and one justifies their use by appealing to a vague continuity or stability principle: a minor error in the mathematical model should cause only a small error in the final conclusions.

Unfortunately, this does not always hold. Some of the most common statistical procedures are excessively sensitive to seemingly minor deviations from the assumptions, and a plethora of alternative ‘robust’ procedures have been proposed. So, robustness signifies insensitivity to small deviations from the assumptions. Note that “small” may imply small deviations for all data (e.g. Gaussian noise) or large deviations for a small quantity of data (outliers). In relation to image registration, robustness implies correct registration in the presence of effects such as noise, changes in luminance, occlusion, revealed regions, and new objects; in general, any effect that may cause deviation from a perfect match.

2.4.2 Accuracy

Accuracy of the displacement field addresses the question: ‘Does the estimated field correspond to the actual motion?’. The aim, then, is to determine the motion direction(s) and magnitude correctly.

For example, the accuracy of displacement estimates is very important in applications such as motion-compensated frame interpolation. The visual quality of the resulting frames may degrade drastically when structures do not correspond to each other from one frame to the next in the sense of motion. On the other hand, in data compression utilizing motion-compensated prediction, the degradation in image quality due to inaccurate displacement estimates may not be as drastic.

2.4.3 Speed and Computational Efficiency

Keeping the speed of motion estimation as high as possible and the required computing power as low as possible are two of the main concerns of motion estimation. As an algorithm becomes more complicated, its speed decreases, as does its computational efficiency.


CHAPTER THREE

FREQUENCY DOMAIN TECHNIQUES

3.1 Fourier Transform

The motivation for the Fourier transform comes from the study of Fourier series. In the study of Fourier series, complicated periodic functions are written as the sum of simple waves mathematically represented by sines and cosines. Due to the properties of sine and cosine, it is possible to recover the amount of each wave in the sum by an integral. In many cases it is desirable to use Euler's formula, which states that $e^{2\pi i\theta} = \cos(2\pi\theta) + i\sin(2\pi\theta)$, to write Fourier series in terms of the basic waves $e^{2\pi i\theta}$. This has the advantage of simplifying many of the formulas involved and providing a compact formulation for Fourier series.

The Fourier Series showed us how to rewrite any periodic function into a sum of sinusoids. The Fourier Transform is the extension of this idea to non-periodic functions. Any sufficiently regular periodic continuous-time signal x(t) can be expanded in a complex exponential Fourier series:

$x(t) = \sum_{k=-\infty}^{\infty} c_k e^{jk\omega_0 t}$ (Equation 3.1)

where $\omega_0 = 2\pi/T$ is the fundamental frequency, and the Fourier coefficients $\{c_k\}$ are given by:

$c_k = \frac{1}{T} \int_{-T/2}^{T/2} x(t)\, e^{-jk\omega_0 t}\, dt$ (Equation 3.2)

The Fourier coefficients $\{c_k\}$ tell about the frequency content (or spectral content) of x(t). Any continuous-time signal x(t) that has finite “energy”:

$\int_{-\infty}^{\infty} x^2(t)\, dt < \infty$ (Equation 3.3)

can be represented in the frequency domain via the Fourier transform:

$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt$ (Equation 3.4)

The signal x(t) can be recovered from its Fourier transform using the inverse Fourier transform formula:

$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega$ (Equation 3.5)

Looking at signals in the Fourier domain allows one to understand the frequency response of a system and also to design systems with a particular frequency response, such as filtering out high frequency signals.

3.2 Discrete Fourier Transform

The Discrete Fourier Transform (DFT) is derived from the usual definition of the Fourier transform of a series. If a time series x[n], with n running from 0 to N-1, is taken, such a series can be considered ‘embedded’ in an infinite series running from 0 to ∞ simply by padding it with zeros, and the normal Fourier transform of a unilateral series can be taken. The result is a continuous function of the angular frequency, called the ‘continuous spectrum of the series’; the finite series has been replaced with a continuous function on the unit circle. Taking N samples of that complex function at regular intervals along the unit circle of the complex plane converts it again to a vector. A real sequence now becomes complex, although some additional reduction is possible.

DFT requires an input function that is discrete and whose non-zero values have a limited (finite) duration. Such inputs are often created by sampling a continuous function. Unlike the discrete-time Fourier transform (DTFT), it only evaluates enough frequency components to reconstruct the finite segment that was analyzed. Using the DFT implies that the finite segment that is analyzed is one period of an infinitely extended periodic signal; if this is not actually true, a window function has to be used to reduce the artifacts in the spectrum. For the same reason, the inverse DFT cannot reproduce the entire time domain, unless the input happens to be periodic (forever). Therefore it is often said that the DFT is a transform for Fourier analysis of finite-domain discrete-time functions.

Because sinusoids repeat periodically, the Fourier Transforms for periodic functions tend to be simpler than those for non-periodic ones.

For the one-dimensional case: If a function with a limited domain N units wide is taken and made to repeat every N units, the spectrum of the periodic function can only have components with frequencies 1/N, 2/N, ... All sinusoids that make up the Fourier Transform of this periodic image must resynchronize at the edges of the image in order to begin again. Instead of a continuous frequency space, a discrete one is generated. The Fourier Transform thus becomes,

F(s) = \frac{1}{N} \sum_{n=0}^{N-1} f[n] \, e^{-2\pi i s n / N}   Equation 3.6

And the Inverse Discrete Fourier Transform becomes:

f[n] = \sum_{s=0}^{N-1} F(s) \, e^{2\pi i s n / N}   Equation 3.7


The n in the exponent from the continuous transform becomes n/N in the discrete transform so that it ranges from 0 to 1 as one goes across the domain of the function. Equation 3.6 is called the Discrete Fourier Transform and Equation 3.7 is the corresponding inverse. As long as periodic functions are being worked with, nothing is given up by moving from a continuous Fourier Transform to a discrete one. The discrete Fourier Transform is the continuous Fourier Transform of a periodic function.
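The transform pair above can be illustrated with a short, self-contained Python sketch (an illustration added here, not part of the thesis itself; `dft` and `idft` are hypothetical helper names implementing Equation 3.6 and Equation 3.7 literally):

```python
import cmath

def dft(f):
    # Equation 3.6: F(s) = (1/N) * sum_n f[n] * exp(-2*pi*i*s*n/N)
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * s * n / N) for n in range(N)) / N
            for s in range(N)]

def idft(F):
    # Equation 3.7: f[n] = sum_s F(s) * exp(2*pi*i*s*n/N)
    N = len(F)
    return [sum(F[s] * cmath.exp(2j * cmath.pi * s * n / N) for s in range(N))
            for n in range(N)]

signal = [1.0, 2.0, 3.0, 4.0]
recovered = idft(dft(signal))  # recovers the original signal up to rounding error
```

With the 1/N factor placed in the forward transform as in Equation 3.6, F(0) is the mean of the samples and the round trip through `dft` and `idft` reproduces the input.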

3.3 Fast Fourier Transform

Fast Fourier Transform (FFT) is an efficient algorithm to compute the Discrete Fourier Transform (DFT) and its inverse. A DFT decomposes a sequence of values into components of different frequencies. This operation is useful in many fields but computing it directly from the definition is often too slow to be practical. An FFT is a way to compute the same result more quickly: computing a DFT of N points in the naive way, using the definition, takes O(N²) arithmetical operations, while an FFT can compute the same result in only O(N log N) operations. The difference in speed can be substantial, especially for long data sets where N may be in the thousands or millions. In practice, the computation time can be reduced by several orders of magnitude in such cases, and the improvement is roughly proportional to N/log(N).

An FFT computes the DFT and produces exactly the same result as evaluating the DFT definition directly; the only difference is that an FFT is much faster. (In the presence of round-off error, many FFT algorithms are also much more accurate than evaluating the DFT definition directly).

If we let

W_N = e^{-2\pi i / N}   Equation 3.8

then, by substituting N = 2M, the Discrete Fourier Transform can be written as:

F(s) = \frac{1}{2M} \sum_{x=0}^{2M-1} f(x) \, W_{2M}^{sx}   Equation 3.9

Separating out the M odd terms and the M even terms:

F(s) = \frac{1}{2} \left[ \frac{1}{M} \sum_{x=0}^{M-1} f(2x) \, W_{2M}^{2sx} + \frac{1}{M} \sum_{x=0}^{M-1} f(2x+1) \, W_{2M}^{(2x+1)s} \right]   Equation 3.10

Since W_{2M}^{2sx} = W_M^{sx} and W_{2M}^{(2x+1)s} = W_M^{sx} W_{2M}^{s}:

F(s) = \frac{1}{2} \left[ \frac{1}{M} \sum_{x=0}^{M-1} f(2x) \, W_M^{sx} + \frac{1}{M} \sum_{x=0}^{M-1} f(2x+1) \, W_M^{sx} W_{2M}^{s} \right]   Equation 3.11

This is the Fourier Transform of the even terms, F_even(s), plus a constant W_{2M}^{s} times the Fourier Transform of the odd terms, F_odd(s). Simplified, this means that the first M terms of the Fourier Transform of 2M items can be computed by:

F(s) = \frac{1}{2} \left\{ F_{even}(s) + F_{odd}(s) \, W_{2M}^{s} \right\}   Equation 3.12

Similarly, the last M terms can be computed by:

F(s + M) = \frac{1}{2} \left\{ F_{even}(s) - F_{odd}(s) \, W_{2M}^{s} \right\}   Equation 3.13

This means that an N-point transform can be computed by separating the odd and even elements of the original function, computing their individual N/2-element DFTs, and then combining them using Equation 3.12 and Equation 3.13.

If N is a power of two, this process can be repeated recursively. Eventually, two one-point transforms are taken, each of which is its own transform. These are combined, the results are combined, those results recombined, and so on, until the complete transform is computed.
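The recursive combination described above can be sketched as follows (an illustrative Python implementation added here, assuming the 1/N-normalised transform of Equation 3.6; `fft` is a hypothetical helper name, not the thesis author's code):

```python
import cmath

def fft(f):
    # Radix-2 decimation-in-time FFT using the 1/N normalization of
    # Equation 3.6; len(f) must be a power of two.
    M2 = len(f)  # 2M in the notation of the text
    if M2 == 1:
        return [complex(f[0])]  # a one-point transform is its own transform
    F_even = fft(f[0::2])
    F_odd = fft(f[1::2])
    W = cmath.exp(-2j * cmath.pi / M2)  # W_2M from Equation 3.8
    first = [(F_even[s] + F_odd[s] * W ** s) / 2 for s in range(M2 // 2)]  # Eq 3.12
    last = [(F_even[s] - F_odd[s] * W ** s) / 2 for s in range(M2 // 2)]   # Eq 3.13
    return first + last
```

Each level of the recursion applies Equations 3.12 and 3.13 once, giving the O(N log N) operation count discussed above.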

3.4 Frequency Domain Techniques

The category of frequency domain motion estimation encompasses all techniques that utilise a frequency domain representation of an image sequence. The principle of motion estimation in the frequency domain is to calculate the local spectrum at each pixel of an image sequence and then estimate the parameters of each motion plane, which are the velocities (Pingault, Pellerin, 2003).

Frequency domain techniques can give accurate motion estimates, however, they tend to be computationally expensive since a large array of filters is required to sample the frequency domain properties of an image sequence (Kolodko, Vlacic, 2005). The existing frequency domain motion estimation techniques are mainly used for the global motion estimation. They can also be applied to local motion estimation as block based approaches (Essannouni, Thami, Aboutajdine, Salam, 2007).

3.4.1 Phase Correlation Technique

The best known and most popular frequency domain technique is the phase correlation method (Essanouni, Hadi, Thami, Aboutajdine, Salam, 2006). The phase correlation technique assumes that there is a pure translation within the image or block. The cross-correlation function of two images with a translational difference is an array of numbers (a correlation surface) which will have a peak at the coordinates corresponding to the shift between the two pictures. When correlation is performed using Fourier transforms, the sharpness of the peak can be significantly increased, as Fig. 3.1 shows, by normalizing the amplitude of each frequency component prior to performing the reverse transform. This correlation process is known as 'phase correlation' since the normalizing process results in only the phase information being used.


Figure 3.1 Peak construction in phase correlation.

As all positional information is contained in the phase of the spatial frequencies making up an image, this technique isolates the required information and is not confused by brightness or contrast changes in the scene. It also has good noise immunity. The method endures brightness changes because increasing the brightness means adding a DC component to the signal (or image), and this DC component only affects the DC component of the Fourier coefficients. Similarly, it endures contrast changes because increasing the contrast increases the maximum and minimum values of the signal (or image). This affects all the components in the Fourier transform, but since they are normalized by their own magnitudes, only the phase information remains.

As the phase correlation method is performed in the frequency domain using a FFT algorithm, it is more computationally efficient than the block matching method. In the case of global movement, the correlation surface consists of a sharp peak situated at coordinates corresponding to the displacement.

This type of technique is capable of measuring very large shifts. However, the method is only capable of measuring global motion, and any slight rotation of the picture can reduce the height of the correlation peak significantly. To capture local motion, further investigation of the images is required.


An ideal vector measurement method would have the accuracy of the Fourier technique coupled with the ability of block matching algorithms to measure the motion of many separate objects in a scene. Such a method should also be able to assign vectors to individual pixels if required.

3.4.1.1 Stages of Phase Correlation Technique

In the first stage of the vector measurement process, the input picture would be divided into blocks. To estimate the cross-correlation of corresponding blocks correctly, each block must be extended to twice the block size, centered around the formerly defined block, before calculating the phase correlation. Subsequently, a two dimensional raised cosine window is applied to each extended block to put more weight on the formerly defined region, to which a motion vector will be assigned. Then, a phase correlation would be performed between corresponding blocks in successive pictures, resulting in a number of correlation surfaces describing the movement present in different areas of the picture.
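A two-dimensional raised cosine window of the kind described above can be built as a separable product of one-dimensional windows; the following Python snippet is a minimal illustration added here (the exact window used in the thesis experiments may differ):

```python
import numpy as np

def raised_cosine_window(n):
    # 1D raised cosine (Hann) window: zero at the boundaries, one at the centre
    w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)
    # Separable 2D window: puts more weight on the central region of the block
    return np.outer(w, w)

# Applied to a 64x64 extended block as: windowed = block * raised_cosine_window(64)
```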

The highest peak in the phase correlation map usually corresponds to the best match between frames. However, for a given block, several peaks can appear in the correlation map, due to several moving objects with different displacements within the block, or due to noise. In this case, several candidate peaks will be selected instead of only the highest one, followed by a decision as to which peak best represents the displacement vector for the block (Yang, Jagannathan, Bohannan, 2005). So, each correlation surface would be searched to locate not one, but several dominant peaks resulting from the motion of objects within each block. Thus, using this approach, several motion vectors could be measured by each correlation process. The result of this stage of the process would be a list of motion vectors likely to be present in the picture, on an area-by-area basis. The correlation surface could be interpolated to provide sub-pixel accuracy.


The second stage of the process would involve taking the list of possible vectors measured in the first stage, and assigning them to appropriate areas of the picture. The assignment would be done by shifting the input picture by each vector in turn relative to the previous picture, and calculating the match error in each pixel. This process would produce an error surface for each trial vector that would indicate how well the vector fitted all parts of the picture. The vector giving the smallest match error would be assigned to each area.

So the method is similar to the block matching algorithms except that the number of trial displacements is limited to those measured in the phase correlation process. This allows the number of trial vectors to be kept to a minimum while still enabling large displacements to be measured accurately. Also, it is no longer necessary to assign vectors on a block-by-block basis; the area of the picture used to determine if a vector fits or not can be smaller because the number of trial vectors is limited, making it easier to distinguish between them.

The technique is likely to perform better with translational movement than it is with zoom and rotational movement. These types of movement produce a continuous range of velocity vectors, only one of which would be measured per measuring block. This may prove adequate if there are enough measuring blocks in the picture. However, the number of measuring blocks that can be used is limited by the minimum size of each block. The dimensions of a measuring block must be at least twice the size of the largest movement expected in each dimension, in order that there is a large amount of overlap between picture material in corresponding blocks in successive pictures.

Figure 3.2 Phase correlation procedure: divide into blocks → extend the blocks and apply weighting window → estimate motion by phase correlation → shift input picture and calculate match error.


3.4.1.2 Steps of Phase Correlation Technique

Let’s take ft and ft+1 as consecutive frames with identical dimensions and assume that there exists a pure translational relationship between them. The technique involves correlating the two consecutive images. As illustrated in Fig. 3.2, the steps are:

• Divide the consecutive images into square blocks (16×16, 32×32 or 64×64), extend them to twice the block size centered around the formerly defined blocks, and apply the weighting window.

f_{t+1}(x, y) = f_t(x - x_0, y - y_0)   Equation 3.14

• Perform a two dimensional Fourier transform on each consecutive block.

f_t \Rightarrow F_t, \quad f_{t+1} \Rightarrow F_{t+1}   Equation 3.15

• Calculate the cross spectrum.

C_{t,t+1} = F_t^* F_{t+1}   Equation 3.16

• Normalize the result. The result is called the normalized or whitened cross spectrum.

c'_{t,t+1}(k, l) = \frac{F_t^* F_{t+1}}{|F_t^* F_{t+1}|}   Equation 3.17

• Perform a reverse Fourier transform on the resulting array. The result is the phase correlation of ft and ft+1.

P_{t,t+1}(x, y) = \mathcal{F}^{-1}\left\{ \frac{F_t^* F_{t+1}}{|F_t^* F_{t+1}|} \right\} = \mathcal{F}^{-1}\left\{ e^{-j2\pi(kx_0 + ly_0)} \right\} = \delta(x - x_0, y - y_0)   Equation 3.18
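The five steps above can be sketched in a few lines of numpy (an illustrative implementation added here, not the thesis author's code; the small constant added to the denominator is an assumption used to guard against division by zero):

```python
import numpy as np

def phase_correlation(f_t, f_t1):
    # FFT both frames (Equation 3.15)
    F_t = np.fft.fft2(f_t)
    F_t1 = np.fft.fft2(f_t1)
    # Cross spectrum and whitening (Equations 3.16-3.17)
    cross = np.conj(F_t) * F_t1
    cross /= np.abs(cross) + 1e-12
    # Inverse transform gives the correlation surface (Equation 3.18)
    P = np.real(np.fft.ifft2(cross))
    # Peak location is the motion estimate (Equation 3.19)
    y0, x0 = np.unravel_index(np.argmax(P), P.shape)
    return x0, y0

block = np.random.default_rng(0).random((64, 64))
shifted = np.roll(block, shift=(3, 5), axis=(0, 1))  # cyclic shift by (x0, y0) = (5, 3)
x0, y0 = phase_correlation(block, shifted)           # recovers the (5, 3) shift
```

Because the DFT assumes cyclic shifts, the example uses `np.roll`; on real image blocks the weighting window discussed above mitigates the boundary effects.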

The coordinates (x0, y0) of the maximum of the real valued array P_{t,t+1} can be used as an estimate of the horizontal and vertical components of motion between ft and ft+1 as follows:

(x_0, y_0) = \arg\max_{(x,y)} \mathrm{Re}\left\{ P_{t,t+1}(x, y) \right\}   Equation 3.19

In fact, each correlation surface would be searched to locate not one but several dominant peaks resulting from the motion of objects within each block.

The method mentioned so far can only yield integer-precision motion estimates. A key performance issue in motion estimation is sub-pixel accuracy. It is self-evident that actual scene motion has arbitrary accuracy and is oblivious to the pixel grid structure resulting from spatial sampling at the image acquisition stage, i.e. by CCD arrays or other A/D post-acquisition operations. Theoretical and experimental analyses have established that sub-pixel accuracy has a significant impact on motion compensated prediction error performance for a wide range of natural moving scenes. As a consequence, recent standardisation efforts in video compression have embraced the principle of sub-pixel accuracy for motion estimation and motion compensated prediction. The most popular techniques for subpixel image registration are based on interpolation. To achieve sub-pixel accuracy a number of methods have been proposed. The maximum peak of the phase correlation surface and its neighbouring values on either side, vertically and horizontally, are taken (Argyriou, Vlachos, 2006):

P_{t,t+1}(x_0 - 1, y_0), \; P_{t,t+1}(x_0, y_0), \; P_{t,t+1}(x_0 + 1, y_0)   Equation 3.20

P_{t,t+1}(x_0, y_0 - 1), \; P_{t,t+1}(x_0, y_0), \; P_{t,t+1}(x_0, y_0 + 1)   Equation 3.21

A quadratic function is fitted to each of these triplets. The location of the maximum of the fitted function provides the required sub-pixel motion estimate (x0 + dx, y0 + dy), where (dx, dy) is computed by

dx = \frac{P_{t,t+1}(x_0 + 1, y_0) - P_{t,t+1}(x_0 - 1, y_0)}{2\left[2P_{t,t+1}(x_0, y_0) - P_{t,t+1}(x_0 - 1, y_0) - P_{t,t+1}(x_0 + 1, y_0)\right]}   Equation 3.22

dy = \frac{P_{t,t+1}(x_0, y_0 + 1) - P_{t,t+1}(x_0, y_0 - 1)}{2\left[2P_{t,t+1}(x_0, y_0) - P_{t,t+1}(x_0, y_0 - 1) - P_{t,t+1}(x_0, y_0 + 1)\right]}   Equation 3.23
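The parabolic refinement can be written as a short Python helper (an illustrative sketch added here, assuming the vertex-of-a-parabola form of Equations 3.22 and 3.23; `subpixel_refine` is a hypothetical name and P is indexed as P[y][x]):

```python
def subpixel_refine(P, x0, y0):
    # Fit a parabola through the integer peak (x0, y0) and its two
    # neighbours in each direction; the vertex gives the sub-pixel offset.
    c = P[y0][x0]
    dx = (P[y0][x0 + 1] - P[y0][x0 - 1]) / (
        2.0 * (2.0 * c - P[y0][x0 - 1] - P[y0][x0 + 1]))
    dy = (P[y0 + 1][x0] - P[y0 - 1][x0]) / (
        2.0 * (2.0 * c - P[y0 - 1][x0] - P[y0 + 1][x0]))
    return x0 + dx, y0 + dy
```

For a surface that really is quadratic around its peak, the refinement is exact; for a phase correlation surface it is an approximation whose quality depends on how sharp the peak is.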

3.4.1.3 Properties of Phase Correlation Technique

The most remarkable property of the phase correlation method compared to the classical cross correlation method is the accuracy with which the peak of the correlation function can be detected. The phase correlation method provides a distinct sharp peak at the point of registration, whereas standard cross correlation yields several broad peaks and a main peak whose maximum is not always exactly centered at the right point.

A second important property is due to the whitening of the signals by normalization, which makes phase correlation notably robust to those types of noise that are correlated with the image function, e.g., uniform variations of illumination, offsets in average intensity, and fixed gain errors due to calibration. This property also makes phase correlation suitable for registration across different spectral bands.

Using the convolution theorem, it can be shown that the method can also handle blurred images, provided that the blurring kernel is relatively invariant from one frame to another. One may for instance use this property to register images contaminated with wide-band additive noise, by taking the phase correlation in the low-frequency portion of the spectrum (Foroosh (Shekarforoush), Zerubia, Berthod, 2002).

Also, the phase correlation always contains a single coherent peak at the point of registration corresponding to signal power, and some incoherent peaks which can be


assumed to be distributed normally around a mean value of zero. The amplitude of the coherent peak is a direct measure of the degree of congruence between the two images. More precisely, the power in the coherent peak corresponds to the percentage of overlapping areas, while the power in the incoherent peaks corresponds to the percentage of non-overlapping areas (Foroosh (Shekarforoush), Zerubia, Berthod, 2002).

3.4.1.4 Undesirable Effects of Phase Correlation Technique

The first undesirable effect is the boundary effect. To obtain a perfect impulse, the shift in the spatial domain has to be cyclic. Since things appearing at one end of the block (window) generally do not appear at the other end, the impulse degenerates into a peak. Further, since the 2D DFT assumes periodicity in both directions, discontinuities from left to right boundaries, and from top to bottom, may introduce spurious peaks. It is well known that the boundary effects due to the finiteness of the image (block) frame become less relevant if the image function has small values near the frame boundaries. Therefore, the rectangular window representing the framing process may be substituted by a weighting window w(x,y) that produces the decay of the image function values near the boundaries.

The second undesirable effect is spectral leakage. In order to observe a perfect impulse, the components of the displacement vector must correspond to an integer multiple of the fundamental frequency. Otherwise, the impulse degenerates into a peak due to the well known spectral leakage phenomenon. Thus, if we assume that the peak values are normally distributed around its maximum, then the actual maximum would be the mean of this distribution.

The third undesirable effect is the limited range of displacement estimates. Since the 2D DFT is periodic with the block size (N, M), displacements (x0, y0) can only be detected if they satisfy –N/2 ≤ x0 ≤ N/2 and –M/2 ≤ y0 ≤ M/2, due to the wrapping of the correlation surface. Hence, to be able to detect a displacement in the range (-d, d) along a spatial direction, the block size has to be theoretically at least 2d in this spatial direction.

3.4.2 Gradient Correlation in the Frequency Domain

The use of gradient information for motion estimation is a well established concept originating in very early work on image registration and currently featuring in a number of popular algorithms of the non-matching variety such as those that involve the computation of optic flow. This is well justified on the grounds that gradients emphasize precisely those salient image features, such as dominant transitions, that provide useful reference points for high-accuracy motion estimation (Argyriou, Vlachos, 2003).

Gradient-based methods have the natural advantage of good feature selection (Tzimiropoulos, Stathaki, 2009). The spatial gradient isolates precisely those salient image features, such as dominant transitions, that provide good measurement reference points (Argyriou, Vlachos, 2003), whereas frequency based methods have speed and computational efficiency. The gradient correlation in the frequency domain scheme combines the advantages associated with gradient-based methods with the speed and computational efficiency that typify frequency-domain processing, owing to the use of fast algorithms. The method can be implemented by fast transformation algorithms and hence enjoys the advantages of computational efficiency while, at the same time, it takes full account of gradient strength and orientation, which ensures the selection of salient and reliable image features.

It is common ground that the computation of a spatial gradient of a discrete signal can only provide an approximation to the ideal differentiation operator whose frequency response is of the form g(f) = j2πf for |f| < fs / 2 where fs is the sampling

frequency. In digital image processing common approximations rely on the use of forward or central pixel differencing. Elementary filter design also suggests that the addition of more terms can provide a better approximation (Table 3.1). More sophisticated discrete approximations to the gradient are possible by using a filter


optimisation approach that favours a better spectral match at lower frequencies. This is intuitively plausible given that a significant proportion of a typical image spectrum is clustered in a lower frequency range and decreases at a rate of 1/f (Argyriou, Vlachos, 2005).

Table 3.1 Coefficients of central difference estimators up to third order.

Order   c-3     c-2     c-1     c0      c1      c2      c3
1                       -1      0       1
2               1/12    -2/3    0       2/3     -1/12
3       -1/60   3/20    -3/4    0       3/4     -3/20   1/60

For the sake of simplicity, central differencing whose frequency response is of the form gc (f) = jfs sin(2πf / fs) will be taken. This is an intuitively appealing choice

because it exhibits bandpass spectral selection properties which are well suited for motion measurement purposes of images of natural scenes. Indeed very low spatial frequencies (i.e. DC) do not provide any reference points while very high spatial frequencies typically contain noise and are aliased. This should be contrasted to conventional correlation where image spectra are left intact and, more importantly, to phase correlation where the spectra are pre-whitened.
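The estimators of Table 3.1 can be applied as one-dimensional stencils; the following Python sketch is an illustration added here (note that the first-order row, as printed in the table, is the unnormalised difference f(x+1) − f(x−1) used later in Equation 3.24, i.e. twice the derivative, while the higher-order rows estimate the derivative directly):

```python
# Central-difference coefficients from Table 3.1
COEFFS = {
    1: [-1, 0, 1],
    2: [1/12, -2/3, 0, 2/3, -1/12],
    3: [-1/60, 3/20, -3/4, 0, 3/4, -3/20, 1/60],
}

def central_diff(f, order):
    # Apply the chosen stencil at every interior sample of the sequence f
    c = COEFFS[order]
    r = len(c) // 2
    return [sum(c[j + r] * f[i + j] for j in range(-r, r + 1))
            for i in range(r, len(f) - r)]

ramp = [float(x) for x in range(8)]   # true derivative is exactly 1 everywhere
slope = central_diff(ramp, 2)         # second/third order stencils return 1.0
```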

3.4.2.1 Steps of Gradient Correlation in the Frequency Domain

At each pixel location of a given frame ft(x,y) (except the boundary pixels)

discrete approximations to the horizontal and vertical gradients are estimated using central differencing. 1st order central difference estimators are used in the following formula. But by using the coefficients in Table 3.1, the order of the central difference estimators could be increased to get better results.

g_t^h(x, y) = f_t(x + 1, y) - f_t(x - 1, y)
g_t^v(x, y) = f_t(x, y + 1) - f_t(x, y - 1)   Equation 3.24


Then the two terms above are combined:

g_t(x, y) = g_t^h(x, y) + j \, g_t^v(x, y)   Equation 3.25

The above formula retains magnitude and orientation information at each pixel location.

For pairs of consecutive frames ft and ft+1 discrete gradients gt and gt+1 are

respectively computed. The identification of motion relies on the detection of the maximum of the cross-correlation function ct,t+1 between gt and gt+1.

c_{t,t+1} = \mathcal{F}^{-1}\left\{ G_t^* G_{t+1} \right\}   Equation 3.26

where g_t ⇒ G_t and g_{t+1} ⇒ G_{t+1}.

After computing c_{t,t+1}, the location of its maximum is determined:

(x_0, y_0) = \arg\max_{(x,y)} \mathrm{Re}\left\{ c_{t,t+1}(x, y) \right\}   Equation 3.27

Sub-pixel accuracy can be obtained in the same manner as in the phase correlation method.
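Putting Equations 3.24-3.27 together, a compact numpy sketch is given below (an illustration added here, not the thesis author's code; cyclic shifts are used for simplicity, so the boundary pixels are handled by wrap-around rather than being excluded as in the text):

```python
import numpy as np

def gradient_correlation(f_t, f_t1):
    def grad(f):
        # Equation 3.24: first-order central differences via cyclic shifts
        gh = np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)
        gv = np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)
        # Equation 3.25: complex gradient keeps magnitude and orientation
        return gh + 1j * gv
    G_t = np.fft.fft2(grad(f_t))
    G_t1 = np.fft.fft2(grad(f_t1))
    # Equation 3.26: cross-correlation of the two gradient fields
    c = np.real(np.fft.ifft2(np.conj(G_t) * G_t1))
    # Equation 3.27: location of the maximum
    y0, x0 = np.unravel_index(np.argmax(c), c.shape)
    return x0, y0
```

Unlike phase correlation, the spectra here are not whitened; the gradient operator itself supplies the bandpass weighting discussed above.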

3.4.3 Statistically Robust Correlation

Statistically Robust Correlation is an optimal and robust correlation technique for local motion estimation purposes. It is based on the maximization of a statistically robust matching function, which is computed in the frequency domain and therefore can be implemented by fast transformation algorithms. The method achieves a significant speed up and robustness over the full search block-matching algorithm (Essannouni, Thami, Salam, Aboutajdine, 2006).

The problem treated in block based motion estimation is to define for a given block g whose size is B × B and a search area f whose size is w×h, the position (mvx,mvy) with the minimum block distortion measure (BDM) among all possible

search positions in the search area f.

BDM(x, y) = \sum_{k=0}^{B-1} \sum_{l=0}^{B-1} \rho\left( f(x + k, y + l) - g(k, l) \right)   Equation 3.28

where ρ is a symmetric, positive-definite function with a unique minimum at zero. The influence function ρ’ (x) is defined as a derivative of the kernel ρ(x). The influence function measures the influence of a datum on the value of the parameter estimate. For example, for the case of ρ(x) = x2, the influence function is ρ’ (x) = 2x, that is, the influence of a datum on the estimate increases linearly with the size of its error, which confirms the non-robustness of the sum square difference (SSD) to outliers.

The M-estimators try to reduce the effect of outliers by replacing the SSD metric with a less rapidly increasing loss function of the data value. Andrews’ Wave M-Estimator is a common robust estimator that satisfies the criteria for having an outlier process. Andrews proposed an influence function (derivative of kernel) as:

\rho'(x) = \begin{cases} \sin(\pi x) & \text{for } |x| \le 1 \\ 0 & \text{elsewhere} \end{cases}   Equation 3.29


where x is the quantity to be minimized. Thus Andrews’ kernel function is of the form:

\rho(x) = \begin{cases} \frac{1}{\pi}\left(1 - \cos(\pi x)\right) & \text{for } |x| \le 1 \\ \frac{2}{\pi} & \text{elsewhere} \end{cases}   Equation 3.30

Since the motion estimation process is generally carried out on the luminance channel, the values taken by f and g are in general in the range [0..255]. So if we denote:

d_{x,y}(k, l) = \frac{\pi \left( f(x + k, y + l) - g(k, l) \right)}{255}   Equation 3.31

Then we can define a new matching criterion "SCD", which is based on the Andrews wave cosine, as:

SCD(x, y) = \sum_{k=0}^{B-1} \sum_{l=0}^{B-1} \cos\left( d_{x,y}(k, l) \right)   Equation 3.32

The position of the motion vector (mvx, mvy) can be deduced from the maximization of the SCD metric. Since the cosine function cos(x) can be approximated by 1 − x²/2 for values of x close to 0, we can expect the SCD metric to return about the same result as the SSD metric in the absence of noise, and a better result in the presence of outliers (Essannouni, Thami, Salam, Aboutajdine, 2006).


3.4.3.1 Steps of Statistically Robust Correlation

The SCD function can be viewed as a cross correlation operation. Indeed, if we note:

g_c(x, y) = \exp\left( i\pi \frac{g(x, y)}{255} \right), \quad f_c(x, y) = \exp\left( i\pi \frac{f(x, y)}{255} \right)   Equation 3.33

where i is the square root of (−1) and exp(ix) is the complex exponential. Using Euler's identity (cos(x) = ℜ(exp(ix))), the SCD can be written as:

SCD(x, y) = \Re\left\{ \sum_{k=0}^{B-1} \sum_{l=0}^{B-1} g_c^*(k, l) \, f_c(k + x, l + y) \right\}   Equation 3.34

From the Fourier correlation theorem, the SCD (Equation 3.32) surface can be computed using the FFT algorithm as follows:

\Re\left\{ \mathrm{IFFT}\left( F_c(u, v) \, G_c^*(u, v) \right) \right\}   Equation 3.35

where Gc is the FFT of gc and Fc is the FFT of fc, R denotes the real part of a

complex number and the asterisk denotes complex conjugation. The fc and gc are

correlated with FFTs by zero padding the size of gc to the size of fc prior to taking the

forward FFTs. There are two different reasons for zero padding. Firstly, for element-wise multiplication to occur, they must be of the same size, and secondly, to avoid cyclic correlation. For this reason, the last B−1 rows and B−1 columns of the result will contain wrap-around data that should be discarded. Finally, the resulting motion vector (mvx, mvy) of the block g in the search area f is measured from the position of the maximum of the SCD surface.

Most of the computational cost lies in the Fourier transforms. The algorithm requires two forward and one backward complex FFTs of size (wh). The complexity of the proposed algorithm is therefore about O((wh) log2(wh)) per block. For basic FFT algorithms, the length of the sequence is usually chosen to be a power of two. However, modern FFTs work on sequences of any length (Essannouni, Thami, Salam, Aboutajdine, 2006).
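The complete procedure can be sketched as follows (an illustrative numpy implementation added here under the conventions above; `scd_surface` and `best_match` are hypothetical names, and the returned position is the top-left corner of the best-matching block):

```python
import numpy as np

def scd_surface(f, g):
    # Map the search area f (h x w) and the block g (B x B) onto the unit
    # circle (Equation 3.33), zero-pad g to the size of f, and compute the
    # SCD surface by FFT correlation (Equation 3.35).
    h, w = f.shape
    B = g.shape[0]
    fc = np.exp(1j * np.pi * f / 255.0)
    gc = np.zeros((h, w), dtype=complex)
    gc[:B, :B] = np.exp(1j * np.pi * g / 255.0)
    scd = np.real(np.fft.ifft2(np.fft.fft2(fc) * np.conj(np.fft.fft2(gc))))
    # Discard the wrap-around rows and columns of the cyclic correlation
    return scd[:h - B + 1, :w - B + 1]

def best_match(f, g):
    # The motion vector is the position maximizing the SCD metric
    scd = scd_surface(f, g)
    y, x = np.unravel_index(np.argmax(scd), scd.shape)
    return x, y

rng = np.random.default_rng(2)
search_area = rng.integers(0, 256, size=(32, 32)).astype(float)
block = search_area[10:18, 5:13].copy()    # 8x8 block located at (x, y) = (5, 10)
mvx, mvy = best_match(search_area, block)  # recovers (5, 10)
```

At the true displacement every cosine term equals one, so the SCD attains its maximum possible value B² there, which is why the peak identifies the match.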

3.4.4 Robust Correlation

At the heart of this method lies a second-derivative operator that is commonly employed in edge-detection algorithms. When an image f ′(x,y) with the same contents as the image f(x,y) is acquired under some different illumination condition, the following relation between f ′(x,y) and f(x,y):

f'(x, y) = a(x, y) f(x, y) + b(x, y)   Equation 3.36

will hold in general, where a(x,y) accounts for contrast changes and b(x,y) for brightness changes. If the change in the illumination condition is not too serious, both a(x,y) and b(x,y) will be slowly changing functions compared with f(x,y). Thus:

\frac{\partial^2 f'}{\partial x^2} = a(x, y) \frac{\partial^2 f}{\partial x^2}   Equation 3.37

\frac{\partial^2 f'}{\partial y^2} = a(x, y) \frac{\partial^2 f}{\partial y^2}   Equation 3.38

The magnitude of the second derivative of f'(x,y) will change, but the angles will remain almost the same. The illumination invariance property is quite evident in the case of offset and scale changes. Experiments show that this property persists even under more unfavorable changes of the illumination condition.
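The scaling behaviour of Equations 3.36-3.38 can be verified numerically; in this quick Python check (an illustration added here), a and b are taken as constants, the simplest case of slowly varying functions:

```python
import numpy as np

def second_deriv_x(f):
    # Discrete second derivative along x (cyclic at the boundary)
    return np.roll(f, -1, axis=1) - 2 * f + np.roll(f, 1, axis=1)

rng = np.random.default_rng(3)
f = rng.random((16, 16))
a, b = 1.7, 40.0                # constant contrast and brightness change
f_prime = a * f + b             # Equation 3.36 with constant a(x,y), b(x,y)
d = second_deriv_x(f)
d_prime = second_deriv_x(f_prime)
# Equation 3.37: the offset b vanishes and the scale a factors out, so the
# sign pattern (and hence the angles) of the second derivative is preserved
same_direction = np.allclose(d_prime, a * d)   # True
```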
