
Research Article

Modified Geometrical Method For Visual Lip Extraction From Color Images

Ahmed Khleef Jheel 1, Kadhim M. Hashim 2

1 Assistant Lecturer, University of Babylon; 2 Prof., University of Babylon

1 ahmedk.sw.hdr@student.uobabylon.edu.iq

Article History: Received: 10 January 2021; Revised: 12 February 2021; Accepted: 27 March 2021; Published online: 20 April 2021

Abstract: Lip reading, or visual speech recognition, is a visual way of communicating. It depends on watching the speaker's mouth, especially the lips, to understand the letters and words being spoken. Through the visual image (the expression of the mouth), the movement of the lips, and the gestures and expressions on the face, spoken letters, words, and sentences are recognized visually. The position of the mouth and the features extracted from it help in understanding visual speech more clearly. Researchers are always trying to find new methods to increase the efficiency of lip reading systems. For this reason, this paper presents a lip localization strategy, based on face detection and feature extraction, to segment the mouth region from input video frames. We use the RGB and HSV color models along with a modified channel factor, then apply edge detection techniques and morphological operations, to provide a new visual speech recognition tool.

Keywords: face detection, lip localization, visual feature extraction, HSV color model.

1. Introduction

In the last few years, face recognition has become one of the most significant activities in the field of human-computer interaction. As a major part of the face, the lips are among its most prominent features for recognition. In order to obtain the lip features, this paper first detects and segments the face and then extracts the features of the lips; the method has the advantages of fast computation and a high accuracy rate.

As technology develops, identity recognition is becoming an increasingly important research topic. Face recognition is an important line of research in identification, playing an important role in intelligent buildings, intelligent monitoring, and other fields.

Face recognition remains a powerful tool thanks to its many advantages, such as low cost and the absence of physical contact between the user and the biometric system [1].

Face recognition is one of the most challenging biometric techniques when deployed in unrestricted environments, due to the wide variation exhibited by facial images in the real world (this type of facial image is usually referred to as "faces in the wild"). These variations include head pose, aging, occlusion, illumination conditions, and facial expressions [2]. Examples are shown in Figure (1-1).

Figure (1-1): Typical variations found in faces in the wild. (a) Head pose. (b) Age. (c) Illumination. (d) Facial expression. (e) Occlusion.


The system will allow the user to see on screen what the subject has said. The result is displayed as soon as the speaker finishes the sound/letter.

2. Related Work

Coianiz et al. [3] used the HSV representation to highlight the red color associated with the lips in the image. Later, the HSV color model was used [8] for lip detection and for locating the lips within the mouth region. The lip boundaries are extracted from color and edge information in a Markov Random Field (MRF) framework based on the segmented lip area.

Eveno et al. [4] proposed a new color mixture and chromatic transformation for lip segmentation. In this approach, a new conversion of the RGB color model and a chromatic map were used to separate the lips from the facial skin under variable lighting conditions.

Later on, Eveno et al. introduced a novel method where a pseudo-hue [5] was applied for lip localization within an active contour framework. The results show significant improvement in the accuracy of lip modeling.

Liew et al. [6] in 2003 used a new transformation method to convert a given color image into the CIE-Lab and CIE-LUV color models, and then calculated a lip membership map using a fuzzy clustering algorithm. The ROI around the mouth can be determined from the face area after applying a few morphological filters to the original image.

Namrata [7] in 2015 suggested converting the image from the RGB color space to YCbCr and then decomposing it into its components (luminance, blue chrominance, and red chrominance). The Cb/Cr ratio was used to distinguish the face area. After that, the face image was converted to the HSV color space. Finally, edge detection and morphological operations were applied to the cropped lip image.

3. Proposed Method: Visual Feature Extraction

Feature extraction is preceded by a number of preprocessing steps, as shown in Figure (1-2). These include face detection followed by extraction of the mouth region. Then, the speaker's lips are tracked across consecutive frames of the recorded video. Following these steps, and given an informative set of features, the visual module can proceed with feature extraction.

Figure (1-2): the general diagram of visual feature extraction (sequence of frames → face detection → ROI extraction → visual feature extraction)

3.1 Pre-processing

In order to decrease the computational complexity and quickly obtain the face and lip features, the input video must be preprocessed before being fed to the feature extractors, as shown in Figure (1-3). In our tests, we follow the pre-processing steps below before applying the proposed system:

a) Divide the video into frames (frame rate 30 frames per second). All following steps of the algorithm are applied to each frame.

b) Convert each frame from the RGB color model to the HSV color model via a simple transformation.

c) Extract the HSV (hue, saturation, value) channels of each frame.

d) Adjust the values of the hue, saturation, and value channels (hue H=5, saturation S=10, value V=10) to better suit the conditions of our project. Note that this set of adjustments is not fixed for all videos; it should be changed according to the conditions of the video being processed.

e) Recombine the new hue, saturation, and value channels.

f) Convert back to the RGB color space.



Algorithm (1): Pre-processing

Input: Video // contains only one frontal face whose size must be smaller than the frame size

Output: Sequence of frames after pre-processing

Begin

Step 1: Divide the original video into a sequence of RGB frames (30 frames per second)

Step 2: Convert each frame from RGB to HSV color representation

Step 3: Extract the HSV (hue, saturation, value) channels of each frame

Step 4: Adjust the channel values (hue, saturation, value); in our work we select H=5, S=10, V=10

Step 5: Recombine the new hue, saturation, and value channels

Step 6: Convert back from the HSV color model to the RGB color model

End Algorithm

Figure (1-3): Representation of the pre-processing algorithm
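To make Algorithm (1) concrete, here is a minimal Python/OpenCV sketch of the pre-processing stage. It assumes the H=5, S=10, V=10 adjustments are additive channel offsets (the adjustment operation itself is a project-specific choice), and the function name and signature are illustrative rather than the exact implementation.

```python
# Minimal sketch of Algorithm (1), assuming the H/S/V adjustments are
# additive offsets applied to each channel (the exact operation is a
# project-specific choice). Requires opencv-python and numpy.
import cv2
import numpy as np

def preprocess_video(path, h_shift=5, s_shift=10, v_shift=10):
    """Yield pre-processed frames from a video file."""
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame_bgr = cap.read()                        # Step 1: next frame
        if not ok:
            break
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)  # Step 2
        h, s, v = cv2.split(hsv)                          # Step 3: channels
        # Step 4: shift each channel, clipping to the valid 8-bit range
        # (OpenCV stores hue in [0, 179] for uint8 images)
        h = np.clip(h.astype(np.int16) + h_shift, 0, 179).astype(np.uint8)
        s = np.clip(s.astype(np.int16) + s_shift, 0, 255).astype(np.uint8)
        v = np.clip(v.astype(np.int16) + v_shift, 0, 255).astype(np.uint8)
        hsv = cv2.merge([h, s, v])                        # Step 5: recombine
        yield cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)        # Step 6: back to RGB
    cap.release()
```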

3.2. Face detection

To build the face descriptor, we first need to extract the face region. To do so, an elliptical representation (a semi-overlapping subdivision of the target) that incorporates both global and local target information in a single model can be used; it depends on geometrical features such as the position of the centroid, the lengths of the axes, and the rotation angle, which are part of the target state. A set of geometric features is extracted based on distances along the vertical and horizontal centroid axes, where the enhanced face can be clearly identified as the interior middle region found by face detection. This representation is effective for a limited application, but it is not necessarily effective on a universal target. We modify the ellipse approximation as follows:

a) Create an elliptical shape on the sequenced frames; each frame contains only one frontal face whose size must be smaller than the frame size.

b) Define the ellipse shape and its coordinates, i.e., a four-element vector that specifies the initial location of the ellipse in terms of a bounding rectangle. The position has the form [xmin ymin width height]. See Fig. (1-4/d).

c) Create a binary image ("mask") from the ROI. The ellipse mask, also called the white region, is produced in two phases. The first phase finds the first pixel whose value is 255 in the horizontal direction (left to right and right to left) and in the vertical direction (top to bottom and bottom to top). Then every pixel between those first points is set to 255. Finally, the horizontal result is ANDed with the vertical result, and the outcome is called the ellipse mask (see Fig. 5). The operation returns a binary image of the same size as the input image, with 1s inside the ROI and 0s everywhere else; the input image must be contained within the same axes as the ROI.

d) Burn the elliptical binary mask into the original image; see Fig. (1-4/f).

e) Calculate the major and minor axes of the ellipsed shape.

f) Crop out the ellipsed portion.

Algorithm (2): Face detection for each frame

Input: Sequence of frames

Output: Sequence of ellipsed face images

Begin

Loop: from 1 to the number of frames

Step 1: Create a draggable elliptical shape on the frame; its position must be smaller than the frame size (within the original xlim and ylim ranges)

Step 2: Define the ellipse shape and its coordinates // specifies the initial location of the ellipse [x y width height]

Step 3: Create a binary image ("mask") from the ROI face

Step 4: Burn the elliptical binary mask into the original image // image multiplication (pixel by pixel)

Step 5: Calculate the major and minor axes of the ellipsed shape

Step 6: Crop out the ellipsed portion

Next

End Algorithm

Figure (1-4): Representation of the face detection algorithm
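As an illustration of Algorithm (2), the sketch below builds an elliptical binary mask, burns it into the frame by pixel-wise masking, and crops the ellipsed portion. It assumes the ellipse's bounding rectangle [xmin ymin width height] is already known; in our procedure the ellipse is placed interactively, so the helper below is a simplified, hypothetical stand-in.

```python
# Illustrative sketch of Algorithm (2); the bounding box is assumed
# given, whereas the paper places the ellipse interactively.
import cv2
import numpy as np

def ellipse_face_crop(frame, box):
    """Mask everything outside the ellipse inscribed in `box`, then crop.

    box -- (xmin, ymin, width, height) bounding rectangle of the ellipse.
    Returns the cropped face region and the (major, minor) axis lengths.
    """
    x, y, w, h = box
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    center = (x + w // 2, y + h // 2)
    axes = (w // 2, h // 2)                       # semi-axes of the ellipse
    cv2.ellipse(mask, center, axes, 0, 0, 360, 255, thickness=-1)
    # "Burn" the mask into the image: pixel-by-pixel multiplication
    masked = cv2.bitwise_and(frame, frame, mask=mask)
    cropped = masked[y:y + h, x:x + w]            # ellipsed portion cropped out
    major, minor = max(w, h), min(w, h)           # Step 5: axis lengths
    return cropped, (major, minor)
```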

3.3. Lip localization

The focus of this paper is lip segmentation, so we take just the lip area as the region of interest. In order to reduce redundant information, it is necessary to extract the mouth area. In previous studies, researchers have proposed a variety of methods to extract the lip area [9, 10]. However, these methods still retain redundant information or adversely affect subsequent processing. In this paper, we choose a method that segments the mouth area according to the general structure and proportions of the face [11]. The formulas are as follows:

Wmouth = Wface / 2 …(1)

Hmouth = Hface / 3 …(2)

where Wface and Hface are the width and height of the face, and Wmouth and Hmouth are the width and height of the mouth. From these formulas we obtain the lip areas shown in Fig. (1-5). This method achieves satisfactory and effective results against the simple background required in this paper.

Algorithm (3): Lip localization

Input: Sequence of ellipsed face images

Output: Mouth region for each frame

Begin

For i = 1 to the number of frames

Step 1: Divide the ellipsed face image into four vertical stripes of equal width Wface/4 // Wface = minor axis length of the ellipse

Step 2: Divide the ellipsed face image into three equal horizontal stripes of height Hface/3 // Hface = major axis length of the ellipse; see Figure (1-6/e)

Step 3: Cross the vertical and horizontal stripes

Step 4: Specify the 4 points that determine the last portion of the ellipse that fits conveniently into the mouth region

Step 5: From Step 4, form a rectangular shape that corresponds exactly to the mouth region

Step 6: Crop out the mouth region

End for

End Algorithm


Figure (1-6): the block diagram of the proposed algorithm
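The following sketch illustrates Algorithm (3) under the assumption that the mouth ROI is the rectangle where the two middle vertical stripes cross the bottom horizontal stripe, consistent with Eqs. (1)-(2); it is an illustrative reading of the stripe construction, not the exact implementation.

```python
# Illustrative sketch of Algorithm (3), assuming the mouth ROI spans the
# two middle vertical stripes and the bottom horizontal stripe.
import numpy as np

def localize_mouth(face):
    """Crop the mouth region from an ellipsed face image (H x W [x C])."""
    h_face, w_face = face.shape[:2]
    stripe_w = w_face // 4           # four vertical stripes (Step 1)
    stripe_h = h_face // 3           # three horizontal stripes (Step 2)
    x0 = stripe_w                    # skip the leftmost stripe
    x1 = 3 * stripe_w                # stop before the rightmost stripe
    y0 = 2 * stripe_h                # bottom horizontal stripe only
    return face[y0:h_face, x0:x1]    # Wmouth = Wface/2, Hmouth = Hface/3
```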

This algorithm was tested on a set of single images and video frames and provided good results, as shown in Figure (1-7). This is an early step toward extracting visual features from the ROI image. To obtain the most accurate results, the cropped image needs to be enhanced before the feature extraction step.


Figure (1-7): The result of the proposed algorithm on a single image

4. Cropped Image Enhancement

Image enhancement is one of the most important techniques in image processing [16]. Various enhancement techniques are used to understand and analyze images. We enhance the cropped image so that poor-quality, low-contrast images are improved.

4.1 Noise reduction

Images quite often contain artifacts known as "noise". The term originally denoted unwanted sounds, such as those arising in auditions for singing, music, or acting, but it quickly expanded to other domains, designating the presence of unwanted, randomly spread artifacts within any given domain.

In the imaging domain, for instance, one frequently occurring noise type is salt-and-pepper noise, quite an intuitive name, as images affected by this type of noise look as if salt and pepper particles were scattered over the image (bright pixels in darker areas and dark pixels in brighter areas).

The usual causes of this issue are hardware related (analog-to-digital conversion, bit errors in transmission, etc.).

This brings us to median filtering: one of the most effective methods for removing such noise from images is the median filter.

4.1.1 Median filter

The median filter is a nonlinear order-statistic filter; owing to its good performance, it reduces several bounded noise types, such as Gaussian, random, and salt-and-pepper noise, in images and signals, while preserving edges [17]. In the median filter mechanism, the center pixel of an M × M neighborhood is replaced by the median value of the corresponding image portion. Using this idea, the median filter can remove this type of noise [17].

The median of n observations x_i, i = 1, …, n, is denoted by med(x_i) and is given by:

med(x_i) = x_(v+1) if n = 2v+1; med(x_i) = (1/2)(x_(v) + x_(v+1)) if n = 2v …(3)

where x_(i) denotes the i-th order statistic. In the following, mainly the definition for an odd n is used. A one-dimensional median filter of size n = 2v+1 is defined by the following input-output relation:

y_i = med(x_(i-v), …, x_i, …, x_(i+v)) …(4)

Its input is the sequence x_i, i ∈ Z, and its output is the sequence y_i, i ∈ Z; the filter is also called a moving median or running median. A two-dimensional median filter has the following definition:

y_(i,j) = med{ x_(i+r, j+s) : (r, s) ∈ A }, (i, j) ∈ Z² …(5)

The set A ⊂ Z² defines a neighborhood of the central pixel (i, j) and is called the filter window.

Median filtering is yet another must-have feature: not only does it render image and text documents more comprehensible, it also enhances OCR results when applied prior to OCR submission.
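As a concrete instance of Eq. (5), the sketch below slides a (2v+1) × (2v+1) window A over a grayscale image and replaces each pixel by the window median; OpenCV's cv2.medianBlur offers an equivalent, much faster routine for odd square windows.

```python
# Straightforward sketch of the 2-D median filter of Eq. (5) with a
# square (2v+1) x (2v+1) window; cv2.medianBlur(img, 2*v + 1) is the
# fast library equivalent.
import numpy as np

def median_filter_2d(img, v=1):
    """Apply Eq. (5) to a 2-D grayscale array with window half-size v."""
    padded = np.pad(img, v, mode='edge')         # replicate border pixels
    out = np.empty_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            window = padded[i:i + 2*v + 1, j:j + 2*v + 1]   # the set A
            out[i, j] = np.median(window)        # median over the window
    return out
```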

4.2. Histogram equalization

Illumination is a main element affecting the appearance of an image [18]. Lighting from different directions may cause uneven illumination, which often leads to intensity diversity. Therefore, illumination equalization plays a significant role in image analysis and processing [18].

Liew et al. [18] introduced an effective way to reduce the effects of vertically uneven illumination. Consider for a moment continuous functions, and let the variable r denote the gray levels of the image to be enhanced. In the initial part of our discussion we assume that r has been normalized to the interval [0, 1], with r = 0 representing black and r = 1 representing white. Later, we consider a discrete formulation and allow intensity values to lie in the interval [0, L-1].

For any r satisfying the conditions mentioned above, we concentrate on transformation functions of the form:

s = T(r), 0 ≤ r ≤ 1 … (6)

which produce a level s for every intensity value r in the input image. We assume that the transformation function T(r) meets the following conditions:

(a) T(r) is single valued (each element in the domain corresponds to exactly one element in the range) and monotonically increasing in the interval 0 ≤ r ≤ 1; and

(b) 0 ≤ T(r) ≤ 1 for 0 ≤ r ≤ 1.

The requirement in (a) that T(r) be single valued is needed to guarantee that the inverse transformation exists, and the monotonicity condition preserves the increasing order from black to white in the output image. A transformation function that is not monotonically increasing could result in at least part of the intensity range being reversed, thus producing some reversed gray levels in the output image. While this may be a desirable effect in some cases, it is not what we are after in the present discussion. Lastly, condition (b) ensures that the output gray levels lie in the same range as the input levels. Figure (1-8) gives an example of a transformation function that satisfies these two conditions. The inverse transform from s back to r is defined as

r = T⁻¹(s), 0 ≤ s ≤ 1 … (7)

Note that even when conditions (a) and (b) are satisfied for T(r), the corresponding inverse T⁻¹(s) may fail to be single valued.

Figure (1-8): a gray-level transformation function that is both single valued and monotonically increasing.

Usually, the gray levels of an image, normalized to the interval [0, 1], can be viewed as a random variable. One of the most fundamental characteristics of a random variable is its probability density function (PDF).

For discrete values we deal with probabilities and summations instead of probability density functions and integrals. The probability of occurrence of gray level r_k in an image is approximated by

p_r(r_k) = n_k / n, for k = 0, 1, …, L-1 …(8)

where, as described earlier, n is the total number of pixels in the image, n_k is the number of pixels that have gray level r_k, and L is the total number of possible gray levels in the image. The discrete form of the transformation function is

s_k = T(r_k) = Σ_(j=0..k) p_r(r_j) = Σ_(j=0..k) n_j / n, for k = 0, 1, …, L-1 …(9)

Thus, a processed (output) image is obtained by mapping each pixel with level r_k in the input image into a corresponding pixel with level s_k in the output image; the transformation in Eq. (9) satisfies conditions (a) and (b) stated previously in this section.

Unlike its continuous counterpart, it cannot be proved in general that this discrete transformation produces the discrete equivalent of a uniform probability density function, i.e., a uniform histogram. Nevertheless, Eq. (9) has the general tendency of spreading the histogram of the input image so that the levels of the histogram-equalized image span a fuller range of the gray scale.
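The mapping of Eqs. (8)-(9) can be sketched in a few lines of Python; scaling s_k by (L-1) returns the equalized levels to the [0, L-1] range, comparable to what cv2.equalizeHist computes for 8-bit images.

```python
# Minimal sketch of histogram equalization via Eqs. (8)-(9): the output
# level s_k is the cumulative sum of p_r(r_j), rescaled to [0, L-1].
import numpy as np

def equalize_histogram(gray, L=256):
    """Equalize an 8-bit grayscale image (2-D uint8 array)."""
    hist = np.bincount(gray.ravel(), minlength=L)    # n_k per level, Eq. (8)
    cdf = np.cumsum(hist) / gray.size                # running sum of p_r(r_j)
    lut = np.round((L - 1) * cdf).astype(np.uint8)   # s_k scaled to [0, L-1]
    return lut[gray]                                 # map each pixel r_k -> s_k
```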


4.3. Contrast stretching

Contrast is the ability to separate two adjacent points in an image; it is the difference in light intensity between points. The dynamic range of an image is defined as the entire range of intensity values it contains. Low contrast is one of the common degradations in recorded video frames, and the contrast of an image can be defined as the difference between its maximum and minimum intensity values. Contrast enhancement makes images easier to interpret by making object features easier to distinguish. The goal of this enhancement is to adjust the intensity of illumination so that structures are clearly distinguishable to human viewers, or to provide better input for other automated image-processing techniques.

The image resulting from contrast stretching is better than the distorted image because, first, image details become easier to discriminate in regions that are originally very light or very dark, and second, each pixel value is adjusted to improve the visualization of structures in both the darker and lighter portions of the image at the same time [20].

Rogowska et al. [19] perform contrast stretching by sliding a window (called the kernel) across the image and adjusting the center element using the formula:

R(x, y) = 255 × [(I(x, y) − Imin) / (Imax − Imin)] …(10)

where (x, y) are the coordinates of the center pixel in the kernel, and Imin and Imax are the minimum and maximum intensity values of the image data in the selected kernel.
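A global version of Eq. (10) can be sketched as follows; the kernel-based variant described by Rogowska et al. [19] would compute Imin and Imax inside each sliding window rather than over the whole image.

```python
# Sketch of Eq. (10) applied globally (Imin/Imax taken over the whole
# image); a local variant would compute them per sliding kernel instead.
import numpy as np

def contrast_stretch(gray):
    """Linearly rescale intensities so they span the full [0, 255] range."""
    i_min, i_max = int(gray.min()), int(gray.max())
    if i_max == i_min:                     # flat image: nothing to stretch
        return gray.copy()
    stretched = 255.0 * (gray.astype(np.float32) - i_min) / (i_max - i_min)
    return stretched.astype(np.uint8)
```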

5. Results

Two videos were captured, each of a different person pronouncing some English letters; each consists of 30 frames shot with a Canon D5600 at 1080 × 1920 resolution. Our algorithm was applied to them and good results were obtained, including on a girl's face and a bearded face. To deal with the poor lighting, at the beginning of the algorithm we converted the RGB model to the HSV color model, adjusted the hue and saturation values, recombined the channels, and then re-converted the HSV model to RGB. To improve the image, we first reduced noise using the median filter, because it effectively removes the noise that occurs during shooting while preserving edges; we then equalized the histogram to provide better image quality without loss of information. Contrast stretching was then used to increase the visibility of image structures in both the light and dark parts at the same time. A binarization step is further applied to the enhanced frames to segment the lips; morphological operators can be used, and region properties are computed on the binary image (see Fig. (1-6)). Using the lip feature extraction algorithm, we extracted visemes for 5 different viseme classes. We tested our algorithm on many frames for each speaker; on average, we obtain a 90% result across all speakers.

6. Conclusion

After completing the implementation of this algorithm, we were able to obtain the lip localization needed to extract features that are, in turn, highly accurate for any available dataset, and this method can be extended to real-time applications in the future. Note that we were also able to extract many strong features (such as Canny edges, SURF descriptors, and the width and height of the mouth) for the purpose of speech recognition; this will appear in our next scientific paper.

References

[1] Shen Xian-geng, Wu Wei, "An algorithm of lips secondary positioning and feature extraction based on YCbCr color space", International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2015).

[2] Daniel Saez Trigueros, Li Meng, Margaret Hartnett, "Face Recognition: From Traditional to Deep Learning Methods", GBG plc, London E14 9QD, UK, October 2018.

[3] T. Coianiz, L. Torresani, B. Caprile, "2D Deformable Models for Visual Speech Analysis", in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds., Springer, NY, 1996.

[4] N. Eveno, A. Caplier, P. Y. Coulon, "A new color transformation for lips segmentation", Proceedings of the IEEE Fourth Workshop on Multimedia Signal Processing, pp. 3-8, Cannes, France, 2001.

[5] N. Eveno, A. Caplier, P. Coulon, "Accurate and Quasi-Automatic Lip Tracking", IEEE Trans. Circuits Syst. Video Technol., 14(5), pp. 706-715, 2004.

[6] A. W. C. Liew, S. H. Leung, W. H. Lau, "Segmentation of color lip images by spatial fuzzy clustering", IEEE Trans. Fuzzy Syst., vol. 11, no. 4, pp. 542-549, Aug. 2003.

[7] Namrata Dave, "A lip localization based visual feature extraction method", An International Journal (ECIJ), Volume 4, Number 4, December 2015.

[8] Priyanka P. Kapkar, S. D. Bharkad, "Lip Feature Extraction and Movement Recognition Methods: A Review", International Journal of Scientific & Technology Research, Volume 8, Issue 08, August 2019.

[9] Y. Gong, "Speech recognition in noisy environments: a survey", Speech Communication, 16, pp. 261-291, 1995.

[10] Li Jian, Cheng Changkui, Jiang Tianyan, "Wavelet de-noising of partial discharge signals based on genetic adaptive threshold estimation", IEEE Trans. Dielectr. Electr. Insul., 19(20), pp. 543-549, 2012.

[11] Li Ruwei, Bao Changchun, Xia Bingyin, et al., "Speech enhancement using the combination of adaptive wavelet threshold and spectral subtraction based on wavelet packet decomposition", ICSP 2012 Proceedings, pp. 481-484, 2012.

[12] N. Dave, N. M. Patel, "Phoneme and Viseme based Approach for Lip Synchronization", International Journal of Signal Processing, Image Processing and Pattern Recognition, 7(3), pp. 385-394, 2014.

[13] Neeru Rathee, Dinesh Ganotra, "Analysis of human lip features: a review", Int. J. Applied Systemic Studies, Vol. 6, No. 2, pp. 137-184, 2015.

[14] S. Agrawal, V. R. Omprakash, Ranvijay, "Lip reading techniques: A survey", 2nd International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), Bangalore, pp. 753-757, 2016.

[15] Yuanyao Lu, Qingqing Liu, "Lip segmentation using automatic selected initial contours based on localized active contour model", EURASIP Journal on Image and Video Processing, 2018:7, 2018.

[16] R. Arun, Madhu S. Nair, R. Vrinthavani, Rao Tatavarti, "An Alpha Rooting based Hybrid Technique for Image Enhancement", IEEE, August 24, 2011.

[17] Vinod Kumar, Priyanka, Kaushal Kishore, "A Hybrid Filter for Image Enhancement", International Journal of Image Processing and Vision Sciences (IJIPVS), Volume 1, Issue 1, 2012.

[18] Rafael C. Gonzalez, Richard E. Woods, "Digital Image Processing", Pearson Education International, third edition, pp. 144-147.

[19] I. Attas, J. Louis, J. Belward, "A variational approach to the radiometric enhancement of digital imagery", IEEE Trans. Image Process., 4(6), pp. 845-849, June 1995.

[20] S. I. Sahidan, M. Y. Mashor, A. S. W. Wahab, Z. Salleh, H. Jaafar, "Local and Global Contrast Stretching for Color Contrast Enhancement on Ziehl-Neelsen Tissue Section Slide Images", Biomed 2008, Proceedings 21, pp. 583-586, 2008.
