BME BME 445252 Bio Biomedimedical Signal cal Signal ProcessingProcessing Lecture Lecture 77Introduction to EEG and EPIntroduction to EEG and EPBiological signal feature Biological signal feature extractionextraction

(1)

BME BME 4 4 52 52 Bio Bio medi medi cal Signal cal Signal Processing

Processing Lecture Lecture 7 7

 Introduction to EEG and EP Introduction to EEG and EP

 Biological signal feature Biological signal feature extraction

extraction

(2)

Lecture 7 Outline

 In this lecture, we’ll study some details of Electroencephalogram (EEG) and Evoked Potential (EP) signals

 We will also study some feature extraction methods for EEG and EP

 Some of the methods could be applied to other

biological signals like Electrocardiogram

(3)

Electroencephalogram (EEG)

 As mentioned in first lecture, EEG signals are electrical potential in microVolts range extracted from the scalp

 Figure below shows two electrode placements: 10-20 international system and another extension of this system

 10-20 system has 19 actives electrodes, while the other system has 61 active electrodes

FP1

T8 AF7 F7

FC7

O1 OZ TP7

T5 P07

CZ CPZ PZ

POZ AFZ FZ

FCZ FP2

T10 AF8

F8

FC8

O2

TP8 T6 PO8

C5 C3 C1 C2 C4 C6

FC5 FC3 FC1

F5 F3 F1

AF1

FC2 FC4 FC6 F2

AF2 F6 F4

CP2 CP4 CP6 P2 P4 P6

PO4 PO3

CP1 CP5 CP3

P1 P5 P3

FPZ

Figures from Sornmo and Laguna,

‘Bioelectrical Signal Processing in Cardiac and Neurological Applications’

(4)

EEG signals (cont.)

 EEG signals are stochastic (i.e. opposite to deterministic)

 Generally EEG signals are assumed to be stationary, though this is not always correct

 Stationary means the statistical properties do not change with time

 Normally, we use ‘second order/wide sense stationary’

 Eg: Mean and variance do not change over time

 This can be tested simply by segmenting the signal and verifying if the segments have similar mean and variance values

 The similarity measure is done using t-tests (which will be studied later)

 EEG signals are also assumed to be linear in most applications for simplicity though this is also not always correct

 Eg: y[n]=x[n]+x[n-1] – linear relationship

 Eg: y[n]=x[n]+(x[n-1])³ – non-linear relationship

 So we can use linear techniques

(5)

EEG signals (cont.)

 In general, EEG signals are normally in the following spectral (frequency) ranges

 Delta (0.1 to 3 Hz) – during deep sleep/meditation

 Theta (4-7 Hz) - during sleep

 Alpha (8-13 Hz) – during restful/relaxing state, decreases when eyes are opened

 Beta (14-29 Hz) – during some mental activity like computing maths

 Gamma (above 30 Hz, up to 50 Hz) – involved in any higher order brain activity.

Also triggering mechanism, alerts the brain to recognise a picture/ sound, etc.

 EEG with freq<0.1 Hz is generally baseline wander noise

 There are variations to these spectral bands in different books

 Transient EEG signals are like K complexes observed during sleep, spikes observed during epilepsy - will not be covered in this course

 Event related EEG signals are like P300, mu rhythm

 P300 – evoked when a stimulus (audio/visual) is recognised

 Mu rhythm – a group of waves (10- 12 Hz) having arcade/comb shape – evoked as contralateral attenuation when there is the thought of movement

 These will be studied in the evoked potential section and in later lectures

(6)

Evoked potentials (EP)

 Evoked potential/event related potential is EEG in response to some stimulus like visual, audio,

somasensory, etc.

 In particular, Visual Evoked Potential is studied the most

 As we know, the EP is evoked with the brain still active doing other things

 So we have EP plus ongoing EEG, which is many higher than EP

 So, we need to use ensemble averaging, median filtering,

PCA, etc. to reduce ongoing EEG

(7)

Evoked potential – cont.

 EP are normally analysed in time domain

 Peaks

 There are distinguishable peaks in EP (after reducing ongoing EEG, say through averaging)

 The peaks are identified as P1,P2,P3,….N1,N2,N3

 Sometimes, the peaks are identified using the latency, for example, it is know that the third positive component, P3 has latency about 300 ms, so it is also know as P300

 Latency and amplitude

 Each peak will have a latency and amplitude

 Latency is the time delay (in ms) from the stimulus onset

 Amplitude is the amplitude of the peak in microVolts

 Sometimes, we also use peak-peak amplitudes, like N1-P2 amplitude

Note: some

books define

+ve/-ve peaks

the other way

around

(8)

Types of EPs –Brain stem auditory

 Small amplitude EP, with short latencies

 Evoked when the stimulus is brief auditory stimuli (clicks or narrow tone bursts)

 Consist of about 6 waves in the first 10 ms after the sound

 The frequency range is about 100 – 2000 Hz

Figure from Cooper et. al., EEG technology,

(9)

Types of EPs –Somatosensory

 Tactile stimulation

 By electrical stimulation of the median nerve at the wrist or tibial and peroneal nerves at the leg

 Total duration is about 400 ms but only the first 20 ms are important

 The responses are localised on the scalp over the hand area of the sensorimotor cortex

contralateral to the stimulated wrist

 EP spectral content from

100 to 2000 Hz Figure from

Cooper et. al., EEG technology, 3^rd ed.

(10)

Types of EPs –Visual Evoked Potential

 VEP is important in many disease diagnosis and in psychological studies

 Evoked when subjects perceive some visual stimulus

 Amplitude higher than AEP or SEP, ranging up to 20 microVolt

 The frequency ranges varies for different components from 1 to 300 Hz

 Example: Checkerboard pattern reversal

 Pattern reversal must be less than 10 ms

 Can detect problems with the visual system – increase in the latency

Figures from Cooper et.

al., EEG technology, 3^rd ed.

(11)

EP components

 Many components can be evoked each with different experimental paradigm

 Some of these components

 N100

 P300

 N400

 Contigent negative variation (CNV)

 Bereitschafts (readiness) potential

 ERD/ERS

(12)

EP components – N100

 N100 can be evoked easily by auditory, visual and tactile stimuli

 For example, clapping a hand, showing a flash of light

 It is believed to be the response to stimulus perception, i.e. when the brain recognises that a stimulus has been presented

 Normally, N100 is less than 20 Hz, so a LPF of 20 Hz should be used

 So, apply a LPF before/after doing averaging

 In the figure, why is the flash N100 latency longer than the click N100 latency?

Figure from Cooper et. al., EEG technology, 3^rd ed.

(13)

EP components – P300

 P300 is the third positive component evoked with latency about 300 ms

 It relates the subjects ability of attention

 P300 is commonly elicited using oddball paradigm

 Say, two pictures are presented, one with higher frequency than the other

 The subject has to responds (say by pressing a button) when the infrequent picture is seen

 This is when P300 will be evoked significantly

 Eg: 2 pictures square and X shown

 Square shown with 80% probability, subject concentrates on the occurrence of X

 When X occurs, P300 is elicited

 Normally, P300 is less than 8 Hz, so a LPF of 8 Hz should be used

 P300 is maximal is midline parietal, i.e. Pz

P300 for non-target and target pictures

X

(14)

EP components – N400

 N400 is the negative component evoked with latency about 400 ms

 It relates the subjects ability in processing semantic information

 Normally, N400 is less than 8 Hz, so a LPF of 8 Hz should be used

 For example, word in a sentence that is an outlier will evoke N400

 “It was nice seeing you eating a cake” may not evoke N400

 But “It was nice seeing you eating a cage” will evoke N400

 So, N400 has been used in assessing language capabilities

(15)

EP components – CNV, readiness, ERS/ERD

 CNV is evoked during the interval when a pair of stimuli are presented

 Especially when the subject had to respond to the second stimulus

 Bereitschafts (readiness) potential precedes a voluntary movement

 There would be increase in negative potential about one second before the movement

 ERD/ERS

 An extension of readiness potential

 In recent years, it has been that ERD/ERS is evoked in contralateral/ipsilateral motor cortex preceding voluntary movement

 In contralateral motor cortex

 ERD occurs just before movement in the 10- 12 Hz (also known as mu band)

 ERS after movement

 Other bands like 14-18 Hz and 36-40 Hz are also used

 Useful for Brain-computer interface

Figure from Cooper et. al., EEG technology, 3^rd ed.

(16)

What are features?

 Features are some values computed from the signals

 Features should be

 Representative of the signal

 Reproducible

 Other criteria of the features will depend on the application

 Smaller dimension than the signal

 Inter-class variance high/intra-class variance low

 Robust/enhanced representation of the signal

(invariant to changes in noise, scale factors, etc.)

(17)

Simple features

 Simple features are like

 Mean (but not very useful as we usually set it to zero)

 Standard deviation

 Energy (which can be computed by using variance after setting mean to zero)

 Very often, we measure the energy in different spectral bands like delta, theta, alpha, beta and gamma and use them as features

 To do this, we filter the EEG signal in the specific bands and then compute the energy

 The MATLAB codes (using IIR Elliptic filter):

 usually set Rs=0.1 dB and Rp=30 dB

%select appropriate N for each band using ellipord N=ellipord (Wp, Ws, Rp, Rs)

%then use ellip to obtain filter coefficients [B,A] = ellip (N, Rp, Rs, Wp)

%then filter the signal using filtfilt Xf=filfilt(B,A,X)

%compute energy using var EXf=var(Xf)

%example of values for the bands

% for delta band

Wp=[0.5 3]; Ws=[0.25 3.5];

% for theta band

Wp=[4 7]; Ws=[3.5 7.5];

% for alpha band

Wp=[8 13]; Ws=[7.5 13.5];

% for beta band

Wp=[14 29]; Ws=[13.5 29.5];

% for gamma band

Wp=[30 50]; Ws=[29.5 50.5];

(18)

Correlation

 Correlation of a test EEG signal with a template EEG signal is used sometimes as features

 In MATLAB, we can use R=corrcoef(X,Y) where X is the test signal and Y is the template signal

 This is useful when we have a template of signals and need to test the class/category of the test signal

 For example, if we have templates for EEG signals from 5 different mental activities, then we can use the correlation value from of the test EEG signal with each of the templates

 R1=corrcoef(EEGX,EEGY1)

 The highest correlation will tell us which activity the test EEG signal belongs to

 However, this method is more suitable for ECG signals rather than EEG

 EEG signals are more ‘random’ as compared to ECG signals which have stable patterns

(19)

Autoregressive coefficents



Autoregressive coefficients are another popular linear feature extraction method for biological signals



Sometimes known as parametric methods



A real valued, zero mean, stationary, non-deterministic, autoregressive process of order p is given by



where p is the model order

 x(n) is the data of the signal at sampled point n

 ak are the real valued AR coefficients

 e(n) represents the white noise error term independent of past samples

 AR modelling is something like obtaining an equation that fits the signal (like in curve fitting)

) ( )

1 ( )

( p x n k e n

k a k n

x   

 



(20)

AR modelling

 AR modelling tries to model the signal assuming that a data point is closely related to the previous few data points

 This is very suitable for modelling bio-signals

 Many different techniques have been proposed to estimate ak

 Yule-Walker equations but requires huge computational time

 So, we have recursive algorithms - estimate ak order p from the ak of order p-1

 Burg’s algorithm

 Levinson–Durbin algorithm

 Burg’s method is more accurate than Levinson-Durbin

 Since it uses the data points directly instead of autocorrelation function, which is generally erroneous for small data segments

 Also uses more data points simultaneously by minimising not only a forward error but also a backward error

 This algorithm is given in the appendix BUT will NOT be tested in the exam

(21)

Choosing the AR model order

 A model order which is too high will overfit the data and represent too much noise

 But a model order which is too small will not sufficiently represent the signal

 So a compromise has to be chosen for the model order

 There are many methods to compute the model order like Akaike Information Criterion (AIC), Final Prediction Error, Criterion Autoregressive Transfer, Minimum Description Length

 We will us AIC

 Where p is the model order, N is the length of the signal and is the variance of the error sequence at order p

 In MATLAB, the code, [A,E] = arburg(x,p) returns the ak coefficients of x in A and error variance in E

 So, we compute AIC for orders 1 to max order and then choose the order p that minimises the AIC

N p

p

AIC ( )  ln(  _p ² )  2 /

2



p

(22)

AIC example

 Assume we have a signal as below

 Computing the AIC (from order 1 to order 20) gives us the following plot

 What order should we choose? Order 4 as this is the global minimum value

(23)

AR coefficients – an example

 Assume for the following signal x, the AR coefficients of order 3 are

 A=-0.46 -0.41 -0.10

 Compute x[9] using this 3

^rd

order AR model. Ignore the error term for now.

 What is the error in this prediction?

 Error=0.64-0.61=0.03

 In MATLAB, we will get AR coefficients in this form [1.0000 -0.4618 -0.4048 -0.1058]

 Note in MATLAB, ignore the first AR coefficient, which is always 1

x=[0.58 0.42 0.52 0.33 0.43 0.26 0.58 0.76 0.53 0.64]

) 3 (

) 1

( x n k

k a k n

x  

 



) ( ) 1 (

)

(

p x n k e n

k ak n

x

  

 



x(9)=-(-0.46*x(8)-0.41*x(7)-0.10*x(6)) x(9)=-(-0.46*0.53-0.41*0.76-0.10*0.58) x(9)=0.61

x[9]

(24)

AR coefficients – an example

 In general, for different mental activities, the AR coefficients are different.

 So this method can be used to differentiate between the EEG signals.

 But this is not so clearly evident from the EEG plots BUT evident from the AR coefficients.

for p=1:max_order, [A,E] = arburg(X,p);

AIC(p)=log(E)+2*p/N;

end

[Mini Ordmin]= min(AIC) ARc=arburg(X,Ordmin)

example for order 3:

ARc=[-0.8155 -0.1350 0.0328]

0 500 1000 1500 2000 2500

-60 -50 -40 -30 -20 -10 0 10 20 30

EEG during resting (relaxing)

-20 -10 0 10 20 30

EEG during math computation example for order 3:

ARc=[-1.5718 0.6553 -0.0482]

(25)

AR coefficient features – another example

 In the example, 4 EEG plots for one subject are shown from two mental activities (math’s activity and imagining an object being rotated)

 Can you say, which is the maths and object rotation activity EEG from the plots?

 Now, use the AR

coefficients (order 6); can you see which is which?

EEG during object rotation EEG during object rotation EEG during math computation

0 500 1000 1500 2000 2500

-40 -30 -20 -10 0 10 20 30

EEG during math computation

ARc=[-1.5661 0.7114 -0.1843 -0.0583 0.2179 -0.0769

Arc=[-1.6091 0.603 -0.1931 -0.0432 0.2112 -0.0553]

0 500 1000 1500 2000 2500

-40 -30 -20 -10 0 10 20 30 40

Arc=[-0.6128 -0.1677 -0.1159 -0.0733 0.0179 0.0299]

Arc=[-0.5647 -0.2189 -0.0826 -0.0756 0.0083 0.0215]

0 500 1000 1500 2000 2500

-40 -30 -20 -10 0 10 20 30 40 50

0 500 1000 1500 2000 2500

-40 -30 -20 -10 0 10 20 30 40

(26)

Spectral features - PSD

 Power spectral density

 PSD values are also used as features for EEG

 Different methods can be used to obtain PSD –periodogram, Welch and AR based

 Example: PSDs for resting and maths activities

 Can you see that there is more spectral power in the lower frequency during maths activity as compared to resting?

Resting EEG Maths EEG

-5 0 5 10 15 20 25 30 35

Power Spectrum Magnitude (dB)

-5 0 5 10 15 20 25 30

(27)

Spectral features – Peak frequency

 Peak frequency

 Sometimes, we identify one or more peaks in the PSD plot and then compute the following parameters as features

 Parameters

 Frequency – the actual frequency value of the peak

 Amplitude – the amplitude (i.e. PSD) value of the peak

 Width – the width of the peak computed using some threshold, eg. 10 dB



Area- Sometimes we also use the area under the peak as features

 This is computed as the sum of PSD values from say, ± 5 Hz

 Example: in the figure, for peak 1

 Frequency =25 Hz

 Amplitude = 35 dB

 Width (10 dB threshold) = 27-22+1=6

 Area= sum the PSD values from frequency 20 to 30 Hz.

This same sort of feature extraction could be done for peaks in time domain – like for EP signals, check slide 7

(28)

Spectral features – Asymmetry ratio PSD

 Asymmetry ratio PSD

 After obtaining the PSD, we can obtain the asymmetry ratio using

 where P1 is the PSD in one channel and P2 is the power in another channel but in the opposite hemisphere.

 This feature is useful when the mental activities that are studied exhibit inter-hemispheric difference

 For example, maths activity exhibit a higher power spectrum in the left hemisphere whereas visual tasks do so in the right hemisphere

 For example, AS

_PSD

using O1 and O2 will be higher for maths activity as compared to visual activity

EEG C4 C3

P3 P4

O1 O2

A1 A2

 

 







 

2 1

P P

P

AS

^PSD

P

(29)

Spectral features – Asymmetry ratio PSD -example

 In the example, the PSD plots for maths activity and object rotation activity are shown

 AS_PSDusing O1 and O2 is computed

 The third plot shows AS_PSD for maths activity and the last plot shows AS_PSDfor visual activity

 If the visual plots for AS_PSDis not very clear, we can further compute the total sum of AS_PSDas a feature

 The results, AS_PSDshould be higher for maths activity as compared to object rotation

0 20 40 60 80 100 120 140

-0.12 -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06

AS_PSD for object maths activity

Sum=0.3262

AS_PSD for

object rotation activity

Sum=-0.1433

0 20 40 60 80 100 120 140

-0.01 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08

PSD Rotation, O1 PSD Rotation, O2

0 0.2 0.4 0.6 0.8 1

5 10 15 20 25 30 35

Frequency

0 0.2 0.4 0.6 0.8 1

10 15 20 25 30 35

Frequency

PSD Maths, O2 PSD Maths, O1

0 0.2 0.4 0.6 0.8 1

5 10 15 20 25 30 35

Frequency

0 0.2 0.4 0.6 0.8 1

5 10 15 20 25 30 35

Frequency

'If the visual plots for ASPSD is not very clear, we can further compute the total sum of ASPSD as a feature.

(30)

PSD using AR coefficients

 We have studied PSD estimation using periodogram and Welch methods.

 We can also use AR method to get PSD

 After obtaining the ak coefficients and using arburg function and with some suitable order p chosen by AIC

 We can proceed to compute PSD

 This is known as parametric estimation of PSD as we are trying to compute PSD not using the signal but using some AR parameters of the signal

 Note that you have to use Euler theorem here as we have e^-j2fkThere, so we need to bring in cosine and sine functions like we did for DFT

 But the good thing is that in MATLAB, we have the function pburg(x,p) that does this PSD computation for us

2



p



 p  k

fkT e i

ak pT f

S

0

|2

| 2

2 )

(





(31)

Comparison of PSD estimation using AR method over periodogram

 Comparison of AR method and classical methods like periodogram:

 Frequency resolution problem is not there for AR method

 AR method is better when the signals have a very narrowband (i.e.

limited to a small frequency range)

 AR method requires fewer cycles -sometimes even one cycle is enough

 Windowing is normally not required for AR method

 But AR method is more sensitive to noise, so use periodogram if there is a

lot of noise in the signal

(32)

Spectral Correlation/Coherence

 This is similar to the correlation that we discussed earlier

 The only difference is that the correlation is studied in frequency domain instead of time

 So the correlation is studied with PSD values

 With MATLAB, we can still use R=corrcoef(X,Y) where X is the PSD of say the rotation EEG and Y is the PSD of the maths EEG

 For example:

 Spectral correlation of object rotation in channel O1 and O2,

 SR=corr(PSDrO1,PSDrO2) =0.9851

 SM=corr(PSDmO1,PSDmO2) =0.9921

 SMRO1=corr(PSDmO1,PSDrO1) =0.9582

 SMRO2=corr(PSDmO2,PSDrO2) =0.9410

 Can you justify the reason behind these values?

(33)

Hjorth descriptors

 Hjorth descriptors are useful feature extraction methods for biological signals

 Hjorth descriptors are 3: activity, mobility and complexity

 Activity

 Activity is simple the variance of the signal, representing the energy

 Mobility

 Mobility is where is the std. dev. of first derivative of x

 Complexity/Form factor

 The most useful is the FF, which gives a computational value for shape – different shapes, different FF values

x x x

M 



_'

 

_x_'

x[n]’=x[n]-x[n-1]

x x

x x x

x

M FF M



 / /

' ' '' '



Second derivative:

x[n]’=x[n]-2x[n-1]+x[n-2]

(34)

Hjorth descriptors - example

 For the following EEG signal, the Hjorth descriptors are:

 Activity =41.4861

 Mobility =0.4043

 FF=3.4178

 This has been applied successfully for the detection of epilepsy – why?

 But these descriptors are very

sensitive to noise and suitable only for EEG or ECG, not EMG

 As EMG signals have a wide bandwidth

0 500 1000 1500 2000 2500

-40 -30 -20 -10 0 10 20

Activity=var(EEG) for i=2:length(EEG), EEG1(i)=EEG(i)-EEG(i-1);

end

EEG1(i)=EEG(i)-EEG(i-1);

Mobility=std(EEG1)/std(EEG) for i=3:length(EEG),

EEG2(i)=EEG(i)-2*EEG(i-1)+EEG(i-2);

end

FF=(std(EEG2)/std(EEG1))/(std(EEG1(1,:)/std(EEG))

(35)

Joint time-frequency

 Sometimes we can compute feature that are both in both time and frequency domains

 Some of the joint time-frequency analysis methods are:

 Linear, non-parametric methods

 Can be obtained by linear filtering

 Like Short-time Fourier transform and Wavelet transform

 Quadratic, non-parametric methods

 Improved time-frequency resolution

 Like Wigner-Ville distribution and related techniques

 Parametric methods

 Assume that the observed signal can be obtained from time-varying statistical models

 Like autoregressive model

 We will not study joint time-frequency analysis in this course

(36)