BME BME 4 4 52 52 Bio Bio medi medi cal Signal cal Signal Processing
Processing Lecture Lecture 7 7
Introduction to EEG and EP Introduction to EEG and EP
Biological signal feature Biological signal feature extraction
extraction
Lecture 7 Outline
In this lecture, we’ll study some details of Electroencephalogram (EEG) and Evoked Potential (EP) signals
We will also study some feature extraction methods for EEG and EP
Some of the methods could be applied to other
biological signals like Electrocardiogram
Electroencephalogram (EEG)
As mentioned in first lecture, EEG signals are electrical potential in microVolts range extracted from the scalp
Figure below shows two electrode placements: 10-20 international system and another extension of this system
10-20 system has 19 actives electrodes, while the other system has 61 active electrodes
FP1
T8 AF7 F7
FC7
O1 OZ TP7
T5 P07
CZ CPZ PZ
POZ AFZ FZ
FCZ FP2
T10 AF8
F8
FC8
O2
TP8 T6 PO8
C5 C3 C1 C2 C4 C6
FC5 FC3 FC1
F5 F3 F1
AF1
FC2 FC4 FC6 F2
AF2 F6 F4
CP2 CP4 CP6 P2 P4 P6
PO4 PO3
CP1 CP5 CP3
P1 P5 P3
FPZ
Figures from Sornmo and Laguna,
‘Bioelectrical Signal Processing in Cardiac and Neurological Applications’
EEG signals (cont.)
EEG signals are stochastic (i.e. opposite to deterministic)
Generally EEG signals are assumed to be stationary, though this is not always correct
Stationary means the statistical properties do not change with time
Normally, we use ‘second order/wide sense stationary’
Eg: Mean and variance do not change over time
This can be tested simply by segmenting the signal and verifying if the segments have similar mean and variance values
The similarity measure is done using t-tests (which will be studied later)
EEG signals are also assumed to be linear in most applications for simplicity though this is also not always correct
Eg: y[n]=x[n]+x[n-1] – linear relationship
Eg: y[n]=x[n]+(x[n-1])3 – non-linear relationship
So we can use linear techniques
EEG signals (cont.)
In general, EEG signals are normally in the following spectral (frequency) ranges
Delta (0.1 to 3 Hz) – during deep sleep/meditation
Theta (4-7 Hz) - during sleep
Alpha (8-13 Hz) – during restful/relaxing state, decreases when eyes are opened
Beta (14-29 Hz) – during some mental activity like computing maths
Gamma (above 30 Hz, up to 50 Hz) – involved in any higher order brain activity.
Also triggering mechanism, alerts the brain to recognise a picture/ sound, etc.
EEG with freq<0.1 Hz is generally baseline wander noise
There are variations to these spectral bands in different books
Transient EEG signals are like K complexes observed during sleep, spikes observed during epilepsy - will not be covered in this course
Event related EEG signals are like P300, mu rhythm
P300 – evoked when a stimulus (audio/visual) is recognised
Mu rhythm – a group of waves (10- 12 Hz) having arcade/comb shape – evoked as contralateral attenuation when there is the thought of movement
These will be studied in the evoked potential section and in later lectures
Evoked potentials (EP)
Evoked potential/event related potential is EEG in response to some stimulus like visual, audio,
somasensory, etc.
In particular, Visual Evoked Potential is studied the most
As we know, the EP is evoked with the brain still active doing other things
So we have EP plus ongoing EEG, which is many higher than EP
So, we need to use ensemble averaging, median filtering,
PCA, etc. to reduce ongoing EEG
Evoked potential – cont.
EP are normally analysed in time domain
Peaks
There are distinguishable peaks in EP (after reducing ongoing EEG, say through averaging)
The peaks are identified as P1,P2,P3,….N1,N2,N3
Sometimes, the peaks are identified using the latency, for example, it is know that the third positive component, P3 has latency about 300 ms, so it is also know as P300
Latency and amplitude
Each peak will have a latency and amplitude
Latency is the time delay (in ms) from the stimulus onset
Amplitude is the amplitude of the peak in microVolts
Sometimes, we also use peak-peak amplitudes, like N1-P2 amplitude
Note: some
books define
+ve/-ve peaks
the other way
around
Types of EPs –Brain stem auditory
Small amplitude EP, with short latencies
Evoked when the stimulus is brief auditory stimuli (clicks or narrow tone bursts)
Consist of about 6 waves in the first 10 ms after the sound
The frequency range is about 100 – 2000 Hz
Figure from Cooper et. al., EEG technology,
Types of EPs –Somatosensory
Tactile stimulation
By electrical stimulation of the median nerve at the wrist or tibial and peroneal nerves at the leg
Total duration is about 400 ms but only the first 20 ms are important
The responses are localised on the scalp over the hand area of the sensorimotor cortex
contralateral to the stimulated wrist
EP spectral content from
100 to 2000 Hz Figure from
Cooper et. al., EEG technology, 3rd ed.
Types of EPs –Visual Evoked Potential
VEP is important in many disease diagnosis and in psychological studies
Evoked when subjects perceive some visual stimulus
Amplitude higher than AEP or SEP, ranging up to 20 microVolt
The frequency ranges varies for different components from 1 to 300 Hz
Example: Checkerboard pattern reversal
Pattern reversal must be less than 10 ms
Can detect problems with the visual system – increase in the latency
Figures from Cooper et.
al., EEG technology, 3rd ed.
EP components
Many components can be evoked each with different experimental paradigm
Some of these components
N100
P300
N400
Contigent negative variation (CNV)
Bereitschafts (readiness) potential
ERD/ERS
EP components – N100
N100 can be evoked easily by auditory, visual and tactile stimuli
For example, clapping a hand, showing a flash of light
It is believed to be the response to stimulus perception, i.e. when the brain recognises that a stimulus has been presented
Normally, N100 is less than 20 Hz, so a LPF of 20 Hz should be used
So, apply a LPF before/after doing averaging
In the figure, why is the flash N100 latency longer than the click N100 latency?
Figure from Cooper et. al., EEG technology, 3rd ed.
EP components – P300
P300 is the third positive component evoked with latency about 300 ms
It relates the subjects ability of attention
P300 is commonly elicited using oddball paradigm
Say, two pictures are presented, one with higher frequency than the other
The subject has to responds (say by pressing a button) when the infrequent picture is seen
This is when P300 will be evoked significantly
Eg: 2 pictures square and X shown
Square shown with 80% probability, subject concentrates on the occurrence of X
When X occurs, P300 is elicited
Normally, P300 is less than 8 Hz, so a LPF of 8 Hz should be used
P300 is maximal is midline parietal, i.e. Pz
P300 for non-target and target pictures
X
EP components – N400
N400 is the negative component evoked with latency about 400 ms
It relates the subjects ability in processing semantic information
Normally, N400 is less than 8 Hz, so a LPF of 8 Hz should be used
For example, word in a sentence that is an outlier will evoke N400
“It was nice seeing you eating a cake” may not evoke N400
But “It was nice seeing you eating a cage” will evoke N400
So, N400 has been used in assessing language capabilities
EP components – CNV, readiness, ERS/ERD
CNV is evoked during the interval when a pair of stimuli are presented
Especially when the subject had to respond to the second stimulus
Bereitschafts (readiness) potential precedes a voluntary movement
There would be increase in negative potential about one second before the movement
ERD/ERS
An extension of readiness potential
In recent years, it has been that ERD/ERS is evoked in contralateral/ipsilateral motor cortex preceding voluntary movement
In contralateral motor cortex
ERD occurs just before movement in the 10- 12 Hz (also known as mu band)
ERS after movement
Other bands like 14-18 Hz and 36-40 Hz are also used
Useful for Brain-computer interface
Figure from Cooper et. al., EEG technology, 3rd ed.
What are features?
Features are some values computed from the signals
Features should be
Representative of the signal
Reproducible
Other criteria of the features will depend on the application
Smaller dimension than the signal
Inter-class variance high/intra-class variance low
Robust/enhanced representation of the signal
(invariant to changes in noise, scale factors, etc.)
Simple features
Simple features are like
Mean (but not very useful as we usually set it to zero)
Standard deviation
Energy (which can be computed by using variance after setting mean to zero)
Very often, we measure the energy in different spectral bands like delta, theta, alpha, beta and gamma and use them as features
To do this, we filter the EEG signal in the specific bands and then compute the energy
The MATLAB codes (using IIR Elliptic filter):
usually set Rs=0.1 dB and Rp=30 dB
%select appropriate N for each band using ellipord N=ellipord (Wp, Ws, Rp, Rs)
%then use ellip to obtain filter coefficients [B,A] = ellip (N, Rp, Rs, Wp)
%then filter the signal using filtfilt Xf=filfilt(B,A,X)
%compute energy using var EXf=var(Xf)
%example of values for the bands
% for delta band
Wp=[0.5 3]; Ws=[0.25 3.5];
% for theta band
Wp=[4 7]; Ws=[3.5 7.5];
% for alpha band
Wp=[8 13]; Ws=[7.5 13.5];
% for beta band
Wp=[14 29]; Ws=[13.5 29.5];
% for gamma band
Wp=[30 50]; Ws=[29.5 50.5];
Correlation
Correlation of a test EEG signal with a template EEG signal is used sometimes as features
In MATLAB, we can use R=corrcoef(X,Y) where X is the test signal and Y is the template signal
This is useful when we have a template of signals and need to test the class/category of the test signal
For example, if we have templates for EEG signals from 5 different mental activities, then we can use the correlation value from of the test EEG signal with each of the templates
R1=corrcoef(EEGX,EEGY1)
R2=corrcoef(EEGX,EEGY2)
R3=corrcoef(EEGX,EEGY3)
R4=corrcoef(EEGX,EEGY4)
R5=corrcoef(EEGX,EEGY5)
The highest correlation will tell us which activity the test EEG signal belongs to
However, this method is more suitable for ECG signals rather than EEG
EEG signals are more ‘random’ as compared to ECG signals which have stable patterns
Autoregressive coefficents
Autoregressive coefficients are another popular linear feature extraction method for biological signals
Sometimes known as parametric methods
A real valued, zero mean, stationary, non-deterministic, autoregressive process of order p is given by
where p is the model order
x(n) is the data of the signal at sampled point n
ak are the real valued AR coefficients
e(n) represents the white noise error term independent of past samples
AR modelling is something like obtaining an equation that fits the signal (like in curve fitting)
) ( )
1 ( )
( p x n k e n
k a k n
x
AR modelling
AR modelling tries to model the signal assuming that a data point is closely related to the previous few data points
This is very suitable for modelling bio-signals
Many different techniques have been proposed to estimate ak
Yule-Walker equations but requires huge computational time
So, we have recursive algorithms - estimate ak order p from the ak of order p-1
Burg’s algorithm
Levinson–Durbin algorithm
Burg’s method is more accurate than Levinson-Durbin
Since it uses the data points directly instead of autocorrelation function, which is generally erroneous for small data segments
Also uses more data points simultaneously by minimising not only a forward error but also a backward error
This algorithm is given in the appendix BUT will NOT be tested in the exam
Choosing the AR model order
A model order which is too high will overfit the data and represent too much noise
But a model order which is too small will not sufficiently represent the signal
So a compromise has to be chosen for the model order
There are many methods to compute the model order like Akaike Information Criterion (AIC), Final Prediction Error, Criterion Autoregressive Transfer, Minimum Description Length
We will us AIC
Where p is the model order, N is the length of the signal and is the variance of the error sequence at order p
In MATLAB, the code, [A,E] = arburg(x,p) returns the ak coefficients of x in A and error variance in E
So, we compute AIC for orders 1 to max order and then choose the order p that minimises the AIC
N p
p
AIC ( ) ln( p 2 ) 2 /
2
pAIC example
Assume we have a signal as below
Computing the AIC (from order 1 to order 20) gives us the following plot
What order should we choose? Order 4 as this is the global minimum value
AR coefficients – an example
Assume for the following signal x, the AR coefficients of order 3 are
A=-0.46 -0.41 -0.10
Compute x[9] using this 3
rdorder AR model. Ignore the error term for now.
What is the error in this prediction?
Error=0.64-0.61=0.03
In MATLAB, we will get AR coefficients in this form [1.0000 -0.4618 -0.4048 -0.1058]
Note in MATLAB, ignore the first AR coefficient, which is always 1
x=[0.58 0.42 0.52 0.33 0.43 0.26 0.58 0.76 0.53 0.64]
) 3 (
) 1
( x n k
k a k n
x
) ( ) 1 (
)
(
p x n k e nk ak n
x
x(9)=-(-0.46*x(8)-0.41*x(7)-0.10*x(6)) x(9)=-(-0.46*0.53-0.41*0.76-0.10*0.58) x(9)=0.61
x[9]
AR coefficients – an example
In general, for different mental activities, the AR coefficients are different.
So this method can be used to differentiate between the EEG signals.
But this is not so clearly evident from the EEG plots BUT evident from the AR coefficients.
for p=1:max_order, [A,E] = arburg(X,p);
AIC(p)=log(E)+2*p/N;
end
[Mini Ordmin]= min(AIC) ARc=arburg(X,Ordmin)
example for order 3:
ARc=[-0.8155 -0.1350 0.0328]
0 500 1000 1500 2000 2500
-60 -50 -40 -30 -20 -10 0 10 20 30
EEG during resting (relaxing)
-20 -10 0 10 20 30
EEG during math computation example for order 3:
ARc=[-1.5718 0.6553 -0.0482]
AR coefficient features – another example
In the example, 4 EEG plots for one subject are shown from two mental activities (math’s activity and imagining an object being rotated)
Can you say, which is the maths and object rotation activity EEG from the plots?
Now, use the AR
coefficients (order 6); can you see which is which?
EEG during object rotation EEG during object rotation EEG during math computation
0 500 1000 1500 2000 2500
-40 -30 -20 -10 0 10 20 30
EEG during math computation
ARc=[-1.5661 0.7114 -0.1843 -0.0583 0.2179 -0.0769
Arc=[-1.6091 0.603 -0.1931 -0.0432 0.2112 -0.0553]
0 500 1000 1500 2000 2500
-40 -30 -20 -10 0 10 20 30 40
Arc=[-0.6128 -0.1677 -0.1159 -0.0733 0.0179 0.0299]
Arc=[-0.5647 -0.2189 -0.0826 -0.0756 0.0083 0.0215]
0 500 1000 1500 2000 2500
-40 -30 -20 -10 0 10 20 30 40 50
0 500 1000 1500 2000 2500
-40 -30 -20 -10 0 10 20 30 40
Spectral features - PSD
Power spectral density
PSD values are also used as features for EEG
Different methods can be used to obtain PSD –periodogram, Welch and AR based
Example: PSDs for resting and maths activities
Can you see that there is more spectral power in the lower frequency during maths activity as compared to resting?
Resting EEG Maths EEG
-5 0 5 10 15 20 25 30 35
Power Spectrum Magnitude (dB)
-5 0 5 10 15 20 25 30
Power Spectrum Magnitude (dB)
Spectral features – Peak frequency
Peak frequency
Sometimes, we identify one or more peaks in the PSD plot and then compute the following parameters as features
Parameters
Frequency – the actual frequency value of the peak
Amplitude – the amplitude (i.e. PSD) value of the peak
Width – the width of the peak computed using some threshold, eg. 10 dB
Area- Sometimes we also use the area under the peak as features This is computed as the sum of PSD values from say, ± 5 Hz
Example: in the figure, for peak 1
Frequency =25 Hz
Amplitude = 35 dB
Width (10 dB threshold) = 27-22+1=6
Area= sum the PSD values from frequency 20 to 30 Hz.
This same sort of feature extraction could be done for peaks in time domain – like for EP signals, check slide 7
Spectral features – Asymmetry ratio PSD
Asymmetry ratio PSD
After obtaining the PSD, we can obtain the asymmetry ratio using
where P1 is the PSD in one channel and P2 is the power in another channel but in the opposite hemisphere.
This feature is useful when the mental activities that are studied exhibit inter-hemispheric difference
For example, maths activity exhibit a higher power spectrum in the left hemisphere whereas visual tasks do so in the right hemisphere
For example, AS
PSDusing O1 and O2 will be higher for maths activity as compared to visual activity
EEG C4 C3
P3 P4
O1 O2
A1 A2
2 1
2 1
P P
P
AS
PSDP
Spectral features – Asymmetry ratio PSD -example
In the example, the PSD plots for maths activity and object rotation activity are shown
ASPSD using O1 and O2 is computed
The third plot shows ASPSD for maths activity and the last plot shows ASPSD for visual activity
If the visual plots for ASPSD is not very clear, we can further compute the total sum of ASPSD as a feature
The results, ASPSD should be higher for maths activity as compared to object rotation
0 20 40 60 80 100 120 140
-0.12 -0.1 -0.08 -0.06 -0.04 -0.02 0 0.02 0.04 0.06
ASPSD for object maths activity
Sum=0.3262
ASPSD for
object rotation activity
Sum=-0.1433
0 20 40 60 80 100 120 140
-0.01 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08
PSD Rotation, O1 PSD Rotation, O2
0 0.2 0.4 0.6 0.8 1
5 10 15 20 25 30 35
Frequency
Power Spectrum Magnitude (dB)
0 0.2 0.4 0.6 0.8 1
10 15 20 25 30 35
Frequency
Power Spectrum Magnitude (dB)
PSD Maths, O2 PSD Maths, O1
0 0.2 0.4 0.6 0.8 1
5 10 15 20 25 30 35
Frequency
Power Spectrum Magnitude (dB)
0 0.2 0.4 0.6 0.8 1
5 10 15 20 25 30 35
Frequency
Power Spectrum Magnitude (dB)
'If the visual plots for ASPSD is not very clear, we can further compute the total sum of ASPSD as a feature.
PSD using AR coefficients
We have studied PSD estimation using periodogram and Welch methods.
We can also use AR method to get PSD
After obtaining the ak coefficients and using arburg function and with some suitable order p chosen by AIC
We can proceed to compute PSD
This is known as parametric estimation of PSD as we are trying to compute PSD not using the signal but using some AR parameters of the signal
Note that you have to use Euler theorem here as we have e-j2fkT here, so we need to bring in cosine and sine functions like we did for DFT
But the good thing is that in MATLAB, we have the function pburg(x,p) that does this PSD computation for us
2
p
p k
fkT e i
ak pT f
S
0
|2
| 2
2 )
(
Comparison of PSD estimation using AR method over periodogram
Comparison of AR method and classical methods like periodogram:
Frequency resolution problem is not there for AR method
AR method is better when the signals have a very narrowband (i.e.
limited to a small frequency range)
AR method requires fewer cycles -sometimes even one cycle is enough
Windowing is normally not required for AR method
But AR method is more sensitive to noise, so use periodogram if there is a
lot of noise in the signal
Spectral Correlation/Coherence
This is similar to the correlation that we discussed earlier
The only difference is that the correlation is studied in frequency domain instead of time
So the correlation is studied with PSD values
With MATLAB, we can still use R=corrcoef(X,Y) where X is the PSD of say the rotation EEG and Y is the PSD of the maths EEG
For example:
Spectral correlation of object rotation in channel O1 and O2,
SR=corr(PSDrO1,PSDrO2) =0.9851
SM=corr(PSDmO1,PSDmO2) =0.9921
SMRO1=corr(PSDmO1,PSDrO1) =0.9582
SMRO2=corr(PSDmO2,PSDrO2) =0.9410
Can you justify the reason behind these values?
Hjorth descriptors
Hjorth descriptors are useful feature extraction methods for biological signals
Hjorth descriptors are 3: activity, mobility and complexity
Activity
Activity is simple the variance of the signal, representing the energy
Mobility
Mobility is where is the std. dev. of first derivative of x
Complexity/Form factor
The most useful is the FF, which gives a computational value for shape – different shapes, different FF values
x x x
M
'
x'x[n]’=x[n]-x[n-1]
x x
x x x
x
M FF M
/ /
' ' '' '
Second derivative:x[n]’=x[n]-2x[n-1]+x[n-2]
Hjorth descriptors - example
For the following EEG signal, the Hjorth descriptors are:
Activity =41.4861
Mobility =0.4043
FF=3.4178
This has been applied successfully for the detection of epilepsy – why?
But these descriptors are very
sensitive to noise and suitable only for EEG or ECG, not EMG
As EMG signals have a wide bandwidth
0 500 1000 1500 2000 2500
-40 -30 -20 -10 0 10 20
Activity=var(EEG) for i=2:length(EEG), EEG1(i)=EEG(i)-EEG(i-1);
end
EEG1(i)=EEG(i)-EEG(i-1);
Mobility=std(EEG1)/std(EEG) for i=3:length(EEG),
EEG2(i)=EEG(i)-2*EEG(i-1)+EEG(i-2);
end
FF=(std(EEG2)/std(EEG1))/(std(EEG1(1,:)/std(EEG))
Joint time-frequency
Sometimes we can compute feature that are both in both time and frequency domains
Some of the joint time-frequency analysis methods are:
Linear, non-parametric methods
Can be obtained by linear filtering
Like Short-time Fourier transform and Wavelet transform
Quadratic, non-parametric methods
Improved time-frequency resolution
Like Wigner-Ville distribution and related techniques
Parametric methods
Assume that the observed signal can be obtained from time-varying statistical models
Like autoregressive model
We will not study joint time-frequency analysis in this course