Theoretical Probability Distributions
PhD Özgür Tosun
THEORETICAL PROBABILITY DISTRIBUTIONS
Probability is a measure of chance.
Probability distributions help us study the probabilities associated with the outcomes of the variable under study.
Probability theory is the foundation of statistical inference. A probability distribution is a device for indicating the values that a random variable may take and how probable each of them is.
Several theoretical probability distributions are important in biostatistics:
I) Binomial
II) Poisson
III) Normal
Discrete probability distributions: the variable takes only integer values.
Continuous probability distributions: the variable takes values measured on a continuous scale.
THE BINOMIAL DISTRIBUTION:
• Variable has only binary/dichotomous outcomes
(male – female; diseased – not diseased;
positive – negative) denoted A and B.
• The probability of A is denoted by p.
P(A) = p and
P(B)= 1-p
• When the experiment is repeated n times, p remains constant and the outcome of each trial is independent of the others.
Such a variable is said to follow a BINOMIAL DISTRIBUTION.
Daniel Bernoulli
The question is:
What is the probability that outcome A occurs x times?
Or
What proportion of n outcomes will be A?
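The probability of x outcomes of type A in a group of size n, when each outcome has probability p and is independent of all the others, is given by the Binomial Probability Function:
$P(A = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$
where $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$ is the number of ways of choosing x of the n outcomes.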
Example
For families with 5 children each, what is the probability that
i) There will be one male child?
$P(A = 1) = \binom{5}{1} (0.50)^{1} (1 - 0.50)^{5 - 1} = 0.16$
Among families with 5 children each, 0.16 have one male child.
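ii) There will be at least one male child?
$P(A \ge 1) = 1 - P(A = 0) = 1 - \binom{5}{0} (0.5)^{0} (1 - 0.5)^{5 - 0} = 1 - 0.03 = 0.97$
Equivalently, the probabilities of all outcomes with at least one male child can be added:
$P(A \ge 1) = P(A = 1) + P(A = 2) + P(A = 3) + P(A = 4) + P(A = 5) = \sum_{A=1}^{5} \binom{5}{A} (0.5)^{A} (1 - 0.5)^{5 - A}$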
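The two family-size calculations above can be checked with a few lines of Python. This is only a sketch: the helper name `binomial_pmf` is an assumption made for illustration, and p = 0.5 for a male child is the value used in the example.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(A = x): probability of exactly x occurrences of A in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Families with 5 children, P(male) = 0.5 as in the example.
p_one_male = binomial_pmf(1, n=5, p=0.5)

# "At least one" via the complement ...
p_at_least_one = 1 - binomial_pmf(0, n=5, p=0.5)
# ... and via the sum over x = 1, ..., 5; both routes should agree.
p_at_least_one_sum = sum(binomial_pmf(x, n=5, p=0.5) for x in range(1, 6))

print(round(p_one_male, 2))      # 0.16
print(round(p_at_least_one, 2))  # 0.97
```

The complement route needs one term instead of five, which is why it is usually preferred for "at least one" questions.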
Using the probabilities associated with possible outcomes, we can draw a probability distribution for the event under study:
[Figure: bar chart of the probability distribution; x-axis: NO. OF MALE CHILDREN (0 to 5); y-axis: PROBABILITY (0.0 to 0.4)]
Example:
Among men with a localized prostate tumor and PSA < 10, the 5-year survival is known to be 0.8. We can use the Binomial Distribution to calculate the probability that any particular number (A), out of n, will survive 5 years. For example, for a new series of 6 such men:
None will survive 5 years : P(A=0) = 0.000064
Only 1 will survive 5 years : P(A=1) = 0.0015
2 will survive 5 years : P(A=2) = 0.015
3 will survive 5 years : P(A=3) = 0.082
4 will survive 5 years : P(A=4) = 0.246
5 will survive 5 years : P(A=5) = 0.393
All will survive 5 years : P(A=6) = 0.262
[Figure: Binomial Distribution for n=6 and p=0.8; x-axis: NO. SURVIVING 5 YEARS (0 to 6); y-axis: PROBABILITY (0.0 to 0.5)]
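The whole distribution for the prostate tumor example can be tabulated in one pass. The sketch below assumes only the values given in the example (n = 6, p = 0.8); the variable names are illustrative.

```python
from math import comb

n, p = 6, 0.8  # six men, 5-year survival probability 0.8 (from the example)

# P(A = x) for every possible number of survivors x = 0, 1, ..., n
distribution = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

for x, prob in distribution.items():
    print(f"P(A={x}) = {prob:.6f}")

total = sum(distribution.values())                     # the probabilities sum to 1
most_likely = max(distribution, key=distribution.get)  # x with the highest probability
```

Here `most_likely` comes out as 5, matching the tallest bar of the distribution for n = 6 and p = 0.8.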
THE POISSON DISTRIBUTION:
Like the Binomial, the Poisson distribution is a discrete distribution, applicable when the outcome is the “number of times an event occurs”.
Simeon D. Poisson (1781-1840)
If the average number of occurrences of the event is given instead of the probability of an outcome, the associated probabilities can be calculated using the Poisson Distribution Function, which is defined as:
$P(A = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
where λ (lambda) is the average number of occurrences and x is the number of occurrences of interest.
Example.
If the average number of hospitalizations for a group of patients is calculated as 3.22, the probability that a patient in the group has zero hospitalizations is
$P(A = 0) = \frac{3.22^{0} \, e^{-3.22}}{0!} = 0.04$
The probability that a patient has exactly one hospitalization is
$P(A = 1) = \frac{3.22^{1} \, e^{-3.22}}{1!} = 0.129$
The probability that a patient will be hospitalized more than 3 times, since the upper limit is unknown, is calculated as
$P(A > 3) = 1 - P(A \le 3)$
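The hospitalization probabilities can be reproduced numerically. This is a minimal sketch; the function name `poisson_pmf` is an assumption for illustration, and the only input taken from the text is the average of 3.22.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(A = x) when the average number of occurrences is lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 3.22  # average number of hospitalizations (from the example)

p_zero = poisson_pmf(0, lam)  # about 0.04
p_one = poisson_pmf(1, lam)   # about 0.129

# More than 3 hospitalizations: subtract P(A = 0) + ... + P(A = 3) from 1,
# because the count has no upper limit to sum up to.
p_more_than_3 = 1 - sum(poisson_pmf(x, lam) for x in range(4))  # about 0.40
```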
Normal Distribution
Karl F. Gauss (1777-1855), Abraham de Moivre (1667-1754)
Normal (Gaussian) distribution is the most famous probability distribution of continuous variables.
The two parameters of the normal distribution are the mean (μ) and the standard deviation (σ).
The graph has a familiar bell-shaped curve.
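The probability density function of the normal distribution curve is as follows:
$f(x_i) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x_i - \mu}{\sigma} \right)^{2}}$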
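To see how the formula behaves, it can be written directly as a function. This is a sketch only: the name `normal_pdf` is made up for illustration, and the mean of 74 with standard deviations of 7.5 and 15 are illustrative values, not part of the formula.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# The curve is highest at the mean; doubling sigma halves that peak height.
peak_narrow = normal_pdf(74.0, mu=74.0, sigma=7.5)
peak_wide = normal_pdf(74.0, mu=74.0, sigma=15.0)

# The curve is symmetric: equal heights at equal distances either side of the mean.
left = normal_pdf(66.5, mu=74.0, sigma=7.5)
right = normal_pdf(81.5, mu=74.0, sigma=7.5)
```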
The normal distribution is completely defined by the mean and standard deviation of a set of quantitative data:
The mean determines the location of the curve on the x axis of a graph
The standard deviation determines the spread of the curve, and with it the height on the y axis: the larger the standard deviation, the wider and flatter the curve
There are an infinite number of normal distributions, one for every possible combination of a mean and standard deviation
Pr(X) on the y-axis refers to either frequency or probability.
Examples of Normal Distributions
[Figure: frequency histogram with the mean marked; x-axis: Heart Rate (BPM), 55 to 90; y-axis: Frequency, 0 to 25]
Many (but not all) continuous variables are approximately normally distributed. For such variables, the shape of the frequency distribution generally comes closer to the normal curve as the sample size increases.
When data are normally distributed, the mode, median, and mean are identical and are located at the center of the distribution.
[Figure: normal curve with the mode, median, and mean at its center; y-axis: frequency of occurrence]
Quantitative variables may also have a skewed distribution:
When distributions are skewed, they have more extreme values in one direction than the other, resulting in a long tail on one side of the distribution.
The direction of the tail determines whether a distribution is positively or negatively skewed.
A positively skewed distribution has a long tail on the right, or positive side of the curve.
A negatively skewed distribution has the tail on the left, or negative side of the curve.
For a normally distributed variable:
~68.3% of the observations lie within 1 standard deviation of the mean (μ ± 1σ)
~95.4% lie within 2 standard deviations of the mean (μ ± 2σ)
~99.7% lie within 3 standard deviations of the mean (μ ± 3σ)
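[Figure: normal curve centered on the mode, median, and mean, with the central areas 68.26%, 95.44%, and 99.74% marked]
$P(\mu - \sigma < x < \mu + \sigma) = 0.6826$
$P(\mu - 2\sigma < x < \mu + 2\sigma) = 0.9544$
$P(\mu - 3\sigma < x < \mu + 3\sigma) = 0.9974$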
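These three areas can be computed without a table by using the error function from Python's standard library. The sketch below relies on the identity P(|z| < k) = erf(k/√2) for the standard normal distribution; the helper name `area_within` is an assumption. The exact values differ from the table-based figures above in the fourth decimal place because the table entries are rounded before being doubled.

```python
from math import erf, sqrt

def area_within(k: float) -> float:
    """P(mu - k*sigma < x < mu + k*sigma) for any normal distribution."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```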
Sex HR   Sex HR   Sex HR   Sex HR   Sex HR   Sex HR   Sex HR
F   55   M   66   F   70   M   73   F   77   M   79   F   82
M   57   F   67   F   70   M   73   F   77   M   79   M   82
M   59   F   67   M   70   M   73   F   77   M   79   F   83
F   61   F   68   M   70   M   73   M   77   F   80   M   83
M   61   F   68   F   71   F   74   M   77   F   80   M   83
M   62   F   68   F   71   F   74   F   78   M   80   F   84
M   62   M   68   M   71   F   74   F   78   F   81   F   84
F   63   F   69   M   71   M   74   F   78   F   81   M   85
F   64   M   69   F   72   F   75   F   78   F   81   F   86
M   64   M   69   M   72   F   75   M   78   M   81   F   86
M   64   M   69   F   73   M   75   M   78   F   82   M   89
M   66   F   70   M   73   M   76   M   79   F   82   M   89
[Figure: histogram of the heart rate data; x-axis: Heart Rate (BPM), 55 to 90; y-axis: Frequency, 0 to 25]
For the heart rate data for 84 adults:
Mean HR = 74.0 bpm, SD = 7.5 bpm
Mean ± 1 SD = 74.0 ± 7.5 = 66.5-81.5 bpm
Mean ± 2 SD = 74.0 ± 15.0 = 59.0-89.0 bpm
Mean ± 3 SD = 74.0 ± 22.5 = 51.5-96.5 bpm
HR Data:
• 57/84 (67.9%) subjects are between mean ± 1SD
• 82/84 (97.6%) are between mean ± 2SD
• 84/84 (100%) are between mean ± 3SD
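The three counts can be reproduced from the 84 heart rates listed in the data table. In the sketch below the values are typed in column by column, the mean and SD are the rounded figures quoted in the text (74.0 and 7.5), and the helper name `count_within` is an assumption for illustration.

```python
from statistics import mean

# The 84 heart rates (bpm) from the data table, sex labels omitted.
heart_rates = [
    55, 57, 59, 61, 61, 62, 62, 63, 64, 64, 64, 66,
    66, 67, 67, 68, 68, 68, 68, 69, 69, 69, 69, 70,
    70, 70, 70, 70, 71, 71, 71, 71, 72, 72, 73, 73,
    73, 73, 73, 73, 74, 74, 74, 74, 75, 75, 75, 76,
    77, 77, 77, 77, 77, 78, 78, 78, 78, 78, 78, 79,
    79, 79, 79, 80, 80, 80, 81, 81, 81, 81, 82, 82,
    82, 82, 83, 83, 83, 84, 84, 85, 86, 86, 89, 89,
]

MEAN, SD = 74.0, 7.5  # summary values quoted in the text

def count_within(k: int) -> int:
    """Number of subjects with heart rate inside mean ± k SD (limits included)."""
    low, high = MEAN - k * SD, MEAN + k * SD
    return sum(low <= hr <= high for hr in heart_rates)

for k in (1, 2, 3):
    n = count_within(k)
    print(f"mean ± {k} SD: {n}/{len(heart_rates)} ({n / len(heart_rates):.1%})")
```

The observed percentages are close to, but not exactly, the theoretical 68.3%, 95.4%, and 99.7%, as expected for a sample of this size.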
[Figure: heart rate (bpm, 45 to 100) plotted against subject number (0 to 84), with horizontal lines at the mean and at ±1 SD, ±2 SD, and ±3 SD]
The “normal” range in medical measurements is the central 95% of the values for a reference population, and is usually determined from large samples representative of the population.
The central 95% is approximately the mean ± 2 sd*
Some examples of established reference ranges are:
Serum              “Normal” range
fasting glucose    70-110 mg/dL
sodium             135-146 mEq/L
triglycerides      35-160 mg/dL
*Note: The value is actually 1.96 sd, but for convenience this is usually rounded to 2 sd.
The Standard Normal Distribution
A normal distribution with a mean of 0 and a standard deviation of 1
The distribution is also called the z distribution
Any normal distribution can be converted to the standard normal distribution using the z transformation.
Each value in a distribution is converted to the number of standard deviations the value is from the mean.
The transformed value is called a z score.
Formula for the z transformation:
$z = \frac{x - \mu}{\sigma}$
Once the data are transformed to z-scores, the standard normal distribution can be used to determine areas under the curve for any normal distribution.
Example of a z-transformation
If the population mean heart rate is 74 bpm, and the standard deviation is 7.5, the z score for an individual with HR = 80 bpm is:
$z = \frac{x - \mu}{\sigma} = \frac{80 - 74}{7.5} = 0.8$
The individual’s HR of 80 bpm is 0.8 standard deviations above the mean.
The z-value can be looked up in a table for the standard normal distribution to determine the lower and upper areas defined by a z-score of 0.8 (the areas are the lower 78.8% and upper 21.2%)
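The same z score and the two areas can be obtained in code instead of a printed table. This is a sketch under the assumption that the standard normal area to the left of z is computed as 0.5·(1 + erf(z/√2)); the function names are illustrative.

```python
from math import erf, sqrt

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def standard_normal_cdf(z: float) -> float:
    """Area under N(0, 1) to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = z_score(80, mu=74, sigma=7.5)    # 0.8 standard deviations above the mean
lower_area = standard_normal_cdf(z)  # about 0.788
upper_area = 1 - lower_area          # about 0.212
```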
Using Table
(…)
.0082 is the area under N(0,1) left of z = -2.40
.0080 is the area under N(0,1) left of z = -2.41
.0069 is the area under N(0,1) left of z = -2.46
The standard Normal distribution
Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N(μ, σ) into the standard Normal curve N(0,1).
For each x we calculate a new value, z (called a z-score):
$z = \frac{x - \mu}{\sigma}$
[Figure: the curve N(64.5, 2.5) is transformed into N(0,1); x-axis of the transformed curve: standardized height (no units)]
The total area under the normal distribution curve is 1:
90% of the area is between ± 1.645 sd
95% of the area is between ± 1.960 sd
99% of the area is between ± 2.575 sd
[Figure: standard normal curve centered on 0, with the central areas of 90% (±1.645), 95% (±1.960), and 99% (±2.575) marked]
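These cut-off values can be recovered from the inverse of the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library (Python 3.8 or later); the helper name `central_limit` is an assumption. Computed exactly, the 99% value is 2.5758, which rounds to 2.576; many printed tables quote it as 2.575.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def central_limit(area: float) -> float:
    """z such that the given central area lies between -z and +z."""
    # A central area leaves (1 - area) / 2 in each tail,
    # so the upper limit sits at cumulative probability 0.5 + area / 2.
    return std_normal.inv_cdf(0.5 + area / 2)

for area in (0.90, 0.95, 0.99):
    print(f"{area:.0%} of the area is between ± {central_limit(area):.3f} sd")
```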
The Normal Distribution & Confidence Intervals
90% of the area is between ± 1.645 sd
95% of the area is between ± 1.960 sd
99% of the area is between ± 2.575 sd
These are the most commonly used areas for defining Confidence Intervals, which are used in inferential statistics to estimate population values from sample data.
If a certain interval is a 95% confidence interval, then we can say that if we repeated the procedure of drawing random samples and computing confidence intervals over and over again, about 95% of those confidence intervals would include the true value from the population.
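The repeated-sampling interpretation can be illustrated with a small simulation. Everything specific in this sketch is an assumption made for illustration: the population (mean 3300, SD 600), the sample size of 25, the number of repetitions, and the interval formula mean ± 1.96·σ/√n with σ treated as known, which is not stated in the text.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA, N = 3300.0, 600.0, 25  # hypothetical population and sample size
Z_95 = 1.960                      # central 95% of the standard normal curve
REPEATS = 2000                    # number of samples drawn

half_width = Z_95 * SIGMA / N**0.5  # sigma is treated as known here

covered = 0
for _ in range(REPEATS):
    sample_mean = sum(random.gauss(MU, SIGMA) for _ in range(N)) / N
    # Does this sample's interval include the true population mean?
    if sample_mean - half_width <= MU <= sample_mean + half_width:
        covered += 1

coverage = covered / REPEATS  # expected to be close to 0.95
print(f"{coverage:.1%} of the intervals include the true mean")
```

Any single interval either contains the true mean or does not; the 95% describes the long-run behavior of the procedure, which is what the simulated coverage approximates.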
Birth weight ($x_i$)    $z_i = \frac{x_i - 3300}{600}$
3200                  -0.167
3450                  0.25
2980                  -0.533
4100                  1.333
2900                  -0.667
3500                  0.333
:                     :
3400                  0.167
μ = 3300; σ = 600     μ = 0; σ = 1.0
If it is known that the birth weights of infants are normally distributed with a mean of 3300 g and a standard deviation of 600 g, what is the probability that a randomly selected infant will weigh less than 3000 g?
$z_i = \frac{x_i - \mu}{\sigma} = \frac{3000 - 3300}{600} = -0.5$
$P(X_i < 3000) = P(Z_i < -0.5) = 0.31$
More than 3000 g? Ans: 0.19 + 0.50 = 0.69
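The birth weight calculations can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch; the variable names are illustrative and the inputs are the values given in the example.

```python
from statistics import NormalDist

MU, SIGMA = 3300, 600  # mean and standard deviation of birth weight, in grams

# z-scores for the birth weights listed in the table
weights = [3200, 3450, 2980, 4100, 2900, 3500, 3400]
z_scores = [round((w - MU) / SIGMA, 3) for w in weights]

birth_weight = NormalDist(mu=MU, sigma=SIGMA)

z = (3000 - MU) / SIGMA          # -0.5
p_less = birth_weight.cdf(3000)  # P(X < 3000), about 0.31
p_more = 1 - p_less              # P(X > 3000), about 0.69
```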
Area between 0 and z
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990
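Each entry of the table is the standard normal cumulative probability minus 0.5. As a sketch of how such entries arise, the code below rebuilds the row for z = 1.9 using `statistics.NormalDist`; the function name `area_0_to_z` is an assumption for illustration.

```python
from statistics import NormalDist

def area_0_to_z(z: float) -> float:
    """Area under N(0, 1) between 0 and z (for z >= 0), as tabulated."""
    return NormalDist().cdf(z) - 0.5

# Rebuild one row of the table: z = 1.90, 1.91, ..., 1.99
row_1_9 = [round(area_0_to_z(1.9 + j / 100), 4) for j in range(10)]
print(row_1_9)

# The row label gives the first decimal of z and the column gives the second,
# so z = 1.96 is found in row 1.9, column 0.06.
area_196 = area_0_to_z(1.96)  # about 0.4750, i.e. 95% lies between ± 1.96
```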
If the mean and the standard deviation of the BMI of adult women are 24 and 6 units respectively, and BMI is assumed to be normally distributed, what proportion of women will have BMI > 30 (what proportion of women will be classified as obese)?
$z_i = \frac{x_i - \mu}{\sigma} = \frac{30 - 24}{6} = 1.0$
$P(X_i > 30) = P(Z_i > 1.0) = 0.16$
16% of the adult women will be classified as obese.
$P(x < 18) = P\left(z < \frac{18 - 22}{4}\right) = P(z < -1) = 0.5 - 0.3413 = 0.1587$
Example
• In a normal curve with mean = 30, s = 5, what is the proportion of scores below 27?
$Z = \frac{27 - 30}{5} = -0.6$
Mean to Z equals 0.2257 and 0.5 - 0.2257 = 0.2743
Smaller portion of a Z of 0.6 is 0.2743
Portion ≈ 27%
[Figure: standard normal curve, z from -4 to 4, with the score 27 marked]
Example
• In a normal curve with mean = 30, s = 5, what proportion of scores falls between 26 and 35?
$Z = \frac{26 - 30}{5} = -0.8$
$Z = \frac{35 - 30}{5} = 1$
Mean to a Z of 0.8 is 0.2881
Mean to a Z of 1 is 0.3413
0.2881 + 0.3413 = 0.6294
Portion = 62.94% or 63%
[Figure: standard normal curve, z from -4 to 4, with the scores 26 and 35 marked and the areas .2881 and .3413 labelled]
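Both proportions for the curve with mean 30 and s = 5 can be verified without a table. The sketch below uses `statistics.NormalDist`; a proportion between two scores is the difference of the two cumulative areas, which is equivalent to adding the two "mean to Z" areas when the scores lie on opposite sides of the mean.

```python
from statistics import NormalDist

scores = NormalDist(mu=30, sigma=5)

# Proportion of scores below 27 (Z = -0.6): the lower tail
p_below_27 = scores.cdf(27)  # about 0.2743

# Proportion of scores between 26 (Z = -0.8) and 35 (Z = 1)
p_26_to_35 = scores.cdf(35) - scores.cdf(26)  # about 0.6294
```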
Example
• The Stanford-Binet IQ test has a mean of 100 and an SD of 15. How many people (out of 1000) have IQs between 120 and 140?
Mean to a Z of 2.66 is 0.4961
Mean to a Z of 1.33 is 0.4082
0.4961 - 0.4082 = 0.0879
Portion = 8.79% or 9%
0.0879 * 1000 = 87.9 or 88 people
[Figure: standard normal curve, z from -4 to 4, with the scores 120 and 140 marked and the areas .4082 and .4961 labelled]
$Z = \frac{120 - 100}{15} = 1.33$
140