Outline: microarray data analysis
-- Gene expression microarrays
-- Normalization, scatter plots
-- Inferential statistics
-- Exploratory (descriptive) statistics: distances, principal components analysis (PCA)
Microarray data are high-dimensional: many thousands of measurements are made from a small number of samples.
Descriptive (exploratory) statistics help you to find meaningful patterns in the data.
A first step is to arrange the data in a matrix.
Next, use a distance metric to define the relatedness of the different data points. Two commonly used
distance metrics are:
-- Euclidean distance
-- Pearson correlation coefficient (a similarity measure r; 1 - r is often used as the corresponding distance)
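
The two metrics above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names and the three gene profiles are hypothetical, and turning the Pearson coefficient into a distance as 1 - r is one common convention assumed here.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Pearson correlation coefficient r between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    """Assumed convention: distance = 1 - r (0 = same shape, 2 = opposite)."""
    return 1.0 - pearson_correlation(x, y)

# Hypothetical expression profiles for three genes across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same trend as gene_a, larger magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]  # opposite trend to gene_a

print(euclidean_distance(gene_a, gene_b))
print(correlation_distance(gene_a, gene_b))
print(correlation_distance(gene_a, gene_c))
```

In this example gene_a and gene_b are far apart by Euclidean distance yet perfectly correlated, which shows why the choice of metric matters: Euclidean distance is sensitive to magnitude, while correlation compares only the shape of the profiles.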
What is a cluster?
A cluster is a group that has homogeneity (internal
cohesion) and separation (external isolation). The
relationships between objects being studied are
assessed by similarity or dissimilarity measures.
Clustering is one of the most important unsupervised learning processes: it organizes objects into groups whose members are similar in some way.
Clustering finds structures in a collection of unlabeled data.
A cluster is a collection of objects that are similar to one another and dissimilar to the objects
belonging to other clusters.
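
One way to make homogeneity and separation concrete is to compare the average distance within clusters to the average distance between clusters. The sketch below assumes Euclidean distance and a cluster assignment that is already known; the function names, points, and labels are hypothetical.

```python
import math
from itertools import combinations

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def mean(values):
    """Arithmetic mean, or NaN if there are no values."""
    return sum(values) / len(values) if values else float("nan")

def cohesion_and_separation(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = euclidean(points[i], points[j])
        if labels[i] == labels[j]:
            within.append(d)   # pair inside one cluster: measures homogeneity
        else:
            between.append(d)  # pair across clusters: measures separation
    return mean(within), mean(between)

# Hypothetical data: two tight groups that lie far apart.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels = ["A", "A", "B", "B"]

within, between = cohesion_and_separation(points, labels)
print(within, between)
```

A small within-cluster distance together with a large between-cluster distance indicates a grouping with good internal cohesion and external isolation.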
• Microarray data quality checking
– Do replicates cluster together?
– Do similar conditions, time points, and tissue
types cluster together?
• Cluster genes: predict the functions of
unknown genes from known ones
Functionally significant gene clusters
• Cluster samples: discover clinical
characteristics (e.g. survival, marker
status) shared by samples.
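
The quality-check question listed above (do replicates cluster together?) can be approximated without running a full clustering: for each sample, find the most highly correlated other sample and check whether it belongs to the same group. This is only a rough sketch under stated assumptions; the sample names, expression values, and the nearest_neighbor_agreement helper are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def nearest_neighbor_agreement(samples, groups):
    """Fraction of samples whose most-correlated other sample shares their group."""
    names = list(samples)
    hits = 0
    for name in names:
        others = [other for other in names if other != name]
        best = max(others, key=lambda other: pearson(samples[name], samples[other]))
        if groups[best] == groups[name]:
            hits += 1
    return hits / len(names)

# Hypothetical expression values (four genes) for two replicated conditions.
samples = {
    "tumor_rep1": [5.0, 1.0, 4.0, 0.5],
    "tumor_rep2": [4.8, 1.2, 4.1, 0.4],
    "normal_rep1": [1.0, 5.0, 0.5, 4.0],
    "normal_rep2": [1.1, 4.7, 0.6, 4.2],
}
groups = {
    "tumor_rep1": "tumor",
    "tumor_rep2": "tumor",
    "normal_rep1": "normal",
    "normal_rep2": "normal",
}

agreement = nearest_neighbor_agreement(samples, groups)
print(agreement)
```

An agreement well below 1 would suggest that some replicates resemble samples from another condition more than their own, which is worth investigating before any downstream analysis.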
Bhattacharjee et al. (2001) Classification of human lung carcinomas by mRNA expression
profiling reveals distinct adenocarcinoma subclasses.
Proc. Natl. Acad. Sci. USA, Vol. 98, 13790-13795.