Data processing effects on the interpretation of microarray gene expression experiments
Motivation: Microarray gene expression data are being collected at an increasing pace, and numerous methods and tools exist for analyzing them. The aim of this study is to evaluate how the basic statistical processing steps applied to microarray data affect the final outcome of gene expression analysis. These effects are most problematic for one-channel cDNA measurements, but they also affect other types of microarrays, especially when grouped samples are analyzed. Determining an appropriate combination of individual processing steps for a given dataset is crucial for improving the validity and reliability of expression data analysis.

Results: We analyzed a large gene expression dataset obtained from a one-channel cDNA microarray experiment on 83 human samples classified into four osteoarthritis-related groups. We compared different normalization methods with respect to their effect on the identification of differentially expressed genes. Furthermore, we compared different methods for combining spot p-values into gene p-values, and we propose Stouffer's method for this purpose. We developed several quality and robustness measures that allow the amount of error introduced during statistical data preparation to be estimated.

Conclusion: The apparently straightforward steps of gene expression data analysis, i.e. normalization and identification of differentially expressed genes, can be accomplished by numerous different methods. We analyzed multiple combinations of these methods to demonstrate the possible effects, and thus the importance, of each individual decision taken during data processing. An overview of these effects is essential for the biological interpretation of gene expression measurements. We give guidelines and tools for evaluating methods for normalization, spot combination and detection of differentially regulated genes.
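To make the spot-combination step concrete, here is a minimal sketch of Stouffer's method for merging the p-values of several spots that measure the same gene. The abstract does not state the exact formulation used in the study, so this assumes one-sided spot p-values, equal weights and independent spots. Fisher's method is included only as a common alternative for comparison; it is not a method the study is confirmed to have used. The function names `stouffer` and `fisher` are hypothetical.

```python
from math import exp, factorial, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def _check(p_values):
    # Assumption: p-values are one-sided and strictly inside (0, 1).
    if len(p_values) == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")


def stouffer(p_values):
    """Combine one-sided spot p-values with Stouffer's Z method.

    Each p_i is mapped to z_i = Phi^{-1}(1 - p_i). Under the joint null
    hypothesis (and assuming independent spots with equal weights),
    Z = sum(z_i) / sqrt(k) is standard normal, so the gene-level
    p-value is 1 - Phi(Z).
    """
    _check(p_values)
    k = len(p_values)
    z = sum(_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values) / sqrt(k)
    return 1.0 - _STD_NORMAL.cdf(z)


def fisher(p_values):
    """Combine p-values with Fisher's method (for comparison only).

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under
    the null. For even degrees of freedom the survival function has the
    closed form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    _check(p_values)
    k = len(p_values)
    half_x = -sum(log(p) for p in p_values)  # this is X / 2
    return exp(-half_x) * sum(half_x**j / factorial(j) for j in range(k))


if __name__ == "__main__":
    # Hypothetical p-values for one gene spotted three times on the array.
    spots = [0.04, 0.10, 0.20]
    print("Stouffer:", stouffer(spots))
    print("Fisher:  ", fisher(spots))
```

Both methods return the input unchanged for a single spot. They differ in how they weigh evidence: Stouffer's method adds z-scores and so rewards consistent moderate evidence across spots, whereas Fisher's method is more strongly driven by a single very small p-value, which matters when one spot on the array is an outlier.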