Lecture Notes in Informatics

Bioinformatics strategies in life sciences: from data processing and data warehousing to biological knowledge extraction

Herbert Thiele, Jörg Glandorf and Peter Hufnagel

Abstract

With the large variety of Proteomics workflows, instruments and data-analysis software available, researchers today face major challenges in validating and comparing their Proteomics data. Here we present a new generation of the ProteinScape™ bioinformatics platform, which enables researchers to manage Proteomics data from generation and data warehousing through to a central data repository, with a strong focus on the improved accuracy, reproducibility and comparability demanded by many researchers in the field. It addresses scientists' current needs in proteomics identification, quantification and validation.

Producing large protein lists is not the end point in Proteomics, however: one ultimately aims to answer specific questions about the biological condition or disease model of the analyzed sample. In this context, a new tool termed PIKE (Protein information and Knowledge Extractor) has been developed that allows researchers to control, filter and access specific information from genomic and proteomic databases in order to understand the role and relationships of the proteins identified in the experiments. Additionally, an EU-funded project, ProDac, has coordinated systematic data collection in public standards-compliant repositories such as PRIDE. This will cover all aspects from generating MS data in the laboratory and assembling the complete annotation information to storing it together with the identifications in a standardised format.

Workflows

The extreme complexity of the proteome calls for different multistep approaches to separation and analysis at the protein and peptide level. These are usually combinations of 1D or 2D gel electrophoresis and one- to multidimensional LC techniques, coupled with different MS and MS/MS techniques. A database-driven solution is the most effective way to manage these data, to compare experiments, and to extract knowledge from experiments already carried out in the past.

Recent improvements in MS instrumentation and nano-LC reproducibility make a label-free, MS-based quantification approach feasible. Because a label-free approach is compatible with high throughput, it allows large numbers of samples to be processed, which is required to obtain statistically valid quantifications given the heterogeneity of typical biological samples. Handling these workflows, from data processing through statistical validation to quantification results, is a major challenge. Any software solution for data warehousing and analysis should address these different workflows in a flexible manner. The bioinformatics platform ProteinScape™ (Bruker Daltonics) supports these various discovery workflows in Proteomics through a flexible analyte hierarchy concept (Fig. 1).

INFORMATIK 2009 - Im Focus das Leben P-154, 669-679 (2008).

Gesellschaft für Informatik, Bonn, 2008
