Abstract
In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this computational task can be formulated as the Maximum Colorful Subtree problem, which is NPhard. Unfortunately, data from a single metabolite requires us to solve hundreds or thousands of instances of this problem  and in a single Liquid Chromatography MS/MS run, hundreds or thousands of metabolites are measured.
Here, we comprehensively evaluate the performance of several heuristic algorithms for the problem. Unfortunately, as is often the case in bioinformatics, the structure of the (chemically) true solution is not known to us; therefore we can only evaluate against the optimal solution of an instance. Evaluating the quality of a heuristic based on scores can be misleading: Even a slightly suboptimal solution can be structurally very different from the optimal solution, but it is the structure of a solution and not its score that is relevant for the downstream analysis. To this end, we propose a different evaluation setup: Given a set of candidate instances of which exactly one is known to be correct, the heuristic in question solves each instance to the best of its ability, producing a score for each instance, which is then used to rank the instances. We then evaluate whether the correct instance is ranked highly by the heuristic.
We find that one particular heuristic consistently ranks the correct instance in a top position. We also find that the scores of the best heuristic solutions are very close to the optimal score; in contrast, the structure of the solutions can deviate significantly from the optimal structures. Integrating the heuristic allowed us to speed up computations in practice by a factor of 100fold.
BibTeX  Entry
@InProceedings{dhrkop_et_al:LIPIcs:2018:9325,
author = {Kai D{\"u}hrkop and Marie A. Lataretu and W. Timothy J. White and Sebastian B{\"o}cker},
title = {{Heuristic Algorithms for the Maximum Colorful Subtree Problem}},
booktitle = {18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
pages = {23:123:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770828},
ISSN = {18688969},
year = {2018},
volume = {113},
editor = {Laxmi Parida and Esko Ukkonen},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2018/9325},
URN = {urn:nbn:de:0030drops93256},
doi = {10.4230/LIPIcs.WABI.2018.23},
annote = {Keywords: Fragmentation trees, Computational mass spectrometry}
}
Keywords: 

Fragmentation trees, Computational mass spectrometry 
Collection: 

18th International Workshop on Algorithms in Bioinformatics (WABI 2018) 
Issue Date: 

2018 
Date of publication: 

02.08.2018 