License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.WABI.2018.20
URN: urn:nbn:de:0030-drops-93227

Mansouri, Mehrdad ; Booth, Julian ; Vityaz, Margaryta ; Chauve, Cedric ; Chindelevitch, Leonid

PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats



Variable-Number Tandem Repeats (VNTRs) are genomic regions where a short sequence of DNA is repeated with no space in between repeats. While a fixed set of VNTRs is typically identified for a given species, the copy number at each VNTR varies between individuals within a species. Although VNTRs are found in both prokaryotic and eukaryotic genomes, the methodology called multi-locus VNTR analysis (MLVA) is widely used to distinguish different strains of bacteria, as well as to cluster strains that might be epidemiologically related and to investigate evolutionary rates.
We propose PRINCE (Processing Reads to Infer the Number of Copies via Estimation), an algorithm that accurately estimates the copy number of a VNTR given the sequence of a single repeat unit and a set of short reads from a whole-genome sequencing (WGS) experiment. This is a challenging problem, especially when the repeat region is longer than the expected read length. Our proposed method computes a statistical approximation of the local coverage inside the repeat region. This approximation is then mapped to the copy number using a linear function whose parameters are fitted to simulated data. We test PRINCE on three datasets of Mycobacterium tuberculosis genomes and show that it is more than twice as accurate as a previous method.
An implementation of PRINCE in the Python language is freely available at
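
The two steps named in the abstract (a coverage-based score inside the repeat region, then a linear map from that score to a copy number, fitted on simulated data) can be illustrated with a minimal Python sketch. This is not the PRINCE implementation. It assumes exact k-mer matching against the repeat unit as a stand-in for the statistical coverage approximation, error-free reads, and a toy genome whose flanks share no k-mers with the repeat unit. All function names, parameters and data below are hypothetical.

```python
# Illustrative sketch only; this is NOT the PRINCE implementation.
# Every function name and all toy data below are hypothetical.

UNIT = "ACGGCAAGCCGA"  # toy repeat unit (12 bp, deliberately contains no T)
K = 6                  # k-mer length used for exact matching (an assumption)


def circular_kmers(unit, k):
    """All k-mers of a tandemly repeated unit, including those spanning a junction."""
    if not 0 < k <= len(unit):
        raise ValueError("k must be between 1 and the repeat-unit length")
    extended = unit + unit[:k - 1]
    return {extended[i:i + k] for i in range(len(unit))}


def match_score(reads, unit, k, genome_length):
    """Repeat-unit k-mer hits over all reads, divided by the mean read coverage."""
    kmers = circular_kmers(unit, k)
    hits = sum(
        1
        for read in reads
        for i in range(len(read) - k + 1)
        if read[i:i + k] in kmers
    )
    coverage = sum(len(read) for read in reads) / genome_length
    return hits / coverage


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_reads(unit, copies, genome_length=200, read_length=20):
    """Error-free reads from every start position of a toy genome (T flanks + repeat)."""
    flank = (genome_length - copies * len(unit)) // 2
    genome = "T" * flank + unit * copies + "T" * flank
    reads = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]
    return reads, len(genome)


def calibrate_and_estimate(true_copies):
    """Fit the linear map on simulated copy numbers 1..4, then apply it to a new sample."""
    training_copies = [1, 2, 3, 4]
    scores = []
    for copies in training_copies:
        reads, genome_length = simulate_reads(UNIT, copies)
        scores.append(match_score(reads, UNIT, K, genome_length))
    slope, intercept = fit_line(scores, training_copies)
    reads, genome_length = simulate_reads(UNIT, true_copies)
    return slope * match_score(reads, UNIT, K, genome_length) + intercept
```

In this noise-free toy setting the score is exactly linear in the copy number, so `calibrate_and_estimate(5)` comes out at approximately 5 even though 5 copies were not part of the calibration set. With real reads, sequencing errors and uneven coverage would make the relationship only approximately linear, which is why the abstract describes the score as a statistical approximation.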

BibTeX - Entry

  author =	{Mehrdad Mansouri and Julian Booth and Margaryta Vityaz and Cedric Chauve and Leonid Chindelevitch},
  title =	{{PRINCE: Accurate Approximation of the Copy Number of Tandem Repeats}},
  booktitle =	{18th International Workshop on Algorithms in Bioinformatics (WABI 2018)},
  pages =	{20:1--20:13},
  series =	{Leibniz International Proceedings in Informatics (LIPIcs)},
  ISBN =	{978-3-95977-082-8},
  ISSN =	{1868-8969},
  year =	{2018},
  volume =	{113},
  editor =	{Laxmi Parida and Esko Ukkonen},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-93227},
  doi =		{10.4230/LIPIcs.WABI.2018.20},
  annote =	{Keywords: Variable-Number Tandem Repeats, Copy number, Bacterial genomics}

Keywords: Variable-Number Tandem Repeats, Copy number, Bacterial genomics
Collection: 18th International Workshop on Algorithms in Bioinformatics (WABI 2018)
Issue Date: 2018
Date of publication: 02.08.2018
