Abstract
We study codes that are listdecodable under insertions and deletions ("insdel codes"). Specifically, we consider the setting where, given a codeword x of length n over some finite alphabet Sigma of size q, delta * n codeword symbols may be adversarially deleted and gamma * n symbols may be adversarially inserted to yield a corrupted word w. A code is said to be listdecodable if there is an (efficient) algorithm that, given w, reports a small list of codewords that include the original codeword x. Given delta and gamma we study what is the rate R for which there exists a constant q and list size L such that there exist codes of rate R correcting deltafraction insertions and gammafraction deletions while reporting lists of size at most L.
Using the concept of synchronization strings, introduced by the first two authors [Proc. STOC 2017], we show some surprising results. We show that for every 0 <= delta < 1, every 0 <= gamma < infty and every epsilon > 0 there exist codes of rate 1  delta  epsilon and constant alphabet (so q = O_{delta,gamma,epsilon}(1)) and sublogarithmic list sizes. Furthermore, our codes are accompanied by efficient (polynomial time) decoding algorithms. We stress that the fraction of insertions can be arbitrarily large (more than 100%), and the rate is independent of this parameter. We also prove several tight bounds on the parameters of listdecodable insdel codes. In particular, we show that the alphabet size of insdel codes needs to be exponentially large in epsilon^{1}, where epsilon is the gap to capacity above. Our result even applies to settings where the uniquedecoding capacity equals the listdecoding capacity and when it does so, it shows that the alphabet size needs to be exponentially large in the gap to capacity. This is sharp contrast to the Hamming error model where alphabet size polynomial in epsilon^{1} suffices for unique decoding. This lower bound also shows that the exponential dependence on the alphabet size in previous works that constructed insdel codes is actually necessary!
Our result sheds light on the remarkable asymmetry between the impact of insertions and deletions from the point of view of errorcorrection: Whereas deletions cost in the rate of the code, insertion costs are borne by the adversary and not the code! Our results also highlight the dominance of the model of insertions and deletions over the Hamming model: A Hamming error is equal to one insertion and one deletion (at the same location). Thus the effect of deltafraction Hamming errors can be simulated by deltafraction of deletions and deltafraction of insertions  but insdel codes can deal with much more insertions without loss in rate (though at the price of higher alphabet size).
BibTeX  Entry
@InProceedings{haeupler_et_al:LIPIcs:2018:9080,
author = {Bernhard Haeupler and Amirbehshad Shahrasbi and Madhu Sudan},
title = {{Synchronization Strings: List Decoding for Insertions and Deletions}},
booktitle = {45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)},
pages = {76:176:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959770767},
ISSN = {18688969},
year = {2018},
volume = {107},
editor = {Ioannis Chatzigiannakis and Christos Kaklamanis and D{\'a}niel Marx and Donald Sannella},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {http://drops.dagstuhl.de/opus/volltexte/2018/9080},
URN = {urn:nbn:de:0030drops90807},
doi = {10.4230/LIPIcs.ICALP.2018.76},
annote = {Keywords: List Decoding, Insertions and Deletions, Synchronization Strings}
}
Keywords: 

List Decoding, Insertions and Deletions, Synchronization Strings 
Collection: 

45th International Colloquium on Automata, Languages, and Programming (ICALP 2018) 
Issue Date: 

2018 
Date of publication: 

04.07.2018 