Abstract
Clustering is a fundamental problem in unsupervised learning. In many realworld applications, the tobeclustered data often contains various types of noises and thus needs to be removed from the learning process. To address this issue, we consider in this paper two variants of such clustering problems, called kmedian with m outliers and kmeans with m outliers. Existing techniques for both problems either incur relatively large approximation ratios or can only efficiently deal with a small number of outliers. In this paper, we present improved solution to each of them for the case where k is a fixed number and m could be quite large. Particularly, we gave the first PTAS for the kmedian problem with outliers in Euclidean space R^d for possibly high m and d. Our algorithm runs in O(nd((1/epsilon)(k+m))^(k/epsilon)^O(1)) time, which considerably improves the previous result (with running time O(nd(m+k)^O(m+k) + (1/epsilon)k log n)^O(1))) given by [Feldman and Schulman, SODA 2012]. For the kmeans with outliers problem, we introduce a (6+epsilon)approximation algorithm for general metric space with running time O(n(beta (1/epsilon)(k+m))^k) for some constant beta>1. Our algorithm first uses the kmeans++ technique to sample O((1/epsilon)(k+m)) points from input and then select the k centers from them. Compared to the more involving existing techniques, our algorithms are much simpler, i.e., using only random sampling, and achieving better performance ratios.
BibTeX  Entry
@InProceedings{feng_et_al:LIPIcs:2019:11557,
author = {Qilong Feng and Zhen Zhang and Ziyun Huang and Jinhui Xu and Jianxin Wang},
title = {{Improved Algorithms for Clustering with Outliers}},
booktitle = {30th International Symposium on Algorithms and Computation (ISAAC 2019)},
pages = {61:161:12},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959771306},
ISSN = {18688969},
year = {2019},
volume = {149},
editor = {Pinyan Lu and Guochuan Zhang},
publisher = {Schloss DagstuhlLeibnizZentrum fuer Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2019/11557},
URN = {urn:nbn:de:0030drops115573},
doi = {10.4230/LIPIcs.ISAAC.2019.61},
annote = {Keywords: Clustering with Outliers, Approximation, Random Sampling}
}
Keywords: 

Clustering with Outliers, Approximation, Random Sampling 
Collection: 

30th International Symposium on Algorithms and Computation (ISAAC 2019) 
Issue Date: 

2019 
Date of publication: 

28.11.2019 