Abstract
Let P be a set of n colored points. We develop efficient data structures that store P and can answer chromatic knearest neighbor (kNN) queries. Such a query consists of a query point q and a number k, and asks for the color that appears most frequently among the k points in P closest to q. Answering such queries efficiently is the key to obtain fast kNN classifiers. Our main aim is to obtain query times that are independent of k while using nearlinear space.
We show that this is possible using a combination of two data structures. The first data structure allow us to compute a region containing exactly the knearest neighbors of a query point q, and the second data structure can then report the most frequent color in such a region. This leads to linear space data structures with query times of O(n^{1/2} log n) for points in ℝ¹, and with query times varying between O(n^{2/3}log^{2/3} n) and O(n^{5/6} polylog n), depending on the distance measure used, for points in ℝ². These results can be extended to work in higher dimensions as well. Since the query times are still fairly large we also consider approximations. If we are allowed to report a color that appears at least (1ε)f^* times, where f^* is the frequency of the most frequent color, we obtain a query time of O(log n + log log_{1/(1ε)} n) in ℝ¹ and expected query times ranging between Õ(n^{1/2}ε^{3/2}) and Õ(n^{1/2}ε^{5/2}) in ℝ² using nearlinear space (ignoring polylogarithmic factors).
BibTeX  Entry
@InProceedings{vanderhorst_et_al:LIPIcs.ESA.2022.67,
author = {van der Horst, Thijs and L\"{o}ffler, Maarten and Staals, Frank},
title = {{Chromatic kNearest Neighbor Queries}},
booktitle = {30th Annual European Symposium on Algorithms (ESA 2022)},
pages = {67:167:14},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772471},
ISSN = {18688969},
year = {2022},
volume = {244},
editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/17005},
URN = {urn:nbn:de:0030drops170055},
doi = {10.4230/LIPIcs.ESA.2022.67},
annote = {Keywords: data structure, nearest neighbor, classification}
}
Keywords: 

data structure, nearest neighbor, classification 
Collection: 

30th Annual European Symposium on Algorithms (ESA 2022) 
Issue Date: 

2022 
Date of publication: 

01.09.2022 