Abstract
In this work, we study fairness in centroid clustering. In this problem, k cluster centers must be placed given n points in a metric space, and the cost to each point is its distance to the nearest cluster center. Recent work of Chen et al. [Chen et al., 2019] introduces the notion of a proportionally fair clustering, in which no group of at least n/k points can find a new cluster center which provides lower cost to each member of the group. They propose a greedy capture algorithm which provides a 1+√2 approximation of proportional fairness for any metric space, and derive generalization bounds for learning proportionally fair clustering from samples in the case where a cluster center can only be placed at one of finitely many given locations in the metric space.
We focus on the case where cluster centers can be placed anywhere in the (usually infinite) metric space. In case of the L² distance metric over ℝ^t, we show that the approximation ratio of greedy capture improves to 2. We also show that this is due to a special property of the L² distance; for the L¹ and L^∞ distances, the approximation ratio remains 1+√2. We provide universal lower bounds which apply to all algorithms.
We also consider metric spaces defined on graphs. For trees, we show that an exact proportionally fair clustering always exists and provide an efficient algorithm to find one. The corresponding question for general graph remains an interesting open question.
Finally, we show that for the L² distance, checking whether a proportionally fair clustering exists and implementing greedy capture over an infinite metric space are NPhard problems, but (approximately) solvable in special cases. We also derive generalization bounds which show that an approximately proportionally fair clustering for a large number of points can be learned from a small number of samples. Our work advances the understanding of proportional fairness in clustering, and points out many avenues for future work.
BibTeX  Entry
@InProceedings{micha_et_al:LIPIcs:2020:12492,
author = {Evi Micha and Nisarg Shah},
title = {{Proportionally Fair Clustering Revisited}},
booktitle = {47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)},
pages = {85:185:16},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959771382},
ISSN = {18688969},
year = {2020},
volume = {168},
editor = {Artur Czumaj and Anuj Dawar and Emanuela Merelli},
publisher = {Schloss DagstuhlLeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2020/12492},
URN = {urn:nbn:de:0030drops124923},
doi = {10.4230/LIPIcs.ICALP.2020.85},
annote = {Keywords: Fairness, Clustering, Facility location}
}
Keywords: 

Fairness, Clustering, Facility location 
Collection: 

47th International Colloquium on Automata, Languages, and Programming (ICALP 2020) 
Issue Date: 

2020 
Date of publication: 

29.06.2020 