Abstract
Given an npoint metric space ({𝒳},d) where each point belongs to one of m = O(1) different categories or groups and a set of integers k₁, …, k_m, the fair MaxMin diversification problem is to select k_i points belonging to category i ∈ [m], such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to downsample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results:
1) We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor 2approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a 6approximation that is guaranteed to satisfy the fairness constraints up to a factor 1ε for any constant ε. We also present a linear time algorithm returning an m+1 approximation with exact fairness. The best previous result was a 3m1 approximation.
2) We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. {For constant dimensions, categories and any constant ε > 0, we present a 1+ε approximation algorithm that runs in O(nk) + 2^{O(k)} time where k = k₁+…+k_m.} We can improve the running time to O(nk)+poly(k) at the expense of only picking (1ε) k_i points from category i ∈ [m].
Finally, we present algorithms suitable to processing massive data sets including singlepass data stream algorithms and composable coresets for the distributed processing.
BibTeX  Entry
@InProceedings{addanki_et_al:LIPIcs.ICDT.2022.7,
author = {Addanki, Raghavendra and McGregor, Andrew and Meliou, Alexandra and Moumoulidou, Zafeiria},
title = {{Improved Approximation and Scalability for Fair MaxMin Diversification}},
booktitle = {25th International Conference on Database Theory (ICDT 2022)},
pages = {7:17:21},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959772235},
ISSN = {18688969},
year = {2022},
volume = {220},
editor = {Olteanu, Dan and Vortmeier, Nils},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2022/15881},
URN = {urn:nbn:de:0030drops158812},
doi = {10.4230/LIPIcs.ICDT.2022.7},
annote = {Keywords: algorithmic fairness, diversity maximization, data selection, approximation algorithms}
}
Keywords: 

algorithmic fairness, diversity maximization, data selection, approximation algorithms 
Collection: 

25th International Conference on Database Theory (ICDT 2022) 
Issue Date: 

2022 
Date of publication: 

19.03.2022 
Supplementary Material: 

Audiovisual (Video of the Presentation): https://doi.org/10.5446/57487 