License: Creative Commons Attribution 4.0 International license (CC BY 4.0)
When quoting this document, please refer to the following
DOI: 10.4230/LIPIcs.ICALP.2021.11
URN: urn:nbn:de:0030-drops-140803

Agarwal, Pankaj K. ; Hu, Xiao ; Sintos, Stavros ; Yang, Jun

Dynamic Enumeration of Similarity Joins



This paper considers enumerating answers to similarity-join queries under dynamic updates: given two sets of n points A, B in ℝ^d, a metric ϕ(⋅), and a distance threshold r > 0, report all pairs of points (a, b) ∈ A × B with ϕ(a, b) ≤ r. Our goal is to store A and B in a dynamic data structure that, whenever asked, can enumerate all result pairs with a worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from A or B.
We propose several efficient data structures for answering similarity-join queries in low dimensions. For exact enumeration of similarity joins, we present near-linear-size data structures for the 𝓁₁ and 𝓁_∞ metrics with log^{O(1)} n update time and delay. We show that such a data structure is not feasible for the 𝓁₂ metric for d ≥ 4. For approximate enumeration of similarity joins, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for the 𝓁_p metric, with log^{O(1)} n delay and update time. In high dimensions, we present an efficient data structure with a worst-case delay guarantee using locality-sensitive hashing (LSH).
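
To make the problem statement concrete, the following is a minimal brute-force sketch of a dynamic similarity join. It is not one of the data structures from the paper: the class name `NaiveSimilarityJoin`, the helper `distance`, and the `side` argument are hypothetical names chosen for illustration. It supports insertions, deletions, and enumeration, but it gives no useful delay guarantee, since the scan may examine on the order of n² non-matching pairs between two reported pairs. That gap is what the paper's data structures are designed to close.

```python
from itertools import product


def distance(a, b, p=2.0):
    """l_p distance between two equal-length points; p=float('inf') gives l_inf."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


class NaiveSimilarityJoin:
    """Brute-force baseline (illustrative only, not the paper's structure)."""

    def __init__(self, r, p=2.0):
        self.r = r          # distance threshold r > 0
        self.p = p          # which l_p metric to use
        self.A = set()
        self.B = set()

    def insert(self, side, point):
        # side is "A" or "B"; points are stored as hashable tuples
        (self.A if side == "A" else self.B).add(tuple(point))

    def delete(self, side, point):
        (self.A if side == "A" else self.B).discard(tuple(point))

    def enumerate(self):
        # Lazily yield every pair (a, b) in A x B with distance <= r.
        # The time between two yields is unbounded in general.
        for a, b in product(self.A, self.B):
            if distance(a, b, self.p) <= self.r:
                yield (a, b)
```

For example, with r = 1 under the 𝓁₁ metric, inserting (0, 0) into A and (0, 1) and (2, 2) into B makes `enumerate()` yield only the pair ((0, 0), (0, 1)); deleting (0, 1) from B leaves the result empty.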

Keywords: dynamic enumeration, similarity joins, worst-case delay guarantee
Collection: 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021)
Issue Date: 2021
Date of publication: 02.07.2021

