Abstract
Consider an ordinal tree T on n nodes, each of which is assigned a category from an alphabet [σ] = {1,2,…,σ}. We preprocess the tree T in order to support {categorical path counting queries}, which ask for the number of distinct categories occurring on the path in T between two query nodes x and y. For this problem, we propose a linearspace data structure with query time O(√n lg((lg σ)/(lg w))), where w = Ω(lg n) is the word size in the wordRAM. As shown in our proof, from the assumption that matrix multiplication cannot be solved in time faster than cubic (with only combinatorial methods), our result is optimal, save for polylogarithmic speedups. For a tradeoff parameter 1 ≤ t ≤ n, we propose an O(n+ n²/t²)word, O(t lg ((lg σ)/(lg w))) query time data structure. We also consider capproximate categorical path counting queries, which must return an approximation to the number of distinct categories occurring on the query path, by counting each such category at least once and at most c times. We describe a linearspace data structure that supports 2approximate categorical path counting queries in O((lg n)/(lg lg n)) time.
Next, we generalize the categorical path counting queries to weighted trees. Here, a query specifies two nodes x,y and an orthogonal range Q. The answer to thus formed categorical path range counting query is the number of distinct categories occurring on the path from x to y, if only the nodes with weights falling inside Q are considered. We propose an O(n lg lg n +(n/t)⁴)word data structure with O(t lg lg n) query time, or an O(n+(n/t)⁴)word} data structure with O(t lg^ε n) query time. For an appropriate choice of the tradeoff parameter t, this implies a linearspace data structure with O(n^{3/4} lg^ε n) query time. We then extend the approach to the trees weighted with vectors from [n]^{d}, where d is a constant integer greater than or equal to 2. We present a data structure with O(n lg^{d1+ε} n + (n/t)^{2d+2}) words of space and O(t (lg^{d1} n)/((lg lg n)^{d2})) query time. For an O(n⋅polylog n)space solution, one thus has O(n^{{2d+1}/{2d+2}}⋅polylog n) query time.
The inherent difficulty revealed by the lower bound we proved motivated us to consider data structures based on {sketching}. In unweighted trees, we propose a sketching data structure to solve the approximate categorical path counting problem which asks for a (1±ε)approximation (i.e. within 1±ε of the true answer) of the number of distinct categories on the given path, with probability 1δ, where 0 < ε,δ < 1 are constants. The data structure occupies O(n+n/t lg n) words of space, for the query time of O(t lg n). For trees weighted with ddimensional weight vectors (d ≥ 1), we propose a data structure with O((n + n/t lg n) lg^d n) words of space and O(t lg^{d+1} n) query time.
All these problems generalize the corresponding categorical range counting problems in Euclidean space ℝ^{d+1}, for respective d, by replacing one of the dimensions with a tree topology.
BibTeX  Entry
@InProceedings{he_et_al:LIPIcs.CPM.2021.15,
author = {He, Meng and Kazi, Serikzhan},
title = {{Data Structures for Categorical Path Counting Queries}},
booktitle = {32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021)},
pages = {15:115:17},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
ISBN = {9783959771863},
ISSN = {18688969},
year = {2021},
volume = {191},
editor = {Gawrychowski, Pawe{\l} and Starikovskaya, Tatiana},
publisher = {Schloss Dagstuhl  LeibnizZentrum f{\"u}r Informatik},
address = {Dagstuhl, Germany},
URL = {https://drops.dagstuhl.de/opus/volltexte/2021/13966},
URN = {urn:nbn:de:0030drops139662},
doi = {10.4230/LIPIcs.CPM.2021.15},
annote = {Keywords: data structures, weighted trees, path queries, categorical queries, coloured queries, categorical path counting, categorical path range counting}
}
Keywords: 

data structures, weighted trees, path queries, categorical queries, coloured queries, categorical path counting, categorical path range counting 
Collection: 

32nd Annual Symposium on Combinatorial Pattern Matching (CPM 2021) 
Issue Date: 

2021 
Date of publication: 

30.06.2021 