Figure 46: Range tree search.
What is the query time for a range tree? Recall that it takes O(logn)
time to locate the nodes representing the canonical subsets for the 1
dimensional range query. For each, we invoke a 1 dimensional range
search. Thus there O(logn) canonical sets, each invoking an O(log n)
range search, for a total time of O((log n)^2). As before, listing the
elements of these sets can be performed in additional k time by just
traversing the trees. Counting queries can be answered by precomputing
the subtree sizes, and then just adding them up.
The space used by the data structure is O(n log n) in the plane (and
O(n log (d-1) n) in dimension d). The reason comes by summing the
sizes of the two data structures. The tree for the x coordinates
requires only O(n) storage. But we claim that the total storage in all
the auxiliary trees is O(n log n). We want to count the total sizes of
all these trees.
The number of nodes in a tree is proportional to the number of leaves,
and hence the number of points stored in this tree. Rather than count
the number of points in each tree separately, instead let us count the
number of trees in which each point appears. This will give the same
total. Observe that a point appears in the auxiliary trees of each of
its ancestors. Since the tree is balanced, each point has O(log n)
ancestors, and hence each point appears in O(logn) auxiliary trees.
It follows that the total size of all the auxiliary trees is O(n log n).
By the way, observe that because the auxiliary trees are just 1
dimensional trees, we could just store them as a sorted array.
We claim that it is possible to construct a 2 dimensional range tree
in O(n log n) time. Constructing the 1 dimensional range tree for the
x coordinates is easy to do in O(n log n) time. However, we need to
be careful in constructing the auxiliary trees, because if we were to
sort each list of y coordinates separately, the running time would be
O(n (logn)^2). Instead, the trick is to construct the auxiliary trees
in a bottom up manner. The leaves, which contain a single point are
trivially sorted. Then we simply merge the two sorted lists for each
child to form the sorted list for the parent. Since sorted lists can
be merged in linear time, the set of all auxiliary trees can be
constructed in time that is linear in their total since, or O(n log
n). Once the lists have been sorted, then building a tree from the
sorted list can be done in linear time.
Summarizing, here is the basic idea to this (and many other query
problems based on leveled searches). Let (S, R(1)) denote the range
space, consisting of points S and range sets R(1). Construct a data
structure, which represents S by a collection of canonical sets,
{S(1), S(2), ... S(m)}. For each canonical subset, construct a data
structure for answering the second type of range query (and so
on). The main property of the canonical subsets is that, for any range
query, we can efficiently determine a small number of canonical sets
whose disjoint union is equal to the answer to the query (in our case
this was O(logn) subsets). Furthermore, this collection of canonical
subsets can be determined efficiently (in our case in O(logn)
time).
To answer a range query, we solve the first range query, resulting in
a collection of canonical subsets whose union is the answer to this
query. We then invoke the second range query problem on each of these
subsets, and so on. Finally we take the union of all the answers to
all these queries.
Fractional Cascading: Can we improve on the O((log n)^2) query time?
We would like to reduce the query time to O(log n). As we descend the
search the x interval tree, for each node we visit, we need to search
the corresponding y interval tree. It is this combination that leads
to the squaring of the logarithms. If we could search each y interval
in O(1) time, then we could eliminate this second log factor. The
trick to doing this is used in a number of places in computational
geometry, and is generally a nice idea to remember.
We are repeatedly searching different lists, but always with the same
key. The idea is to merge all the different lists into a single
massive list, do one search in this list in O(logn) time, and then use
the information about the location of the key to answer all the
remaining queries in O(1) time each. This is a simplification of a
more general search technique called fractional cascading.
In our case, the massive list on which we will do one search is the
entire set of points, sorted by y coordinate. In particular, rather
than store these points in a balanced binary tree, let us assume that
they are just stored as sorted arrays. (The idea works for either
trees or arrays, but the arrays are a little easier to visualize.)
Call these the auxiliary lists. We will do one (expensive) search on
the auxiliary list for the root, which takes O(logn) time. However,
after this, we claim that we can keep track of the position of the y
range in each auxiliary list in constant time as we descend the tree
of x coordinates.
Here is how we do this. Let v be an arbitrary internal node in the
range tree of x coordinates, and let v(L) and v(R) be its left and
right children. Let A v be the sorted auxiliary list for v and let
A(L) and A(R) be the sorted auxiliary lists for its respective
children. Observe that A(v)is the disjoint union of AL and AR
(assuming no duplicate y coordinates). For each element in A(v) , we
store two pointers, one to the item of equal or larger value in A(L) and
the other to the item of equal or larger value in A(R). (If there is no
larger item, the pointer is null.) Observe that once we know the
position of an item in A(v), then we can determine its position in
either A(L) or A(R) in O(1) additional time.
Here is a quick illustration of the general idea. Let v denote a node
of the x tree, and let v(L)and v(R) denote its left and right
children. Suppose that (in bottom to top order) the associated nodes
within this range are: , and suppose that in v(L)we store the points
and in v(R) we store
. This is shown below. For each point in
the auxiliary list for v, we store a pointer to the lists v(L) and
v(R), to the position this element would be inserted in the other list
(assuming sorted by y values). That is, we store a pointer to the
largest element whose y value is less than or equal to this point.
At the root of the tree, we need to perform a binary search against
all the y values to determine which points lie within this interval,
for all subsequent levels, once we know where the y interval falls
with respect to the order points here, we can drop down to the next
level in O(1) time.
Figure 47: Cascaded search in range trees.
Thus (as with fractional cascading) the running time is O(2 log n),
rather than O( (log n)^2). It turns out that this trick can only be
applied to the last level of the search structure, because all other
levels need the full tree search to compute canonical sets.
Theorem: Given a set of n points in R d , orthogonal rectangular range
queries can be answered in O(log (d-1) n + k) time, from a data
structure of size O(n log (d-1) n) which can be constructed in
O(n log (d-1) n) time.