Lecture 17: More Orthogonal Range Searching

We saw that kd trees could be used to answer orthogonal range queries in the plane in O(sqrt(n) + k) time.

Now we consider a better data structure, called orthogonal range trees. An orthogonal range tree is a data structure which, in all dimensions d >= 2, uses O(n (log n)(d-1)) space, and can answer orthogonal rectangular range queries in O((log n)(d-1)+k) time, where k is the number of points reported. Preprocessing time is the same as the space bound. Thus, in the plane, we can answer range queries in time O(log n) and space O(n log n). We will present the data structure in two parts, the first is a version that can answer queries in O((log n)2) time in the plane, and then we will show how to improve this in order to strip off a factor of log n from the query time.

The data structure is based on the concept of a multi level search tree. In this method, a complex search is decomposed into a constant number of simpler range searches. We cascade a number of search structures for simple ranges together to answer the complex range query. In this case we will reduce a d dimensional range search to a series of 1 dimensional range searches.

Suppose you have a query which can be stated as the intersection of a small number of simpler queries. For example, a range query in the plane can be stated as two range queries: Find all the points whose x coordinates are in the range [x(lo), x(hi)] and all the points whose y coordinates are in the range [y(lo), y(hi)]. Let us consider how to do this for 2 dimensional range queries, and then consider how to generalize the process. First, build a range tree for the first range query, which in this case is just a 1 dimensional range tree for the x range. Recall that this is just a balanced binary tree on these points sorted by x coordinates. Also recall that each node of this binary tree is implicitly associated with a canonical subset of points. These are the points lying in the subtree rooted at this node. The answer to any 1 dimensional range query can be represented as the disjoint union of a small collection of m = O(log n) canonical subsets, {S(1), S(2), ... S(m)}, where each subset corresponds to a node in the search. This constitutes the first level of the search tree. For the second level, for each node v in this x range tree, we build an auxiliary tree, each of which is a y coordinate range tree, which contains all the points in the canonical subset associated with v.

Thus the data structure consists of a x-range tree, such that each node points to auxiliary y range tree. This notion of a tree of trees is basic to solving range queries by leveling. (For example, for d dimensional range trees, we will have d levels of trees.)

To answer a query, we determine the canonical sets that satisfy the first query (there will be O(logn) of them). Each of these sets is just represented as a node in the range tree. We know that that these sets are disjoint, and that every point in these sets lies within the range of x coordinates. Thus to answer the query, we just need to find out which points from each canonical subset lies within the range of y coordinates. To do this, for each canonical subset, we access the auxiliary tree for this node, and perform a 1 dimensional range search on the y range. This process is illustrated in the following figure.


Range tree search.

What is the query time for a range tree? Recall that it takes O(logn) time to locate the nodes representing the canonical subsets for the 1 dimensional range query. For each, we invoke a 1 dimensional range search. Thus there O(logn) canonical sets, each invoking an O(log n) range search, for a total time of O((log n)^2). As before, listing the elements of these sets can be performed in additional k time by just traversing the trees. Counting queries can be answered by precomputing the subtree sizes, and then just adding them up.

The space used by the data structure is O(n log n) in the plane (and O(n log (d-1)n) in dimension d). The reason comes by summing the sizes of the two data structures. The tree for the x coordinates requires only O(n) storage. But we claim that the total storage in all the auxiliary trees is O(n log n). We want to count the total sizes of all these trees.

The number of nodes in a tree is proportional to the number of leaves, and hence the number of points stored in this tree. Rather than count the number of points in each tree separately, instead let us count the number of trees in which each point appears. This will give the same total. Observe that a point appears in the auxiliary trees of each of its ancestors. Since the tree is balanced, each point has O(log n) ancestors, and hence each point appears in O(logn) auxiliary trees.

It follows that the total size of all the auxiliary trees is O(n log n). By the way, observe that because the auxiliary trees are just 1 dimensional trees, we could just store them as a sorted array.

We claim that it is possible to construct a 2 dimensional range tree in O(n log n) time. Constructing the 1 dimensional range tree for the x coordinates is easy to do in O(n log n) time. However, we need to be careful in constructing the auxiliary trees, because if we were to sort each list of y coordinates separately, the running time would be O(n (logn)^2). Instead, the trick is to construct the auxiliary trees in a bottom up manner. The leaves, which contain a single point are trivially sorted. Then we simply merge the two sorted lists for each child to form the sorted list for the parent. Since sorted lists can be merged in linear time, the set of all auxiliary trees can be constructed in time that is linear in their total since, or O(n log n). Once the lists have been sorted, then building a tree from the sorted list can be done in linear time.

Summarizing, here is the basic idea to this (and many other query problems based on leveled searches). Let (S, R(1)) denote the range space, consisting of points S and range sets R(1). Construct a data structure, which represents S by a collection of canonical sets, {S(1), S(2), ... S(m)}. For each canonical subset, construct a data structure for answering the second type of range query (and so on). The main property of the canonical subsets is that, for any range query, we can efficiently determine a small number of canonical sets whose disjoint union is equal to the answer to the query (in our case this was O(logn) subsets). Furthermore, this collection of canonical subsets can be determined efficiently (in our case in O(logn) time).

To answer a range query, we solve the first range query, resulting in a collection of canonical subsets whose union is the answer to this query. We then invoke the second range query problem on each of these subsets, and so on. Finally we take the union of all the answers to all these queries.

Fractional Cascading: Can we improve on the O((log n)^2) query time? We would like to reduce the query time to O(log n). As we descend the search the x interval tree, for each node we visit, we need to search the corresponding y interval tree. It is this combination that leads to the squaring of the logarithms. If we could search each y interval in O(1) time, then we could eliminate this second log factor. The trick to doing this is used in a number of places in computational geometry, and is generally a nice idea to remember.

We are repeatedly searching different lists, but always with the same key. The idea is to merge all the different lists into a single massive list, do one search in this list in O(logn) time, and then use the information about the location of the key to answer all the remaining queries in O(1) time each. This is a simplification of a more general search technique called fractional cascading.

In our case, the massive list on which we will do one search is the entire set of points, sorted by y coordinate. In particular, rather than store these points in a balanced binary tree, let us assume that they are just stored as sorted arrays. (The idea works for either trees or arrays, but the arrays are a little easier to visualize.) Call these the auxiliary lists. We will do one (expensive) search on the auxiliary list for the root, which takes O(logn) time. However, after this, we claim that we can keep track of the position of the y range in each auxiliary list in constant time as we descend the tree of x coordinates.

Here is how we do this. Let v be an arbitrary internal node in the range tree of x coordinates, and let v(L) and v(R) be its left and right children. Let A v be the sorted auxiliary list for v and let A(L) and A(R) be the sorted auxiliary lists for its respective children. Observe that A(v)is the disjoint union of AL and AR (assuming no duplicate y coordinates). For each element in A(v) , we store two pointers, one to the item of equal or larger value in A(L) and the other to the item of equal or larger value in A(R). (If there is no larger item, the pointer is null.) Observe that once we know the position of an item in A(v), then we can determine its position in either A(L) or A(R) in O(1) additional time.

Here is a quick illustration of the general idea. Let v denote a node of the x tree, and let v(L)and v(R) denote its left and right children. Suppose that (in bottom to top order) the associated nodes within this range are: , and suppose that in v(L)we store the points and in v(R) we store . This is shown below. For each point in the auxiliary list for v, we store a pointer to the lists v(L) and v(R), to the position this element would be inserted in the other list (assuming sorted by y values). That is, we store a pointer to the largest element whose y value is less than or equal to this point.

At the root of the tree, we need to perform a binary search against all the y values to determine which points lie within this interval, for all subsequent levels, once we know where the y interval falls with respect to the order points here, we can drop down to the next level in O(1) time.


Figure 47: Cascaded search in range trees.

Thus (as with fractional cascading) the running time is O(2 log n), rather than O( (log n)^2). It turns out that this trick can only be applied to the last level of the search structure, because all other levels need the full tree search to compute canonical sets.

Theorem: Given a set of n points in R d , orthogonal rectangular range queries can be answered in O(log (d-1) n + k) time, from a data structure of size O(n log (d-1) n) which can be constructed in O(n log (d-1) n) time.


This course is modeled on the Computational Geometry Course taught by Dave Mount, at the University of Maryland. These notes are modifications of his Lecture Notes, which are copyrighted as follows: Copyright, David M. Mount, 2000, Dept. of Computer Science, University of Maryland, College Park, MD, 20742. These lecture notes were prepared by David Mount for the course CMSC 754, Computational Geometry, at the University of Maryland, College Park. Permission to use, copy, modify, and distribute these notes for educational purposes and without fee is hereby granted, provided that this copyright notice appear in all copies.