Lecture 10: Point Location in Trapezoidal Maps



Point Location: Last time we presented a randomized incremental algorithm for constructing a trapezoidal map in the plane. Today we consider how to modify this algorithm to answer point location queries for the resulting trapezoidal decomposition. The preprocessing time will be O(n log n) in the expected case (as was the time to construct the trapezoidal map), and the space and query time will be O(n) and O(logn), respectively, in the expected case. Note that this may be applied to any spatial subdivision, by treating it as a set of line segments, and then building the resulting trapezoidal decomposition and using this data structure.

Recall from last time that we are treating the input as a set of segments S = {s1, ... sn} (permuted randomly), that Si denotes the subset consisting of the first i segments of S, and Ti denotes the trapezoidal map of Si. One important element of the analysis to remember from last time is that each time we add a new line segment, it may result in the creation of the collection of new trapezoids, which were said to depend on this line segment. We presented a backwards analysis that the number of new trapezoids that are created with each stage is expected to be O(1). This will play an important role in today's analysis.

Point Location Data Structure: As in Kirkpatrick's algorithm, the point location data structure will be based on a rooted directed acyclic graph. In particular, to the query processor it will look like a binary tree, but there may be sharing of subtrees. There are two types of nodes, x nodes and y nodes. Each x node contains the x coordinate of an endpoint of one of the segments. Its two children correspond to the points lying to the left and to the right of this coordinate. Each y node contains a pointer to a segment. The left and right children correspond to whether the query point is above or below the line containing the segment, respectively. (We will visit a y node only if we already know that the x coordinate of the query point lies between the left and right endpoints of the segment.)

Our construction of the point location data structure mirrors the incremental construction of the trapezoidal map. In particular, if we freeze the construction just after the insertion of any segment, the existing structure will be a point location structure for the existing trapezoidal map. In the figure below we show a simple example of what the data structure looks like for two line segments. There is one leaf for each trapezoid. The y nodes are shown as hexagons. For example, if the query point is in trapezoid D, we would first detect that it is to the right of p1 , then left of q1 , then below s1 (the right child), then right of p2, then above s2 (the left child).


Figure 54: Trapezoidal map point location data structure.

The question is how do we build this data structure incrementally? First observe that when a new line segment is added, we only need to adjust the portion of the tree that involves the trapezoids that have been deleted as a result of this new addition. This will be set of leaves in the current tree. Each such leaf will be replaced with a new subtree, which determines which one of the new trapezoids contains the query point. In Kirkpatrick's algorithm we just said ``check each of the new triangles that overlapped one of the old triangles'', and we will do essentially the same thing here. As in Kirkpatrick's algorithm, it will turn out that each old trapezoid is overlapped by a constant number of new trapezoids. However, let us consider the process in a little more detail here.

Suppose that we add a line segment. This results in the replacement of an existing set of trapezoids with a set of new trapezoids (shaded in the figure below). If the segment passes entirely through an existing trapezoid, then there will be two overlapping trapezoids in the new trapezoidal map, and thus we just need to compare against the newly added segment (one y node). If the existing trapezoid contains one or both endpoints, then we need to test on which sides of these endpoints we lie (one or two x nodes) and if we lie between them, then we need to test whether we lie above or below the line segment (one y node). If we add a line segment to the example above, resulting in the replacement of C, D, E and G with new trapezoids, we will replace these leaves of the subtree as shown in the figure below. It is important to notice that (through sharing) each trapezoid appears exactly once as a leaf in the resulting structure.


Figure 55: Line segment insertion.

Analysis: We claim that the size of the point location data structure is O(n) and the query time is O(log n), both in the expected case. As usual, the expectation depends only on the order of insertion, not on the line segments or the location of the query point. To prove the space bound of O(n), observe that the number of new nodes added to the structure with each new segment is proportional to the number of newly created trapezoids. Last time we showed that with each new insertion, the expected number of trapezoids that were created was O(1). Therefore, we add O(1) new nodes with each insertion in the expected case, implying that the total size of the data structure is O(n).

Analyzing the query time is a little subtler. In a normal probabilistic analysis of data structures, we think of the data structure as being fixed, and then compute expectations over random queries. Here the approach will be to imagine that we have exactly one query point to handle. The query point can be chosen arbitrarily (imagine an adversary that tries to select the worst possible query point) but this choice is made without knowledge of the random choices the algorithm makes. We will show that for any query point, most random orderings of the line segments will lead to a search path of length O(log n) in the resulting tree.

Let q denote the query point. Rather than consider the search path for q in the final search structure, we will consider how q moves incrementally through the structure with the addition of each new line segment. Let ti denote the trapezoid of the map that q lies in after the insertion of the first i segments. Observe that if ti-1 = ti, then insertion of the ith segment did not affect the trapezoid that q was in, and therefore q will stay where it is relative to the current search structure. (For example, if q was in trapezoid B prior to adding s3 in the figure above, then the addition of s3 does not incur any additional cost to locating q.) However, if ti-1 != ti, then the insertion of the ith segment caused q's trapezoid to be deleted.

As a result, q must locate itself with respect to the newly created trapezoids that overlap ti-1. Since there are a constant number of such trapezoids (at most four), there will be O(1) work needed to locate q with respect to these. In particular, q may fall as much as three levels in the search tree.

(For example, if q was in trapezoid C, before the addition of s3 in the figure above, and q was in trapezoid J afterwards, then q would have to pass through two new levels of the structure as a result of this insertion.)

To compute the expected length of the search path, it suffices to compute the probability that the trapezoid that contains q changes as a result of the ith insertion. Let Pi denote this probability (where the probability is taken over random insertion orders, irrespective of the choice of q). Since q could fall through up to three levels in the search tree as a result of each the insertion, the expected length of q's search path in the final structure is at most

Sum(i = 1..n) [3 * Pi]


We will show that Pi <= 4/i. From this it will follow that the expected path length is at most
Sum(i = 1..n) [3 * Pi] = 
Sum(i = 1..n) [3 * 4 / i] = 
12 * Sum(i = 1..n) [1 / i] = 

which is roughly 12 ln n = O(logn) by the Harmonic series.

To show that Pi <= 4/i, we apply a backwards analysis. In particular, consider the trapezoid that contains q after the ith insertion. Recall from last time that this trapezoid is dependent on at most four segments, which define the top and bottom edges, and the left and right sides of the trapezoid. Since each segment is equally likely to be the last segment to have been added, the probability that the last insertion caused q to belong to a new trapezoid is at most 4/i. This completes the proof.

Guarantees on Search Time: The only problem with this result is that even though the search time is provably small in the expected case for a given query point, it might still be the case that once the data structure has been constructed there is a single very long path in the search structure, and the user repeatedly performs queries along this path. The result provides no guarantees on the running time of all queries.

Although we will not prove it, one can derive a a stronger result, namely that the length of the maximum search path is also O(logn) with high probability. In particular, one can prove the following.

Lemma: Given a set of n non crossing line segments in the plane, and a parameter L >= 0, the probability that the total depth of the randomized search structure exceeds 3L * ln(n + 1), is at most
            2
-------------------------
[ (n+1)^(L * ln 1.25-3)] 

For example, for L = 20, the probability that the search path exceeds 60 ln(n + 1) is at most 2/[(n + 1)^1.5]. (The constant factors here are rather weak, but a more careful analysis leads to a better bound.)

Nonetheless, this itself is enough to lead to variant of the algorithm for which O(log n) time is guaranteed. Rather than just running the algorithm once and taking what it gives, instead keep running it and checking the structure's depth. As soon as the depth is at most c log n for some suitably chosen c, then stop here. Depending on c and n, the above lemma indicates how long you may need to expect to repeat this process until the final structure has the desired depth. For sufficiently large c, the probability of finding a tree of the desired depth will be bounded away from 0 by some constant factor, and therefore after a constant number of trials (depending on this probability) you will eventually succeed in finding a point location structure of the desired depth. A similar argument can be applied to the space bounds.

Theorem: Given a set of n non crossing line segments in the plane, in expected O(n log n) time, it is possible to construct a point location data structure of (worst case) size O(n) that can answer point location queries in (worst case) time O(log n).

Next class we are going to talk about data structure for other queries about points in a plane. In particular, we are going to consider counting queries where one is given, ahead of time a collection of points (instead of a planar subdivision), and the goal is to create a data structure which can answer queries such as "how many of the given points are within the following rectangle?".
This course is modeled on the Computational Geometry Course taught by Dave Mount, at the University of Maryland. These notes are modifications of his Lecture Notes, which are copyrighted as follows: Copyright, David M. Mount, 2000, Dept. of Computer Science, University of Maryland, College Park, MD, 20742. These lecture notes were prepared by David Mount for the course CMSC 754, Computational Geometry, at the University of Maryland, College Park. Permission to use, copy, modify, and distribute these notes for educational purposes and without fee is hereby granted, provided that this copyright notice appear in all copies.