Lecture 15: Point Location in Trapezoidal Maps
Reading: Chapter 6 of the 4M's.
Point Location: Last time we presented a randomized incremental
algorithm for constructing a trapezoidal map in the plane. Today we
consider how to modify this algorithm to answer point location queries
for the resulting trapezoidal decomposition. The preprocessing time
will be O(n log n) in the expected case (as was the time to construct
the trapezoidal map), and the space and query time will be O(n) and
O(logn), respectively, in the expected case. Note that this may be
applied to any spatial subdivision, by treating it as a set of line
segments, and then building the resulting trapezoidal decomposition
and using this data structure.
Recall from last time that we are treating the input as a set of
segments S = {s1, ... sn} (permuted randomly),
that Si denotes the subset consisting of the first i
segments of S, and Ti denotes the trapezoidal map of
Si. One important element of the analysis to remember from
last time is that each time we add a new line segment, it may result
in the creation of the collection of new trapezoids, which were said
to depend on this line segment. We presented a backwards analysis that
the number of new trapezoids that are created with each stage is
expected to be O(1). This will play an important role in today's
analysis.
Point Location Data Structure: As in Kirkpatrick's algorithm, the
point location data structure will be based on a rooted directed
acyclic graph. In particular, to the query processor it will look like
a binary tree, but there may be sharing of subtrees. There are two
types of nodes, x nodes and y nodes. Each x node contains the x
coordinate of an endpoint of one of the segments. Its two children
correspond to the points lying to the left and to the right of this
coordinate. Each y node contains a pointer to a segment. The left and
right children correspond to whether the query point is above or
below the line containing the segment, respectively. (We will visit a y
node only if we already know that the x coordinate of the query point
lies between the left and right endpoints of the segment.)
Our construction of the point location data structure mirrors the
incremental construction of the trapezoidal map. In particular, if we
freeze the construction just after the insertion of any segment, the
existing structure will be a point location structure for the existing
trapezoidal map. In the figure below we show a simple example of what
the data structure looks like for two line segments (from our
textbook). There is one leaf for each trapezoid. The y nodes are shown
as hexagons. For example, if the query point is in trapezoid D, we
would first detect that it is to the right of p1 , then
left of q1 , then below s1 (the right child),
then right of p2, then above s2 (the left
child).
The question is how do we build this data structure incrementally?
First observe that when a new line segment is added, we only need to
adjust the portion of the tree that involves the
Figure 54: Trapezoidal map point location data structure.
trapezoids that have been deleted as a result of this new
addition. This will be set of leaves in the current tree. Each such
leaf will be replaced with a new subtree, which determines which one
of the new trapezoids contains the query point. In Kirkpatrick's
algorithm we just said ``check each of the new triangles that
overlapped one of the old triangles'', and we will do essentially the
same thing here. As in Kirkpatrick's algorithm, it will turn out that
each old trapezoid is overlapped by a constant number of new
trapezoids. However, let us consider the process in a little more
detail here.
Suppose that we add a line segment. This results in the replacement of
an existing set of trapezoids with a set of new trapezoids (shaded in
the figure below). If the segment passes entirely through an existing
trapezoid, then there will be two overlapping trapezoids in the new
trapezoidal map, and thus we just need to compare against the newly
added segment (one y node). If the existing trapezoid contains one or
both endpoints, then we need to test on which sides of these endpoints
we lie (one or two x nodes) and if we lie between them, then we need
to test whether we lie above or below the line segment (one y
node). If we add a line segment to the example above, resulting in the
replacement of C, D, E and G with new trapezoids, we will replace
these leaves of the subtree as shown in the figure below. It is
important to notice that (through sharing) each trapezoid appears
exactly once as a leaf in the resulting structure.
Figure 55: Line segment insertion.
Analysis: We claim that the size of the point location data structure
is O(n) and the query time is O(log n), both in the expected case. As
usual, the expectation depends only on the order of insertion, not on
the line segments or the location of the query point. To prove the
space bound of O(n), observe that the number of new nodes added to the
structure with each new segment is proportional to the number of newly
created trapezoids. Last time we showed that with each new insertion,
the expected number of trapezoids that were created was
O(1). Therefore, we add O(1) new nodes with each insertion in the
expected case, implying that the total size of the data structure is
O(n).
Analyzing the query time is a little subtler. In a normal
probabilistic analysis of data structures, we think of the data
structure as being fixed, and then compute expectations over random
queries. Here the approach will be to imagine that we have exactly one
query point to handle. The query point can be chosen arbitrarily
(imagine an adversary that tries to select the worst possible query
point) but this choice is made without knowledge of the random choices
the algorithm makes. We will show that for any query point, most
random orderings of the line segments will lead to a search path of
length O(log n) in the resulting tree.
Let q denote the query point. Rather than consider the search path for
q in the final search structure, we will consider how q moves
incrementally through the structure with the addition of each new line
segment. Let ti denote the trapezoid of the map that q lies in
after the insertion of the first i segments. Observe that
if ti-1 = ti,
then insertion of the ith segment did not affect the trapezoid that q
was in, and therefore q will stay where it is relative to the current
search structure. (For example, if q was in trapezoid B prior to
adding s 3 in the figure above, then the addition of s 3 does not
incur any additional cost to locating q.) However,
if ti-1 != ti,
then the insertion of the ith segment caused q's trapezoid to be deleted.
As a result, q must locate itself with respect to the newly created
trapezoids that overlap ti-1. Since there are a constant
number of such trapezoids (at most four), there will be O(1) work
needed to locate q with respect to these. In particular, q may fall as
much as three levels in the search tree.
(For example, if q was in trapezoid C, before the addition of
s3 in the figure above, and q was in trapezoid J
afterwards, then q would have to pass through two new levels of the
structure as a result of this insertion.)
To compute the expected length of the search path, it suffices to
compute the probability that the trapezoid that contains q changes as
a result of the ith insertion. Let Pi denote this
probability (where the probability is taken over random insertion
orders, irrespective of the choice of q). Since q could fall through
up to three levels in the search tree as a result of each the
insertion, the expected length of q's search path in the final
structure is at most
Sum(i = 1..n) [3 * Pi]
We will show that Pi <= 4/i. From this it will follow that
the expected path length is at most
Sum(i = 1..n) [3 * Pi] =
Sum(i = 1..n) [3 * 4 / i] =
12 * Sum(i = 1..n) [1 / i] =
which is roughly 12 ln n = O(logn) by the Harmonic series.
To show that Pi <= 4/i, we apply a backwards analysis. In
particular, consider the trapezoid that contains q after the ith
insertion. Recall from last time that this trapezoid is dependent on
at most four segments, which define the top and bottom edges, and the
left and right sides of the trapezoid. Since each segment is equally
likely to be the last segment to have been added, the probability that
the last insertion caused q to belong to a new trapezoid is at most
4/i. This completes the proof.
Guarantees on Search Time:
The only problem with this result is that even though the search time
is provably small in the expected case for a given query point, it
might still be the case that once the data structure has been
constructed there is a single very long path in the search structure,
and the user repeatedly performs queries along this path. The result
provides no guarantees on the running time of all queries.
Although we will not prove it, the book presents a stronger result,
namely that the length of the maximum search path is also O(logn) with
high probability. In particular, they prove the following.
Lemma: Given a set of n non crossing line segments in the plane, and a
parameter L >= 0, the probability that the total depth of the
randomized search structure exceeds 3L * ln(n + 1), is at most
2
-------------------------
[ (n+1)^(L * ln 1.25-3)]
For example, for L = 20, the probability that the search path exceeds
60 ln(n + 1) is at most 2/[(n + 1)^1.5]. (The constant factors here
are rather weak, but a more careful analysis leads to a better bound.)
Nonetheless, this itself is enough to lead to variant of the algorithm
for which O(log n) time is guaranteed. Rather than just running the
algorithm once and taking what it gives, instead keep running it and
checking the structure's depth. As soon as the depth is at most c log
n for some suitably chosen c, then stop here. Depending on c and n,
the above lemma indicates how long you may need to expect to repeat
this process until the final structure has the desired depth. For
sufficiently large c, the probability of finding a tree of the desired
depth will be bounded away from 0 by some constant factor, and
therefore after a constant number of trials (depending on this
probability) you will eventually succeed in finding a point location
structure of the desired depth. A similar argument can be applied to
the space bounds.
Theorem: Given a set of n non crossing line segments in the plane, in
expected O(n log n) time, it is possible to construct a point location
data structure of (worst case) size O(n) that can answer point
location queries in (worst case) time O(log n).
This course is modeled on the Computational Geometry Course taught by
Dave Mount, at the University of Maryland. These notes are
modifications of his Lecture Notes, which are copyrighted as follows:
Copyright, David M. Mount, 2000, Dept. of Computer Science,
University of Maryland, College Park, MD, 20742. These lecture notes
were prepared by David Mount for the course CMSC 754, Computational
Geometry, at the University of Maryland, College Park. Permission to
use, copy, modify, and distribute these notes for educational purposes
and without fee is hereby granted, provided that this copyright notice
appear in all copies.