6.2   Geometric Search


This chapter under construction.


Prime application = database search. Applications: databases, geographic information systems (GIS), computer graphics, robotics. Which employees are between x1 and x2 years of age, make between y1 and y2 dollars a year, have been working between z1 and z2 months and have never been arrested? Which celestial bodies are located in a certain region of space with a given magnitude in a given wavelength band?

Possible dataset zips-1990.txt 29,470 zip codes with longitude, latitude, and two other parameters. Astro objects??

Another example: 5 dimensions when searching for diamonds on Blue Nile (carat, cut, color, clarity, cost). Client uses sliders to select ranges on each parameter; server does range search to find matching diamonds.

Set of points given in advance. Range queries come later. Goal: preprocess the data to support efficient queries.

1-D range searching. Goal: answer range searches in time O(log n + k), where k is the number of matching points, and range counts in time O(log n). Output-sensitive. Can achieve by sorting the points and using binary search. This method doesn't generalize to higher dimensions since we can't define a meaningful order for two dimensional points. Also, it does not dynamically support insertions and deletions.

Create a (balanced) binary search tree, where the elements correspond to points in the set. Program RangeSearch.java implements this strategy using a randomized BST. (Should we simply as in 74intersection/RangeSearch.java to assume no value associated with key.) We will use this data structure to implement h-v line segment intersection in Section X.Y.

Should we simplify things by only doing a range count?

public Iterable range(Key k1, Key k2) { 
   LinkedList list = new LinkedList();
   if (less(k2, k1)) return list;
   range(root, k1, k2, list);
   return list;
}

private void range(Node x, Key k1, Key k2, LinkedList list) {
   if (x == null) return;
   if (lte(k1, x.key)) range(x.left, k1, k2, list);
   if (lte(k1, x.key) && lte(x.key, k2)) list.add(x.key);
   if (lte(x.key, k2)) range(x.right, k1, k2, list);
}
Consider using an Interval as an argument instead of two Key endpoints.

Interval search trees. Introduced by McCreight (1981) and Edelsbrunner (1982). An open interval I = (lo, hi) is the set of points lo < x < hi. Desired operations

It uses the helper data type Interval.java. We will use this data structure to implement 2D interval intersection in the next section.

The key to maintaining an interval search tree is to store each interval in a (balanced) binary search tree, sorted by the left endpoint. In addition, we maintain some auxiliary information in each node x, namely the maximum value of any (right) endpoint stored in the subtree rooted at x. Assuming we are able to maintain this information in each node, we can implement insert, delete, and search as follows. If two intervals with identical endpoints are inserted, we only maintain one copy.

Maintaining the auxiliary information. We do this similar to the way we maintain size information in a randomized binary search tree: when we go down the tree to insert interval I, for each node x we visit, we update x.max = max(x.max, I.hi). As the tree shape changes via rotations, we also maintain the required information.

To find all overlaps: if we go down left subtree *and* find an overlap, then go down right subtree too O(k log N)

2-D orthogonal range searching. Find all basketball players with a certain range of steals, assists, and points.

Gridding. Since we don't have coordinates, would need to take order statistics of points in x and y coordinates, and use those as grid points.???? Perhaps save until nearest neighbor. Bottom line: good for uniform data, can be bad for skewed data.

How to extend 1-D range searching to higher dimensions.

kD trees.

Applications: ray-tracing, N-body simulation. KD trees generalize binary search trees (one key) to higher dimensions (k keys). The ith level of the tree uses the i mod kth coordinate to discriminate. For x- and y- coordinates in the plane, 2-D trees partition space into (possibly unbounded) axis-aligned rectangles, where the subdivisions alternate among vertical and horizontal. To guarantee kD tree is balanced, always partition on median element. Best way to do this is to presort by x and by y. O(n) space, O(n log n) construction if choose the median coordinate as the partition at each step.

Orthogonal range query with kD tree. 2D: O(n) space, O(n log n) preprocessing, O(sqrt(n) + k). kD in d dimensions: O(dn) space, O(d n log n) preprocessing, O(n^(1-1/d)) + k). Reference.

Degeneracies: for simplicity we assumed to two points had the same x or y coordinate. Can overcome this limitation by breaking ties by the "other" coordinate. Use + or - infinity for the 2D query interval.

Nearest neighbor with kd tree.

Worst case = Theta(N), e.g., if N points are approximately on circumference of a circle and query point is close to the center of the circle. Average case = hard to analyze, depends on input distribution. Typically O(log N). kd tree reference

Find all points within distance d of a given query point p. Search through kD tree. If any coordinate is more than distance d from p, no need to search subtrees below.

Binary space partition (BSP) trees are a generalization where the cutting planes are not necessarily axis-aligned and the cells are convex polygons in 2D or polytopes in higher dimensions. Castle Wolfenstein uses a 2d-tree to render hallways. As a result they are perpendicular to x and y axes. Doom stepped up to a BSP tree. This enables hallways at arbitrary angles.

Quad trees.

As opposed to quad-tries. Program QuadTree.java implements a quad tree with orthogonal range searching.

Range trees.

Bentley. kD tree is Not asymptotically optimal, although practical in many settings. 2D-range tree O(N log N) storage, O(k + log^2 N) query time. Range tree: maintain points in balanced BST ordered by x-coordinate; in each internal node, maintain a BST of the points in its subtree ordered by y-coordinate. Searching by first dimension gives log N trees, which together contain all the points in the 2D range. Program RangeTree.java implements a 2D range tree. It works assuming all x and y coordinates are distinct.

Possible to extend to dynamic case with O(log^2 N) per insert/delete. Can reduce query time to O(k + log N) with fractional cascading.



Exercises

  1. Date searching. Create a data structure that supports the following operations: insert a date, search for a date, and count the number of dates in the data structure that lie in a particular interval. Hint: use a 1D range tree.
  2. Social Security numbers. First 3 digits of social security number is determined by geographic region. Use intervals of geographic region codes to determine in which state a US citizen was born.
  3. Draw the 2D tree that results when you insert the following points in order.
  4. Balanced 2d-tree. Implement a 2D tree that is perfectly balanced. Assume the constructor is given the N points all at once.
  5. Circle range query. Describe how to modify a kD tree to report all points that lie in a circular range.
  6. 3d-tree. Implement a kD tree for points in 3-space, i.e., k = 3.