Multi-dimensional Data Study Guide
Author: Kartik Kapur

Overview

Multi-dimensional Data We are now looking at the problem of storing multi-dimensional keys. A common problem we deal with when handling multi-dimensional data is finding the nearest element to a given point. It is hard to find an efficient solution using the data structures we have learned so far, such as the Hash Table. In fact, nearest takes O(N) time, because in a HashTable we have no clue where the points are located, so we have to iterate through all of the entries.
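
To make the O(N) behavior concrete, here is a minimal sketch of the naive nearest scan. The `Point` class and the `NaiveNearest` name are hypothetical, introduced only for illustration, and distance is compared using squared distance since we only need to know which point is closest.

```java
import java.util.List;

// Hypothetical Point class with public x and y fields, used only for these sketches.
class Point {
    double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

class NaiveNearest {
    /** Scans every stored point, so the runtime is Theta(N) no matter where the goal lies. */
    static Point nearest(List<Point> points, Point goal) {
        Point best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Point p : points) {
            double dx = p.x - goal.x, dy = p.y - goal.y;
            double dist = dx * dx + dy * dy;   // squared distance is enough for comparison
            if (dist < bestDist) {
                bestDist = dist;
                best = p;
            }
        }
        return best;
    }
}
```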

Uniform Partitioning In order to improve our naive version of nearest in a HashMap, we can employ Uniform Partitioning. Uniform Partitioning means that each bucket in our HashMap corresponds to a fixed region of space, and therefore to a certain subset of the points in our data. To answer nearest, we then only have to iterate through the bucket whose region is closest to us. As a result, nearest takes time roughly inversely proportional to the number of buckets: on average O(N / number of buckets), which is still O(N) in the worst case, but much better in practice.
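
Below is a rough sketch of uniform partitioning under some assumptions of my own: space is cut into square cells of a fixed size, a cell is keyed by its integer grid coordinates, and nearest only checks the goal's own cell (a real version would also check neighboring cells). It reuses the hypothetical `Point` and `NaiveNearest` classes from the sketch above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of uniform partitioning: space is cut into a fixed grid of cells,
// and each cell maps to the bucket of points that fall inside it.
class UniformGrid {
    private final double cellSize;
    private final Map<String, List<Point>> buckets = new HashMap<>();

    UniformGrid(double cellSize) { this.cellSize = cellSize; }

    // Bucket key for a coordinate pair, e.g. "3,-2".
    private String key(double x, double y) {
        return (int) Math.floor(x / cellSize) + "," + (int) Math.floor(y / cellSize);
    }

    void add(Point p) {
        buckets.computeIfAbsent(key(p.x, p.y), k -> new ArrayList<>()).add(p);
    }

    /** Runs the same naive scan as before, but only over the goal's bucket,
     *  so on average we touch about N / (number of buckets) points. */
    Point nearestInSameBucket(Point goal) {
        List<Point> candidates = buckets.getOrDefault(key(goal.x, goal.y), List.of());
        return NaiveNearest.nearest(candidates, goal);
    }
}
```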

QuadTrees As we know, Trees have a sense of ordering, in contrast to Hashtables, which have no such quality. One idea would be to use a Binary Search Tree, but how would we tell whether a two-dimensional point is “less than” another (would we sort on the X value or the Y value)? No matter which one we pick, we lose some information.

A natural approach is to make a new type of Tree: the QuadTree. Each node in a QuadTree has 4 children: Northwest, Northeast, Southwest, and Southeast. This is called spatial partitioning, and it differs from our approach of Uniform Partitioning because we do not have precut areas that hold points. Instead, each time we move down the QuadTree, we narrow down the possible search area until we reach the goal point.
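
A minimal sketch of a QuadTree node and its insert is below, again assuming the hypothetical `Point` class from the first sketch. The exact tie-breaking on the boundaries (which quadrant a point with an equal coordinate falls into) is a convention I chose for illustration.

```java
// Each node owns a point and splits the remaining space into four quadrants.
class QuadTree {
    Point point;
    QuadTree northwest, northeast, southwest, southeast;

    QuadTree(Point p) { this.point = p; }

    /** Inserting narrows the search area: we compare the new point against this
     *  node's point and descend into exactly one of the four quadrants. */
    void insert(Point p) {
        if (p.x < point.x && p.y >= point.y) {
            if (northwest == null) northwest = new QuadTree(p); else northwest.insert(p);
        } else if (p.x >= point.x && p.y >= point.y) {
            if (northeast == null) northeast = new QuadTree(p); else northeast.insert(p);
        } else if (p.x < point.x) {          // here p.y < point.y
            if (southwest == null) southwest = new QuadTree(p); else southwest.insert(p);
        } else {
            if (southeast == null) southeast = new QuadTree(p); else southeast.insert(p);
        }
    }
}
```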

K-D Trees One final data structure we have for dealing with 2-dimensional data is the K-D Tree. Essentially, a K-D Tree is a normal Binary Search Tree, except that we alternate which coordinate we compare as we traverse down the tree. For example, at the root, everything to the left has an X value less than the root and everything to the right has an X value greater than the root. Then on the next level, every item to the left of a node has a Y value less than that node and everything to the right has a Y value greater than it.
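
The sketch below shows how that alternation might look for a 2-dimensional K-D Tree insert, still assuming the hypothetical `Point` class from earlier. Sending ties to the right is a convention chosen for illustration.

```java
// Depth 0 compares x, depth 1 compares y, depth 2 compares x again, and so on.
class KDTree {
    private class Node {
        Point point;
        Node left, right;
        Node(Point p) { point = p; }
    }

    private Node root;

    void insert(Point p) {
        root = insert(root, p, 0);
    }

    private Node insert(Node node, Point p, int depth) {
        if (node == null) return new Node(p);
        // Alternate the comparison dimension at each level of the tree.
        double cmp = (depth % 2 == 0) ? p.x - node.point.x : p.y - node.point.y;
        if (cmp < 0) {
            node.left = insert(node.left, p, depth + 1);
        } else {
            node.right = insert(node.right, p, depth + 1);   // ties go right by convention
        }
        return node;
    }
}
```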