Challenge Lab 7: Binary Search Tree Performance

Introduction

In this lab, you’ll conduct experiments on binary search trees, reproducing some strange results that have been observed for binary search tree deletion.

This lab has the flavor of computer science research. We give only a high-level goal and leave it to you to figure out how to realize it.

Setup - Optimum Average Depth for BST Nodes

The average depth of a BST was defined in lecture.

The “Internal Path Length” (IPL) of a BST is defined as the average depth times the number of nodes. Equivalently, it is the sum of the depths of all nodes, i.e. the sum of the lengths of the paths from the root to every node. For example, a balanced three-node tree has node depths 0, 1, and 1, so its IPL is 2 and its average depth is 2/3.

Fill in ExperimentHelper.optimalIPL(N) to return the optimal (minimum possible) internal path length of a BST containing N nodes. Also fill in optimalAverageDepth(N), which should return the average depth of an optimal BST of size N.
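One way these might be implemented (a sketch, not the required solution): in an optimal, perfectly bushy BST, the i-th node in level order (1-indexed) sits at depth ⌊log₂ i⌋, so the minimum internal path length is just the sum of those depths. The class and method names below match the lab’s ExperimentHelper; the level-order depth formula is our own assumption about what “optimal” means here.

```java
// Sketch of ExperimentHelper.optimalIPL / optimalAverageDepth.
// Assumption: in an optimal (complete) BST, the i-th node in level order
// (1-indexed) sits at depth floor(log2(i)), so the minimum internal path
// length is the sum of those depths over all N nodes.
public class ExperimentHelper {
    /** Returns the minimum possible internal path length of a BST with N nodes. */
    public static int optimalIPL(int N) {
        int total = 0;
        for (int i = 1; i <= N; i += 1) {
            // 31 - numberOfLeadingZeros(i) computes floor(log2(i)) exactly for i >= 1,
            // avoiding the rounding pitfalls of Math.log(i) / Math.log(2)
            total += 31 - Integer.numberOfLeadingZeros(i);
        }
        return total;
    }

    /** Returns the average depth (IPL divided by N) of an optimal BST of size N. */
    public static double optimalAverageDepth(int N) {
        return (double) optimalIPL(N) / N;
    }
}
```

For N = 5000 this works out to an average depth of about 10.4, consistent with the “near 10” sanity check in Experiment 1.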

Experiment 1 - BST Generation Using Insertion

We’ve provided for you an implementation of a binary search tree in BST.java. It has the following methods:

* int size() - returns the number of items in the BST.
* boolean contains(Key key) - returns whether the key is in the BST.
* void add(Key key) - adds the key to the BST if it isn't already present.
* void deleteTakingSuccessor(Key key) - deletes the key, replacing it with its successor if necessary.
* void deleteTakingRandom(Key key) - deletes the key, replacing it with either its predecessor or its successor, chosen at random.
* Key getRandomKey() - returns a random item from the BST.

Fill in the experiment1() method of Experiments.java so that it runs a computational experiment where you insert 5000 random items into a BST. Make a plot of the average depth of your BST (defined in the Setup section above) vs. the number of items. See ChartDemo for an example of how to create charts using the org.knowm.xchart package. You’ll need to add a method to BST that computes the average depth of the tree.
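One way the average-depth method might look (a sketch only; BST.java's internal node representation isn't shown in this handout, so the Node class here is an assumption): recursively sum the depth of every node, then divide by the size.

```java
// Sketch of an average-depth computation. The Node structure is assumed;
// adapt the field and class names to whatever BST.java actually uses.
public class AvgDepthDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    /** Sum of node depths in the subtree rooted at n, where n itself is at depth d. */
    static int depthSum(Node n, int d) {
        if (n == null) { return 0; }
        return d + depthSum(n.left, d + 1) + depthSum(n.right, d + 1);
    }

    static int size(Node n) {
        if (n == null) { return 0; }
        return 1 + size(n.left) + size(n.right);
    }

    /** Average depth = internal path length / number of nodes. */
    static double averageDepth(Node root) {
        return (double) depthSum(root, 0) / size(root);
    }
}
```

Note that depthSum computes exactly the internal path length from the Setup, so averageDepth agrees with the definition there.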

On the same axes, also plot the average depth of an optimal BST vs. the number of items.

As a sanity check, for an optimal BST, the average depth should be near 10. For a randomly generated BST, it should be near 14. This is experimental evidence for our claim in lecture that random trees are bushy.

Experiments 2 and 3 - BST Generation Using Insertion and Deletion

In 1975, Gary Knott conducted an empirical study of binary search trees for his Ph.D. thesis. In pseudocode, his experiment was roughly as follows:

  1. Initialize a tree by randomly inserting N items. Record the average depth observed as the ‘starting depth’.
  2. Randomly delete an item using asymmetric Hibbard deletion.
  3. Randomly insert a new item.
  4. Record the average depth of the tree.
  5. Repeat steps 2-4 a total of M times.
  6. Plot the data.

Here, by asymmetric Hibbard deletion, we mean the deletion process described in class, where deleted nodes with two children are replaced with their successor.
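The steps above might be sketched as follows. To keep the demo runnable on its own, the miniature BST inside it (including an asymmetric Hibbard delete) stands in for the provided BST.java; the sizes N and M, the key range, and the random seed are illustrative choices, not the lab's required parameters.

```java
// Self-contained sketch of Knott's experiment (steps 1-5 above).
// The tiny BST below stands in for the provided BST.java.
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class KnottDemo {
    static class Node {
        int key;
        Node left, right;
        Node(int k) { key = k; }
    }

    Node root;
    Random rand = new Random(61);  // fixed seed so the demo is reproducible

    void add(int key) { root = add(root, key); }
    Node add(Node n, int key) {
        if (n == null) { return new Node(key); }
        if (key < n.key) { n.left = add(n.left, key); }
        else if (key > n.key) { n.right = add(n.right, key); }
        return n;  // duplicates are ignored
    }

    /** Asymmetric Hibbard deletion: a two-child node is replaced by its successor. */
    void deleteTakingSuccessor(int key) { root = delete(root, key); }
    Node delete(Node n, int key) {
        if (n == null) { return null; }
        if (key < n.key) { n.left = delete(n.left, key); }
        else if (key > n.key) { n.right = delete(n.right, key); }
        else {
            if (n.left == null) { return n.right; }
            if (n.right == null) { return n.left; }
            Node succ = n.right;                  // successor = min of right subtree
            while (succ.left != null) { succ = succ.left; }
            n.key = succ.key;
            n.right = delete(n.right, succ.key);  // remove the moved-up successor
        }
        return n;
    }

    int size(Node n) { return n == null ? 0 : 1 + size(n.left) + size(n.right); }

    int depthSum(Node n, int d) {
        return n == null ? 0 : d + depthSum(n.left, d + 1) + depthSum(n.right, d + 1);
    }

    double averageDepth() { return (double) depthSum(root, 0) / size(root); }

    /** Returns a uniformly random key by indexing into an in-order traversal. */
    void inorder(Node n, List<Integer> out) {
        if (n == null) { return; }
        inorder(n.left, out);
        out.add(n.key);
        inorder(n.right, out);
    }
    int getRandomKey() {
        List<Integer> keys = new ArrayList<>();
        inorder(root, keys);
        return keys.get(rand.nextInt(keys.size()));
    }

    public static void main(String[] args) {
        int N = 500, M = 2000;  // small sizes so the demo finishes quickly
        KnottDemo t = new KnottDemo();
        while (t.size(t.root) < N) { t.add(t.rand.nextInt(1_000_000)); }      // step 1
        double startingDepth = t.averageDepth();
        for (int op = 0; op < M; op += 1) {
            t.deleteTakingSuccessor(t.getRandomKey());                        // step 2
            while (t.size(t.root) < N) { t.add(t.rand.nextInt(1_000_000)); }  // step 3
            // step 4: record t.averageDepth() here for plotting
        }
        System.out.println("starting depth " + startingDepth
                + ", final depth " + t.averageDepth());
    }
}
```

In Experiments 2 and 3 you will use the provided BST.java rather than writing your own deletion; only the experiment loop in main corresponds to code you need to write.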

Based on his experiments, Knott conjectured that the random deletion/insertion process actually decreased the average depth of the tree.

A later experiment by Eppinger disproved this conjecture, showing that random insertions and deletions using Hibbard deletion actually make things worse. The same study showed that if you make Hibbard deletion symmetric by randomly choosing between the predecessor and the successor, then Knott’s original conjecture (that the depth of the tree improves) holds after all.

Experiment 2

Fill in experiment2() to repeat Knott’s experiment using asymmetric deletion (i.e. always picking successor). We have provided this deletion method as deleteTakingSuccessor. Generate a plot with the number of operations on the x-axis, and the average depth on the y-axis. Do not include the initialization operations in step 1 in your plot.

You’ll need to add methods to ExperimentHelper that randomly insert and delete items from a tree.

You should see that for some number of operations, the average depth actually drops as we randomly insert and delete. However, as the insertion/deletion cycle continues, you should see the depth climb well above the starting depth.

Experiment 3

Fill in experiment3() so that it repeats experiment 2, but using symmetric deletion (i.e. randomly picking between successor and predecessor). We have provided this deletion method as deleteTakingRandom. This will be very easy once you’ve completed experiment 2.

You should see that the average depth drops and stays down. It should converge to roughly 88% of the starting depth. Nobody knows why this happens. It’s OK if your results are off by a little bit.

Submission

Submit to this Gradescope assignment a PDF file containing the plot described above, showing the average depth of your binary search tree as a function of the number of randomized insertion/deletion operations. Append the code you used for your simulation to the end of the report. Your submission will be graded manually at a later time.

Open Research Problem

Derive a proof of why the average depth converges to roughly 88% of the starting depth after a sequence of insertions and symmetric deletions. This remains an open problem in computer science, and previous work suggests that the proof will be very challenging to find.