- Proj2A - Extrinsic MinPQ
- Project 2A FAQ
- Proj2A Submission & Grading
- Proj2B - K-d Tree
- Proj2B Submission & Grading
Over the next 6 weeks, you are going to building a system very similar to Google Maps.
This system will have three major parts:
- Project 2AB (due 3/23): Building 2 useful data structures (ExtrinsicPQ and KdTree).
- HW4 (due 4/10): Building an AI for solving problems (AStarSolver).
- Project 2C (due 4/17): Combining all the pieces into a web-browser mapping application called BearMaps.
For this part of the project (proj2ab), you will have an intermediate deadline of 3/16 to complete part 2a, which is the ExtrinsicPQ. The entire project is due 3/23. All parts of this project should be completed individually. Please refer to the course policies for our official policy on academic dishonesty.
This part of the project will be as much about testing as it is about implementation. The autograder will be very sparse, and it will be up to you to test your own code! We will begin running the full autograder on 4/2, so absolutely NO extensions will be given beyond that point.
In this document, we’ll only discuss proj2ab. More details about the mapping system (hw4 and proj2c) will come later.
Proj2A - Extrinsic MinPQ
A quick video overview can be found at this link.
For this part of the project, you will build an implementation of the ExtrinsicMinPQ
interface. Ultimately, this will be useful for implementing the AStarSolver
class that will be described in the HW4 spec.
If you’re not familiar with how heaps and priority queues work, please see Lecture 20 before proceeding. It is absolutely imperative that you do this before starting this assignment.
The ExtrinsicMinPQ
interface quite similar to the MinPQ
interface that we discussed in lecture 20. The operations are described below:
public boolean contains(T item)
: Returns true if the PQ contains the given item.public void add(T item, double priority)
: Adds an item of typeT
with the given priority. Edit (3/13/19): If the item already exists, throw anIllegalArgumentException
. You may assume that item is never null.public T getSmallest()
: Returns the item with smallest priority. If no items exist, throw aNoSuchElementException
.public T removeSmallest()
: Removes and returns the item with smallest priority. If no items exist, throw aNoSuchElementException
.public int size()
: Returns the number of items.public void changePriority(T item, double priority)
: Sets the priority of the given item to the given value. If the item does not exist, throw aNoSuchElementException
.
There are four key differences between this abstract data structure and the MinPQ ADT described in lecture 20.
- The priority is extrinsic to the object. That is, rather than relying on some sort of comparison function to decide which item is less than another, we simply assign a priority value using the
add
orchangePriority
functions. - There is an additional
changePriority
function that allows us to set the extrinsic priority of an item after it has been added. - There is a
contains
method that returns true if the given item exists in the PQ. - There may only be one copy of a given item in the priority queue at any time. Edit (3/13/19): To be more precise, if you try to add an item that is equal to another according to .equals, the PQ should throw an exception.
- If there are 2 items with the same priority, you may break ties arbitrarily.
Provided Files
We have provided a NaiveMinPQ
, which is a slow but correct implementation of this interface. contains
, getSmallest
and removeSmallest
use built in ArrayList
or Collections
methods to do a brute force search over the entire ArrayList. This takes time proportional to the length of the list, or O(n)
. Edit (3/13/19): This implementation does not throw the correct exception for the add
method. This is because this exception would make this class too slow to be usable for comparing runtimes with our implementation.
Edit (3/13/19): We have also provided a class called PrintHeapDemo.java
that can print out an array as a nice heap drawing. You’re welcome to adapt this code into your own. It might be useful for debugging.
Edit (3/13/19): Yet another example file TimingTestDemo.java
shows how you can use the System.currentTimeMillis
method or the edu.princeton.cs.algs4.Stopwatch
class to track the runtime of code.
ArrayHeapMinPQ
Your job for this part of the project is to create the ArrayHeapMinPQ
class, which must implement the ExtrinsicMinPQ
interface.
Your ArrayHeapMinPQ
is subject to the following rules:
- One of your instance variables must be a min heap, and this min heap must be represented as either an array or a
java.util.ArrayList
, as described in lecture 20 (tree representation 3 or 3b). It is OK to have additional private variables. - You may only import from the
java.util
andorg.junit
libraries. You’re not required to import anything. Edit (3/16/19): You may also import theStopwatch
andStdRandom
classes from the Princeton libraries. - Edit (3/13/19): Your
getSmallest
,contains
,size
andchangePriority
methods must run inO(log(n))
time. Youradd
andremoveSmallest
must run inO(log(n))
average time, i.e. they should be logarithmic, except for the rare resize operation. For reference, when making 1000 queries on a heap of size 1,000,000, our solution is about300x
faster than the naive solution. Iterating over your entire min heap array (or ArrayList) for any of these methods will be linear time and thus too slow! - If using an array representation for your min heap, make sure to increase the array size by a multiplicative factor when resizing, otherwise
add
will be linear time on average, not log time. If using a heap orderedArrayList
, you don’t need to worry about this requirement asArrayLists
automatically resize their underlying array. - If using an array representation for your min heap, it should never be more than (edit: 3/13/19) approximately 3/4s empty. If using a heap ordered
ArrayList
, you don’t need to worry about this requirement asArrayLists
automatically resize their underlying array. - You may not add additional public methods. Private methods or “package private” methods are fine. See the testing section below for a description of what “package private” means.
Note: We have not discussed how you should implement the changePriority
method in lecture. You’ll have to invent this yourself. You may discuss your approach at a high level with other students (e.g. drawing out diagrams), but you should not share or look at each other’s code, nor should you work closely enough that your code may resemble each other’s.
Testing Proj2A
We will not provide any skeleton tests nor autograder messages (beyond a basic sanity check) for this project. You will be responsible for writing your own tests and ensuring the correctness of your code. You may not share tests with any other students - please ensure that all code in your possession, including tests, were written by you.
You should write your tests in a file called ArrayHeapMinPQTest.java
. This file should be part of the BearMaps
package.
If you’re not sure how to start writing tests, some tips follow.
-
We encourage you to primarily write tests that evaluate correctness based on the outputs of methods that provide output (e.g.
getSmallest
andremoveSmallest
). This is in contrast to tests that try to directly test the states of instance variables (see tip #3 below). For example, to testchangePriority
, you might use a sequence ofadd
operations, achangePriority
call, and then finally check the output of aremoveSmallest
call. This is (usually) a better idea than iterating over your private variables to see ifchangePriority
is setting some specific instance variable. -
Write tests for functions in the order that you write them. You might even find it helpful to write the tests first, which can help elucidate the task you’re trying to solve. Since each function may call previously written functions, this helps ensure that you are building a solid foundation.
-
If you want to write tests that require looking inside of your private instance variables, these tests should be part of the
ArrayHeapMinPQ
class itself. For example, if you want to write a test that only calls theadd
method, there’s no way to write a test in the manner suggested in tip #1. As a specific test you might want to write, suppose that you want to verify that your array is [1, 2, 4, 5, 3] after inserting 5, 4, 3, 2, 1. In this case, a hypotheticaltestAdd54321
method would need to be part of theArrayHeapMinPQ
class since theArrayHeapMinPQTest
should not be able to access private variables. It’s debatable whether or not you should even write such tests, since tests of type #1 should ideally catch any errors. Follow your heart. -
One annoying issue in Java is testing of private helper methods. For example, suppose you have a
leftChild
method inArrayHeapMinPQ
that you’d like to test inArrayHeapMinPQTest
. If this method is private, then the test file will be unable to call the method. For this project, if you have helper methods you’d like to test, you can either include the tests inArrayHeapMinPQ
, or you should make those helper methods “package private”. To do this, simply remove the access modifier completely. That is, rather than sayingpublic
orprivate
, you should add no access modifier at all. This will make this method accessible only to other classes in the package. You can think of package private as a level of access control in between public (anybody can use it) and private (only this class can use it). -
Don’t forget edge cases - consider how the heap could look before inserting or removing items, and ensure that your code handles all possible cases (for example, sinking a node when its priority is greater than both of its children).
-
Rather than thinking about all possible edge cases, feel free to perform randomized tests by comparing the outputs of your data structure and the provided
NaiveMinPQ
, similar to what you did in project 1b. You are not required to do randomized tests! If you do decide to use randomized testing, see the note about pseudorandom numbers in the proj2B part of this spec. -
(Edit (3/13/19)) To test the runtime of your code, you might find it useful to use the
System.currentTimeMillis
method or theedu.princeton.cs.algs4.Stopwatch
class. SeeTimingTestDemo.java
for examples. Keep in mind that calling add N times will take O(N log N) total time if each add is O(log N) time. Also keep in mind that if you call contains() N times on a data structure of size N, you’d again expect total runtime to grow as O(N log N). Note: If you attempt to fit a log function using some sort of fancy software, you will probably be disappointed. Timing tests are Likely to be too noisy for such fitting tools to work well.
External Resources
You are welcome to look at source code for existing priority queue implementations. Make sure to cite any sources you use with the @source tag somewhere in your code. As always you should not use other student’s solutions to this project as a source, e.g. roommate’s code, something you found on pastebin, etc.
You might find the MinPQ implementation from the optional textbook to be a helpful reference. However, you should not copy and paste MinPQ.java into your implementation and then try to figure out how to bend it to match our spec. While it might seem like this will save you time and effort, it will will end up being more work than just building everything yourself from scratch. Many students tried doing something similar with AList.java from lecture and ArrayDeque.java from project1a and the results were generally very bad.
Project 2A FAQ
Q: MyHow do I create a Node[] if my Node class stores objects of generic type T? If I try (Node[]) new Object[10], I get a classCastException at runtime, but if I try new Node[10], I get a generic array creation error at compile time. This is one particularly annoying issue with Java generics. Without going into the details, the easy fix is that you can simply say new ArrayHeapMinPQ.Node[10]
and it will work. See this stack overflow post for more details.
Q: When I make a test for changePriority and test NaiveMinPQ against ArrayHeapMinPQ, the runtime for ArrayHeapMinPQ isn’t substantially better. Let’s consider how NaiveMinPQ
stores items. It stores them in a list, left to right. So if I insert items 0-1,000,000, it would have item 0
at the 0th index, 1
at the 1st index, and so on. Now, when it changes the priority of an item, it has to scan this list from left to right looking for this item. If my test were to insert items 0-1,000,000, then change the priority of items 0-1,000, this is actually the best case input for NaiveMinPQ::changePriority
, since these are the closest items it can find. We recommend changing your test to randomly select 1000 items in the heap and change their priority.
Proj2A Submission & Grading
Although the full project isn’t due until 3/23, you must submit Project 2A by 3/16.
Required Files for Proj2A
Your proj2A submission must contain the following files:
- ArrayHeapMinPQ
- ArrayHeapMinPQTest (should contain tests)
You can also include other files that you create, e.g. TestNaiveMinPQ
.
Grading
ArrayHeapMinPQ
We will test your ArrayMinHeapPQ
for correctness and efficiency. Correctness tests will be very similar to the randomized testing described in the Randomized Testing section of the Proj2B spec below, where we will compare your ArrayMinHeapPQ
’s getSmallest
, removeSmallest
, add
, and changePriority
results against our staff solution’s results.
Efficiency tests will be similar to the correctness tests and will still require correctness, but they will also see how your solution performs with a very large number of items (millions) and many queries to changePriority
(thousands). Your changePriority
runtime will be compared to our staff changePriority
runtime, and we will assign points based on that. If you implement changePriority
correctly (O(log(n))
) and do not have repeated, redundant, or unnecessary function calls, your runtime should be fine for these tests.
ArrayHeapMinPQTest
You must submit your tests. Unlike proj1b, your tests will not be tested with an autograder.
Point Distribution
- ~50%:
ArrayHeapMinPQ
correctness. - ~50%:
ArrayHeapMinPQ
correctness and efficiency (that is, 50% for being correct AND efficient).
Proj2B - K-d Tree
For this part of the project, your primary goal will be to implement the KDTree
class. This class implements a k-d tree as described in lecture 22.
If you’re not familiar with how k-d trees work, please see Lecture 22 before proceeding. It is absolutely imperative that you do this before starting this assignment.
This will ultimately be useful in your BearMaps application. Specifically, a user will click on a point on the map, and your k-d tree will be used to find the nearest intersection to that click.
Given the tight timeline for this project before spring break, I’ve created a pseudo-walkthrough video where I give specific recommendations for steps that you might want to take to complete the project.
This walkthrough also includes links to a series of solutions videos, where in each video, I solve one part of the project. These videos cover some, but not all of the project. Even if you solve the project entirely on your own, you still might find these solutions videos useful, so that you can compare your problem solving strategy to mine.
Slides for this pseudo-walkthrough (including links to solution videos) can be found at this link. You are not required to follow these steps or use this walkthrough!
Provided Files
We provide a class called Point.java
that represents a location with an x and y coordinate. You may not modify this class. The only three methods in this class are:
public double getX()
public double getY()
public double distance(Point o)
: Note that this function returns the squared Euclidean distance.
We also provide a interface called PointSet
that represents a set of such points. It has only one method.
public Point nearest(double x, double y)
: Returns the point in the set nearest to x, y.
NaivePointSet
Before you create your efficient KDTree class, you will first create a naive linear-time solution to solve the problem of finding the closest point to a given coordinate. The goal of creating this class is that you will have a alternative, albeit slower, solution that you can use to easily verify if the result of your k-d tree’s nearest
is correct. Create a class called NaivePointSet
The declaration on your class should be:
public class NaivePointSet implements PointSet {
...
}
Your NaivePointSet
should have the following constructor and method:
public NaivePointSet(List<Point> points)
: You can assumepoints
has at least size 1.public Point nearest(double x, double y)
: Returns the closest point to the inputted coordinates. This should take \(\theta(N)\) time where \(N\) is the number of points.
Note that your NaivePointSet
class should be immutable, meaning that you cannot add or or remove points from it. You do not need to do anything special to guarantee this.
Our staff solution for this class is only 32 lines, and you should not be spending more than 1 to 2 hours to do this.
Example Usage
Point p1 = new Point(1.1, 2.2); // constructs a Point with x = 1.1, y = 2.2
Point p2 = new Point(3.3, 4.4);
Point p3 = new Point(-2.9, 4.2);
NaivePointSet nn = new NaivePointSet(List.of(p1, p2, p3));
Point ret = nn.nearest(3.0, 4.0); // returns p2
ret.getX(); // evaluates to 3.3
ret.getY(); // evaluates to 4.4
Part 2: K-d Tree
Now, create the class KDTree
class. Your KDTree
should have the following constructor and method:
public KDTree(List<Point> points)
: You can assumepoints
has at least size 1.public Point nearest(double x, double y)
: Returns the closest point to the inputted coordinates. This should take \(O(\log N)\) time on average, where \(N\) is the number of points.
As with NaivePointSet
, your KDTree
class should be immutable. Also note that while k-d trees can theoretically work for any number of dimensions, your implementation only has to work for the 2-dimensional case, i.e. when our points have only x and y coordinates.
Edit (3/17/19): For nearest, we recommend that you write a simple version and verify that it works before adding in the various optimizations from the pseudocode. For example, you might start by writing a nearest method that simply traverses your entire tree. Then, you might add code so that it always visits the “good” child before the “bad” child. Then after verifying that this works, you might try writing code that prunes the bad side. By doing things this way, you’re testing smaller ideas at once.
Defining Comparators Using Lambda Functions (Optional)
You may need to use comparators in this project for comparing objects. You can construct a concrete class that implements Comparator
, or if you’d like, you can define a lambda function in Java. The syntax for declaring a lambda for a comparator is:
Comparator<Type> function = (Type arg1, Type arg2) -> (...);
Comparator<Type> functionEquivalent = (arg1, arg2) -> (...);
where (...)
is your desired return value that can use arguments arg1
and arg2
.
Examples:
Comparator<Integer> intComparator = (i, j) -> i - j;
Comparator<Integer> intComparatorEquivalent = (i, j) -> {
int diff = i - j;
return diff;
};
Note: lambdas will not be in scope for this course (i.e. not tested), but we are presenting you this option because it may make your life slightly easier depending on how you approach the project.
Another handy tip: If you want to compare two doubles, you can use the Double.compare(double d1, double d2)
method.
Testing NaivePointSet and KDTree
There are a number of different ways that you can construct tests for this part of the project, but we will go over our recommended approach.
Randomized Testing
Creating test cases that span all of the different edge cases of proj2B will be more difficult than in proj2A due to the complexity of the problem we are solving. To avoid thinking about all possible strange edge cases, we can turn towards techniques other than creating specific, deterministic tests to cover all of the possible errors.
Our suggestion is to use randomized testing which will allow you to test your code on a large sample of points which should encompass most, if not all, of the edge cases you might run into. By using randomized tests you can generate a large number of random points to be in your tree, as well as a large number of points to be query points to the nearest
function. To verify the correctness of the results you should be able to compare the results of your KDTree
’s nearest
function to the results to the NaivePointSet
’s nearest
function. If we test a large number of queries without error we can build confidence in the correctness of our data structure and algorithm.
An issue is that randomized tests are not deterministic. This mean if we run the tests again and again, different points will be generated each time which will make debugging nearly impossible because we do not have any ability to replicate the failure cases. However, randomness in computers is almost never true randomness and is instead generated by pseudorandom number generators (PRNGs). Tests can be made deterministic by seeding the PRNG, where we are essentially initializing the PRNG to start in a specific state such that the random numbers generated will be the same each time. We suggest you use the class Random
which will allow you to generate random coordinate values as well as provide the seed for the PRNG. More can be found about the specifications of this class in the online documentation.
Please put any tests you write in KDTreeTest.java
. You cannot share your tests with other students, but you are free to discuss testing strategies with them.
External Resources
Wikipedia is a pretty good resource that goes through nearest neighbor search in depth. You are free to reference pseudocode from course slides and other websites (use the @source tag to annotate your sources), but all code you write must be your own. We strongly recommend that you use the approach described in 61B, as it is simpler than described elsewhere.
As with the MinPQ, you are welcome to look at source code for existing implementations of a k-d tree, though of course you should not be looking at solutions to this specific 61B project. Make sure to cite any sources you use with the @source tag somewhere in your code. As always you should not use other student’s solutions to this project as a source, e.g. roommate’s code, something you found on pastebin, etc.
Proj2B Submission & Grading
Required Files for Proj2B
You are required to submit the following files:
NaivePointSet.java
: your complete naive solution as described above.KDTree.java
: your complete correct k-d tree implementation as described above.KDTreeTest.java
: any tests you write for your k-d tree implementation. If you do not write any tests, you should still submit this file.
You are also welcome to submit other files.
You will not get results from our full correctness and efficiency tests on your code when you submit on Gradescope.
Grading
NaivePointSet
We will only test your NaivePointSet
for correctness, not efficiency.
K-d Tree
We will test your KDTree
for correctness and efficiency. Correctness tests will be very similar to the randomized testing described in the Testing section, where we will compare your KDTree
’s nearest
results against our staff solution’s results. We will not be grading these based on efficiency, but please ensure that the method call completes in a reasonable about of time.
Efficiency tests will be similar to the correctness tests and will still require correctness, but they will also see how your solution performs with a very large number of points (hundreds of thousands) and many queries to nearest
(tens of thousands). Your KDTree
’s runtime will be compared to our staff KDTree
’s runtime, and we will assign points based on that. If you implement the k-d tree correctly (similar to how you learn in lecture) and do not have repeated, redundant, or unnecessary function calls, you should be fine for these tests. For reference, our KDTree
’s runtime while making 10,000 queries with 100,000 points is about 65-85x faster than our NaivePointSet
’s runtime for the same points and queries. In addition, we will have a test that ensures your constructor is at most 10x slower than our staff KDTree
’s constructor. This should not be a strict test as our constructor is relatively naive and doesn’t do anything fancier than what you learned in class.
K-d Tree Tests
You should also be submitting your tests but we will not be testing your tests with an autograder.
Provided Tests
Since we are not releasing results of your submission on our full autograder, we will provide you with some very basic sanity checks to make sure your code will run with our full autograder once we run it. These will include:
- File, API, compilation, style, and dependency Checks
- Two small correctness checks for your
NaivePointSet
andKDTree
classes’nearest
functions tested with one query on ten points each. - One speed check for
KDTree
. We will run your solution on many points and many queries. If your solution takes 15 times as long as the staff solution or faster, you will pass this test. This test will be the exact same as the lowest tier (easiest) of our speed tests, except that it will not test for correctness.
These tests are by no means a good indicator of how well you will do on our full suite of tests, so make sure you write your own tests as well. Your Gradescope submission may show that you receive full points for this assignment if you pass all these tests, but this score is not your final score.
Point Distribution
The point distribution for this project will be:
- ~5%:
NaivePointSet
correctness. - ~25%:
KDTree
correctness. - ~70%:
KDTree
correctness and efficiency (that is, 70% for being correct AND efficient).