Exercises from PFDS Section 2.2

Posted on Dec 4, 2012

Section 2.2’s exercises define some optimizations to the section’s unbalanced tree set implementation.

Exercise 2.2

The implementation of member that Okasaki gives for binary search trees in section 2.2 performs 2d comparisons for a tree of depth d in the worst case, when searching for a number that is the farthest to the right on the tree, and when the right path in the tree is of depth d itself. This is because every call of member checks whether x < y and, if not, whether y < x. This strategy allows immediate detection of the case in which the value being searched for is contained in the current node, permitting short-circuit cases like where the searched-for value is in the root node, at the cost of requiring double the comparisons when traversing rightward.

Exercise 2.2 proposes a strategy in which the second comparison is delayed until the bottom of the tree. This removes the 2d worst-case number of comparisons, at the cost of raising the minimum number of comparisons to the number of nodes in the shortest path from the root to a leaf. The implementation is relatively straightforward:

We introduce a new parameter to member, c: Option[T], which is None when there is not yet a candidate for equality, and Some(d) when d is a candidate for equality. Then, when we reach a leaf node, we check to see whether our candidate is equal to the value we’re searching for.

Exercise 2.3

Exercise 2.3 proposes an optimization to insert. The implementation given by Okasaki performs wasteful path copying when the value being inserted is already in the tree. If the value being inserted is present in a particular node, the only thing that’s done is to short-circuit the operation and return the node as it exists, which doesn’t reverse any of the path copying performed while traversing the tree down to the node.

Throwing an exception in the case where the element being inserted already exists avoids the wasteful path copying. Okasaki requires that the function establish only one exception handler per insert, not per function call, so we catch the exception in the helper insert method of the UnbalancedTreeSet class.

Exercise 2.4

Exercise 2.4 is to integrate the improvements to member from exercise 2.2 into the improved version of insert from 2.3. This is straightforward: we follow the pattern established by 2.2, threading an equality candidate through the recursive calls, and checking for equality only at the end. Now insert performs no unnecessary path copying if inserting a pre-existing element, and performs at most d + 1 comparisons along the way.

Exercise 2.5

Exercise 2.5 is about generating balanced binary trees that share as much as possible. The exercise assumes that all of the values in the trees are the same, which doesn’t make much sense for the set abstraction, but ensures that all subtrees of the same size are identical.

For the first part, we implement a function complete that creates a complete tree of d levels. As a binary tree, it should have 2^d - 1 values inside. The implementation is a straightforward recursive function. The base case, a binary tree of 0 levels, is just a leaf node. The recursive case creates an internal node whose children are both the result of recursing with one less level.

For the second part, we implement a function balanced that creates a tree with d values that is as balanced as possible: every internal node’s subtrees differ in size by at most 1.

I put both of these methods directly on BinaryTree’s companion object. I think this is idiomatic, but it’s not really clear to me: there are so many different ways to do things in Scala.

I wrote a few tests for the tree generation functions in a normal assertion-based style before remembering the existence of ScalaCheck and its obvious applicability for this kind of thing. I’ve been itching to use a QuickCheck-style testing framework for a while. If you haven’t seen it before, you should definitely check it out. It allows you to write tests in a declarative style, without specifying individual test cases. The framework will generate random test cases for you, either by using built-in random generators for simple cases like arbitrary strings or integers, or by using a user-defined random object generator.

In this case, since I wanted my test cases to vary the number of levels or nodes in a balanced tree, I was able to just use the built-in random integer generator. Writing tests in this way is concise and fun. Here’s what they ended up looking like:

Exercise 2.6

This exercise asks the reader to modify UnbalancedTreeSet so that it represents a map instead of a set. This is very straightforward, if tedious: we must add another data member to BinaryTreeNode that represents the value of an entry in the map, change member to lookup and have it return the value data member instead of true, and throw an exception instead of false, and finally change insert to bind, give it an extra value argument, and set the BinaryTreeNode’s value field to that argument. I didn’t implement this.

As usual, the new code’s on GitHub.