# Balancing Binary Trees Before Insertion

## Using the "Middle Out" Technique

I’m currently working through A Common-Sense Guide to Data Structures and Algorithms. The examples are in Ruby and Python. But since I’m learning Rust, I implement the examples and exercises as a transfer exercise in that programming language.

Node-based data structures such as linked lists, trees, and graphs are chronically harder to implement in Rust than in most other programming languages due to its ownership concept. Having chosen to work on data structures and algorithms in Rust, I was fully aware of the uphill battle I’ve entered. With some help from the very friendly Rust community, I managed to work through chapter 14 on (doubly) linked lists, implementing a Deque in the process.

Today, I wanted to work through chapter 15 on binary trees. I was familiar with most of the content. However, deleting a node from a tree is not so easy, because there are different cases:

- The node can be a leaf, in which case it just can be dropped.
- The node can have one child, in which case the sole child is elevated to the current node’s position.
- The node can have two children, in which case a proper child has to be found among the successors of both children.

The third case even has a special case—at least it’s special in Rust: If the
node to be deleted is the root node, the `delete()`

method has to consume the
node and to return a new one.

This made me think: When the API has to be designed in a way such that the delete operation returns a fresh tree, why not just build up a fresh tree with the value to be deleted missing? (This of course implies set semantics for the tree: every value is unique. But it could also be done without that restriction.)

It’s Sunday morning, after all, and I’m not feeling like wrestling with the borrow checker today. (Yes, I am a bit of a coward, but I can implement the proper algorithm later.)

But the idea made me think: Is it possible to give the client a benefit in exchange for the performance penalty of building up a fresh tree? When I rebuild the tree from scratch, I could make sure that it comes out balanced afterwards! But how can I pull this off?

But first, let’s introduce the tree data structure.

# Binary Tree in Rust

Let’s begin with a new project:

```
cargo new --lib binary_tree
```

The binary tree implementation should go into `src/binary_tree.rs`

, and
`src/lib.rs`

should make use of that module:

```
pub mod binary_tree;
```

A `Node`

shall contain a value with a generic type and a child on both the left
and right side (`src/binary_tree.rs`

):

```
pub struct Node<T: Clone + Ord> {
value: T,
left: Option<Box<Node<T>>>,
right: Option<Box<Node<T>>>,
}
```

The node’s `value`

type must both implement the `Clone`

and `Ord`

trait, so that
the values can be cloned and sorted. The `left`

and `right`

children are wrapped
in an `Option`

because they could be missing, and in a `Box`

so that the child
nodes are placed on the heap, breaking the endless cycle of nodes containing
nodes.

There won’t be a `Tree`

structure, but the client is expected to use the root
node as a reference to the entire tree. For `Node`

, a couple of operations shall
be implemented. Let’s start with a `new`

associated function (the following
methods are all placed in the same `impl`

block in `src/binary_tree.rs`

):

```
impl<T> Node<T>
where
T: Clone + Ord,
{
pub fn new(value: T) -> Self {
Node {
value,
left: None,
right: None,
}
}
}
```

This allows us to create a new node as follows:

```
let node = Node::new(7);
```

Children shall be inserted at their proper place. For a binary search tree, values smaller than the current node’s value go to the left side, values greater than the current node’s value go to the right side:

```
pub fn insert(&mut self, value: T) {
match value.cmp(&self.value) {
Ordering::Less => match &mut self.left {
Some(ref mut node) => node.insert(value),
None => self.left = Some(Box::new(Self::new(value))),
},
Ordering::Greater => match &mut self.right {
Some(ref mut node) => node.insert(value),
None => self.right = Some(Box::new(Self::new(value))),
},
Ordering::Equal => (),
}
}
```

Make sure to import the `Ordering`

enum for this purpose:

```
use std::cmp::Ordering;
```

If the node already has a child on the left/right side, the value shall be inserted further down the tree in the proper direction. If the node has no child at the respective place, it can be inserted directly as a new node. If the value already exists in the tree, it is simply ignored. (We’re working with set semantics here.)

To get the values out, the following method traverses the tree and returns its
values *in order*, i.e. sorted:

```
pub fn get_values(&self) -> Vec<T> {
let left = match &self.left {
Some(node) => node.get_values(),
None => Vec::new(),
};
let middle = vec![self.value.clone()];
let right = match &self.right {
Some(node) => node.get_values(),
None => Vec::new(),
};
[left, middle, right].concat()
}
```

That’s enough code for some automated tests, to be written in
`src/binary_tree.rs`

, too:

```
#[cfg(test)]
pub mod tests {
use super::*;
#[test]
fn test_get_values() {
let mut root: Node<usize> = Node::new(4);
root.insert(2);
root.insert(1);
root.insert(3);
root.insert(6);
root.insert(5);
root.insert(7);
let expected: Vec<usize> = (1..=7).collect();
let actual = root.get_values();
assert_eq!(actual, expected);
}
}
```

A tree is built up with values from 1 up to 7 (inclusive). The insertion order was hand-crafted carefully so that we end up with a balanced tree:

This yields the values in an ascending order.

# Deleting a Node, Rebuilding the Tree

Deleting a node by its value is very simple if a new tree is built up in the
process. Let’s re-use the existing methods to implement `delete()`

(again in the
`impl`

block in `src/binary_tree.rs`

):

```
pub fn delete(&self, value: &T) -> Option<Self> {
let values = self.get_values();
let values: Vec<T> = values.into_iter().filter(|v| v != value).collect();
if values.is_empty() {
None
} else {
let mut root = Self::new(values[0].clone());
for v in &values[1..] {
root.insert(v.clone());
}
Some(root)
}
}
```

The method expects a `Node`

reference, so nothing is modified, but a fresh tree
is returned. If the `value`

to be deleted is the current node’s value, and if
the current node has no children, an empty tree or *no* tree is returned, thus
`Option`

as the return type.

The values are obtained using the `get_values()`

method from before. Now the
value to be removed is filtered out. With the remaining values, a new tree is
built up.

This is very simple. But since `get_values()`

returns the items *in order*,
i.e. sorted ascendingly, the resulting tree (or “tree”) will look like this
(e.g. after removing the node with value 3):

Not much of a binary tree, is it? Lookups will take O(n) instead of O(log n) steps.

So the values must be handed over to the new tree in a different manner. There are two options I’d like to demonstrate:

- The values are obtained in
*pre-order*. - The values are re-sorted in
*middle out*order.

## Pre-Order Traversal

If the tree is already in a decent shape to begin with, obtaining the values in a way such that the value comes before its children’s values will yield decent results.

Let’s extend the `get_values()`

method by telling it in what order the nodes
shall be traversed. For that purpose, an `enum`

is created:

```
enum Traversal {
InOrder,
PreOrder,
}
```

There’s also a *post order*, which isn’t relevant for our use case. Now
`get_values()`

shall be extended as follows:

```
pub fn get_values(&self, order: &Traversal) -> Vec<T> {
let left = match &self.left {
Some(node) => node.get_values(order),
None => Vec::new(),
};
let middle = vec![self.value.clone()];
let right = match &self.right {
Some(node) => node.get_values(order),
None => Vec::new(),
};
match order {
Traversal::InOrder => [left, middle, right].concat(),
Traversal::PreOrder => [middle, left, right].concat(),
}
}
```

The `order`

is expected as a `Traversal`

reference and handed down to the child
nodes. The `left`

, `right`

, and `middle`

parts are still established in the same
way. However, upon returning the result, the resulting vector is constructed in
a different order depending on the `Traversal`

variant.

Let’s extend the test to verify the behaviour:

```
#[test]
fn test_get_values() {
let mut root: Node<usize> = Node::new(4);
root.insert(2);
root.insert(1);
root.insert(3);
root.insert(6);
root.insert(5);
root.insert(7);
let expected: Vec<usize> = (1..=7).collect();
let actual = root.get_values(&Traversal::InOrder);
assert_eq!(actual, expected);
let expected: Vec<usize> = vec![4, 2, 1, 3, 6, 5, 7];
let actual = root.get_values(&Traversal::PreOrder);
assert_eq!(actual, expected);
}
```

If the elements are inserted in that particular order, we’ll get back the original tree.

But let’s extend the `delete`

method, which was the actual purpose of this
digression:

```
pub fn delete(&self, value: &T) -> Option<Self> {
let values = self.get_values(&Traversal::PreOrder); // NOTE: order specified
let values: Vec<T> = values.into_iter().filter(|v| v != value).collect();
if values.is_empty() {
None
} else {
let mut root = Self::new(values[0].clone());
for v in &values[1..] {
root.insert(v.clone());
}
Some(root)
}
}
```

The only modification is the way `get_values()`

is being called: with `PreOrder`

specified as the `Traversal`

order.

Let’s test the deletion of a leaf:

```
fn test_delete_leaf() {
let mut root: Node<usize> = Node::new(4);
root.insert(2);
root.insert(1);
root.insert(3);
root.insert(6);
root.insert(5);
root.insert(7);
let tree = root.delete(&3).unwrap();
assert_eq!(tree.get_values(&Traversal::InOrder), vec![1, 2, 4, 5, 6, 7]);
}
```

Which yields a pretty decent tree:

Now let’s delete the root node’s right child:

```
#[test]
fn test_delete_middle() {
let mut root: Node<usize> = Node::new(4);
root.insert(2);
root.insert(1);
root.insert(3);
root.insert(6);
root.insert(5);
root.insert(7);
let tree = root.delete(&6).unwrap();
assert_eq!(tree.get_values(&Traversal::InOrder), vec![1, 2, 3, 4, 5, 7]);
}
```

Which also yields a decent tree:

Let’s try deleting the root node:

```
#[test]
fn test_delete_root() {
let mut root: Node<usize> = Node::new(4);
root.insert(2);
root.insert(1);
root.insert(3);
root.insert(6);
root.insert(5);
root.insert(7);
let tree = root.delete(&4).unwrap();
assert_eq!(tree.get_values(&Traversal::InOrder), vec![1, 2, 3, 5, 6, 7]);
}
```

That doesn’t look so nice anymore:

This tree has now four levels, but its 6 elements could easily fit on three.

It would be nice to have a way to pre-order the values in a way such that the
tree will be perfectly balanced in the first place. Enter *middle out*!

## Middle Out

I’m such an idiot. Middle out. Middle out! MIDDLE OUT!"

— Richard Hendricks, *Silicon Valley* (Season 1, Episode 8)

Here’s how the values can be put in an order that will yield a balanced tree:

- Obtain the tree’s values
*in order*. - Split the list in the middle; return the median element and the list on the left (smaller elements) and on the right (bigger elements).
- Process the sublists in the same way, thereby adding their median values to a sequence, until all remaining sublists are empty.
- The sequence of median values can now be inserted into the tree.

The values are extracted from the sequence *middle out*, so to speak.

Let’s create a new module in `src/middle_out.rs`

, which is to be referred in
`src/lib.rs`

as follows:

```
pub mod binary_tree;
pub mod middle_out;
```

The `middle_out()`

function shall expect an array of values and return them in
proper insertion order as a vector of the same type (`src/middle_out.rs`

):

```
pub fn middle_out<T: Ord + Clone>(values: &[T]) -> Vec<T> {
if values.is_empty() {
Vec::<T>::new()
} else {
do_middle_out(&vec![values.to_vec()], &mut Vec::new())
}
}
```

If there are no values, an empty vector is returned. Otherwise, a recursive algorithm with an initially empty accumulator vector is returned, which is defined as follows:

```
fn do_middle_out<T: Ord + Clone>(splits: &Vec<Vec<T>>, acc: &mut Vec<T>) -> Vec<T> {
if splits.iter().all(|s| s.is_empty()) {
return acc.clone();
}
let mut remainder = Vec::new();
for split in splits {
let (median, splits) = split_median(split);
if let Some(median) = median {
acc.push(median.clone());
}
for split in splits {
if !split.is_empty() {
remainder.push(split);
}
}
}
do_middle_out(&remainder, acc)
}
```

A vector of “split” vectors is expected; initially, it’s only a single vector. If there are only empty split vectors, all the median values have been extracted in proper order, and the accumulator is returned as the result.

As long as there are non-empty split vectors, both the median and the left/right sub-vectors of the split vector is computed. The found median is added to the accumulator. The newly found split vectors are added to a remainder, which is then processed (tail-)recursively.

The `split_median()`

function is implemented as follows:

```
fn split_median<T: Ord + Clone>(values: &[T]) -> (Option<T>, Vec<Vec<T>>) {
let n = values.len();
if n == 0 {
(None, Vec::new())
} else if n == 1 {
(Some(values[0].clone()), Vec::new())
} else {
let mut values = values.to_owned();
values.sort();
let m = if n % 2 == 0 { n / 2 - 1 } else { n / 2 };
let median = &values[m];
let left = values[0..m].to_owned();
let right = values[m + 1..n].to_owned();
(Some(median.clone()), vec![left, right])
}
}
```

There are three cases to be covered:

- If there are no elements, neither median nor split vectors are returned.
- If there is one value, it is returned as the median, but no split vectors are returned.
- If there are multiple elements, they are first sorted. Then the median element is extracted, and the slices on the left and right of it are extracted. Both median and slices are returned.

Let’s verify this algorithm with some test cases.

First, an empty vector shall yield an empty vector:

```
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_middle_out_empty() {
assert_eq!(middle_out(&Vec::<usize>::new()), Vec::new());
}
}
```

Second, the algorithm must work with an even number of elements (to be placed in
the above `test`

module, too):

```
#[test]
fn test_middle_out_small() {
assert_eq!(
middle_out(&(1..=8).collect::<Vec<usize>>()),
vec![4, 2, 6, 1, 3, 5, 7, 8]
);
}
```

Inserting the values in this order will yield this nicely balanced tree:

Third, if an odd number of elements is used, the tree shall be balanced, too:

```
#[test]
fn test_middle_out_big() {
assert_eq!(
middle_out(&(0..=14).collect::<Vec<usize>>()),
vec![7, 3, 11, 1, 5, 9, 13, 0, 2, 4, 6, 8, 10, 12, 14]
);
}
```

Again, this insertion order yields a balanced tree:

Let’s integrate this functionality into the tree module!

## Wrapping Up

Back to the `binary_tree`

module, let’s import the `middle_out()`

function:

```
use crate::middle_out::middle_out;
```

The `delete()`

method shall be extended as follows:

```
pub fn delete(&self, value: &T) -> Option<Self> {
let values = self.get_values(&Traversal::InOrder); // NOTE: back to InOrder
let values: Vec<T> = values.into_iter().filter(|v| v != value).collect();
let values = middle_out(&values); // NOTE: added line
if values.is_empty() {
None
} else {
let mut root = Self::new(values[0].clone());
for v in &values[1..] {
root.insert(v.clone());
}
Some(root)
}
}
```

The values are retrieved *in order* again. Depending on the sorting algorithm
used by `middle_out`

, ordering a pre-ordered vector *might* be faster than an
unordered one. The values are then re-arranged as discussed by calling the
`middle_out()`

function.

Let’s check the effects on the delete operations again, starting from this tree:

First, let’s delete the leaf `3`

, which yields the following tree:

The tree is still balanced, but has a slightly different layout.

Second, let’s delete the middle node with the value `6`

, which yields this tree:

This tree is also still balanced, but also looks a bit different.

Third, let’s delete the root node, which yields this tree:

Unlike as before, after deleting the root node, a balanced tree is returned.

So *middle out* really is a thing!

# Conclusion

Dealing with node-based data structures doesn’t have to be a hassle in Rust when one is willing to sacrifice some performance and to re-build data structures instead of modifying them in-place.

Binary trees can be traversed in different orders. The *in order* traversal
yields a sequence of (ascendingly) ordered values. The *pre-order* traversal
yields a sequence with the upper level values of the tree before the lower level
ones.

To pre-establish an insertion order that will yield a balanced binary tree, the
*in order* sequence of values can be split up into a left side, a median, and a
right side. The resulting medians are added to a resulting sequence of items to
be inserted; the process is repeated until the resulting split sequences become
all empty. I called this algorithm *middle out*.

The source code can be obtained from GitHub.