Treaps: Probabilistic Binary Search Trees

Jose Iciano
Mar 29, 2020


Background Information:

To understand treaps, a basic understanding of Binary Search Trees and Binary Heaps is necessary.

Binary Search Trees:

A tree is a data structure with a root node that contains some value, as well as pointers to child nodes. A binary tree is a tree in which each node is restricted to at most two children. A binary search tree builds upon this with another property: the value of each parent node should always be greater than the values in its left subtree, and less than those in its right subtree (equal values can go solely in the left or the right subtree).

A BST allows one to insert, delete, and search for values inside the tree. An unbalanced BST has the following worst-case runtimes for such operations:

  • Insertion: O(n)
  • Deletion: O(n)
  • Search: O(n)
Note that the reason we have O(n) runtimes is that, since the tree is unbalanced, it could devolve into a linked list (for example, if we created a BST from sorted data). Source: Robert Sedgewick, Algorithms, 4th Edition.
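
To make this concrete, here is a minimal (unbalanced) BST sketch in Python; the names BSTNode, bst_insert, and bst_search are my own, used purely for illustration:

```python
# A minimal, unbalanced BST (illustrative sketch; names are my own).
class BSTNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def bst_insert(root, value):
    """Insert a value, sending equal values to the left subtree."""
    if root is None:
        return BSTNode(value)
    if value <= root.value:
        root.left = bst_insert(root.left, value)
    else:
        root.right = bst_insert(root.right, value)
    return root

def bst_search(root, value):
    """Standard BST search: O(n) worst case when the tree is unbalanced."""
    if root is None:
        return False
    if value == root.value:
        return True
    if value < root.value:
        return bst_search(root.left, value)
    return bst_search(root.right, value)

# Inserting sorted data produces a degenerate, linked-list-shaped tree.
root = None
for v in [1, 2, 3, 4, 5]:
    root = bst_insert(root, v)
```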

To achieve a quicker runtime of O(log n), we can balance the BST through various means (e.g., AVL trees and Red-Black trees). By giving each node a specific property, we can continually reorder and rearrange the tree so that it will not devolve into a linked list.

Heaps:

A heap is another kind of tree-based data structure. A heap is a near-complete binary tree (each level is filled left to right when inserting), where each parent node has either a greater value (max-heap) or a smaller value (min-heap) than its children. For simplicity, I will be referring to a max-heap whenever a heap is mentioned for the rest of this article.

Source: Wikipedia, “Min-Max Heaps”

With a heap, one only has access to the root node and thus cannot search in the manner of a BST. Instead, one can only push new values and pop the top node, which results in swaps (sifting) up or down the tree so that the heap property is maintained. This leads to the following runtimes:

  • Insertion: O(log n) due to the sifting that results.
  • Deletion: O(log n) due to the sifting that results.
  • Searching: O(n log n). In the worst case, we have to empty the entire heap to find what we are looking for, and then reinsert everything.
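
As a quick illustration of push and pop, Python's built-in heapq module maintains a binary min-heap inside a plain list (the mirror image of the max-heap described above):

```python
import heapq

# heapq keeps a binary min-heap inside a plain list
# (the article's max-heap is the same idea with the order reversed).
heap = []
for value in [5, 1, 9, 3]:
    heapq.heappush(heap, value)   # O(log n): the new value sifts up

top = heapq.heappop(heap)         # O(log n): the last leaf replaces the root and sifts down
print(top)                        # 1 (the minimum)

# There is no efficient search: finding an arbitrary value means scanning the
# list or repeatedly popping, which is why searching a heap is expensive.
```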

Tree Rotations

A subtree can be “rotated” by shifting its root in a specific direction. As a result, the old root becomes a child and one of its children becomes the new root.

Left Rotation:

Source: Wikipedia, “Tree Rotations”
  • Start with a root A, a left child B, and a right child C
  • Make C’s left child the right child of A
  • Make A the left child of C

Right Rotation:

Source: Wikipedia, “Tree Rotations”
  • Start with a root A, a left child B, and a right child C
  • Make B’s right child the left child of A
  • Make A the right child of B
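
The two rotations can be written as small helper functions. This is a rough sketch (the names are my own) that works on any node with left and right pointers; later sketches in this article reuse it:

```python
# Rotation helpers (illustrative sketch; names are my own).
# They work on any node object with .left and .right attributes and
# return the new root of the rotated subtree.

def rotate_left(a):
    """Left rotation: the right child C becomes the new subtree root."""
    c = a.right
    a.right = c.left   # C's left child becomes A's right child
    c.left = a         # A becomes C's left child
    return c

def rotate_right(a):
    """Right rotation: the left child B becomes the new subtree root."""
    b = a.left
    a.left = b.right   # B's right child becomes A's left child
    b.right = a        # A becomes B's right child
    return b
```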

What is a Treap?

Structure of a Treap

In a treap, each node consists of a value, priority key, left child, and right child. With these fields comes the structuring property of a treap:

  • Order the values of nodes as a Binary Search Tree
  • Order the priority keys of nodes as a Heap

For every parent in the tree, its left subtree should have values smaller than it, and its right subtree should have values greater than it. Its priority key should also be greater than those of its subtrees. By doing this, one ends up with a tree that is (very likely to be) balanced.
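
A treap node can therefore be sketched as a plain BST node plus a randomly generated priority. This is an illustrative Python sketch, not the code from the repo linked below:

```python
import random

# A treap node: a BST value plus a randomly drawn heap priority
# (illustrative sketch; not the code from the linked repo).
class TreapNode:
    def __init__(self, value):
        self.value = value
        self.priority = random.random()  # the random key that simulates a shuffle
        self.left = None
        self.right = None
```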

How Treaps Work

The con with treaps comes from their randomness. A treap is a probabilistic data structure, meaning there can never be exact certainty about its runtimes. While it will most likely have a runtime of O(log n), there is still the chance (albeit a very unlikely one) of getting a runtime of O(n).

Assume you wanted to create a BST out of n items. If the tree were created from sorted data, we would end up with a linked list. Now, let us instead shuffle the data before inserting. Note that if we shuffle n items, there are n! possible permutations, and only two of them (ascending and descending order) are fully sorted. This means the chance of inserting in sorted order ends up being 2/n!.

A treap takes the idea of shuffling and simulates it with random key generation. The priority key created for each node can be thought of as the node’s placement if you had taken all the data in the tree, shuffled it, and then created a BST by inserting in that order.

Operations

Since we are treating a treap as a BST when it comes to the values in our nodes, inserting, searching, and deleting are done largely in a BST-like manner. The main difference here is the addition of rotations.

Insertion: O(log n) expected

  • If the current node’s value is larger than the insertion value:
    If a left child exists, call insert on the left subtree.
    Otherwise, make the insertion value the current node’s new left child.
    As the recursion unwinds, if a left child has a higher priority than its parent, perform a right rotation at that parent.
  • If the current node’s value is less than the insertion value:
    If a right child exists, call insert on the right subtree.
    Otherwise, make the insertion value the current node’s new right child.
    As the recursion unwinds, if a right child has a higher priority than its parent, perform a left rotation at that parent.
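
Putting the steps above into code, a recursive insert might look like the following sketch, which reuses the TreapNode class and rotation helpers from the earlier sketches (illustrative only, with equal values sent to the left):

```python
def insert(root, value):
    """Insert as in a BST, then rotate the child up whenever its
    priority beats its parent's (max-heap on priorities)."""
    if root is None:
        return TreapNode(value)
    if value <= root.value:
        root.left = insert(root.left, value)
        if root.left.priority > root.priority:
            root = rotate_right(root)   # promote the left child
    else:
        root.right = insert(root.right, value)
        if root.right.priority > root.priority:
            root = rotate_left(root)    # promote the right child
    return root
```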

Search: O(log n) expected

  • If the current node’s value is larger than our search value, go to the left subtree.
  • Else if the current node’s value is less than our search value, go to the right subtree.
  • If the current node’s value is equal to our search value, return true.
  • If we have reached the end of the treap (the current node is null), return false.
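
In code, the search ignores priorities entirely; it is plain BST search (a sketch matching the steps above):

```python
def search(root, value):
    """Priorities are ignored; this is plain BST search."""
    if root is None:
        return False          # reached the end of the treap
    if value == root.value:
        return True
    if value < root.value:
        return search(root.left, value)
    return search(root.right, value)
```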

Deletion: O(log n) expected

To delete, first search for the desired value. If found, do the following:

  • If this is a leaf node, just remove it from the treap.
  • If the node has one child, move up the child to replace it.
  • If there are two children, compare their priority values:
    If the right child has a higher priority, do a left rotation at the subtree root and call delete on the new root’s left child.
    If the left child has a higher priority, do a right rotation at the subtree root and call delete on the new root’s right child.
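
A recursive version of this deletion, again building on the earlier rotation helpers, might look like the following sketch:

```python
def delete(root, value):
    """Descend as in a BST; at the target node, rotate the higher-priority
    child up until the node is a leaf or has one child, then splice it out."""
    if root is None:
        return None                      # value not found
    if value < root.value:
        root.left = delete(root.left, value)
    elif value > root.value:
        root.right = delete(root.right, value)
    else:
        if root.left is None:            # leaf or single child: splice out
            return root.right
        if root.right is None:
            return root.left
        if root.right.priority > root.left.priority:
            root = rotate_left(root)     # right child moves up
            root.left = delete(root.left, value)
        else:
            root = rotate_right(root)    # left child moves up
            root.right = delete(root.right, value)
    return root
```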

This github repo provides the full code for creating a Treap.

Benefits of a Treap

Compared to balanced BSTs that are not probabilistic (e.g., AVL trees), a treap is simpler and thus quicker to code, since its balancing relies solely on the priority keys.

Splitting a Set Into Two: O(log n) expected

A treap can be used to quickly join and split sets of data. Given one set of data and a value to split at, you can perform a split by merely inserting a dummy node with that value and an infinite priority. By doing this, the dummy ends up as the root of the tree, with its left and right subtrees holding the two resulting sets of data.
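
As a rough sketch of that idea (building on the earlier TreapNode class and rotation helpers; insert_node is a hypothetical helper of my own that places an already-built node), a split could look like this:

```python
def insert_node(root, node):
    """Like insert(), but places an already-built node (needed for the dummy)."""
    if root is None:
        return node
    if node.value <= root.value:
        root.left = insert_node(root.left, node)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    else:
        root.right = insert_node(root.right, node)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def split(root, key):
    """Split one treap into (values < key, values >= key) by inserting a dummy
    node with an 'infinite' priority; it rotates all the way up to the root."""
    dummy = TreapNode(key)
    dummy.priority = float("inf")
    root = insert_node(root, dummy)
    return root.left, root.right
```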

Joining Two Sets Into One: O(log n) expected

Performing a join on two sets of data (where every value in one set is smaller than every value in the other) is the reverse: attach the two treaps as the left and right children of a dummy root node, then delete that dummy node from the tree. Both of these operations can be done in O(log n) expected time.
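
A sketch of the join, reusing the delete function from earlier and assuming every value in the left treap is smaller than every value in the right treap, with the dummy’s value lying strictly between the two sets:

```python
def join(left_treap, right_treap, key):
    """Join two treaps where every value in left_treap is smaller than every
    value in right_treap; key is any value strictly between the two sets."""
    dummy = TreapNode(key)          # dummy root holding both treaps
    dummy.left = left_treap
    dummy.right = right_treap
    return delete(dummy, key)       # deleting the dummy merges the two treaps
```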

Conclusion

With this article, you should now have a good understanding of how treaps work and why they are useful (despite their randomness).
