“I refuse to use a self-referential motto”


Like last week, the collection of images comes from Sammy’s archive. Except for the hand-drawn sketches of data structures, those are mine. May the contrast bring out the quality of Sammy’s work even more.
~Aljoscha


I made a thing: a visual guide to zip-trees.
Every once in a while, I think about data structures. Efficient implementations of the three-dimensional data model of Willow require three-dimensional data structures. And those are tricky: whereas the one-dimensional case admits various search trees that allow for both querying and updating in logarithmic time, the multi-dimensional case is more difficult (here’s a rough introduction). There are problems with all existing data structures, so I have pondered a generalisation of zip-trees to three dimensions for a while now.
This week, I finally worked out the last missing algorithms. As part of my note taking I developed a graphical notation that could record the algorithms a lot more concisely than pseudocode can:

While I used this notation only as an aid in developing the algorithms and as a means of recording them, I naturally started to wonder whether the notation contained enough information to fully replace pseudo-code in a proper writeup. I need to share my algorithms for them to be useful to anyone, after all, but writing pseudocode is a pain. In particular, when it comes to symmetries and rotations, pseudocode is either verbose or littered with difficult-to-grok parameters. The graphic notation, in contrast, handles rotations and symmetries gracefully.
So I started refining the notation a bit: properly handling empty trees, teasing apart individual diagrams into sequential steps, incorporating item ranks, etc. Additionally, I started wondering about generating the diagrams instead of drawing them by hand. And when I was somewhat confident that all of this could possibly work and even be useful, I decided to create a proof-of-concept document based on one-dimensional zip-trees. And that is the thing I made this week.
~Aljoscha

Before the graphic-notation-frenzy took over my week, I made some additional progress on multi-dimensional data structures. I did a complexity analysis of the multi-dimensional zip-tree algorithms. In a tree of logarithmic height, query times are logarithmic. Update times are logarithmic on average, but they can be in O(n^0.792) (master theorem for 3 subproblems of size n / 4) in the worst case. This is not great, but probably good enough maybe? At the very least, there are such results in the first place, whereas R-trees and their ilk typically just hope for the best.

Okay, this update is now officially turning into a data structure fiesta, because I have one more insight to share, but it takes a bit of preliminary work. Let me start with a sketch of my definition of two-dimensional zip-trees.
Given a set of points in a two-dimensional space, and a function from those points to ranks (natural numbers), the 2d-zip-tree on the set is a 4-ary tree, defined recursively: the root is the point of maximal rank (I’ll simply write "maximal rank", implicitly assuming that there is some deterministic tie-breaking in place in case of duplicate ranks).
Next, partition the remaining points into four sets:
- bottom-left (less than the root in both dimensions),
- top-left (less that the root in the first dimension, greater than the root in the second dimension),
- bottom-right (greater that the root in the first dimension, less than the root in the second dimension), and
- and top-right (greater than the root in both dimensions).
Recursively build the 2d-zip-trees for these sets, and use the resulting trees as the bottom-left, top-left, bottom-right, and top-right children of the root.

This definition directly generalises the corresponding one-dimensional definition.

I want to sketch a definition of two-dimensional skip lists, but to get there we first need to take a look at one-dimensional skip-lists. I’ll assume familiarity with the concept, but want to present an alternate, non-standard definition. This definition makes the isomorphism between skip-lists and zip-trees much more obvious than the standard definition.
Given a set of points with associated ranks, we can define the bottommost layer of the skip-list recursively: select the point of maximal rank, call it the pivot. Partition the remaining points into two sets: those less than the pivot, and those greater than the pivot. Recursively construct the bottommost skip-list layer for each of these sets. Add an edge from the greatest point in the lesser recursive construction to the pivot, and add an edge from the pivot to the least point in the greater recursive construction.

For the higher layers, you proceed identically, except that you filter out items whose rank is less than the layer count. Then you add inter-layer links as usual.
This definition might seem overly complicated, given that it simply results in layers that are ascendingly sorted linked lists. But it has two advantages: first, it is utterly obvious how the definition relates to that of zip-trees, and second it generalises nicely to higher dimensionalities.

From the preceeding definition of one-dimensional skip-lists, we can easily get to a definition of two-dimensional skip-lists. And the connection to two-dimensional zip-trees will be just as obvious as in the one-dimensional case.
Given a set of points in a two-dimensional space, with associated ranks, the bottommost layer of the 2d-skip-list on the set is defined recursively: the pivot is the point of maximal rank.
Next, partition the remaining points into four sets:
- bottom-left (less than the pivot in both dimensions),
- top-left (less that the pivot in the first dimension, greater than the pivot in the second dimension),
- bottom-right (greater that the pivot in the first dimension, less than the pivot in the second dimension), and
- top-right (greater than the pivot in both dimensions).
Recursively build the bottommost 2d-skip-list layers for these sets. Add an edge from the "greatest" (defined later) element of the bottom-left one to the pivot, and add an edge from the pivot to the "least" (defined later) element of the top-right. Add an edge from the pivot to the "least" elements of the top-left and bottom-right ones, and add edges from the "greatest" element of the top-left and the bottom-right one to the pivot.
The "least" element in a subset is defined as follows: throw away all elements that are greater in both dimensions than some other element in the subset. Of the remaining elements, the one of least rank is the "least" element. Analogously, the "greatest" element in a subset is defined as follows: throw away all elements that are less in both dimensions than some other element in the subset. Of the remaining elements, the one of least rank is the "greatest" element.

And then you do the usual business of constructing sparser layers and connecting the layers. The resulting data structure supports efficient range queries (unlike Nickerson’s ad-hoc constructions), and is isomorphic to two-dimensional zip-trees.
The structure of the resulting graphs is quite neat: effectively, you get a linked list from the bottom-left to the top-right, with choices between incomparable successor candidates tie-broken by rank. This main diagonal doesn’t contain all items, but it sprouts off smaller diagonals, which go from the bottom-left to the top-right of smaller quadrants. And those have smaller sub-diagonals themselves, and so on. And because those sub-diagonals are connected back to the mid-point of their super-diagonal, you get the expected connectivity properties of one-dimensional skip-lists: for any two vertices in the data structure, there is a directed path from one to the other.
For higher dimensionalities, the pattern generalises nicely: you have a primary diagonal from the least to the greatest element, and then a bunch of sub-diagonals, each again with their sub-diagonals.

A final fun fact about two-dimensional skip-lists: if you do not connect sub-lists back to the pivot of their super-list, you loose the fact that there's a directed path between any two vertices, but every vertex is still reachable from the least (bottom-left-most, ties broken by minimal rank) vertex in the list. In fact, searching for items starting from there (and also doing range queries, I think), still works. This is analogous to my observation about two kinds of edges in one-dimensional skip-lists in Figure 18 of the G-tree paper.
































