The continuing emphasis on pointer-heavy data structures in computer science curricula is doing a massive disservice to students and to the employers who go on to hire them. Pointer-heavy data structures are TERRIBLE on modern hardware. They were great back in the era when pointers were small (≤32 bits) and unpredictable memory access was roughly the same speed as CPU operations. We left that era behind long ago. Today pointers are a whopping 64 bits, and looking up your next node through a pointer is so slow that it's the CPU equivalent of taking a ten-year sabbatical.
For example: at what size does it become better to use a list rather than a vector, when doing lots of insertions and deletions that are O(1) for lists and O(n) for vectors? NEVER. It is always better to use a vector. Don't take it from me; this is according to Bjarne Stroustrup: https://www.youtube.com/watch?v=YQs6IC-vgmo
Pointer-heavy data structures like linked lists are so bad for memory locality and storage overhead on modern hardware that it is impossible for them to catch up with contiguous memory vectors.
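If you want to check that claim yourself, here is a rough sketch of the kind of experiment Stroustrup describes (my reconstruction, not his code): keep a sequence sorted while inserting random integers, finding each insertion point by linear search from the front. On paper the list wins the insertion step; in practice the pointer-chasing traversal dominates and the vector wins by a wide margin.

    #include <chrono>
    #include <cstdio>
    #include <list>
    #include <random>
    #include <vector>

    // Insert n random ints into a sequence, keeping it sorted; the
    // insertion point is found by linear search from the front.
    template <typename Seq>
    long long insert_sorted_ms(int n) {
        std::mt19937 rng(42);
        Seq seq;
        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < n; ++i) {
            int value = static_cast<int>(rng());
            auto pos = seq.begin();
            while (pos != seq.end() && *pos < value) ++pos;  // linear search
            seq.insert(pos, value);  // O(n) for vector, O(1) for list
        }
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    }

    int main() {
        const int n = 20000;
        std::printf("vector: %lld ms\n", insert_sorted_ms<std::vector<int>>(n));
        std::printf("list:   %lld ms\n", insert_sorted_ms<std::list<int>>(n));
    }

With optimizations on, the gap typically grows with n: the vector's search streams through contiguous memory while the list's chases a cache-missing pointer per element.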
There are still situations where you need a tree but if you do, try very hard to use a representation like a binary array heap where the child links are implicit. Or if you must use "pointers" consider allocating all the values in a contiguous block and storing the link structure separately as 32-bit or smaller integer indexes into the data array. This at least improves the memory overhead and locality.
</rant>
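To make the index-based suggestion concrete, here is a minimal sketch of that layout (the names Node32 and IndexTree are mine, for illustration):

    #include <cstdint>
    #include <vector>

    // All nodes live contiguously in one std::vector; links are 32-bit
    // indexes into it, halving the link overhead vs 64-bit pointers and
    // keeping neighbouring nodes close together in memory.
    struct Node32 {
        int32_t key;
        int32_t left  = -1;   // index of left child, -1 means "none"
        int32_t right = -1;   // index of right child
    };

    struct IndexTree {
        std::vector<Node32> nodes;   // one contiguous allocation
        int32_t root = -1;

        void insert(int32_t key) {
            nodes.push_back({key, -1, -1});
            int32_t idx = static_cast<int32_t>(nodes.size()) - 1;
            if (root == -1) { root = idx; return; }
            int32_t cur = root;
            for (;;) {
                int32_t& next = key < nodes[cur].key ? nodes[cur].left
                                                     : nodes[cur].right;
                if (next == -1) { next = idx; return; }
                cur = next;
            }
        }
    };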
> There are still situations where you need a tree but if you do, try very hard to use a representation like a binary array heap where the child links are implicit. Or if you must use "pointers" consider allocating all the values in a contiguous block and storing the link structure separately as 32-bit or smaller integer indexes into the data array. This at least improves the memory overhead and locality.
There is an ironic side to that advice. If the node in the tree is bigger than half a cache line (i.e. two of them don't fit into one 64-byte line), then pointers vs vector doesn't get you a win, as either way you hit a different cache line when you access the next element. Stroustrup's example is a nice "gotcha", but the reality is that most of us don't store integers in a vector or tree, we store bigger things. That boils down to two cases: the thing is so big that storing a couple of pointers alongside it doesn't matter, or you store a pointer to the bigger thing in your tree, which means you are using a pointer-heavy data structure anyway.
OK, OK, I've cheated. I get it, what you are saying is that using a linked list to store those pointers to the big objects is always worse than storing them in an array. But there is a third option which is better than both: embed the pointers into the big object. If that pointer ends up in the same cache line as whatever you are using to distinguish the objects, the pointer-heavy method will be faster than the non-pointer method, and that is an irony.
This pattern has been followed by hoary old C programmers like myself for decades. It's all over the Linux kernel for instance. Granted it ain't so easy to do in languages other than C. (Even Rust lacks the necessary "I have a pointer to a linked list node that I know is embedded in an object of type T, give me the pointer to the T object" operation, because it's so hard to make type safe.) But just because most languages don't support it doesn't mean pointer-heavy data structures are bad.
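For anyone who hasn't seen the kernel idiom, here is a rough sketch of its shape, written in C++ for consistency with the other snippets here (the kernel's own version is C, built on list_head and the container_of macro):

    #include <cstddef>

    struct ListHead {            // the embedded link words
        ListHead* next;
        ListHead* prev;
    };

    struct Task {                // the "big object"
        int id;
        char name[32];
        ListHead queue;          // the links live inside the object itself
    };

    // Recover the Task from a pointer to its embedded ListHead by
    // subtracting the link's offset within the struct. This is the
    // operation most languages can't express safely; offsetof is
    // well-defined here because Task is a standard-layout type.
    inline Task* task_of(ListHead* node) {
        return reinterpret_cast<Task*>(
            reinterpret_cast<char*>(node) - offsetof(Task, queue));
    }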
> but the reality is that most of us don't store integers in a vector or tree, we store bigger things.
I work in numerical computing where storing 64/32-bit floats is by far the most common use case. In this world, storing a lot of small, immutable values is the norm and using pointer-heavy data structures would be a disaster.
That's a good trick; I'll have to think about good ways to expose it safely in a higher-level language.
Do all programmers program on 64-bit x86 machines? I am not an embedded or systems programmer, so I don't have the answer to this question. But does your statement hold true for all of the other chips used in the embedded space? If not, then teaching those pointer-heavy structures is useful for programmers that go on to use ARM or other non-x86 chips.
I'm not sure why he made a big deal about pointer size. That is not the problem as far as I have seen. Memory latency (in combination with TLB lookups, which can end up also hitting memory latency) is the problem.
Everything else he says I completely agree with, but I think pointer size muddies the water.
Doubling the pointer size doubles the memory overhead (where overhead is defined as memory used that isn't your data). A doubly linked list node holding a 4-byte int, for example, carries 16 bytes of links on a 64-bit machine instead of 8. Sure, it's not the worst part of pointer-heavy data structures, but if the overhead is doubled, that's potentially twice as many unnecessary cache misses.
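To put numbers on that, a back-of-the-envelope illustration (sizes assume a typical 64-bit platform with 8-byte, 8-byte-aligned pointers):

    #include <cstdint>

    struct PtrNode { int32_t key; PtrNode* left; PtrNode* right; };
    struct IdxNode { int32_t key; int32_t left; int32_t right; };

    // 4 bytes of data, 16 bytes of links, 4 bytes of padding = 24 bytes.
    static_assert(sizeof(PtrNode) == 24, "assumes an LP64-style platform");
    // 4 bytes of data, 8 bytes of links, no padding = 12 bytes.
    static_assert(sizeof(IdxNode) == 12, "assumes an LP64-style platform");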
"Austin" provides several container data structures in a single instance. With a function call it can be instructed to change the representation from any one of them to any other. E.g. a red-black tree can be told to become an AVL tree or sorted list and so on.
When I wrote this and announced it, a gentleman from Austria named Carl Van Tast contributed the AVL implementation. I've not encountered him since; I remember telling him it was Van-Tast-ic. That could be why I haven't encountered him since.
The API is in "udict.h" (universal dictionary). The algorithms are enumerated thus:
Let me explain to everyone else how your function works. It's turning a tree into a list. The tree is rooted at “rest”, and the list starts at “pseudoRoot” and ends at “tail”.
If the root of the tree has no child on the left, it pops that node and adds it to the end of the list. If it does have a left child, then it does a right rotation on the tree, which makes the left side smaller. (See https://en.wikipedia.org/wiki/Tree_rotation .)
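In code, the loop described above looks roughly like this (a sketch reconstructed from the description, using the same names; the actual function being discussed may differ in details):

    struct Node { int value; Node* left; Node* right; };

    // Flatten a BST into a right-leaning "vine": a sorted list threaded
    // through the right pointers, running from pseudoRoot to tail, with
    // the remaining tree rooted at rest.
    Node* tree_to_vine(Node* root) {
        Node pseudoRoot{0, nullptr, root};
        Node* tail = &pseudoRoot;   // end of the list built so far
        Node* rest = root;          // root of the part still to flatten
        while (rest != nullptr) {
            if (rest->left == nullptr) {
                tail = rest;                // no left child: pop it onto the list
                rest = rest->right;
            } else {
                Node* pivot = rest->left;   // right rotation, making
                rest->left = pivot->right;  // the left side smaller
                pivot->right = rest;
                rest = pivot;
                tail->right = pivot;        // keep the list attached
            }
        }
        return pseudoRoot.right;    // head of the sorted list
    }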
I was wondering if this really runs in linear time. Call a node in the tree good if it lies on the right-most part of the tree. In other words, a node is good if the path from the root to that node only goes right at each step. The root is always good. A node is bad if it is not good. Each right rotation turns a bad node into a good one, and each pruning removes one good node. Since a rotation never makes a good node bad, there are at most n rotations and exactly n prunings. So it is really O(n) time.
Yes, this is all exactly right. To convince everyone else that I'm not a wizard - my function is just a slightly modified version of the tree-to-vine function from the Day-Stout-Warren algorithm. https://en.wikipedia.org/wiki/Day%E2%80%93Stout%E2%80%93Warr...
There is an alternative solution where you basically do it using an in-order traversal. It uses more stack space because it holds one node pointer in global state and two node pointers between the left and right recursion, while the build-a-list solution holds just a single node pointer during the left and right recursion.
Let's say you want to convert a tree to a singly linked list, in order. You can just do an in-order traversal of the tree, keep track of the previous node, and assign previous.left = current_node. This is fine because when you are at the successor of a node, the whole of the previous node's left subtree has already been visited (everything in it is smaller), so you will never need to follow that left pointer again:
    previous = nil
    # in_order is assumed to yield each node of the tree in sorted order.
    in_order(tree) do |node|
      # Thread the list through the left pointers, which are no longer needed.
      previous.left = node if previous
      previous = node
    end
However, you can't do node.right = previous until you have visited the right subtree of the current node, so you can't use exactly the same code for a doubly linked list. If you do the fix-up after you have finished processing the current node, though, it works fine. So if you have an iteration that gives you both an in-order and a post-order callback, plus a way of sharing state between them, you can just do the fix-up in the post-order callback, where it is safe:
    previous = nil
    head = nil
    # multi_order is assumed to visit each node twice, once :inorder and
    # once :postorder, handing the post-order callback whatever the
    # in-order callback for the same node returned.
    multi_order(tree) do |node, order, state|
      if order == :inorder
        if previous
          fixup_state = [node, previous]
        else
          head = node
        end
        previous = node
        fixup_state # passed back into the iterator; returned in the post-order callback
      else
        if state
          fixup_node, fixup_previous = state
          fixup_previous.left = fixup_node  # left acts as the "next" pointer
          fixup_node.right = fixup_previous # right acts as the "prev" pointer
        end
      end
    end
    # Close the list into a ring, with left as "next" and right as "prev".
    previous.left = head
    head.right = previous
    head
Why wouldn't you do a pre-order traversal and connect the appropriate nodes? When they say "recursion is key", that's actually misleading and this problem doesn't seem hard at all.
I think if you place a limit of O(log(n)) on allocated space and guarantee the height of the tree is at most O(log(n)), then that breaks the scratch solution and makes the in-place solution, where you use scratch space on the stack, the only option. The in-place solution can probably be fixed to use O(log(n)) space even if the tree is unbalanced, because you don't need to allocate stack space unless a node has both a left and a right pointer.
There is a follow-up to this question. Given two (balanced) binary search trees, where the first BST has N nodes and the second has M nodes, how would you merge both trees in linear time and constant space?
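One textbook answer, sketched below with no claim that it is the intended one: flatten both trees into sorted vines with a tree-to-vine pass like the one sketched earlier, merge the vines, then rebalance using the vine-to-tree half of Day-Stout-Warren. Each pass is linear in N+M and needs only a constant number of temporaries.

    // Builds on the Node struct and tree_to_vine sketch above.

    // Merge two sorted right-threaded vines into one.
    Node* merge_vines(Node* a, Node* b) {
        Node head{0, nullptr, nullptr};
        Node* tail = &head;
        while (a != nullptr && b != nullptr) {
            Node*& smaller = (a->value <= b->value) ? a : b;
            tail->right = smaller;
            tail = smaller;
            smaller = smaller->right;  // advance whichever vine we took from
        }
        tail->right = (a != nullptr) ? a : b;
        return head.right;
    }

    // One DSW "compress" pass: `count` left rotations down the right spine.
    void compress(Node* pseudoRoot, int count) {
        Node* scanner = pseudoRoot;
        for (int i = 0; i < count; ++i) {
            Node* child = scanner->right;
            scanner->right = child->right;
            scanner = scanner->right;
            child->right = scanner->left;
            scanner->left = child;
        }
    }

    // Rebalance a vine of `size` nodes into a balanced BST.
    Node* vine_to_tree(Node* vine, int size) {
        Node pseudoRoot{0, nullptr, vine};
        int pow2 = size + 1;                  // round size+1 down to a power of two
        while (pow2 & (pow2 - 1)) pow2 &= pow2 - 1;
        compress(&pseudoRoot, size + 1 - pow2);  // fill the bottom level
        for (int rem = pow2 - 1; rem > 1; rem /= 2)
            compress(&pseudoRoot, rem / 2);
        return pseudoRoot.right;
    }

    // Usage: Node* merged =
    //     vine_to_tree(merge_vines(tree_to_vine(t1), tree_to_vine(t2)), n + m);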