Tees: Where and How to Use
What is a tree in regards to Computer Science?
When most people hear the word tree they might imagine a conifer or palm, maybe aspen or maple tree. It really depends on the region you come from. But when computer scientists talk about trees it is in regard to a specific type of data structure. This is because when you think about how the tree data structure is organized it is reminiscent of an organic tree.
What are the core components of a tree?
The core components of a tree are its nodes. Each tree has a node from which you can find all other nodes this is called the root node there can only be one of these for each tree. This is like how with real trees you can trace a path from the trunk of a tree to each of its leaves. The nodes of a tree can be one of these three things a root, a branch, or a leaf. A root node is special it is the place where you start traversing the tree from. A branch node is a node that has a left or right child node. A leaf node has no left or right child nodes.
How are these components connected?
These nodes are all connected together by pointers. The root node has the ability to hold 2 or more pointers to more nodes. Each node under it also has the ability to point to 2 or more nodes and so on and so forth. In this article, we will mostly focus on binary search trees, which only have two children for each node. But even so, the tree quickly divides into many separate paths that can be traced from the root node to a leaf node. As a note, just like in real trees you cannot have two separate nodes in a tree point to the same node. That would be like having two branches on a real tree coming together to make one branch. You also cannot have a node point to itself in one of its left or right pointers. Real-life trees don’t grow in circles. Please completely ignore what you see to the left that’s not real life. It was probably photoshopped, right? Look below for an image detailing what is and is not a computer science tree.
How are they similar to other data structures such as lists, or hash tables?
How are trees organized compared to other data structures?
Binary search trees are data structures that are inherently ordered. They have a hierarchical structure of nodes in which each node has the possibility to have a child that has a value that is less than and greater than itself. Trees themselves are not stored contiguously in memory but instead are a system of nodes that connect to each other using pointers. A left pointer points to a node with data that is less than our current node and a right pointer to a node with data greater than.
Trees are very similar to linked lists in that they both use nodes and pointers in order to know where the next piece of information is. The key difference between a linked list and a tree is that a tree has two or more pointers to a new node for each node it has.
If you have heard about trees before you might have heard that they are especially fast at being searched. In that regard that kind of similar to a hash table. Let us talk about the structure of a hash table. A hash table is normally implemented with an array, that way you can easily find the index of any given key and value that you insert. In order to find the index of any given key, you create a deterministic hashing function that is, whenever you put in the same key you get the same value and most keys have a different value. This function returns a number representing the key, then you find the index of the bucket this key belongs to by determining the remainder of the keys hash value divided by the length of the array you are using for the hash table. There are some other things that hash tables do but we won’t discuss them right now. So if you want to learn more visit this great post here.
What are the advantages of using trees compared to other structures?
Suffice to say hash tables use an array to track the location of all of their items and it uses an algorithm to insert and locate an item which is very fast. This is in comparison to an array that you can use to find items very quickly by indexing if you know where they are. If not you have to iterate through your array until you find the item you are looking for. This compares to a linked list where you always need to iterate through the list in order to find something you are looking for. Trees you also need to iterate down but the advantage is that with each step you get rid of half of the remaining options instead of a linked list which gets rid of one option. This makes binary search trees a lot faster than linked lists but not quite as fast as hash tables under ideal circumstances.
More details on binary search trees.
How do you search for information in a tree, and what’s it’s time complexity?
The quickest way to get to the information in a binary search tree is to traverse down from the root by comparing the item you are looking for to the data of the current node if the item is smaller go left if it is larger go right. If your tree is reasonably balanced that is there are not overwhelmingly more nodes on one side of it than the other then the time it takes to search for an item in it will take O(log(n)) where n is the number of items in the tree. This is because you should be halving the amount of information you need to search through with each node you move to.
# Pseudo CodeFind this item, given a root_node and an item to look for.
Set current_node equal to root_node.
Create a loop that will run until the current nodes data is equal to the item we are looking for or current_node is not equal to anything.
If the item is less than the current nodes data.
Then current_node is set to current_nodes.left pointer.
Else if the item is greater than the current nodes data.
Then current_node is set to current_nodes.right pointer.
Once the loop breaks if current_node is not anything.
Then we return 'Could not find this node'
Else if the current_node is equal to the item.
Then we return 'Found this item'
What’s the time complexity of adding information to a tree?
To add a new node to a tree you need to make the node which is constant, O(1). Then you need to traverse the tree to the spot where that data would go if it were in the tree this will take the same amount of time as searching the tree, O(log(n)). Then when you find the location where it belongs you make the node above where it belongs point to the new node you have made.
How do you break a tree to achieve the worst time complexity of all?
This process of adding to a tree can give us some unexpected results sometimes. Most of the time we don’t expect data to be appended in any particular order which is good and means our data should be roughly evenly distributed across our tree. But if we are starting with an ordered list, say the letters a, b, c, d, e, f, g and we append them in order; each letter value is greater than the lasts so each would fall to the right of their parent. This gives us a tree with a height of 6 instead of a possible height of 2. This also means that when we search through the tree we would have to iterate through each element in the tree in order to see if something is or is not there, exactly like a linked list. That time complexity would be O(n) which is not something that we want.
How can trees be optimized?
One way to optimize the search time would be to take the elements out of the tree and put them back in, in optimal searching order. The optimal searching order would be the order in which you would binary search through the given elements if they were sorted. This would require you to copy the entire contents of your tree somewhere else, then sort it into the proper order, then create a new tree and then put the elements back in. Needless to say, this would be very time and space inefficient if you were to do this every time you wanted to add an item to your tree. Luckily other more efficient ways were developed.
Introducing AVL trees.
AVL stands for the names of the inventors, Adelson-Velskii and Landis (http://people.cs.ksu.edu/). It is a binary tree structure that self balances by rotating the tree when inserting new nodes. If the tree is always balanced then you always have the optimum search time to find any given node.
What are the extra functions that you need to implement in order to create an AVL tree?
In order to implement an AVL tree, you need to implement a tree rotation function for both the left and right rotations. These can be used in combination to build a right-left and left-right rotations.
# Pseudo CodeA given nodes balance is eqal to its left height minus its rightIn a right rotation you set the parent's left node's right pointer to the parent and the parent's left pointer equal to the left nodes right pointer.in a left rotation you set the parent's right node's left pointer to the parent, and the parent's right pointer equal to the right nodes, left pointer.if balance is greater than one and the item is less than the parents.left nodes data preform a left rotation.if balance is less than negative one and the item is greater than the parents.right nodes data preform a left rotation.if balance is greater than one and the item is greater than the parents.left nodes data preform a left rotation then a right rotation on the same sub tree.if balance is less than negative one and the item is greater than the parents.right nodes data preform a right rotation then a left rotation on the same sub tree.
Binary trees allow you to efficiently store and search through data. Though their efficiency goes down as they become more unbalanced. A good way to prevent this unbalancing is to implement a self-balancing tree. An AVL tree is a great way to do this though there are probably many others.