In our previous lesson, we talked about binary
trees in general. Now, in this lesson we are going to talk about binary search tree, a
special kind of binary tree which is an efficient structure to organize data for quick search
as well as quick update. But before I start talking about binary search tree, I want you
to think of a problem. What data structure will you use to store a modifiable collection?
So, lets say you have a collection and it can be a collection of any data type, records
in the collection can be of any type. Now, you want to store this collection in computers
memory in some structure and then you want to be able to quickly search for a record
in the collection and you also want to be able to modify the collection. You want to
be able to insert an element in the collection or remove an element from the collection.
So, what data structure will you use? Well, you can use an array or a linked list. These
are two well known data structure in which we can store a collection. Now, what will
be running time of these operations – search, insertion or removal, If we will use an array
or a linked list. Lets first talk about arrays and for sale of simplicity, lets say we want
to store integers. To store a modifiable list or collection of integers, we can create a
large enough array and we can store the records in some part of the array. We can keep the
end of the list marked. In this array that I am showing here, we have integers from 0
till 3. We have records from 0 till 3 and rest of the array is available space. Now
to search some X in the collection, we will have to scan the array from index 0 till end
and in worst case, we may have to look at all the elements in the list. If n is the
number of elements in the list, time taken will be proportional to n or in other words,
we can say that time complexity will be big-oh of n. Ok, now what will be the cost of insertion.
Lets say we want to insert number 5 in this list. So, if there is some available space,
all the cells in yellow are available, we can add one more cell by incrementing this
marker ‘end’ and fill in the integer to be added. Time taken for this operation will
be constant. Running time will not depend upon number of elements in the collection.
So, we can say that time complexity will be big-oh of 1. Ok, now what about removal. Lets
say we want to remove 1 from the collection. What we’ll have to do is, we’ll have to shift
all records to the right of one by one position to the left and then we can decrement end.
The cost of removal once again will be big-oh of n. In worst case, we will have to shift
n-1 elements. Here, the cost of insertion will be big-oh of 1, if the array will have
some available space. So, the array has to be large enough. If the array gets filled,
what we can do is, we can create a new larger array, typically we create an array twice
the size of the filled up array. So, we can create a new larger array and then we can
copy the content of the filled up array into this new larger array. The copy operation
will cost us big-oh of n. We have discussed this idea of dynamic array quite a bit in
our previous lessons. So, insertion will be big-oh of 1 if array is not filled up and
it will be big-oh of n if array is filled up. For now, lets just assume that the array
will always be large enough. Lets now discuss the cost of these operations if we will use
a linked list. If we would use a linked list, I have drawn a linked list of integers here,
data type can be anything, the cost of search operation once again will be big-oh of n where
n is number of records in the collection or number of nodes in the linked list. To search,
in worst case, we will have to traverse the whole list. We will have to look at all the
nodes. The cost of insertion in a linked list is big-oh of 1 at head and its big-oh of n
at tail. We can choose to insert at head to keep the cost low. So, running time of insertion,
we can say is big-oh of 1 or in other words, we will take constant time. Removal once again
will be big-oh of n. We will first have to traverse the linked list and search the record
and in worst case, we may have to look at all the nodes. Ok, so this is the cost of
operations if we are going to use array or linked list. Insertion definitely is fast.
But, how good is big-oh of n for an operation like search. What do you think? if we are
searching for a record X, then in the worst case, we will have to compare this record
X with all the n records in the collection. Lets say, our machine can perform a million
comparisons in one second. So, we can say that machine can perform 10 the power 6 comparisons
in one second. So, cost of one comparison will be 10 to the power -6 second. Machines
in today’s world deal with really large data. Its very much possible for real world data
to have 100 million records or billion records. A lot of countries in this world have population
more than 100 million. 2 countries have more than a billion people living in them. If we
will have data about all the people living in a country, then it can easily be 100 million
records. Ok, so if we are saying that the cost of 1 comparison is 10 the power -6 second.
If n will be 100 million, time taken will be 100 seconds. 100 seconds for a search is
not reasonable and search may be a frequently performed operation. Can we do something better?
Can we do better than big-oh of n. Well, in an array, we can perform binary search if
its sorted and the running time of binary search is big-oh of log n which is the best
running time to have. I have drawn this array of integers here. Records in the array are
sorted. Here the data type is integer. For some other data type, for some complex data
type, we should be able to sort the collection based on some property or some key of the
records. We should be able to compare the keys of records and the comparison logic will
be different for different data types. For a collection of strings, for example, we may
want to have the records sorted in dictionary or lexicographical order. So, we will compare
and see which string will come first in dictionary order. Now this is the requirement that we
have for binary search. The data structure should be an array and the records must be
sorted. Ok, so the cost of search operation can be minimized if we will use a sorted array.
But, in insertion or removal, we will have to make sure that the array is sorted afterwards.
In this array, if I want to insert number 5 at this stage, i can’t simply put 5 at index
6. what I’ll have to do is, I’ll first have to find the position at which I can insert
5 in the sorted list. We can find the position in order of log n time using binary search.
We can perform a binary search to find the first integer greater than 5 in the list.
So, we can find the position quickly. In this case, its index 2. But then, we will have
to shift all the records starting this position one position to the right. And now, I can
insert 5. So, even though we can find the position at which a record should be inserted
quickly in big-oh of log n, this shifting in worst case will cost us big-oh of n. So,
the running time overall for an insertion will be big-oh of n and similarly, the cost
of removal will also be big-oh of n. We will have to shift some records. Ok, so when we
are using sorted array, cost of search operation is minimized. In binary search for n records,
we will have at max log n to the base 2 comparisons. So, if we can perform million comparisons
in a second, then for n equal 2 to the power 31 which is greater than 2 billion, we are
going to take only 31 micro-seconds. log of 2 to the power 31 to base 2 will be 31. Ok,
we are fine with search now. we will be good for any practical value of n. But what about
insertion and removal. They are still big-oh of n. Can we do something better here? Well,
if we will use this data structure called binary search tree, I am writing it in short
– BST for binary search tree, then the cost of all these 3 operations can be big-oh of
log n in average case. The cost of all the operations will be big-oh of n in worst case.
But we can avoid the worst case by making sure that the tree is always balanced. We
had talked about balanced binary tree in our previous lesson. Binary search tree is only
a special kind of binary tree. To make sure that the cost of these operations is always
big-oh of log n, we should keep the binary search tree balanced. We’ll talk about this
in detail later. Let’s first see what a binary search tree is and how cost of these operations
is minimized when we use a binary search tree. Binary search tree is a binary tree in which
for each node, value of all the nodes in left sub-tree is lesser and value of all the nodes
in right sub-tree is greater. I have drawn binary tree as a recursive structure here.
As we know, in a binary tree, each node can have at most 2 children. We can call one of
the children left child. If we will look at the tree as recursive structure, left child
will be the root of left sub-tree and similarly, right child will be the root of right sub-tree.
Now, for a binary tree to be called binary search tree, value of all the nodes in left
sub-tree must be lesser or we can say lesser or equal to handle duplicates and the value
of all the nodes in right sub-tree must be greater and this must be true for all the
nodes. So, in this recursive structure here, both left and right sub-trees must also be
binary search trees. I’ll draw a binary search tree of integers. Now, I have drawn a binary
search tree of integers here. Lets see whether this property that for each node value of
all the nodes in left subtree is lesser or equal and the value of all the nodes in right
sub-tree must be greater is true or not. Lets first look at the root node. Nodes in the
left subtree have values 10, 8 and 12. So, they are all lesser than15 and in right subtree,
we have 17, 20 and 25, they are all greater than 15. So, we are good for the root node.
Now, lets look at this node with value 10. In left, we have 8 which is lesser. In right,
we have 12 which is greater. So, we are good. We are good for this node too having value
20 and we don’t need to bother about leave nodes because they do not have children. So,
this is a binary search tree. Now, what if I change this value 12 to 16. Now, is this
still a binary search tree. Well, for node with value 10, we are good. The node with
value 16 is in its right. So, not a problem. But for the root node, we have a node in left
sub-tree with higher value now. So, this tree is not a binary search tree. I’ll revert back
and make the value 12 again. Now, as we were saying we can search, insert or delete in
a binary search tree in big-oh of log n time in average case. How is it really possible?
Lets first talk about search. If these integers that I have here in the tree were in a sorted
array, we could have performed binary search and what do we do in binary search. Lets say
we want to search number 10 in this array. What we do in binary search is, we first define
the complete list as our search space. The number can exist only within the search space.
I’ll mark search space using these two pointers – start and end. Now, we compare the number
to be searched or the element to be searched with mid element of the search space or the
median. And if the record being searched, if the element being searched is lesser, we
go searching in the left half, else we go searching in the right half. In case of equality,
we have found the element. In this case, 10 is lesser than 15. So, we will go searching
towards left. Our search space is reduced now to half. Once again, we will compare to
the mid-element and bingo, this time, we have got a match. In binary search, we start with
n elements in search space and then if mid element is not the element that we are looking
for, we reduce the search space to n/2 and we go on reducing the search space to half,
till we either find the record that we are looking for or we get to only one element
in search space and be done. In this whole reduction, if we will go from n to n/2 to
n/4 to n/8 and so on, we will have log n to the base 2 steps. If we are taking k steps,
then n upon 2 to the power k will be equal to 1 which will imply 2 to the power k will
be equal to n and k will be equal to log n to the base 2. So, this is why running time
of binary search is big-oh of log n. Now, if we will use this binary search tree to
store the integers, search operation will be very similar. Lets say, we want to search
for number 12. What we’ll do is, we start at root and then we will compare the value
to be searched, the integer to be searched with value of the root. if its equal, we are
done with the search, if its lesser, we know that we need to go to the left sub-tree because
in a binary search tree, all the elements in left sub-tree are lesser and all the elements
in right sub-tree are greater. Now, we will go and look at the left child of node with
value 15. We know that number 12 that we are looking for can exist in this sub-tree only
and anything apart from the sub-tree is discarded. So, we have reduced the search space to only
these 3 nodes having value 10, 8 and 12. Now, once again, we will compare 12 with 10. We
are not equal. 12 is greater, so we know that we need to go looking in right sub-tree of
this node with value 10. So, now our search space is reduced to just one node. Once again,
we will compare the value here at this node and we have a match. So, searching an element
in binary search tree is basically this traversal in which at each step, we will go either towards
left or right and hence at each step, we will discard one of the sub-trees. if the tree
is balanced, we call a tree balanced if for all nodes, the difference between the heights
of left and right sub-trees is not greater than 1. So, if the tree is balanced, we will
start with a search space of n nodes and when we will discard one of the sub-trees, we will
discard n/2 nodes. So, our search space will be reduced to n/2 and then in next step, we
will reduce the search space to n/4. We will go on reducing like this till we find the
element or till our search space is reduced to only one node when we will be done. So,
the search here is also a binary search. And that’s why the name – Binary Search Tree.
This tree that I am showing here is balanced. In fact this is a perfect binary tree. But
with same records, we can have an unbalanced tree like this. This tree has got the same
integer values as we had in the previous structure and this is also a binary search tree, but
this is unbalanced. This is as good as a linked list. In this tree there is no right sub-tree
for any of the nodes. Search space will be reduced by only one.at each step. From n nodes
in search space, we will go to n-1 nodes and then to n-2 nodes, all the way till 1 will
be n steps. In binary search tree, in average case, cost of search, insertion or deletion
is big-oh of log n and in worst case, this is the worst case arrangement that I am showing
you. The running time will be big-oh of n. We always try to avoid the worst case by trying
to keep the binary search tree balanced. With same records in the tree, there can be multiple
possible arrangements. For these integers in this tree, another arrangement is this.
For all the nodes, we have nothing to discard in left sub-tree in a search. This is another
arrangement. This is still balanced because for all the nodes, the difference between
the heights of left and right sub-tree is not greater than 1. But this is the best arrangement
when we have a perfect binary tree. At each step, we will have exactly n/2 nodes to discard.
Ok, now to insert some records in binary search tree, we will first have to find the position
at which we can insert and we can find the position in big-oh of log n time. Lets say
we want to insert 19 in this tree, what we will do is start at the root. If the value
to be inserted is lesser or equal, if there is no child, insert as left child or go left.
If the value is greater and there is no right child, insert as right child or go right.
In this case, 19 is greater, so we will go right. Now, we are at 20. 19 is lesser and
left sub-tree is not empty. We have a left child. So, we will go left. Now, we are at
17, 19 is greater than 17. So, it should go in right of 17. There is no right child of
17. So, we will create a node with value 19 and link it to this node with value 17 as
right child. Because we are using pointers or references here just like linked list,
no shifting is needed like an array. Creating a link will take constant time. So, overall
insertion will also cost us like search. To delete also, we will first have to search
the node. Search once again will be big-oh of log n and deleting the node will only mean
adjusting some links. So, removal also is going to be like search – big-oh of log n
in average case. Binary search tree gets unbalanced during insertion and deletion. So, often during
insertion and deletion, we restore the balancing. There are ways to do it and we will talk about
all of this in detail in later lessons. In next lesson, we will discuss implementation
of binary search tree in detail. This is it for this lesson. Thanks for watching.