PolarSPARC |
Bhaskar S | 10/17/2020 |
Ever had the need to either choose the highest or the lowest valued item from a collection of items ???
The following are some of the examples pertaining to the above question:
The next task to be executed by a Job Scheduler based on a priority
Extract the most frequently used words from a document during Text Processing
It is in these situations, the Binary Heap data structure comes in handy.
A Binary Heap uses the Binary Tree data structure and the must satisfy the following two properties:
Relational :: For a Binary Min Heap, each ancestor node must have a value that is SMALLER than the values of either of its children (left and/or right). For a Binary Max Heap, each ancestor node must have a value that is LARGER than the values of either of its children (left and/or right)
Structural :: Each level of the Binary Tree must be COMPLETELY filled with items from left to right. In other words, all the levels except possibly the leaf node level are filled
Let us go through an example to populate a Binary Max Heap. For our example, we will populate the values: 19, 7, 18, 12, 15, 2, 22 respectively.
To start with, the Binary Max Heap is an empty tree. The first item to add is 19. Once added, the tree would have a single root node as shown in the following illustration:
The second item to add is 7. It will be added as the left child of the root node as shown in the following illustration:
The third item to add is 18. It will be added as the right child of the root node as shown in the following illustration:
As can be seen from the Binary Max Heap illustration above, both the relational and structural properties are maintained.
The fourth item to add is 12. It will be added to the tree as shown in the following illustration:
With the insertion of item 12 above, the Binary Max Heap violates the relational property - the value of the parent node (node 7) of node 12 is smaller in value. This means, we need to swap the values to maintain the relational property as shown in the following illustration:
The fifth item to add is 15. It will be added to the tree as shown in the following illustration:
With the insertion of item 15 above, the Binary Max Heap violates the relational property - the value of the parent node (node 12) of node 15 is smaller in value. This means, we need to swap the values to maintain the relational property as shown in the following illustration:
The sixth item to add is 2. It will be added to the tree as shown in the following illustration:
The last item to add is 22. It will be added to the tree as shown in the following illustration:
With the insertion of item 22 above, the Binary Max Heap violates the relational property - the value of the parent node (node 18) of node 22 is smaller in value. This means, we need to swap the values to maintain the relational property as shown in the following illustration:
With the swap of item 22 above, the Binary Max Heap again violates the relational property - the value of the parent node (node 19) of node 22 is smaller in value. This means, we need to swap the values to maintain the relational property as shown in the following illustration:
With the swap of item 22 above, the Binary Max Heap is in a valid state as shown in the following illustration:
A Binary Heap (Max or Min) tree can be implemented using an array. Assuming A is an array, the root of the Binary Heap tree will be at A[0]. For any element at index I, the left child will be at A[2*I+1] and the right child will be at A[2*I+2]. The parent of any element at index I can be located at A[(I-1)//2], where // represents integer division.
With this information, the Binary Max Heap we just created can be represented in an array form as shown in the following illustration:
In the following sections, we will show how one could implement a Binary Max Heap using Python.
We will use the Python list (array) for storing the items of the Binary Max Heap.
Each item in the Binary Max Heap is an instance of KvPair, which represents a key-value pair and contains the following members:
_key :: This will contain a numerical value such as the priority of a job or the frequency count of a word. In our example, each item value (number) will be the key
_value :: This will contain the value corresponding to the key
The following is the implementation of the KvPair class using Python:
When an item is added to the Binary Max Heap, we need to ensure the relational property is strictly maintained - that is the item at the parent node of the newly added item is larger in value. If not, we swap the items. When an item is appended to the Python list, we know its index location (i). Using the formula (i - 1) // 2, we can calculate the index of its parent in the Python list.
The following is the code implementation to ensure the relational property of the Binary Max Heap is maintained by bubbling up the item with the largest value:
The worst-case time complexity (Big-O notation) for adding an item into a Binary Max Heap is O(log n).
Moving on to pop the item at the root (with the max value) from a Binary Max Heap. When the item at the root is popped out (removed), it needs to be replaced by the item at the farthest leaf node of the Binary Max Heap (last item in the array). This may violate the relational property of the Binary Max Heap and the nodes need to be re-adjusted starting from the root.
We will use our example Binary Max Heap to explain the pop operation as shown in the following illustration:
When the item 22 at the root is removed from a Binary Max Heap, it will be replaced by the item 18 in the last leaf node as shown in the following illustration:
With the last item 18 is moved to the root, the Binary Max Heap violates the relational property - the value of the parent node (node 18) is smaller in value than its right child (node 19). This means, we need to swap the values to maintain the relational property as shown in the following illustration:
With the swap of item 18 above, we continue the process down the Binary Max Heap till it reaches a valid state as shown in the following illustration:
The following is the code implementation to ensure the relational property of the Binary Max Heap is maintained by bubbling down the item with the smaller value:
When the last item is moved to the root of the Binary Max Heap, we need to ensure the relational property is strictly maintained - that is the item at the parent node (at index i) is larger in value compared to its left child located at (2 * i + 1) and its right child located at (2 * i + 2). If not, we swap the items.
The worst-case time complexity (Big-O notation) for removing an item into a Binary Max Heap is O(log n).
The following is the complete source code for the Binary Max Heap data structure implemented using Python:
Executing the above Python code will generate the following output:
Size of the Binary Max Heap: 7 Max item of the Binary Max Heap: (22, 22) Current items in the Binary Max Heap (after pop): [(19, 19), (15, 15), (18, 18), (7, 7), (12, 12), (2, 2)]
One can also have a Binary Min Heap data structure where the value at an ancestor node is smaller than the values at its children.
The following is the complete source code for the Binary Min Heap data structure implemented using Python:
Executing the above Python code will generate the following output:
Size of the Binary Min Heap: 7 Min item of the Binary Min Heap: (2, 2) Current items in the Binary Min Heap (after pop): [(7, 7), (12, 12), (18, 18), (19, 19), (15, 15), (22, 22)]
Python provides a module called heapq that implements the features of the Binary Min Heap.
The following shows the use of the heapq module using the Python REPL:
>>> import heapq >>> values = [19, 7, 18, 12, 15, 2, 22] >>> heapq.heapify(values) >>> values [2, 7, 18, 12, 15, 19, 22] >>> heapq.heappop(values) 2 >>> values [7, 12, 18, 22, 15, 19] >>> heapq.heappush(values, 4) >>> values [4, 12, 7, 22, 15, 19, 18] >>> heapq.nsmallest(3, values) [4, 7, 12]