Heapify in Linear Time
Understanding the linear time complexity of heapify.
Binary Heap is an extremely useful data structure with applications from sorting (HeapSort) to priority queues and can be either implemented as a MinHeap
or MaxHeap
. It is essentially a balanced binary tree with the property that the value of each parent node is less than or equal to any of its children for the MinHeap
implementation and greater than or equal to any of its children for the MaxHeap
implementation.
Python’s heap implementation is given by the heapq
module¹ as a MinHeap
. For the rest of this article, to make things simple, we will consider the Python heapq
module unless stated otherwise.
Four of the most used operations supported by heaps along with their time complexities are:
min
/max
depending onMinHeap
orMaxHeap
implementation:O(1)
, get element at index 0 of the heap.push
:O(log(n))
,heapq.heappush(heap: List[Any], k: int)
in Pythonpop
:O(log(n))
,heapq.heappop(heap: List[Any], k: int)
heapify
orbuild_heap
:O(n)
,heapq.heapify(heap: List[Any])
The first three in the above list are quite straightforward to understand based on the fact that the heaps are balanced binary trees. The tricky operation is the fourth one, heapify
!
Naively, we would expect heapify
to be an O(n log(n))
operation: if we form the heap one element at a time for n
elements, using the push
operation which costs O(log(n))
each time, we get O(n log(n))
time complexity. Then why is heapify
an operation of linear time complexity?
heapify
takes a list of values as a parameter and then builds the heap in place and in linear time. Let us try to look at what heapify
is doing through the initial list[9, 7, 10, 1, 2, 13, 4]
as an example to get a better sense of its time complexity:
Going back to the definition of the heap, each of the subtrees should also be a heap, and so the algorithm starts forming the heap from the leaf nodes and goes all the way to the root node while ensuring the subtrees remain heaps:
1. All the leaf nodes are already heap, so do nothing for them and go one level up:
2. The node with value 7 and the node with value 1 need to be swapped as 7 > 1 and 2 > 1:
3. The node with value 10 and the node with value 4 need to be swapped as 10 > 4 and 13 > 4:
4. Now we move up one level, the node with value 9 and the node with value 1 need to be swapped as 9 > 1 and 4 > 1:
5. Now the left subtree rooted at the node with value 9 is no longer a heap, we will need to swap node with value 9 and node with value 2 in order to make it a heap:
6. Finally we have our heap [1, 2, 4, 7, 9, 13, 10]
:
Based on the above algorithm, let us try to calculate the time complexity. For a node at level l
, with upto k
nodes, and each node being the root of a subtree with max possible height h
, we have the following equations:
h = log(n) — l => l = log(n) — h
k = 2^l = 2^( log(n) — h) = n/(2^h)
So for each level of the heap, we have O(n/(2^h) * log(h))
time complexity. Summing up all levels, we get time complexity T
:
T = ∑ (n/(2^h) * log(h)) = n * ∑ (log(h)/(2^h))
for 0 ≤ h ≤ log(n)
As ∑ (log(h)/(2^h)) = 2
, for n -> ∞
,²
T = n * 2 = O(n)
Hence the linear time complexity for heapify
!
References:
[1] https://docs.python.org/3/library/heapq.html#heapq.heapify
[2] https://youtu.be/Ch9uCBm-kUE
More content at PlainEnglish.io. Sign up for our free weekly newsletter. Follow us on Twitter and LinkedIn. Join our community Discord.