Heapify in Linear Time

Understanding the linear time complexity of heapify.

Pujan Dave
Python in Plain English

--

Photo by Aron Visuals on Unsplash

Binary Heap is an extremely useful data structure with applications from sorting (HeapSort) to priority queues and can be either implemented as a MinHeap or MaxHeap. It is essentially a balanced binary tree with the property that the value of each parent node is less than or equal to any of its children for the MinHeap implementation and greater than or equal to any of its children for the MaxHeap implementation.

Python’s heap implementation is given by the heapq module¹ as a MinHeap. For the rest of this article, to make things simple, we will consider the Python heapq module unless stated otherwise.

Four of the most used operations supported by heaps along with their time complexities are:

  1. min/max depending on MinHeap or MaxHeap implementation: O(1), get element at index 0 of the heap.
  2. push: O(log(n)), heapq.heappush(heap: List[Any], k: int) in Python
  3. pop: O(log(n)), heapq.heappop(heap: List[Any], k: int)
  4. heapify or build_heap: O(n), heapq.heapify(heap: List[Any])

The first three in the above list are quite straightforward to understand based on the fact that the heaps are balanced binary trees. The tricky operation is the fourth one, heapify!

Naively, we would expect heapify to be an O(n log(n)) operation: if we form the heap one element at a time for n elements, using the push operation which costs O(log(n)) each time, we get O(n log(n)) time complexity. Then why is heapify an operation of linear time complexity?

heapify takes a list of values as a parameter and then builds the heap in place and in linear time. Let us try to look at what heapify is doing through the initial list[9, 7, 10, 1, 2, 13, 4] as an example to get a better sense of its time complexity:

Initial state before Heapify is called

Going back to the definition of the heap, each of the subtrees should also be a heap, and so the algorithm starts forming the heap from the leaf nodes and goes all the way to the root node while ensuring the subtrees remain heaps:

1. All the leaf nodes are already heap, so do nothing for them and go one level up:

All the leaf nodes are already heap

2. The node with value 7 and the node with value 1 need to be swapped as 7 > 1 and 2 > 1:

Swap 7 and 1

3. The node with value 10 and the node with value 4 need to be swapped as 10 > 4 and 13 > 4:

Swap 10 and 4

4. Now we move up one level, the node with value 9 and the node with value 1 need to be swapped as 9 > 1 and 4 > 1:

Swap 9 and 1

5. Now the left subtree rooted at the node with value 9 is no longer a heap, we will need to swap node with value 9 and node with value 2 in order to make it a heap:

Swap 9 and 2

6. Finally we have our heap [1, 2, 4, 7, 9, 13, 10]:

Heap after heapify has run

Based on the above algorithm, let us try to calculate the time complexity. For a node at level l, with upto k nodes, and each node being the root of a subtree with max possible height h, we have the following equations:

  1. h = log(n) — l => l = log(n) — h
  2. k = 2^l = 2^( log(n) — h) = n/(2^h)

So for each level of the heap, we have O(n/(2^h) * log(h)) time complexity. Summing up all levels, we get time complexity T:

T = ∑ (n/(2^h) * log(h)) = n * ∑ (log(h)/(2^h))

for 0 ≤ h ≤ log(n)

As ∑ (log(h)/(2^h)) = 2, for n -> ∞

T = n * 2 = O(n)

Hence the linear time complexity for heapify!

--

--