Find median from data stream: two heaps that keep the halves balanced

Problem. Design a structure that supports inserting numbers and querying the median of all numbers seen so far.

This is the classic online order-statistics with heaps pattern.

Examples:

  • After adding 1, the median is 1.0
  • After adding 1 and then 2, the median is 1.5
  • After adding 1, 2, and 3, the median is 2.0

Solution. Keep the lower half in a max-heap and the upper half in a min-heap. Rebalance so their sizes differ by at most one. The median is then either the top of the larger heap or the average of the two tops.

import heapq


class MedianFinder:
    def __init__(self):
        self.lower = []  # max-heap via negated values
        self.upper = []  # min-heap

    def add_num(self, num):
        heapq.heappush(self.lower, -num)
        heapq.heappush(self.upper, -heapq.heappop(self.lower))

        if len(self.upper) > len(self.lower):
            heapq.heappush(self.lower, -heapq.heappop(self.upper))

    def find_median(self):
        if len(self.lower) > len(self.upper):
            return float(-self.lower[0])
        return (-self.lower[0] + self.upper[0]) / 2

O(log n) per insertion, O(1) per median query, O(n) space.

Gotchas.

  • Python only has a min-heap, so the lower half is stored as negated values.
  • Rebalancing after every insert keeps the median query trivial.
  • Be explicit about returning a float when the count of values is even.

Run: Find Median From Data Stream