Merge k sorted event streams: heap-based timeline merge

Problem. Given k event streams, each sorted by timestamp, merge them into a single globally sorted timeline. Each event is (timestamp, data). Break timestamp ties by original stream order.

Examples:

  • [[(1,"a1"),(4,"a2"),(7,"a3")],[(2,"b1"),(5,"b2")],[(3,"c1"),(6,"c2"),(8,"c3")]] → [(1,"a1"),(2,"b1"),(3,"c1"),(4,"a2"),(5,"b2"),(6,"c2"),(7,"a3"),(8,"c3")]
  • [[(1,"x")],[]] → [(1,"x")]
  • [[(1,"a"),(2,"b")],[(1,"c"),(2,"d")]] → [(1,"a"),(1,"c"),(2,"b"),(2,"d")] (stream 0 before stream 1 on ties)

Solution. Same min-heap approach as merge-k-sorted, but keyed on (timestamp, stream_index) to break ties by stream order.

import heapq


def merge_streams(streams):
    # Seed the heap with the first event of each non-empty stream.
    # Entries are (timestamp, stream_index, element_index, data), so
    # equal timestamps fall back to stream order.
    heap = []
    for i, s in enumerate(streams):
        if s:
            ts, data = s[0]
            heapq.heappush(heap, (ts, i, 0, data))
    result = []
    while heap:
        # Pop the globally earliest event, then push that stream's next
        # event (if any) so the heap keeps one candidate per stream.
        ts, si, ei, data = heapq.heappop(heap)
        result.append((ts, data))
        if ei + 1 < len(streams[si]):
            nts, ndata = streams[si][ei + 1]
            heapq.heappush(heap, (nts, si, ei + 1, ndata))
    return result
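A quick sanity check against the examples above (the definition is repeated so the snippet runs standalone):

```python
import heapq


def merge_streams(streams):
    heap = []
    for i, s in enumerate(streams):
        if s:
            ts, data = s[0]
            heapq.heappush(heap, (ts, i, 0, data))
    result = []
    while heap:
        ts, si, ei, data = heapq.heappop(heap)
        result.append((ts, data))
        if ei + 1 < len(streams[si]):
            nts, ndata = streams[si][ei + 1]
            heapq.heappush(heap, (nts, si, ei + 1, ndata))
    return result


print(merge_streams([[(1, "x")], []]))
# [(1, 'x')]
print(merge_streams([[(1, "a"), (2, "b")], [(1, "c"), (2, "d")]]))
# ties go to stream 0 first: [(1, 'a'), (1, 'c'), (2, 'b'), (2, 'd')]
```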

O(N log k) time, where N is the total number of events across all streams; O(k) extra space for the heap.

Gotchas.

  • Include stream_index in the heap tuple for stable tie-breaking; without it, equal timestamps force Python to compare the data values, which gives an arbitrary order for strings and a TypeError for non-comparable types.
  • Empty streams: skip during initialisation.
  • For real-time (online) merging, the same heap works for events pushed as they arrive, as long as each source still delivers its own events in timestamp order.
  • If data is not comparable, keying on (timestamp, stream_index, element_index) is enough: that prefix is unique, so the trailing data field is never actually compared. Alternatively, keep data out of the heap tuple and look it up from the indices.
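A lazy variant as a sketch (not part of the original solution; merge_streams_lazy is a hypothetical name): the stdlib heapq.merge performs the same k-way merge over iterators, which suits the online case where sources are generators yielding in timestamp order.

```python
import heapq


def merge_streams_lazy(streams):
    """Lazily merge iterables of (timestamp, data); ties go to the
    lower-numbered stream."""

    def tag(i, s):
        # Attach the stream index so equal timestamps sort by stream order.
        for ts, data in s:
            yield (ts, i, data)

    # heapq.merge only ever compares the current heads of *different*
    # streams, so the (ts, i) prefix always decides before data is reached.
    merged = heapq.merge(*(tag(i, s) for i, s in enumerate(streams)))
    for ts, _i, data in merged:
        yield (ts, data)


out = list(merge_streams_lazy([[(1, "a"), (2, "b")], [(1, "c"), (2, "d")]]))
# [(1, 'a'), (1, 'c'), (2, 'b'), (2, 'd')]
```

Note the inner helper: binding the stream index as a function argument avoids the classic late-binding pitfall of nesting generator expressions.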
