Bucket values into boundary ranges: binary search classification

Problem. Given sorted boundaries and a list of values, classify each value into a bucket. Buckets: (-inf, b[0]) = 0, [b[0], b[1]) = 1, …, [b[-1], inf) = len(b).

Examples:

boundaries = [10, 20, 30]
values = [5, 10, 15, 25, 35]
bucket(boundaries, values)  # → [0, 1, 1, 2, 3]

boundaries = [0]
values = [-1, 0, 1]
bucket(boundaries, values)  # → [0, 1, 1]

Solution. bisect_right gives the bucket index directly.

import bisect


def bucket(boundaries, values):
    return [bisect.bisect_right(boundaries, v) for v in values]

O(m log n) where m = len(values), n = len(boundaries).

Gotchas.

  • bisect_right (not bisect_left) gives the correct bucket because boundaries are inclusive on the left: value == boundary falls into the next bucket.
  • Wait — actually bisect_right on boundary b[i] returns i+1 when v == b[i], which maps to bucket [b[i], b[i+1]). That matches the spec.
  • Empty boundaries list: every value maps to bucket 0.
  • For very large value lists, numpy searchsorted is faster.

Run: bucket values