Problem. Given sorted boundaries and a list of values, classify each value into a bucket. Buckets: (-inf, b[0]) = 0, [b[0], b[1]) = 1, …, [b[-1], inf) = len(b).
Examples:
boundaries = [10, 20, 30]
values = [5, 10, 15, 25, 35]
bucket(boundaries, values) # → [0, 1, 1, 2, 3]
boundaries = [0]
values = [-1, 0, 1]
bucket(boundaries, values) # → [0, 1, 1]
Solution. bisect_right gives the bucket index directly.
import bisect
def bucket(boundaries, values):
return [bisect.bisect_right(boundaries, v) for v in values]
O(m log n) where m = len(values), n = len(boundaries).
Gotchas.
bisect_right(notbisect_left) gives the correct bucket because boundaries are inclusive on the left: value == boundary falls into the next bucket.- Wait — actually
bisect_righton boundaryb[i]returnsi+1whenv == b[i], which maps to bucket[b[i], b[i+1]). That matches the spec. - Empty boundaries list: every value maps to bucket 0.
- For very large value lists, numpy
searchsortedis faster.