**Problem.** The naive `find_duplicates_slow` below is O(n³) because of the nested loop plus the `in` check on a list. Refactor it to O(n).
```python
# Slow version
def find_duplicates_slow(lst):
    duplicates = []
    for i in range(len(lst)):
        for j in range(i):  # has lst[i] already appeared earlier?
            # Append at the second occurrence, so output order matches
            # the order in which duplicates are first detected.
            if lst[i] == lst[j] and lst[i] not in duplicates:
                duplicates.append(lst[i])
    return duplicates
```
**Examples.**
- `[1, 3, 2, 3, 1, 5, 2]` → `[3, 1, 2]` (order of first duplicate occurrence)
- `[1, 2, 3]` → `[]`
- `[5, 5, 5, 5]` → `[5]`
- `[]` → `[]`
**Solution.** A single pass with a `seen` set and a `duplicates` list, plus a `dup_set` for O(1) membership checks.
```python
def find_duplicates(lst):
    seen = set()       # values encountered so far
    dup_set = set()    # values already reported as duplicates
    duplicates = []    # output, in order of first duplicate occurrence
    for x in lst:
        if x in seen:
            if x not in dup_set:
                duplicates.append(x)
                dup_set.add(x)
        else:
            seen.add(x)
    return duplicates
```
O(n) time, O(n) space.
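As a quick sanity check, the single-pass version (repeated here so the snippet runs standalone) reproduces the examples above:

```python
def find_duplicates(lst):
    # Same single-pass implementation as in the solution above.
    seen, dup_set, duplicates = set(), set(), []
    for x in lst:
        if x in seen:
            if x not in dup_set:
                duplicates.append(x)
                dup_set.add(x)
        else:
            seen.add(x)
    return duplicates

assert find_duplicates([1, 3, 2, 3, 1, 5, 2]) == [3, 1, 2]
assert find_duplicates([1, 2, 3]) == []
assert find_duplicates([5, 5, 5, 5]) == [5]
assert find_duplicates([]) == []
```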
**Gotchas.**
- Preserve the order of first duplicate occurrence: use a list for the output and a set for O(1) dedup.
- The slow version is O(n³), not O(n²), because `lst[i] not in duplicates` is O(len(duplicates)) and `duplicates` grows with n.
- If memory is very limited, sort first (O(n log n)) and scan for adjacent equals.
- For a stream, maintain `seen` and `dup_set` incrementally.
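The sort-and-scan variant mentioned above might look like the sketch below (the name `find_duplicates_sorted` is illustrative). Note that it reports duplicates in sorted order, not in order of first duplicate occurrence:

```python
def find_duplicates_sorted(lst):
    # O(n log n) time; avoids the seen/dup_set hash sets by sorting first,
    # so equal values become adjacent. Output is in sorted order.
    result = []
    prev = object()  # sentinel that compares unequal to every list element
    for x in sorted(lst):
        # A duplicate is a value equal to its predecessor; the result[-1]
        # check ensures each value is reported only once.
        if x == prev and (not result or result[-1] != x):
            result.append(x)
        prev = x
    return result

find_duplicates_sorted([1, 3, 2, 3, 1, 5, 2])  # [1, 2, 3] (sorted order)
```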
[Run: find duplicates](https://www.quantfinanceprep.com/interview/pyscript_runner.html?snippet=coding-find-duplicates&embedded=1)