Refactor a slow function: O(n³) duplicates finder to O(n)

Problem. The naive find_duplicates_slow below is O(n³) in the worst case: two nested loops plus a linear `in` check on the growing duplicates list. Refactor to O(n).

# Slow version
def find_duplicates_slow(lst):
    duplicates = []
    for i in range(len(lst)):
        for j in range(i + 1, len(lst)):
            if lst[i] == lst[j] and lst[i] not in duplicates:
                duplicates.append(lst[i])
    return duplicates

Examples:

  • [1, 3, 2, 3, 1, 5, 2] → [3, 1, 2] (values appear in the order their second occurrence is seen)
  • [1, 2, 3] → []
  • [5, 5, 5, 5] → [5]
  • [] → []

Solution. Single pass with a seen set for O(1) membership, a dup_set so each duplicate is reported once, and a duplicates list to preserve order.

def find_duplicates(lst):
    seen = set()        # values encountered at least once
    dup_set = set()     # values already reported as duplicates
    duplicates = []     # output, in order of first detection
    for x in lst:
        if x in seen:
            if x not in dup_set:
                duplicates.append(x)
                dup_set.add(x)
        else:
            seen.add(x)
    return duplicates

O(n) time, O(n) space.
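
As a quick sanity check, the refactor can be run against the examples above (a sketch; the body repeats the solution code):

```python
def find_duplicates(lst):
    # single pass: `seen` for membership, `dup_set` to avoid re-reporting
    seen = set()
    dup_set = set()
    duplicates = []
    for x in lst:
        if x in seen:
            if x not in dup_set:
                duplicates.append(x)
                dup_set.add(x)
        else:
            seen.add(x)
    return duplicates

assert find_duplicates([1, 3, 2, 3, 1, 5, 2]) == [3, 1, 2]
assert find_duplicates([1, 2, 3]) == []
assert find_duplicates([5, 5, 5, 5]) == [5]
assert find_duplicates([]) == []
```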

Gotchas.

  • Preserve the order of first duplicate occurrence — use a list for output and a set for O(1) dedup. Note the two versions order ties differently: the slow version appends when the duplicated value's first occurrence is scanned ([1, 3, 2, 3, 1, 5, 2] → [1, 3, 2]), while the refactor appends when the second occurrence is seen (→ [3, 1, 2]).
  • The slow version is O(n³), not O(n²), because lst[i] not in duplicates is O(len(duplicates)) and duplicates grows with n.
  • If memory is very limited, sort first (O(n log n)) and scan for adjacent equals.
  • For a stream, maintain the seen and dup_set incrementally.
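
The low-memory variant from the sort-first bullet can be sketched as follows. Note that sorting destroys input order, so the output comes back in sorted order rather than first-occurrence order, and true O(1) extra space requires sorting in place with lst.sort() (the copy below is kept to leave the caller's list untouched):

```python
def find_duplicates_sorted(lst):
    # O(n log n) time; after sorting, equal values are adjacent,
    # so one scan over neighboring pairs finds every duplicate.
    lst = sorted(lst)  # use lst.sort() instead to avoid the O(n) copy
    duplicates = []
    for i in range(1, len(lst)):
        # adjacent equal pair, and not the same value we just reported
        if lst[i] == lst[i - 1] and (not duplicates or duplicates[-1] != lst[i]):
            duplicates.append(lst[i])
    return duplicates
```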

[Run: find duplicates](https://www.quantfinanceprep.com/interview/pyscript_runner.html?snippet=coding-find-duplicates&embedded=1)