Problem. Return the top 3 most frequent words from a list.
Rules:
- count words case-insensitively
- return the answer in lowercase
- if two words have the same frequency, the lexicographically smaller word comes first
- if there are fewer than 3 distinct words, return all of them
Example:
words = ["Hello", "world", "hello", "World", "HELLO", "foo"]
Answer:
["hello", "world", "foo"]
This is a frequency table + ranking problem.
Solution. Normalize each word to lowercase, count frequencies, then sort by:
- frequency descending
- word ascending
from collections import Counter
def top_3_frequent_words(words):
counts = Counter(word.lower() for word in words)
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
return [word for word, _ in ranked[:3]]
Runtime is O(n + m log m) where m is the number of distinct words. Extra space is O(m).
Gotchas.
- Apply lowercase before counting, not after.
- The tie-break is lexicographic order on the normalized lowercase form.
- For arbitrary top
k, a heap can reduce the ranking phase toO(m log k).