Skip to content

Best Anagram Finder

Level: L3-L4 Topics: Hashing, Anagrams, Precomputation, Greedy Scoring

Problem Statement

You are given a dictionary of words and a hash function hashAnagram(word) that produces the same hash value for any two words that are anagrams of each other (and different hash values for non-anagrams).

Build a class BestAnagramFinder that supports two operations:

  1. calculateScore(word) -- Compute the "score" of a word, defined as the sum of absolute differences between adjacent characters:

    score("abc") = |'a'-'b'| + |'b'-'c'| = 1 + 1 = 2
    score("cab") = |'c'-'a'| + |'a'-'b'| = 2 + 1 = 3
    
  2. findBestAnagram(word) -- Given a word, return the anagram from the dictionary that has the highest score. If the word itself is in the dictionary, it is a candidate too.

Background & Constraints

  • The dictionary can contain hundreds of thousands of words.
  • findBestAnagram will be called many times, so it must be fast.
  • hashAnagram is provided as a black box -- you can assume it works correctly and runs in O(len(word)) time.
  • Two words are anagrams if they contain exactly the same characters with the same frequencies, just in a different order.

Examples

dictionary = ["listen", "silent", "enlist", "tinsel", "abc", "cab"]

finder = BestAnagramFinder(dictionary)

finder.calculateScore("listen")
# |'l'-'i'| + |'i'-'s'| + |'s'-'t'| + |'t'-'e'| + |'e'-'n'| = 3 + 10 + 1 + 15 + 9 = 38

finder.calculateScore("silent")
# |'s'-'i'| + |'i'-'l'| + |'l'-'e'| + |'e'-'n'| + |'n'-'t'| = 10 + 3 + 7 + 9 + 6 = 35

finder.calculateScore("enlist")
# |'e'-'n'| + |'n'-'l'| + |'l'-'i'| + |'i'-'s'| + |'s'-'t'| = 9 + 2 + 3 + 10 + 1 = 25

finder.calculateScore("tinsel")
# |'t'-'i'| + |'i'-'n'| + |'n'-'s'| + |'s'-'e'| + |'e'-'l'| = 11 + 5 + 5 + 14 + 7 = 42

finder.findBestAnagram("listen")  → "tinsel"  (score 42, highest among the 4 anagrams)

finder.calculateScore("abc")  → |'a'-'b'| + |'b'-'c'| = 1 + 1 = 2
finder.calculateScore("cab")  → |'c'-'a'| + |'a'-'b'| = 2 + 1 = 3

finder.findBestAnagram("abc")  → "cab"  (score 3 > score 2)

Tie-breaking note. If two anagrams have the same score, this problem leaves the choice unspecified — clarify with your interviewer. A common rule is "return the lexicographically smallest", which is what the reference implementation should document. (Example: scoring bac would also give 3, tying with cab; we keep bac out of the dictionary above to avoid the ambiguity.)

Hints & Common Pitfalls

  1. Preprocessing is the key to efficiency. During construction, group all dictionary words by their anagram hash. For each group, compute the score of every word and record only the word with the highest score.

  2. Optimal complexity:

    • Preprocessing: O(N * L) where N is the number of dictionary words and L is the average word length. You hash each word and compute its score once.
    • findBestAnagram lookup: O(L) -- just hash the query word and look up the precomputed best anagram in a hash map. This gives you effectively O(1) after hashing.
  3. Common mistake: Recomputing scores at query time or iterating through all anagrams on each call. This turns each query into O(N) instead of O(1).

  4. Edge cases: A word with no anagrams in the dictionary should return itself (if it is in the dictionary) or indicate no result. Clarify with your interviewer.

  5. The hash function is a black box. Do not try to implement your own anagram hash unless asked. A common approach is to sort the characters of the word, but the problem gives you hashAnagram for free.

Follow-Up Questions

  1. What if you also need to return the k best anagrams instead of just the single best? How does your data structure change?

  2. How would you implement hashAnagram yourself? What are the trade-offs between sorting-based hashing (sort the characters), prime-product hashing (assign each letter a prime and multiply), and frequency-count hashing?

  3. What if the dictionary is updated dynamically (words added and removed)? How would you maintain the best-anagram lookup efficiently?

  4. Can you prove that your preprocessing correctly identifies the best anagram for every possible query, even for query words not in the dictionary?