Domain Name Scoring
Level: L3-L4 Topics: Trees, Hash Maps, String Processing
Problem Statement
You are given a list of domain names, each with an associated integer score. A leaf domain is one that has no children present in the input list.
For each leaf domain, compute its total score as the sum of its own score plus the scores of all its ancestor domains that appear in the input.
Output all leaf domains with their computed total scores.
Background & Constraints
- Domain hierarchy is determined by the domain name structure:
mail.test.mydomain.comis a child oftest.mydomain.com, which is a child ofmydomain.com. - A domain
Ais an ancestor of domainBifBends with.A(not just containsAas a substring). - Only domains that appear in the input list contribute scores. Implicit intermediate domains that are not in the input have a score of 0.
- A domain is a leaf if no other domain in the input is a proper subdomain of it.
- Scores can be negative.
Examples
Input:
Analysis:
| Domain | Children in input? | Leaf? |
|---|---|---|
mydomain.com |
test.mydomain.com, api.mydomain.com |
No |
test.mydomain.com |
mail.test.mydomain.com |
No |
mail.test.mydomain.com |
None | Yes |
api.mydomain.com |
None | Yes |
Output:
Another example with shared suffixes:
Output:
Hints & Common Pitfalls
-
Do not use substring matching. The domain
domain.comis NOT an ancestor ofmydomain.com. Always check that the candidate parent is a proper suffix preceded by a dot (or is the full remaining string). Use an "ends with.+ parent" check. -
Building the hierarchy: One approach is to insert all domains into a hash map (domain -> score). For each domain, walk up the hierarchy by stripping the leftmost label (e.g.,
mail.test.mydomain.com->test.mydomain.com->mydomain.com->com) and accumulate scores from any ancestor found in the map. -
Identifying leaves: A domain is a leaf if no other domain in the input ends with
.<this domain>. You can determine this efficiently by building a set and checking for children. -
Trie-based approach: For large inputs, building a trie on reversed domain labels (e.g.,
com->mydomain->test->mail) gives efficient ancestor lookups and lets you identify leaves as trie nodes with no children that correspond to input entries.
Follow-Up Questions
-
Scale: Suppose you have billions of domain-score pairs that do not fit in memory. How would you solve this problem using external storage or a MapReduce-style framework? What would be your key/value scheme for the map and reduce phases?
-
Streaming updates: If domain scores are updated frequently, how would you maintain leaf total scores incrementally without recomputing from scratch?
-
Multiple queries: If you need to answer many queries of the form "What is the total score of domain X?" (where X may or may not be a leaf), how would you preprocess the data for efficient lookups?