Processing Highway Toll Logs
Level: L3-L5 Topics: Hash Maps, Sorting, Batch Processing, System Design, MapReduce
Problem Statement
A highway has N checkpoints (toll stations). Each time a vehicle enters or exits the highway, the corresponding checkpoint records a log entry:
At the end of the day, a batch process computes the toll charge for each vehicle. The charge formula is:
- Base fee: A fixed amount per trip.
- Distance fee: A per-unit charge based on the distance traveled (from entry checkpoint to exit checkpoint).
You receive the full day's logs as an unsorted list and must output the total charge for each license plate.
Background & Constraints
- Checkpoints are numbered but their physical order (which is checkpoint 1 vs. checkpoint 5 on the road) is given by a separate API or mapping.
- A vehicle enters at one checkpoint and exits at another. The two log entries for a single trip share the same license plate.
- Distance between checkpoints can be computed or looked up.
- Vehicles may make multiple trips in a day (enter and exit multiple times).
- Logs are unsorted and may arrive from different checkpoints in any interleaved order.
Examples
Checkpoints (in highway order): A — B — C — D — E
Distances: A-B = 10km, B-C = 20km, C-D = 15km, D-E = 25km
Logs:
(B, "ABC-123", 08:00)
(D, "XYZ-789", 08:15)
(D, "ABC-123", 09:30)
(A, "XYZ-789", 07:00)
Pairing by license plate and time:
"XYZ-789": entered A at 07:00, exited D at 08:15 → distance = 10+20+15 = 45km
"ABC-123": entered B at 08:00, exited D at 09:30 → distance = 20+15 = 35km
Charges (base_fee = $2.00, rate = $0.10/km):
"XYZ-789": $2.00 + 45 × $0.10 = $6.50
"ABC-123": $2.00 + 35 × $0.10 = $5.50
Hints & Common Pitfalls
-
Group logs by license plate. Use a hash map from license plate to a list of
(checkpoint, timestamp)entries. Sort each list by timestamp. -
Pair entry and exit. After sorting by timestamp, pair consecutive entries: the 1st and 2nd form a trip, the 3rd and 4th form another trip, and so on. An odd number of entries for a plate indicates an error (vehicle still on highway or missing log).
-
Computing distance: Precompute a prefix sum of distances along the highway. The distance between checkpoints
iandjis|prefix[j] - prefix[i]|. This handles bidirectional travel automatically. -
Clarify the fee formula with your interviewer. Is it purely distance-based? Is there a time component? Does the direction matter?
-
Bidirectional travel: A vehicle might enter at checkpoint D and exit at checkpoint B (traveling in the opposite direction). The distance is the same regardless of direction -- use absolute value.
Follow-Up Questions
-
Precomputing distances: If checkpoints can be added or removed, how would you maintain the distance lookup? Would a prefix sum array still work, or do you need a more flexible structure?
-
Multiple entries per day: A vehicle enters and exits 5 times. How do you correctly pair entries? What if a vehicle enters, exits, and re-enters at the same checkpoint?
-
Billions of logs (scale): The highway has thousands of checkpoints and millions of vehicles. How would you process this with MapReduce?
- Map phase: Key by license plate, value is the log entry.
- Reduce phase: For each plate, sort entries by time, pair them, compute charges.
- What is the shuffle cost? How do you handle data skew (one plate with many trips)?
-
Unreliable checkpoints: Some checkpoints occasionally fail to log a vehicle. How do you detect and handle missing entries? Can you estimate the trip using surrounding checkpoints?
-
Real-time processing: Instead of end-of-day batch, you want to compute charges as vehicles exit. How does the architecture change? What data structure do you maintain for vehicles currently on the highway?