Measuring Service Success
Level: L3-L5 Topics: Binary Search, Arrays, Time Series
Problem Statement
A web service tracks signups over time. A monitoring tool logs (nanosecond, user_count) pairs, where user_count is the cumulative number of users who have signed up by that nanosecond.
Write a function that, given a target count, returns the first nanosecond at which the cumulative user count reaches or exceeds the target.
Background & Constraints
- The log is a sorted array of
(timestamp, count)pairs. - Timestamps are in nanoseconds and strictly increasing.
- User counts are non-decreasing (users never unsubscribe in this model).
- The log can be very large (millions of entries).
- If the target is never reached, return -1.
Examples
Log data:
Queries:
firstTimeReaching(12) → 200 (count first reaches 12 at t=200)
firstTimeReaching(15) → 400 (count first reaches 15 at t=400, where count=18)
firstTimeReaching(25) → 500 (count first reaches 25 at t=500)
firstTimeReaching(50) → -1 (never reached)
firstTimeReaching(1) → 100 (count is already 5 >= 1 at first entry)
Hints & Common Pitfalls
-
Binary search on counts: Since user counts are non-decreasing, you can binary search for the first entry where count >= target. This gives O(log N) time.
-
Standard binary search variant: This is the classic "find the leftmost position where a condition is true" binary search. Be careful with the boundary — you want the first index where
count[i] >= target, not just any index. -
Common mistake: Using linear scan when binary search is possible. With millions of entries, O(N) per query is too slow if many queries are expected.
-
Edge cases: Target is 0 (return first entry), target exceeds the final count (return -1), all entries have the same count.
Follow-Up Questions
-
Reduce memory: The log takes a lot of memory. Can you avoid storing the full array? Consider streaming the data or using a compressed representation. How does run-length encoding help when there are long stretches of unchanged counts?
-
Complexity change: If the data is streamed and you cannot random-access it, what is the best time complexity you can achieve? How does this compare to the binary search approach?
-
Longest unchanged sequence: Find the longest contiguous range of timestamps where the user count did not change. What does this tell you about the service? Can you solve this in O(log N) time, or does it require O(N)?
-
Multiple services: If you have logs from K different services and need to find the first time the total count across all services reaches a target, how would you approach this?