Parse Paginated Results
Level: L3 Topics: API Design, Pagination, Heaps, Memory Optimization
Problem Statement
You are building a search backend that uses an external SmartSearch API. This API returns results in pages. You must implement client-side logic to process the paginated results.
The SmartSearch API provides:
startQuery(query_string) -> QueryResponse: Initiates a search and returns the first page of results.
Each QueryResponse contains:
photo_ids: a list of photo IDs in this page.relevancy_scores: a list of relevancy scores (floats between 0 and 1) corresponding to each photo.continuation_token: a token to fetch the next page, ornullif this is the last page.
To get the next page, call continueQuery(continuation_token) -> QueryResponse.
Implement the following:
- Part A: Return all photos with relevancy score >= 0.9.
- Part B: Return the top K photos with relevancy score >= 0.9, sorted by relevancy in descending order.
Background & Constraints
- The total number of results can be very large (millions of photos across many pages).
- Each page contains a bounded number of results (e.g., 100).
- Relevancy scores are not guaranteed to be in any particular order across pages.
- For Part B, you should aim for O(K) memory usage, not O(total results).
Examples
Page 1: {photo_ids: [1, 2, 3], scores: [0.95, 0.85, 0.92], token: "page2"}
Page 2: {photo_ids: [4, 5, 6], scores: [0.91, 0.78, 0.99], token: null}
Part A result: Photos [1, 3, 4, 6] (scores >= 0.9)
Part B (K=2) result: Photos [6, 1] (top 2 by score: 0.99, 0.95)
Hints & Common Pitfalls
- Part A is straightforward: iterate through all pages, filter by threshold, collect results. Memory usage is proportional to the number of qualifying results.
- Part B requires more thought to achieve O(K) memory:
- Use a min-heap of size K. As you process each qualifying photo, add it to the heap. If the heap exceeds size K, remove the smallest element.
- At the end, the heap contains the top K results. Extract them in sorted order.
- A common mistake is collecting all qualifying results first and then sorting — this uses O(total qualifying results) memory instead of O(K).
- Handle the pagination loop correctly: keep calling
continueQueryuntilcontinuation_tokenis null. - Do not assume pages are sorted by relevancy.
Follow-Up Questions
- O(K) memory vs. sort-at-end: Compare the min-heap approach (O(K) memory, O(N log K) time) with collecting everything and sorting (O(N) memory, O(N log N) time). When is each approach preferable? Consider: what if K is close to N? What if memory is the primary bottleneck?