Skip to content

Parse Paginated Results

Level: L3 Topics: API Design, Pagination, Heaps, Memory Optimization

Problem Statement

You are building a search backend that uses an external SmartSearch API. This API returns results in pages. You must implement client-side logic to process the paginated results.

The SmartSearch API provides:

  • startQuery(query_string) -> QueryResponse: Initiates a search and returns the first page of results.

Each QueryResponse contains:

  • photo_ids: a list of photo IDs in this page.
  • relevancy_scores: a list of relevancy scores (floats between 0 and 1) corresponding to each photo.
  • continuation_token: a token to fetch the next page, or null if this is the last page.

To get the next page, call continueQuery(continuation_token) -> QueryResponse.

Implement the following:

  • Part A: Return all photos with relevancy score >= 0.9.
  • Part B: Return the top K photos with relevancy score >= 0.9, sorted by relevancy in descending order.

Background & Constraints

  • The total number of results can be very large (millions of photos across many pages).
  • Each page contains a bounded number of results (e.g., 100).
  • Relevancy scores are not guaranteed to be in any particular order across pages.
  • For Part B, you should aim for O(K) memory usage, not O(total results).

Examples

Page 1: {photo_ids: [1, 2, 3], scores: [0.95, 0.85, 0.92], token: "page2"} Page 2: {photo_ids: [4, 5, 6], scores: [0.91, 0.78, 0.99], token: null}

Part A result: Photos [1, 3, 4, 6] (scores >= 0.9)

Part B (K=2) result: Photos [6, 1] (top 2 by score: 0.99, 0.95)

Hints & Common Pitfalls

  • Part A is straightforward: iterate through all pages, filter by threshold, collect results. Memory usage is proportional to the number of qualifying results.
  • Part B requires more thought to achieve O(K) memory:
    • Use a min-heap of size K. As you process each qualifying photo, add it to the heap. If the heap exceeds size K, remove the smallest element.
    • At the end, the heap contains the top K results. Extract them in sorted order.
  • A common mistake is collecting all qualifying results first and then sorting — this uses O(total qualifying results) memory instead of O(K).
  • Handle the pagination loop correctly: keep calling continueQuery until continuation_token is null.
  • Do not assume pages are sorted by relevancy.

Follow-Up Questions

  1. O(K) memory vs. sort-at-end: Compare the min-heap approach (O(K) memory, O(N log K) time) with collecting everything and sorting (O(N) memory, O(N log N) time). When is each approach preferable? Consider: what if K is close to N? What if memory is the primary bottleneck?