Skip to content

Implement a TCP Stream

Level: L4-L5 Topics: Data Structures, Buffers, Networking Concepts

Problem Statement

Implement a minimal TCP-like stream reassembly layer. Packets arrive out of order, and you must deliver data to the reader in the correct sequential order.

Implement a class with two methods:

void getPacket(long offset, byte[] data)
int read(byte[] dest)
  • getPacket(offset, data): Called when a packet arrives. offset is the byte position in the stream where data starts. Use a 64-bit type — a long-running TCP-like stream will exceed 2 GB, and you should not have to reason about sequence-number wrap-around for this problem.
  • read(dest): Copies as many contiguous bytes as possible starting from the current read position into dest. Returns the number of bytes copied. If no data is available at the current read position, returns 0.

Background & Constraints

  • Packets can arrive in any order.
  • In the basic version, assume no duplicate packets and no overlapping data between packets.
  • The stream starts at offset 0.
  • The read method maintains a cursor. Each call continues reading from where the last call left off.
  • dest has a fixed capacity — do not write beyond it.
  • Memory is limited, so consider cleaning up data that has already been read.

Examples

stream = new TcpStream()
buf = new byte[16]                   // destination buffer

stream.getPacket(4, [E, F, G])      // bytes 4-6 arrive
stream.read(buf)                     // returns 0 (byte 0 not yet available)

stream.getPacket(0, [A, B, C, D])   // bytes 0-3 arrive
stream.read(buf)                     // returns 7, buf[0..6] = [A, B, C, D, E, F, G]

stream.getPacket(10, [K, L])         // bytes 10-11 arrive
stream.read(buf)                     // returns 0 (byte 7 not yet available)

stream.getPacket(7, [H, I, J])      // bytes 7-9 arrive
stream.read(buf)                     // returns 5, buf[0..4] = [H, I, J, K, L]

Partial-read example (illustrates the mid-packet cursor):

stream = new TcpStream()
buf3 = new byte[3]                   // small destination

stream.getPacket(0, [A, B, C, D, E]) // bytes 0-4 arrive in one packet
stream.read(buf3)                    // returns 3, buf3 = [A, B, C]
                                     // packet is only partially consumed;
                                     // the buffered packet still holds D, E
stream.read(buf3)                    // returns 2, buf3 = [D, E, _]
                                     // cursor resumes mid-packet at offset 3

Hints & Common Pitfalls

  • Core data structure: Use a map (sorted by offset) or a buffer array to store received-but-not-yet-readable packets. A TreeMap<Long, byte[]> (or equivalent sorted map) keyed by offset works well.

  • Contiguity check: When read is called, look at the smallest-offset packet still in the buffer. If it covers the current read position (offset ≤ cursor < offset + length), copy bytes from there and advance the cursor; otherwise there is a gap and you stop. Continue across packets until the next gap.

  • Partial reads: The dest buffer may be smaller than the available contiguous data. Only copy up to dest.length bytes and leave the rest for the next read call. This means you may need to partially consume a packet.

  • Memory cleanup: Remove packets from the buffer once they have been fully read. If you do not clean up, memory usage grows unboundedly.

  • Edge case — reading before any packets arrive: read should return 0 immediately.

  • Edge case — mid-packet read cursor: If a previous read consumed only part of a packet (because dest was too small), the next read must continue from the middle of that packet. You cannot drop a partially-read packet from the buffer — track an intra-packet offset (or slice the byte array) so the next read resumes correctly.

Follow-Up Questions

  1. Allow duplicate packets: Packets with the same offset and data may arrive more than once. How do you handle this without corrupting the stream or wasting memory?

  2. Allow overlapping packets: Two packets may cover partially overlapping byte ranges (possibly with the same data, possibly with conflicting data). How do you resolve overlaps? What if the overlapping data differs — which takes precedence?

  3. Tight memory limit: Suppose you can only buffer a fixed number of bytes (e.g., a receive window of size W). If a packet arrives that would exceed the buffer capacity, you must drop it. How does this affect your design? How do you signal to the sender that data was dropped?

  4. Thread safety: If getPacket and read can be called concurrently from different threads, what synchronization is needed? How do you minimize lock contention?

  5. Performance: What is the time complexity of getPacket and read in your implementation? With a sorted map keyed by offset, every getPacket call carries an O(log N) factor (where N is the number of buffered packets), since arbitrary out-of-order offsets have to be ordered somehow. (A min-heap gives you the smallest-offset peek you need for the contiguity check, but it cannot dedupe or update existing entries — sorted map is the cleaner choice.) Under what additional assumption can you drop the log factor and achieve O(1) amortized per byte for both operations? (Hint: cap the maximum out-of-order distance and back the buffer with a fixed-size ring buffer indexed by offset % W, plus a per-slot "filled" bitmap for the contiguity check. What does that buy you, and what does it cost when the assumption is violated?)