Implement a TCP Stream
Level: L4-L5 Topics: Data Structures, Buffers, Networking Concepts
Problem Statement
Implement a minimal TCP-like stream reassembly layer. Packets arrive out of order, and you must deliver data to the reader in the correct sequential order.
Implement a class with two methods:
getPacket(offset, data): Called when a packet arrives.offsetis the byte position in the stream wheredatastarts. Use a 64-bit type — a long-running TCP-like stream will exceed 2 GB, and you should not have to reason about sequence-number wrap-around for this problem.read(dest): Copies as many contiguous bytes as possible starting from the current read position intodest. Returns the number of bytes copied. If no data is available at the current read position, returns 0.
Background & Constraints
- Packets can arrive in any order.
- In the basic version, assume no duplicate packets and no overlapping data between packets.
- The stream starts at offset 0.
- The
readmethod maintains a cursor. Each call continues reading from where the last call left off. desthas a fixed capacity — do not write beyond it.- Memory is limited, so consider cleaning up data that has already been read.
Examples
stream = new TcpStream()
buf = new byte[16] // destination buffer
stream.getPacket(4, [E, F, G]) // bytes 4-6 arrive
stream.read(buf) // returns 0 (byte 0 not yet available)
stream.getPacket(0, [A, B, C, D]) // bytes 0-3 arrive
stream.read(buf) // returns 7, buf[0..6] = [A, B, C, D, E, F, G]
stream.getPacket(10, [K, L]) // bytes 10-11 arrive
stream.read(buf) // returns 0 (byte 7 not yet available)
stream.getPacket(7, [H, I, J]) // bytes 7-9 arrive
stream.read(buf) // returns 5, buf[0..4] = [H, I, J, K, L]
Partial-read example (illustrates the mid-packet cursor):
stream = new TcpStream()
buf3 = new byte[3] // small destination
stream.getPacket(0, [A, B, C, D, E]) // bytes 0-4 arrive in one packet
stream.read(buf3) // returns 3, buf3 = [A, B, C]
// packet is only partially consumed;
// the buffered packet still holds D, E
stream.read(buf3) // returns 2, buf3 = [D, E, _]
// cursor resumes mid-packet at offset 3
Hints & Common Pitfalls
-
Core data structure: Use a map (sorted by offset) or a buffer array to store received-but-not-yet-readable packets. A
TreeMap<Long, byte[]>(or equivalent sorted map) keyed by offset works well. -
Contiguity check: When
readis called, look at the smallest-offset packet still in the buffer. If it covers the current read position (offset ≤ cursor < offset + length), copy bytes from there and advance the cursor; otherwise there is a gap and you stop. Continue across packets until the next gap. -
Partial reads: The
destbuffer may be smaller than the available contiguous data. Only copy up todest.lengthbytes and leave the rest for the nextreadcall. This means you may need to partially consume a packet. -
Memory cleanup: Remove packets from the buffer once they have been fully read. If you do not clean up, memory usage grows unboundedly.
-
Edge case — reading before any packets arrive:
readshould return 0 immediately. -
Edge case — mid-packet read cursor: If a previous
readconsumed only part of a packet (becausedestwas too small), the nextreadmust continue from the middle of that packet. You cannot drop a partially-read packet from the buffer — track an intra-packet offset (or slice the byte array) so the nextreadresumes correctly.
Follow-Up Questions
-
Allow duplicate packets: Packets with the same offset and data may arrive more than once. How do you handle this without corrupting the stream or wasting memory?
-
Allow overlapping packets: Two packets may cover partially overlapping byte ranges (possibly with the same data, possibly with conflicting data). How do you resolve overlaps? What if the overlapping data differs — which takes precedence?
-
Tight memory limit: Suppose you can only buffer a fixed number of bytes (e.g., a receive window of size W). If a packet arrives that would exceed the buffer capacity, you must drop it. How does this affect your design? How do you signal to the sender that data was dropped?
-
Thread safety: If
getPacketandreadcan be called concurrently from different threads, what synchronization is needed? How do you minimize lock contention? -
Performance: What is the time complexity of
getPacketandreadin your implementation? With a sorted map keyed by offset, everygetPacketcall carries anO(log N)factor (whereNis the number of buffered packets), since arbitrary out-of-order offsets have to be ordered somehow. (A min-heap gives you the smallest-offset peek you need for the contiguity check, but it cannot dedupe or update existing entries — sorted map is the cleaner choice.) Under what additional assumption can you drop the log factor and achieveO(1)amortized per byte for both operations? (Hint: cap the maximum out-of-order distance and back the buffer with a fixed-size ring buffer indexed byoffset % W, plus a per-slot "filled" bitmap for the contiguity check. What does that buy you, and what does it cost when the assumption is violated?)