Packetize Data Optimally
Level: L3-L5 Topics: Greedy, Math, Division, Edge Cases
Problem Statement
You need to split data of a given length into packets. Each packet has a maximum allowed size. The optimization goals are (in priority order):
- Minimize the number of packets
k. - Among solutions with
kpackets, distribute the data as evenly as possible: every packet must have size eitherfloor(data_length / k)orceil(data_length / k). Exactlydata_length mod kpackets take the larger size; the rest take the smaller. (This makes both the maximum packet and the maximum-minus-minimum spread minimal.)
Implement the function:
The function returns a list of packet sizes that sum to data_length, where each packet is at most max_packet_size.
Background & Constraints
data_lengthandmax_packet_sizeare non-negative integers.- Handle edge cases:
data_length = 0(return empty list),max_packet_size = 0with non-zero data (error or impossible). - Packets should be as equal in size as possible to minimize the largest one.
- The order of packets in the returned list does not matter.
Examples
Input: data_length = 10, max_packet_size = 4
- Minimum packets:
ceil(10 / 4) = 3. - Even distribution with
k = 3:floor(10 / 3) = 3,10 mod 3 = 1. So 1 packet of size 4 and 2 packets of size 3.
Output: [4, 3, 3] (or any permutation)
Why not
[4, 4, 2]? Same packet count (3) and same max (4), but rule 2 requires every packet to be eitherfloor(10/3)=3orceil(10/3)=4. Size 2 is neither, so[4, 4, 2]is rejected.
Input: data_length = 12, max_packet_size = 5
- Minimum packets: ceil(12/5) = 3.
- Even distribution:
[4, 4, 4].
Output: [4, 4, 4]
Hints & Common Pitfalls
- The minimum number of packets is
ceil(data_length / max_packet_size). - To distribute evenly: with
kpackets, each packet gets at leastdata_length / k(integer division). The remainderdata_length % kpackets get one extra unit. - Be careful with integer division and ceiling:
ceil(a/b) = (a + b - 1) / bfor positive integers. - Handle the zero and negative input cases explicitly — do not let them cause division by zero.
- A common mistake is making all packets
max_packet_sizeand leaving a tiny last packet, which does not minimize the largest packet.
Follow-Up Questions
- Header space: The first packet must reserve
Hbytes for a header, so its effective data capacity ismax_packet_size - H. How does this change the calculation? Recompute the minimum packet count accounting for the reduced first packet, then distribute the remaining data evenly across the other packets. - Discrete split positions: Data can only be split at specific positions (e.g., you cannot split in the middle of a record). Given a list of allowed split positions, find the optimal packetization. This becomes a more complex optimization — consider dynamic programming or a greedy approach that picks split points closest to the ideal even distribution.