Contained Characters in Grouped String
Level: L3-L4 Topics: Strings, Streaming, Sliding Window, Hash Maps
Problem Statement
A grouped string is a string where identical characters always appear consecutively (e.g., "AACCCCGGD" -- the As are together, the Cs are together, etc.). Each character in the alphabet appears in at most one contiguous group.
A character is "contained over k" if all of its occurrences fit within at least one contiguous window of size k in the string.
You receive the string as a stream -- one character at a time. At any point, you should be able to return the count of distinct characters that are currently "contained over k."
Background & Constraints
- The string is grouped: no character appears in two separate, non-adjacent groups.
- Characters are streamed one at a time. You cannot look ahead or rewind.
kis given upfront and is fixed.- You need to maintain a running count as each character arrives.
Examples
Example 1:
string = "AACCCCGGD", k = 4
After 'A','A': contained characters over 4: {'A'} → count = 1
(A's group size = 2, fits in window of 4)
After 'A','A','C','C': contained characters over 4: {'A','C'} → count = 2
(A has 2 chars, C has 2 chars so far, "AACC" fits in window of 4)
After 'A','A','C','C','C','C': contained characters over 4: {'A','C'} → count = 2
(C's group is now 4, which fits in window of 4. A's group is 2, fits in 4.)
After 'A','A','C','C','C','C','G','G': contained characters over 4: {'A','C','G'} → count = 3
After 'A','A','C','C','C','C','G','G','D': contained characters over 4: {'A','C','G','D'} → count = 4
Example 2:
string = "AAAAAB", k = 3
After all A's: A's group has 5 characters. Does A fit in a window of 3? No (5 > 3).
After B: B has 1 character, fits in window of 3.
Final contained count = 1 (only B)
Hints & Common Pitfalls
-
Since the string is grouped, each character's occurrences form a single contiguous block. You only need to track the size of the current group and the sizes of completed groups.
-
A character is "contained over k" if its group size is at most k. Since all occurrences are contiguous, a window of size k can cover them if and only if the group fits within k.
-
Streaming detection of group boundaries: When the incoming character differs from the previous character, the previous group is complete. Record its size and check if it is <= k.
-
The current (incomplete) group might grow beyond k. You need to track whether the current group has already exceeded k and update accordingly.
-
Common mistake: Forgetting to finalize the last group when the stream ends (there is no "next different character" to trigger boundary detection).
Follow-Up Questions
-
What is the runtime and space complexity of your streaming solution? Can you achieve O(1) per character processed?
-
How does the problem change if the string is NOT grouped (characters can appear in multiple non-adjacent segments)? Now "contained over k" means all occurrences across the entire string fit within some window of size k. How would you solve this variant?
-
If you are given the entire string at once (not streamed), can you solve it more efficiently or more simply? What data structures would you use?
-
What if
kis not fixed but changes between queries? Can you precompute something that lets you answer "how many characters are contained over k?" for any k in O(1)?