Skip to content

Contained Characters in Grouped String

Level: L3-L4 Topics: Strings, Streaming, Sliding Window, Hash Maps

Problem Statement

A grouped string is a string where identical characters always appear consecutively (e.g., "AACCCCGGD" -- the As are together, the Cs are together, etc.). Each character in the alphabet appears in at most one contiguous group.

A character is "contained over k" if all of its occurrences fit within at least one contiguous window of size k in the string.

You receive the string as a stream -- one character at a time. At any point, you should be able to return the count of distinct characters that are currently "contained over k."

Background & Constraints

  • The string is grouped: no character appears in two separate, non-adjacent groups.
  • Characters are streamed one at a time. You cannot look ahead or rewind.
  • k is given upfront and is fixed.
  • You need to maintain a running count as each character arrives.

Examples

Example 1:

string = "AACCCCGGD", k = 4

After 'A','A':         contained characters over 4: {'A'} → count = 1
  (A's group size = 2, fits in window of 4)

After 'A','A','C','C': contained characters over 4: {'A','C'} → count = 2
  (A has 2 chars, C has 2 chars so far, "AACC" fits in window of 4)

After 'A','A','C','C','C','C': contained characters over 4: {'A','C'} → count = 2
  (C's group is now 4, which fits in window of 4. A's group is 2, fits in 4.)

After 'A','A','C','C','C','C','G','G': contained characters over 4: {'A','C','G'} → count = 3

After 'A','A','C','C','C','C','G','G','D': contained characters over 4: {'A','C','G','D'} → count = 4

Example 2:

string = "AAAAAB", k = 3

After all A's: A's group has 5 characters. Does A fit in a window of 3? No (5 > 3).
After B: B has 1 character, fits in window of 3.

Final contained count = 1 (only B)

Hints & Common Pitfalls

  1. Since the string is grouped, each character's occurrences form a single contiguous block. You only need to track the size of the current group and the sizes of completed groups.

  2. A character is "contained over k" if its group size is at most k. Since all occurrences are contiguous, a window of size k can cover them if and only if the group fits within k.

  3. Streaming detection of group boundaries: When the incoming character differs from the previous character, the previous group is complete. Record its size and check if it is <= k.

  4. The current (incomplete) group might grow beyond k. You need to track whether the current group has already exceeded k and update accordingly.

  5. Common mistake: Forgetting to finalize the last group when the stream ends (there is no "next different character" to trigger boundary detection).

Follow-Up Questions

  1. What is the runtime and space complexity of your streaming solution? Can you achieve O(1) per character processed?

  2. How does the problem change if the string is NOT grouped (characters can appear in multiple non-adjacent segments)? Now "contained over k" means all occurrences across the entire string fit within some window of size k. How would you solve this variant?

  3. If you are given the entire string at once (not streamed), can you solve it more efficiently or more simply? What data structures would you use?

  4. What if k is not fixed but changes between queries? Can you precompute something that lets you answer "how many characters are contained over k?" for any k in O(1)?