Skip to content

Apply Refactor to Source Code

Level: L3-L5 Topics: Strings, Offset Manipulation, Sorting, Streaming

Problem Statement

You are given a source code file as a single string and a list of Replacement objects. Each Replacement specifies:

  • start -- the character offset in the original text where the replacement begins
  • before -- the substring that currently exists at that offset
  • after -- the string that should replace before

Apply all replacements and return the modified source code.

Critical detail: Every replacement's start offset refers to the original, unmodified text. You cannot apply them one at a time in a naive loop because earlier replacements shift all subsequent offsets.

Background & Constraints

  • The number of replacements can be large (thousands).
  • Replacements must not overlap. If two replacements would affect overlapping ranges in the original text, report an error.
  • The before string at each offset must match what actually appears in the original text at that position; if it does not, report an error.
  • The after string can be shorter, longer, or the same length as before.

Examples

Input:

text = "Hello World"
replacements = [
    Replacement(start=0, before="Hello", after="Goodbye"),
    Replacement(start=6, before="World", after="Earth")
]

Output:

"Goodbye Earth"

Overlapping error example:

text = "abcdef"
replacements = [
    Replacement(start=0, before="abc", after="XY"),
    Replacement(start=2, before="cde", after="Z")   # overlaps with first (offset 2 is inside [0,3))
]

This should raise an error because the ranges [0, 3) and [2, 5) overlap.

Hints & Common Pitfalls

  1. Do not apply replacements sequentially. If you apply the first replacement and then use the original offsets for the second, you will write to the wrong position. This is the most common mistake.

  2. Sort replacements by offset. After sorting, you can walk through the original text from left to right, copying unchanged segments and inserting replacement text as you go.

  3. Build the output incrementally. Maintain a pointer into the original text. For each replacement (in offset order): copy everything from the current pointer up to replacement.start, then append replacement.after, then advance the pointer past replacement.before.

  4. Overlap detection becomes trivial after sorting: just check whether the current replacement's start falls before the previous replacement's end.

  5. Validate before strings by comparing text[start : start + len(before)] against the declared before value.

Follow-Up Questions

  1. What if the source file is too large to fit in memory? How would you apply replacements in a streaming fashion, reading and writing chunks at a time?

  2. Can you parallelize the replacement process? Consider partitioning the file into non-overlapping segments that different threads can handle independently.

  3. What test cases would you write to gain confidence in your solution? Think about edge cases: empty replacements list, replacement at the very end of the file, replacement that deletes text (after is empty), adjacent but non-overlapping replacements.

  4. Why does each Replacement store the before string if we already have the offset and the original text? Consider scenarios involving version control, conflict detection, or applying replacements to a file that may have changed since the replacements were generated.