Skip to content

Custom Comparators

Beyond Simple Sorting

The previous lesson sorted by a single field: \(a_i\) ascending, or \(B_i\) ascending. Clean and simple.

Some problems don't have a single field to sort by. The comparison depends on both elements at once. You need to derive the comparator from scratch using the exchange argument.

Here's the key question you always ask: when should element A come before element B in the optimal ordering?


The Problem That Requires Derivation

CF632C — The Smallest String Concatenation: Given n strings, concatenate them in some order to get the lexicographically smallest possible result.

The Greedy Choice: Put string a before b if a + b < b + a lexicographically. The comparator falls directly from the exchange argument.

There's no obvious "sort by length" or "sort by first character" that works in general. You need to derive the sort order from the exchange argument.

Setup: Consider two strings a and b. In what order should they appear?

  • Order 1 (a before b): result starts with a + b
  • Order 2 (b before a): result starts with b + a

We want the lexicographically smallest result. So put a before b if a + b < b + a (lexicographic comparison of the concatenated strings).

This is the comparator. It fell directly from the exchange argument.


Watching the Comparator Work

String concatenation comparator: comparing AB vs BA determines optimal ordering

Let's trace on ["ba", "a", "bc"].

Compare all pairs using a + b < b + a:

Pair a+b b+a Who's first?
"ba" vs "a" "baa" "aba" "aba" < "baa" → a first
"ba" vs "bc" "babc" "bcba" "babc" < "bcba" → ba first
"a" vs "bc" "abc" "bca" "abc" < "bca" → a first

Sorted order: "a" < "ba" < "bc" → concatenation: "ababc".

Let's verify manually: - "a" + "ba" + "bc" = "ababc" - "a" + "bc" + "ba" = "abcba" — larger - "ba" + "a" + "bc" = "baabc" — larger - All other orders are also larger

The greedy sort gives the lexicographically smallest result.


The Transitivity Requirement

Here's something easy to miss. For a comparator to work with std::sort, it must be a strict weak order — in particular, it must be transitive: if A < B and B < C, then A < C.

Is our comparator transitive? I.e., if a+b < b+a and b+c < c+b, does a+c < c+a?

Proof: Treat each string as a number in base \(10^{|s|}\) (a "superdigit"). Then the concatenation a+b represents the number \(a \times 10^{|b|} + b\), and the condition a+b < b+a means:

\[a \times 10^{|b|} + b < b \times 10^{|a|} + a\]

Rearranging:

\[a(10^{|b|} - 1) < b(10^{|a|} - 1)\]

Which gives us a ratio comparison:

\[\frac{a}{10^{|a|} - 1} < \frac{b}{10^{|b|} - 1}\]

Ratios compose transitively. If \(\frac{a}{\ldots} < \frac{b}{\ldots}\) and \(\frac{b}{\ldots} < \frac{c}{\ldots}\), then \(\frac{a}{\ldots} < \frac{c}{\ldots}\). Transitivity holds.

What breaks if the comparator isn't transitive? std::sort enters undefined behavior — it can produce any ordering, crash, or infinite loop. Always verify transitivity before using a custom comparator.


The Cross-Product Pattern

CF922D — Robot Vacuum Cleaner: Each "robot" is a string of 's' and 'h' characters. Concatenate all robots in some order. Maximize the total number of "sh" subsequences in the result.

An "sh" subsequence is any pair of positions where s appears before h (not necessarily adjacent).

Setup the exchange argument: For two robots A and B, compute the contribution from the cross-term (pairs where one character comes from A and one from B).

  • A before B: cross contribution = \(A_s \times B_h\) (every s in A pairs with every h in B)
  • B before A: cross contribution = \(B_s \times A_h\)

Put A before B if \(A_s \times B_h > B_s \times A_h\).

This is a cross-product comparator — it compares the ratio \(s/h\) for each string, without actually dividing (to avoid floating point issues). Equivalently, A goes first if \(A_s / A_h > B_s / B_h\) (higher s-to-h ratio goes first).

Why? If A has many ss and few hs, and B has many hs, putting A first means all of A's ss can pair with all of B's hs — maximum cross-pairs. If you put B first instead, B's hs come before A's ss and those pairs are lost.

Let's trace on strings ["sh", "hs", "s", "h"]:

Compute \((s\_count, h\_count)\) for each: - "sh": s=1, h=1 - "hs": s=1, h=1 - "s": s=1, h=0 - "h": s=0, h=1

Cross-product comparison: - "s" (1,0) vs anything with h: \(1 \times h > 0 \times 1 = 0\) — "s" always goes first (h=0 means ratio is \(\infty\)) - "h" (0,1) vs anything with s: \(0 \times s < s_{other} \times 1\) — "h" always goes last - "sh" (1,1) vs "hs" (1,1): \(1 \times 1 = 1 \times 1\) — tie, either order

Optimal order: "s" first, then "sh" and "hs" in any order, then "h" last.

Counting "sh" subsequences in "s" + "sh" + "h" = "sshh": - (0,2): s at 0, h at 2 ✓ - (0,3): s at 0, h at 3 ✓ - (1,2): s at 1, h at 2 ✓ - (1,3): s at 1, h at 3 ✓ - Total: 4

Counting "sh" subsequences in the worst order "h" + "sh" + "s" = "hshs": - Characters: h(0), s(1), h(2), s(3) - (1,2): s at 1, h at 2 ✓ - Total: 1 — much worse.


Deriving Comparators: The General Template

  1. Consider two adjacent elements A and B in your sequence.
  2. Define \(\text{cost}(A, B)\) = some function of putting A before B.
  3. Define \(\text{cost}(B, A)\) = same function with B before A.
  4. Put A before B if \(\text{cost}(A, B)\) is better (smaller for minimization, larger for maximization).
  5. Simplify the comparison to avoid floating point.
  6. Verify transitivity.
Problem cost(A before B) Comparator
String concat a + b (lex) a + b < b + a
Max "sh" subseq \(A_s \times B_h\) (cross-pairs gained) \(A_s \cdot B_h > B_s \cdot A_h\)
Tasks & Deadlines \(2t + 2a_i + a_j\) (sum finish times) \(a_i < a_j\)

The Bug Factory: Strict vs Non-Strict

A common mistake: writing a + b <= b + a instead of a + b < b + a. This creates a comparator that returns true for equal elements (a < a = true), violating the strict weak order requirement. std::sort will produce garbage.

Always use strict <, not <=.

If two elements are equal under your comparator (both a+b == b+a and b+a == a+b), the comparator returns false for both, which means they're "equivalent" — and std::sort handles ties arbitrarily but correctly.


The Full Code

#include <bits/stdc++.h>
using namespace std;

int main() {
    ios_base::sync_with_stdio(false);
    cin.tie(NULL);

    int numStrings;
    cin >> numStrings;

    vector<string> stringElements(numStrings);
    for (int i = 0; i < numStrings; i++) {
        cin >> stringElements[i];
    }

    auto concatCompare = [](const string& a, const string& b) {
        return a + b < b + a;
    };

    sort(stringElements.begin(), stringElements.end(), concatCompare);

    for (const auto& str : stringElements) {
        cout << str;
    }
    cout << "\n";
    return 0;
}

Performance note: a + b creates a new string for every comparison. With n=50,000 strings of total length 50,000, this is O(n log n) comparisons × O(avg_length) per comparison = fine in practice, but be aware.


Try It

Fill in the sorting lambda.

Input:
3
c ba b
Output: bbac

Trace: compare "c" vs "ba": "cba" vs "bac" → "bac" < "cba" → ba first. Compare "ba" vs "b": "bab" vs "bba" → "bab" < "bba" → b first. So order: b, ba, c → "bbac". ✓

Input:
4
3 30 34 5
Output: 303345

These are numbers written as strings. Compare all pairs using a+b < b+a: - "3" vs "30": "330" vs "303" → "303" < "330" → 30 before 3 - "30" vs "34": "3034" vs "3430" → "3034" < "3430" → 30 before 34 - "34" vs "5": "345" vs "534" → "345" < "534" → 34 before 5 - "3" vs "34": "334" vs "343" → "334" < "343" → 3 before 34 - "3" vs "5": "35" vs "53" → "35" < "53" → 3 before 5

Sorted order: 30, 3, 34, 5 → "303345". ✓

Challenge: Write the cross-product sort for the Robot Vacuum Cleaner and verify on a small example by hand.

Problems

Problem Link Difficulty
CF632C — Smallest String Concatenation codeforces.com/problemset/problem/632/C 1700
CF922D — Robot Vacuum Cleaner codeforces.com/problemset/problem/922/D 1800