Lesson 023: Maximum Marginal Relevance for Diversity-Aware Selection

Lesson 023: Maximum Marginal Relevance for Diversity-Aware Selection

The Lesson

Greedy Maximum Marginal Relevance (MMR) is the practical default for selecting a diverse, high-quality subset from a large pool. At each step, it picks the item that maximizes quality minus similarity to already-selected items. It runs in O(K × N) time, requires no optimization library, and naturally balances relevance against redundancy.

Context

A calendar optimizer needed to select 13 images from 12,217 candidates. Each image had a preference score, and 512-dimensional CLIP embeddings were available for computing pairwise visual similarity. The selection needed to maximize both voter preference and visual diversity — two objectives in direct tension, since the most popular images tend to look similar.

What Happened

  1. Considered formal optimization approaches: integer programming (scipy.optimize.milp), simulated annealing, and determinantal point processes (DPPs). All are more theoretically principled than greedy but add complexity and dependencies for a 13-from-12K selection problem.
  2. Chose greedy MMR for simplicity and debuggability. The algorithm: start with an empty set, iteratively add the candidate with highest marginal utility, where utility = quality - λ × max_similarity_to_selected.
  3. Implemented with CLIP cosine similarity as the redundancy measure. L2-normalized the 512-dim vectors once upfront, then cosine similarity is a single dot product per pair. At each of 13 greedy steps, evaluated all ~12K remaining candidates — total work: ~160K dot products, completing in <1 second.
  4. Added hard constraints on top of MMR: max 2 images per visual cluster (k=25 clusters). This provides a coarse diversity floor that MMR's continuous penalty reinforces.
  5. Compared MMR output (Method E) against naive top-13 (Method A): zero overlap in the 13 selected images. MMR aggressively avoided redundancy, selecting from 10 distinct clusters instead of the 3-4 clusters that dominate the top-13.
  6. MMR's objective score was lower than cluster-limited top-N (Method B) because it sacrificed more popularity for diversity. This is a feature, not a bug — it exposes a different point on the trade-off frontier.

Key Insights

Examples

# Core MMR loop (simplified)
selected = []
for _ in range(13):
    best_sk, best_utility = None, -inf
    for sk in remaining:
        quality = w_pop * pref_quantile[sk] + w_appeal * appeal_quantile[sk]
        if selected:
            max_sim = max(dot(normed[sk], normed[s]) for s in selected)
            quality -= w_redundancy * max_sim
        if quality > best_utility:
            best_sk, best_utility = sk, quality
    selected.append(best_sk)
    remaining.remove(best_sk)

Applicability

MMR works for any "select K diverse, high-quality items" problem:

Does NOT apply when diversity is irrelevant (e.g., selecting the K best answers to a factual question) or when the quality-diversity trade-off needs formal guarantees (use submodular optimization or DPPs instead).

Related Lessons