Lesson 012: Bayesian Beta-Binomial Smoothing

Problem

The Artemis vote system shows 50 random images per ballot and asks voters to pick 5 favorites. With 500 ballots across 12,217 images, most images are shown only 1-2 times. A raw selection rate of "1 out of 1 shown = 100%" is meaningless — it tells you nothing about whether the image is actually preferred. Raw rates are dominated by sampling noise at low exposure.

Why It Matters

If you feed raw selection rates into an optimizer, it will select images that were lucky (shown once, happened to get picked) over images that were genuinely preferred but had the misfortune of being shown to a voter who preferred something else. The calendar would be built on noise, not signal.

What Happened

  1. Started with raw selection rates (selected / shown) as the preference metric. Images shown once and selected once scored 1.0 — higher than genuinely popular images shown 50 times with a 30% selection rate.
  2. Added Wilson lower-bound confidence intervals as a frequentist correction. This penalized low-exposure images appropriately but produced only a single point estimate rather than a full posterior distribution, so it couldn't express "how uncertain are we?" beyond the interval width.
  3. Switched to a Beta-Binomial conjugate model. Chose Beta(2, 8) as the prior, encoding "assume an image has about a 20% selection rate until data says otherwise." The 20% prior mean is double the 10% base rate (5 picks of 50 shown) because the vote pool is pre-filtered to usable frames.
  4. Verified the smoothing behavior: images with 1-2 exposures stay near the prior mean of 0.20 regardless of outcome. Images with 10+ exposures have posteriors dominated by data. The crossover happens around n=10, which matches the prior strength (a+b=10).
  5. Kept Wilson lower bound as a secondary metric alongside the Bayesian posterior. Both are stored in mart_image_preference_score — Wilson for comparison, posterior_mean as the backbone of the composite score.
  6. The posterior_mean became the primary input to the composite scoring formula, weighted at ~85% of the final score (with Elo and Borda as secondary adjustments).
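The two metrics kept in step 5 can be sketched side by side. A minimal, self-contained version (the function names are illustrative, not the repo's actual code; the Beta(2, 8) prior and the z = 1.96 Wilson bound are the values described above):

```python
import math

# Beta(2, 8) prior from the lesson: prior mean 0.20, prior strength a + b = 10.
A, B = 2.0, 8.0

def posterior_mean(selected, shown, a=A, b=B):
    """Conjugate Beta-Binomial update: posterior is Beta(a+selected, b+shown-selected)."""
    return (a + selected) / (a + b + shown)

def wilson_lower_bound(selected, shown, z=1.96):
    """Wilson score interval lower bound at ~95% confidence (0 when never shown)."""
    if shown == 0:
        return 0.0
    p = selected / shown
    denom = 1 + z**2 / shown
    center = p + z**2 / (2 * shown)
    margin = z * math.sqrt(p * (1 - p) / shown + z**2 / (4 * shown**2))
    return (center - margin) / denom

# Lucky image: 1 of 1 shown -> raw rate 1.0, but the posterior stays near the prior.
print(posterior_mean(1, 1))    # ~0.273
# Genuinely popular image: 15 of 50 -> the posterior tracks the data.
print(posterior_mean(15, 50))  # ~0.283
```

Note that the lucky 1-of-1 image no longer outranks the 15-of-50 image, which is exactly the failure mode in step 1.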

Design Choice: Beta-Binomial Conjugate Prior

We use a Beta(2, 8) prior combined with the observed selection data to produce a posterior Beta distribution for each image's true selection probability.
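Because the prior is conjugate, the update is just addition: observing s selections in n exposures turns Beta(2, 8) into Beta(2 + s, 8 + n - s). A sketch that returns both the point estimate and its remaining uncertainty (pure Python; the helper name is illustrative):

```python
def beta_binomial_posterior(selected, shown, a=2.0, b=8.0):
    """Conjugate update: Beta(a, b) prior + Binomial data -> Beta posterior.

    Returns (posterior mean, posterior standard deviation) so callers can
    see both the smoothed estimate and how uncertain it still is.
    """
    a_post = a + selected
    b_post = b + (shown - selected)
    n = a_post + b_post
    mean = a_post / n
    var = (a_post * b_post) / (n**2 * (n + 1))  # Beta distribution variance
    return mean, var**0.5

# One lucky exposure: the estimate barely moves and uncertainty stays wide.
print(beta_binomial_posterior(1, 1))    # mean ~0.27, sd ~0.13
# Fifty exposures: the estimate tracks the data and uncertainty shrinks.
print(beta_binomial_posterior(15, 50))  # mean ~0.28, sd ~0.06
```

The shrinking standard deviation is what the Wilson point estimate alone could not express.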

Key terms

  1. Prior: Beta(2, 8), the assumed distribution of an image's true selection probability before any votes are observed.
  2. Posterior: Beta(2 + selected, 8 + shown - selected), the prior updated with each image's observed counts.
  3. Posterior mean: (2 + selected) / (10 + shown), the smoothed point estimate stored as posterior_mean.
  4. Prior strength: a + b = 10, the number of pseudo-observations the prior is worth; data outweighs the prior beyond roughly this many exposures.

Alternatives Considered

  1. Raw selection rate: Rejected. Meaningless at n=1 or n=2.
  2. Laplace smoothing (add-1): Equivalent to Beta(1,1) prior, which assumes a 50% base rate — too optimistic for images.
  3. Empirical Bayes: Estimate the prior from the data. More sophisticated but harder to explain and not needed when the prior is weakly informative.
  4. Mixed-effects logistic regression: Would account for voter random effects but requires substantially more data to fit reliably.
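The gap between alternative 2 and the chosen prior is easiest to see at low exposure. A quick comparison, writing both as posterior means under their respective Beta priors (helper name is illustrative):

```python
def smoothed_rate(selected, shown, a, b):
    # Posterior mean under a Beta(a, b) prior.
    return (a + selected) / (a + b + shown)

# Image shown once and selected once:
laplace = smoothed_rate(1, 1, a=1, b=1)   # Beta(1,1): (1+1)/(2+1) = 2/3 ~ 0.67
chosen  = smoothed_rate(1, 1, a=2, b=8)   # Beta(2,8): (2+1)/(10+1) = 3/11 ~ 0.27
print(laplace, chosen)
```

Laplace smoothing pulls the estimate toward 50%, far above the ~10% base rate; the Beta(2, 8) prior keeps it near a plausible value.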

What Was Learned

Bayesian smoothing is the right default for any sparse count-based metric. The conjugate prior trick makes it computationally trivial: no MCMC, no optimization, just arithmetic. The key decision is the prior strength (a + b = 10 in our case), which controls how many observations are needed to overcome the prior. With a total prior weight of 10, the data carries as much weight as the prior at about 10 observations and dominates beyond that, roughly the right amount of smoothing for our exposure levels.
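The role of the prior strength can be made explicit: the posterior mean is algebraically a weighted average of the prior mean and the raw rate, with weights (a + b) and n. A short sketch of that identity (function name is illustrative):

```python
def posterior_mean_as_blend(selected, shown, a=2.0, b=8.0):
    """Posterior mean rewritten as a prior/data blend.

    (a + s) / (a + b + n) == w * prior_mean + (1 - w) * raw_rate,
    where w = (a + b) / (a + b + n) is the prior's share of the weight.
    """
    prior_strength = a + b
    prior_mean = a / prior_strength
    raw_rate = selected / shown if shown else prior_mean
    w = prior_strength / (prior_strength + shown)
    return w * prior_mean + (1 - w) * raw_rate

# At n = 10 the prior and the data carry equal weight (w = 0.5):
print(posterior_mean_as_blend(3, 10))  # halfway between 0.20 and 0.30 -> 0.25
```

This is why the crossover in step 4 lands at n = 10: it is exactly the prior strength.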