Mill journal · 6 min read · Mill team

Gamification in Mill, and why compliance courses default to OFF

Points, badges, and leaderboards make bad compliance training worse, and good onboarding better. Here's the research, and how Mill's three-tier toggle reflects it.

Turning gamification on for every course would be an easy product decision. Every competitor has done it. Points, badges, leaderboards. Ship them, ship them everywhere, tell buyers the course is now engaging.

Unfortunately the research has been screaming for about a decade that this is wrong. We'd rather ship the awkward truth than the easy lie.

What the literature actually says

When adult-learning researchers test "does gamification work?", the honest answer is it depends on what you put it on.

Hamari, Koivisto & Sarsa (2014) ran a systematic review of 24 empirical studies and found positive effects are real, but effect sizes are hugely dependent on context, and several studies on mandatory training found no effect or negative effects. Learners in compliance contexts don't treat points as a reward; they treat them as a signal that the content is trivial.

Sailer & Homner (2020) did a meta-analysis in Educational Psychology Review. They found an overall medium positive effect (g = 0.36), but with one crucial split: structural game elements (progress bars, avatars, narrative, feedback) produced the strongest effects. Surface elements (generic points + badges + leaderboards added to existing content) produced the weakest, and in some studies, reversed effects.

Deci, Koestner & Ryan (1999) in Psychological Bulletin 125(6) is the source of the over-justification effect: a meta-analysis of 128 studies showing that extrinsic rewards systematically undermine intrinsic motivation on meaningful tasks. When you add points to something a learner already engages with out of genuine interest, the brain starts optimising for the reward and loses interest in the task itself. Mekler, Brühlmann, Tuch & Opwis (2017) in Computers in Human Behavior found the flip side: points, levels, and leaderboards produce measurable performance effects but no intrinsic-motivation benefit, consistent with Deci-Koestner-Ryan and evidence that surface gamification doesn't recruit self-determined engagement.

Nicholson (2015) proposed the RECIPE framework for "meaningful gamification": Reflection, Exposition, Choice, Information, Play, Engagement. Note what's not on the list: points and badges. Nicholson's core argument is that extrinsic rewards fail at adult-learning timescales because the reward becomes the goal.

Deci & Ryan's Self-Determination Theory is the dominant theoretical frame. Three innate needs: autonomy, competence, relatedness. Points support competence (and only weakly). Badges can support competence if they're genuinely earned. Leaderboards can support relatedness if they're cohort-based and opt-in, and can actively damage it if they're public and competitive on safety-critical content.

Seaborn & Fels' 2015 survey in IJHCS is probably the most comprehensive synthesis. Their conclusion: gamification works when it supports all three SDT needs. Generic points-badges-leaderboards (BPL) usually support only one, sometimes none.

What this means for Mill

Mill is an AI course generator that ships SCORM and xAPI packages to customers who use them across a wild range of contexts. Compliance refreshers for pharma. Safety briefings for construction. Product training for sales. Onboarding for new hires. Ethics for executives. Anti-money-laundering for bank analysts.

The same gamification design is wrong for half of those. Points on a safety course are at best patronising and at worst encourage gaming the assessment. A leaderboard on sales training is motivating; a leaderboard on a harassment-reporting course is a catastrophe.

So Mill's gamification is a three-tier toggle with audience-aware defaults:

  • OFF: no gamification layer. Default for courses whose topic/audience signals regulatory, safety, ethics, or executive content. The research is clear that compliance training is worse with BPL, not better.

  • LIGHT: structural gamification only. Progress bars, chapter markers, mastery badges (earned through passing-gated assessments, not "awarded for clicking through"), skill tree visualisation, reflection prompts, unlock progression. Every element in LIGHT has strong empirical support across all adult-learning contexts. Safe default for skills / onboarding / product training.

  • FULL: opt-in, surface + structural. Adds points/XP, narrative framing, streaks (with cooldown; Duolingo-style anxiety is a real cost), team leaderboards (cohort-based only; we don't offer individual-public leaderboards because the research is so clearly negative), and optional timed challenges on recognition-style section types.
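The key structural property of the tiers is that they're cumulative: FULL adds surface elements on top of everything in LIGHT, never instead of it. A minimal sketch of that relationship, using hypothetical element names rather than Mill's actual configuration schema:

```python
from enum import Enum

class GamificationTier(Enum):
    OFF = "off"
    LIGHT = "light"
    FULL = "full"

# Structural elements (the strongest effects in Sailer & Homner) ship in LIGHT.
TIER_ELEMENTS = {
    GamificationTier.OFF: set(),
    GamificationTier.LIGHT: {
        "progress_bar", "chapter_markers", "mastery_badges",
        "skill_tree", "reflection_prompts", "unlock_progression",
    },
}

# Surface elements (points, streaks, leaderboards) are additive, FULL only.
TIER_ELEMENTS[GamificationTier.FULL] = TIER_ELEMENTS[GamificationTier.LIGHT] | {
    "points_xp", "narrative_framing", "streaks_with_cooldown",
    "team_leaderboard", "timed_recognition_challenges",
}
```

Because FULL is a strict superset of LIGHT, downgrading a course's tier can only remove elements, never swap them, which keeps the audit story simple.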

What we deliberately don't offer

A few things we've chosen not to build, because the evidence says they hurt outcomes:

  • Individual-public leaderboards. The research is consistently negative. We offer cohort-scoped (team-internal) leaderboards only.
  • Time pressure on conceptual content. Timed challenges are opt-in AND restricted to recognition-style section types (MCQ, flashcards, hotspots). Time pressure on case-studies or reflection reduces metacognition.
  • Cartoon-style badges on serious topics. Badge artwork for a BLS refresher shouldn't look like a Roblox achievement. Mill's default badge styling is editorial, not playful.
  • Streaks without cooldown. A 2-day grace period is the default. Zero-cooldown streaks correlate with learner anxiety and burnout in the consumer-app research we've seen.
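The grace-period rule for streaks is simple enough to show directly. This is an illustrative sketch, not Mill's production code: a missed day inside the grace window keeps the streak alive; a longer gap resets it.

```python
from datetime import date

GRACE_DAYS = 2  # default cooldown: up to 2 missed days won't break a streak

def update_streak(streak: int, last_active: date, today: date) -> int:
    """Return the new streak length after learner activity on `today`."""
    gap = (today - last_active).days
    if gap <= 0:
        return streak          # same-day activity: streak unchanged
    if gap <= 1 + GRACE_DAYS:
        return streak + 1      # within the grace window: streak survives
    return 1                   # too long a gap: streak resets
```

So a learner active Monday and again Thursday (a 3-day gap) keeps their streak; skipping through Friday resets it to 1.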

The advisor: how we default responsibly

Every course gets analysed by the Mill Gamification Advisor on publish (using whichever LLM your organisation has configured for Mill's advisor role). It reads the course's topic, audience, level, and learning goals; runs a keyword-based baseline; then refines with per-feature rationale. The author sees a pre-populated recommendation with the research citations inline, not a generic "ENGAGE YOUR LEARNERS!" banner.

If a compliance course is analysed, the advisor recommends OFF with a citation (usually the Deci-Koestner-Ryan over-justification finding). If a sales onboarding course is analysed, the advisor recommends FULL. The author can override (we trust them), but they see the "why" every time.
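The keyword-based baseline that runs before the LLM refinement pass can be sketched as a priority check: compliance signals win over skills signals, and anything ambiguous falls back to LIGHT. Keyword lists and function names here are hypothetical, chosen to mirror the examples above:

```python
# Hypothetical signal lists; Mill's real advisor refines this baseline with an LLM pass.
COMPLIANCE_SIGNALS = {"compliance", "safety", "ethics", "regulatory",
                      "harassment", "anti-money-laundering", "executive"}
SKILLS_SIGNALS = {"onboarding", "sales", "product", "skills"}

def baseline_tier(topic: str, audience: str) -> str:
    """Keyword baseline: compliance signals take priority over skills signals."""
    text = f"{topic} {audience}".lower()
    if any(k in text for k in COMPLIANCE_SIGNALS):
        return "OFF"    # BPL on mandatory training backfires
    if any(k in text for k in SKILLS_SIGNALS):
        return "FULL"
    return "LIGHT"      # structural-only is the safe default
```

The ordering matters: "safety onboarding for new hires" matches both lists and must resolve to OFF, which is why the compliance check runs first.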

What flows through SCORM and xAPI

Every gamification event (badge earned, level up, chapter unlocked, challenge completed) emits a canonical event that gets serialised THREE ways:

  1. xAPI: full-fidelity Statement to Mill's LRS (and any customer-configured LRS). Lossless.
  2. SCORM 2004: rich translation via cmi.objectives, cmi.interactions, and a JSON blob in cmi.suspend_data. Roughly 80% fidelity.
  3. SCORM 1.2: compact translation via a MessagePack-encoded cmi.suspend_data blob (fits the 4 KB limit). Roughly 40% fidelity at the transport; the parallel xAPI stream to Mill's LRS keeps the full record regardless.
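For the lossless tier, a canonical event maps onto a standard xAPI Statement (actor, verb, object, result). A minimal sketch of that mapping, with placeholder IRIs (real deployments would use registered verb vocabularies, and the field names here are assumptions, not Mill's event schema):

```python
import json
import uuid
from datetime import datetime, timezone

def to_xapi(event: dict) -> dict:
    """Serialise a canonical gamification event as an xAPI Statement."""
    return {
        "id": str(uuid.uuid4()),
        "actor": {"objectType": "Agent",
                  "mbox": f"mailto:{event['learner_email']}"},
        "verb": {"id": "http://example.com/verbs/" + event["kind"],
                 "display": {"en": event["kind"]}},
        "object": {"objectType": "Activity", "id": event["course_uri"]},
        "result": {"extensions":
                   {"http://example.com/ext/detail": event["detail"]}},
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

def fits_scorm12(blob: bytes) -> bool:
    """SCORM 1.2 caps suspend_data at 4,096 characters, hence the compact encoding."""
    return len(blob) <= 4096
```

The same event then gets compacted for the SCORM tiers; the 4 KB ceiling is what forces the MessagePack encoding and the fidelity loss on SCORM 1.2.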

Your LMS sees standard SCORM behaviour: completion, pass/fail, score. Mill's analytics see the full semantic stream. Every gamification event is auditable; every configuration change produces a new PublishVersion with a fresh SHA-256 manifest hash. A regulator asking "was gamification on for this course on this date?" gets an answer in one query.
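The manifest-hash guarantee works because the hash covers a canonical serialisation of the whole publish configuration, so flipping any toggle produces a different digest. A sketch of the idea, with made-up field names standing in for the real PublishVersion structure:

```python
import hashlib
import json

def manifest_hash(publish_version: dict) -> str:
    """SHA-256 over canonical JSON: any config change yields a new hash."""
    canonical = json.dumps(publish_version, sort_keys=True,
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Flipping the gamification tier is enough to produce a distinct version hash.
v1 = {"course": "aml-101", "gamification": "OFF", "published": "2025-03-01"}
v2 = {**v1, "gamification": "LIGHT"}
```

Answering the regulator's question then reduces to looking up which manifest hash was live on the date in question and reading the tier out of that version.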

TL;DR

  • Gamification is off by default on compliance courses because the research says it should be.
  • It's on by default on skills/onboarding/product training because the research says it helps there.
  • Every toggle shows the research that drives the recommendation.
  • Every event flows through xAPI + SCORM with audit-grade guarantees.
  • We don't offer individual-public leaderboards because they make things worse.

This is one of those product decisions where the easy answer and the right answer point in different directions. We picked the right one.

#product #pedagogy #gamification #compliance

