Your Elo rating is described as a precise measure of your chess strength. Players obsess over every point gained or lost. Tournaments are built around rating brackets. And yet, a 1500-rated player on Chess.com and a 1500-rated player in a FIDE tournament are not remotely the same strength. Something is broken — and nobody in the chess world talks about it enough.
The Elo Myth
Arpad Elo was a Hungarian-American physics professor who developed his rating system in the 1960s to improve on the previous Harkness system used by the United States Chess Federation. The USCF adopted it in 1960, FIDE in 1970. Elo's original paper described the system as a statistical estimate, not an absolute truth — a point that has been largely forgotten in popular use.
The system works on one elegant principle: your rating predicts your expected score against an opponent, based on the difference in your ratings. Win more than expected, your rating goes up. Win less than expected, it goes down. Simple, transparent, and — in a closed, stable pool of players — remarkably effective.
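That one principle fits in a few lines of code. This is a generic sketch of the standard Elo formulas (the usual 400-point logistic scale, with an assumed K-factor of 20):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for player A against player B on the standard 400-point scale."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update(rating: float, expected: float, actual: float, k: float = 20) -> float:
    """New rating after one game: move K points per unit of 'surprise'."""
    return rating + k * (actual - expected)

e = expected_score(1600, 1500)       # ~0.64: the higher-rated player is favored
new_rating = update(1600, e, 0.0)    # an upset loss costs ~12.8 points
```

Note the symmetry: whatever one player gains, the other loses, which is exactly the zero-sum property discussed below.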
The problem is that chess in 2026 is not a closed, stable pool of players. It is fragmented across dozens of platforms, time controls, national federations, and online communities — each with its own rating scale, inflation history, and player population. The Elo system was never designed for this, and the cracks are everywhere.
How Rating Inflation Works
Rating inflation occurs when the average rating in a pool drifts upward over time without representing a genuine increase in skill. In a perfectly balanced Elo system, every point gained by one player would come from another player — a zero-sum transfer. But in practice, chess rating pools are not zero-sum, for several reasons.
The biggest driver is new player pool expansion. When thousands of new players join a rating system each year, they bring fresh points with them (their initial ratings). If even a fraction of those new players are stronger than their initial rating suggests, they pump real value into the pool by beating established players and taking points from them. This flow never fully reverses when those players quit.
K-factor changes also play a role. In 2014, FIDE raised the K-factor for most established players from 15 to 20 (and for new players from 25 to 40). This made ratings more volatile and, combined with the expanded player base, contributed measurably to upward drift. Several studies have suggested that a 2200-rated player from the 1990s would likely need to be around 2300 today to be considered equally strong by the statistics.
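One of the mechanisms above, points lingering in the pool after weaker players quit, can be seen in a toy simulation. Everything here is invented for illustration: 200 equally strong players, all starting at 1500, zero-sum point transfers, a hypothetical "discouragement" threshold of 1400 below which players quit, and fresh entrants restoring the pool at 1500:

```python
import random

random.seed(42)

def play(r_a: float, r_b: float, k: float = 20) -> tuple[float, float]:
    """One game: zero-sum Elo transfer. Win probability follows the rating gap,
    so each player's rating is an unbiased random walk."""
    e_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if random.random() < e_a else 0.0
    delta = k * (score_a - e_a)
    return r_a + delta, r_b - delta

pool = [1500.0] * 200
for year in range(30):
    for _ in range(2000):  # ~20 games per player per year, random pairings
        i, j = random.sample(range(len(pool)), 2)
        pool[i], pool[j] = play(pool[i], pool[j])
    # Discouraged players quit and take their depressed ratings with them;
    # newcomers enter at the nominal 1500 starting rating.
    pool = [r for r in pool if r >= 1400]
    pool += [1500.0] * (200 - len(pool))

print(round(sum(pool) / len(pool)))  # the surviving pool's average sits above 1500
```

No individual game ever creates points, yet the pool average drifts upward, because every quit removes fewer points than the 1500 the replacement brings in.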
The Online vs OTB Rating Gap
This is where the practical impact hits hardest. In community datasets that link players' online accounts to their OTB ratings, the pattern is unmistakable: players consistently rate higher online than they do over the board. The typical gap for rapid time controls is 200–350 rating points. For blitz, it is even higher.
Several factors explain this gap. Online chess is typically played at faster time controls, and blitz and rapid ratings measure a different skill mix than classical chess. Online, you face a different demographic: a global pool dominated by younger players who are often underrated relative to their true strength. And online platforms experienced dramatic rating drift in their early years as they rapidly acquired millions of new users.
The real-world consequence: players who have only played online and show up to their first over-the-board tournament are routinely surprised — and humbled — by how different the experience is. The slower pace, the physical presence of an opponent, the requirement to write moves, the absence of computer assistance — all of these change the game profoundly.
Platform Rating Comparison
Rough conversion estimates based on player survey data. Individual variation is very high — treat these as orientation guides only.
| FIDE OTB | Chess.com Rapid | Chess.com Blitz | Lichess Rapid | Lichess Blitz |
|---|---|---|---|---|
| 1000 | ~1200–1350 | ~1300–1450 | ~1350–1500 | ~1400–1600 |
| 1200 | ~1400–1550 | ~1500–1650 | ~1550–1700 | ~1600–1800 |
| 1500 | ~1700–1850 | ~1800–1950 | ~1800–2000 | ~1900–2100 |
| 1800 | ~2000–2150 | ~2100–2250 | ~2100–2300 | ~2200–2400 |
| 2000 | ~2200–2350 | ~2300–2450 | ~2300–2500 | ~2400–2600 |
| 2200 | ~2400–2550 | ~2500–2650 | ~2500–2700 | ~2600–2800 |
The wide ranges reflect genuine uncertainty. Players who primarily play blitz online and rarely play classical OTB will have the widest gaps. Players who regularly compete OTB will see their platforms align more closely. The single most important variable: how much classical OTB chess you have played in the past 12 months.
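If you want a single number instead of a range, you can interpolate between the midpoints of the table above. To be clear, the anchor values below are just the midpoints of those survey-based ranges for one column (Chess.com rapid), so this inherits all of the table's uncertainty:

```python
# Midpoints of the FIDE -> Chess.com rapid ranges from the table (illustrative only).
FIDE_TO_CHESSCOM_RAPID = {
    1000: 1275, 1200: 1475, 1500: 1775, 1800: 2075, 2000: 2275, 2200: 2475,
}

def estimate_online(fide: float, table: dict = FIDE_TO_CHESSCOM_RAPID) -> float:
    """Linear interpolation between anchor points; clamped at both ends."""
    points = sorted(table.items())
    if fide <= points[0][0]:
        return float(points[0][1])
    if fide >= points[-1][0]:
        return float(points[-1][1])
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fide <= x1:
            return y0 + (y1 - y0) * (fide - x0) / (x1 - x0)

print(estimate_online(1650))  # 1925.0, halfway between the 1500 and 1800 anchors
```

Treat the output as an orientation point, not a prediction; the honest answer is still the full range in the table.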
Sandbagging: The Dark Art
Sandbagging — deliberately losing games to lower your rating — is the most controversial abuse of the Elo system. It is not a new problem. National chess federations have been aware of it since the 1980s. But the rise of money-prize open tournaments, particularly those with strict rating class sections (e.g., "Under 1600", "Under 1800"), has made it a significant issue.
The incentive is obvious: a 1750-rated player who drops their rating to 1550 can enter the Under 1600 section, where they are dramatically stronger than most competitors. Prize fund cash awaits. The sandbagger collects winnings, then lets their rating climb back to its natural level before the next tournament.
FIDE and national federations have introduced anti-sandbagging provisions — performance thresholds, mandatory reporting for suspicious patterns, and in some cases direct rating adjustments. But the detection problem is hard: losing a few games in a row is statistically normal. Proving intent is nearly impossible without confessions or overwhelming statistical evidence.
K-Factor and How It Gets Exploited
The K-factor governs how quickly your rating can change per game. Higher K means bigger swings. FIDE uses three levels: K=40 for players with fewer than 30 rated games (and for most juniors), K=20 for established players rated under 2400, and K=10 once a player's published rating has reached 2400, where it stays even if the rating later drops.
The K=40 phase is particularly prone to exploitation. A player in their first 30 rated games gains or loses 20 points from a single win or loss against an equally rated opponent (compared to 10 for an established player at K=20). Experienced players who have never been formally rated, including some who have played extensively in unofficial clubs, enter this phase with enormous tactical advantages: they know their true strength, but the system treats them as unknown quantities.
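The three-tier rule is simple enough to write down directly. This is a simplified sketch of the FIDE tiers described above; it omits the junior provisions (FIDE also applies K=40 to certain young, lower-rated players):

```python
def fide_k_factor(games_played: int, ever_reached_2400: bool) -> int:
    """Simplified FIDE K-factor selection (junior special cases omitted)."""
    if games_played < 30:
        return 40  # provisional phase: big swings, easy to exploit
    if ever_reached_2400:
        return 10  # sticks even if the rating later falls below 2400
    return 20      # most established players

print(fide_k_factor(games_played=12, ever_reached_2400=False))   # 40
print(fide_k_factor(games_played=250, ever_reached_2400=False))  # 20
print(fide_k_factor(games_played=800, ever_reached_2400=True))   # 10
```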
The Provisional Rating Trap
Your first batch of rated games sets a baseline that is remarkably sticky. If your first 10 games are against opponents significantly weaker or stronger than your true level — which is completely random, not your fault — you may find yourself with a provisional rating that misrepresents your strength by 200+ points.
Getting out of a bad provisional rating is slow. With K=20, each win against a lower-rated player gives you only a small gain. It can take 30–50 games to correct a provisional rating misalignment, during which you will consistently be paired with opponents either too easy or too strong for meaningful learning. The system is working as designed — slowly correcting toward truth — but it feels brutal when you are living through it.
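The "30–50 games" figure can be checked with a deterministic back-of-the-envelope model. The assumptions here are mine: the player is 200 points underrated, always faces opponents rated exactly at their current rating, and scores exactly their true expected score with no variance:

```python
def expected(true_strength: float, opponent: float) -> float:
    """Standard Elo expected score on the 400-point scale."""
    return 1 / (1 + 10 ** ((opponent - true_strength) / 400))

def games_to_converge(true_strength: float = 1700.0, rating: float = 1500.0,
                      k: float = 20, tolerance: float = 50) -> int:
    """Games until the rating is within `tolerance` of true strength,
    under the deterministic assumptions described above."""
    games = 0
    while true_strength - rating > tolerance:
        # Rating-based expectation vs an equal-rated opponent is 0.5;
        # the player's true expectation is higher, so points trickle in,
        # ever more slowly as the gap closes.
        rating += k * (expected(true_strength, rating) - 0.5)
        games += 1
    return games

print(games_to_converge())  # on the order of 50 games under these assumptions
```

Early games recover about 5 points each; by the time the gap is down to 50 points, each game recovers barely 1.5. That decelerating trickle is why the correction feels so slow.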
What Your Rating Actually Tells You
After all this criticism, it is worth being clear: your rating does mean something, and it is genuinely useful — within its proper scope. Here is an honest breakdown of what your Elo number measures and what it does not:
| Your rating IS a reliable measure of… | Your rating IS NOT a reliable measure of… |
|---|---|
| Relative strength vs players in the same pool | Absolute chess skill independent of platform |
| Your expected score vs a specific opponent | How you would perform at a different time control |
| A recent trend in your performance | How your strength compares to players from past decades |
| Pairing fairness within one tournament or platform | Cross-platform comparisons (e.g. Chess.com vs Lichess) |
Is There a Better System?
Yes, and it already exists. Glicko-2, developed by the statistician Mark Glickman, addresses many of Elo's structural problems. It adds two key elements that basic Elo lacks: a Rating Deviation (RD) representing the uncertainty in your rating, and a volatility measure representing how consistently you perform.
Lichess uses Glicko-2. When you see "1800 ± 45" on Lichess, that ±45 is your Rating Deviation. Lower RD means more games, more certainty. A new player might show "1500 ± 350" — meaning the system is not confident in the estimate at all. This is far more honest than a bare Elo number that implies the same precision for a 5-game newcomer as for a 500-game veteran.
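Glicko-2's full update cycle is involved, but the core idea, discounting a rating difference by the opponent's uncertainty, already appears in Glickman's original Glicko formulas. Here is that piece in its Glicko-1 form (Glicko-2 applies the same attenuation on a rescaled internal scale):

```python
import math

Q = math.log(10) / 400  # Glicko's q constant

def g(rd: float) -> float:
    """Attenuation factor: a large opponent RD shrinks the effective rating gap."""
    return 1 / math.sqrt(1 + 3 * (Q * rd) ** 2 / math.pi ** 2)

def expected_score(r: float, r_opp: float, rd_opp: float) -> float:
    """Expected score against an opponent whose own rating is uncertain."""
    return 1 / (1 + 10 ** (-g(rd_opp) * (r - r_opp) / 400))

# Same 300-point edge, against a well-measured vs a barely-measured opponent:
print(expected_score(1800, 1500, 45))   # ≈ 0.847: close to the plain Elo expectation
print(expected_score(1800, 1500, 350))  # ≈ 0.761: pulled toward 0.5
```

Against the "1500 ± 350" newcomer, the system hedges its prediction toward a coin flip, so an upset costs the established player far less. Plain Elo has no way to express that hedge.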
FIDE has been reluctant to adopt Glicko-2 primarily for backward-compatibility reasons — the existing rating database stretching back decades would be difficult to migrate, and players have strong emotional attachments to their Elo numbers. But several national federations have quietly moved to Glicko-2 or similar systems, and there is growing academic consensus that Elo's time as the gold standard may be ending.