Introducing Player Elo Ratings

Elo ratings, despite originally being designed for chess, have become popular in the analysis of team sports due to their simplicity. FiveThirtyEight has popularized Elo ratings for both the National Football League and the National Basketball Association, using simple wins and losses to provide a baseline for analysis of teams.

Due to the size of rosters (and the relative stability of teams) in both of those leagues, FiveThirtyEight simply calculates Elo ratings for teams, not for each individual player. That makes sense for those sports, but given the small rosters in competitive Rocket League (3v3) and the lack of stable teams, it’s more reasonable to devise a rating system for individual players instead of for teams as a whole.

Before diving into the specifics, here’s some quick background info on how Elo rating systems work (with a short code sketch of the core update after the list).

  • The rating systems are zero-sum – if Cloud 9 beats NRG and gains 10 rating points, NRG loses 10 rating points.
  • Teams always gain points for winning games, and always lose points for losing games. The number of points gained/lost depends on the team’s rating prior to the game – underdogs will gain more points if they pull off an upset, while a top-tier team crushing a lower level opponent won’t affect either squad’s rating too much.
  • The average team Elo rating typically sits around 1500, so in this case, the average player rating will sit around 500.
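
To make those bullet points concrete, here’s a minimal sketch of the core update in Python. The names and structure are mine, purely for illustration; the real system layers margin of victory and per-player redistribution on top of this, as described below.

    def expected_score(rating_a, rating_b):
        """Standard logistic Elo curve: probability that side A wins."""
        return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

    def update(rating_a, rating_b, a_won, k=20):
        """Zero-sum update: whatever side A gains, side B loses."""
        change = k * ((1 if a_won else 0) - expected_score(rating_a, rating_b))
        return rating_a + change, rating_b - change

    # Underdogs gain more for an upset than favorites do for an expected win:
    print(update(1400, 1600, a_won=True))  # roughly (1415.2, 1584.8)
    print(update(1600, 1400, a_won=True))  # roughly (1604.8, 1395.2)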

Now, here are some specifics related to the Rocket League version of Elo.

  • New players start at the average of 500, and then the system adjusts their ratings accordingly.
  • For each game, changes in Elo are calculated on a team-wide basis. The three players’ individual ratings are added together and pitted against the opponents’ combined rating; afterwards, the change is redistributed to each player (see the sketch after this list).
  • All games from tournaments under the Major, Minor, or Monthly tabs on the esports wikis are used (h/t to @slokh_ for collecting RLCS statistics; he’s an invaluable resource for anyone interested in Rocket League stats, and runs his own website that you should totally check out).
  • Only North American and European teams are included. If anybody would like to send data for other regions, they will certainly be added to the rating system.
  • There is a regression component to account for players that play on the same team for an extended period of time.
  • There is a region adjustment, with European players having a higher overall average than North American players.
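
Here’s a rough sketch of what that team-wide calculation can look like in code. The function is mine, and the proportional split of the change is a simplification for illustration; the actual redistribution (and its regression component) is covered in more detail further down.

    def team_game_update(team_a, team_b, a_won, k=20):
        """team_a and team_b are lists of individual player ratings (three per side).

        The players' ratings are summed into team ratings, the game is scored like a
        normal two-team Elo matchup, and the change is handed back to the players in
        proportion to their share of the team total.
        """
        total_a, total_b = sum(team_a), sum(team_b)
        expected_a = 1 / (1 + 10 ** ((total_b - total_a) / 400))
        change = k * ((1 if a_won else 0) - expected_a)
        new_a = [r + change * r / total_a for r in team_a]
        new_b = [r - change * r / total_b for r in team_b]
        return new_a, new_b

    # Example: a 1500-rated trio upsets a 1560-rated trio.
    print(team_game_update([550, 500, 450], [520, 520, 520], a_won=True))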

Here’s a deeper look at the numbers behind the Rocket League Elo.

K-Factor

When calculating Elo ratings, the K-Factor describes the maximum number of points that can “change hands” in a single match. In sports like baseball, where there are 162 games played in a single season, the K-Factor is set around 10, while a sport like football, which has a 16-game season, has a much higher K-Factor (around 20).

After running tests, it was found that the ideal K-Factor for competitive Rocket League is 20. In each individual game, a team’s Elo rating can change by a maximum of 20 points, so a single player can see a maximum point change of 6.67 after one game.
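
As a quick sanity check on that per-player figure (a toy calculation, assuming the team-level change is spread across a three-player roster):

    k = 20                  # maximum team-level change in one game
    print(round(k / 3, 2))  # 6.67 -> maximum change for a single player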

It was a bit surprising to see the K-Factor come out so high, but after further thought, it makes sense – the rating system is quick to adapt to roster changes, and can pick up on potent roster combinations soon after they’re formed.

Margin of Victory

Though margin of victory information wasn’t available for every game in the 8,725-game dataset, it was used whenever it was accessible. For games without margin of victory data, it was simply assumed that the final score was 1-0.

With a margin of victory adjustment, a team’s rating gets a bigger boost for a more convincing win (like Mock-it EU’s 9-1 thrashing of The Leftovers in Season 3 League Play) than for edging an opponent 1-0. It also accounts for strength of opponent, so an underdog massively outscoring their opponent can expect a noticeable bump in Elo, while a heavyweight squeaking by a bubble team won’t change either team’s rating all that much.

The exact formula used comes from an Elo rating system built for hockey, due to the similarity in game scores. Though Rocket League resembles soccer, its final scores more closely resemble hockey’s: the NHL averaged 5.45 goals per game during the 2016-2017 season, while Rocket League has averaged 4.34 goals per game since 2015; international soccer, on the other hand, is at 2.7 goals per game.

Here’s the explanation of the exact formula, straight from Hockey Analytics:

Here GDH is the goal differential in the game, from the perspective of the home team.  This formula is based on this goal differential, adjusted for the (roughly) expected goal differential.  The adjustment is .85 goals for every 100 ELO points of differential.  This adjustment is important for technical reasons.  Once we have the absolute value of the adjusted goal differential, we take the natural logarithm to let some air out of larger margins.  When the adjusted goal differential is between -1 and 1, the formula is designed so that the result is 1.  M is larger when the adjusted goal differential is higher, but with diminishing returns for larger goal differentials.

Basically, the multiplier for margin of victory will increase as margin of victory increases, but only to a certain extent. A three-goal victory is worth a lot more than a one-goal victory, but a five-goal victory isn’t worth all that much more than a three-goal victory.

This was done to account for the fact that players will take unnecessary risks when trailing by two or more goals, resulting in goal differentials that don’t quite represent the skill levels of the teams.
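
The formula is easier to follow as code. Here’s a sketch of a margin-of-victory multiplier reconstructed from the Hockey Analytics description quoted above; it isn’t the published formula verbatim. The 0.85-goals-per-100-points adjustment and the natural log come straight from the quote, while the clamp at 1 for small margins is my reading of “the result is 1.”

    import math

    def mov_multiplier(goal_diff, elo_diff):
        """Margin-of-victory multiplier, reconstructed from the Hockey Analytics quote.

        goal_diff and elo_diff are both from the winner's perspective. The raw margin
        is adjusted by 0.85 goals per 100 Elo points of rating difference, then the
        natural log lets some air out of larger margins. Small adjusted margins
        simply return 1.
        """
        adjusted = goal_diff - 0.85 * elo_diff / 100
        if abs(adjusted) <= 1:
            return 1.0
        # 1 + ln(|x|) is continuous at |x| = 1 and grows with diminishing returns
        return 1.0 + math.log(abs(adjusted))

    # Between evenly rated teams, a 3-goal win earns about a 2.1x multiplier and a
    # 5-goal win about 2.6x, i.e. diminishing returns for bigger blowouts.
    print(round(mov_multiplier(3, 0), 2), round(mov_multiplier(5, 0), 2))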

Adjusting for the formation of new teams and players having consistent teammates

Since this is a “player” rating system instead of a “team” rating system, there were a variety of options for adding players’ ratings together, calculating the change, and then redistributing points.

The system that was decided upon (the one that made the rating system best at predicting game outcomes) accounted for a player’s strength before joining a team, but also slowly brought individual ratings together so that, if three players played on the same team for long enough, their ratings would be essentially identical.

Consider the following example. Player A has an Elo rating of 550, Player B has an Elo rating of 500, and Player C has an Elo rating of 450. Together, they form a team with an Elo rating of 1500. Player A accounts for 36.7 percent of his team’s Elo, while Player B accounts for 33.3 percent and Player C accounts for 30 percent.

They win a game, and get a 10-point boost to their team’s Elo rating. Assuming that Player A is simply worth one-third of their team wouldn’t be fair to them – their new Elo would be 503! That’s a 47-point drop after a victory.

At the same time, simply giving Player A 36.7 percent of their team’s Elo after every game isn’t fair to their teammates. Let’s say this team sticks together for several months, and becomes a strong team with a rating of 1750. Player A’s rating is now 642, while Player C only sits at 525.

The solution was to regress each player’s share of team Elo two percent toward an even one-third split after every game, so that, given enough time, three players on the same team would end up with virtually identical Elo ratings. Assuming this hypothetical team played 100 games over the several months they stuck together, Player A ends up with an Elo of 591, while Player C sits at 576. Player A is still considered better, but the gap isn’t nearly as massive.
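
Here’s a short sketch that reproduces those numbers, assuming the two percent regression means each player’s share of team Elo moves two percent of the way toward an even one-third split after every game (my reading of the description above).

    def regress_shares(shares, games, rate=0.02):
        """Pull each player's share of team Elo toward an even split, 2% per game."""
        even = 1 / len(shares)
        for _ in range(games):
            shares = [s + rate * (even - s) for s in shares]
        return shares

    # Players A, B, and C start at 550/500/450 on a 1500-rated team.
    shares = regress_shares([550 / 1500, 500 / 1500, 450 / 1500], games=100)
    print([round(s * 1750) for s in shares])  # roughly [591, 583, 576]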

Region adjustment

From late 2015 to the present, North America and Europe have faced off in 771 games. The series stands at a whopping 458-313 in favor of the Europeans, a sterling 59.4 percent win rate.

That value has been pretty consistent over time, too, though there is some evidence to show it’s declining. Here’s a look at cumulative win percentage for both regions (when playing against each other, that is).

Europe won about 75 of the first hundred games, and stayed above 60 percent until just recently.

So what does this mean for the player ratings? Well, an expected win percentage of 59.4 percent equates to a 66-point difference between average teams, and a 22-point difference between players. When new European players appear in the dataset, they are given an initial rating of 500. When new North American players appear, their initial rating is 478.
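
For the curious, the 66-point figure falls straight out of the standard Elo win expectancy curve; here’s the arithmetic, assuming the usual 400-point logistic scale at the team level:

    import math

    win_pct = 458 / 771                                  # Europe's head-to-head win rate (~0.594)
    team_gap = 400 * math.log10(win_pct / (1 - win_pct))
    print(round(team_gap))      # ~66 Elo points between the average EU and NA teams
    print(round(team_gap / 3))  # ~22 points per player, hence the 500 vs. 478 starting ratings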

Where can I find these ratings?

Right here on the site! Simply navigate to the “Player Ratings” or “Team Ratings” tab.

If you have any questions or would like to spark a discussion, please leave a comment or reach out on Twitter.

Bits and Pieces

Here are some notes that I took over the course of pulling the data for the ratings.

  • ESL 2015 NA 1 – Left out due to lack of roster information for 3 of the 8 teams.
  • ESL 2015 NA 2 (10/3) – Exclude Round 1 due to lack of roster information for two teams.
  • RGN 3v3 Cup – Exclude rounds 1 & 2 due to lack of roster information.
  • Insomnia 57, 58, and 59 excluded due to lack of high end teams (zero or few RLCS players).
  • Pulsar Premier League Season 1 is included.
  • Pulsar Premier League Seasons 2 and 3 League Play are not included due to unclear results – playoffs, however, are included.
  • ESL 2016 NA 3 – Team What A Guy excluded due to lack of roster information (swept 3-0 in Round 1).
  • Rocket Royale 2016 Week 6 – Get a Clue and Razzle Dazzle excluded due to a lack of roster information.
  • DreamHack Montreal 2016 excluded due to a lack of roster information.
  • Metaleak: Metacup Series (Elite) 3 and above excluded for time purposes.
  • Paris Week 2016 and Red Bull 5G Finals excluded for time purposes.
  • Leaf League excluded due to time constraints.
  • ASTRONAUTS 4 and 5 excluded due to lack of game information.
  • SUPER ROYALE – Phase 1 excluded due to lack of game information.
  • SPS Battlecar Pacifica omitted due to a lack of team information.
  • DreamHack Atlanta – Round 2 and above only.