Matchmaking, Elo and hangar score - what's a fair rating?
Jan 30, 2017 11:45:30 GMT -5
Post by frunobulax on Jan 30, 2017 11:45:30 GMT -5
I initially just wanted to summarize things from a few discussions here, but while I was writing I realized that I could outline a completely new matchmaking algorithm in detail, which may or may not be close to what Pix has planned. Hence the new thread. I have tried to create a model for matchmaking that would adequately handle all requirements.
Seems to me that we have a fundamental problem in our vocabulary. What exactly is that (Elo) score we're talking about? Is it a measure of the player's ability, or a measure of the impact a player has with his current hangar? In other words, if a seasoned player tunes down his hangar (be it for clubbing or to squad with weaker clanmates), would his (Elo) score change? Some people here clearly think yes, some think no.
To make one thing clear: I think the ability to switch hangars is a fundamental aspect of the game, and part of the fun. The game is very different with only light bots or maybe mediums, and I do enjoy that quite a bit. But of course the main issue is that if players from one clan want to squad, then usually the best players have to adjust their hangars to match the hangars of the weaker players. This should still be possible.
But, and this should also be clear: If I go down to 4/6 light robots, and if I'm a good player, then I should go up against other good players that have a 4/6 light hangar, or maybe worse players that have a 5/7 medium hangar, but not newbies with a 3/5 light hangar or worse. In essence, I will meet either good players at my hangar level, or weaker players playing with a stronger hangar.
Now, if we agree that switching hangars should still be possible, it is very obvious to me that the above question should be answered "no" (that is, a player's score must be independent of his hangar), and matchmaking must work as a combination of "player ability" and hangar rating to allow hangar switching. If the hangar score is fair (and not the rather stupid algorithm we had previously), then hangar score plus Elo should be right as rain.
More precisely: Ideally, the player score should measure how a player performs "on equal footing" with opponents. To know whether we are on equal footing, we need balancing with a good hangar score. What is a good hangar score? That's a tough one. The basic requirement is that two hangars with identical scores should win the same number of games (better: have the same chance of winning a game) if they are thrown into a random game with equally good players. Just assume for the moment that we play one on one and have a perfect hangar score, that is, if we have players with the same "ability" and the same hangar score, then the probability of winning must be 50% for each player. Not for every single matchup, of course, as we have a rock-paper-scissors system here: we could have a hangar A of robots with Plasma and physical shields, hangar B of robots with only splash missiles and hangar C of robots with Anciles and Plasma. All hangars could have the same score, and very obviously B would (almost) always win against A, C would always win against B and A would always win against C. But if you throw them all together, each hangar would win about 50% of the battles.
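To illustrate the rock-paper-scissors point, here is a tiny Python sketch with the three hangars A, B and C from above. The names `BEATS`, `winner` and `round_robin_win_rates` are hypothetical and only encode the cyclic counters described in the post: every hangar loses its one bad matchup, wins its one good matchup, and ends up at 50% overall.

```python
from itertools import combinations

# Hypothetical cyclic counters: B beats A, C beats B, A beats C.
BEATS = {"B": "A", "C": "B", "A": "C"}  # winner -> the hangar it beats


def winner(x: str, y: str) -> str:
    """Winner of a one-on-one match between two distinct hangars in the cycle."""
    return x if BEATS[x] == y else y


def round_robin_win_rates(hangars: list[str]) -> dict[str, float]:
    """Play every pairing once and return each hangar's share of wins."""
    wins = dict.fromkeys(hangars, 0)
    games = dict.fromkeys(hangars, 0)
    for x, y in combinations(hangars, 2):
        wins[winner(x, y)] += 1
        games[x] += 1
        games[y] += 1
    return {h: wins[h] / games[h] for h in hangars}


print(round_robin_win_rates(["A", "B", "C"]))
```

Each hangar plays two matchups and wins exactly one, so the printed rates are all 0.5 even though no single matchup is balanced.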
Now, the problem of course is that there is no easy way to determine a "fair" hangar score. It doesn't have to be 100% fair, after all we are in a game with a lot of variables, but it should be "good enough". The fundamental problem here is how to rate a hangar with multiple robots: Assume a player has a 12/12 robot and four 1/1 robots. If that 12/12 robot is a Trident Fury or maybe a Treb Fury and the player prefers to stay out of close encounters, it is conceivable that the player will be able to use it for a long time in many games, so that the other four 1/1 robots have almost no meaning in his hangar. (Yes, I'm exaggerating here to make a point.) If on the other hand that 12/12 robot is a knife fighter, it is obvious that it won't last long in any battle, and the other robots will need to be weighted much higher.
So, what can we do about it? I'd say let's introduce the best hangar score we can think of, and see if it is good enough. No point in trying to read crystal balls. I already suggested a possible hangar score in another thread, but will propose a significantly different system now because I've been thinking about it some more.
Hangar score for a single robot is a combined score for robot stats, plus weapon slots, plus robot abilities. The general idea is that a robot serves two purposes: the ability to cap beacons and the ability to fight. In a fight, a robot with X hitpoints, speed Y and 1 light weapon counts half as much as a robot with X hitpoints, speed Y and 2 light weapons, but that doesn't mean that the second weapon slot doubles the robot's value, as the beacon capping ability is the same. Keeping this in mind, I'm trying to build a score that captures both. The thing about beacon capping, however, is this: A team that has 6 perfect light hangars for beacon capping will lose 100% of the time against a team with 6 heavy hangars with a lot of firepower, because the 6 light hangars will be eliminated too quickly to win the game, and the fighters will be standing at their enemies' spawn point, picking the beacon cappers off one by one. Thus, I consider beacon capping more of a small bonus with a limited ceiling, while fighting power scales a lot better. And one major part of beacon capping is speed and abilities like stealth, which contribute to fighting power as well.
So, here we go. First, each robot has a fighting power, which kind of measures the potential damage output of a robot and could be calculated as a combination of hitpoints, weapons, speed and abilities as follows. All the values are pulled out of my hat and need to be adjusted to make sense, but I hope you get the basic idea.
- We have a base score, which is the number of hitpoints (divided by a thousand here to avoid writing "k" after every number), plus 2% for each speed point. (We ignore the robot class. Speed can give a significant bonus: for example, a level 12 Rogatka has 124 HP and speed 45, while a level 3 Natasha also has 124 HP and speed 29. Therefore the Rog has a base score of 124 + 90% = 235, and the Natasha has 124 + 58% = 195.)
- We then take the square root of the base score and multiply it by the weapon score. The weapon score is the sum of the scores of all individual weapons, calculated as follows:
A weapon has score 1 for a light weapon, 1.5 for a medium weapon and 2 for a heavy weapon. Then we add 10% for each level (more precisely: we multiply by 1.1 to the power of (level - 1), as the 10% bonus stacks).
The base assumption behind the weapon score is that we can't compare apples and oranges, so it makes no sense to relate the damage that a Trebuchet does to the damage that a Thunder does. With damage out of the equation, we only have weapon class and level to work with. A factor of 2 between light and heavy weapons seems about right if you look at the cycle DPS of the different weapon systems.
We use the square root of the hitpoint number because I'd argue that twice the hitpoints does not justify twice the fighting power. (Otherwise the Raijin would be the strongest robot.) Square root is just one way to make the HP influence a bit more moderate; in this case it would take 4 times the hitpoints to achieve a double score.
- We then add a bonus of 20%-50% extra for each robot ability. I guess we should simply throw in some values and see how they do. (Maybe a physical shield for a Galahad is worth a 50% bonus, while bastion mode and jump capability are worth only 20%.)
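A minimal Python sketch of the fighting score as described in the list above, assuming HP is given in thousands and each ability bonus is passed as a fraction (0.2 for bastion mode). The function names are hypothetical, and compounding several ability bonuses multiplicatively is my assumption, since the post only shows one ability at a time.

```python
import math

# Class factors from the post: light 1, medium 1.5, heavy 2.
WEAPON_CLASS = {"light": 1.0, "medium": 1.5, "heavy": 2.0}


def base_score(hp_k: float, speed: float) -> float:
    """Hitpoints in thousands, plus 2% per speed point."""
    return hp_k * (1 + 0.02 * speed)


def weapon_score(weapons: list[tuple[str, int]]) -> float:
    """Sum of class factor * 1.1 ** (level - 1) over all weapons."""
    return sum(WEAPON_CLASS[cls] * 1.1 ** (level - 1) for cls, level in weapons)


def fighting_score(hp_k: float, speed: float,
                   weapons: list[tuple[str, int]],
                   ability_bonuses: tuple[float, ...] = ()) -> float:
    """sqrt(base score) * weapon score, scaled by (1 + bonus) per ability.

    Compounding several ability bonuses multiplicatively is an assumption.
    """
    score = math.sqrt(base_score(hp_k, speed)) * weapon_score(weapons)
    for bonus in ability_bonuses:
        score *= 1 + bonus
    return score


# Raijin: 250k HP, speed 33, two level 12 heavy weapons, bastion mode (+20%).
raijin = fighting_score(250, 33, [("heavy", 12), ("heavy", 12)], (0.2,))
print(round(raijin))
```

This prints 279, in line with the Raijin figure of 278 quoted in the post, since 1.1^11 is about 2.85.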
Now we have a fighting score for each robot. Let's see where we end up. First, a Raijin: highest HP is 250, times 1.66 (for speed 33), square root of that, times 2 heavy weapons (a factor of 4), times about 2.85 (1.1^11) for level 12 weapons, times 1.2 for bastion mode: score 278. The highest score for a Galahad would be 185, for a Fury 271. So a Raijin is slightly better than a Fury? I don't own one, but the Raijin has 60% more HP, more speed, bastion ability and all that for one weapon less. That doesn't sound too far-fetched, but I told you we might need to adjust those values.
Then, I'd add some score for beacon capping ability. Let's say the Stalker is the best beacon capper, with speed 66, stealth and 90 HP. Assuming that we want to measure beacon capping ability on a scale between 0 and 100 points, we could just calculate a beacon ability of (speed-26)*2, plus 25% for the Stealth ability, so the Stalker gets (66-26)*2 = 80, plus 25%, which is 100. (Let's forget about the other abilities for simplicity. Robots slower than 26 get a score of 0.) Now, the "fighting score" of the robot is modified by the beacon capping ability: say we add the fighting score times beacon_ability/400 as a bonus, giving the perfect beacon capper a bonus of 25% to his fighting score.
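The beacon capping bonus can be sketched in a few lines of Python. The function names are hypothetical; the formula follows the paragraph above, ignoring every ability other than stealth.

```python
def beacon_ability(speed: float, has_stealth: bool = False) -> float:
    """(speed - 26) * 2, never below 0, plus 25% for stealth (0..100 scale)."""
    ability = max(0.0, (speed - 26) * 2.0)
    if has_stealth:
        ability *= 1.25
    return ability


def robot_score(fighting: float, speed: float, has_stealth: bool = False) -> float:
    """Fighting score plus a beacon bonus of fighting * beacon_ability / 400."""
    return fighting * (1 + beacon_ability(speed, has_stealth) / 400)


print(beacon_ability(66, has_stealth=True))  # Stalker: 100.0
```

Because the bonus is capped at 100/400, no robot gains more than 25% from beacon capping, which matches the idea of a small bonus with a limited ceiling.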
So, fighting score plus beacon capping bonus equals robot score. What about the hangar? I'd suggest using 100% of the strongest robot, plus 100% of the second strongest robot, plus 70% of the third strongest robot, plus 50% each of the fourth and fifth strongest robots. Why? First, I'd argue that I'm only interested in robots that are used in a win. In a loss you'll use all available robots anyway, but if you win all your battles using only 3 robots, then the last two in your hangar are not as relevant as the first 3. So, the 100% score for the first two is obvious: There are hardly any games where you need only one robot. (Pushovers are not relevant for me either.) For the remaining slots we have to consider the chance that they play, but also account for versatility. Maybe you'll use that 5th robot in only 10% of your wins, but it adds the possibility to adapt to different maps, different opponents and different situations. If you play on Springfield, have achieved tactical dominance but are behind on beacons and the enemy holds the far 3 beacons, then you'll be very happy to have a fast Stalker or Cossack to cap those beacons and stop the domination bar from running out. On almost any map, if enemies are closing in on your spawn point you need a brick fighter. You get the idea. So even if you use all five robots in only 10% of your wins, all robots in your hangar will contribute.
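A minimal Python sketch of the proposed hangar weighting, assuming the robot scores have already been computed. The name `hangar_score` is hypothetical, and letting hangars with fewer than five robots simply use fewer weights is my assumption.

```python
# Weights from the post, strongest robot first.
HANGAR_WEIGHTS = (1.0, 1.0, 0.7, 0.5, 0.5)


def hangar_score(robot_scores: list[float]) -> float:
    """Weighted sum of up to five robot scores, sorted strongest first.

    zip() stops at the shorter sequence, so a hangar with fewer than five
    robots just uses fewer weights (an assumption, not from the post).
    """
    ranked = sorted(robot_scores, reverse=True)
    return sum(w * s for w, s in zip(HANGAR_WEIGHTS, ranked))


# The exaggerated example: one strong robot (score 271) plus four weak ones.
print(hangar_score([10, 271, 10, 10, 10]))
```

For that hangar the result is about 298 (271 + 10 + 7 + 5 + 5), so the four weak robots still contribute, but far less than a second strong robot would.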
There you have it: a hangar score that should be good enough to allow a reasonable evaluation of your hangar, and that is rather easy to calculate. Now we come to the (Elo) score of a player. And here the question really is: How big should the influence of that score be? It is obvious that a player with a 4/4 hangar won't be able to win against a player with a 12/12 hangar, even if the latter is a completely clueless player. But it seems reasonable that a good 6/6 player would be able to beat a bad 8/8 or 9/9 player. The hangar score I described is basically the product of a modified hit point number times the damage output of the robot. So, a 30% higher hangar score means essentially that you can either do 30% more damage, or live 30% longer, or something in between. I'd argue that 30% more damage is doable if your opponent is worse than you.
What about the Elo score that everybody is talking about? To make that clear: I have no clue how the Elo in chess is calculated. That's why I won't use the name "Elo" anymore. But the idea is clear: By comparing scores of two players, you gain information about the probability that either player would win in a head to head matchup, and we have a scale that we can rely on. Winning against strong players will gain you more points than winning against weak players, and within the winning team, damage and beacons should be factored in.
I already introduced a hangar score. I will now introduce a "player score" (that's the Elo-like part) and a "combined" score (hangar score adjusted by player score).
Say I want to make sure that the player score system satisfies two conditions:
1. Player score is targeted to be between 0 and 2000. It can't be lower than 0.
2. The average score of all players is 1000.
Then we will just multiply the hangar score by, say, (player score plus 500) divided by 1500, to get a combined score. That is, a player with 2000 points would face hangars with 2500/1500 = 1.67 times his hangar score, while the worst imaginable player would face hangars with 500/1500 = 0.33 times his hangar score. Note that in theory there is no upper limit, so there could be a player with 4000 points facing hangars much stronger than his, but I figure this is not a likely scenario.
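The combined score is a one-line formula; here it is as a hypothetical Python helper, using the 500 and 1500 constants proposed above.

```python
def combined_score(hangar: float, player: float) -> float:
    """Hangar score scaled by (player score + 500) / 1500."""
    return hangar * (player + 500) / 1500


for player in (0, 1000, 2000):
    print(player, combined_score(300, player))
```

With a hangar score of 300, an average 1000-point player keeps a combined score of 300, the worst possible player drops to 100, and a 2000-point player rises to 500, i.e. 1.67 times the hangar score.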
To create a match, we have the usual "rubberband", where we try to find players whose combined score lies within a given range of maybe 20% centered on an arbitrary combined score. That is, no player in the match should have a combined score more than 20% higher or lower than any other player's. Matchmaking should try to create equal scores on both sides, so that the sums of the combined scores of the two teams are as close as possible.
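A rough Python sketch of the two matchmaking checks above. Two assumptions: the 20% band is read as "the highest combined score is at most 20% above the lowest", and the team split uses a simple greedy heuristic (strongest player first, always to the weaker team that still has room), which is my swapped-in technique and not guaranteed to find the most even split. Both function names are hypothetical.

```python
def within_band(scores: list[float], band: float = 0.20) -> bool:
    """True if the highest combined score is at most `band` above the lowest."""
    return max(scores) <= (1 + band) * min(scores)


def split_teams(scores: list[float], team_size: int = 6) -> tuple[list[float], list[float]]:
    """Greedy split: strongest first, each player to the open team with the lower sum."""
    if len(scores) > 2 * team_size:
        raise ValueError("too many players for two teams")
    teams: tuple[list[float], list[float]] = ([], [])
    for s in sorted(scores, reverse=True):
        open_teams = [t for t in teams if len(t) < team_size]
        min(open_teams, key=sum).append(s)
    return teams


lobby = [1200, 1150, 1100, 1100, 1050, 1000, 1000, 1000, 1050, 1100, 1150, 1200]
team_a, team_b = split_teams(lobby)
print(sum(team_a), sum(team_b))
```

For this lobby both teams end up with a total combined score of 6550. An exact balanced partition would need a search over team assignments; the greedy version is only meant to show the goal.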
How are player scores adjusted after a match? Points have to be fractional and not integer. Also, we have to specify how quickly the points can change. Player scores shouldn't change too quickly; it should take more than a few battles for a score to adjust. Let's assume we have a constant X that we will use to control this tempo. So, we will do the following:
(1) We will remove inactive players and dropouts from the calculation (see below). That is, if for example 3 players eject on one side, the total rating of that side will be only the sum of the ratings of the remaining players, and if they lose the match they will lose only a few points, because their score is significantly below the score of the 6 active players on the other side (assuming MM has created a somewhat equal 6vs6 setup).
(2) Then we will calculate the quotient Q of the total scores of the losing and winning side (the total score being the sum of hangar score times player rating over all players of that side), that is, Q = losing_score/winning_score. If the winning team was stronger, Q will be smaller than 1. If we had an upset, Q will be higher than 1.
(3) We calculate the "capability" C of the losing team as the sum of all player scores in that team, divided by 6000. As the average score is 1000, C will be exactly 1 for an average 6-player team. The number increases if players are above average, but drops if there are dropouts or players are bad.
(4) Now we will deduct P = X*Q*C rating points from the losing team and give them to the winning team.
There are a few remarks to be made here.
- First and most important, this way the total number of points among all players remains constant, that is, the average score of all War Robots players stays at 1000.
- Second, the amount of points redistributed is proportional to the rating of the players (a group of six 500-point players will lose half as many points as a group of six 1000-point players) due to the capability factor C.
- Third, the amount of points redistributed is proportional to the quotient of the ratings. If the losing team had the weaker score, they will lose fewer points than if they had the stronger score, due to the factor Q.
- X controls how quickly a score can change. From a gut feeling, I'd expect X=100 to work, such that in an average match 100 rating points are shifted from the losing to the winning players (about 16.7 points per player).
(5) Given the number P, we rank each of the players in the winning and losing team by beacons and damage output. We use, say, 75% damage and 25% beacons. The points deducted or awarded for each player are then linear in this figure, that is, the player with the most damage and/or the most beacons will gain more points (or lose fewer points) than his teammates who "contributed less to the win" or were "more responsible for the loss".
(6) If a player would fall below 0, we'll set his score to 0 and reduce the number of rating points that the winning team gets. (Remember, the sum of all points must remain constant in the system.)
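The steps (1) to (6) can be sketched in Python as follows. Everything here is an illustrative assumption: the `Player` class and its field names are hypothetical, and so is the loser split (1 - w)/(n - 1), which is one way to make the deduction linear in the 75/25 damage/beacon figure while still summing to P. Team totals use hangar score times player rating, as step (2) says; if a team dealt no damage or capped no beacons, that part is split evenly.

```python
from dataclasses import dataclass


@dataclass
class Player:
    hangar: float          # hangar score
    rating: float          # player score, fractional, never below 0
    damage: float = 0.0
    beacons: float = 0.0
    active: bool = True    # False for early ejects and idle players


def contribution_weights(team: list[Player], dmg_w: float = 0.75,
                         beacon_w: float = 0.25) -> list[float]:
    """Each player's share of team damage and beacons; the weights sum to 1."""
    n = len(team)
    total_dmg = sum(p.damage for p in team)
    total_bcn = sum(p.beacons for p in team)
    weights = []
    for p in team:
        d = p.damage / total_dmg if total_dmg > 0 else 1 / n
        b = p.beacons / total_bcn if total_bcn > 0 else 1 / n
        weights.append(dmg_w * d + beacon_w * b)
    return weights


def update_ratings(winners: list[Player], losers: list[Player], x: float = 100.0) -> float:
    """Move rating points from losers to winners in place; return points moved."""
    # (1) Inactive players are ignored and keep their rating.
    winners = [p for p in winners if p.active]
    losers = [p for p in losers if p.active]
    if not winners or not losers:
        return 0.0
    # (2) Q = losing total / winning total, total = sum of hangar * rating.
    win_total = sum(p.hangar * p.rating for p in winners)
    lose_total = sum(p.hangar * p.rating for p in losers)
    q = lose_total / win_total if win_total > 0 else 1.0
    # (3) Capability C: losing team's ratings relative to six 1000-point players.
    c = sum(p.rating for p in losers) / 6000
    # (4) Total points at stake.
    pot = x * q * c
    # (5) Losers: a higher weight w means a smaller share of the loss.
    n = len(losers)
    collected = 0.0
    for p, w in zip(losers, contribution_weights(losers)):
        share = (1 - w) / (n - 1) if n > 1 else 1.0
        # (6) Never below 0; points that cannot be taken are not paid out.
        loss = min(pot * share, p.rating)
        p.rating -= loss
        collected += loss
    # Winners gain in proportion to their contribution weight.
    for p, w in zip(winners, contribution_weights(winners)):
        p.rating += collected * w
    return collected
```

In an even 6vs6 match between 1000-point players with equal performance, Q and C are both 1, so 100 points move and each player gains or loses about 16.7. Because winners only receive what was actually deducted, the total number of points in the system stays constant.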
Let me finally add that the problem of tanking is always there, whatever "player score" system we use. Therefore, anti-tanking measures must be applied: A player who ejects from a match before he mechs out will receive neither gold nor silver, and his rating will not be changed. A player who does not play (no significant input, no movement for a given share of the battle time) will also not receive any rewards or rating change, to prevent players from entering a match and just letting it run on auto. We just remove these players from the calculations above. I'm also in favor of a silver penalty in such cases (repair costs).
There you have it. I probably forgot a few loopholes and got a few details wrong, but it was fun to write, and to invent the system while writing.