Tuesday, November 23, 2010

Elo and Trickle-Up Economies

Preface: I realized after writing it that this post is pretty long and could use an abstract. It's an overview of the two scoring systems used in Warlocks viewed as economies. The objective is simply to note what kind of properties will emerge from having an economy with these particular constraints. What does emerge is that the resources in both systems are only acquired from other players, not independently generated, which results in a pyramid of sorts, where a large number of players lower down in the hierarchy provide a source of points for the higher ranked players. Larger economies will result in a higher peak, but otherwise do not affect the average player in these systems.

---

As I've mentioned before, I'm an avid player of the game Warlocks, based on Waving Hands. This game uses two different ranking systems for competitive players, and they have some interesting differences between them.

The first, and simpler ranking system is ladder points, and they work as follows: every time a player wins a ladder match, they gain one ladder point; every time a player loses a ladder match, they lose one ladder point; every time a player dies during a ladder match, their ladder points reduce to zero (note that in this game, most matches end with one player surrendering, not dying). Every player starts with a ladder score of zero, and you cannot have negative ladder points. This means that every time a player with no ladder points loses a match, a ladder point is created from the ether, and every time a player with ladder points loses a match, their point is effectively transferred to the winner. There's one more feature of ladder matches that's worth mentioning, and that is that you cannot challenge a player to a ladder match if your relative ladder scores are more than 5 points apart.
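The ladder rules above are simple enough to write down directly. The following is only a sketch of the rules as described in this post, not the game's actual code; the function names are made up for illustration.

```python
MAX_CHALLENGE_GAP = 5  # from the post: no challenges across more than 5 points


def can_challenge(score_a: int, score_b: int) -> bool:
    """True if two players are close enough on the ladder to play."""
    return abs(score_a - score_b) <= MAX_CHALLENGE_GAP


def ladder_result(winner: int, loser: int, loser_died: bool = False) -> tuple[int, int]:
    """Return the (winner, loser) ladder scores after one match."""
    new_winner = winner + 1
    # A death resets the loser to zero; otherwise they drop one point,
    # but never below zero.
    new_loser = 0 if loser_died else max(0, loser - 1)
    return new_winner, new_loser


# A loser at zero has nothing to give up, so a point is created.
print(ladder_result(0, 0))                   # (1, 0)
# Otherwise the point simply moves from the loser to the winner.
print(ladder_result(3, 2))                   # (4, 1)
# A death wipes out the loser's whole score.
print(ladder_result(3, 6, loser_died=True))  # (4, 0)
```

Note that only the first case increases the total number of points in circulation; the death case actually destroys points.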

The result of these features is that ladder scores rarely get very high. Since your ladder score will get reduced to zero by a single death, it takes a lot of skill (or luck) to continually grow your score. Moreover, since you cannot challenge an opponent who is more than 5 points apart from you, the highest possible ladder score for any player is 7 points higher than the second-highest score (assuming they began 5 points apart and that the higher-ranked player won). This means that in order for me to have a ladder score higher than 20, there need to be other players with a ladder score of at least 15 I can challenge. In other words, the upper limit of ladder scores depends on the presence of a population of successful ladder players who collaboratively create ladder points (by playing those with 0 ladder points) and then transfer them up the ladder to the best players.

We'll see a similar dynamic with the second ranking system: elo. As in chess rankings, elo is a system in which the change in a player's score is weighted depending on their expected likelihood of winning (which is, in turn, based on the competing players' relative elo scores). Each player who registers begins with an elo score of 1500, which defines that score as the expected skill level of an average new player. Each match results in one player gaining a number of points and the other losing an equal number of points - in other words, once again, matches effectively cause a "transfer" of points from one player to another. If a player with a lower score beats a player with a higher score, they earn more points from the win, and if a player with a higher score wins, they earn fewer points. The difference in points earned corresponds to a player's expected likelihood of winning - meaning that if I'm expected to have a 75% chance of defeating an opponent, I will earn 1/3 as many points for winning as he will if he wins, so that over the course of many games, elo scores will stabilize if players tend to win as often as they are expected to given their elo score. Since all starting players start with 1500 points, they begin ranked as equals even though some may be stronger players than others. However, the differences in skill level will fairly rapidly be reflected in their score once they begin playing ranked games.

Let's look at an example. I register a new account and start with 1500 elo. If I play and beat another new player, I will gain 12 points, to have a score of 1512, and their score will go down to 1488. Now the difference in our scores is 24, so if I play that same player again and win, I will gain slightly fewer points than I did the first time. Once the elo difference is over 100 points, I will gain 8 points from a win and my opponent will gain 16 points if he wins - as long as I win approximately twice as often as I lose, the elo difference will remain stable, but if I win more often, it will continue to go up, and if I lose more often, it will go down.
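The numbers in this example can be reproduced with the standard chess-style elo formula. This is a sketch under two assumptions that the post does not confirm: that Warlocks uses the usual logistic expected-score curve with a 400-point scale, and that the K-factor is 24 (inferred from the 12-point swing between two equal players).

```python
K = 24  # assumed: inferred from the 12-point swing between equal players


def expected_score(rating: float, opponent: float) -> float:
    """Standard elo estimate of the chance that `rating` beats `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def win_gain(winner: float, loser: float, k: float = K) -> float:
    """Points transferred from the loser to the winner of one match."""
    return k * (1.0 - expected_score(winner, loser))


print(round(win_gain(1500, 1500), 2))  # 12.0, two new players
print(round(win_gain(1512, 1488), 2))  # about 11.17, the rematch pays a bit less
print(round(win_gain(1620, 1500), 2))  # about 8, favorite wins at a 120-point gap
print(round(win_gain(1500, 1620), 2))  # about 16, underdog wins at a 120-point gap
```

Under these assumptions the 8-versus-16 split shows up at a gap of roughly 120 points, which fits the "over 100 points" figure above.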

Notably, if the winning player is ahead by enough elo, they effectively gain no points from victory, so many high-ranked players will simply refuse to play ranked matches with much lower-ranked players (since they have nothing to gain and much to lose if they make a mistake). In practice, the maximum effective difference between players who can fairly compete in ranked matches is a little over 200 points. Any more of a difference and fluke wins by inexperienced players will unduly throw off the scores of high-ranked players.

All of this together suggests some interesting features of the elo economy. Since a winning player gains as much as their opponent loses from a match, the sum elo score of the player population cannot grow except by the addition of new players, and the existence of players with more than 1500 elo requires the existence of players with less than 1500 elo. Moreover, a player can only effectively grow their elo by playing opponents with an elo score within 200 points of their own, which suggests that growing your elo depends on a population of players with elos near your own, so the highest possible elo in the system depends on the number of successful players, which is in turn limited by the number of total players. That is, a population of new players is needed in order to support the elo growth of players with elos between 1500 and 1700, and a population of players with elos of at least 1700 is needed to support the elo growth of players with elos between 1700 and 1900.
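The zero-sum property is easy to check with a toy simulation. This is only an illustrative sketch: it assumes the standard elo formula with a K-factor of 24, invents a hidden "skill" number for each player to decide who wins, and ignores the 200-point matchmaking limit described above.

```python
import random

K = 24  # assumed K-factor, inferred from the 12-point example in the post


def expected_score(rating, opponent):
    """Standard elo estimate of the chance that `rating` beats `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def simulate(num_players=50, num_matches=5000, seed=0):
    rng = random.Random(seed)
    ratings = [1500.0] * num_players
    # Hypothetical hidden skill, used only to decide who wins each match.
    skill = [rng.gauss(1500, 150) for _ in range(num_players)]
    for _ in range(num_matches):
        a, b = rng.sample(range(num_players), 2)
        a_wins = rng.random() < expected_score(skill[a], skill[b])
        # Whatever one player gains, the other loses.
        delta = K * ((1.0 if a_wins else 0.0) - expected_score(ratings[a], ratings[b]))
        ratings[a] += delta
        ratings[b] -= delta
    return ratings


ratings = simulate()
print(round(sum(ratings) / len(ratings), 6))  # the average stays at 1500
print(max(ratings) > 1500 > min(ratings))     # scores spread out around it
```

However the matches go, the average never moves; every point above 1500 is matched by a point below it somewhere else.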


As of this writing, there are 1577 players who have registered to play Warlocks, about 200 of whom have never played a ranked duel. Of the players who have played ranked games, 281 have an elo higher than 1500, and 419 players have an elo lower than 1500. The lowest elo in the system is 1298 (202 points lower than the average) and the highest elo is 2106 (606 points higher than the average). This suggests that in practice, a large population of weak players is needed to support the heightened elo scores of a relative few. There are two reasons for this: first, players who repeatedly lose will likely stop competing at some point, and second, players who keep losing will have their elos fall to the point where they no longer effectively feed the elo growth of stronger players.

Since the value of a win is weighted by the likelihood of the win, players who perform as well as expected will have stable elos - if you are about twice as good as the average new player (meaning twice as likely to win), your elo should stabilize around elo 1600. However, once a player enters the higher echelons of play, the relative dearth of other high-ranked players makes it harder to play enough balanced games to maintain a representative elo. In a population of players with elos from 1400 to 1600, it is unlikely for me to grow my elo above 1800, no matter how good I become at the game.
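To put a rough number on where an elo "should" settle, the standard elo curve can be inverted to give the rating gap at which a given win rate breaks even. This is a sketch assuming the usual chess-style formula with a 400-point scale, which the post does not confirm for Warlocks.

```python
import math


def stable_gap(win_probability: float) -> float:
    """Rating gap at which this win rate leaves both elos unchanged on average."""
    odds = win_probability / (1.0 - win_probability)
    return 400.0 * math.log10(odds)


print(round(stable_gap(2 / 3)))  # 120, winning twice as often as losing
print(round(stable_gap(0.75)))   # 191, winning three times as often as losing
```

Under that assumption, being twice as likely to win as to lose corresponds to a gap of about 120 points, in the neighborhood of the 1600 figure above.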

So the grand result is this: The total size of the elo economy of the game is determined by the number of players in the system, and the larger the total elo economy is, the higher the elo ratings of the best players can be, but for the vast majority of players, the size of the elo economy will have no impact on their personal elo scores. That is, as a resource, the total quantity of elo in the system will only affect the players at the top.

Now there are obvious disanalogies between the elo economy in Warlocks and market economies in the real world, but it nonetheless serves as an interesting model of a competition-driven economy. This is also not meant in any way to be some kind of moral statement about how "just" the elo system is - the numbers simply represent the fact that some players win more often than others, and it is the explicit goal of the elo system to represent this. I simply believe that the unintended emergent features of the system are noteworthy, since they result from the interactions of thousands of players.

Monday, November 1, 2010

Waving Hands: Paralysis and "Broken" Mechanics

I recently had a request for another post on Waving Hands, and since we're in the midst of the Warlocks 2010 championships (Warlocks being the game based on Waving Hands, and I'm well placed to make the finals right now), I decided to oblige.

A lot of discussion about the game by avid players revolves around a single spell, paralysis, which is considered by many, including myself, to be unbalanced, and possibly even broken. Its abusability has even led to the formation of a guild, the Paramancers, who specialize in this one spell. One of the interesting questions to arise from this discussion, however, is what exactly it means for a spell to be "broken."

When we talk about game balance, we generally assume that what we have in mind is a series of equally viable options. If one strategy is disproportionately represented or effective, then it is unbalanced. This, at first, seems like a fair criterion for a balanced game - if, in Starcraft, almost no one ever played Zerg, but 90% of players played Protoss, that would be a clear sign that something was amiss in the balance of the game. However, if we push the idea a little further, it starts to become murkier. Isn't the very idea of "strategy" supposed to be that some moves ARE better than others, and that finding the good moves is what makes the game fun? Rock, paper, scissors is a perfectly balanced game, because no move is better than any other, but for that reason it's impossible to strategize about anything other than player psychology, so the game is shallow.

In the case of Warlocks, some spells are clearly more popular than others, but this creates a self-balancing factor: the more commonly a strategy is used, the more predictable it is, and in a game that relies on predicting your opponent's moves as much as Warlocks does, that can be a fatal weakness. This means that if you use less popular spells, you can take your opponent by surprise, and in doing so may be able to make up for the features of the spell that make it less popular.

So in the case of paralysis, the fact that it is used much more often than most other spells is, in some sense, unbalanced, but that doesn't necessarily make the game worse. This is where we can draw a distinction between an unbalanced and broken spell, because a broken spell will interfere with the overall playability of the game. The problem, in this case, is that paralysis DOES interfere with the overall playability of the game. The criteria that can be used for a broken strategy might include:

1. Whether or not predicting the strategy makes it possible to "punish" the player who is predictable.
2. Whether or not there exist effective counters to the strategy.
3. Whether or not there is a good motivation for a player to use a different strategy.

As I mentioned before, there exists a guild called the Paramancers, who specialize in paralysis. In other words, just by being in this guild, they are effectively announcing to their opponents before the duel begins what strategy they will use, and yet they are still successful players. This simply wouldn't be the case with any other strategy - if I announced that every game I play, whenever possible, my right hand will constantly be casting antispell, I would lose every game I played (even though, in a given game, my right hand might end up making those gestures anyway).

Discussing counters will require a more detailed discussion of the spell itself, which requires some background knowledge of Warlocks. So if you've been reading up to now simply because you liked the idea of distinguishing "unbalanced" and "broken" mechanics, here's your chance to escape.

Paralysis - FFF

In Warlocks, you submit gestures on your left and right hands that add up, over several turns, to spells, that take effect when the gestures of that spell are completed. The gestures needed to cast the spell paralysis are FFF, and it causes one of your opponent's hands to be "paralyzed" into the same gesture on the next turn (well, except that W is paralyzed into P and S into D). The important thing is that since the gestures of paralysis are so symmetrical, gesturing another "F" on the next turn allows you to cast it again immediately (because now your last three gestures were, once again, "FFF"), resulting in "parachains" where one hand gestures "FFFFFFF..." ad infinitum. There is one restriction on parachains built into the basic rules, however - on consecutive turns, you can only paralyze the hand you already were paralyzing.

At first, this doesn't seem terribly abusive. If I use an endless parachain, I can keep one of my opponent's hands tied up, and effectively make us both play one-handed. This makes the spell useful on its own, in case you have some other advantage you want to hold onto, or force your opponent to respond to you and a summoned monster with only one hand available. It also grants a small initiative advantage, because as soon as the paralyzer decides to end his parachain, he can immediately begin casting a new spell, whereas the paralyzed player must suffer the effect of the final turn of paralysis before he can move his hand freely again, leaving him one turn behind on one hand. This is a significant advantage by itself.

However, the real problem with paralysis occurs when you start changing targets. Since it counts as a mind-affecting enchantment, it cancels with other mind-affecting enchantments, which means that you can cast paralysis on yourself to counter an opponent's interrupt, and then go back to paralyzing them on the next turn. What's more, when you stop paralyzing your opponent for one turn, and then start again, you can switch which hand you're affecting, allowing you to alternate and restrict your opponent's use of both his hands. Finally, paralysis cast on a monster stops the monster from attacking that turn, which means that a parachain can be used, on any given turn, to disrupt either of your opponent's hands, hold his monster at bay, or counter one of his interrupts on you (including his own paralysis). This versatility in a spell that can be cast every single turn is incredibly powerful.

There already exists, however, a standard variant that helps to make paralysis easier to disrupt, called "parafc". It means that when paralysis is used on an F gesture, the F is paralyzed into a C gesture. The significance of this is that it makes paralysis targeted at yourself (to counter another enchantment) into a risky move, because if your opponent was bluffing, and doesn't complete their own enchantment, you've paralyzed yourself, and either cannot continue your parachain (because now you've gestured FFFc), or must disrupt your other hand. The effect of this rule is that many situations in which a parachain could not be countered are now situations in which your opponent can generate "50/50" opportunities to disrupt you.
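The gesture rules described so far can be sketched in a few lines. This is a simplified illustration of the rules as stated in this post, not the game's implementation: the function names are made up, gestures are plain uppercase letters for one hand, and the two-handed nature of the clap (written "c" above) is ignored.

```python
def paralyzed_gesture(gesture: str, parafc: bool = False) -> str:
    """Gesture a paralyzed hand is forced to make, given its previous gesture."""
    if gesture == "W":
        return "P"
    if gesture == "S":
        return "D"
    if parafc and gesture == "F":
        return "C"  # the parafc variant
    return gesture  # otherwise the hand repeats itself


def casts_paralysis(history: str) -> bool:
    """True if the most recent three gestures of one hand are FFF."""
    return history.endswith("FFF")


def parachain_turns(history: str) -> list[int]:
    """1-based turns on which paralysis completes for this hand."""
    return [t for t in range(1, len(history) + 1) if casts_paralysis(history[:t])]


# Once the chain is running, every additional F casts the spell again.
print(parachain_turns("FFFFF"))  # [3, 4, 5]

# Self-paralysis in the basic rules: the F repeats and the chain continues.
print(casts_paralysis("FFF" + paralyzed_gesture("F")))               # True
# Self-paralysis under parafc: the F becomes a C and the chain breaks.
print(casts_paralysis("FFF" + paralyzed_gesture("F", parafc=True)))  # False
```

The last two lines are the whole point of parafc: the same self-targeted paralysis that is free in the basic rules costs the paralyzer his chain if the opponent's enchantment never arrives.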

The problem with this solution seems, to me, fairly straightforward. While it's better to have a 50% chance to disrupt a parachain than a 0% chance, if I really know, before my opponent has made a single move, what he's going to do, I should have a 100% chance of countering him. Paralysis can still be cast every turn, can restrict both of your opponent's hands, and can be used defensively - now it's simply that you run a 50% risk of disrupting yourself when using it defensively (which is the case with every other mind-affecting enchantment, anyway). The fact remains that even playing parafc, Paramancers can play effectively even when their opponents know what strategy they are using, and this simply shouldn't be the case.

I think there are two ways to balance out paralysis as it stands, and they each essentially involve removing one of the spell's advantages - either it shouldn't be castable every turn, or it shouldn't be able to target either of an opponent's hands. I, in fact, already play with the former restriction self-imposed: I do not allow myself to cast paralysis more than three times consecutively, and this is a restriction that is public, and which my opponents know about and take advantage of. The reason I play with this restriction is simple - playing without extended parachains makes the game more dynamic and interesting, and if I didn't have an explicit rule, I would end up using them simply because they are so advantageous. However, a fiat "do not cast paralysis more than three times" rule is not a very elegant solution, so this restriction, were it generally enforced, would best take the form of a change in the gestures of paralysis such that it was impossible to cast every turn.

And restricting which hand can be paralyzed would also balance out the spell effectively. My preferred way of handling this would be to say that whichever hand I cast paralysis with is the hand that gets paralyzed (if I gesture "FFF" with my right hand, I can paralyze your right hand). This is significant for a reason that goes beyond the ability of a parachain to alternate hands every other turn - even though only one hand can be paralyzed, as long as the caster gets to choose which hand it is that is affected, and the choice is made after the spell is successfully cast, the target of paralysis has to restrict which gestures he makes on BOTH of his hands, in order to avoid having a gesture on either hand that will become particularly unusable once paralysis takes effect. In fact, it might be a downside of the parafc variant that any spell featuring an F can be so disastrously disrupted by paralysis (given how few spells use the C gesture). The result of this is that on a turn in which I expect to be targeted by paralysis, I will try to gesture either a W (paralyzed into WPP - counterspell) or PS (paralyzed into PSDD - charm monster) on each hand to prevent my opponent from being able to totally disrupt my spellflow. However, since my opponent can alternate the paralyzed hand every other turn, that means that in theory I have to be prepared to make one of those restricted gestures on each hand every other turn. If I knew ahead of time which hand would be paralyzed, this would not be an issue.

There's plenty more to say on the topic, but this is plenty for one post. I like to hope that some of my idealistic self-imposed restrictions gain traction in the community at large, or that at some point a further evolved version of the game will address some of these issues, but in the meantime paralysis is a legitimately broken aspect of an otherwise unbelievably well-crafted game that we must live with.