Matrix Games and Linear Programming
Every matrix defines a game for two. If G is an m-by-n matrix then,
in each round,
the row player selects a row, e.g., i, and, at the same time,
the column player selects a column, j,
after which
the column player pays G(i,j) units to the row player,
with the understanding that if G(i,j)<0 then the
row player pays -G(i,j) units to the column player.
For example, the game of rock-scissors-paper is expressed by
         R    S    P
    R    0    1   -1
    S   -1    0    1
    P    1   -1    0
We see that rock beats scissors, paper beats rock, scissors beats
paper, and that draws cost nothing. The R, S, P along the top and left
are simply labels. Our main interest is in the game matrix
$$G = \begin{bmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{bmatrix}$$
and rather than focus on individual rounds we will uncover strategies
that are best on average. By strategy we mean a decision, over the
course of a tournament, to
play rock with probability $x_1$, scissors with probability $x_2$,
and paper with probability $x_3$. Of course
$x_1 \ge 0$, $x_2 \ge 0$, $x_3 \ge 0$, and $x_1+x_2+x_3=1$.
As such vectors appear often in game theory, we give them a name. A vector
with n nonnegative elements, all of which sum to one, is called
stochastic. We denote the collection of such vectors by
$$S_n = \{\text{stochastic } n\text{-vectors}\}.$$
Now, if the game matrix $G$ is m-by-n, the row player adopts
strategy $x \in S_m$, and the column player adopts
strategy $y \in S_n$, then
$x^TGy$ is the average payoff (to the row player) per round.
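To make the average payoff concrete, here is a minimal Matlab sketch that evaluates x'*G*y for rock-scissors-paper. The two strategies are hypothetical, chosen only for illustration, and the helper isStochastic is an assumed name, not part of the course code.

```matlab
% Rock-scissors-paper game matrix (rows and columns ordered R, S, P)
G = [ 0  1 -1
     -1  0  1
      1 -1  0];

% Hypothetical strategies, chosen only for illustration
x = [1/2; 1/4; 1/4];   % row player favors rock
y = [0; 0; 1];         % column player always plays paper

% Both must be stochastic: nonnegative entries that sum to one
isStochastic = @(v) all(v >= 0) && abs(sum(v) - 1) < 1e-12;
assert(isStochastic(x) && isStochastic(y))

% Average payoff to the row player per round
payoff = x'*G*y;
disp(payoff)
```

Here the payoff is x2 - x1 = -1/4: a row player who leans on rock loses, on average, to a column player who always plays paper.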
We now show how the row player can maximize his/her gains and how the
column player can minimize his/her losses.
The Optimal Row Strategy
We note that a given row strategy $x$ is assured of winning, on average, at least
$$\min_{y \in S_n} x^TGy.$$
Our first step is to note that among the most effective replies y to
a given x there always lies at least one pure (only one nonzero
entry, e.g., always playing scissors) strategy.
This is easier done than said. For example, for the RSP game,
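$$x^TG = [\,x_3-x_2 \quad x_1-x_3 \quad x_2-x_1\,].$$
Now, for a given $x$, one of these elements, say the 2nd,
is less than or equal to the other two. As $x^TGy$ is a weighted average
of these elements, with weights $y$, it can never dip below the smallest
of them. As a result, the pure strategy
$$p = [0 \quad 1 \quad 0]^T$$
will yield
$$\min_{y \in S_n} x^TGy = x^TGp = \min x^TG = \min G^Tx,$$
where the min of a vector denotes its smallest element.
The last equality stems from the fact that
$$(x^TG)^T = G^Tx$$
and is invoked because Matlab prefers to work with columns rather than rows.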
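The claim that some pure reply is always among the best can be checked numerically. The sketch below uses a hypothetical row strategy x and compares the smallest element of G'*x with the payoff against many random stochastic replies; the sample size of 1000 is an arbitrary choice.

```matlab
G = [0 1 -1; -1 0 1; 1 -1 0];   % RSP game matrix
x = [1/2; 1/4; 1/4];            % hypothetical row strategy

% Payoff to the row player against each pure column reply (R, S, P)
pureReplies = G'*x;
[worst, j] = min(pureReplies);

% Random mixed replies: each column of Y is scaled to sum to one
Y = rand(3, 1000);
Y = Y ./ repmat(sum(Y, 1), 3, 1);
mixedPayoffs = x'*G*Y;

fprintf('worst pure reply: column %d, payoff %g\n', j, worst)
fprintf('smallest payoff over random mixed replies: %g\n', min(mixedPayoffs))
```

No mixed reply should drive the payoff below the worst pure reply, here always playing paper (column 3) for a payoff of -1/4.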
Hence, the optimal row strategy is achieved by solving
$$\max_{x \in S_m} \min G^Tx.$$
Our second step is to translate this max min into a standard
linear program
(something amenable to Matlab's linprog). We do this
by asking for the biggest scalar $z$ that is no greater than any element of
$G^Tx$, i.e., we solve
$$\text{(LPr)} \qquad \max z \quad \text{subject to} \quad z\,\mathtt{ones}(n,1) - G^Tx \le 0, \quad x \in S_m.$$
We code the solution to (LPr) by appending z to the end of
x. The script rsprow finds the best
rock-scissors-paper row strategy. Here is a diary of its use:
rsprow
Optimization terminated.
Best RSP row strategy
0.3333
0.3333
0.3333
Average payoff
0
diary
You are pleased that it gave the expected answer, but look forward to
the unexpected.
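For readers without the script at hand, here is a minimal sketch of how (LPr) might be passed to linprog for rock-scissors-paper. It is not the course's rsprow itself. The variable names are assumptions, and linprog requires the Optimization Toolbox.

```matlab
% Sketch of a row-strategy solver for rock-scissors-paper via (LPr)
G = [0 1 -1; -1 0 1; 1 -1 0];
[m, n] = size(G);

% Unknowns: w = [x; z], with z appended to the end of x
f   = [zeros(m,1); -1];      % linprog minimizes, so minimize -z
A   = [-G' ones(n,1)];       % z*ones(n,1) - G'*x <= 0
b   = zeros(n,1);
Aeq = [ones(1,m) 0];         % x1 + ... + xm = 1
beq = 1;
lb  = [zeros(m,1); -Inf];    % x >= 0, while z may take either sign

w = linprog(f, A, b, Aeq, beq, lb);
x = w(1:m);
z = w(end);

disp('Best RSP row strategy'), disp(x)
disp('Average payoff'),        disp(z)
```

Leaving z without a lower bound matters: with the default-free bound vector above, z can go negative, which is what a game that favors the column player requires.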
The Optimal Column Strategy
In a game like RSP, where the players are interchangeable, we expect that the
best row strategy will also be the best column strategy. As this week's
assignment is not such a game, you might be interested in
reading this section carefully.
We note that a given column strategy $y$ is assured of losing, on average, no more than
$$\max_{x \in S_m} x^TGy.$$
Our first step is to note that among the most effective replies $x$ to
a given $y$ there always lies at least one pure strategy.
In symbols this means that
$$\max_{x \in S_m} x^TGy = \max Gy,$$
where the max of a vector denotes its largest element.
Hence, the optimal column strategy is achieved by solving
$$\min_{y \in S_n} \max Gy.$$
Our second step is to translate (as above) this min max into a standard
linear program (something amenable to Matlab's linprog). In particular,
we search for the smallest scalar $z$ that is no less than any element of $Gy$:
$$\text{(LPc)} \qquad \min z \quad \text{subject to} \quad Gy - z\,\mathtt{ones}(m,1) \le 0, \quad y \in S_n.$$
We code the solution to (LPc) by appending z to the end of
y. The script rspcol finds the best
rock-scissors-paper column strategy. Here is a diary of its use:
rspcol
Optimization terminated.
Best RSP column strategy
0.3333
0.3333
0.3333
Average payout
0
diary
And so indeed, the two optimal strategies coincide, and the maximal
row payoff is precisely the minimum column payout. The MiniMax Theorem,
$$\min_{y \in S_n} \max_{x \in S_m} x^TGy = \max_{x \in S_m} \min_{y \in S_n} x^TGy,$$
informs us that the maximal row payoff is indeed the minimum column payout
for every game (for two) in the world.
When this common value is zero we call the game fair.
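The equality of the two values can be checked numerically by solving (LPc) and (LPr) side by side. What follows is a sketch under the same assumptions as the formulations above (Optimization Toolbox available, z appended to the strategy vector); the variable names are illustrative and this is not the course's rspcol.

```matlab
G = [0 1 -1; -1 0 1; 1 -1 0];   % RSP game matrix
[m, n] = size(G);

% (LPc): unknowns [y; z], minimize z subject to G*y - z*ones(m,1) <= 0
wc = linprog([zeros(n,1); 1], [G -ones(m,1)], zeros(m,1), ...
             [ones(1,n) 0], 1, [zeros(n,1); -Inf]);
y = wc(1:n);
payout = wc(end);

% (LPr): unknowns [x; z], maximize z subject to z*ones(n,1) - G'*x <= 0
wr = linprog([zeros(m,1); -1], [-G' ones(n,1)], zeros(n,1), ...
             [ones(1,m) 0], 1, [zeros(m,1); -Inf]);
x = wr(1:m);
payoff = wr(end);

disp('Best RSP column strategy'), disp(y)
fprintf('min column payout = %g, max row payoff = %g\n', payout, payoff)
```

Swapping in another game matrix G leaves the rest of the script unchanged, and the two printed values should still agree up to solver tolerance.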
In preparation for the week's assignment let us move beyond the
simple RSP game.
The Game of Morra
Each player possesses two tokens. In each round they hide either one
or both and guess the number hidden by their opponent. If only one player
guesses correctly then he/she receives payment equal to the total number
of hidden tokens. The available pure strategies are then
hide 1, guess 1
hide 1, guess 2
hide 2, guess 1
hide 2, guess 2
and so the game matrix takes the form
$$G = \begin{bmatrix} 0 & 2 & -3 & 0 \\ -2 & 0 & 0 & 3 \\ 3 & 0 & 0 & -4 \\ 0 & -3 & 4 & 0 \end{bmatrix}$$
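The Morra matrix can also be generated directly from the rules, which is a useful check on the hand-built entries. This is a sketch; the array S and the variable names are assumptions introduced for illustration.

```matlab
% Pure strategies as [hide guess] pairs, in the order listed above
S = [1 1; 1 2; 2 1; 2 2];
k = size(S, 1);
G = zeros(k);

for i = 1:k            % row player's pure strategy
    for j = 1:k        % column player's pure strategy
        rowRight = S(i,2) == S(j,1);   % row guessed column's hidden count
        colRight = S(j,2) == S(i,1);   % column guessed row's hidden count
        total    = S(i,1) + S(j,1);    % total number of hidden tokens
        % If both or neither guess correctly, nothing changes hands
        G(i,j) = (rowRight - colRight) * total;
    end
end
disp(G)
```

The result is antisymmetric, G' = -G, reflecting the fact that the two players are interchangeable.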
Here is a diary from the associated script, morra:
morra
Optimization terminated.
Best Morra row strategy
0
0.6000
0.4000
0
Average payoff
0
Optimization terminated.
Best Morra column strategy
0
0.6000
0.4000
0
Average payout
0
diary
Hence, the game is fair and a best strategy is to play (hide 1, guess 2)
with probability 3/5 and (hide 2, guess 1) with probability 2/5.
We see that it is best to suppose that your opponent's cache is
the opposite of yours.
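One can confirm, without linprog, that this strategy guarantees a fair outcome from either seat. The sketch below evaluates the row guarantee and the column guarantee for the strategy just described; the variable names are illustrative.

```matlab
G = [0 2 -3 0; -2 0 0 3; 3 0 0 -4; 0 -3 4 0];   % Morra game matrix
s = [0; 3/5; 2/5; 0];                            % the strategy found above

rowGuarantee = min(G'*s);   % least the row player wins, on average, with s
colGuarantee = max(G*s);    % most the column player loses, on average, with s

fprintf('row guarantee %g, column guarantee %g\n', rowGuarantee, colGuarantee)
```

Both values are zero up to rounding error, so neither player can be made to lose on average while playing s.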
We now introduce a slight twist, stemming from the row player's modesty.
You see, he is a bit uncomfortable bleating out his guess (as required)
at exactly the same moment as his lovely opponent, and so offers that she be
allowed to speak first. She considers his polite offer and reasons that,
as her guess bears no relation to the number of tokens she herself has
hidden, the game remains fair. Let us see.
While her pure strategies remain unchanged, he now has, in
addition, the option to
hide 1, echo her guess
hide 1, counter her guess
hide 2, echo her guess
hide 2, counter her guess
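and so the game matrix for polite Morra is
$$G = \begin{bmatrix} 0 & 2 & -3 & 0 \\ -2 & 0 & 0 & 3 \\ 3 & 0 & 0 & -4 \\ 0 & -3 & 4 & 0 \\ 0 & 0 & -3 & 3 \\ -2 & 2 & 0 & 0 \\ 3 & -3 & 0 & 0 \\ 0 & 0 & 4 & -4 \end{bmatrix}$$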
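The four extra rows follow from the same payment rule once his guess is allowed to depend on hers. Here is a sketch that rebuilds the 8-by-4 matrix; the arrays S and R and the encoding of echo as 0 and counter as 1 are assumptions made for illustration.

```matlab
S = [1 1; 1 2; 2 1; 2 2];   % her [hide guess] pairs; also his first four options
R = [1 0; 1 1; 2 0; 2 1];   % his extra options: [hide reply], 0 = echo, 1 = counter
G = zeros(8, 4);

for j = 1:4
    herHide = S(j,1);
    herGuess = S(j,2);
    for i = 1:8
        if i <= 4
            hisHide = S(i,1);
            hisGuess = S(i,2);
        else
            hisHide = R(i-4,1);
            if R(i-4,2) == 1
                hisGuess = 3 - herGuess;   % counter: guess the other number
            else
                hisGuess = herGuess;       % echo: repeat her guess
            end
        end
        heRight  = hisGuess == herHide;
        sheRight = herGuess == hisHide;
        G(i,j) = (heRight - sheRight) * (hisHide + herHide);
    end
end
disp(G)
```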
From the associated code and diary
morrap
Optimization terminated.
Best Polite Morra row strategy
0
0.5657
0.4040
0
0
0.0202
0
0.0101
Average payoff
0.0404
Optimization terminated.
Best polite Morra column strategy
0.2828
0.3030
0.2121
0.2020
Average payout
0.0404
diary
we see that it pays to be a gentleman!
In particular, he gains (and she loses) 4/99 on average per round.
He does this by countering her guess about three percent of the time.
He never echoes her guess. Although linprog has generated a new
best strategy for her, against his best strategy it will, in fact, do no
better than if she had stuck with her old one.
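These claims can be checked by direct multiplication. The sketch below assumes that the four-digit diary values are roundings of fractions with denominator 99, which is consistent with the stated average of 4/99 but is not something linprog reports; the variable names are illustrative.

```matlab
% Polite Morra game matrix, as given above (8 row options, 4 column options)
G = [ 0  2 -3  0
     -2  0  0  3
      3  0  0 -4
      0 -3  4  0
      0  0 -3  3
     -2  2  0  0
      3 -3  0  0
      0  0  4 -4];

% Assumed exact forms of the rounded diary values
x    = [0 56 40 0 0 2 0 1]'/99;   % his strategy
y    = [28 30 21 20]'/99;         % her new strategy
yOld = [0 3/5 2/5 0]';            % her best strategy in ordinary Morra

rowGuarantee = min(G'*x);     % least he wins, on average, with x
colGuarantee = max(G*y);      % most she loses, on average, with y
oldVsBest    = x'*G*yOld;     % her old strategy against his strategy x
oldWorstCase = max(G*yOld);   % her old strategy against his best reply to it

fprintf('%.4f  %.4f  %.4f  %.4f\n', rowGuarantee, colGuarantee, oldVsBest, oldWorstCase)
```

The first three values all equal 4/99, since every element of G'*x is 4/99. The fourth is 1.6, which suggests why linprog moved her to a new strategy: the old one costs her no more against x, but it would be exposed if he tailored his reply to it.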