Matrix Games and Linear Programming

Every matrix defines a game for two. If G is an m-by-n matrix then, in each round,

   the row player selects a row, e.g., i, and, at the same time,
   the column player selects a column, j, 
   after which
   the column player pays G(i,j) units to the row player,
       with the understanding that if G(i,j)<0 then the
       row player pays -G(i,j) units to the column player

For example, the game of rock-scissors-paper is expressed by

      R   S   P
  R   0   1  -1
  S  -1   0   1
  P   1  -1   0

We see that rock beats scissors, paper beats rock, scissors beats paper, and that draws cost nothing. The letters R, S, P along the top and left are merely labels. Our main interest is in the game matrix

      0   1  -1
 G = -1   0   1
      1  -1   0

and rather than focus on individual rounds we will uncover strategies that are best on average. By strategy we mean a decision, over the course of a tournament, to play rock with probability x₁, scissors with probability x₂, and paper with probability x₃. Of course

  x₁ ≥ 0,  x₂ ≥ 0,  x₃ ≥ 0, and x₁+x₂+x₃=1.

As such vectors appear often in game theory, we give them a name. A vector with n nonnegative elements, all of which sum to one, is called stochastic. We denote the collection of such vectors by

   S_n = Stochastic n-vectors

Now, if the game matrix, G, is m-by-n, and the row player adopts strategy x ∈ S_m and the column player adopts strategy y ∈ S_n then

   x^TGy is the average payoff (to the row player) per round.
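
As a small illustration of this formula, here is a minimal Python sketch (the notes themselves use Matlab). The function names `is_stochastic` and `average_payoff` and the two sample strategies are hypothetical, chosen only for this example. Exact fractions are used so that the sum-to-one test needs no tolerance; with floating point one would compare against 1 within a small tolerance instead.

```python
from fractions import Fraction

def is_stochastic(v):
    """True if v has nonnegative entries that sum to one."""
    return all(e >= 0 for e in v) and sum(v) == 1

def average_payoff(x, G, y):
    """Return x^T G y, the row player's average gain per round."""
    return sum(x[i] * G[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

# Rock-scissors-paper game matrix from the text.
G = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

# Hypothetical strategies, for illustration only.
x = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # row player favours rock
y = [Fraction(0), Fraction(0), Fraction(1)]           # column player always plays paper

print(average_payoff(x, G, y))
```

Against a column player who always plays paper, a row player who favours rock loses on average, so the printed value is negative.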

We now show how the row player can maximize his/her gains and how the column player can minimize his/her losses.

The Optimal Row Strategy

We note that a given row strategy x is assured of winning at least

   min x^TGy  
  y ∈ S_n

Our first step is to note that among the most effective replies y to a given x there always lies at least one pure (only one nonzero entry, e.g., always playing scissors) strategy. This is easier done than said. For example, for the RSP game,

  x^TG = [x₃-x₂   x₁-x₃   x₂-x₁]

Now, for a given x, one of these elements, say the 2nd, is less than or equal to the other two. As a result, the pure strategy

  p = [0  1  0]^T

will yield

  min x^TGy  = x^TGp = min x^TG = min G^Tx
  y ∈ S_n

The last equality stems from the fact that (x^TG)^T = G^Tx and is invoked because Matlab prefers to work with columns rather than rows.
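
The claim that a pure reply is always among the best replies can be checked numerically. The following Python sketch assumes a hypothetical row strategy x and uses the hypothetical helper names `row_times_G` and `best_pure_reply`. The point it illustrates is that x^TGy is a weighted average of the entries of x^TG, with weights y, and an average can never fall below the smallest entry.

```python
from fractions import Fraction

# Rock-scissors-paper game matrix from the text.
G = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

def row_times_G(x, G):
    """Return the entries of x^T G (equivalently G^T x) as a list."""
    return [sum(x[i] * G[i][j] for i in range(len(x)))
            for j in range(len(G[0]))]

def best_pure_reply(x, G):
    """Index of a column that minimizes the row player's payoff against x."""
    payoffs = row_times_G(x, G)
    return payoffs.index(min(payoffs))

x = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # hypothetical row strategy
payoffs = row_times_G(x, G)     # [x3-x2, x1-x3, x2-x1]
j = best_pure_reply(x, G)       # column of the smallest entry

# A mixed reply averages the entries of x^T G, so it cannot beat the pure reply j.
y = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
mixed = sum(p * w for p, w in zip(payoffs, y))
print(payoffs, j, mixed)
```

For this x the smallest entry is the third, so the best pure reply is to always play paper.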

Hence, the optimal row strategy is achieved by solving

   max    min G^Tx
  x ∈ S_m

Our second step is to translate this max min into a standard linear program (something amenable to Matlab's linprog). We do this by asking for the biggest scalar (z) that is no greater than any element of G^Tx, i.e., we solve

        max z
(LPr)  z*ones(n,1)-G^Tx ≤ 0  
       x ∈ S_m

We code the solution to (LPr) by appending z to the end of x. The script rsprow finds the best rock-scissors-paper row strategy; here is a diary of its use

     rsprow
     Optimization terminated.
 
     Best RSP row strategy
         0.3333
         0.3333
         0.3333

     Average payoff
          0

     diary

You are pleased that it gave the expected answer, but look forward to the unexpected.
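
To make the "append z to the end of x" layout of (LPr) concrete, here is a minimal Python sketch. It is not the rsprow script, and the names `build_row_lp`, `dot` and `is_feasible` are hypothetical. It assumes the common solver form: minimize f·v subject to A v ≤ b, Aeq v = beq and v ≥ lb, with v = [x; z]. The argument order f, A, b, Aeq, beq, lb mirrors that of Matlab's linprog. No solver is called; the sketch only checks that the answer reported in the diary satisfies the constraints.

```python
from fractions import Fraction

def build_row_lp(G):
    """Lay out (LPr) in the variables v = [x_1, ..., x_m, z]."""
    m, n = len(G), len(G[0])
    f = [0] * m + [-1]                  # minimizing -z maximizes z
    # Row j encodes z - (G^T x)_j <= 0.
    A = [[-G[i][j] for i in range(m)] + [1] for j in range(n)]
    b = [0] * n
    Aeq = [[1] * m + [0]]               # x_1 + ... + x_m = 1
    beq = [1]
    lb = [0] * m + [float("-inf")]      # x >= 0, while z is free
    return f, A, b, Aeq, beq, lb

def dot(r, v):
    return sum(a * e for a, e in zip(r, v))

def is_feasible(v, A, b, Aeq, beq, lb):
    """True if v satisfies A v <= b, Aeq v = beq and v >= lb."""
    return (all(dot(r, v) <= bi for r, bi in zip(A, b))
            and all(dot(r, v) == bi for r, bi in zip(Aeq, beq))
            and all(e >= lo for e, lo in zip(v, lb)))

G = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
f, A, b, Aeq, beq, lb = build_row_lp(G)

third = Fraction(1, 3)
v = [third, third, third, Fraction(0)]           # diary answer: x uniform, z = 0
print(is_feasible(v, A, b, Aeq, beq, lb))
```

Keeping the same uniform x but raising z above zero violates the inequality rows, which is consistent with the average payoff of 0 reported in the diary.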

The Optimal Column Strategy

In a game like RSP, where the players are interchangeable, we expect that the best row strategy will also be the best column strategy. As this week's assignment is not such a game, you may want to read this section carefully.

We note that a given column strategy y is assured of losing no more than

   max x^TGy
  x ∈ S_m

Our first step is to note that among the most effective replies x to a given y there always lies at least one pure strategy. In symbols this means that

   max x^TGy  = max Gy
  x ∈ S_m

Hence, the optimal column strategy is achieved by solving

   min    max Gy
  y ∈ S_n

Our second step is to translate (as above) this min max into a standard linear program (something amenable to Matlab's linprog). In particular, we search for the smallest scalar (z) that is no smaller than any element of Gy.

        min z
(LPc)  Gy - z*ones(m,1) ≤ 0
       y ∈ S_n

We code the solution to (LPc) by appending z to the end of y. The script rspcol finds the best rock-scissors-paper column strategy; here is a diary of its use

     rspcol
     Optimization terminated.
 
     Best RSP column strategy
         0.3333
         0.3333
         0.3333

     Average payout
          0
     diary

And so indeed, the two optimal strategies coincide, and the maximal row payoff is precisely the minimum column payout. The

  MiniMax Theorem:   min    max   x^TGy   =   max    min   x^TGy 
                   y ∈ S_n  x ∈ S_m         x ∈ S_m  y ∈ S_n

informs us that the maximal row payoff is indeed the minimum column payout for every game (for two) in the world. When this common value is zero we call the game fair.
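
The theorem also suggests a solver-free way to check a proposed pair of strategies. For any stochastic x and y we have min G^Tx ≤ x^TGy ≤ max Gy, so if the row player's guarantee equals the column player's guarantee then both strategies are optimal. The Python sketch below illustrates this on RSP; the names `row_guarantee` and `col_guarantee` and the lopsided strategy are hypothetical.

```python
from fractions import Fraction

def row_guarantee(x, G):
    """min of G^T x: what row strategy x wins at least, on average."""
    return min(sum(x[i] * G[i][j] for i in range(len(G)))
               for j in range(len(G[0])))

def col_guarantee(y, G):
    """max of G y: what column strategy y loses at most, on average."""
    return max(sum(G[i][j] * y[j] for j in range(len(G[0])))
               for i in range(len(G)))

# Rock-scissors-paper game matrix from the text.
G = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]

uniform = [Fraction(1, 3)] * 3
lopsided = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # hypothetical

# Equal guarantees certify that the uniform strategy is optimal for both players.
print(row_guarantee(uniform, G), col_guarantee(uniform, G))
# A suboptimal strategy leaves a gap around the value of the game.
print(row_guarantee(lopsided, G), col_guarantee(lopsided, G))
```

For the uniform strategy the two guarantees agree, while for the lopsided one they straddle the value of the game.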

In preparation for the week's assignment let us move beyond the simple RSP game.

The Game of Morra

Each player possesses two tokens. In each round they hide either one or both and guess the number hidden by their opponent. If only one player guesses correctly then he/she receives payment equal to the total number of hidden tokens. The available pure strategies are then

    hide 1, guess 1
    hide 1, guess 2
    hide 2, guess 1
    hide 2, guess 2

and so the game matrix takes the form

      0   2  -3   0
     -2   0   0   3
 G =  3   0   0  -4
      0  -3   4   0

Here is the associated code and diary

     morra
     Optimization terminated.
 
     Best Morra row strategy
              0
         0.6000
         0.4000
              0

     Average payoff
          0

     Optimization terminated.
     Best Morra column strategy
              0
         0.6000
         0.4000
              0

     Average payout
          0

     diary

Hence, the game is fair and a best strategy is to play (hide 1, guess 2) with probability 3/5 and (hide 2, guess 1) with probability 2/5. We see that it is best to suppose that your opponent's cache is the opposite of yours.
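
The Morra matrix can be rebuilt directly from the rules, which is a useful check before trusting any solver output. The following Python sketch is an illustration under the rules as stated above; the names `STRATEGIES` and `morra_payoff` are hypothetical. It also confirms, with exact fractions, that the strategy in the diary guarantees the row player at least 0 and the column player a loss of at most 0.

```python
from fractions import Fraction

# Pure strategies as (hide, guess) pairs, in the order listed in the text.
STRATEGIES = [(1, 1), (1, 2), (2, 1), (2, 2)]

def morra_payoff(row, col):
    """Payment to the row player for one round of Morra."""
    (row_hide, row_guess), (col_hide, col_guess) = row, col
    row_right = row_guess == col_hide
    col_right = col_guess == row_hide
    if row_right == col_right:      # both right or both wrong: no payment
        return 0
    total = row_hide + col_hide     # total number of hidden tokens
    return total if row_right else -total

G = [[morra_payoff(r, c) for c in STRATEGIES] for r in STRATEGIES]

# Diary strategy: (hide 1, guess 2) w.p. 3/5 and (hide 2, guess 1) w.p. 2/5.
s = [Fraction(0), Fraction(3, 5), Fraction(2, 5), Fraction(0)]
row_floor = min(sum(s[i] * G[i][j] for i in range(4)) for j in range(4))    # min G^T s
col_ceiling = max(sum(G[i][j] * s[j] for j in range(4)) for i in range(4))  # max G s

for row in G:
    print(row)
print(row_floor, col_ceiling)
```

Since the floor and the ceiling coincide at zero, the same strategy is optimal for both players and the game is fair.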

We now introduce a slight twist, stemming from the row player's modesty. You see, he is a bit uncomfortable bleating out his guess (as required) at exactly the same moment as his lovely opponent, and so offers that she be allowed to speak first. She considers his polite offer and reasons that, as her guess bears no relation to the number of tokens she herself has hidden, the game remains fair. Let us see.

While her pure strategies remain unchanged, he now has, in addition, the option to

	hide 1, echo her guess
	hide 1, counter her guess
	hide 2, echo her guess
	hide 2, counter her guess

and so the game matrix for polite Morra is

      0   2  -3   0
     -2   0   0   3
 G =  3   0   0  -4
      0  -3   4   0
      0   0  -3   3
     -2   2   0   0
      3  -3   0   0
      0   0   4  -4

From the associated code and diary

     morrap
     Optimization terminated.
 
     Best Polite Morra row strategy
              0
         0.5657
         0.4040
              0
              0
         0.0202
              0
         0.0101

     Average payoff
         0.0404


     Optimization terminated.
     Best polite Morra column strategy
         0.2828
         0.3030
         0.2121
         0.2020

     Average payout
         0.0404

     diary

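we see that it pays to be a gentleman! In particular, he gains (and she loses) 4/99 on average per round. He does this by countering her guess about three percent of the time; he never echoes her guess. Although linprog has generated a new best strategy for her, against his best strategy it does, in fact, no better than her old one, for every column strategy loses 4/99 per round to it. Her old strategy is no longer safe, however: by always hiding 2 and countering her guess he could win 8/5 per round from it.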
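
These figures can be checked exactly, without a solver. The Python sketch below rebuilds the polite Morra matrix from the rules and evaluates the guarantees of the diary strategies. The helper names are hypothetical, and the fractions over 99 are an assumed reading of the four-digit decimals in the diary, consistent with the stated value 4/99.

```python
from fractions import Fraction

COLS = [(1, 1), (1, 2), (2, 1), (2, 2)]   # her (hide, guess) pairs
ROWS = COLS + [(1, "echo"), (1, "counter"), (2, "echo"), (2, "counter")]

def payoff(row, col):
    """Payment to the row player when he may react to her spoken guess."""
    row_hide, row_rule = row
    col_hide, col_guess = col
    if row_rule == "echo":
        row_guess = col_guess             # repeat her guess
    elif row_rule == "counter":
        row_guess = 3 - col_guess         # say the other number
    else:
        row_guess = row_rule              # a fixed guess, as in plain Morra
    row_right = row_guess == col_hide
    col_right = col_guess == row_hide
    if row_right == col_right:
        return 0
    total = row_hide + col_hide
    return total if row_right else -total

G = [[payoff(r, c) for c in COLS] for r in ROWS]

def row_floor(x):
    """min of G^T x: what he wins at least with x."""
    return min(sum(x[i] * G[i][j] for i in range(8)) for j in range(4))

def col_ceiling(y):
    """max of G y: what she loses at most with y."""
    return max(sum(G[i][j] * y[j] for j in range(4)) for i in range(8))

# Assumed exact forms of the diary decimals.
x = [Fraction(k, 99) for k in (0, 56, 40, 0, 0, 2, 0, 1)]
y = [Fraction(k, 99) for k in (28, 30, 21, 20)]
y_old = [Fraction(0), Fraction(3, 5), Fraction(2, 5), Fraction(0)]

print(row_floor(x), col_ceiling(y))   # equal guarantees certify optimality
print(col_ceiling(y_old))             # worst case of her plain Morra strategy
```

The first line of output shows the two guarantees meeting at the value of the game; the second shows how much more her plain Morra strategy can lose in the polite game.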
