Matrix Games and Linear Programming

Every matrix defines a game for two. If G is an m-by-n matrix then, in each round,

   the row player selects a row, e.g., i, and, at the same time,
   the column player selects a column, j, 
   after which
   the column player pays G(i,j) units to the row player,
       with the understanding that if G(i,j)<0 then the
       row player pays -G(i,j) units to the column player

For example, the game of rock-scissors-paper is expressed by

      R   S   P
  R   0   1  -1
  S  -1   0   1
  P   1  -1   0

We see that rock beats scissors, paper beats rock, scissors beats paper, and that draws cost nothing. The labels R, S, P along the top and left are mere adornments. Our main interest is in the game matrix

      0   1  -1
 G = -1   0   1
      1  -1   0

and rather than focus on individual rounds we will uncover strategies that are best on average. By strategy we mean a decision, over the course of a tournament, to play rock with probability x1, scissors with probability x2, and paper with probability x3. Of course

  x1 ≥ 0,  x2 ≥ 0,  x3 ≥ 0,  and  x1+x2+x3 = 1.

As such vectors appear often in game theory, we give them a name. A vector with n nonnegative elements, all of which sum to one, is called stochastic. We denote the collection of such vectors by

   S_n = stochastic n-vectors

Now, if the game matrix, G, is m-by-n, and the row player adopts strategy x ∈ S_m and the column player adopts strategy y ∈ S_n, then

   x^T G y is the average payoff (to the row player) per round.
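
As a quick check of this formula, here is a minimal sketch in Python (not the course's Matlab) that evaluates x^T G y for the RSP matrix above. The function name average_payoff and the two sample strategies are illustrative choices of ours, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

def average_payoff(G, x, y):
    """Return x^T G y, the row player's average winnings per round."""
    return sum(x[i] * G[i][j] * y[j]
               for i in range(len(G))
               for j in range(len(G[0])))

# Rock-scissors-paper game matrix from the text (rows and columns: R, S, P).
G = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

third = Fraction(1, 3)
uniform = [third, third, third]                                  # each move equally likely
always_rock = [1, 0, 0]                                          # a pure strategy
mostly_paper = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]  # hypothetical opponent

print(average_payoff(G, uniform, mostly_paper))      # 0
print(average_payoff(G, always_rock, mostly_paper))  # -1/4
```

Against this paper-heavy column player, always playing rock loses a quarter unit per round on average, while the uniform strategy breaks even.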

We now show how the row player can maximize his/her gains and how the column player can minimize his/her losses.

The Optimal Row Strategy

We note that a given row strategy x is assured of winning, on average, at least

    min     x^T G y
  y ∈ S_n

Our first step is to note that among the most effective replies y to a given x there always lies at least one pure (only one nonzero entry, e.g., always playing scissors) strategy. This is easier done than said. For example, for the RSP game,

  x^T G = [x3-x2   x1-x3   x2-x1]

Now, for a given x, one of these elements, say the 2nd, is less than or equal to the other two. As a result, the pure strategy

  p = [0  1  0]^T

will yield

    min     x^T G y  =  x^T G p  =  min(x^T G)  =  min(G^T x)
  y ∈ S_n

Here min of a vector denotes its smallest element. The last equality stems from the fact that (x^T G)^T = G^T x and is invoked because Matlab prefers to work with columns rather than rows.
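
The same observation can be sketched in a few lines of Python (again, not the course's Matlab). The helper names column_payoffs and guaranteed_gain and the rock-heavy sample strategy are our own illustrative assumptions.

```python
from fractions import Fraction

def column_payoffs(G, x):
    """Return G^T x: entry j is the row player's average gain
    when the column player always plays column j."""
    return [sum(G[i][j] * x[i] for i in range(len(G)))
            for j in range(len(G[0]))]

def guaranteed_gain(G, x):
    """Return (smallest element of G^T x, index of a pure reply attaining it)."""
    v = column_payoffs(G, x)
    j = min(range(len(v)), key=lambda k: v[k])
    return v[j], j

G = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

x = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # hypothetical rock-heavy strategy
value, j = guaranteed_gain(G, x)
print(value, j)  # -1/4 2

# A mixed reply only averages the entries of G^T x, so it cannot undercut the minimum.
v = column_payoffs(G, x)
mixed_reply = [Fraction(1, 3)] * 3
assert sum(vj * yj for vj, yj in zip(v, mixed_reply)) >= value
```

Index 2 is paper, which is the sensible punishment for a player who leans on rock.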

Hence, the optimal row strategy is achieved by solving

    max     min(G^T x)
  x ∈ S_m

Our second step is to translate this max min into a standard linear program (something amenable to Matlab's linprog). We do this by asking for the biggest scalar (z) that is less than or equal to each element of G^T x, i.e., we solve

         max z
(LPr)    z*ones(n,1) - G^T x ≤ 0
         x ∈ S_m
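
To make the translation concrete, here is a sketch in Python (not the course's Matlab) that lays out (LPr) in the shape linprog expects: minimize f^T v subject to A v ≤ b, Aeq v = beq and v ≥ lb, with the unknown v = [x; z]. Since linprog minimizes, we minimize -z. The helper names row_lp and is_feasible are ours, and the sketch only assembles the data and tests candidate points; it does not solve the program.

```python
from fractions import Fraction

def row_lp(G):
    """Assemble (LPr) for the unknown v = [x_1, ..., x_m, z]."""
    m, n = len(G), len(G[0])
    f = [0] * m + [-1]                        # minimize -z, i.e. maximize z
    # Row j of A encodes  z - (G^T x)_j <= 0.
    A = [[-G[i][j] for i in range(m)] + [1] for j in range(n)]
    b = [0] * n
    Aeq = [[1] * m + [0]]                     # x_1 + ... + x_m = 1
    beq = [1]
    lb = [0] * m + [float("-inf")]            # x >= 0, z unrestricted in sign
    return f, A, b, Aeq, beq, lb

def is_feasible(v, A, b, Aeq, beq, lb):
    """Check a candidate v against every constraint of the program."""
    dot = lambda row: sum(c * t for c, t in zip(row, v))
    return (all(dot(row) <= bi for row, bi in zip(A, b))
            and all(dot(row) == bi for row, bi in zip(Aeq, beq))
            and all(t >= lo for t, lo in zip(v, lb)))

G = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]
f, A, b, Aeq, beq, lb = row_lp(G)

third = Fraction(1, 3)
print(is_feasible([third, third, third, 0], A, b, Aeq, beq, lb))                # True
print(is_feasible([third, third, third, Fraction(1, 10)], A, b, Aeq, beq, lb))  # False
```

The second candidate fails because z = 1/10 exceeds the entries of G^T x, which are all zero for the uniform strategy.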

We code the solution to (LPr) by appending z to the end of x. The script rsprow finds the best rock-scissors-paper row strategy. Here is a diary of its use:
     rsprow
     Optimization terminated.
 
     Best RSP row strategy
         0.3333
         0.3333
         0.3333

     Average payoff
          0

     diary
You are pleased that it gave the expected answer, but look forward to the unexpected.

The Optimal Column Strategy

In a game like RSP, where the players are interchangeable, we expect the best row strategy to be the best column strategy as well. As this week's assignment is not such a game, you may wish to read this section carefully.

We note that a given column strategy y is assured of losing, on average, no more than

    max     x^T G y
  x ∈ S_m

Our first step is to note that among the most effective replies x to a given y there always lies at least one pure strategy. In symbols this means that

    max     x^T G y  =  max(Gy)
  x ∈ S_m

Hence, the optimal column strategy is achieved by solving

    min     max(Gy)
  y ∈ S_n

Our second step is to translate (as above) this min max into a standard linear program (something amenable to Matlab's linprog). In particular, we search for the smallest scalar (z) that is greater than or equal to every element of Gy.

         min z
(LPc)    Gy - z*ones(m,1) ≤ 0
         y ∈ S_n

We code the solution to (LPc) by appending z to the end of y. The script rspcol finds the best rock-scissors-paper column strategy. Here is a diary of its use:
     rspcol
     Optimization terminated.
 
     Best RSP column strategy
         0.3333
         0.3333
         0.3333

     Average payout
          0
     diary
And so indeed, the two optimal strategies coincide, and the maximal row payoff is precisely the minimum column payout. The

  MiniMax Theorem:     min       max     x^T G y   =     max       min     x^T G y
                     y ∈ S_n   x ∈ S_m                 x ∈ S_m   y ∈ S_n

informs us that the maximal row payoff is indeed the minimum column payout for every matrix game (for two) in the world. When this common value is zero we call the game fair.
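
One practical consequence: for any x and y the number x^T G y is a weighted average of the entries of G^T x and also of the entries of Gy, so min(G^T x) ≤ x^T G y ≤ max(Gy). If a pair of strategies makes the two ends meet, neither player can improve, and both strategies are optimal. Here is a sketch of that check in Python (not the course's Matlab); the names floor_of and ceiling_of are our own.

```python
from fractions import Fraction

def floor_of(G, x):
    """min(G^T x): what row strategy x wins at least, on average."""
    return min(sum(G[i][j] * x[i] for i in range(len(G)))
               for j in range(len(G[0])))

def ceiling_of(G, y):
    """max(G y): what column strategy y loses at most, on average."""
    return max(sum(G[i][j] * y[j] for j in range(len(G[0])))
               for i in range(len(G)))

G = [[0, 1, -1],
     [-1, 0, 1],
     [1, -1, 0]]

third = Fraction(1, 3)
uniform = [third, third, third]
always_rock = [1, 0, 0]

# A gap between floor and ceiling leaves room for improvement.
print(floor_of(G, always_rock), ceiling_of(G, always_rock))  # -1 1
# No gap: the uniform strategy is optimal for both players, with value 0.
print(floor_of(G, uniform), ceiling_of(G, uniform))          # 0 0
```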

In preparation for the week's assignment let us move beyond the simple RSP game.

The Game of Morra

Each player possesses two tokens. In each round they hide either one or both and guess the number hidden by their opponent. If only one player guesses correctly then he/she receives payment equal to the total number of hidden tokens. The available pure strategies are then

    hide 1, guess 1
    hide 1, guess 2
    hide 2, guess 1
    hide 2, guess 2

and so the game matrix takes the form

      0   2  -3   0
     -2   0   0   3
 G =  3   0   0  -4
      0  -3   4   0
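
The matrix can be rebuilt directly from the rules, which is a useful habit for the assignment. The following Python sketch (not the course's Matlab) does so; round_payoff is a hypothetical helper name, and each pure strategy is stored as a (hide, guess) pair in the order listed above.

```python
from itertools import product

def round_payoff(row, col):
    """Payment to the row player for one round of Morra."""
    (hide_r, guess_r), (hide_c, guess_c) = row, col
    row_right = guess_r == hide_c      # row player guessed the column player's cache
    col_right = guess_c == hide_r      # column player guessed the row player's cache
    total = hide_r + hide_c            # total number of hidden tokens
    # Exactly one correct guess pays `total`; two or zero correct guesses pay nothing.
    return total * (int(row_right) - int(col_right))

# (hide, guess) pairs: (1,1), (1,2), (2,1), (2,2)
pure = list(product([1, 2], repeat=2))
G = [[round_payoff(r, c) for c in pure] for r in pure]

for row in G:
    print(row)
# [0, 2, -3, 0]
# [-2, 0, 0, 3]
# [3, 0, 0, -4]
# [0, -3, 4, 0]
```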

Here is the associated code and diary
     morra
     Optimization terminated.
 
     Best Morra row strategy
              0
         0.6000
         0.4000
              0

     Average payoff
          0

     Optimization terminated.
     Best Morra column strategy
              0
         0.6000
         0.4000
              0

     Average payout
          0

     diary
Hence, the game is fair and a best strategy is to play (hide 1, guess 2) with probability 3/5 and (hide 2, guess 1) with probability 2/5. We see that it is best to suppose that your opponent's cache is the opposite of yours.

We now introduce a slight twist, stemming from the row player's modesty. You see, he is a bit uncomfortable bleating out his guess (as required) at exactly the same moment as his lovely opponent, and so offers that she be allowed to speak first. She considers his polite offer and reasons that, as her guess bears no relation to the number of tokens she herself has hidden, the game remains fair. Let us see.

While her pure strategies remain unchanged, he now has, in addition, the option to

	hide 1, echo her guess
	hide 1, counter her guess
	hide 2, echo her guess
	hide 2, counter her guess

and so the game matrix for polite Morra is

      0   2  -3   0
     -2   0   0   3
 G =  3   0   0  -4
      0  -3   4   0
      0   0  -3   3
     -2   2   0   0
      3  -3   0   0
      0   0   4  -4
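
The four new rows follow from the same payment rule once his guess is allowed to depend on hers. Here is a Python sketch (not the course's Matlab), where polite_payoff is a hypothetical helper and "counter" means guessing the number she did not say.

```python
def polite_payoff(hide_r, reply, col):
    """Payment to the row player when he hears her guess before making his own."""
    hide_c, guess_c = col
    # Echo repeats her guess; counter picks the other number (1 <-> 2).
    guess_r = guess_c if reply == "echo" else 3 - guess_c
    row_right = guess_r == hide_c
    col_right = guess_c == hide_r
    return (hide_r + hide_c) * (int(row_right) - int(col_right))

columns = [(1, 1), (1, 2), (2, 1), (2, 2)]       # her (hide, guess) pairs
extra = [(1, "echo"), (1, "counter"),            # his four new options,
         (2, "echo"), (2, "counter")]            # in the order listed above

new_rows = [[polite_payoff(h, r, c) for c in columns] for h, r in extra]
for row in new_rows:
    print(row)
# [0, 0, -3, 3]
# [-2, 2, 0, 0]
# [3, -3, 0, 0]
# [0, 0, 4, -4]
```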

From the associated code and diary
     morrap
     Optimization terminated.
 
     Best Polite Morra row strategy
              0
         0.5657
         0.4040
              0
              0
         0.0202
              0
         0.0101

     Average payoff
         0.0404


     Optimization terminated.
     Best polite Morra column strategy
         0.2828
         0.3030
         0.2121
         0.2020

     Average payout
         0.0404

     diary
we see that it pays to be a gentleman! In particular, he gains (and she loses) 4/99 on average per round. He does this by countering her guess about three percent of the time. He never echoes her guess. Although linprog has generated a new best strategy for her, against his best strategy it will, in fact, do no better than if she had stuck with her old one: every entry of G^T x equals 4/99, so each of her columns loses the same amount.
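
These claims can be checked exactly with a short Python sketch (not the course's Matlab). It reads the diary's four-digit decimals as multiples of 1/99, an assumption on our part that is consistent with the 4/99 quoted above. The names floor_of and ceiling_of are ours.

```python
from fractions import Fraction as F

# Polite Morra game matrix from the text.
G = [[0, 2, -3, 0],
     [-2, 0, 0, 3],
     [3, 0, 0, -4],
     [0, -3, 4, 0],
     [0, 0, -3, 3],
     [-2, 2, 0, 0],
     [3, -3, 0, 0],
     [0, 0, 4, -4]]

def floor_of(G, x):
    """min(G^T x): what row strategy x wins at least, on average."""
    return min(sum(G[i][j] * x[i] for i in range(len(G)))
               for j in range(len(G[0])))

def ceiling_of(G, y):
    """max(G y): what column strategy y loses at most, on average."""
    return max(sum(G[i][j] * y[j] for j in range(len(G[0])))
               for i in range(len(G)))

# Assumed exact forms of the diary output (e.g. 0.5657 read as 56/99).
x = [F(k, 99) for k in (0, 56, 40, 0, 0, 2, 0, 1)]
y = [F(k, 99) for k in (28, 30, 21, 20)]
y_old = [0, F(3, 5), F(2, 5), 0]     # her best strategy in ordinary Morra

print(floor_of(G, x))        # 4/99
print(ceiling_of(G, y))      # 4/99
print(ceiling_of(G, y_old))  # 8/5

# Against his best strategy her old strategy still loses only 4/99 ...
print(sum(x[i] * G[i][j] * y_old[j] for i in range(8) for j in range(4)))  # 4/99
```

So her old strategy does just as well against his optimal play, but it no longer protects her: if he learned of it, he could win up to 8/5 per round by always hiding 2 and countering her guess.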