Homework 12 - DNA OCR via Neural Nets. First draft due 1pm Friday, November 16. Final draft due 5pm Monday, November 19.

Train a 3-layer neural net to recognize digitized versions of the letters A, C, G and T. Your net should have 25 inputs, 25 "hidden" cells and 2 outputs.

The inputs will be binary and correspond to the 5-by-5 LED-like figure to the left, read row by row from the top. In particular, the input pattern for the character C is [1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 1 1]. For your targets, please use
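To see how the 25 bits lay out on the grid, a small sketch like the following may help. It assumes the bits are stored row by row (which matches the pattern for C given above); the names pix, glyphs and picture are made up for this illustration.

```matlab
% Input pattern for C, copied from the assignment text.
c = [1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 1 1];

% reshape fills column by column, so transpose to recover the rows.
pix = reshape(c, 5, 5)';

% Render lit cells as '#' and dark cells as '.' for a quick visual check.
glyphs = '.#';
picture = glyphs(pix + 1);
disp(picture)
```

The displayed block has a full top row, a full bottom row and a lit left column, i.e. the letter C.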

T = [0 0], A = [0 1], C = [1 0], and G = [1 1].
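Read as two-bit binary numbers, these targets are 0, 1, 2 and 3, so one possible way to turn net outputs back into letters is to round each output and index into the string 'TACG'. The sketch below assumes a 0.5 threshold, and the matrix out holds made-up output values, not the results of any real training run.

```matlab
letters = 'TACG';    % position k holds the letter whose target reads k-1 in binary

% Hypothetical raw net outputs, one row per character (illustration only).
out = [0.93 0.08; 0.88 0.97; 0.12 0.81; 0.05 0.10];

bits = out > 0.5;                       % assumed threshold for calling a bit 1
idx  = 2*bits(:,1) + bits(:,2) + 1;     % two-bit code -> index into letters
seq  = letters(idx')                    % row of indices -> row of letters
```

Other decoding rules (for example, picking the nearest target) would work as well; thresholding is just the simplest.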

Write a function

[V,W] = ocrtrain(V0,W0,maxiter,rate)

that returns the two weight matrices, V and W, after successful training from the starting weights V0 and W0 at the given learning rate, using at most maxiter iterations. Drive this function with a function called ocr, as in,

seq = ocr(bitstream)

that "reads" the N-by-25 bitstream through the net with V and W and produces a string of N letters. For example, if

bitstream = [1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 1 1; 1 1 1 1 1 1 0 0 0 0 1 0 0 1 1 1 0 0 0 1 1 1 1 1 1; 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 0 0 0 1; 1 1 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0]

then seq should be CGAT.

Hint: The gradient formulas on the lecture page work nicely for scalar outputs. But here your outputs have two components; in particular, your W has two rows. What to do? Well, on each iteration just work on updating a single row of W. How should you choose which row? Flip a coin with rand.
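A single iteration along the lines of this hint might look like the sketch below. It is not the lecture page's formulas: it assumes a logistic sigmoid at both layers, a squared-error cost on the chosen output, V of size 25-by-25 and W of size 2-by-25. The rate value and the names s, h, o, k, delta, gradWk and gradV are chosen for illustration only.

```matlab
s = @(x) 1 ./ (1 + exp(-x));   % assumed activation: logistic sigmoid

% One training pair: the pattern for C (as a column) and its target.
p = [1 1 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 1 1 1]';
t = [1; 0];

% Hypothetical starting weights and rate, for illustration only.
V0 = 2*rand(25, 25) - 1;       % input-to-hidden weights
W0 = 2*rand(2, 25) - 1;        % hidden-to-output weights, one row per output
rate = 0.1;

% Forward pass.
h = s(V0*p);                   % 25-by-1 hidden activations
o = s(W0*h);                   % 2-by-1 outputs

% Flip a coin to choose which output (row of W) to train on this iteration.
k = 1 + (rand < 0.5);          % k is 1 or 2

% Scalar-output gradients of 0.5*(o(k) - t(k))^2 with respect to W(k,:) and V.
delta  = (o(k) - t(k)) * o(k) * (1 - o(k));
gradWk = delta * h';                                 % 1-by-25
gradV  = (delta * W0(k,:)' .* h .* (1 - h)) * p';    % 25-by-25

% Gradient-descent update: only row k of W changes; V changes every time.
V = V0 - rate * gradV;
W = W0;
W(k,:) = W0(k,:) - rate * gradWk;
```

In a full ocrtrain this step would sit inside a loop that also cycles through the four letters and stops at maxiter or once all outputs are close enough to their targets.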
2nd Hint: Make sure your starting weights, V0 and W0, are diverse, i.e., composed of both positive and negative values.
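Since rand alone returns values strictly between 0 and 1, one simple way to get mixed signs is to shift and scale it. The sketch below assumes V is 25-by-25 and W is 2-by-25; the range (-1,1) is an arbitrary choice for illustration, not a requirement of the assignment.

```matlab
% rand draws from (0,1), so every entry would be positive;
% 2*rand - 1 spreads the values over (-1,1) instead.
V0 = 2*rand(25, 25) - 1;
W0 = 2*rand(2, 25) - 1;

% Quick check that both signs are present in each matrix.
fprintf('V0: %d negative, %d positive\n', sum(V0(:) < 0), sum(V0(:) > 0));
fprintf('W0: %d negative, %d positive\n', sum(W0(:) < 0), sum(W0(:) > 0));
```

Using randn in place of 2*rand - 1 would also give entries of both signs.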

Your work will be graded as follows:

     First Draft:
        3 pts for guts of ocrtrain
        3 pts for guts of ocr

     Final Draft:
        8 pts for headers CONTAINING detailed USAGE
        8 pts for further comments in code 
        4 pts for indentation
        14 pts for correct ocrtrain
        10 pts for correct ocr

        10 pts for diary of training session and translation of the
               4-by-25 bitstream above.