(random-normal)))
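;; Epsilon-greedy action selection: with probability epsilon pick a uniformly
;; random action, otherwise a greedy action with ties broken at random.
(defun epsilon-greedy (epsilon)
  (with-prob epsilon
    (random n)
    (arg-max-random-tiebreak Q)))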
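
;; A minimal standalone sketch of the same selection rule in standard Common
;; Lisp. It assumes WITH-PROB evaluates its first branch with probability
;; EPSILON, and that ARG-MAX-RANDOM-TIEBREAK returns the index of a maximal
;; element chosen uniformly among ties. EPSILON-GREEDY-SKETCH and its Q-VALUES
;; parameter are hypothetical names, not part of this file.
(defun epsilon-greedy-sketch (q-values epsilon)
  (if (< (random 1.0) epsilon)
      ;; Explore: uniformly random action index.
      (random (length q-values))
      ;; Exploit: collect every index attaining the maximum, pick one at random.
      (let* ((best-value (reduce #'max q-values))
             (best-args (loop for i below (length q-values)
                              when (= (aref q-values i) best-value)
                                collect i)))
        (nth (random (length best-args)) best-args))))
;; With epsilon 0 the choice is purely greedy, so
;; (epsilon-greedy-sketch #(0.0 1.0 5.0 2.0) 0) returns 2.
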
;; One-time setup: allocate the estimate arrays and draw the true action
;; values Q* for every task with random-normal.
(defun setup ()
  (setq n 10)
  (setq Q (make-array n))
  (setq n_a (make-array n))
  (setq Q* (make-array (list n max-num-tasks)))
  (setq randomness (make-array max-num-tasks))
  (standardize-random-state)
  (advance-random-state 0)
  (loop for task below max-num-tasks do
        (loop for a below n do
              (setf (aref Q* a task) (random-normal)))
        ;; Save a copy of the current random state for this task.
        (setf (aref randomness task)
              (make-random-state))))
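
;; A minimal sketch of why a saved random state is useful: copying it later
;; replays exactly the same random sequence. This uses only standard Common
;; Lisp. REPLAY-FROM is a hypothetical helper, not part of this file, and how
;; the stored states are actually consumed elsewhere is not shown here.
(defun replay-from (saved-state count)
  ;; Bind *RANDOM-STATE* to a fresh copy so SAVED-STATE itself never advances.
  (let ((*random-state* (make-random-state saved-state)))
    (loop repeat count collect (random 1000))))
;; Two replays from one snapshot produce equal lists:
;; (let ((snapshot (make-random-state)))
;;   (equal (replay-from snapshot 5) (replay-from snapshot 5)))
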
;; Per-run reset: zero the value estimates Q, the action counts n_a, rbar,
;; and time.
(defun init ()
  (loop for a below n do
        (setf (aref Q a) 0.0)
        (setf (aref n_a a) 0))
  (setq rbar 0.0)
  (setq time 0))