Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning. Tetsuro Morimura, Eiji Uchibe, et al. 2010. Neural Computation. Paper.
A generalized natural actor-critic algorithm. Tetsuro Morimura, Eiji Uchibe, et al. 2009. NeurIPS 2009. Conference paper.
Natural actor-critic with baseline adjustment for variance reduction. Tetsuro Morimura, Eiji Uchibe, et al. 2008. Artificial Life and Robotics. Paper.