Derivatives of logarithmic stationary distributions for policy gradient reinforcement learning. Tetsuro Morimura, Eiji Uchibe, et al. 2010. Neural Computation. Paper.
A generalized natural actor-critic algorithm. Tetsuro Morimura, Eiji Uchibe, et al. 2009. NeurIPS 2009. Conference paper.