Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method

Published in Advances in Neural Information Processing Systems 33, 2020

Recommended citation: Zhou, Q., Kuang, Y., Qiu, Z., Li, H., & Wang, J. (2020). Promoting stochasticity for expressive policies via a simple and efficient regularization method. Advances in Neural Information Processing Systems, 33, 13504-13514. https://proceedings.neurips.cc/paper/2020/hash/9cafd121ba982e6de30ffdf5ada9ce2e-Abstract.html

Abstract

Many recent reinforcement learning (RL) methods learn stochastic policies with entropy regularization for exploration and robustness. However, in continuous action spaces, integrating entropy regularization with expressive policies is challenging and usually requires complex inference procedures. To tackle this problem, we propose a novel regularization method that is compatible with a broad range of expressive policy architectures. An appealing feature is that the estimation of our regularization terms is simple and efficient even when the policy distributions are unknown. We show that our approach can effectively promote exploration in continuous action spaces. Based on our regularization, we propose an off-policy actor-critic algorithm. Experiments demonstrate that the proposed algorithm outperforms state-of-the-art regularized RL methods in continuous control tasks.
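The abstract notes that the regularization term can be estimated from policy samples alone, even when the policy's density is unknown. For context only (this is not the paper's regularizer, which is defined in the full text), below is a minimal PyTorch sketch of one standard sample-based approach to this kind of estimation: a nearest-neighbor (Kozachenko-Leonenko style) entropy estimate computed purely from sampled actions. The function name knn_entropy_estimate and all constants are illustrative assumptions.

import torch

def knn_entropy_estimate(actions: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Particle-based entropy estimate from sampled actions.

    actions: (n, d) tensor of actions sampled from the policy.
    Returns a differentiable scalar (up to additive constants) that
    grows as the samples spread out, so maximizing it encourages a
    more stochastic policy.
    """
    n, d = actions.shape
    # Pairwise Euclidean distances; mask the diagonal so a sample is
    # never its own nearest neighbor.
    dists = torch.cdist(actions, actions)  # (n, n)
    dists = dists + torch.eye(n, dtype=actions.dtype, device=actions.device) * 1e10
    # Distance of each sample to its k-th nearest neighbor.
    knn_dists, _ = dists.topk(k, dim=1, largest=False)
    # Kozachenko-Leonenko estimator, dropping sample-size constants:
    # H ~ (d / n) * sum_i log r_i(k).
    return d * torch.log(knn_dists[:, -1] + 1e-8).mean()

# Illustrative use in an actor update (alpha and q are hypothetical):
# a = policy_dist.rsample((64,))  # reparameterized samples so gradients flow
# actor_loss = -(q(s, a).mean() + alpha * knn_entropy_estimate(a))

Note that the sampled actions must come from a reparameterized sampler (e.g., rsample in PyTorch) for gradients of this estimate to propagate back into the policy parameters.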

Citation

To cite our work, please use:

@inproceedings{ACED2020NEURIPS,
    author = {Zhou, Qi and Kuang, Yufei and Qiu, Zherui and Li, Houqiang and Wang, Jie},
    booktitle = {Advances in Neural Information Processing Systems},
    editor = {H. Larochelle and M. Ranzato and R. Hadsell and M.F. Balcan and H. Lin},
    pages = {13504--13514},
    publisher = {Curran Associates, Inc.},
    title = {Promoting Stochasticity for Expressive Policies via a Simple and Efficient Regularization Method},
    url = {https://proceedings.neurips.cc/paper_files/paper/2020/file/9cafd121ba982e6de30ffdf5ada9ce2e-Paper.pdf},
    volume = {33},
    year = {2020}
}