
Input-Decoupled Q-Learning for Optimal Control

The Journal of the Astronautical Sciences

Abstract

A design of optimal controllers based on a reinforcement learning method called Q-Learning is presented. Central to Q-Learning is the Q-function, which is a function of the state and all input variables. This paper shows that Q-functions that are decoupled in the inputs exist, and that they can be used to find the optimal controller for each input individually. The method thus converts a multiple-variable optimization problem into much simpler single-variable optimization problems while achieving optimality. An explicit model of the system is not required to learn these decoupled Q-functions; instead, the method relies on the ability to probe the system and observe its state transitions. Derived within the framework of modern control theory, the method is applicable to both linear and nonlinear systems.
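To make the central idea concrete, the sketch below illustrates, for the linear-quadratic case, why minimizing a quadratic Q-function one input channel at a time can recover the same input as the joint multiple-variable minimization. This is an illustrative sketch only, not the paper's algorithm: here the Q-function kernel H is computed from a known model purely for demonstration, whereas the paper learns decoupled Q-functions model-free by probing the system; all variable names and the problem setup are assumptions made for this example.

```python
# Illustrative sketch (not the paper's algorithm): for discrete-time LQR the
# optimal Q-function is quadratic, Q(x, u) = [x; u]^T H [x; u].  We build H
# from a known model for demonstration, then check that minimizing Q one
# scalar input at a time reaches the same input as the joint minimization.
import numpy as np


def riccati(A, B, Q, R, iters=500):
    """Value iteration for the discrete-time algebraic Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return P


rng = np.random.default_rng(0)
n, m = 4, 3                                   # state and input dimensions
A = 0.9 * rng.standard_normal((n, n)) / np.sqrt(n)
B = rng.standard_normal((n, m))
Qc, Rc = np.eye(n), np.eye(m)                 # quadratic cost weights

P = riccati(A, B, Qc, Rc)
# Q-function kernel H = [[H_xx, H_xu], [H_ux, H_uu]]
H_xx = Qc + A.T @ P @ A
H_xu = A.T @ P @ B
H_uu = Rc + B.T @ P @ B

x = rng.standard_normal(n)

# Joint minimization over all inputs at once: u* = -H_uu^{-1} H_ux x
u_joint = -np.linalg.solve(H_uu, H_xu.T @ x)

# Single-variable minimization: each input u_i minimizes Q with the others
# held fixed (dQ/du_i = 0); repeated sweeps over the channels are
# Gauss-Seidel iterations on a convex quadratic and converge to u*.
u = np.zeros(m)
for _ in range(200):
    for i in range(m):
        rhs = -(H_xu.T @ x)[i] - (H_uu[i, :] @ u - H_uu[i, i] * u[i])
        u[i] = rhs / H_uu[i, i]

print(np.allclose(u, u_joint))  # True: per-input updates reach the joint optimum
```

In this sketch the per-input updates must be swept repeatedly because the generic H_uu couples the inputs; the abstract's claim goes further, asserting that Q-functions decoupled in the inputs exist, so that each input's optimization is a genuine single-variable problem.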



Author information

Corresponding author

Correspondence to Minh Q. Phan.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Phan, M.Q., Azad, S.M.B. Input-Decoupled Q-Learning for Optimal Control. J Astronaut Sci 67, 630–656 (2020). https://doi.org/10.1007/s40295-019-00157-4
