mdps
- 网络Markov决策过程
-
Parallel Q-learning Algorithms for MDPs Based on Performance Potentials
一种MDP基于性能势的并行Q学习算法
-
S(λ): A reinforcement learning algorithm based on average-payoff MDPs
S(λ):一个基于平均奖赏MDPs的激励学习算法
-
Discounted and Undiscounted MDPs: A Case Study Based on SARSA(λ) Algorithms
折扣与无折扣MDPs:一个基于SARSA(λ)算法的实例分析
-
We discuss reinforcement-learning-based optimization methods for Markov decision processes (MDPs) using Markov performance potentials.
在Markov性能势基础上讨论了一种基于强化学习的马尔可夫决策过程(MDP)优化方法。
-
On-Policy Model-Free Reinforcement Learning Algorithms for Average-Payoff MDPs
平均奖赏MDP的在策略无模型激励学习算法
-
The concept of Markov performance potentials, introduced by Cao, offers a new framework and approach for the optimization of MDPs.
Markov性能势理论的提出,为MDP的优化提供了一种新的理论框架和途径。
-
Based on the ideas and methods of GIS and on the characteristics of mining-area disasters, this article designs the Mining Area Disaster Production System (MDPS).
根据地理信息系统的基本思想,结合矿区灾害的特点设计了矿区灾害防治系统(MDPS)。
-
Motivated by the needs of practical large-scale Markov systems, this paper considers learning optimization problems for Markov decision processes (MDPs).
为适应实际大规模Markov系统的需要,讨论Markov决策过程(MDP)基于仿真的学习优化问题。
-
With these solutions, we can integrate the theoretical implementation of MDPs-based message transmission into a safe, stable, and maintainable message transmission system.
通过对基于MDPs的消息传递的理论实现进行整合,使之成为安全可靠、性能稳定、可维护的消息传递系统。
-
The Effect of Synthetic Muramyl Dipeptide (MDP) and Its Derivatives (MDPs) on Immune Function in Mice
胞壁酰二肽及其衍生物对小鼠免疫功能的影响
-
Many sequential decision problems, such as flexible manufacturing systems, traffic command systems, and queuing systems, can be modeled as Markov decision processes (MDPs).
实际生活中的许多序贯决策问题,如柔性制造系统、交通指挥系统、排队系统等,都可以模型化为Markov决策过程(MDP)。
-
This paper addresses the low learning efficiency of reinforcement learning in deterministic MDPs, which is caused by improper generalization and random exploration policies, and proposes a hierarchical reinforcement learning algorithm based on a system model.
针对强化学习算法的状态值泛化和随机探索策略在确定性MDP系统控制中存在着学习效率低的问题,本文提出基于模型的层次化强化学习算法。
-
In this paper, a neuro-dynamic programming (NDP) method is discussed via an actor-critic algorithm for Markov decision processes (MDPs) based on the learning of performance potentials.
研究马尔可夫决策过程(MDP)在actor-critic模式下,基于性能势学习的神经元动态规划(NDP)方法。
-
Moderate deviation principles (MDPs) are proved for the occupation time process of a super-Brownian motion with immigration, where the immigration is governed by the Lebesgue measure.
本文证明了当底空间维数d≥3时,一类带移民超布朗运动占位时过程的中偏差,其移民由Lebesgue测度控制。
-
Extending reinforcement learning to MDPs with large state and action spaces and high task complexity inevitably runs into the curse of dimensionality, which results in slow convergence and long training times.
传统的强化学习算法应用到大状态、动作空间和任务复杂的马尔可夫决策过程问题时,存在收敛速度慢,训练时间长等问题。
-
The problems of discounted reinforcement learning are analyzed. Several experiments are performed to compare the influence of different discount factors on the SARSA(λ) algorithm for MDPs. The role of the average-reward scalar in the undiscounted SARSA(λ) algorithm is also discussed.
分析了折扣激励学习存在的问题,对MDPs的SARSA(λ)算法进行了折扣的比较实验分析,讨论了平均奖赏常量对无折扣SARSA(λ)算法的影响。