
mdps

  • Web definition: Markov decision processes
  1. Parallel Q-learning Algorithms for MDPs Based on Performance Potentials

    A parallel Q-learning algorithm for MDPs based on performance potentials

  2. S(λ): A reinforcement learning algorithm based on average-payoff MDPs

    S(λ): a reinforcement learning algorithm based on average-reward MDPs

  3. Discounted and Undiscounted MDPs: A Case Study Based on SARSA(λ) Algorithms

    Discounted and undiscounted MDPs: a case study based on the SARSA(λ) algorithm

  4. We discuss reinforcement-learning-based optimization methods for Markov decision processes (MDPs) using Markov performance potentials.

    Reinforcement-learning-based optimization methods for Markov decision processes (MDPs) are discussed on the basis of Markov performance potentials.

  5. On-Policy Model-Free Reinforcement Learning Algorithms for Average-Payoff MDPs

    On-policy model-free reinforcement learning algorithms for average-reward MDPs

  6. The concept of Markov performance potentials, introduced by Cao, offers a new framework and approach for the optimization of MDPs.

    The theory of Markov performance potentials provides a new theoretical framework and approach for the optimization of MDPs (a brief formulation is sketched after this list).

  7. Based on the ideas and methods of GIS and the characteristics of mining-area disasters, this article designs the Mining-area Disaster Prevention System (MDPS).

    Based on the basic ideas of geographic information systems (GIS) and the characteristics of mining-area disasters, a Mining-area Disaster Prevention and Control System (MDPS) is designed.

  8. Motivated by the needs of practical large-scale Markov systems, this paper considers learning optimization problems for Markov decision processes (MDPs).

    To meet the needs of practical large-scale Markov systems, simulation-based learning optimization problems for Markov decision processes (MDPs) are discussed.

  9. Through these solutions, the theoretical implementation of MDPs-based message transmission can be integrated, making it a secure, stable, and maintainable message transmission system.

    By integrating the theoretical implementation of MDPs-based message transmission, it becomes a secure, reliable, stable, and maintainable message transmission system.

  10. The Effect of Synthetic Muramyl Dipeptide (MDP) and Its Derivatives (MDPs) on Immune Function in Mice

    The effect of muramyl dipeptide and its derivatives on immune function in mice

  11. Many sequential decision problems, such as flexible manufacturing systems, traffic command systems, and queuing systems, can be modeled as Markov decision processes (MDPs).

    Many sequential decision problems in real life, such as flexible manufacturing systems, traffic command systems, and queuing systems, can be modeled as Markov decision processes (MDPs).

  12. This paper addresses the low learning efficiency of reinforcement learning caused by improper state-value generalization and random exploration policies in the control of deterministic MDPs, and proposes a model-based hierarchical reinforcement learning algorithm.

    To address the low learning efficiency of state-value generalization and random exploration policies in the control of deterministic MDP systems, this paper proposes a model-based hierarchical reinforcement learning algorithm.

  13. In this paper, a neuro-dynamic programming (NDP) method is discussed via an actor-critic algorithm for Markov decision processes (MDPs) based on the learning of performance potentials.

    A neuro-dynamic programming (NDP) method for Markov decision processes (MDPs) under the actor-critic framework, based on the learning of performance potentials, is studied.

  14. Moderate deviation principles (MDPs) are proved for the occupation time process of a super-Brownian motion with immigration, where the immigration is governed by the Lebesgue measure.

    This paper proves moderate deviation principles for the occupation time process of a class of super-Brownian motions with immigration when the dimension of the underlying space is d ≥ 3, where the immigration is governed by the Lebesgue measure.

  15. Extending reinforcement learning to MDPs with large state and action spaces and high complexity inevitably runs into the curse of dimensionality, which results in slow convergence and long training times.

    When traditional reinforcement learning algorithms are applied to Markov decision process problems with large state and action spaces and complex tasks, they suffer from slow convergence and long training times.

  16. The problems of discounted reinforcement learning are analyzed. Comparative experiments on the influence of different discount factors on the SARSA(λ) algorithm for MDPs are performed, and the role of the average-reward constant in the undiscounted SARSA(λ) algorithm is also discussed.

    The problems with discounted reinforcement learning are analyzed, comparative experiments on discount factors are carried out for the SARSA(λ) algorithm on MDPs, and the influence of the average-reward constant on the undiscounted SARSA(λ) algorithm is discussed (see the SARSA(λ) sketch after this list).
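
Several of the entries above (1, 4, 6, 8, 13) refer to Markov performance potentials. As a minimal sketch, assuming a finite ergodic Markov chain under a fixed policy with transition matrix P, reward vector f, stationary distribution π, and long-run average reward η = πᵀf (notation chosen here for illustration, not taken from the cited papers), the performance potential g solves the Poisson equation:

```latex
% Minimal sketch, using the illustrative notation introduced above.
\[
  (I - P)\, g \;=\; f - \eta \mathbf{1},
  \qquad
  \eta \;=\; \pi^{\top} f .
\]
% g is determined only up to an additive constant; potential-based
% optimization and the Q-learning / actor-critic variants mentioned in
% the entries estimate g (or the associated Q-factors) from sample
% paths rather than solving the equation directly.
```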
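
Entries 3 and 16 compare discounted and undiscounted SARSA(λ). Below is a minimal sketch of tabular discounted SARSA(λ) with accumulating eligibility traces on a toy chain MDP; the environment, hyperparameters, and every identifier (n_states, step, epsilon_greedy, sarsa_lambda, ...) are illustrative assumptions, not code from the cited papers.

```python
import numpy as np

# Minimal tabular SARSA(lambda) sketch on a toy chain MDP.
# Everything here is an illustrative assumption, not taken from the papers above.

n_states, n_actions = 5, 2
rng = np.random.default_rng(0)

def step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left.
    Reaching the right end gives reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, reward, done

def epsilon_greedy(Q, s, eps=0.1):
    """Random action with probability eps, otherwise greedy (ties broken at random)."""
    if rng.random() < eps:
        return int(rng.integers(n_actions))
    best = np.flatnonzero(Q[s] == Q[s].max())
    return int(rng.choice(best))

def sarsa_lambda(episodes=200, alpha=0.1, gamma=0.95, lam=0.8):
    """Discounted SARSA(lambda) with accumulating eligibility traces.
    Setting gamma = 1.0 and subtracting an average-reward estimate from each
    reward would give the undiscounted variant discussed in entry 16."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        E = np.zeros_like(Q)              # eligibility traces
        s, done = 0, False
        a = epsilon_greedy(Q, s)
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2)
            target = r + (0.0 if done else gamma * Q[s2, a2])
            delta = target - Q[s, a]      # one-step TD error
            E[s, a] += 1.0                # accumulating trace for the visited pair
            Q += alpha * delta * E        # update all traced state-action pairs
            E *= gamma * lam              # decay all traces
            s, a = s2, a2
    return Q

if __name__ == "__main__":
    print(np.round(sarsa_lambda(), 3))
```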