
MCTS All-Moves-As-First

Peter Shih edited this page Jun 20, 2017 · 4 revisions
  • Background
    • Two sets of values:
      1. Standard update from MCTS episodes
      2. AMAF values
    • Basic idea:
      • The value of a move is often unrelated to the moves played elsewhere
        • Section 4.1 from reference [3]
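The two sets of values above can be sketched as follows. This is a minimal illustration (the `Node` class and `backpropagate` helper are hypothetical, not from this project): each node keeps the standard visit/value statistics plus a second AMAF table that is credited with *every* move played anywhere in the episode, not only the move chosen at that node.

```python
class Node:
    def __init__(self):
        self.visits = 0          # standard MCTS visit count
        self.value = 0.0         # standard MCTS mean reward
        self.amaf_visits = {}    # move -> AMAF visit count
        self.amaf_value = {}     # move -> AMAF mean reward

def backpropagate(path, moves_played, reward):
    """path: nodes visited this episode; moves_played: all moves of the episode."""
    for node in path:
        # 1. Standard update: only this node's statistics, incremental mean
        node.visits += 1
        node.value += (reward - node.value) / node.visits
        # 2. AMAF update: credit every move seen anywhere in the episode,
        #    as if it had been played first from this node
        for move in moves_played:
            n = node.amaf_visits.get(move, 0) + 1
            node.amaf_visits[move] = n
            v = node.amaf_value.get(move, 0.0)
            node.amaf_value[move] = v + (reward - v) / n
```

The AMAF table fills up much faster than the standard statistics, which is exactly why it is useful early and biased later.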
  • Variants
    • Alpha-AMAF: weight between the two sets
    • Cutoff-AMAF: Use AMAF values only in the first k iterations
    • RAVE: Like alpha-AMAF, but each node has its own alpha value
    • Generalized AMAF: Also use AMAF value of a parent node
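A sketch of the alpha-AMAF blend with a RAVE-style per-node schedule, following the hand-selected schedule from reference [3] (Gelly & Silver): alpha shrinks as the node's own visit count n grows, so the fast-but-biased AMAF estimate dominates early and the unbiased MCTS estimate dominates later. The parameter names here are illustrative.

```python
import math

def rave_alpha(n, k=1000):
    # Hand-selected RAVE schedule from Gelly & Silver: k is the "equivalence
    # parameter"; the two estimates get equal weight (alpha = 0.5) when n == k.
    return math.sqrt(k / (3 * n + k))

def blended_value(q_mcts, q_amaf, n, k=1000):
    # alpha-AMAF blend; with a per-node alpha this is the RAVE node value
    a = rave_alpha(n, k)
    return a * q_amaf + (1 - a) * q_mcts
```

Plain alpha-AMAF fixes a single global alpha instead; Cutoff-AMAF is the degenerate schedule alpha = 1 for the first k iterations and 0 afterwards.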
  • Discussion
    • In Hearthstone, the value of a move STRONGLY DEPENDS on the board state
      • E.g., If there are many enemy minions, a strong AOE is a very good move.
    • We can use (state, move) pairs to re-use previous playout results
      • In previous playouts, we learned that move A1 is good in state S1
      • So, if the current state is S1, we'd like to play A1 with higher probability
        • Especially when this node has not yet accumulated many playouts
    • However, the granularity of the state we just mentioned should be carefully defined.
      • If the granularity captures too many details, the information can rarely be reused
      • If it captures too few details, it might be misleading
    • Maybe we can use a policy network to guide the MCTS selection phase
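The granularity trade-off above can be made concrete with a (state, move) statistics table keyed by a coarse state abstraction. Everything below is a hypothetical illustration, not this project's actual design: the particular features in `state_key` (hand size, minion counts, bucketed hero HP) just show how coarsening the key lets statistics be reused across "similar" states.

```python
from collections import defaultdict

def state_key(state):
    # Granularity choice: full state -> keys almost never repeat;
    # too coarse -> statistics from genuinely different situations get mixed.
    return (
        state["hand_size"],
        state["own_minions"],
        state["enemy_minions"],
        state["own_hp"] // 10,    # bucket HP to coarsen the key
        state["enemy_hp"] // 10,
    )

# (state_key, move) -> [visit count, mean reward]
stats = defaultdict(lambda: [0, 0.0])

def record(state, move, reward):
    s = stats[(state_key(state), move)]
    s[0] += 1
    s[1] += (reward - s[1]) / s[0]  # incremental mean

def prior(state, move):
    # Bias the selection phase toward moves that did well in similar states;
    # fall back to a neutral value when the pair has never been seen.
    n, q = stats[(state_key(state), move)]
    return q if n > 0 else 0.5
```

A policy network would play the same role as `prior` here, replacing the hand-built abstraction with a learned mapping from full states to move probabilities.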
  • References
    1. All-Moves-As-First Heuristics in Monte-Carlo Go
    2. Generalized Rapid Action Value Estimation
    3. Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go