
MCTS All-Moves-As-First

Peter Shih edited this page Jun 20, 2017 · 4 revisions
  • Background
    • Two sets of values:
      1. Standard update from MCTS episodes
      2. AMAF values
    • Basic idea:
      • The value of a move is often unrelated to the moves played elsewhere
        • Section 4.1 from reference [3]
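The two sets of values above can be sketched as follows. This is a minimal illustration (the `Node` class and `backpropagate` helper are hypothetical, not from this project): each node keeps the standard visit/value statistics plus a second AMAF table that is credited with *every* move played anywhere in the episode, not only the move chosen at that node.

```python
class Node:
    def __init__(self):
        self.visits = 0          # standard MCTS visit count
        self.value = 0.0         # standard MCTS mean reward
        self.amaf_visits = {}    # move -> AMAF visit count
        self.amaf_value = {}     # move -> AMAF mean reward

def backpropagate(path, moves_played, reward):
    """path: nodes visited this episode; moves_played: all moves of the episode."""
    for node in path:
        # 1. Standard update: only this node's statistics, incremental mean
        node.visits += 1
        node.value += (reward - node.value) / node.visits
        # 2. AMAF update: credit every move seen anywhere in the episode,
        #    as if it had been played first from this node
        for move in moves_played:
            n = node.amaf_visits.get(move, 0) + 1
            node.amaf_visits[move] = n
            v = node.amaf_value.get(move, 0.0)
            node.amaf_value[move] = v + (reward - v) / n
```

The AMAF table fills up much faster than the standard statistics, which is exactly why it is useful early and biased later.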
  • Variants
    • Alpha-AMAF: weight between the two sets
    • Cutoff-AMAF: Use AMAF values only in the first k iterations
    • RAVE: Like alpha-AMAF, but each node has its own alpha value
    • Generalized AMAF: Also use AMAF value of a parent node
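A sketch of the alpha-AMAF blend with a RAVE-style per-node schedule, following the hand-selected schedule from reference [3] (Gelly & Silver): alpha shrinks as the node's own visit count n grows, so the fast-but-biased AMAF estimate dominates early and the unbiased MCTS estimate dominates later. The parameter names here are illustrative.

```python
import math

def rave_alpha(n, k=1000):
    # Hand-selected RAVE schedule from Gelly & Silver: k is the "equivalence
    # parameter"; the two estimates get equal weight (alpha = 0.5) when n == k.
    return math.sqrt(k / (3 * n + k))

def blended_value(q_mcts, q_amaf, n, k=1000):
    # alpha-AMAF blend; with a per-node alpha this is the RAVE node value
    a = rave_alpha(n, k)
    return a * q_amaf + (1 - a) * q_mcts
```

Plain alpha-AMAF fixes a single global alpha instead; Cutoff-AMAF is the degenerate schedule alpha = 1 for the first k iterations and 0 afterwards.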
  • Discussion
    • In Hearthstone, the value of a move STRONGLY DEPENDS on the board state
      • E.g., If there are many enemy minions, a strong AOE is a very good move.
    • We can use (state, move) pairs to re-use previous playout results
      • In previous playouts, we learned that move A1 is good in state S1
      • So, if the current state is S1, we'd like to play A1 with higher probability
        • Especially when this node has not yet accumulated many playouts
    • However, the granularity of the state we just mentioned should be carefully defined.
      • If the granularity captures too many details, the information can rarely be reused
      • If it captures too few details, it might be misleading
    • Maybe we can use a policy network to guide the MCTS selection phase
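The granularity trade-off above can be made concrete with a (state, move) statistics table keyed by a coarse state abstraction. Everything below is a hypothetical illustration, not this project's actual design: the particular features in `state_key` (hand size, minion counts, bucketed hero HP) just show how coarsening the key lets statistics be reused across "similar" states.

```python
from collections import defaultdict

def state_key(state):
    # Granularity choice: full state -> keys almost never repeat;
    # too coarse -> statistics from genuinely different situations get mixed.
    return (
        state["hand_size"],
        state["own_minions"],
        state["enemy_minions"],
        state["own_hp"] // 10,    # bucket HP to coarsen the key
        state["enemy_hp"] // 10,
    )

# (state_key, move) -> [visit count, mean reward]
stats = defaultdict(lambda: [0, 0.0])

def record(state, move, reward):
    s = stats[(state_key(state), move)]
    s[0] += 1
    s[1] += (reward - s[1]) / s[0]  # incremental mean

def prior(state, move):
    # Bias the selection phase toward moves that did well in similar states;
    # fall back to a neutral value when the pair has never been seen.
    n, q = stats[(state_key(state), move)]
    return q if n > 0 else 0.5
```

A policy network would play the same role as `prior` here, replacing the hand-built abstraction with a learned mapping from full states to move probabilities.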
  • References
    1. All-Moves-As-First Heuristics in Monte-Carlo Go
    2. Generalized Rapid Action Value Estimation
    3. Monte-Carlo Tree Search and Rapid Action Value Estimation in Computer Go