test2.html
<!DOCTYPE html>
<html>
<head>
<title>q-learning.js test 2</title>
<meta name="description" content="Q-learning algorithm implementation and example in Javascript">
<meta name="keywords" content="Q-Learning, javascript, game">
<style>
body {
font-family: monospace;
font-size: 120%;
padding: 5%;
}
canvas {
border: 2px solid black;
width: 300px;
margin: 1% auto 1% auto;
display: block;
}
#buttons {
width: 300px;
margin: auto;
display: block;
}
#buttons * {
width: 25%;
}
#score {
text-align: right;
}
</style>
<script src="q-learning.js"></script>
</head>
<body>
<p>
<a href="https://github.com/nrox/q-learning.js">GitHub repo</a>
</p>
<p>
This is a practical implementation of the artificial intelligence algorithm Q-Learning. The tutorial
<a href="http://mnemstudio.org/path-finding-q-learning-tutorial.htm">A Painless Q-Learning Tutorial</a> is a very nice introduction.
Q-Learning learns, for each state, which action maximizes the expected reward.
</p>
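<p>
For reference, the core of the algorithm is the standard tabular Q-learning update. The sketch below is illustrative only; the variable names and the plain-object Q table are assumptions, not the API of q-learning.js:
</p>
<pre>
// One tabular Q-learning update (illustrative sketch, not the library API).
// Q is a plain object mapping "state|action" keys to estimated values.
var alpha = 0.1;   // learning rate
var gamma = 0.9;   // discount factor

function update(Q, state, action, reward, nextState, actions) {
  var key = state + '|' + action;
  var old = Q[key] || 0;
  // Best value currently believed achievable from the next state.
  var best = -Infinity;
  actions.forEach(function (a) {
    var v = Q[nextState + '|' + a] || 0;
    if (v > best) best = v;
  });
  Q[key] = old + alpha * (reward + gamma * best - old);
}
</pre>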
<canvas id="canvas" width="300" height="300"></canvas>
<div id="buttons">
<button onclick="slow()">slow</button>
<button onclick="fast()">fast</button>
<span id="score"></span>
</div>
<p>
The black circle represents the agent. A green circle represents food (+1) and a gray circle represents poison (-1). Food and poison are inserted with equal probability.
The agent can move left, move right or stay. A state is the string representation of the objects in the 3x3 square immediately in front of the agent,
which is all it sees. There are therefore 3^9 = 19683 possible states.
</p>
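<p>
The encoding itself can be as simple as flattening the 3x3 window into a nine-character string. The sketch below is an illustration with assumed cell symbols; the actual encoding lives in test2.js:
</p>
<pre>
// Illustrative state encoding: one character per cell of the 3x3 window,
// e.g. 'o' = empty, 'f' = food, 'p' = poison, giving 3^9 = 19683 states.
function encodeState(window3x3) {
  return window3x3.map(function (row) {
    return row.join('');
  }).join('');
}

// Example: food in the middle of the far row, everything else empty.
encodeState([['o', 'f', 'o'],
             ['o', 'o', 'o'],
             ['o', 'o', 'o']]);   // "ofooooooo"
</pre>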
<p>
In some states, exploration of a new action is done with a probability of 10%, if the outcome of that action is not yet known.
</p>
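<p>
A sketch of that action selection, again with assumed names rather than the library's actual interface:
</p>
<pre>
// Illustrative action selection: mostly greedy, but with probability 10%
// try an action whose outcome in this state has never been recorded.
function chooseAction(Q, state, actions) {
  var unknown = actions.filter(function (a) {
    return Q[state + '|' + a] === undefined;
  });
  if (unknown.length > 0) {
    if (Math.random() > 0.9) {   // 10% of the time, explore
      return unknown[Math.floor(Math.random() * unknown.length)];
    }
  }
  // Otherwise act greedily: pick the action with the highest Q-value.
  var best = actions[0];
  actions.forEach(function (a) {
    if ((Q[state + '|' + a] || 0) > (Q[state + '|' + best] || 0)) best = a;
  });
  return best;
}
</pre>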
<p>
We are using online learning, where training and using the algorithm happen simultaneously, i.e. we don't need to collect
considerable amounts of data before the algorithm is trained. A consequence is that performance is poor in the early stages:
at first the behaviour looks like a random walk, but as time goes by it should improve considerably, and the agent not only avoids poison
but also catches more food.
</p>
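<p>
In code, one simulation step interleaves acting and learning, roughly as sketched below (the world object and its apply method are hypothetical stand-ins for the environment in test2.js):
</p>
<pre>
// Illustrative online-learning step: choose an action, observe the reward,
// update the Q table, and continue from the new state. There is no separate
// training phase.
function step(Q, state, actions, world) {
  var action = chooseAction(Q, state, actions);     // policy sketched above
  var result = world.apply(action);                 // { reward, nextState } (assumed shape)
  update(Q, state, action, result.reward, result.nextState, actions);
  return result.nextState;
}
</pre>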
<p>
Press fast to speed up. To understand this example, read the code in test2.js. Simple changes to that code can produce agents
that battle, chase others, run away or follow paths.
</p>
<script src="test2.js"></script>
</body>
</html>