Home

Welcome to the Reinforcement Learning.github.io wiki! This project summarizes the basic algorithms in reinforcement learning.

Basics of MDPs

Reinforcement learning is a machine learning paradigm distinct from supervised learning and unsupervised learning. In this problem setting we have an agent that should achieve some goal. Usually we cannot tell the agent directly how to achieve the goal, because we may not know the dynamics of the environment, or the environment may be non-deterministic. Instead, the agent acts in the environment (for example, it moves) and receives feedback after each action. This feedback is called the reward. We also have no training samples up front; all of them are generated by the agent's interaction with the environment, as depicted in the figure "The agent–environment interaction in reinforcement learning".

Normally the environment is abstracted as a set of states. A state in which the agent terminates is called an absorbing or terminal state. At each time step the agent takes an action and gets a reward. When the agent reaches an absorbing state, the episode ends. So the problem becomes finding the optimal choice of action in each state, called the policy, with which the agent reaches the terminal state while collecting as much reward as it can.

A state signal that succeeds in retaining all relevant information is said to be Markov, or to satisfy the Markov property. Let's formally define the Markov property by considering how the environment might respond at time t + 1 to the action taken at time t. In the general case this response may depend on everything that has happened in the past.
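The interaction loop described above (the agent observes a state, takes an action, receives a reward, and stops at a terminal state) can be sketched in a few lines of Python. This is only an illustrative sketch: the `Corridor` environment, its reward values, and the names `run_episode`, `always_right` and `random_policy` are hypothetical and not part of this project.

```python
import random


class Corridor:
    """Hypothetical 1-D environment: states 0..length-1, the last one is terminal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Every episode starts at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); the walls clip the move.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward scheme: small penalty per step, bonus at the terminal state.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward and the visited transitions."""
    state = env.reset()
    total_reward = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                       # agent chooses an action
        next_state, reward, done = env.step(action)  # environment responds
        trajectory.append((state, action, reward))
        total_reward += reward
        state = next_state
        if done:                                     # terminal state reached
            break
    return total_reward, trajectory


def always_right(state):
    # A fixed policy: always move towards the terminal state.
    return 1


def random_policy(state):
    # A policy that ignores the state and moves at random.
    return random.choice([-1, 1])


if __name__ == "__main__":
    total, trajectory = run_episode(Corridor(), always_right)
    print("total reward:", total, "steps:", len(trajectory))
```

Note that in this toy environment the next state and reward depend only on the current state and the chosen action, not on the earlier history, so its state signal is Markov in the sense introduced above.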
