This project uses Dynamic Programming to solve a Finite-Time Optimal Control problem, computing an optimal control policy that guides an agent through an environment to a goal point.
It was implemented in Python using NumPy and Gymnasium. The code has been redacted; if you wish to see it, you may contact me at charles.lychee@gmail.com.
We describe "Known" environments as problems where the exact environment is known before the optimal control policy is computed, so each environment gets its own policy (e.g., 3 environments yield 3 different control policies).
We describe "Random" environments as problems where the exact environment is not known in advance, but a basic structure of the environment is given (e.g., 2 doors at the same coordinates across all random environments).
For the Random case we compute a single control policy that generalizes over this basic structure (e.g., 3 environments share 1 control policy).
The planning horizon is chosen to be the total number of states:

$$T = |\mathcal{X}|,$$

where $\mathcal{X}$ denotes the state space.
For the purposes of making the upcoming definitions easier, we write $\mathrm{step}(x)$ for the state reached when the agent takes a step into the cell in front of it.
In the Random Map problem, we need to expand our state space to include the two doors' states, the position of the key, and the position of the goal.
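To make the expanded state concrete, here is a minimal sketch of one possible encoding as a tuple. The field names, the inclusion of the agent's heading, and the `cell_in_front` helper are illustrative assumptions rather than the project's (redacted) code:

```python
from typing import NamedTuple, Tuple

# Hypothetical expanded state for the Random Map problem; field names and the
# choice to track the agent's heading are assumptions, not the redacted code.
class State(NamedTuple):
    agent_pos: Tuple[int, int]   # agent's (x, y) cell
    agent_dir: int               # heading: 0=right, 1=down, 2=left, 3=up
    has_key: bool                # whether the key has been picked up
    door1_open: bool             # state of the first door (fixed coordinates)
    door2_open: bool             # state of the second door (fixed coordinates)
    key_pos: Tuple[int, int]     # which of the possible key locations this map uses
    goal_pos: Tuple[int, int]    # which of the possible goal locations this map uses

# Offsets for one step forward in each heading, matching the
# "step in front of the agent" shorthand defined above.
DIR_TO_VEC = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}

def cell_in_front(s: State) -> Tuple[int, int]:
    dx, dy = DIR_TO_VEC[s.agent_dir]
    return (s.agent_pos[0] + dx, s.agent_pos[1] + dy)
```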
We define our control policy as a function that maps a state to a control input at time $t$:

$$\pi_t : \mathcal{X} \to \mathcal{U}, \qquad u_t = \pi_t(x_t).$$
We define our value function as a function that returns the long-term cost of starting at a given state $x$ at time $t$:

$$V_t^{\pi}(x) = q(x_T) + \sum_{\tau=t}^{T-1} \ell(x_\tau, \pi_\tau(x_\tau)), \qquad x_t = x,\; x_{\tau+1} = f(x_\tau, \pi_\tau(x_\tau)),$$

where $\ell$ is the stage cost, $q$ is the terminal cost, and $f$ is the motion model.
We want to solve for the optimal control policy, given by the following:

$$\pi^* = \arg\min_{\pi}\; V_0^{\pi}(x_0).$$
To solve this, we can use the Dynamic Programming algorithm.
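Dynamic Programming initializes the value function with the terminal cost, $V_T(x) = q(x)$, and then recurses backward via $V_t(x) = \min_{u}\big[\ell(x,u) + V_{t+1}(f(x,u))\big]$, recording the minimizing control as $\pi_t^*(x)$. Below is a minimal NumPy sketch of this backward pass over an enumerated state space; the `motion_model`, `stage_cost`, and `terminal_cost` helpers are assumed stand-ins for the redacted implementation:

```python
import numpy as np

def dynamic_programming(states, controls, motion_model, stage_cost, terminal_cost, T):
    """Backward DP recursion for the finite-horizon optimal control problem.

    states        : list of hashable states (the enumerated state space X)
    controls      : list of control inputs (the control space U)
    motion_model  : f(x, u) -> next state
    stage_cost    : l(x, u) -> scalar stage cost
    terminal_cost : q(x) -> scalar terminal cost
    T             : planning horizon (here chosen as |X|)
    """
    index = {x: i for i, x in enumerate(states)}      # state -> row index
    V = np.zeros((T + 1, len(states)))                # value function V_t(x)
    policy = np.zeros((T, len(states)), dtype=int)    # pi_t(x), stored as a control index

    # Terminal condition: V_T(x) = q(x)
    V[T] = [terminal_cost(x) for x in states]

    # Backward pass: V_t(x) = min_u [ l(x, u) + V_{t+1}(f(x, u)) ]
    for t in range(T - 1, -1, -1):
        for i, x in enumerate(states):
            q_values = [stage_cost(x, u) + V[t + 1, index[motion_model(x, u)]]
                        for u in controls]
            policy[t, i] = int(np.argmin(q_values))
            V[t, i] = q_values[policy[t, i]]

    return V, policy
```

Since the problem is deterministic with non-negative stage costs, the backward pass can also be stopped early once $V_t$ stops changing between iterations.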
There are a total of $|\mathcal{X}|$ states and $|\mathcal{U}|$ control inputs, and the horizon is $T = |\mathcal{X}|$.
Dynamic Programming therefore solves for the optimal closed-loop control policy in $O(T\,|\mathcal{X}|\,|\mathcal{U}|)$ time.
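As a usage sketch (again with the assumed helper names from above), the computed policy can be rolled out from an initial state to recover the control sequence that drives the agent to the goal:

```python
def rollout(x0, policy, states, controls, motion_model, T):
    """Apply the closed-loop policy from state x0 and return the action sequence."""
    index = {x: i for i, x in enumerate(states)}
    x, actions = x0, []
    for t in range(T):
        u = controls[policy[t, index[x]]]
        actions.append(u)
        x = motion_model(x, u)
        # (In practice the loop would stop as soon as the goal state is reached.)
    return actions
```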