4. Next, I give a more rigorous and general definition of state-values. To begin with, it must be noted that the state-value is a function of the “policy” followed by the agent; the policy is what I call the action selection system in the text, i.e., the system that decides which action is taken (and with which probability) in any given state. Further, we have to take into account that the world may contain some randomness, so we have to consider expected reward. The value function at a given state is then generally defined as the expected amount of discounted reward that the agent will obtain when starting from that state and following that policy. (Sometimes, when speaking of the state-value, it is more specifically assumed that the policy in question is the optimal policy, i.e., the one giving the highest expected reward.) This definition reduces to the one just given for the case of a single goal in a deterministic world, where the state-value is a decreasing function of the distance to the goal. The connection can be seen by assuming that a reward is obtained at the goal and nowhere else, and recalling that, because of discounting, rewards in the distant future are given less weight than rewards in the near future. Then, the closer you are to the goal, the larger the expected discounted reward, because the reward at the goal is discounted less when it is reached sooner. (Here, “closer” means that the goal can be reached in fewer time steps.) While this standard definition in the literature considers the reward uncertain and therefore speaks of expected (discounted) reward, I will not usually do so in this chapter for simplicity: I assume the world is largely deterministic.
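For concreteness, the general definition can be written in standard reinforcement-learning notation; the symbols used here (the value function V^π, the discount factor γ, the reward r_t at time step t, the goal reward R, and the distance d(s) in time steps) are not introduced in the text above and are chosen only for this illustration:
\[
V^{\pi}(s) \;=\; \mathbb{E}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r_{t} \;\middle|\; s_{0}=s,\ \pi \right], \qquad 0<\gamma<1 .
\]
In the deterministic single-goal special case, where a reward R is obtained at the goal and nowhere else and the goal is reached from state s after d(s) time steps under the policy, this reduces to
\[
V^{\pi}(s) \;=\; \gamma^{\,d(s)}\, R ,
\]
which is a decreasing function of the distance d(s), since γ is less than one.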