6 Again, strictly speaking, the action-values depend on the policy the agent follows, and sometimes the term refers specifically to the action-values of the optimal policy. The terminology is further confounded by the fact that "action-values" sometimes refers to the agent's current estimates of those action-values rather than their true values (the same holds for state-values). Note that action-values are often called "Q-values".
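To make the three meanings concrete, here is a minimal sketch on a hypothetical two-state, two-action MDP. The MDP, its rewards, the discount `GAMMA = 0.9` and the helper names (`q_pi`, `q_star`, `q_learning_step`) are illustrative assumptions, not taken from the text. The sketch computes the true action-values of a fixed policy by iterative policy evaluation, the true action-values of the optimal policy by Q-value iteration, and an agent's estimate after a single Q-learning update.

```python
GAMMA = 0.9
STATES = ACTIONS = (0, 1)

# Hypothetical toy MDP (an assumption for illustration):
# taking action a moves the agent deterministically to state a.
def next_state(s, a):
    return a

REWARD = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}


def q_pi(policy, iters=1000):
    """True action-values Q^pi of a fixed policy (iterative policy evaluation).
    policy[s][a] is the probability of taking action a in state s."""
    q = {sa: 0.0 for sa in REWARD}
    for _ in range(iters):
        # V^pi(s) = sum_a pi(a|s) Q^pi(s, a)
        v = {s: sum(policy[s][a] * q[(s, a)] for a in ACTIONS) for s in STATES}
        q = {(s, a): REWARD[(s, a)] + GAMMA * v[next_state(s, a)] for (s, a) in REWARD}
    return q


def q_star(iters=1000):
    """True action-values Q* of the optimal policy (Q-value iteration)."""
    q = {sa: 0.0 for sa in REWARD}
    for _ in range(iters):
        # V*(s) = max_a Q*(s, a)
        v = {s: max(q[(s, a)] for a in ACTIONS) for s in STATES}
        q = {(s, a): REWARD[(s, a)] + GAMMA * v[next_state(s, a)] for (s, a) in REWARD}
    return q


def q_learning_step(q_hat, s, a, r, s_next, alpha=0.5):
    """One Q-learning update: q_hat holds the agent's *estimates*, not true values."""
    target = r + GAMMA * max(q_hat[(s_next, b)] for b in ACTIONS)
    q_hat[(s, a)] += alpha * (target - q_hat[(s, a)])
    return q_hat


uniform = {s: {a: 0.5 for a in ACTIONS} for s in STATES}
q_uniform = q_pi(uniform)   # action-values of the uniform random policy
q_opt = q_star()            # action-values of the optimal policy
q_hat = q_learning_step({sa: 0.0 for sa in REWARD}, s=0, a=1, r=1.0, s_next=1)

for sa in sorted(REWARD):
    print(sa, round(q_uniform[sa], 3), round(q_opt[sa], 3), q_hat[sa])
```

All three tables are "Q-values" of the same MDP, yet they differ. In this toy example the optimal policy's values exceed the uniform policy's values for every state-action pair, and the single-update estimate (0.5 for the pair that was updated, 0 elsewhere) is far from either set of true values.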