Reinforcement Learning for Routing in Ad Hoc Networks

Petteri Nurmi

Abstract

We show how routing in ad hoc networks can be modeled as a sequential decision-making problem with incomplete information. More precisely, we show how to map routing onto a reinforcement learning problem involving a partially observable Markov decision process, and we present an algorithm for optimizing the performance of the nodes in this model. We also briefly present simulation results for our model. In this extended abstract, we focus on a special case of the model in which (i) the nodes optimize only the next link along the routing path and (ii) the decisions of the nodes depend on their selfishness and on their remaining energy. In the full version of the paper we generalize the model to the case in which the nodes optimize the full routing path, and we give conditions under which optimizing the next hop alone is sufficient. We also present the model in a more general form in which the decisions of nodes can depend on an arbitrary number of parameters. The full version also studies what happens when the Markov chain describing the decision process has a stationary distribution. Finally, the experimental section of the full paper covers a more thorough simulation analysis of the model.
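To make the special case concrete, the next-hop decision described above can be sketched as a simple Q-learning-style agent: each neighbour is an action, and the reward trades off delivery success against the node's selfishness and the energy spent relative to its remaining energy. This is only an illustrative sketch under assumed names and reward shape (the class `NextHopLearner`, the parameters `selfishness`, `alpha`, `epsilon`, and the reward formula are all hypothetical and are not taken from the paper).

```python
import random

class NextHopLearner:
    """Illustrative Q-learning agent for next-hop selection.

    Each neighbour is an action; the reward combines delivery success,
    the node's selfishness, and its remaining energy, loosely following
    the abstract's description. All names and the exact reward shape
    are assumptions made for this sketch.
    """

    def __init__(self, neighbours, selfishness=0.5, energy=1.0,
                 alpha=0.1, epsilon=0.1):
        self.q = {n: 0.0 for n in neighbours}   # estimated value of each next hop
        self.selfishness = selfishness          # in [0, 1]; 1 = fully selfish
        self.energy = energy                    # remaining energy in [0, 1]
        self.alpha = alpha                      # learning rate
        self.epsilon = epsilon                  # exploration probability

    def choose_next_hop(self):
        # Epsilon-greedy policy: occasionally explore a random neighbour,
        # otherwise forward via the neighbour with the highest estimate.
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, hop, delivered, energy_cost):
        # Spend the forwarding energy, then compute a reward that is
        # discounted by selfishness and by how scarce energy has become.
        self.energy = max(self.energy - energy_cost, 0.0)
        reward = ((1.0 - self.selfishness) * float(delivered)
                  - energy_cost / (self.energy + 1e-9))
        self.q[hop] += self.alpha * (reward - self.q[hop])
```

A node would call `choose_next_hop` for each packet and `update` once it learns whether the packet was delivered; over repeated interactions the Q-values steer traffic toward neighbours that deliver reliably at low energy cost.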