Reinforcement Learning for Routing in Ad Hoc Networks

Petteri Nurmi

Abstract

We show how routing in ad hoc networks can be modeled as a sequential decision-making problem with incomplete information. More precisely, we show how to map routing onto a reinforcement learning problem involving a partially observable Markov decision process, and we present an algorithm for optimizing the performance of the nodes in this model. We also briefly present simulation results for our model. In this extended abstract, we focus on a special case of the model in which (i) the nodes optimize only the next link along the routing path and (ii) the decisions of the nodes depend on their selfishness and on their remaining energy. In the full version of the paper we generalize the model to the case in which the nodes optimize the full routing path, and we give conditions under which optimizing the next hop alone is sufficient. We also present the model in a more general form in which the decisions of nodes can depend on an arbitrary number of parameters. The full version also studies what happens when the Markov chain describing the decision process has a stationary distribution. Finally, the experimental section of the full paper covers a more thorough simulation analysis of the model.
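To make the special case concrete, the next-hop decision described above can be sketched as a simple Q-learning-style agent: each neighbour is an action, and the reward trades off delivery success against the node's selfishness and the energy spent relative to its remaining energy. This is only an illustrative sketch under assumed names and reward shape (the class `NextHopLearner`, the parameters `selfishness`, `alpha`, `epsilon`, and the reward formula are all hypothetical and are not taken from the paper).

```python
import random

class NextHopLearner:
    """Illustrative Q-learning agent for next-hop selection.

    Each neighbour is an action; the reward combines delivery success,
    the node's selfishness, and its remaining energy, loosely following
    the abstract's description. All names and the exact reward shape
    are assumptions made for this sketch.
    """

    def __init__(self, neighbours, selfishness=0.5, energy=1.0,
                 alpha=0.1, epsilon=0.1):
        self.q = {n: 0.0 for n in neighbours}   # estimated value of each next hop
        self.selfishness = selfishness          # in [0, 1]; 1 = fully selfish
        self.energy = energy                    # remaining energy in [0, 1]
        self.alpha = alpha                      # learning rate
        self.epsilon = epsilon                  # exploration probability

    def choose_next_hop(self):
        # Epsilon-greedy policy: occasionally explore a random neighbour,
        # otherwise forward via the neighbour with the highest estimate.
        if random.random() < self.epsilon:
            return random.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, hop, delivered, energy_cost):
        # Spend the forwarding energy, then compute a reward that is
        # discounted by selfishness and by how scarce energy has become.
        self.energy = max(self.energy - energy_cost, 0.0)
        reward = ((1.0 - self.selfishness) * float(delivered)
                  - energy_cost / (self.energy + 1e-9))
        self.q[hop] += self.alpha * (reward - self.q[hop])
```

A node would call `choose_next_hop` for each packet and `update` once it learns whether the packet was delivered; over repeated interactions the Q-values steer traffic toward neighbours that deliver reliably at low energy cost.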