Title:

Chained Kullback-Leibler Divergences

Abstract:

We define and characterize the "chained" Kullback-Leibler divergence min_w D(p||w)+D(w||q) minimized over all intermediate distributions w and the analogous k-fold chained K-L divergence min D(p||w1)+D(w1||w2)+...+D(wk||q) minimized over the entire path (w1, ... ,wk). This quantity arises in a large deviations analysis of a Markov chain on the set of types – the Wright-Fisher model of neutral genetic drift: a population with allele distribution q produces offspring with allele distribution w, which then produce offspring with allele distribution p, and so on. The chained divergences enjoy some of the same properties as the K-L divergence (like joint convexity in the arguments) and appear in k-step versions of some of the same settings as the K-L divergence (like information projections and a conditional limit theorem). We further characterize the optimal k-step "path" of distributions appearing in the definition and apply our findings in a large deviations analysis of the Wright-Fisher process. We make a connection to information geometry via the previously studied continuum limit, where the number of steps tends to infinity, and the limiting path is a geodesic in the Fisher information metric. Finally, we offer a thermodynamic interpretation of the chained divergence (as the rate of operation of an appropriately defined Maxwell’s demon) and we state some natural extensions and applications (a k-step mutual information and k-step maximum likelihood inference). We release code for computing the objects we study.