## The On-Line Shortest Path Problem Under Partial Monitoring

** András György, Tamás Linder, Gábor Lugosi, György Ottucsák**; 8(79):2369−2403, 2007.

### Abstract

The on-line shortest path problem is considered under various models
of partial monitoring. Given a weighted directed acyclic graph whose
edge weights can change in an arbitrary (adversarial) way, a decision
maker has to choose in each round of a game a path between two
distinguished vertices such that the loss of the chosen path (defined
as the sum of the weights of its composing edges) be as small as
possible. In a setting generalizing the multi-armed bandit problem,
after choosing a path, the decision maker learns only the weights of
those edges that belong to the chosen path. For this problem, an
algorithm is given whose average cumulative loss in *n* rounds exceeds
that of the best path, matched off-line to the entire sequence of the
edge weights, by a quantity that is proportional to 1/√*n* and
depends only polynomially on the number of edges of the graph. The
algorithm can be implemented with
complexity that is linear in the number of
rounds *n* (i.e., the average complexity per round is constant)
and in the number of edges. An extension to the so-called
label efficient setting is also given, in which the decision maker is
informed about the weights of the edges corresponding to the chosen
path at a total of *m* ≪ *n* time instances. Another extension is
shown where the decision maker competes against a time-varying path, a
generalization of the problem of tracking the best expert. A version
of the multi-armed bandit setting for shortest path is also discussed
where the decision maker learns only the total weight of the chosen path
but not the weights of the individual edges on the path. Applications to
routing in packet switched networks along with simulation results are
also presented.

© JMLR 2007. (edit, beta) |