Multiagent Q-learning by context-specific coordination graphs

Jelle R. Kok and Nikos Vlassis. Multiagent Q-learning by context-specific coordination graphs. In Proceedings of the International Conference on Intelligent Autonomous Systems, pp. 317–324, IOS Press, Amsterdam, The Netherlands, March 2004.

Download

pdf [119.6kB]  ps.gz [82.5kB]  

Abstract

One of the main problems in cooperative multiagent learning is that the joint action space is exponential in the number of agents. In this paper, we investigate a sparse representation of the joint action space in which value rules specify the coordination dependencies between the agents for a particular state. Each value rule has an associated payoff, which is part of the global Q-function. We discuss a Q-learning method that updates these context-specific rules based on the optimal joint action found with the coordination graph algorithm. We apply our method to the pursuit domain and compare it with other multiagent reinforcement learning methods.
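
The abstract gives enough of the mechanics to sketch in code. Below is a minimal Python sketch under simplifying assumptions: each value rule ties a payoff to a state and to the actions of a subset of agents, the global Q-value is the sum of payoffs of the rules that apply, and the TD error is spread evenly over the active rules. The paper maximizes over the coordination graph with variable elimination and uses a finer-grained update; the brute-force maximization and the even split below are stand-ins, and all names are hypothetical.

from itertools import product

class ValueRule:
    """A payoff attached to a state and a partial joint action.

    partial_action maps agent index -> action and only constrains the
    agents it mentions; that is what keeps the representation sparse.
    """
    def __init__(self, state, partial_action, payoff=0.0):
        self.state = state
        self.partial_action = partial_action
        self.payoff = payoff

    def applies(self, state, joint_action):
        return self.state == state and all(
            joint_action[i] == a for i, a in self.partial_action.items())

def q_value(rules, state, joint_action):
    # Global Q(s, a): sum of payoffs of all rules consistent with (s, a).
    return sum(r.payoff for r in rules if r.applies(state, joint_action))

def best_joint_action(rules, state, n_agents, actions):
    # The paper does this with variable elimination on the coordination
    # graph; brute force over joint actions is fine for a toy problem.
    return max(product(actions, repeat=n_agents),
               key=lambda ja: q_value(rules, state, ja))

def q_update(rules, s, a, reward, s_next, n_agents, actions,
             alpha=0.1, gamma=0.95):
    # Simplified update: one TD error, split evenly over the rules that
    # fired in (s, a). The paper distributes per-agent contributions.
    active = [r for r in rules if r.applies(s, a)]
    if not active:
        return
    a_star = best_joint_action(rules, s_next, n_agents, actions)
    td = reward + gamma * q_value(rules, s_next, a_star) - q_value(rules, s, a)
    for r in active:
        r.payoff += alpha * td / len(active)

# Toy usage: two agents, one state, a rule coordinating both agents and
# a rule involving only agent 0.
rules = [ValueRule('s0', {0: 'up', 1: 'up'}),
         ValueRule('s0', {0: 'up'})]
q_update(rules, 's0', ('up', 'up'), reward=1.0, s_next='s0',
         n_agents=2, actions=['up', 'down'])

Because the global Q-function decomposes over rules that each touch only a few agents, the representation and the update both stay small even as the number of agents grows, which is the point of the sparse encoding.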

BibTeX Entry

@InProceedings{Kok04ias,
  author =       {Jelle R. Kok and Nikos Vlassis},
  title =        {Multiagent {Q}-learning by context-specific
                  coordination graphs},
  address =      {Amsterdam, The Netherlands},
  booktitle =    {Proceedings of the International Conference on
                  Intelligent Autonomous Systems},
  year =         {2004},
  pages =        {317--324},
  editor =       {Frans Groen and Nancy Amato and Andrea Bonarini and
                  Eiichi Yoshida and Ben Kr\"ose},
  publisher =    {IOS Press},
  month =        mar,
  postscript =   {2004/Kok04ias.ps.gz},
  pdf =          {2004/Kok04ias.pdf},
  abstract =     { One of the main problems in cooperative multiagent
                  learning is that the joint action space is
                  exponential in the number of agents. In this paper,
                  we investigate a sparse representation of the joint
                  action space in which value rules specify the
                  coordination dependencies between the agents for a
                  particular state. Each value rule has an associated
                  payoff, which is part of the global Q-function. We
                  discuss a Q-learning method that updates these
                  context-specific rules based on the optimal joint
                  action found with the coordination graph algorithm.
                  We apply our method to the pursuit domain and
                  compare it with other multiagent reinforcement
                  learning methods. }
}
