Large-margin structured prediction via linear programming

2009, International Conference on Artificial Intelligence and Statistics

Abstract

This paper presents a novel learning algorithm for structured classification, where the task is to predict multiple interacting labels (a multilabel) for an input object. The problem of finding a large-margin separation between correct and incorrect multilabels is formulated as a linear program. Instead of explicitly writing out the entire problem with its exponentially large constraint set, the linear program is solved iteratively via column generation. In this setting, generating the most violated constraints is equivalent to searching for the highest-scoring incorrect multilabels, which can be done by decoding the structure under the current parameter estimates. We also explore integrating column generation with an extragradient method for linear programming to gain further efficiency. The proposed method can handle arbitrary structures and scales to larger problems. Experimental results on part-of-speech tagging and statistical machine translation tasks are reported, demonstrating the competitiveness of our approach.
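To make the constraint-generation step concrete, here is a minimal sketch for the simplest structure, a linear chain such as a part-of-speech tag sequence. It is not the paper's implementation. The score tables `emit` and `trans`, the function names, and the toy numbers are all hypothetical. The sketch shows only how decoding can return the highest-scoring incorrect label sequence, which would supply the next constraint for the linear program; solving the linear program itself is omitted.

```python
NEG_INF = float("-inf")


def sequence_score(emit, trans, labels):
    """Score of one label sequence under the (hypothetical) chain model."""
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score


def viterbi(emit, trans):
    """Return (score, labels) of the highest-scoring label sequence.

    emit[t][l]  : score of label l at position t
    trans[a][b] : score of moving from label a to label b
    """
    n_labels = len(emit[0])
    best = list(emit[0])  # best[l]: best score of a prefix ending in label l
    back = []             # back-pointers, one list per position t >= 1
    for t in range(1, len(emit)):
        new_best, pointers = [], []
        for b in range(n_labels):
            a = max(range(n_labels), key=lambda a: best[a] + trans[a][b])
            new_best.append(best[a] + trans[a][b] + emit[t][b])
            pointers.append(a)
        best = new_best
        back.append(pointers)
    last = max(range(n_labels), key=lambda l: best[l])
    labels = [last]
    for pointers in reversed(back):
        labels.append(pointers[labels[-1]])
    labels.reverse()
    return best[last], labels


def highest_scoring_incorrect(emit, trans, gold):
    """Best sequence that differs from `gold` in at least one position.

    Any incorrect sequence carries a wrong label somewhere, so we force each
    (position, wrong label) pair in turn and keep the best decoding result.
    """
    best_score, best_labels = NEG_INF, None
    for t, gold_label in enumerate(gold):
        for wrong in range(len(emit[0])):
            if wrong == gold_label:
                continue
            forced = [list(row) for row in emit]
            # Forbid every label except `wrong` at position t.
            forced[t] = [s if l == wrong else NEG_INF
                         for l, s in enumerate(emit[t])]
            score, labels = viterbi(forced, trans)
            if score > best_score:
                best_score, best_labels = score, labels
    return best_score, best_labels


# Toy scores for 3 positions and 3 labels (made-up numbers).
emit = [[2.0, 0.5, 0.0], [0.1, 1.5, 0.3], [0.2, 0.4, 1.0]]
trans = [[0.0, 0.6, -0.2], [-0.3, 0.0, 0.5], [0.1, -0.4, 0.0]]
gold = [0, 1, 2]

rival_score, rival = highest_scoring_incorrect(emit, trans, gold)
margin = sequence_score(emit, trans, gold) - rival_score
print("closest incorrect sequence:", rival, "margin:", margin)
```

In a full training loop, the constraint for `rival` would be added to the linear program whenever `margin` falls below the target margin, and the program would then be re-solved. Forcing each wrong label in turn costs one decoding pass per (position, label) pair; a 2-best decoder would be a cheaper alternative for long sequences.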
