Accurate max-margin training for structured output spaces
2008
https://doi.org/10.1145/1390156.1390268
Abstract
Tsochantaridis et al. (2005) proposed two formulations for maximum margin training of structured spaces: margin scaling and slack scaling. While margin scaling has been extensively used since it requires the same kind of MAP inference as normal structured prediction, slack scaling is believed to be more accurate and better-behaved. We present an efficient variational approximation to the slack scaling method that solves its inference bottleneck while retaining its accuracy advantage over margin scaling.
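To make the distinction concrete, here is a minimal sketch of the two per-example losses as they are usually defined for structured SVMs. It assumes a tiny, explicitly enumerated output set, so the maximization is done by brute force rather than by MAP inference. The function names and toy numbers are illustrative assumptions, and the sketch does not implement the variational approximation proposed in the paper.

```python
def margin_scaled_loss(scores, losses, true_idx):
    """Margin scaling: max over y of [loss(y) - (s_true - s_y)], clipped at 0."""
    s_true = scores[true_idx]
    worst = max(l - (s_true - s) for s, l in zip(scores, losses))
    return max(0.0, worst)


def slack_scaled_loss(scores, losses, true_idx):
    """Slack scaling: max over y of loss(y) * [1 - (s_true - s_y)], clipped at 0."""
    s_true = scores[true_idx]
    worst = max(l * (1.0 - (s_true - s)) for s, l in zip(scores, losses))
    return max(0.0, worst)


# Toy example (hypothetical numbers): three candidate outputs, index 0 is correct.
scores = [2.0, 1.5, -1.0]   # model score of each candidate output
losses = [0.0, 1.0, 5.0]    # task loss of each candidate against the truth

margin_value = margin_scaled_loss(scores, losses, true_idx=0)
slack_value = slack_scaled_loss(scores, losses, true_idx=0)
print(margin_value, slack_value)
```

In this toy case the third output is already separated from the correct one by a margin of 3, yet margin scaling still penalizes it because its task loss is large. Slack scaling ignores it and only penalizes the second output, whose margin is below 1. The additive form can be folded into the scoring function, whereas the multiplicative form generally cannot, which is the inference bottleneck the abstract refers to.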
References (18)
- Bordes, A., Bottou, L., Gallinari, P., & Weston, J. (2007). Solving multiclass support vector machines with LaRank. ICML (pp. 89-96).
- Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. NIPS.
- Boykov, Y., Veksler, O., & Zabih, R. (2001). Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell., 23, 1222-1239.
- Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. J. Mach. Learn. Res., 3, 951-991.
- Joachims, T. (2006). Training linear SVMs in linear time. KDD.
- Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. (1999). An introduction to variational methods for graphical models. In M. I. Jordan (Ed.), Learning in graphical models. MIT Press.
- Lafferty, J., McCallum, A., & Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. Proceedings of the International Conference on Machine Learning (ICML-2001). Williams, MA.
- LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., & Huang, F. (2006). A tutorial on energy-based learning. Predicting Structured Data. MIT Press.
- McCallum, A., Nigam, K., Reed, J., Rennie, J., & Seymore, K. (2000). Cora: Computer science research paper search engine. http://cora.whizbang.com/.
- McDonald, R., Crammer, K., & Pereira, F. (2005a). Flexible text segmentation with structured multilabel classification. HLT/EMNLP.
- McDonald, R., Crammer, K., & Pereira, F. (2005b). Online large-margin training of dependency parsers. ACL (pp. 91-98).
- Peng, F., & McCallum, A. (2004). Accurate information extraction from research papers using conditional random fields. HLT-NAACL (pp. 329-336).
- Ratliff, N., Bagnell, J., & Zinkevich, M. (2007). (Online) subgradient methods for structured prediction. AIStats.
- Sarawagi, S., & Cohen, W. W. (2004). Semi-Markov conditional random fields for information extraction. NIPS.
- Taskar, B. (2004). Learning structured prediction models: A large margin approach. Doctoral dissertation, Stanford University.
- Taskar, B., Klein, D., Collins, M., Koller, D., & Manning, C. (2004). Max-margin parsing. EMNLP.
- Taskar, B., Lacoste-Julien, S., & Jordan, M. I. (2006). Structured prediction, dual extragradient and Bregman projections. J. Mach. Learn. Res., 7, 1627-1653.
- Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research (JMLR), 6(Sep), 1453-1484.