A Generalized Learning Approach to Deep Neural Networks
2024, Journal of Telecommunications and Information Technology
https://doi.org/10.26636/JTIT.2024.3.1454
Abstract
The optimization of machine learning architectures is essential in determining the efficacy of any neural architecture and its applicability to real-world problems. In this work, a generalized Newton's method (GNM) is presented as a powerful approach to learning in deep neural networks (DNNs). This technique was compared with two popular approaches, namely stochastic gradient descent (SGD) and the Adam algorithm, on two well-known classification tasks. The results confirm the proposed approach as an attractive alternative to state-of-the-art first-order methods. Given the good results obtained with shallow DNNs, the last part of the article presents a hybrid optimization method. This method combines two optimization algorithms, i.e. GNM and Adam or GNM and SGD, during the training phase, assigning them to different layers of the neural network. This configuration aims to benefit from the strengths of both first- and second-order algorithms. In this case, a convolutional neural network is considered, and the parameters of its layers are updated with different optimization algorithms. Here too, the hybrid approach outperforms the purely first-order algorithms.
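The hybrid scheme described above, in which different layers are updated by different optimizers within the same training step, can be illustrated with a minimal sketch. The abstract does not give the GNM update rule, so the example substitutes a plain Newton step (using the exact second derivative of one parameter group) as a stand-in for the second-order part; it is not the authors' GNM. The toy model y = u*x + v, its data, the learning rates, and the names `train_hybrid`, `newton_step`, `loss_and_grads` and `Adam` are all hypothetical. Parameter `u` plays the role of a layer trained with a second-order update. Parameter `v` plays the role of a layer trained either with Adam or with full-batch gradient descent, which stands in for SGD.

```python
import math

# Hypothetical toy problem (not from the article): fit y = u * x + v.
# "u" stands in for a layer trained with a second-order update and
# "v" for a layer trained with a first-order optimizer.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated with u = 2, v = 1


def loss_and_grads(u, v):
    """Half mean squared error and its gradients w.r.t. u and v."""
    n = len(XS)
    res = [u * x + v - y for x, y in zip(XS, YS)]
    loss = sum(r * r for r in res) / (2 * n)
    g_u = sum(r * x for r, x in zip(res, XS)) / n
    g_v = sum(res) / n
    return loss, g_u, g_v


def newton_step(u, g_u):
    """Plain Newton step on the u group (a stand-in, NOT the article's GNM)."""
    h_uu = sum(x * x for x in XS) / len(XS)  # exact d2(loss)/du2, constant here
    return u - g_u / h_uu


class Adam:
    """Scalar Adam update (Kingma & Ba) with bias-corrected moments."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # first-moment estimate
        self.s = 0.0  # second-moment estimate
        self.t = 0    # step counter

    def step(self, p, g):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.s = self.beta2 * self.s + (1 - self.beta2) * g * g
        m_hat = self.m / (1 - self.beta1 ** self.t)
        s_hat = self.s / (1 - self.beta2 ** self.t)
        return p - self.lr * m_hat / (math.sqrt(s_hat) + self.eps)


def train_hybrid(first_order="adam", steps=500, sgd_lr=0.5):
    """Hypothetical hybrid loop: Newton on u, Adam or gradient descent on v."""
    u, v = 0.0, 0.0
    adam = Adam(lr=0.1)
    for _ in range(steps):
        # One "backward pass" yields the gradients of both parameter groups.
        _, g_u, g_v = loss_and_grads(u, v)
        u = newton_step(u, g_u)  # second-order group
        if first_order == "adam":
            v = adam.step(v, g_v)  # first-order group: Adam
        else:
            v = v - sgd_lr * g_v  # first-order group: full-batch gradient descent
    return u, v, loss_and_grads(u, v)[0]


if __name__ == "__main__":
    for method in ("adam", "sgd"):
        u, v, loss = train_hybrid(method)
        print(f"{method}: u={u:.4f}, v={v:.4f}, loss={loss:.2e}")
```

Both variants should approach u = 2 and v = 1, the values used to generate the toy data. The point of the design is that a single backward pass supplies all gradients and only the update rule differs per parameter group. In a real network each group would be one layer's weights, and the second-order group would need that layer's Hessian block, or an approximation of it.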