Đỗ Thanh Nghị *

* Corresponding author (dtnghi@ctu.edu.vn)

Abstract

In this paper, we present a support vector machine (SVM) algorithm trained with stochastic gradient descent (SGD) for classifying very large datasets. To obtain sparsity in the solution, the SVM uses the hinge loss in classification tasks. However, the hinge loss is not differentiable everywhere, which makes direct optimization with SGD difficult and slows its convergence. We propose to replace the hinge loss in the SVM optimization problem with smooth approximations, improving the convergence rate of SGD. Numerical results on two large textual datasets (RCV1, Twitter) show that our approach is more efficient than the usual hinge loss.
Keywords: Support vector machines (SVM), stochastic gradient descent (SGD), classification of very large datasets
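The idea described in the abstract can be sketched in a few lines. This is a minimal illustration (not the paper's actual implementation): a linear SVM trained with Pegasos-style SGD, where the non-smooth hinge loss can be swapped for a smooth squared-hinge surrogate that has a well-defined gradient everywhere. The function name, step-size schedule, and toy data are assumptions for illustration only.

```python
import numpy as np

def sgd_linear_svm(X, y, loss="squared_hinge", lam=1e-3, epochs=20, seed=0):
    """Train a linear SVM (no bias term) with SGD.

    loss="hinge" uses a subgradient of max(0, 1 - m);
    loss="squared_hinge" uses the gradient of max(0, 1 - m)**2,
    a smooth surrogate of the hinge loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos-style step size
            margin = y[i] * (X[i] @ w)
            if loss == "hinge":
                # subgradient: nonzero only when the margin is violated
                g = -y[i] * X[i] if margin < 1 else 0.0
            else:
                # smooth squared hinge: gradient exists everywhere
                g = -2.0 * max(0.0, 1.0 - margin) * y[i] * X[i]
            # regularized SGD update
            w = (1.0 - eta * lam) * w - eta * g
    return w

# Toy linearly separable data: two Gaussian clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 1, (100, 2)), rng.normal(-2, 1, (100, 2))])
y = np.hstack([np.ones(100), -np.ones(100)])
w = sgd_linear_svm(X, y, loss="squared_hinge")
acc = np.mean(np.sign(X @ w) == y)
```

On this toy problem both losses separate the classes; the point of the smooth surrogate is that every SGD step uses a true gradient rather than a subgradient, which is what the paper credits for the faster convergence.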

Abstract (Vietnamese)

In this paper, we present the stochastic gradient descent algorithm used in support vector machines for large-scale data classification. Support vector machines use the hinge loss in classification to obtain sparsity in the solution. However, the hinge loss is non-differentiable, which slows convergence to the solution when applying stochastic gradient descent. We study replacing the hinge loss used in the optimization problem of support vector machines with smooth, differentiable approximations to improve the convergence rate of stochastic gradient descent. Experimental results on two large textual datasets (RCV1, Twitter) show the effectiveness of the proposed smooth approximations compared with the hinge loss.
Keywords: Support vector machines (SVM), stochastic gradient descent (SGD), classification of very large datasets

