Quách Luyl Đa, Phan Trọng Nghĩa, Trần Thanh Hùng, Nguyễn Chí Ngôn *

* Corresponding author (ncngon@ctu.edu.vn)

Abstract

Artificial intelligence (AI) is widely used for image classification. This study combines AI algorithms with SURF features and K-means clustering on a six-class shrimp disease dataset. To find the most appropriate model for classifying shrimp diseases from images, four models were tested: multinomial logistic regression, Naïve Bayes, K-nearest neighbors, and random forest. The models were evaluated by precision, recall, and F1. On the initial feature set, all models performed poorly; the best was random forest, with a recall of 47.7%. The study then tested random combinations of four clusters produced by the K-means algorithm, and random forest again performed best, reaching a recall of 85.9%.

Keywords: Naïve Bayes, SURF, K nearest neighbors, multinomial logistic regression, random forest, shrimp diseases
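
The three evaluation criteria named in the abstract can be illustrated with a small worked example. The sketch below computes per-class precision, recall, and F1 from true and predicted labels, then averages them across classes. The abstract does not say how the scores were averaged over the six classes, so the unweighted (macro) average used here is an assumption, and the function names, class labels, and predictions are hypothetical placeholders rather than data from the study.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for c in labels:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall, and F1 over classes (assumed averaging)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical labels for three placeholder classes (not data from the study).
y_true = ["class_a", "class_a", "class_b", "class_b", "class_c", "class_c"]
y_pred = ["class_a", "class_b", "class_b", "class_b", "class_c", "class_a"]

scores = per_class_scores(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(f"precision={macro_p:.3f} recall={macro_r:.3f} f1={macro_f1:.3f}")
```

Recall for a class is the fraction of its true images that the model recovers, so a macro recall of 47.7% would mean that, averaged over classes, fewer than half of the images of each disease were identified correctly.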
