Khai thác tập phổ biến từ dữ liệu giao dịch với nhiều ngưỡng phổ biến tối thiểu trên bộ xử lý đa nhân

Phan Thành Huấn; Lê Hoài Bắc

doi:10.22144/ctu.jsi.2017.021

Phan Thành Huấn ^* và Lê Hoài Bắc

* Tác giả liên hệ (pthuan@nomail.com)

Full Text: PDF

Ngày nhận bài: 15-09-2017

Ngày nhận bài sửa: 10-10-2017

Ngày duyệt đăng: 20-10-2017

Ngày xuất bản: 20-10-2017

Title: Mining frequent itemsets in transactional databases with multiple minimum support threshold on multiple-core processors

DOI: 10.22144/ctu.jsi.2017.021

Lượt xem

172

Downloads

67

Trích dẫn

Huấn, P. T., & Bắc, L. H. (2017). Khai thác tập phổ biến từ dữ liệu giao dịch với nhiều ngưỡng phổ biến tối thiểu trên bộ xử lý đa nhân. Tạp chí Khoa học Đại học Cần Thơ, (CĐ Công nghệ TT), 155-163. https://doi.org/10.22144/ctu.jsi.2017.021

Số báo

Số. CĐ Công nghệ TT (2017)

Chuyên mục

Công nghệ thông tin

Abstract

Association rule mining, one of the most important and well-researched techniques of data mining. Mining frequent itemsets are one of the most fundamental problems and most time-consuming in association rule mining. Most of the algorithms in literature used to find frequent itemsets satisfying single minimum support threshold. In practice, frequentcy of each item reflects the nature and role of items in transactional databases. This paper proposes an efficient mining parallel algorithm for frequent itemsets with multiple minimum support thresholds (a different minimum item support for each item) on Multiple-core Processors. Proposed algorithm easily extends on distributed computing systems as Hadoop, Spark. Finally, result experiments presented on both synthetic and real-life datasets show the better proposed algorithm than the existing algorithms.

Keywords: Association rule mining, frequent itemsets, multiple-core processor, multiple minimum support thresholds, parallel algorithm

Tóm tắt

Trong khai thác dữ liệu, kỹ thuật quan trọng và được nghiên cứu nhiều là khai thác luật kết hợp. Khai thác tập phổ biến là một trong những bước cơ bản và chiếm nhiều thời gian trong khai thác luật kết hợp. Hầu hết các thuật toán tìm tập phổ biến thỏa một ngưỡng phổ biến tối thiểu duy nhất. Trong thực tế, độ phổ biến của từng mục hàng phản ánh bản chất, vai trò của mục hàng trong các giao dịch. Trong bài viết này, chúng tôi đề xuất thuật toán song song khai thác hiệu quả tập phổ biến với nhiều ngưỡng phổ biến tối thiểu (mỗi mục hàng có một ngưỡng phổ biến tối thiểu riêng) trên bộ xử lý đa nhân. Thuật toán đề xuất dễ dàng mở rộng trên nhiều hệ thống tính toán phân tán như Hadoop, Spark. Sau cùng, chúng tôi trình bày kết quả thực nghiệm trên bộ dữ liệu thực và giả lập cho thấy thuật toán đề xuất hiệu quả hơn so với thuật toán hiện hành.

Từ khóa: Bộ xử lý đa nhân, luật kết hợp, nhiều ngưỡng phổ biến tối thiều, tập phổ biến, thuật toán song song

Tài liệu tham khảo

Agrawal, R., Imilienski, T., Swami, A., 1993. Mining association rules between sets of large databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, DC: 207-216.

Agrawal, R., Srikant, R., 1994. Fast algorithms for mining association rules. Proceedings of International Conference on Very Large Data Base, Santiago, Chile: 478-499.

Han, J., Pei, J., Yin, Y., Mao, R., 2004. Mining frequent patterns without candidate generation: A frequent pattern tree approach. Data Mining and Knowledge Discovery, 8(1): 53–87.

Hu, Y.H., Chen, Y.L., 2006. Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism. Decision Support Systems, 42(1): 1–24.

Kiran, R. U., Reddy, P. K., 2011. Novel techniques to reduce search space in multiple minimum supports based frequent pattern mining algorithms. In EDBT: 11–20.

Liu, B., Hsu, W., Ma, Y., 1999. Mining association rules with multiple minimum supports. Proceedings of the ﬁfth ACM SIGKDD International Conference on Knowledge discovery and Data mining:337–341.

Song, W., Yang, B., 2008. Index-BitTableFI: An improved algorithm for mining frequent itemsets. Knowledge-Based Systems 21: 507-513.

Article Sidebar

Abstract

Tóm tắt

Article Details

Tài liệu tham khảo