Outlier Detection of Transaction Data Using DBSCAN Algorithm

Sunjana; Azizah Zakiah

doi:10.61841/fvzwt261

Authors

Sunjana, Computer Science, Faculty of Engineering, Widyatama University, Jln. Cikutra 20124 A, Bandung 40125, Indonesia
Azizah Zakiah, Computer Science, Faculty of Engineering, Widyatama University, Jln. Cikutra 20124 A, Bandung 40125, Indonesia

Keywords:

Data Mining, Outlier Detection, Euclidean Distance, Clustering, DBSCAN

Abstract

Supermarkets are one channel through which companies market their products, and they stock a wide range of product types from different producers. Consumers often prefer supermarkets to traditional markets because of promotions, for example products offered at half the normal price. During promotions consumers tend to buy more than usual, so supermarket stock can drop sharply. The supermarket therefore has to anticipate demand to avoid a stock shortage in the warehouse. Several data mining techniques can support this, one of which is outlier detection. Outlier detection is needed to separate abnormal transactions (candidate anomalies) from normal transactions, which helps the supermarket anticipate items running out of stock. Outlier detection is the process of searching a dataset for outliers and is one of the first steps toward a coherent analysis of the data. Its main objective is to detect data whose properties differ from those of most other data, that is, the anomalies found in multidimensional datasets. One robust algorithm for detecting outliers is DBSCAN. This study therefore applies DBSCAN-based outlier detection, with the expectation that it will help supermarkets anticipate running out of stock. When all 1862 products were processed, no product was classified as an outlier, whereas when only the first 100 products were processed, 4 products were classified as outliers: those with IDs 80069449, 80015728, 82024920, and 80021527.
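To make the idea concrete, the following is a minimal sketch of how DBSCAN with Euclidean distance can flag outliers as noise points. It is not the authors' implementation: the toy two-dimensional points, the parameter values eps = 0.5 and min_pts = 3, and all function names are assumptions chosen for illustration only, since the abstract does not state the features or parameters used in the study.

```python
import math

NOISE = -1        # label for points that belong to no cluster (outliers)
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i] (i included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point, keep expanding
    return labels


# Hypothetical toy data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.5),
]
labels = dbscan(points, eps=0.5, min_pts=3)
outliers = [i for i, label in enumerate(labels) if label == NOISE]
print(outliers)  # indices of points labelled as noise
```

In this sketch a point is reported as an outlier only if it is neither a core point nor within eps of one. Because that label depends on how many neighbours fall inside the eps radius, the same point can be noise in a small sample and part of a cluster in a larger one.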
