Performance Analysis of Duplicate Record Detection Techniques
This paper presents a comprehensive performance analysis of duplicate record detection techniques for relational databases. The research focuses on traditional SQL-based and modern Bloom filter techniques for finding and eliminating records that already exist in the database during bulk insertion operations, such as the loading phase of the Extract, Transform, and Load (ETL) process and data synchronization across multisite databases. The analysis was performed on several data sizes using SQL, a Bloom filter, and a parallel Bloom filter. The results show that the parallel Bloom filter is highly suitable for duplicate detection in the database.
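To make the Bloom filter approach concrete, the following is a minimal sketch of how incoming records might be screened before a bulk insert. It is not the implementation evaluated in the paper: the class `BloomFilter`, the helper `split_batch`, the double-hashing scheme built on SHA-256, and the default false-positive rate of 0.01 are all assumptions chosen for illustration. The sketch relies on the one property every Bloom filter guarantees: a key that was added is never reported as absent, while a key that was not added may occasionally be reported as present.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical helper, not from the paper)."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Derive k bit positions from one SHA-256 digest via double hashing
        # (an assumed scheme; any family of independent hashes would do).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely not added"; True means "possibly added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def split_batch(existing_keys, incoming_keys, false_positive_rate=0.01):
    """Split incoming keys into those certainly new and those needing a real lookup."""
    existing = list(existing_keys)
    bloom = BloomFilter(len(existing), false_positive_rate)
    for key in existing:
        bloom.add(key)

    definitely_new, needs_check = [], []
    for key in incoming_keys:
        if bloom.might_contain(key):
            needs_check.append(key)     # possible duplicate: confirm against the database
        else:
            definitely_new.append(key)  # cannot already exist: safe to insert
    return definitely_new, needs_check


if __name__ == "__main__":
    new, check = split_batch(["a", "b", "c"], ["a", "d", "b"])
    print("insert directly:", new)
    print("verify first:", check)
```

In this sketch only the keys in `needs_check` would require a lookup against the database, which is where a saving over a purely SQL-based check could come from. Duplicates within the incoming batch itself are not handled here.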