A Modified Approach of OPTICS Algorithm for Data Streams
Abstract
Data are continuously generated by a wide variety of applications, in large volume and size. They are fast changing and temporally ordered, and thus data mining has become a field of major interest. Clustering is a mining technique used to process data streams by grouping similar objects together. Outliers encountered in this process are noisy data points that show abnormal behavior compared to normal data points. To obtain high-quality clusters, outliers should be efficiently discovered and discarded. In this paper, pruning is applied to the stream OPTICS algorithm along with the identification of real outliers, which reduces memory consumption and increases the speed of identifying potential clusters.
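The abstract does not spell out the pruning rule, so the following is only a minimal sketch of the ingredients it names, not the paper's method. `core_distance` and `reachability_distance` follow the standard OPTICS definitions (with the assumption that a point counts as its own neighbor), while `prune`, the `weight` field, and `weight_threshold` are hypothetical names chosen for illustration of threshold-based pruning of sparse summaries.

```python
import math


def core_distance(points, i, eps, min_pts):
    """Standard OPTICS core distance of points[i].

    Returns the distance to the min_pts-th nearest neighbor within eps
    (the point itself is counted, an assumed convention), or None when
    points[i] is not a core point.
    """
    within = sorted(
        d for d in (math.dist(points[i], q) for q in points) if d <= eps
    )
    if len(within) < min_pts:
        return None
    return within[min_pts - 1]


def reachability_distance(points, i, j, eps, min_pts):
    """Reachability of points[j] with respect to points[i].

    Defined as max(core_distance(i), dist(i, j)); None if points[i]
    is not a core point.
    """
    cd = core_distance(points, i, eps, min_pts)
    if cd is None:
        return None
    return max(cd, math.dist(points[i], points[j]))


def prune(micro_clusters, weight_threshold):
    """Hypothetical pruning step (an assumption, not the paper's rule).

    Splits cluster summaries into those whose weight reaches the
    threshold (kept) and those below it (outlier candidates that can be
    dropped to free memory).
    """
    kept = [mc for mc in micro_clusters if mc["weight"] >= weight_threshold]
    pruned = [mc for mc in micro_clusters if mc["weight"] < weight_threshold]
    return kept, pruned


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
cd_dense = core_distance(points, 0, eps=2.0, min_pts=3)    # dense region
cd_sparse = core_distance(points, 3, eps=2.0, min_pts=3)   # isolated point
reach = reachability_distance(points, 0, 1, eps=2.0, min_pts=3)

summaries = [{"id": 1, "weight": 5.0}, {"id": 2, "weight": 0.4}]
kept, pruned = prune(summaries, weight_threshold=1.0)
```

In this toy input the isolated point `(10.0, 10.0)` has no core distance, and the low-weight summary is the one that gets pruned; how the threshold is chosen in practice is left open here.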
Keywords:
two phase, cluster quality, clustering technique, pruning, time and space complexity, threshold value
License
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.