Analysis on pre-processing of heterogeneous dataset for ensemble clustering

Darsana Prakash, S. Saranya and R. Abitha

Abstract

Cluster analysis is an unsupervised learning technique that reveals underlying structure in data and organizes instances into clusters based on their similarity. The approaches considered cover both hard and soft clustering. They draw on the concept of partial membership of an instance in the clusters (for soft clustering) and on a distance measure within each cluster. The clustering algorithms analyzed include Fuzzy $c$-means (FCM), $K$-Means and $K$-Medoids. All of these algorithms have been applied successfully in agriculture, medicine, education, finance and business. Pre-processing is one of the key components of the clustering framework. The main objective is to preprocess a heterogeneous dataset for different clustering algorithms and to analyze their time complexity. In this project, a heterogeneous dataset containing missing values is obtained from the UCI repository and used for preprocessing. Data pre-processing techniques are applied to the target dataset to fill in missing values and to reduce the number of attributes, which increases the effectiveness of the algorithms. The key idea of this project is to preprocess the heterogeneous dataset and apply different clustering algorithms in order to obtain the best clustering result in terms of time complexity. Finally, the resulting clusters are validated using silhouette plots, and the time complexity is also analyzed.
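The pipeline described above (fill missing values, prepare mixed-type attributes, cluster, then validate with silhouette scores and timing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. The toy records and column layout are hypothetical. Mean imputation for numeric attributes, mode imputation for categorical ones, min-max scaling and one-hot encoding are assumed choices, since the abstract does not say which techniques were used. Attribute reduction is omitted. Only $K$-Means is shown, with a deterministic farthest-point initialisation. The silhouette coefficient is computed as a number rather than plotted, and wall-clock time is only an empirical stand-in for a time-complexity analysis.

```python
import math
import statistics
import time

# Hypothetical toy records (age, income, category); None marks a missing value.
RECORDS = [
    [25.0, 40.0, "a"], [27.0, None, "a"], [None, 42.0, "a"],
    [30.0, 45.0, "a"], [28.0, 41.0, None],
    [60.0, 90.0, "b"], [62.0, 95.0, "b"], [65.0, 88.0, "b"],
]
NUMERIC_COLS = [0, 1]      # assumed layout: numeric attributes
CATEGORICAL_COLS = [2]     # assumed layout: categorical attributes


def impute(rows):
    """Fill numeric gaps with the column mean, categorical gaps with the mode (assumed choice)."""
    filled = [list(r) for r in rows]
    for c in NUMERIC_COLS + CATEGORICAL_COLS:
        observed = [r[c] for r in rows if r[c] is not None]
        fill = statistics.mean(observed) if c in NUMERIC_COLS else statistics.mode(observed)
        for r in filled:
            if r[c] is None:
                r[c] = fill
    return filled


def encode(rows):
    """Min-max scale numeric columns and one-hot encode categorical ones."""
    vectors = [[] for _ in rows]
    for c in NUMERIC_COLS:
        lo = min(r[c] for r in rows)
        hi = max(r[c] for r in rows)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        for v, r in zip(vectors, rows):
            v.append((r[c] - lo) / span)
    for c in CATEGORICAL_COLS:
        levels = sorted({r[c] for r in rows})
        for v, r in zip(vectors, rows):
            v.extend(1.0 if r[c] == lvl else 0.0 for lvl in levels)
    return vectors


def kmeans(points, k, iters=100):
    """Lloyd's K-Means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Recompute the centroid as the member mean; keep the old one if the cluster is empty.
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[j])
        if new == centroids:  # converged: centroids no longer move
            break
        centroids = new
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b); needs at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    start = time.perf_counter()
    X = encode(impute(RECORDS))
    labels = kmeans(X, k=2)
    elapsed = time.perf_counter() - start
    print("labels:", labels)
    print(f"mean silhouette: {silhouette(X, labels):.3f}, time: {elapsed * 1e3:.2f} ms")
```

On this toy data, the five "a" records and the three "b" records should fall into separate clusters. Swapping `kmeans` for a $K$-Medoids or FCM routine while keeping the same `impute`/`encode` front end would allow the per-algorithm comparison the abstract describes, with the timing measured around each call.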
