
Thursday, 22 February 2018

Clustering with Multiviewpoint-Based Similarity Measure (2012)

Abstract:
All clustering methods have to assume some cluster relationship among the data objects that they are applied on. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper, we introduce a novel multiviewpoint-based similarity measure and two related clustering methods. The major difference between a traditional dissimilarity/similarity measure and ours is that the former uses only a single viewpoint, which is the origin, while the latter utilizes many different viewpoints, which are objects assumed to not be in the same cluster with the two objects being measured. Using multiple viewpoints, more informative assessment of similarity could be achieved. Theoretical analysis and empirical study are conducted to support this claim. Two criterion functions for document clustering are proposed based on this new measure. We compare them with several well-known clustering algorithms that use other popular similarity measures on various document collections to verify the advantages of our proposal.
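The abstract does not spell out the formula, so the following is only a minimal sketch of the idea. It assumes that documents are dense vectors of equal length, and that the similarity of two documents seen from one viewpoint is the dot product of their difference vectors from that viewpoint, averaged over all viewpoints. The class and method names (MultiViewpointSimilarity, viewFrom, similarity) are illustrative and not taken from the paper.

```java
// Sketch of a multiviewpoint-based similarity between two documents.
// Assumption: documents are dense double[] vectors of equal length.
final class MultiViewpointSimilarity {

    // Similarity of a and b as seen from viewpoint h: (a - h) . (b - h).
    static double viewFrom(double[] a, double[] b, double[] h) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            sum += (a[k] - h[k]) * (b[k] - h[k]);
        }
        return sum;
    }

    // Average over all viewpoints. The caller is expected to pass only
    // objects assumed to lie outside the cluster that contains a and b.
    static double similarity(double[] a, double[] b, double[][] viewpoints) {
        if (viewpoints.length == 0) {
            throw new IllegalArgumentException("at least one viewpoint is required");
        }
        double total = 0.0;
        for (int i = 0; i < viewpoints.length; i++) {
            total += viewFrom(a, b, viewpoints[i]);
        }
        return total / viewpoints.length;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.0};
        double[] b = {0.0, 1.0};
        // With the origin as the only viewpoint this is the plain dot product.
        System.out.println(similarity(a, b, new double[][] {{0.0, 0.0}}));               // prints 0.0
        // Two viewpoints outside the cluster give a different assessment.
        System.out.println(similarity(a, b, new double[][] {{1.0, 1.0}, {-1.0, -1.0}})); // prints 2.0
    }
}
```

With the origin as the single viewpoint, the sketch reduces to the ordinary dot product, which equals cosine similarity for unit-length vectors. This corresponds to the single-viewpoint case that the abstract contrasts with the new measure.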
Existing System:
A common approach to the clustering problem is to treat it as an optimization process. An optimal partition is found by optimizing a particular function of similarity (or distance) among data. Basically, there is an implicit assumption that the true intrinsic structure of the data can be correctly described by the similarity formula defined and embedded in the clustering criterion function. Hence, the effectiveness of clustering algorithms under this approach depends on the appropriateness of the similarity measure to the data at hand. For instance, the original k-means has a sum-of-squared-error objective function that uses Euclidean distance. In a very sparse and high-dimensional domain like text documents, spherical k-means, which uses cosine similarity (CS) instead of Euclidean distance as the measure, is deemed to be more suitable.
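To make the contrast between the two measures concrete, here is a small sketch that computes both for a pair of term-frequency vectors. The class name VectorMeasures and the example vectors are made up for illustration. Returning 0.0 for a zero-length vector is an assumption of this sketch, not something the source specifies.

```java
// Sketch: squared Euclidean distance versus cosine similarity.
// Assumption: documents are dense double[] vectors of equal length.
final class VectorMeasures {

    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            sum += a[k] * b[k];
        }
        return sum;
    }

    // Cosine of the angle between a and b; independent of vector length.
    static double cosine(double[] a, double[] b) {
        double norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        if (norms == 0.0) {
            return 0.0; // assumption: an empty document is similar to nothing
        }
        return dot(a, b) / norms;
    }

    // Squared Euclidean distance, the quantity summed in the k-means objective.
    static double squaredEuclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return sum;
    }

    public static void main(String[] args) {
        // b has the same term proportions as a but is twice as long.
        double[] a = {1.0, 2.0, 0.0};
        double[] b = {2.0, 4.0, 0.0};
        System.out.println("cosine similarity: " + cosine(a, b));
        System.out.println("squared Euclidean distance: " + squaredEuclidean(a, b));
    }
}
```

In this example the two vectors point in the same direction, so their cosine similarity is 1 up to rounding, while their squared Euclidean distance is 5. This illustrates why a length-independent measure can suit text documents of differing lengths.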
Proposed System:
The work in this paper is motivated by investigations from the above and similar research findings. It appears to us that the nature of the similarity measure plays a very important role in the success or failure of a clustering method. Our first objective is to derive a novel method for measuring similarity between data objects in sparse and high-dimensional domains, particularly text documents. From the proposed similarity measure, we then formulate new clustering criterion functions and introduce their respective clustering algorithms, which are fast and scalable like k-means, but are also capable of providing high-quality and consistent performance.
Modules:
  • Select File
The HTML root file is selected from the list of files displayed in the window.
  • Process
Processing the root file yields the child files that are linked to it.
  • Histogram
The histogram displays the number of documents in each similarity range between 0 and 1.
  • Clusters
Clusters are formed by considering the similarity of the documents.
  • Similarity
Similarity is calculated between the keyword tags of two files.
  • Result
The result is displayed as a bar chart whose axis shows the file-to-file similarity.
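The counting behind the Histogram module can be sketched as follows. The source does not say how many ranges the histogram uses or how boundary values are handled, so the bin count is a parameter here and the class name SimilarityHistogram is hypothetical. Placing a similarity of exactly 1.0 in the last range is an assumption of this sketch.

```java
// Sketch: count similarity values per equal-width range over [0, 1].
final class SimilarityHistogram {

    static int[] bin(double[] similarities, int bins) {
        if (bins <= 0) {
            throw new IllegalArgumentException("bins must be positive");
        }
        int[] counts = new int[bins];
        for (int i = 0; i < similarities.length; i++) {
            double s = similarities[i];
            // Written this way so that NaN is rejected as well.
            if (!(s >= 0.0 && s <= 1.0)) {
                throw new IllegalArgumentException("similarity out of range: " + s);
            }
            int index = (int) (s * bins);
            if (index == bins) {
                index = bins - 1; // assumption: 1.0 belongs to the last range
            }
            counts[index]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] similarities = {0.05, 0.12, 0.5, 0.55, 1.0};
        int[] counts = bin(similarities, 10);
        for (int i = 0; i < counts.length; i++) {
            System.out.println("range " + i + ": " + counts[i] + " document(s)");
        }
    }
}
```

The resulting counts could then be handed to a charting library for display. That step is left out here because it depends on the charting API in use.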
Software Requirement Specification
Software Specification
Operating System: Windows XP
Technology: Java 1.6, JFreeChart
Hardware Specification
Processor: Pentium IV
RAM: 512 MB
Hard Disk: 80 GB
