
NANDA · 10-04-2017, 07:46 PM

Abstract

Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering, FTC which creates flat clusterings and HFTC for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.
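
As a rough illustration of the idea in the abstract, here is a minimal Python sketch of a greedy, flat clustering built from frequent term sets. It is not the authors' FTC implementation. The function names, the `min_support` and `max_size` parameters, the brute-force enumeration (standing in for a proper association rule mining algorithm such as Apriori) and the entropy-style overlap score are all assumptions made for this example.

```python
from itertools import combinations
from math import log


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to the indices of the documents supporting it.

    Brute force over all term combinations up to max_size; fine for a toy
    corpus, far too slow for real data (use Apriori or FP-growth there).
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            cover = {i for i, d in enumerate(docs) if set(terms) <= d}
            if cover and len(cover) / n >= min_support:
                result[frozenset(terms)] = cover
    return result


def entropy_overlap(cover, candidates):
    """Assumed overlap score: sum of -(1/f) * ln(1/f) over the documents in
    `cover`, where f is the number of candidates supporting that document.
    A document supported by only one candidate contributes 0."""
    total = 0.0
    for doc in cover:
        f = sum(1 for c in candidates.values() if doc in c)
        total += -(1.0 / f) * log(1.0 / f)
    return total


def greedy_flat_clustering(docs, min_support, max_size=2):
    """Repeatedly pick the candidate term set with the least overlap, make its
    supporting documents a cluster, and remove them from the other candidates."""
    candidates = frequent_term_sets(docs, min_support, max_size)
    clusters = []
    while candidates:
        best = min(candidates,
                   key=lambda t: entropy_overlap(candidates[t], candidates))
        members = candidates.pop(best)
        clusters.append((best, members))
        # Drop the clustered documents from every remaining candidate and
        # discard candidates that no longer support any document.
        candidates = {t: c - members for t, c in candidates.items()
                      if c - members}
    return clusters


if __name__ == "__main__":
    # Toy corpus: each document is already reduced to a set of terms.
    toy_docs = [
        {"apple", "banana"},
        {"apple", "banana", "cherry"},
        {"cat", "dog"},
        {"cat", "dog", "fish"},
    ]
    for terms, members in greedy_flat_clustering(toy_docs, min_support=0.5):
        print(sorted(terms), sorted(members))
```

Each cluster is labelled by the term set that produced it, which is what makes the cluster description easy to read. Documents that are not supported by any frequent term set stay unclustered in this sketch, and ties between equally good candidates are broken by enumeration order.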

chintu.kay1986 · 10-04-2017, 07:46 PM

I would like the frequent term-based text clustering code so that I can run it on my data and compare the results with my own method.