jacqueskoshy · 08-16-2017, 10:14 PM

ABSTRACT

The project, titled "Mining Frequent Patterns from Large Databases Using Count Distribution Algorithm", mines frequent items from transactional databases. It lets the analyzer identify which items occur or are purchased frequently in large amounts of data, and it is designed to simplify the process of finding them. The system provides a text field in which the analyzer enters a minimum support value; the system then calculates the frequent items, that is, the items whose support meets that threshold.

The process of finding frequent item sets begins with developing an understanding of the application domain, the relevant prior knowledge, and the goals of the end user. The next steps are selecting the target data set on which discovery is to be performed, cleaning and transforming this data if necessary, and choosing the data-mining task, the algorithm, and the models and parameters that may be appropriate. Before running the algorithm, we first preprocess the data.

We implemented the Count Distribution algorithm to find frequent patterns in transactional databases, and the Partition algorithm to partition the database. Count Distribution is a simple parallelization of Apriori. All processors generate the entire candidate hash tree from the frequent item sets of the previous pass. Each processor can thus independently compute partial supports of the candidates from its local database partition. Next, the algorithm performs a sum reduction, exchanging local counts with all other processors, to obtain the global counts.

The Partition algorithm logically divides the horizontal database into non-overlapping partitions. Each partition is read, and vertical transaction lists (lists of all transactions in which the item appears) are formed for each item. Partition then generates all locally frequent item sets through transaction-list intersections.

The remaining steps are performing the actual data mining to extract patterns and models, and then visualizing, interpreting, and consolidating the discovered knowledge. Finally, after calculating the frequent item sets from the transactional databases, the system can, if required, calculate the frequent items from transactions based on time intervals.
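The two algorithms described above can be illustrated with a small Python sketch. It is not the project's implementation, only a minimal single-process illustration under several assumptions: the "processors" are simulated as entries of a list of partitions, the minimum support is an absolute transaction count, candidates are kept in a plain set rather than a hash tree, and all function names (apriori_gen, local_counts, count_distribution, tid_lists, support_by_intersection) and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Build k-item candidates whose (k-1)-item subsets are all frequent."""
    items = sorted({item for itemset in frequent_prev for item in itemset})
    candidates = set()
    for combo in combinations(items, k):
        if all(frozenset(sub) in frequent_prev
               for sub in combinations(combo, k - 1)):
            candidates.add(frozenset(combo))
    return candidates


def local_counts(partition, candidates):
    """Partial supports of the candidates on one local partition."""
    counts = Counter()
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:
                counts[cand] += 1
    return counts


def count_distribution(partitions, min_support):
    """Simulated Count Distribution: local counting, then a sum reduction.

    min_support is assumed to be an absolute number of transactions.
    """
    # First pass: every single item is a candidate.
    candidates = {frozenset([item])
                  for part in partitions for t in part for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Each simulated processor counts on its own partition only.
        partials = [local_counts(part, candidates) for part in partitions]
        # Sum reduction: add the local counts to get the global counts.
        global_counts = sum(partials, Counter())
        level = {c: n for c, n in global_counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Every processor would build the same candidates for the next pass.
        candidates = apriori_gen(set(level), k)
    return frequent


def tid_lists(partition):
    """Vertical layout used by Partition: item -> set of transaction ids."""
    lists = {}
    for tid, transaction in enumerate(partition):
        for item in transaction:
            lists.setdefault(item, set()).add(tid)
    return lists


def support_by_intersection(lists, itemset):
    """Local support of an item set via intersection of its tid-lists."""
    tids = [lists.get(item, set()) for item in itemset]
    return len(set.intersection(*tids)) if tids else 0


# Hypothetical sample data: two partitions of two transactions each.
sample_partitions = [
    [{"bread", "milk"}, {"bread", "butter"}],
    [{"bread", "milk", "butter"}, {"milk"}],
]
for itemset, count in count_distribution(sample_partitions, 2).items():
    print(sorted(itemset), count)
```

In a real parallel setting the sum reduction would be a collective operation between processes rather than a local sum over a list, and the tid-list intersections would be run separately inside each partition before the locally frequent item sets are merged and verified globally.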