10-06-2017, 07:08 AM
FPGA implementations of data mining algorithms
Abstract
In recent decades there has been exponential growth in the quantity of collected data. Various data mining procedures have been developed to extract information from such large amounts of data. Handling an ever-increasing amount of data creates a growing demand for computing power. There are several ways of meeting this demand, such as multiprocessor systems and graphics processing units (GPUs). Another is the use of field-programmable gate array (FPGA) devices as hardware accelerators. This paper surveys the application of FPGAs as hardware accelerators for data mining. Three data mining algorithms were selected for the survey: classification and regression trees, support vector machines, and k-means clustering. A literature review and analysis of FPGA implementations was conducted for each of the three algorithms. From this analysis, conclusions were drawn on methods of implementation, common problems and limitations, and means of overcoming them.
INTRODUCTION
Thanks to the development of computer systems and their applications, the last several decades have been marked by exponential growth of collected data in all areas of human activity. To deal with this continuous and increasing influx of data, it was necessary to develop computational methods for extracting information and discovering knowledge. The computational process of non-trivial information extraction is called data mining. Data mining makes use of methods from closely related fields such as statistics, artificial intelligence, machine learning, pattern recognition, and databases. With most data mining methods, the quantity of data directly affects the computational load. High computational loads occur because many problems involve large quantities of data and require complex computations in high-dimensional spaces. The issue of computational load is significant: the quantity of collected data increases continually, so the available computing power must increase to keep up with it.
SELECTED ALGORITHMS
A. Classification and Regression Trees
Classification and regression trees (CART) is a learning algorithm for decision trees and regression trees. In a decision tree, the output is a prediction of the class to which the data item belongs. In a regression tree, the output is a real number, and the tree represents an approximation of the function that maps input data to the predicted outcome. Another well-known decision tree learning algorithm is C4.5. Decision trees are easy to interpret, can readily be converted into a set of if-then rules, and can work with incomplete data.
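As an illustration of how a trained tree produces a prediction and how it maps onto if-then rules, the following is a minimal Python sketch. It is not taken from the surveyed implementations: the `Node` layout, the `predict` and `to_rules` helpers, and the example tree with its thresholds and class labels are all hypothetical, and tree training (the CART algorithm itself) is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    """Hypothetical binary tree node; a leaf has `value` set and no children."""
    feature: Optional[int] = None       # index of the feature tested here
    threshold: Optional[float] = None   # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Union[str, float, None] = None  # class label or real number (leaf)


def predict(node, x):
    """Walk from the root to a leaf and return the leaf's value."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def to_rules(node, conditions=()):
    """Flatten the tree into one if-then rule per leaf."""
    if node.value is not None:
        premise = " and ".join(conditions) or "true"
        return [f"if {premise} then {node.value}"]
    le = f"x[{node.feature}] <= {node.threshold}"
    gt = f"x[{node.feature}] > {node.threshold}"
    return (to_rules(node.left, conditions + (le,))
            + to_rules(node.right, conditions + (gt,)))


# Made-up classification tree with three leaves (illustrative values only).
tree = Node(feature=0, threshold=2.5,
            left=Node(value="A"),
            right=Node(feature=1, threshold=1.0,
                       left=Node(value="B"),
                       right=Node(value="C")))

for rule in to_rules(tree):
    print(rule)
```

A regression tree would use the same structure with real numbers stored in the leaves instead of class labels.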
K-means clustering
FPGA implementations of k-means clustering are primarily focused on image and video processing applications. Research concentrates for the most part on computation of the distance metric, which is the most computationally intensive part of the algorithm. Estlick et al. [13] considered two modifications of the algorithm that could increase the available parallelism: alternative distance metrics, and a smaller word width for the input vectors. Alternative metrics were evaluated experimentally, and the Manhattan distance $d(x, c) = \sum_i |x_i - c_i|$ between an input vector $x$ and a cluster center $c$ was found to be the most suitable with respect to clustering quality and complexity of implementation. It was also demonstrated experimentally that the word width of the input data can be decreased significantly without adversely affecting clustering quality.
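The two modifications described above can be sketched in software as follows. This is a toy Python illustration under stated assumptions, not the hardware design of Estlick et al.: the helper names (`manhattan`, `truncate`, `assign`), the 8-bit and 4-bit widths, and the sample data are all hypothetical. It shows only the assignment step of k-means, using the Manhattan distance, once on full-width inputs and once on inputs truncated to fewer bits.

```python
def manhattan(x, c):
    """Manhattan distance: sum of absolute per-component differences."""
    return sum(abs(a - b) for a, b in zip(x, c))


def truncate(vec, in_bits=8, out_bits=4):
    """Reduce word width by dropping the low-order bits of each component."""
    shift = in_bits - out_bits
    return [v >> shift for v in vec]


def assign(points, centers):
    """k-means assignment step: index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda k: manhattan(p, centers[k]))
            for p in points]


# Made-up 8-bit, three-component input vectors and two cluster centers.
points = [[10, 12, 250], [240, 8, 16], [12, 20, 245]]
centers = [[0, 0, 255], [255, 0, 0]]

full = assign(points, centers)
reduced = assign([truncate(p) for p in points],
                 [truncate(c) for c in centers])

print(full, reduced)
```

The Manhattan distance needs only subtractions, absolute values, and additions, with no multipliers, and narrower words shrink each of those operators; both effects leave room for more parallel distance units on a device. For this particular sample data the full-width and truncated assignments agree, which is not guaranteed in general.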
CONCLUSION
The FPGA platform can be used as an accelerator in data mining processes. It has great potential for use in data mining applications and in computing in general, especially in embedded systems. The flexibility and programmability of FPGAs enable the implementation of a computer architecture tailored to each specific task. From this literature survey we conclude that the FPGA platform performs well as a hardware accelerator.