Collaborative filtering based Recommender system full report with code
#1



[attachment=8361]

Introduction

Recommender systems apply data analysis techniques to the problem of helping users find the items they would like to purchase at e-commerce sites by producing a predicted likeliness score or a list of top-N recommended items for a given user. Item recommendations can be made using different methods. Recommendations can be based on demographics of the users, overall top-selling items, or past buying habits of users as a predictor of future items. Collaborative Filtering (CF) is the most successful recommendation technique to date. The basic idea of CF-based algorithms is to provide item recommendations or predictions based on the opinions of other like-minded users. The opinions of users can be obtained explicitly from the users or by using some implicit measures.

Problem definition
In this project we demonstrate that an association rule based recommender system provides an algorithmic approach that shows improved performance compared to the standard memory-based k-nearest neighbor (k-nn) approach. Furthermore, our results show that Apriori can achieve better recommendation accuracy than k-nn.

Conclusion
In this chapter, a general overview of the problem was given, as well as what motivated us to pursue this problem. The following chapters give the reader insight into the work we have completed over the course of this semester. Chapter 2 covers a summary of the theoretical description of the project as well as the important terms and definitions relevant to the project. Chapter 3 covers the specifications of the design modules and describes the various functions of which each module is composed. Chapter 4 describes how we implemented the various modules in this project. Finally, in Chapter 5 we give a summary of our project work.

Theory

In this chapter we give a theoretical description of the various definitions, the terminology employed, and the algorithms used in our project, to allow the reader to better understand our work.

Recommender System:
Recommender systems form a specific type of information filtering (IF) technique that attempts to present information items (movies, music, books, news, images, web pages, etc.) that are likely to be of interest to the user. Typically, a recommender system compares the user's profile to some reference characteristics, and seeks to predict the rating that a user would give to an item that they have not yet considered. These characteristics may come from the information items themselves (the content-based approach) or from the user's social environment (the collaborative filtering approach).

Benefits of recommendation systems:
Based on real activity
The biggest benefit of recommendation systems is that they record, and then base their recommendations on actual user behavior. Their recommendations are not based on guesswork, but on an objective reality. This is the holy grail of design: watching people in their natural environment and making design decisions based directly on the results. Recommendation systems are not perfect, but because they predict the future based on the past, they are remarkably good.
Great for discovery
Sometimes it can feel like we're the victims of horribly inefficient advertising. This is apparent when we go to the movies and none of the previews are interesting, or when none of the music on the radio is aligned to our tastes. Recommendation systems help alleviate this problem because they allow us to discover things that are similar to what we already like. And, as in the Beatles/Radiohead example, they can make some pretty surprising recommendations that we probably wouldn't have found out about otherwise.
Personalization
We often take recommendations from friends and family because we trust their opinion. Part of that trust is built on our relationship: they know us better than anyone else so they know what we like and what we don't like. They're good at recommending things for that very reason. This is what recommendation systems try to model: the intimate knowledge of what we like and don't like.
Always up to date
Recommendation systems are dynamically updated, and therefore are always up-to-date. Some new, interesting news? It will be on Digg within minutes. An interesting new product on Amazon? It quickly gets recommended as long as people rate it highly. The ability for a recommendation system to bubble up activity in real time is a huge advantage because the system is always on.
Reduced organizational maintenance
Building recommendation systems is quite different from how we've built information-rich web sites in the past. For many designers the primary task of building an information-rich web site is creating navigation systems built on top of an underlying taxonomy. The taxonomy is built out of the designer's knowledge of users and the domain, generated from observations made during field research, insights from persona creation, or knowledge gained from other design techniques. Most of the organizational maintenance of a site is keeping the navigation system and taxonomy in line with the users' changing needs.

Drawbacks of recommendation systems:
Difficult to setup
Recommendation systems are intensive, database-driven applications that are not trivial to create and get running. It takes a serious development project to do so. Moving to a recommendation system takes time, energy, and a long-term commitment.
Maintenance shifted elsewhere
Even though recommendation systems can reduce organizational maintenance, they don't get rid of maintenance itself. For example, keeping the system up and running becomes a major task, as it is with any significant database-backed system.
Sometimes they are wrong
Recommendation systems aren't just a technological challenge. They're also a social one. Sometimes people are unhappy with recommendations. Sometimes recommendations are wrong.
Gaming
Gaming is a big issue, especially on news recommendation sites. Digg, the most popular site of this kind, is continually battling attempts by users to game the system. When a news story is popular on Digg, it gets promoted to the home page as a recommendation for everyone to click on and read. When this happens the clicking and reading can quickly get into the thousands. (Many web sites have actually been brought down by the resulting traffic.) This huge increase in attention is attractive to those people willing to game the system for their own personal benefit. Digg is constantly moderating its members to keep this at a minimum.

Collaborative Filtering:
Collaborative filtering (CF) is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc. Applications of collaborative filtering typically involve very large data sets. Collaborative filtering methods have been applied to many different kinds of data, including sensing and monitoring data (such as mineral exploration or environmental sensing over large areas with multiple sensors), financial data (such as financial service institutions that integrate many financial sources), and user data in electronic commerce and web applications.
The underlying assumption of the collaborative filtering approach is that those who agreed in the past tend to agree again in the future. The resulting predictions are specific to the user, but use information gleaned from many users. There are two classes of collaborative filtering techniques:
Memory based collaborative filtering
Memory-based algorithms utilize the entire user-item database to generate a prediction. These systems employ statistical techniques to find a set of users, known as neighbors, who have a history of agreeing with the target user (i.e., they either rate different items similarly or they tend to buy similar sets of items). Once a neighborhood of users is formed, these systems use different algorithms to combine the preferences of neighbors to produce a prediction or a top-N recommendation for the active user. These techniques, also known as nearest-neighbor or user-based collaborative filtering, are the more popular and widely used in practice.
Model based collaborative filtering
Model-based collaborative filtering algorithms provide item recommendations by first developing a model of user ratings. Algorithms in this category take a probabilistic approach and envision the collaborative filtering process as computing the expected value of a user prediction, given his or her ratings on other items. The model building process is performed by different machine learning algorithms such as Bayesian networks, clustering, and rule-based approaches.

Association Rule Mining:
Association rule mining is a common technique for performing market basket analysis. The intent is to gain insight into customers' buying habits and discover groups of products that are commonly purchased together. For example, an association rule may show that 98% of the customers that purchase frozen pizza also purchase soda. Association rules capture relationships among items based on patterns of co-occurrence across transactions. We have adapted this approach to the context of collaborative filtering. Considering each user profile as a transaction, it is possible to use the Apriori algorithm and generate association rules for groups of commonly liked items.
Given a set of user profiles U and a set of item sets I = {I1, I2, . . . , Ik}, the support of an item set Ii ∈ I is defined as S(Ii) = |{u ∈ U : Ii ⊆ u}| / |U|. Item sets that satisfy a minimum support threshold are used to generate association rules; these groups of items are referred to as frequent item sets. An association rule r is an expression of the form X => Y (Sr, Ar), where X and Y are item sets, Sr is the support of X ∪ Y, and Ar is the confidence of the rule r, given by S(X ∪ Y)/S(X). In addition, association rules that do not satisfy a minimum lift threshold are pruned, where lift is defined as Ar/S(Y). If there is not enough support for a particular item, that item will never appear in any frequent item set, and the implication is that such an item will never be recommended. The issue of coverage is a tradeoff: lowering the support threshold ensures that more items can be recommended, but at the risk of recommending an item without sufficient evidence of a pattern.
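As a simple illustration of these measures (not part of the project code), the sketch below computes the support, confidence, and lift of a single candidate rule X => Y over a small set of user profiles represented as item sets. All class, variable, and method names here are hypothetical.

Code:
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RuleMeasures {
    // Fraction of profiles that contain every item of the given item set.
    static double support(List<Set<Integer>> profiles, Set<Integer> itemSet) {
        long count = profiles.stream().filter(p -> p.containsAll(itemSet)).count();
        return (double) count / profiles.size();
    }

    public static void main(String[] args) {
        // Toy user profiles: each set holds the ids of items the user liked.
        List<Set<Integer>> profiles = Arrays.asList(
            new HashSet<>(Arrays.asList(1, 2, 3)),
            new HashSet<>(Arrays.asList(1, 2)),
            new HashSet<>(Arrays.asList(2, 3)),
            new HashSet<>(Arrays.asList(1, 2, 4)));

        Set<Integer> x = new HashSet<>(Arrays.asList(1));   // antecedent X
        Set<Integer> y = new HashSet<>(Arrays.asList(2));   // consequent Y
        Set<Integer> xy = new HashSet<>(x);
        xy.addAll(y);                                        // X ∪ Y

        double sr = support(profiles, xy);                   // rule support S(X ∪ Y)
        double ar = sr / support(profiles, x);               // confidence S(X ∪ Y) / S(X)
        double lift = ar / support(profiles, y);             // lift Ar / S(Y)
        System.out.println("support=" + sr + " confidence=" + ar + " lift=" + lift);
    }
}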

Association rules are required to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time. To achieve this, association rule generation is a two-step process. First, minimum support is applied to find all frequent item sets in a database. In a second step, these frequent item sets and the minimum confidence constraint are used to form rules. While the second step is straightforward, the first step needs more attention. Finding all frequent item sets in a database is difficult since it involves searching all possible item sets (item combinations). The set of possible item sets is the power set over I and has size 2^n - 1 (excluding the empty set, which is not a valid item set). Although the size of the power set grows exponentially in the number of items n in I, efficient search is possible using the downward-closure property of support (also called anti-monotonicity), which guarantees that all subsets of a frequent item set are also frequent and, conversely, that all supersets of an infrequent item set must be infrequent. Exploiting this property, efficient algorithms (e.g., Apriori and Eclat) can find all frequent item sets.

Apriori algorithm:
In computer science and data mining, Apriori is a classic algorithm for learning association rules. Apriori is designed to operate on databases containing transactions (for example, collections of items bought by customers, or details of website visits). Other algorithms are designed for finding association rules in data having no transactions or no timestamps.
As is common in association rule mining, given a set of item sets (for instance, sets of retail transactions, each listing individual items purchased), the algorithm attempts to find subsets which are common to at least a minimum number C of the item sets. Apriori uses a bottom-up approach, where frequent item sets are extended one item at a time (a step known as candidate generation) and groups of candidates are tested against the data. The algorithm terminates when no further successful extensions are found.
Apriori uses breadth-first search and a tree structure to count candidate item sets efficiently. It generates candidate item sets of length k from item sets of length k-1. Then it prunes the candidates which have an infrequent sub-pattern. According to the downward-closure lemma, the candidate set contains all frequent k-length item sets. After that, it scans the transaction database to determine the frequent item sets among the candidates.

Algorithm
Association rule mining aims to find association rules that satisfy the predefined minimum support and confidence in a given database. The problem is usually decomposed into two subproblems. One is to find those itemsets whose occurrences exceed a predefined threshold in the database; those itemsets are called frequent or large itemsets. The second problem is to generate association rules from those large itemsets under the constraint of minimal confidence. Suppose one of the large itemsets is Lk = {I1, I2, ..., Ik}. Association rules with this itemset are generated in the following way: the first rule is {I1, I2, ..., Ik-1} => {Ik}; by checking the confidence, this rule can be determined to be interesting or not. The other rules are generated by deleting the last item in the antecedent and inserting it into the consequent; the confidence of each new rule is then checked to determine its interestingness. This process is iterated until the antecedent becomes empty. Since the second subproblem is quite straightforward, most research focuses on the first subproblem. The Apriori algorithm finds the frequent item sets L in the database D.
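The sketch below illustrates the level-wise frequent item set generation described above. It is a minimal, illustrative version of Apriori rather than the project's attached implementation: transactions are assumed to be given as sets of item ids, and the classic step of pruning candidates whose (k-1)-subsets are infrequent is omitted for brevity.

Code:
import java.util.*;

public class AprioriSketch {
    // Returns every item set whose support count reaches minSupport, found level by level.
    static Map<Set<Integer>, Integer> frequentItemsets(List<Set<Integer>> transactions, int minSupport) {
        Map<Set<Integer>, Integer> allFrequent = new HashMap<>();

        // Level 1: count single items and keep the frequent ones.
        Map<Set<Integer>, Integer> current = new HashMap<>();
        for (Set<Integer> t : transactions)
            for (int item : t)
                current.merge(new HashSet<>(Collections.singleton(item)), 1, Integer::sum);
        current.values().removeIf(c -> c < minSupport);

        while (!current.isEmpty()) {
            allFrequent.putAll(current);

            // Candidate generation: join frequent k-item sets that differ in exactly one item.
            Set<Set<Integer>> candidates = new HashSet<>();
            List<Set<Integer>> level = new ArrayList<>(current.keySet());
            for (int i = 0; i < level.size(); i++)
                for (int j = i + 1; j < level.size(); j++) {
                    Set<Integer> union = new HashSet<>(level.get(i));
                    union.addAll(level.get(j));
                    if (union.size() == level.get(i).size() + 1)
                        candidates.add(union);
                }

            // Count each candidate against the transactions and prune by minimum support.
            Map<Set<Integer>, Integer> next = new HashMap<>();
            for (Set<Integer> t : transactions)
                for (Set<Integer> c : candidates)
                    if (t.containsAll(c)) next.merge(c, 1, Integer::sum);
            next.values().removeIf(c -> c < minSupport);

            current = next;
        }
        return allFrequent;
    }
}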

Recommendation Algorithms

In this chapter we first present a collaborative recommendation algorithm based on association rule mining. We then provide information on the standard memory-based k-nn algorithm.

Association rule based collaborative filtering:
Before performing association rule mining on a collaborative filtering dataset, it is necessary to discretize the rating values of each user profile. We first subtract each user's average rating from the ratings in their profile to obtain a zero-mean profile. Next, we assign a discrete label of 'like' or 'dislike' to each rated item in the profile, according to whether its zero-mean rating value is greater than zero or less than or equal to zero, respectively.
Discretizing the dataset effectively doubles the total number of features used in analysis, but is necessary for inferring recommendable items. In classic market basket analysis, it is assumed that a customer will not purchase an item they do not like. Hence, a transaction always contains implicit positive ratings. However, when dealing with explicit rating data, certain items may be disliked. It is clear that a collaborative recommender must take such preferences into account or risk recommending an item that is rated often, but disliked by consensus.
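A minimal sketch of this discretization step, assuming a user's ratings are stored in an array in which 0 marks an unrated item (the class and method names are illustrative only):

Code:
public class Discretizer {
    // Converts raw 1-5 ratings into "like"/"dislike" labels relative to the user's mean.
    // Assumes the user has rated at least one item.
    static String[] discretize(int[] ratings) {
        int sum = 0, count = 0;
        for (int r : ratings)
            if (r != 0) { sum += r; count++; }
        double mean = (double) sum / count;   // the user's average rating

        String[] labels = new String[ratings.length];
        for (int i = 0; i < ratings.length; i++) {
            if (ratings[i] == 0)
                labels[i] = null;                                          // unrated item
            else
                labels[i] = (ratings[i] - mean) > 0 ? "like" : "dislike";  // sign of the zero-mean rating
        }
        return labels;
    }
}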

To make a recommendation for a target user profile u, we create a set of candidate items C such that there exists an association rule of the form X => {i}, where X ⊆ u and i is an item unrated in the profile u. In practice, it is not necessary to search every possible association rule given u. It is sufficient to find all frequent item sets X ⊆ u and base recommendations on the next larger frequent item sets Y ⊃ X, where Y contains some item i that is unrated in u. A caveat to this approach is the possibility of conflicting recommendations in the candidate set C. For example, one association rule may add item i to the candidate set with a confidence of 90%, whereas another rule may add the same item with a confidence of 5%. In this case, we simply use the rule with the highest confidence.

Another possibility is that one association rule may add item i to the candidate set with a like label, whereas another rule may add the same item with a dislike label. There is not an ideal solution in this case, but we have chosen to assume that there are opposing forces for the recommendation of the item. In our implementation, we subtract the confidence value of the dislike label from the confidence value of the like label.

To facilitate the search for item sets, we store the frequent item sets in a directed acyclic graph, called a Frequent Itemset Graph. The graph is organized into levels from 0 to k, where k is the maximum size among all frequent item sets. Each node at depth d in the graph corresponds to an item set I of size d and is linked to item sets of size d + 1 that contain I. The root node at level 0 corresponds to the empty item set. Each node also stores the support value of the corresponding frequent item set.

Given a target user profile u, we perform a depth-first search of the Frequent Itemset Graph. When we reach a node whose frequent item set In is not contained in u, the item i ∈ In not found in u is added to the candidate set C, and the search at the current branch is ceased. Note that the item set of the parent node Ip of In must be contained in u by definition, and because In is of size d+1 where Ip is of size d, there can be only one item i ∈ In that is not contained in u. It follows that In = Ip ∪ {i}, and the two nodes correspond to the rule Ip => {i}. We calculate the confidence of the rule as S(In)/S(Ip). Each candidate i ∈ C is stored in a hash table along with its confidence value; if it already exists in the hash table, the highest confidence value takes precedence. After completion of the depth-first search, all possible candidates for the target user u are contained in C, including items labeled 'dislike'. In order to properly represent the estimated negative connotation, items labeled 'dislike' are given a recommendation score that is the negation of the confidence value. If there is a corresponding 'like' label for the item in C, its recommendation score is decreased by the confidence of the 'dislike' label. As a final step, the candidate set C is sorted according to the recommendation scores and the top N items are returned as the recommendation.
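The sketch below illustrates this depth-first search over a Frequent Itemset Graph. It is only an illustrative data structure, not the project's attached code: items are represented as plain strings (which could carry a like/dislike label), and the dislike negation and final top-N sorting described above are left out.

Code:
import java.util.*;

public class FrequentItemsetGraphSketch {
    // One node of the Frequent Itemset Graph: an item set, its support, and its children
    // (each child holds a frequent item set with exactly one additional item).
    static class Node {
        Set<String> items;
        double support;
        List<Node> children = new ArrayList<>();
        Node(Set<String> items, double support) { this.items = items; this.support = support; }
    }

    // Depth-first search from the root (empty item set). Whenever a child introduces exactly
    // one item not present in the target profile u, record that item as a candidate with
    // confidence S(child)/S(parent) and stop descending that branch.
    static Map<String, Double> candidates(Node root, Set<String> u) {
        Map<String, Double> cand = new HashMap<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node parent = stack.pop();
            for (Node child : parent.children) {
                Set<String> extra = new HashSet<>(child.items);
                extra.removeAll(u);
                if (extra.isEmpty()) {
                    stack.push(child);                        // still contained in u: keep descending
                } else if (extra.size() == 1) {
                    String item = extra.iterator().next();
                    double confidence = child.support / parent.support;
                    cand.merge(item, confidence, Math::max);  // keep the highest confidence seen
                }
            }
        }
        return cand;
    }
}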
k-nearest neighbor based collaborative filtering:
One critical step in the item-based collaborative filtering algorithm is to compute the similarity between items and then to select the most similar items. The basic idea in similarity computation between two items i and j is to first isolate the users who have rated both of these items and then to apply a similarity computation technique to determine the similarity s(i,j). Figure 3.2.1 illustrates this process; there, the matrix rows represent users and the columns represent items.

There are a number of different ways to compute the similarity between items. We use cosine-based similarity.
The most important step is to generate the output in the form of a prediction. Once we isolate the set of most similar items based on the similarity measures, the next step is to look into the target user's ratings and use a technique to obtain predictions. We use the technique of weighted sum. Basically, this approach tries to capture how the active user rates the similar items. The weighted sum is scaled by the sum of the similarity terms to make sure the prediction is within the predefined range.
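A minimal, self-contained sketch of the cosine similarity and weighted-sum prediction steps is given below; the full program in the Code section applies these same two operations to the MovieLens rating matrix. The names used here are illustrative.

Code:
public class CosinePrediction {
    // Cosine similarity between two rating vectors (0 means unrated).
    static double cosine(int[] a, int[] b) {
        double dot = 0, magA = 0, magB = 0;
        for (int i = 0; i < a.length; i++) {
            dot  += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.sqrt(magA) * Math.sqrt(magB));
    }

    // Weighted-sum prediction for one item: the neighbours' ratings weighted by their
    // similarity to the target, scaled by the sum of the similarities.
    static double predict(int[][] ratings, double[] sims, int[] neighbourIds, int item) {
        double numer = 0, denom = 0;
        for (int k = 0; k < neighbourIds.length; k++) {
            numer += sims[k] * ratings[neighbourIds[k]][item];
            denom += sims[k];
        }
        return numer / denom;
    }
}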

Experimental evaluation

In this chapter we evaluate our recommendation algorithms based on association rule mining and k-nearest neighbor and compare their results. We report the relative improvement of association rule mining over k-nearest neighbor.

Selecting dataset for evaluation:

Several key decisions regarding data sets underlie successful evaluation of a recommender system algorithm. Can the evaluation be carried out offline on an existing data set or does it require live user tests? If a data set is not currently available, can evaluation be performed on simulated data? Evaluations can be completed using offline analysis, a variety of live user experimental methods, or a combination of the two. Much of the work in algorithm evaluation has focused on off-line analysis of predictive accuracy. In such an evaluation, the algorithm is used to predict certain withheld values from a dataset, and the results are analyzed using one or more of the metrics discussed in the following section. Such evaluations have the advantage that it is quick and economical to conduct large evaluations, often on several different datasets or algorithms at once. Once a dataset is available, conducting such an experiment simply requires running the algorithm on the appropriate subset of that data.

Offline analyses have two important weaknesses. First, the natural scarcity of rating data sets limits the set of items that can be evaluated: we cannot evaluate the appropriateness of a recommended item for a user if we do not have a rating from that user for that item in the dataset. Second, they are limited to objective evaluation of prediction results; no offline analysis can determine whether users will prefer a particular system, either because of its predictions or because of other, less objective criteria such as the aesthetics of the user interface.
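As one concrete, hypothetical form of such an offline experiment, the sketch below withholds one known rating per user, asks a predictor for it, and reports the mean absolute error. The Predictor interface and the error metric are assumptions chosen for illustration; the accuracy analysis later in this report instead compares binary like/dislike labels.

Code:
public class OfflineEvaluationSketch {
    interface Predictor {
        double predict(int user, int item, int[][] trainRatings);
    }

    // Withhold one known rating per user, predict it from the remaining data,
    // and report the mean absolute error over all withheld ratings.
    static double meanAbsoluteError(int[][] ratings, Predictor p) {
        double errorSum = 0;
        int n = 0;
        for (int u = 0; u < ratings.length; u++) {
            for (int i = 0; i < ratings[u].length; i++) {
                if (ratings[u][i] == 0) continue;   // unrated: nothing to withhold
                int held = ratings[u][i];
                ratings[u][i] = 0;                  // temporarily hide the rating
                double predicted = p.predict(u, i, ratings);
                ratings[u][i] = held;               // restore it
                errorSum += Math.abs(predicted - held);
                n++;
                break;                              // one withheld rating per user
            }
        }
        return errorSum / n;
    }
}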

Dataset:
In our experiments, we have used the publicly available MovieLens 100K dataset. This dataset consists of 100,000 ratings on 1682 movies by 943 users. All ratings are integer values between one and five, where one is the lowest (disliked) and five is the highest (liked).

Accuracy analysis
We first analyze the accuracy of our association rule recommender compared to k-nearest neighbor. Determining a suitable evaluation metric was challenging because the two algorithms are based on fundamentally different approaches. The k-nn algorithm predicts a rating value for each target item and ranks all items based on this score. However, the association rule algorithm produces a ranked list, such that the recommendation score is the confidence that a target user will like the recommended item. It is not obvious how to directly compare the recommendation scores of the two algorithms, because k-nn uses a predicted value and association rules use a confidence measure.

It is also not possible to make a prediction of the rating value from the association rule recommendation list. However, the association rule recommender does make a more general prediction: it predicts a binary like or dislike classification for a recommended item depending on whether the confidence value is positive or negative, respectively. In order to compare the accuracy of our association rule recommender and k-nn, we transform both recommendation lists into binary like and dislike categories for each item. For the association rule recommender, it is simply a matter of using the sign of the confidence value, as discussed in the previous paragraph. For k-nn, we categorize an item as like if the predicted rating is greater than the user's mean rating, and dislike otherwise. It is now possible to compare the accuracy of the two algorithms.

Because Apriori selects recommendations from only among those item sets that have met the support threshold, it will by necessity have lower coverage than our baseline algorithms. There will be some items that do not appear in the Frequent Itemset Graph, and about which the algorithm cannot make any predictions. This problem may occur in a k-nn algorithm as well, since there may be no peer users who have rated a given item.

Conclusion

In this paper, we have demonstrated the relative robustness and stability of model-based algorithms over the memory-based approach. In particular, we have introduced a robust recommendation algorithm based on association rule mining that attempts to capture relationships among items based on patterns of co-occurrence across user profiles. Frequent item sets are generated for the association rules by first discretizing all user profiles such that an item rating is classified as 'like' or 'dislike'. Overall, the association rule recommender far outperforms the standard k-nearest neighbor algorithm with respect to robustness.

Code

Code:
Main.java (for k-nearest neighbor)
package lens;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.StringTokenizer;

public class Main {  
  public static void main(String[] args) throws FileNotFoundException, IOException {
  int countRated = 0, countSameRated = 0, countNotSameRated = 0;
  Main obj = new Main();
  FileReader fr = new FileReader("C:\\MovieLens\\u.data");
  BufferedReader br = new BufferedReader(fr);
  FileReader fr2 = new FileReader("C:\\MovieLens\\u.item");
  BufferedReader br2 = new BufferedReader(fr2);
  String s;
  int maxUser = 0, maxItem = 0;
  // Map from movie id to title, loaded from u.item (pipe-delimited in the MovieLens 100K distribution).
  HashMap<Integer, String> movieNames = new HashMap<>();
  while((s = br2.readLine()) != null) {
  StringTokenizer st = new StringTokenizer(s, "|");
  movieNames.put(Integer.parseInt(st.nextToken()), st.nextToken());
  }
  while((s = br.readLine()) != null)
  {
  StringTokenizer st = new StringTokenizer(s, "\t");
  int tmp1 = Integer.parseInt(st.nextToken());
  if(tmp1 > maxUser)
  maxUser = tmp1;
  int tmp2 = Integer.parseInt(st.nextToken());
  if(tmp2 > maxItem)
  maxItem = tmp2;
  
  //System.out.println(tmp1 + " " + tmp2);
  }
  br.close();
  fr.close();
  
  //System.out.println(maxUser + " " + maxItem);
  
  int[][] matrix = new int[maxUser][maxItem];
  for(int i=0;i<maxUser;i++) for(int j=0;j<maxItem;j++) matrix[i][j] = 0;
  fr = new FileReader("C:\\MovieLens\\u.data");
  br = new BufferedReader(fr);
  while((s = br.readLine()) != null)
  {
  StringTokenizer st = new StringTokenizer(s, "\t");
  matrix[Integer.parseInt(st.nextToken()) - 1][Integer.parseInt(st.nextToken()) - 1] = Integer.parseInt(st.nextToken());
  }
  //Get the UID
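  // uid is the zero-based row index of the target user in the rating matrix
  // (user ids in u.data are 1-based, so 925 corresponds to MovieLens user 926).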
  int uid = 925;
  int bit =-1, uskiRating=-1;
  boolean beforePrediction[] = new boolean[maxItem];
  int sumintial = 0;
  float avgIntial=0;
  int countIntial=0;
  for(int i=0;i<maxItem;i++)
  {
  beforePrediction[i] = false;
  if(matrix[uid][i] != 0)
  {
  sumintial += matrix[uid][i];
  countIntial++;
  }
  }
  avgIntial = (float)sumintial / countIntial ;
  for(int i=0 ; i<maxItem ; i++)
  {
  if(matrix[uid][i] != 0)
  {
  if((matrix[uid][i] - avgIntial) > 0)
  beforePrediction[i] = true;
  }
  }
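  // Leave-one-out evaluation: hide each item the target user has rated, predict a rating
  // for it from the 10 most similar users, and compare the resulting like/dislike label
  // with the label implied by the user's actual rating.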

  for(int a = 0; a < maxItem ;a++) {
  if(bit != -1) matrix[uid][bit] = uskiRating;
  if(matrix[uid][a] != 0) {
  countRated++;
  bit=a;
  uskiRating = matrix[uid][a];
  }
  else {
  continue;
  }
  int count1=countIntial-1;
  float avgAfter =(float)(sumintial - matrix[uid][bit]) / count1 ;
  matrix[uid][bit] = 0;
  
  double cosineMeasures[] = new double[10]; //10 most similar users
  int userIDs[] = new int[10];
  for(int i=0;i<10;i++) cosineMeasures[i] = -2;
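  // Compute the cosine similarity between the target user's rating vector and every other
  // user's vector, keeping the ten most similar users in descending order of similarity.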
  
  double magUID = obj.magnitude(matrix[uid]);
  for(int i=0; i<maxUser; i++) {
  if(i == uid) continue;
  
  double tmp = obj.dotProduct(matrix[uid], matrix[i]) / (obj.magnitude(matrix[i]) * magUID);
  if(tmp>cosineMeasures[9]) {
  for(int k=9; k>=-1; k--) {
  if(k != -1 && tmp > cosineMeasures[k])
  continue;
  
  for(int j=9;j>k+1;j--) {
  cosineMeasures[j] = cosineMeasures[j-1];
  userIDs[j] = userIDs[j-1];
  }
  
  cosineMeasures[k+1] = tmp;
  userIDs[k+1] = i;
  break;
  }
  }
  }
  
//  for(int i=0;i<10;i++) {
//
//  System.out.println(cosineMeasures[i] + " " + userIDs[i]);
//  }
  
  //System.out.println("The Recommended Movies are : ")
  //Code for recommendations based on "one" most similar user
  /*for(int i=0;i<maxItem;i++) {  
  if(matrix[uid][i] == 0 && matrix[userIDs[0]][i]!=0) {
  if(matrix[userIDs[0]][i] == 5) System.out.println(i);
  }
  }
  for(int i=0;i<maxItem;i++) {  
  if(matrix[uid][i] == 0 && matrix[userIDs[0]][i]!=0) {
  if(matrix[userIDs[0]][i] == 4) System.out.println(i);
  }
  }
  for(int i=0;i<maxItem;i++) {  
  if(matrix[uid][i] == 0 && matrix[userIDs[0]][i]!=0) {
  if(matrix[userIDs[0]][i] == 3) System.out.println(i);
  }
  }
  for(int i=0;i<maxItem;i++) {  
  if(matrix[uid][i] == 0 && matrix[userIDs[0]][i]!=0) {
  if(matrix[userIDs[0]][i] == 2) System.out.println(i);
  }
  }
  for(int i=0;i<maxItem;i++) {
  if(matrix[uid][i] == 0 && matrix[userIDs[0]][i]!=0) {
  if(matrix[userIDs[0]][i] == 1) System.out.println(i);
  }
  }*/
  //Code for recommendation based on 10 most similar users
  /*double predictedRatings[] = new double[maxItem];
  for(int i=0;i<maxItem;i++) {
  
  }*/
  
  //int itemIDs[] = new int[maxItem];
  double topRatings[] = new double[10];
  String topMovies[] = new String[10];
for(int i=0;i<10;i++) topRatings[i] = -1;
  double rating=0;
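  // Weighted sum: combine the ten neighbours' ratings for the hidden item, weighted by
  // their cosine similarity, and scale by the sum of the similarities.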
  for(int i=bit;i<bit+1;i++) {
  if(matrix[uid][i] == 0) {
  double numer = 0 , denom = 0;
  for(int j=0;j<10;j++) {
  numer += cosineMeasures[j] * matrix[userIDs[j]][i];
  denom += cosineMeasures[j];
  }

  rating = numer / denom;
//  int k;
//  if(rating > topRatings[9]) {
//  for(k=9;k>=0;k--) {
//  if(rating < topRatings[k]) break;
//  }
//  //if(k==-1) k = 0;
//  k++;
//  for(int l=9; l>k;l--) {
//  topRatings[l] = topRatings[l-1];
//  topMovies[l] = topMovies[l-1];
//  }
//
//  topRatings[k] = rating;
//  topMovies[k] = (String) movieNames.get(i);
//  }
  //if(numer != 0)
  //System.out.println("Movie Name : " + movieNames.get(i+1) + "\tPredicted Rating : " + numer/denom);
  }
}
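  // Compare the predicted like/dislike label with the label implied by the user's actual rating.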
  
  if((rating - avgAfter) > 0)
  {
  if(beforePrediction[bit])
  {
  countSameRated++;
  //System.out.println("Movie Name : " + rating + "\tRating : " + uskiRating + "same liked");
  
  }
  else
  {
  countNotSameRated++;
  //System.out.println("Movie Name : " + rating + "\tRating : " + uskiRating + "not same liked");
  }

  }
  else
  {
  if(beforePrediction[bit] == false)
  {
  countSameRated++;
  //System.out.println("Movie Name : " + rating + "\tRating : " + uskiRating + "same disliked");
  }
  else
  {
  countNotSameRated++;
  
  //System.out.println("Movie Name : " + rating + "\tRating : " + uskiRating + "not same disliked");
  }
  }
  // System.out.println("Movie Name : " + rating + "\tRating : " + uskiRating);
  }
  //  for(int i=0;i<10;i++) {
//  System.out.println("Movie Name : " + topMovies[i] + "\tRating : " + topRatings[i]);
//  }
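  // Output: number of items evaluated, predictions that matched the user's original
  // like/dislike label, and predictions that did not.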
  
  System.out.println(countRated + " " + countSameRated + " " + countNotSameRated);
  }
  
  int dotProduct(int []a, int []b){
  int ret = 0;
  for(int i=0;i<a.length;i++){
  ret += a[i] * b[i];
  }  
  return ret;
  }
  double magnitude(int []a) {
  int sum = 0;
  for(int i=0;i<a.length;i++) {
  sum += a[i] * a[i];
  }
  double ret = Math.sqrt(sum);
  return ret;
  }
}




#2
Hi,
Thanks for the source code. While going through it, it seems to me that the k-NN code is repeated twice. Please check it and post the association rule (Apriori algorithm) code.

#3
Hi,
The k-NN code is working, but the association rule code is not. Please update it.
Executable code is required for the association rule implementation in the recommender system.

#4

Please, can anyone provide working Java code for the association rules part of this project? The code provided is not for association rules.
Thanks.


