PLAGIARISM AUTO-DETECTION - Free Academic Seminars And Projects Reports (https://easyreport.in)
PLAGIARISM AUTO-DETECTION - pbcool - 08-17-2017

PLAGIARISM AUTO-DETECTION IN ARABIC SCRIPTS USING STATEMENT-BASED FINGERPRINTS MATCHING AND FUZZY-SET INFORMATION RETRIEVAL

SALHA MOHAMMED ALZAHRANI

A project report submitted in partial fulfilment of the requirements for the award of the degree of Master of Science (Computer Science)

ABSTRACT

Many plagiarism detection techniques and tools have been developed, mainly for English scripts. Different methods have been found to use different document descriptors, ranging from characters to document structure. There is possibly no existing research on Arabic plagiarism detection, although Arabic is the academic language in Arab universities and schools. Therefore, in this study, two techniques have been developed for Arabic, both using a statement-based document representation: fingerprint matching with the three least-frequent 4-grams, and fuzzy-set information retrieval (IR). Two statements are treated as similar either if their fingerprints match (first technique) or if the degree of similarity computed by the second technique exceeds a threshold value.

The corpus used in this study has 100 documents collected from Arabic Wikipedia, with 3763 statements and 54346 stemmed words (excluding stop words) in total. Another 15 query documents with 943 statements were constructed with different degrees of plagiarism. Preprocessing operations, such as stop-word removal and stemming, were applied to the corpus and the query documents, and the resulting documents were stored in a database. Preliminary experiments were carried out using WCopyFind and a naïve algorithm; the results were accurate but not optimal. Therefore, fingerprint matching with the three least-frequent 4-grams and fuzzy-set IR were investigated further to handle other plagiarism practices, such as rewording, rephrasing and restructuring of statements, more effectively.
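To make the first technique concrete, here is a minimal Python sketch of statement fingerprinting with the three least-frequent 4-grams. It assumes character 4-grams taken from an already preprocessed statement with spaces removed, 4-gram frequencies counted over all statements being compared, alphabetical tie-breaking, and an exact-match rule between fingerprints. These choices and all function names are illustrative assumptions, not details taken from the report. English placeholder statements are used so the output is easy to check; in the study the statements would be preprocessed Arabic text.

```python
from collections import Counter


def char_4grams(statement: str) -> list[str]:
    """Overlapping character 4-grams of a preprocessed statement.

    Assumption: spaces are dropped before slicing, so 4-grams may span words.
    """
    text = statement.replace(" ", "")
    return [text[i:i + 4] for i in range(len(text) - 3)]


def gram_frequencies(statements: list[str]) -> Counter:
    """Count every 4-gram occurrence across all statements being compared."""
    freq: Counter = Counter()
    for s in statements:
        freq.update(char_4grams(s))
    return freq


def fingerprint(statement: str, freq: Counter, k: int = 3) -> frozenset:
    """Keep the k least-frequent distinct 4-grams (ties broken alphabetically)."""
    grams = set(char_4grams(statement))
    ranked = sorted(grams, key=lambda g: (freq[g], g))
    return frozenset(ranked[:k])


def fingerprints_match(a: str, b: str, freq: Counter) -> bool:
    """Treat two statements as similar when their fingerprints are identical."""
    fa, fb = fingerprint(a, freq), fingerprint(b, freq)
    # Empty fingerprints (statements shorter than 4 characters) never match.
    return bool(fa) and fa == fb


corpus = ["the cat sat on the mat", "a dog ran in the park"]
freq = gram_frequencies(corpus)
print(sorted(fingerprint(corpus[0], freq)))  # ['aton', 'atsa', 'cats']
print(fingerprints_match(corpus[0], "the cat sat on the mat", freq))  # True
print(fingerprints_match(corpus[0], corpus[1], freq))  # False
```

Because a fingerprint keeps only the rarest 4-grams, a reworded statement tends to introduce new rare 4-grams and therefore a different fingerprint, so exact fingerprint matching is best suited to verbatim or near-verbatim copying.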
Our results with both techniques on Arabic are as successful as with English, considering that Arabic natural language processing is much more complex than English. The main conclusion is that Arabic plagiarism is best handled with fuzzy-set IR, since it outperforms fingerprint matching with the three least-frequent 4-grams in detecting similar, but not necessarily identical, statements.
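The fuzzy-set IR technique can be sketched in the same hedged way. The version below follows one common fuzzy-set formulation: a word-to-word correlation c_ab = n_ab / (n_a + n_b - n_ab) computed from statement co-occurrence counts, a membership degree mu(w, S) = 1 - prod over words v in S of (1 - c_wv), and a statement similarity equal to the average membership of one statement's words in the other. The exact formulas, the rule of checking similarity in both directions, the threshold of 0.65 and all function names are assumptions for illustration, not values or code reported in this study. Statements are assumed to arrive as lists of stemmed words with stop words already removed.

```python
from collections import Counter
from math import prod


def cooccurrence_counts(statements: list[list[str]]) -> tuple[Counter, Counter]:
    """Count how many statements contain each word and each unordered word pair."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for words in statements:
        unique = sorted(set(words))
        single.update(unique)
        for i, a in enumerate(unique):
            for b in unique[i + 1:]:
                pair[(a, b)] += 1
    return single, pair


def correlation(a: str, b: str, single: Counter, pair: Counter) -> float:
    """Word-word correlation c_ab = n_ab / (n_a + n_b - n_ab), with c_aa = 1."""
    if a == b:
        return 1.0
    n_ab = pair[tuple(sorted((a, b)))]
    denom = single[a] + single[b] - n_ab
    return n_ab / denom if denom else 0.0


def membership(word: str, statement: list[str], single: Counter, pair: Counter) -> float:
    """mu(word, S) = 1 - product over w in S of (1 - c_word,w)."""
    return 1.0 - prod(1.0 - correlation(word, w, single, pair) for w in set(statement))


def similarity(s1: list[str], s2: list[str], single: Counter, pair: Counter) -> float:
    """Average membership of s1's words in s2; note this measure is not symmetric."""
    words = set(s1)
    if not words:
        return 0.0
    return sum(membership(w, s2, single, pair) for w in words) / len(words)


def is_similar(s1, s2, single, pair, threshold: float = 0.65) -> bool:
    """Hypothetical rule: similarity must exceed the threshold in both directions."""
    return min(similarity(s1, s2, single, pair), similarity(s2, s1, single, pair)) > threshold


s1 = ["cat", "sat", "mat"]
s2 = ["dog", "ran", "park"]
query = ["cat", "sat", "rug"]
single, pair = cooccurrence_counts([s1, s2, query])
print(round(similarity(query, s1, single, pair), 3))  # 0.917
print(is_similar(query, s1, single, pair))  # True
print(is_similar(query, s2, single, pair))  # False
```

Unlike exact fingerprint matching, a word that does not occur in the other statement (here "rug") still contributes partial membership through its correlation with words that do, which is how a model of this kind can flag reworded statements.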