Automatic Generation of Transformation-based Plagiarism Detectors

Rachel Edita O. Roxas, Nathalie Rose Lim, Natasja R. Bautista
Software Technology Department
College of Computer Studies
De la Salle University-Manila
Tel: (63) (2) (524-0402)
Fax: (63) (2) (536-0278)
roxasr@dlsu.edu.ph, limn@dlsu.edu.ph

ABSTRACT

Plagiarism Detector Generator (PDG) is a tool that lets computer programming professors generate plagiarism detection systems that detect copied programs among a set of student programs. A generated plagiarism detector must be immune to the changes that students make to copied programs; thus, it performs transformations on the student programs to minimize the effect of such changes on the similarity measures used to detect plagiarism. The professor specifies the programming language definitions and transformation specifications, and the system automatically generates the corresponding plagiarism detector. The generated plagiarism detectors accept any program in the defined programming language and output the transformed strings of tokens, ready for similarity analysis for suspected copies. Initial qualitative evaluations on the mini-languages control and scope [5] illustrate
a flexible, convenient, and cost-effective tool for building plagiarism detectors for various imperative and procedural programming languages. The approach also addresses some of the changes that students make to copied programs that JPlag [9] fails to handle. These include modification of control
structures, use of temporary variables and sub-expressions, in-lining and re-factoring of methods, and redundancy (variables or methods that were not used).

Keywords
Plagiarism detector, transformations, similarity measures.

1. INTRODUCTION

Program plagiarism has been an area of concern in the computer
science community, especially in institutions that teach computer science courses. It has been emphasized that learning computer programming depends on the individual's actual exposure to various problem-solving scenarios. Since it is vital for a student to develop his or her own programming skills, measures must be in place to discourage students from plagiarizing other people's work. Thus, schemes for detecting plagiarism must be formulated.

From a set of student programs handed to the professor for a particular programming assignment, one way to detect plagiarism is to find pairs of programs that are similar to each other, which may indicate copying within the group of students. The most frequently used sources of copied programs are believed to be the Internet and peers or classmates.

As the number
of computer science students increases, detecting plagiarism becomes a difficult and complicated task. Manual inspection of student programs becomes unreliable and slow; thus, automated systems have been introduced. These systems compare all pairs of programs and report suspected plagiarized ones, which are then examined manually to confirm plagiarism.

Existing automated plagiarism detectors analyze programs written in particular programming languages and implement specific plagiarism detection schemes. Because of the increasing demand to develop these various plagiarism detectors, a generalized plagiarism detector is presented. It offers flexibility in specifying the programming language in which the programs being analyzed are written and in applying a specific plagiarism detection scheme.

Plagiarism detectors have been implemented to analyze programs written in various programming languages. It is believed that the paradigm of the programming language used in the student assignments would greatly affect the detection scheme to be applied.

Existing plagiarism detectors implement various plagiarism detection schemes. These systems aim to quantify or extract characteristics of programs as a measure of program content; these characteristics are then compared to detect program similarities. In previous studies, program similarities were measured using quantitative or structural approaches.
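As an illustration only, a quantitative comparison of this kind might be sketched as follows. The sketch assumes that word-occurrence counts serve as the program characteristic and that cosine similarity with a fixed threshold is the similarity measure; these choices, and the names word_counts, cosine_similarity, and suspicious_pairs, are hypothetical and are not taken from any of the systems discussed.

```python
import re
from collections import Counter
from itertools import combinations
from math import sqrt


def word_counts(source):
    """Count occurrences of each identifier-like word in a program text."""
    return Counter(re.findall(r"[A-Za-z_]\w*", source))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors, from 0.0 to 1.0."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def suspicious_pairs(programs, threshold=0.9):
    """Compare all pairs of programs; return those scoring at or above threshold.

    `programs` maps a program name to its source text. The threshold value is
    an arbitrary assumption chosen for this sketch.
    """
    counts = {name: word_counts(src) for name, src in programs.items()}
    flagged = []
    for x, y in combinations(sorted(counts), 2):
        score = cosine_similarity(counts[x], counts[y])
        if score >= threshold:
            flagged.append((x, y, score))
    return flagged
```

Flagged pairs would still need manual examination. A measure this simple is also easy to defeat: renaming the identifiers in a copied program changes its word counts and lowers the score.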
Earlier systems used quantitative approaches, while later ones used structural approaches or a combination of both. In 1987, a lexical detector [6] was devised that counts the occurrences of all the words in each program. Similarity is then captured using this word occurrence scheme. It was based on the observation that program writers have their own unique ways of representation and expression; thus, similar use of words could indicate plagiarized programs. Note that this system only detects copies that share common words. Because students introduce changes to copied programs to disguise a copy, this system would be very limited to the lexical l