measurement of software similarity


MEASUREMENT OF SOFTWARE SIMILARITY

UNDER THE SUPERVISION OF

PROF. ARITRA PAN

In the Year of 2010

Group NO: 13

SAMRAT GUPTA (ROLL-071531012037)

SANJUKTA MITRA (ROLL-071531012009)

MOU MONDAL (ROLL-071531012064)

MOITRAYEE MONDAL (ROLL-071531012090)

SUSHOVAN POLLEY (ROLL-071531012065)

SYAMAPRASAD INSTITUTE OF TECHNOLOGY & MANAGEMENT

7, Raja Ram Mohon Ray Road, Kolkata: 41, West Bengal, India

Certificate

The work presented in this report is the united effort of Sanjukta Mitra, Samrat Gupta, Mou Mondal, Moitrayee Mondal and Sushovan Polley. Any work of others that was used during the execution of the project, or is included in the report, has been suitably acknowledged through the standard practice of citing references and stating appropriate acknowledgements.

We hereby forward the project entitled MEASUREMENT OF SOFTWARE SIMILARITY, presented by Samrat Gupta (Roll No: 071531012037, Reg. No: 071531012101037), Sanjukta Mitra (Roll No: 071531012009, Reg. No: 071531012201009), Mou Mondal (Roll No: 071531012064, Reg. No: 071531012201064), Moitrayee Mondal (Roll No: 071531012090, Reg. No: 071531012201090) and Sushovan Polley (Roll No: 071531012065, Reg. No: 071531012101065) of 2007-2008, 6th semester, Bachelor of Computer Application, carried out under guidance in partial fulfillment of the requirements for the degree of Bachelor of Computer Application of this college.

Prof. Aritra Pan

(Project Supervisor)

Associate Professor, Dept. of BCA, SITM

Certificate Of Approval

The foregoing project report is hereby approved as a creditable study of Bachelor in Computer Application in a manner satisfactory to warrant its acceptance as a prerequisite to the degree for which it has been submitted. It is understood that by this approval the undersigned do not necessarily endorse or approve any statement made, opinion expressed or conclusion drawn therein, but approve this project report only for the purpose for which it is submitted.

(External Examiners)

Prof. Aritra Pan, (Project Supervisor), Associate Professor

Prof. Manikaustabh Goswami, Teacher-in-Charge, SITM



4. FLOW CHART

4.1. FLOW CHART OF THE CHARACTER MATCHING

4.2. PROGRAM OF THE CHARACTER MATCHING

4.3. FLOW CHART OF THE STRING MATCHING

4.4. PROGRAM OF THE STRING MATCHING

5. HARDWARE & SOFTWARE

5.1. NECESSITY OF HARDWARE AND SOFTWARE

6. ADVANTAGES

7. FUTURE SCOPE

8. PROBLEMS

9. REFERENCES

10. CONCLUSION

ACKNOWLEDGMENT

We would like to thank our project supervisor Prof. Aritra Pan for her moral support and guidance to complete our synopsis on time.

We express our gratitude to all our friends and classmates for their support and help in this project.

Last but not least, we wish to express our gratitude to God almighty for his abundant blessings, without which this synopsis would not have been successful.

ABSTRACT

Program assignments are traditionally an area of serious concern in maintaining the integrity of the educational process. Systematic inspection of all solutions for possible plagiarism has generally required unrealistic amounts of time and effort. The Measure Of Software Similarity (MOSS) tool developed by Alex Aiken at UC Berkeley makes it possible to check all solutions objectively and automatically for evidence of plagiarism. We have used MOSS in several large sections of a C programming course. (MOSS can also handle a variety of other languages.) We feel that MOSS is a major innovation for faculty who teach programming and recommend that it be used routinely to screen for plagiarism.

1. INTRODUCTION

Probably every instructor of a programming course has been concerned about possible plagiarism in the program solutions turned in by students. Instances of cheating are found, but traditionally only on an ad hoc basis. For example, the instructor may notice that two programs have the same idiosyncrasy in their I/O interface, or the same pattern of failures with certain test cases. With suspicions raised, the programs may be examined further and the plagiarism discovered. Obviously, this leaves much to chance. The larger the class, and the more different people involved in the grading, the less the chance that a given instance of plagiarism will be detected. For students who know about various instances of cheating, which instances are detected and which are not may seem (in fact, may be) random.

A policy of comparing all pairs of solutions against each other for evidence of plagiarism seems like the correct approach. But a simple file diff would of course detect only the most obvious attempts at cheating. The standard dumb attempt at cheating on a program assignment is to obtain a copy of a working program and then change statement spacing, variable names, I/O prompts and comments. This has been enough to require a careful manual comparison for detection, which simply becomes infeasible for large classes with regular assignments. Thus, programming classes have been in need of an automated tool which allows reliable and objective detection of plagiarism.

1.1 What is Moss

Moss (for a Measure Of Software Similarity) is an automatic system for determining the similarity of programs. To date, the main application of Moss has been in detecting plagiarism in programming classes. Since its development in 1994, Moss has been very effective in this role. The algorithm behind Moss is a significant improvement over other cheating detection algorithms (at least, over those known to us).

As of now MOSS can be used to detect similarities in C, C++, Java, Pascal, Ada, ML, Lisp and Scheme programs. MOSS is primarily used for detecting plagiarism in programming assignments in computer science and other engineering courses, though several text formats are supported as well.

The latest MOSS script can be downloaded from the MOSS site. MOSS can execute on all UNIX and Linux systems which have Perl, Mail etc. After downloading the MOSS script, copy it to the directory containing the student programs. Then run the moss script in that directory. After execution, the script sends the data to the MOSS server at Berkeley. The MOSS server sends back a webpage address, which is displayed at the prompt. This webpage contains the results. Results are available on the MOSS server for 14 days. The script can also be run with one or more options which handle more complicated situations, like comparing programs from different directories, excluding a certain part of a program from the comparison etc.

Our project is developed in C program code. At first we develop a C program based on string similarity that checks whether two strings, held in two separate arrays, are similar or not. When this is done successfully, the same check is done on two separate files using C program code.
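The first stage described above, checking whether two strings held in two separate arrays are the same, can be sketched as follows. This is only an illustrative sketch: the function name same_string is our own and does not come from the project code, and "similar" is assumed here to mean an exact character-by-character match.

```c
#include <stddef.h>

/* Return 1 when the two NUL-terminated arrays hold the same characters,
   0 otherwise. same_string is a hypothetical name used for illustration. */
int same_string(const char a[], const char b[])
{
    size_t i = 0;

    /* Walk both arrays together until a mismatch or the end of a. */
    while (a[i] != '\0' && a[i] == b[i])
        i++;

    /* Equal only if both arrays ended at the same position. */
    return a[i] == b[i];
}
```

Calling same_string("moss", "moss") gives 1, while same_string("mos", "moss") gives 0 because the second array is longer. The standard library function strcmp from <string.h> performs the same comparison and would normally be preferred.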

1.2 FEATURES

This Measure of Software Similarity program can detect the similarity of any program, text or file.

This project can be used in any institute to prevent students from copying assignments from one another.

This program can also be applied to search for duplicated information when the same program is executing on different machines connected to a main server.

It is also applicable to online duplication.

This project can also be used to avoid plagiarism.

It can also be used to eliminate redundancy of data.

It also helps to reduce the cost of a particular project.

This project, namely Measurement of Software Similarity, helps to detect data redundancy in any software, program, text or file. One of the biggest disadvantages of data redundancy is that it increases the size of the database unnecessarily. Data redundancy might also cause the same result to be returned several times when searching the database, causing confusion in the results. This also wastes a lot of space, thus incurring extra cost.

Another problem it addresses is plagiarism, the act of taking credit for someone else's work. This project helps to eliminate this drawback.

1.3 Plagiarism

Plagiarism, as defined in the 1995 Random House Compact Unabridged Dictionary, is the "use or close imitation of the language and thoughts of another author and the representation of them as one's own original work". Within academia, plagiarism by students, professors, or researchers is considered academic dishonesty or academic fraud and offenders are subject to academic censure, up to and including expulsion. In journalism, plagiarism is considered a breach of journalistic ethics, and reporters caught plagiarizing typically face disciplinary measures ranging from suspension to termination of employment. Some individuals caught plagiarizing in academic or journalistic contexts claim that they plagiarized unintentionally, by failing to include quotations or give the appropriate citation. While plagiarism in scholarship and journalism has a centuries-old history, the development of the Internet, where articles appear as electronic text, has made the physical act of copying the work of others much easier.

Plagiarism is not the same as copyright infringement. While both terms may apply to a particular act, they are different transgressions. Copyright infringement is a violation of the rights of a copyright holder, when material protected by copyright is used without consent. On the other hand, plagiarism is concerned with the unearned increment to the plagiarizing author's reputation that is achieved through false claims of authorship.

1.3.1 Plagiarism prevention

Plagiarism cannot be eliminated completely, but some preventive measures may reduce plagiarism to a minimum. There are three main strategies to prevent plagiarism. First is the Trust Method, wherein the students are told that we trust them and that they are mature enough to know that the test is for their benefit and that cheating will hinder their chances to see how well they have mastered a particular concept. Thus, the Trust Method trusts the learners to obey the rules and is implemented by making the learners sign an Honor code before appearing for the test. Second is the Fence Method, which aims at making cheating impossible. This is implemented by tightening the security during tests, setting different questions for different students etc. Third is the Threat Method, which threatens the learners with the punishments that they will have to face if plagiarism is detected. This is done by announcing the penalty before the assignment submissions or tests have started.

Ideally one or more of the above methods can be used as a preventive measure. The instructor has to decide which method or methods to adopt based on the purpose of the test. If the test is a part of the final exam of a course or degree, then the Fence Method or the Threat Method or both could be implemented. If the test is a practice test for the self-assessment of learners, the Trust Method would be the best. These methods are to be implemented before the commencement of the test or assignment submission.

Additional preventive measures can also be taken while conducting the test. If we have a test running in parallel in a number of remote centers, we can have authorised proctors to inspect the exam at the respective centers. These proctors can make sure that only authorised students are taking the test at the proper time, without any unauthorized help. The tests at these centers can also be supervised by observing each center by video conferencing from a coordinating center.

1.3.2 Plagiarism Detection

In the previous section we saw the preventive measures for plagiarism. Now, after the test is conducted and the results are out, the tough job starts. The questions becoming increasingly important in this context are: can we trust the results that the machine has given us, i.e. does it mean that a student has understood a concept just because his score says so? If not, then how do we differentiate between genuine attempts and copies? In other words, how do we detect copies among the number of assignments submitted? Detecting plagiarism in a test for which n students had appeared involves comparing each solution with the other n-1 solutions, and this is not a trivial task.

Let us see some attempts to detect plagiarism in programming tests, which have evolved over time. Traditional attempts to detect plagiarism have been ad hoc, typically involving manual checking of programming assignments for plagiarism. This manual checking too mostly happens only for suspected programs, like two programs failing for the same test cases, two programs looking very similar in structure etc. Also, the plagiarism detection is limited to programs which look alike or are verbatim copies. Manually checking all the programs in all possible combinations for plagiarism requires a fair amount of time and manpower, especially when the number of programs to be tested is large. Inspecting all the possible combinations for more complex attempts at plagiarism (beyond verbatim copy), in such a scenario, is a tougher job.

The inconvenience and limitations of traditional attempts at detecting plagiarism led instructors to exploit some advanced methods to do the same. Instructors eventually started using available tools (e.g. Unix utilities like diff, cmp, etc.) to automate the task of checking all possible combinations for plagiarism among a large set of programs. Use of these tools minimized the time and effort; however, plagiarism detection was still limited to verbatim copies.
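To make the two points above concrete, the sketch below counts how many pairs must be compared for n submissions (each solution against the other n-1, with each pair counted once, gives n(n-1)/2) and performs a cmp-style byte comparison of two open files. The function names pair_count and identical_streams are our own, chosen for illustration. The second function reports a match only for exact copies, which is the limitation described above.

```c
#include <stdio.h>

/* Number of unordered pairs among n submissions: n(n-1)/2. */
unsigned long pair_count(unsigned long n)
{
    return n * (n - 1) / 2;
}

/* cmp-style check: return 1 if both streams hold exactly the same bytes,
   0 otherwise. Both streams are read from their beginning. */
int identical_streams(FILE *a, FILE *b)
{
    int ca, cb;

    rewind(a);
    rewind(b);
    do {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)
            return 0;          /* first differing byte, or one file is shorter */
    } while (ca != EOF);
    return 1;                  /* both reached end of file together */
}
```

For a class of 100 students pair_count gives 4950 comparisons. A copy in which only the spacing was changed, for example "x=1;" against "x = 1;", is already reported as different by identical_streams.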

1.4 Ways to handle technology-enhanced cheating

Focus on the process of writing: observe and coach the process. Require a thesis statement, an initial bibliography, an outline, notes, a first draft etc.

Avoid "choose any topic" papers. Tie the topic to the goals of the course.

Use a few papers from "cheat sites" as examples. Provide a grade for these and use them as reference material. Students will be hesitant to use a service you know about.

Be clear and comprehensive regarding plagiarism policies. The more students know, the less likely they will be to attempt plagiarism.

Require students to use material from class lectures, presentations, discussions etc. in their graded assignments. This makes finding "matching" papers more difficult.

Require students to conduct an original survey or interview as part of the assignment. The survey or transcripts of the interview are included as an appendix.

Require an annotated bibliography as part of the process of writing the assignment. These are difficult to plagiarize.

Require an abstract of the paper where appropriate. Writing an accurate synopsis of a plagiarized paper is difficult.

Require a description of the research process with the final draft.

Get to know your students. Require a writing sample during the first week of class. Have the students do this in their "best written style" and make it personalized and customized to them individually. Keep this on record for comparison purposes.

Use Plagiarism.org or Plagiarism.com to check submitted work (links below).

Use MOSS (Measure of Software Similarity), which detects plagiarism in programming classes (link below).

Make assignments relatively difficult. This makes it more difficult to get casual, though ongoing, help during the semester.

Frequent assessments also make getting help logistically difficult.

Use mastery-type questions and case studies rather than "memorization" questions.

If using online quizzes, give different questions to different students, i.e. use a test bank. Add a short answer question that will be graded by hand.

If using online tests or quizzes, limit the amount of time the test is available.

Use alternate means of assessment, portfolios and multiple measures of mastery.

Use proctored exams (only if absolutely necessary).

If you suspect plagiarism, look carefully at the paper and gently confront the student with your concerns. Frequently this is enough to uncover or deter plagiarism.

Require raw materials of the research process, for example copies of the cited works.

1.5 Teaching activities to prevent cheating

Quizzes: Create regular, frequent (weekly or daily) quizzes for students.

Discussion: Create discussions and use participation in discussions as an aid in measuring student progress.

Request feedback: Randomly e-mail all the students in the class and request a comment or two on some subject.

Variance analysis: Check the regular quiz scores to see if there is a sudden change. For example, a student flunks five quizzes and then hires someone to take the final online exam and gets an A.

Spot calls: If a teacher has any concerns about a particular individual, she or he can call the student and have a short discussion. It will quickly reveal whether the student knows the course material.

Online chat exams: The instructor can conduct an oral chat room exam with each student to interactively test the student's knowledge of the course material.

1.6 Additional security techniques

First, many of the same problems regarding the authenticity of a student's work and plagiarism exist in the traditional classroom as well.

To get someone's help through an entire online program would take substantial effort. For most students it is just not possible to have consistent help through many tests at many different times. Besides, who would consent to putting in so much work for someone else and not get credit for it?

Use a log-in/password system (but of course, a student could just give the username and password to someone else).

Make exercises difficult enough so that a person who hasn't done the previous work in your course will not be able to complete the assignment.

Give many short exams that are embedded in class exercises so that it would be difficult for a student to have "help" there all the time.

Ask mastery-type questions so that a student must know the material himself/herself in order to answer the question (i.e. case studies vs. memorization questions).

Ask students to relate the subject matter to their own personal/professional/life experiences so their answers are personalized and difficult to replicate.

Require students to submit an outline and rough draft of term papers and essays before the final paper is due. This way, a professor can see the work in progress.

Give different questions to different students: construct a large set of questions from which an automated testing program can randomly select (i.e. a database of 50 questions with 10 randomly chosen).

Limit the times when the online test is available; ensure that the test is taken in a certain amount of time. Some automated testing programs allow this feature.

Provide online exam practice (sample questions, self-study questions with answers and feedback), and require a proctored, non-online examination for course credit (i.e. on campus, at a testing center, library, etc.).

Finally, remember that testing should never be the only means by which you assess the abilities of your students. If they are evaluated with various different methods, you have the best way of ensuring that there is real learning taking place. As with a traditional classroom, the best way to assess student and course progress is to know the student through the student's work and pay attention to student feedback.

The American Association for Higher Education has devised nine principles of good practice for assessing student learning. These can also be helpful when thinking about how to avoid plagiarism and cheating in online courses. The principles are:

1. The assessment of student learning begins with educational values. Assessment is not an end in itself but a vehicle for educational improvement. Its effective practice, then, begins with and enacts a vision of the kinds of learning we most value for students and strive to help them achieve. Educational values should drive not only what we choose to assess but also how we do so. Where questions about educational mission and values are skipped over, assessment threatens to be an exercise in measuring what's easy, rather than a process of improving what we really care about.

2. Assessment is most effective when it reflects an understanding of learning as multidimensional, integrated, and revealed in performance over time. Learning is a complex process. It entails not only what students know but what they can do with what they know; it involves not only knowledge and abilities but values, attitudes, and habits of mind that affect both academic success and performance beyond the classroom. Assessment should reflect these understandings by employing a diverse array of methods, including those that call for actual performance, using them over time so as to reveal change, growth, and increasing degrees of integration. Such an approach aims for a more complete and accurate picture of learning, and therefore firmer bases for improving our students' educational experience.

3. Assessment works best when the programs it seeks to improve have clear, explicitly stated purposes. Assessment is a goal-oriented process. It entails comparing educational performance with educational purposes and expectations -- those derived from the institution's mission, from faculty intentions in program and course design, and from knowledge of students' own goals. Where program purposes lack specificity or agreement, assessment as a process pushes a campus toward clarity about where to aim and what standards to apply; assessment also prompts attention to where and how program goals will be taught and learned. Clear, shared, implementable goals are the cornerstone for assessment that is focused and useful.

4. Assessment requires attention to outcomes but also to the experiences that lead to those outcomes. Information about outcomes is of high importance; where students "end up" matters greatly. But to improve outcomes, we need to know about student experience along the way -- about the curricula, teaching, and kind of student effort that lead to particular outcomes. Assessment can help us understand which students learn best under what conditions; with such knowledge comes the capacity to improve the whole of their learning.

5. Assessment works best when it is ongoing, not episodic. Assessment is a process whose power is cumulative. Though isolated, "one-shot" assessment can be better than none, improvement is best fostered when assessment entails a linked series of activities undertaken over time. This may mean tracking the process of individual students, or of cohorts of students; it may mean collecting the same examples of student performance or using the same instrument semester after semester. The point is to monitor progress toward intended goals in a spirit of continuous improvement. Along the way, the assessment process itself should be evaluated and refined in light of emerging insights.

6. Assessment fosters wider improvement when representatives from across the educational community are involved. Student learning is a campus-wide responsibility, and assessment is a way of enacting that responsibility. Thus, while assessment efforts may start small, the aim over time is to involve people from across the educational community. Faculty members play an especially important role, but assessment's questions can't be fully addressed without participation by student affairs educators, librarians, administrators, and students. Assessment may also involve individuals from beyond the campus (alumni/ae, trustees, employers) whose experience can enrich the sense of appropriate aims and standards for learning. Thus understood, assessment is not a task for small groups of experts but a collaborative activity; its aim is wider, better-informed attention to student learning by all parties with a stake in its improvement.

7. Assessment makes a difference when it begins with issues of use and illuminates questions that people really care about. Assessment recognizes the value of information in the process of improvement. But to be useful, information must be connected to issues or questions that people really care about. This implies assessment approaches that produce evidence that relevant parties will find credible, suggestive, and applicable to decisions that need to be made. It means thinking in advance about how the information will be used, and by whom. The point of assessment is not to gather data and return "results"; it is a process that starts with the questions of decision-makers, that involves them in the gathering and interpreting of data, and that informs and helps guide continuous improvement.

8. Assessment is most likely to lead to improvement when it is part of a larger set of conditions that promote change. Assessment alone changes little. Its greatest contribution comes on campuses where the quality of teaching and learning is visibly valued and worked at. On such campuses, the push to improve educational performance is a visible and primary goal of leadership; improving the quality of undergraduate education is central to the institution's planning, budgeting, and personnel decisions. On such campuses, information about learning outcomes is seen as an integral part of decision making, and avidly sought.

9. Through assessment, educators meet responsibilities to students and to the public. There is a compelling public stake in education. As educators, we have a responsibility to the public that supports or depends on us to provide information about the ways in which our students meet goals and expectations. But that responsibility goes beyond the reporting of such information; our deeper obligation -- to ourselves, our students, and society -- is to improve. Those to whom educators are accountable have a corresponding obligation to support such attempts at improvement.

2. Platform

Procedural programming can sometimes be used as a synonym for imperative programming (specifying the steps the program must take to reach the desired state), but can also refer (as in this report) to a programming paradigm, derived from structured programming, based upon the concept of the procedure call. Procedures, also known as routines, subroutines, methods, or functions (not to be confused with mathematical functions, but similar to those used in functional programming), simply contain a series of computational steps to be carried out. Any given procedure might be called at any point during a program's execution, including by other procedures or itself.

A procedural programming language provides a programmer a means to define precisely each step in the performance of a task. The programmer knows what is to be accomplished and provides, through the language, step-by-step instructions on how the task is to be done. Using a procedural language, the programmer specifies language statements to perform a sequence of algorithmic steps. Procedural programming is often a better choice than simple sequential or unstructured programming in many situations which involve moderate complexity or which require significant ease of maintainability.

Possible benefits:

The ability to re-use the same code at different places in the program without copying it.

An easier way to keep track of program flow than a collection of "GOTO" or "JUMP" statements (which can turn a large, complicated program into spaghetti code).

The ability to be strongly modular or structured.

Emphasis is on doing things (algorithms).

Employs a top-down approach in program design.

Large programs are divided into smaller programs known as functions.
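The ideas above (a procedure as a named series of steps, a procedure calling itself, and a larger task divided into smaller functions) can be illustrated with a small C sketch. The three function names are hypothetical and were chosen only for this example; they are not part of the project code.

```c
#include <stddef.h>

/* A procedure: one named, reusable sequence of steps. */
size_t text_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* A procedure that calls itself: counts occurrences of c in s. */
size_t count_char(const char *s, char c)
{
    if (*s == '\0')
        return 0;
    return (*s == c) + count_char(s + 1, c);
}

/* A procedure built from the other two (top-down decomposition):
   the percentage of characters in s that are equal to c. */
int share_of_char_percent(const char *s, char c)
{
    size_t len = text_length(s);
    return len == 0 ? 0 : (int)(100 * count_char(s, c) / len);
}
```

For the string "moss" and the character 's', share_of_char_percent returns 50, because count_char finds 2 occurrences among the 4 characters reported by text_length. Each helper can be re-used elsewhere without copying its code.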

3. What is Algorithm

In mathematics, computer science, and related subjects, an algorithm is an effective method for solving a problem using a finite sequence of instructions. Algorithms are used for calculation, data processing, and many other fields. Each algorithm is a list of well-defined instructions for completing a task. Starting from an initial state, the instructions describe a computation that proceeds through a well-defined series of successive states, eventually terminating in a final ending state. The transition from one state to the next is not necessarily deterministic; some algorithms, known as randomized algorithms, incorporate randomness.

If you sit down in front of a computer and try to write a program to solve a problem, you will be trying to do four out of five things at once. These are:

1. ANALYSE THE PROBLEM

2. DESIGN A SOLUTION/PROGRAM

3. CODE/ENTER THE PROGRAM

4. TEST THE PROGRAM

5. EVALUATE THE SOLUTION

To begin with, we will look at three methods used in creating an algorithm. These are:

STEPPING

LOOPING

CHOOSING
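As a small illustration of these three methods, the C sketch below uses all of them in one function. We assume here that stepping means executing statements in order, looping means repetition, and choosing means selection with a condition. The function name count_digits is our own.

```c
/* Count the decimal digits in s, showing all three constructs. */
int count_digits(const char *s)
{
    int count = 0;                       /* stepping: statements run in order */

    for (int i = 0; s[i] != '\0'; i++)   /* looping: repeat for each character */
        if (s[i] >= '0' && s[i] <= '9')  /* choosing: act only on digits */
            count++;

    return count;
}
```

For the input "a1b22" the loop visits five characters and the condition holds for three of them, so the function returns 3.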

3.1 ALGORITHM OF THE CHARACTER MATCHING

STEP 1: Begin

STEP 2: Take two file names in two pointer variables fn1 and fn2

STEP 3: fopen(fn1)

STEP 4: If fn1 is not opened then print "Cannot open first file" and return; else print "File is open"

STEP 5: c = 0

STEP 6: Repeat steps 6 to 16 as long as !feof(f1)

STEP 7: str1 = fgetc(f1)

STEP 8: f = 0

STEP 9: for i = 0; i [the text of steps 9 to 14 is incomplete in the source; the surviving fragment ends with the condition "if (str1 == str2 and str1 >= 0) then flag = 1, i = i + 1"]

STEP 15: if (flag == 1) then print match

STEP 16: fclose(f1)

STEP 17: fclose(f2)

STEP 18: END
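The character-matching idea can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the project's program: the names count_in_stream and match_characters are hypothetical, the files are passed as already opened streams, and a seen[] table stands in for the list of characters that were already handled. For every distinct character of the first file it counts how often that character appears in the second file.

```c
#include <stdio.h>
#include <string.h>

/* Count how often byte `target` occurs in stream fp, scanning from the start. */
static int count_in_stream(FILE *fp, int target)
{
    int ch, n = 0;

    rewind(fp);
    while ((ch = fgetc(fp)) != EOF)   /* test fgetc's result, not feof() */
        if (ch == target)
            n++;
    return n;
}

/* For every distinct byte in f1, store in counts[byte] how many times it
   occurs in f2. Returns the number of distinct bytes of f1 found in f2. */
int match_characters(FILE *f1, FILE *f2, int counts[256])
{
    unsigned char seen[256] = {0};    /* characters of f1 already handled */
    int ch, matched = 0;

    memset(counts, 0, 256 * sizeof counts[0]);
    rewind(f1);
    while ((ch = fgetc(f1)) != EOF) {
        if (seen[ch])
            continue;                 /* skip characters handled before */
        seen[ch] = 1;
        counts[ch] = count_in_stream(f2, ch);
        if (counts[ch] > 0)
            matched++;
    }
    return matched;
}
```

With a first file containing "abca" and a second file containing "aab", the function returns 2: 'a' appears twice and 'b' once in the second file, while 'c' does not appear. Reading with fgetc into an int and comparing against EOF avoids the extra iteration that a loop controlled by feof() would perform.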



3.2 Algorithm of the String Matching

Step 1: Begin

Step 2: Take two file names in two pointer variables fn1 and fn2

Step 3: fopen(fn1)

Step 4: If fn1 is not opened then print "Cannot open first file" and return; else print "File is open"

Step 5: Repeat steps 6 to 24 as long as !feof(f1)

Step 6: i = 0

Step 7: str1 = NULL

Step 8: ch = fgetc(f1)

Step 9: Repeat steps 10 to 12 as long as ch != ' ' and ch != '\n' and ch != EOF

Step 10: str1[i] = ch

Step 11: ch = fgetc(f1)

Step 12: i = i + 1

Step 13: fopen(fn2)

Step 14: If fn2 is not opened then print "Cannot open second file" and exit

Step 15: Repeat steps 16 to 24 as long as !feof(f2)

Step 16: i = 0

Step 17: str2 = NULL

Step 18: ch = fgetc(f2)

Step 19: Repeat steps 20 to 22 as long as ch != ' ' and ch != '\n' and ch != EOF

Step 20: str2[i] = ch

Step 21: ch = fgetc(f2)

Step 22: i = i + 1

Step 23: If str1 = str2 then print match

Step 24: fclose(f1)

Step 25: fclose(f2)

Step 26: END



27/46

27

4. FLOW-CHART

What is Flow-Chart

A flowchart is a pictorial representation of an algorithm. It is the layout, i

a visual, two-dimensional format, of the plan to be followed when the correspondin

algorithm is converted into a program by writing in a programming language. It acts like

roadmap like a programmer and guides him/her on how to go from the starting point to th

final point while converting the algorithm into a computer program.Flow Chart is the pictorial representation of separate steps o

a process.Using Flow-Chart one can easily design, analyze, prepar

documentation or manage a process running in a system

Why we use Flow-Chart

Normally, an algorithm is first represented in the form of aflowchart and the flowchart is then expressed in some programminlanguage to prepare a computer program. The main advantage of thestwo step approach in programming writing is that while drawing flowchart, a programmer is not concern with the details of the elementof programming language. Hence, he/she can fully concentrate on thlogic of the procedure. Moreover, since a flowchart shows the flow ooperations in pictorial form, any error in the logic of the procedure cabe detected more easily than in the case of a program. Once th



28/46

2

flowchart is ready the programmer can forget about the logic and canconcentrate only on coding the operations in each box of the flowchart interms of the statements of the programming language. This will normallyensure an error-free program.

The symbols used in a Flow-Chart are shown below:

SYMBOL          NAME            DESCRIPTION

Oval            Terminator      To indicate the start or stop of a Flow-Chart

Parallelogram   Input/Output    To take any input or give any output in the Flow-Chart

Rectangle       Process         To represent a running process in the Flow-Chart

Diamond         Decision Box    To make a decision

Circle          Connector       To connect one part of the Flow-Chart to another, or to continue the Flow-Chart from one page to another

4.1 Flow chart of Character Matching


[Figure: flow chart of character matching. Start; input fn1, fn2; c = 0; open fn1; read str1 = fgetc(f1); if str1 is not already in match[], read each str2 = fgetc(f2) and count the occurrences where str1 = str2; print "Match = %c: appeared %d times"; close f2; repeat until end of f1; print "End of the program.."; close f1; Stop.]

4.2 Program of the Character Matching

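#include <stdio.h>
#include <conio.h>

int main(void)
{
    FILE *f1, *f2;
    char fn1[100], fn2[100];
    unsigned char match[256];   /* characters of file 1 already reported */
    int str1, str2;             /* int, so EOF differs from every character */
    int i, c, f;

    clrscr();
    printf("\n\tEnter 1st file name with extension: ");
    scanf("%99s", fn1);
    if ((f1 = fopen(fn1, "r")) == NULL) {
        printf("Cannot open first file.\n");
        getch();
        return 1;
    }
    printf("%s file is opened", fn1);
    printf("\n\tEnter 2nd file name with extension: ");
    scanf("%99s", fn2);
    c = 0;                      /* number of entries used in match[] */

    while ((str1 = fgetc(f1)) != EOF) {
        /* The body of this loop was lost in extraction; it is
           reconstructed from the flow chart in 4.1: a character that
           has already been reported is skipped. */
        f = 0;
        for (i = 0; i < c; i++)
            if (match[i] == str1)
                f = 1;
        if (f == 0) {
            match[c++] = (unsigned char)str1;
            if ((f2 = fopen(fn2, "r")) == NULL) {
                printf("Cannot open second file.\n");
                fclose(f1);
                getch();
                return 1;
            }
            i = 0;              /* occurrences of str1 in the second file */
            while ((str2 = fgetc(f2)) != EOF)
                if (str1 == str2)
                    i++;
            if (i > 0)
                printf("\n\n\tMatch = %c; appeared %d times", str1, i);
            fclose(f2);
        }
    }
    printf("\n\n\t\tEnd of the program.");
    fclose(f1);
    getch();
    return 0;
}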
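The program above re-reads the second file once for every distinct character of the first file. As an alternative, the same report can be produced by reading each file only once into a table of counts. The sketch below is a swapped-in technique, not the method used in this project: the function names count_chars and report_matches are hypothetical, and matches are printed in character-code order rather than in order of first appearance.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: tallies how often each character value occurs
   in fp. counts must have UCHAR_MAX + 1 entries. */
void count_chars(FILE *fp, long counts[UCHAR_MAX + 1])
{
    int ch, k;

    assert(fp != NULL && counts != NULL);
    for (k = 0; k <= UCHAR_MAX; k++)
        counts[k] = 0;
    while ((ch = fgetc(fp)) != EOF)
        counts[ch]++;            /* fgetc returns 0..UCHAR_MAX or EOF */
}

/* Hypothetical helper: for every character present in both tables,
   prints how often it appears in the second one. Returns the number
   of distinct characters matched. */
int report_matches(const long c1[], const long c2[], FILE *out)
{
    int k, matched = 0;

    for (k = 0; k <= UCHAR_MAX; k++) {
        if (c1[k] > 0 && c2[k] > 0) {
            fprintf(out, "Match = %c; appeared %ld times\n", k, c2[k]);
            matched++;
        }
    }
    return matched;
}
```

With this layout each file is read exactly once, however many distinct characters the first file contains.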

4.3 Flow chart of the string Matching


[Figure: flow chart of string matching. Start; input fn1, fn2; open fn1; read characters from f1 into str1 until a space, newline or EOF; open fn2; read characters from f2 into str2 in the same way; if str1 = str2, print "match = %s"; repeat until end of f2; close f2; repeat until end of f1; print "End of the program.."; close f1; Stop.]

4.4 Program of the String Matching

#include <stdio.h>
#include <conio.h>
#include <string.h>

int main(void)
{
    FILE *f1, *f2;
    char fn1[100], fn2[100], str1[256], str2[256];
    int ch, i;                  /* ch is int, so EOF differs from every character */

    clrscr();
    printf("\n\tEnter 1st file name with extension : ");
    scanf("%99s", fn1);
    if ((f1 = fopen(fn1, "r")) == NULL) {
        printf("Cannot open first file.\n");
        getch();
        return 1;
    }
    printf("%s file is opened", fn1);
    printf("\n\tEnter 2nd file name with extension : ");
    scanf("%99s", fn2);

    while (!feof(f1)) {
        /* read the next word of file 1 (ends at space, newline or EOF) */
        i = 0;
        ch = fgetc(f1);
        while (ch != ' ' && ch != '\n' && ch != EOF && i < 255) {
            str1[i++] = (char)ch;
            ch = fgetc(f1);
        }
        str1[i] = '\0';
        if (i == 0)
            continue;           /* skip empty words between delimiters */
        printf("\nfile 1 string : %s", str1);

        if ((f2 = fopen(fn2, "r")) == NULL) {
            printf("Cannot open second file.\n");
            fclose(f1);
            getch();
            return 1;
        }
        while (!feof(f2)) {
            /* read the next word of file 2 in the same way */
            i = 0;
            ch = fgetc(f2);
            while (ch != ' ' && ch != '\n' && ch != EOF && i < 255) {
                str2[i++] = (char)ch;
                ch = fgetc(f2);
            }
            str2[i] = '\0';
            if (i == 0)
                continue;
            printf("\nFile 2 string : %s ", str2);
            if (strcmp(str1, str2) == 0)
                printf("\n\n\tMatch = %s", str2);
        }
        fclose(f2);
    }
    printf("\n\n\t\tEnd of the program....");
    fclose(f1);
    getch();
    return 0;
}
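The program above only prints each match. If a single number is wanted instead, the same word-by-word comparison can be turned into a count. The sketch below is an illustration under assumptions: it works on two texts already held in memory rather than on files, and the function names word_len, contains_word and count_matching_words are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the word starting at s: the characters up to the next
   space, newline, or end of string. */
static size_t word_len(const char *s)
{
    return strcspn(s, " \n");
}

/* Returns 1 if the len characters starting at w occur as a whole word in text. */
static int contains_word(const char *text, const char *w, size_t len)
{
    while (*text != '\0') {
        size_t n = word_len(text);
        if (n > 0 && n == len && strncmp(text, w, n) == 0)
            return 1;
        text += n;
        if (*text != '\0')
            text++;              /* step over the delimiter */
    }
    return 0;
}

/* Counts the words of a that also occur as whole words in b.
   A word repeated in a is counted once per occurrence in a. */
int count_matching_words(const char *a, const char *b)
{
    int matches = 0;

    assert(a != NULL && b != NULL);
    while (*a != '\0') {
        size_t n = word_len(a);
        if (n > 0 && contains_word(b, a, n))
            matches++;
        a += n;
        if (*a != '\0')
            a++;                 /* step over the delimiter */
    }
    return matches;
}
```

For example, comparing "int i = 0" with "int j = 0" counts three matching words: int, = and 0.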

5.1 Necessary hardware

The project is designed to be compatible with a server-based machine running the Windows operating system (Windows XP, Windows 2000 Server). Moreover, the project is computerized because of the well-known benefits of computerized systems.

The hardware requirements for the project are as follows:

Motherboard - Intel Original
Processor - Core 2 Quad
Operating System - Windows 2000 Server
RAM - DDR3 4 GB, 800 MHz
HDD - 1 TB




5.2 Necessary software

Operating System - Windows XP

Software - Turbo C++ compiler: full installation, minimum 5 MB

6. Advantages

Moss (for a Measurement Of Software Similarity) is an automatic system for determining the similarity of programs.

The system allows for a variety of more complicated situations. For example, it allows for a base file. The base file might be a program outline or partial solution handed out by the instructor.

MOSS makes it easy to examine the corresponding portions of a program pair. Clicking on a program pair in the results summary brings up side-by-side frames containing the program sources.

MOSS just as easily uncovers more sophisticated attempts at cheating. Multiple distinct similar sections separated by sections with differences are still found and given color-coded highlighting.

Traditional attempts to detect plagiarism have been ad hoc, typically involving manual checking of programming assignments for plagiarism.

There was a strong need for a more sophisticated mechanism which would automate the task to a large extent as well as detect fairly complex attempts at copying.




7. FUTURE Scope

The future scope of the project is to detect software programs that are very similar to each other. This helps to increase the efficiency of a project and to prevent the duplication of software. Analogy-based software estimation rests on the assumption that similar software projects require similar effort. This assumption is challenged, however, by incomplete and noisy data, uncertainty in measurement and similarity assessment, complex interactions between attributes, and data types on ordinal and nominal scales.

8. Problems

Two projects that seem similar may in fact be different in a critical way. The uncertainty in assessing similarities and differences means that two different estimators could develop significantly different views and effort estimates.

The uncertainty stems from:

Data collection tool.
The type of information available.
Attribute measurement.
Skill of estimator.




807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1375606&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1375606&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1375606&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://doi.acm.org/10.1145/1375581.1375606http://portal.acm.org/citation.cfm?id=776809&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=776809&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=776809&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://dx.doi.org/10.1109/TSE.2003.1199071http://portal.acm.org/citation.cfm?id=533784&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=533784&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=321506&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=321506&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://doi.acm.org/10.1145/321495.321506http://portal.acm.org/citation.cfm?id=906949&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=906949&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=721517&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=721517&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=721517&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=631131&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=631131&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://dx.doi.org/10.1109/32.295895http://portal.acm.org/citation.cfm?id=1094516&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1094516&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://p
ortal.acm.org/citation.cfm?id=1094516&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://dx.doi.org/10.1093/logcom/exi009http://portal.acm.org/citation.cfm?id=146592&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=146592&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=540137&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=540137&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=540137&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1254729&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1254729&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://dx.doi.org/10.1109/FOSE.2007.29http://portal.acm.org/citation.cfm?id=1271341&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1271341&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1271341&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://dx.doi.org/10.1109/ICPC.2007.35http://portal.acm.org/citation.cfm?id=1310458&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1310458&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://dx.doi.org/10.1109/T-C.1972.223539http://portal.acm.org/citation.cfm?id=320583&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=320583&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=320583&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://doi.acm.org/10.1145/320435.320583http://portal.acm.org/citation.cfm?id=1345057&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1345057&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1345057&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=2867
3483http://dx.doi.org/10.1002/smr.344http://portal.acm.org/citation.cfm?id=543292&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=543292&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://dx.doi.org/10.1016/S1383-7621(01)00019-4http://dx.doi.org/10.1016/S1383-7621(01)00019-4http://portal.acm.org/citation.cfm?id=807712&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=807712&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=807712&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1375606&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1375606&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=1375606&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://doi.acm.org/10.1145/1375581.1375606http://portal.acm.org/citation.cfm?id=776809&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=776809&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=776809&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://dx.doi.org/10.1109/TSE.2003.1199071http://portal.acm.org/citation.cfm?id=533784&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483http://portal.acm.org/citation.cfm?id=533784&dl=GUIDE&coll=GUIDE&CFID=85807985&CFTOKEN=28673483



Conclusion

Our project, Measurement of Software Similarity, is developed in C. We first developed a program based on string similarity that checks whether two strings, held in two separate arrays, are similar or not. Once this worked successfully, the same check was applied to two separate files, again using C code.
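
The two stages summarised above can be sketched roughly as follows. This is only an illustrative sketch, not the project's actual code: the function names string_similarity and file_similarity are hypothetical, and the position-by-position matching ratio is an assumed measure chosen for simplicity, since this section does not state which similarity measure was used.

```c
#include <stdio.h>
#include <string.h>

/* Assumed measure: fraction (0.0 to 1.0) of positions at which the two
   strings hold the same character, relative to the longer string. */
double string_similarity(const char *a, const char *b)
{
    size_t len_a = strlen(a), len_b = strlen(b);
    size_t longer = len_a > len_b ? len_a : len_b;
    size_t shorter = len_a < len_b ? len_a : len_b;
    size_t same = 0;

    if (longer == 0)
        return 1.0;                 /* two empty strings are identical */
    for (size_t i = 0; i < shorter; i++)
        if (a[i] == b[i])
            same++;
    return (double)same / (double)longer;
}

/* The same check applied to two files, character by character.
   Returns -1.0 if either file cannot be opened. */
double file_similarity(const char *path_a, const char *path_b)
{
    FILE *fa = fopen(path_a, "r");
    FILE *fb = fopen(path_b, "r");
    long same = 0, total = 0;
    int ca, cb;

    if (fa == NULL || fb == NULL) {
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -1.0;
    }
    ca = fgetc(fa);
    cb = fgetc(fb);
    while (ca != EOF || cb != EOF) {
        if (ca == cb)               /* both files still have data and agree */
            same++;
        total++;
        if (ca != EOF) ca = fgetc(fa);
        if (cb != EOF) cb = fgetc(fb);
    }
    fclose(fa);
    fclose(fb);
    return total == 0 ? 1.0 : (double)same / (double)total;
}
```

For example, under this assumed measure string_similarity("abcd", "abxd") gives 0.75, because three of the four positions match. A positional comparison like this is sensitive to insertions and deletions, so it should be read as a starting point only.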


