http://www.iaeme.com/IJMET/index.asp 649 [email protected]
International Journal of Mechanical Engineering and Technology (IJMET)
Volume 8, Issue 11, November 2017, pp. 649–662, Article ID: IJMET_08_11_066
Available online at http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=11
ISSN Print: 0976-6340 and ISSN Online: 0976-6359
© IAEME Publication Scopus Indexed
STUDENT PERFORMANCE PREDICTION
USING SVM
K. B. Eashwar, R. Venkatesan
Assistant Professor, CSE, SASTRA University,
Srinivasa Ramanujan Center, Kumbakonam, India
D. Ganesh
Final-year student, M.Sc. Computer Science
ABSTRACT
Educational data mining is an essential and inevitable process in the vast environment of education. Initially, data mining methods were applied in this area with a limited number of parameters, because of the poor maintenance of student records in the respective institutions. After the sudden leap of technology into various aspects of human life, the dimension of this field changed: institutions are now equipped with highly efficient technical components to maintain their data, and the amount of data stored in Educational Databases (EDB) is increasing rapidly. At any time, institutions face problems such as poor student performance, students leaving the programme midway due to the complexity of the curriculum, financial problems, psychological problems, lack of support from parents, and so on. This work concentrates on post-graduate students, because the research status at the PG level is currently low compared to other parts of the world. The aim is to raise the standard and to identify whether a selected individual needs additional care or sharpening of his/her research abilities. According to a report of the MHRD of India, among Asian countries the contribution of India is 25%, which is low compared to other parts of the world; moreover, the percentage of students in India moving from the Master's degree to the Ph.D. level is 0.3%, which is extremely low. The prediction process concentrates on student performance with various parameters, and the students were categorized as high, medium, and low. For this process, we combined two techniques: (i) Support Vector Machine (SVM) for classification, and (ii) the k-means algorithm for clustering.
Keywords: EDB, SVM, k-Means, Classification, Clustering, MMH, Hyperplane
Cite this Article: K. B. Eashwar, R. Venkatesan and D. Ganesh, Student Performance Prediction Using SVM, International Journal of Mechanical Engineering and Technology, 8(11), 2017, pp. 649–662.
http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=8&IType=11
1. INTRODUCTION
Data mining plays a vital role in the business world, and it is also used in academics to predict and make decisions related to students' proficiency status. In modern days, the amount of information stored in the Educational Database (EDB) is expanding quickly. A student's academic performance is influenced by many variables. It is essential to develop a predictive data mining model for forecasting students' performance, in order to distinguish high-level learners from moderate-level learners within a group of students. The input data are gathered from the students, so that the data can be organized and experimented on in the right direction. Based on the results, the student details were classified. The k-means clustering algorithm was applied first (based on academic records); as a result, we obtain n student records, on which Support Vector Machine (SVM) classification was applied and the prediction model was constructed.
2. BACKGROUND STUDY
According to Oyelade et al., the k-means clustering algorithm is a simple and efficient technique to monitor the progression of a student's performance in higher education. Based on their scores, students are grouped into various clusters (using k-means, fuzzy c-means, etc.), with each cluster denoting a different level of performance (high, moderate, low). By knowing the number of students in each cluster, we can determine the average performance of a class as a whole [1]. According to Carlos Villagrá-Arnedo et al., the system used the Random Forest technique to model data coming from previous years of the Standard Admission Test (SAT). While their results are good, the method designed was neither meant to be maintained over time, nor to make progressive predictions based on incremental information [2].
Saurabh Pal et al. suggested that the Naïve Bayes classifier technique is particularly suited when the dimensionality of the inputs is high. Despite its simplicity, Naïve Bayes can often outperform more sophisticated classification methods. The Naïve Bayes model identifies the characteristics of dropout students: it shows the probability of each input attribute for the predictable state. A Naïve Bayesian classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naïve) independence assumptions [3]. However, dealing with continuous-valued attributes requires a Gaussian distribution assumption, which in turn increases the complexity of the task; these continuous-valued attributes may also lead to inaccurate assumptions.
M. Ramaswami and R. Bhaskaran stated the main objective of feature selection as choosing a subset of input variables by eliminating features that are irrelevant or possess no predictive information [4]. Guodong Zhao et al. expressed interest in various feature selection methods, which are roughly divided into three types: (i) embedded, (ii) wrapper, and (iii) filter methods. Embedded and wrapper methods are classifier-dependent, i.e., they evaluate the features using a learning algorithm [5].
Mrinal Pandey et al. described filter methods, which are generally subdivided into two classes: ranking and subset selection. The methodology adopted for their research starts with data collection, followed by initial pre-processing, attribute selection, and class balancing [6]. Zhiyun Ren et al. focused on two methods to predict student performance, namely regression-based methods and matrix-factorization-based methods; both use student academic records (especially course grades) [7].
Chih-Wei Hsu et al. describe SVM (Support Vector Machines) as a very useful technique for data classification. Although SVM is considered easier to use than neural networks, users unfamiliar with it often get unsatisfactory results at first [8]. J. K. Jothi Kalpana et al. argued that the main objective of higher education institutes should be to provide quality education to their students and to improve the quality of managerial decision-making. One way to achieve the highest level of quality in a higher education system is by discovering knowledge from educational data to study the main attributes that may affect students' performance. By utilizing these methods, numerous types of knowledge can be found, for example with k-means and Gaussian models [9]. Mikko Vinni et al. suggest that the problem contains so many uncertain or unknown factors that we cannot classify the students deterministically into two mutually exclusive classes. Rather, we should use additional class values (e.g., the mastering level is good, average, or poor) or estimate the class probabilities. The student profile contains too many attributes for building accurate classifiers, so we should select only the most influential factors for prediction purposes [10].
Pooja Thakar et al. generalized the data mining methods: methods such as pre-processing, feature selection, clustering, and classification can be combined, so that the predicted results can easily be shown via graphs and in a consistent manner [12]. P. Usha et al. deal with Support Vector Machines, decision trees, feature extraction and selection, and genetic algorithms, which are the major methods used in predicting student performance with multiple classifiers; the main objective of their work is to achieve a higher level of accuracy in predicting students' performance [13]. Hector M. Romero Ugalde et al. note that neural networks are suitable for modeling complex nonlinear systems when the plant is considered as a black box. If low-order reference models are used, the number of parameters to be computed will be rather small; to address this problem, much work remains to be done to derive models with balanced accuracy, complexity, and computational cost [15].
2.1. Scope of related work
This paper concentrates on the performance level of post-graduate students. In order to expose the real situation in higher studies, the data have been collected only from that set of students.
3. METHOD OF PREDICTION
The proposed method for predicting student performance comprises two phases, namely (i) a training phase and (ii) a testing phase. At the initial stage of the work, certain preliminary tasks were performed. The dataset was collected from students at both the junior and senior levels of post-graduate programmes. Questionnaires were prepared covering the details that form each student's profile, with selected variables such as the consolidated internal mark in each subject, CGPA (Cumulative Grade Point Average), father's income (or mother's income in the father's absence), whether the student opted for this programme out of interest or under pressure, sports activity, living environment, family support for his/her studies, whether he/she is working part-time, the college/university environment, resources for the programme, friends circle, interest in academic-related activities, etc.
The collected information about the students was then uploaded. There always exists the possibility of samples that do not satisfy the criteria for the maximum number of variables, or that carry irrelevant values unsuitable for the present conditions of the work. Such data samples are considered irrelevant and should always be omitted from the prediction of student performance for a whole class. So pre-processing was performed on the data samples.
3.1. Preprocessing
Data pre-processing is an important step in the data mining process. If much irrelevant and redundant information, or noisy and unreliable data, exists in the samples, knowledge discovery during the training phase will generate unexpected results. This step removes noise, for example by stemming words. By pre-processing the student data and applying a suitable data mining method, the dataset was cleansed, resulting in the expected knowledge. The discovered knowledge is used to provide constructive recommendations to overcome the problem of low grades of graduate students and to improve students' academic performance at the earliest opportunity. Pre-processing jobs were applied to prepare all the previously described information so that the clustering task could be carried out effectively. During this procedure, pupils without 100% complete data were considered out of the process. A few changes were also made to the values of some attributes.
For example: Name, Register No., Contact, Address, Email, Marks 1, 2, 3, 4, personal details, etc.
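The omission of incomplete records described above can be sketched as follows. This is a minimal illustration, not the authors' tool; the field names (cgpa, internal_marks, attendance_pct) and values are invented for the example.

```python
# Keep only records that are 100% complete on the predictive fields, and drop
# identifying fields (name, contact details) that play no role in prediction.
RELEVANT_FIELDS = ["cgpa", "internal_marks", "attendance_pct"]

def preprocess(records):
    """Return complete records projected onto the predictive fields only."""
    cleaned = []
    for rec in records:
        values = [rec.get(f) for f in RELEVANT_FIELDS]
        if any(v is None for v in values):   # incomplete sample: omit it
            continue
        cleaned.append({f: rec[f] for f in RELEVANT_FIELDS})
    return cleaned

raw = [
    {"name": "A", "cgpa": 8.1, "internal_marks": 78, "attendance_pct": 92},
    {"name": "B", "cgpa": 6.4, "internal_marks": None, "attendance_pct": 80},
]
print(preprocess(raw))  # only student A survives pre-processing
```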
3.2. Feature Selection
Feature extraction starts from an initial set of measured data and builds derived values
(features) intended to be informative and non-redundant, facilitating the subsequent learning
and generalization steps, and in some cases leading to better human interpretations. Feature
selection algorithms can be used to identify which feature has the greatest effect on the output
variable (academic status). The objective is to resolve the problem of high dimensional data
by reducing the number of attributes without losing reliability in classification. Feature
selection techniques have been used to choose a subset of variables and eliminate others that
could be irrelevant or of no predictive information and therefore could prevent the classifiers
from reaching a good accuracy. A feature selection process can also be used to remove terms
in the training documents that are statistically uncorrelated with the class labels. This will
reduce the set of terms to be used in classification, thus improving both efficiency and
accuracy.
For example: having interest in social activities (yes or no), showing involvement in projects (yes or no), non-involvement in institution functions, and similar terms.
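A ranking-style feature selection, as discussed above, can be sketched by scoring each candidate feature by the absolute Pearson correlation of its values with the target (academic status) and keeping the strongest. The feature names and values here are illustrative assumptions, not the authors' dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# target: 1 = good academic status, 0 = at risk (hypothetical labels)
status = [1, 1, 1, 0, 0, 0]
features = {
    "internal_marks": [85, 80, 78, 50, 45, 40],  # strongly predictive
    "likes_sports":   [1, 0, 1, 1, 0, 1],        # weakly related
}

# rank features by absolute correlation with the target
ranked = sorted(features, key=lambda f: abs(pearson(features[f], status)),
                reverse=True)
print(ranked)  # internal_marks ranks first
```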
3.3. Clustering
Clustering is a procedure for arranging information (or items) into an order/topology of meaningful sub-classes, called groups. This step helps users comprehend the natural grouping or structure in a data set, and a clustering algorithm (k-means) can be implemented to state the features. This algorithm clusters the data into k groups (student groups), where k is predefined, and selects k points at random as cluster centers (the totals of examination results obtained by the students). It then assigns objects to their closest cluster center according to the Euclidean distance function. Finally, it calculates the centroid, or mean, of all objects (student samples) in each cluster.
For example: TOT_INT1, TOT_INT2, TOT_INT3, TOT_INT4 (internal marks for various subjects), TOT_INT for ADS_LAB, TOT_INT for JAVA_LAB (sample courses).
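The clustering step can be sketched with scikit-learn's KMeans; the paper does not name a software package, so scikit-learn is assumed here as one common choice, and the mark values are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# each row: [TOT_INT1, TOT_INT2, TOT_INT3] for one student (hypothetical marks)
marks = np.array([
    [92, 88, 95], [90, 91, 89],   # likely "high" performers
    [70, 68, 72], [65, 74, 70],   # likely "medium"
    [40, 45, 38], [42, 39, 44],   # likely "low"
])

# k = 3 groups, matching the high/medium/low performance states
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(marks)
print(km.labels_)           # cluster index (0/1/2) assigned to each student
print(km.cluster_centers_)  # mean marks of each performance group
```

With well-separated marks like these, students with similar totals land in the same cluster, which is then handed to the classification phase.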
3.4. Classification
In this step, the classification algorithm (SVM) was implemented to classify the data, and finally the status of the students is provided. Test and training records were collected from the previous process and used for SVM classification. The data were drawn from the database, and some of the training records were also used for testing. The SVM classifier function was applied over the data sets and the classification details were obtained. These details were further analysed to produce the results, and the accuracy of the prediction was determined for the overall functionality. Finally, the students were categorized based on the given conditions.
For example: internal marks for individual theory courses, internal lab marks, whether he/she is involved in sports or department activities, whether he/she experiences psychological pressure from the college or family environment, etc.
The above mentioned steps are shown in the following architecture diagram:
Figure 1 Phases of Prediction System
3.5. Algorithms
As we mentioned in the previous part, the entire work was accomplished by using two
techniques namely, clustering and classification. To implement these techniques two
algorithms were used. K-Means algorithm was used for clustering and Support Vector
Machine (SVM) for classification process. Now those two algorithms are explained as
follows:
3.5.1. Algorithm 1: K-Means (Clustering):
The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. A cluster is a collection of data objects that are similar to one another within the same group and dissimilar to the objects in other groups. A cluster of data objects can be treated collectively as one group, and so may be considered a form of data compression.
These objects are organized as k clusters (k < n, or k = n). Cluster similarity is measured with regard to the mean value of the objects in a cluster, which can be viewed as the cluster's centroid or center of gravity.
The k-means algorithm proceeds as follows. First, it arbitrarily chooses k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges. For each object in each cluster, the distance from the object to its cluster center is squared, and the distances are summed. This criterion tries to make the resulting k clusters as compact and as separate as possible. The square-error criterion is defined as
E = \sum_{i=1}^{k} \sum_{p \in C_i} |p - m_i|^2        (1)
Where,
E – the sum of the squared error for all objects in the data set
p – a point in space representing a given object
m_i – the mean of cluster C_i
C_i – the i-th cluster
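Equation (1) can be computed directly: for each cluster, sum the squared distance of every point to the cluster mean. A pure-Python sketch with invented one-dimensional marks:

```python
def sse(clusters):
    """Square-error criterion E of Eq. (1) for a list of clusters."""
    total = 0.0
    for points in clusters:
        m = sum(points) / len(points)              # cluster mean m_i
        total += sum((p - m) ** 2 for p in points) # sum of squared distances
    return total

# three hypothetical clusters of students' marks (high / medium / low)
clusters = [[90, 92, 94], [60, 64], [40, 42, 44]]
print(sse(clusters))  # E = 8 + 8 + 8 = 24.0
```

k-means stops when further reassignment no longer reduces this value.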
3.5.1.1. Working way of K-Means algorithm:
According to our problem, we predict student performance in three categories, so let us assume k = 3 and partition the students into three clusters. Initially, we arbitrarily choose three students' performance states (for high, medium, low) as the three initial cluster centers. Figure 2 helps us understand how the k-means algorithm partitions the group of students. In the diagram, the three cluster centers are denoted by "+". Each object is assigned to the cluster whose center is nearest to it. Such a distribution forms clusters encircled by dotted curves, as shown in Figure 2(a).
Next, the cluster centers (performance states) are updated: the mean value of each cluster is recalculated based on the current objects in the cluster. Using the new cluster centers, the objects are redistributed to the clusters whose centers are nearest. This redistribution forms new student cluster groups, encircled by dashed curves, as shown in Figure 2(b).
This process iterates, leading to Figure 2(c). The process of iteratively reassigning objects to clusters to improve the partitioning is referred to as iterative relocation. Eventually, no redistribution of the objects occurs in any cluster, and the process terminates. The resulting student clusters are returned by the clustering process [16].
Figure 2 The k- Means partitioning algorithm.
The results obtained in the clustering phase are then forwarded to the classification phase, which is discussed next.
3.5.2. Algorithm 2: Support Vector Machine (SVM) (Classification):
SVM is a promising method for classifying both linear and non-linear data. It uses a non-linear mapping to transform the original training data into a higher dimension. Since we defined each individual student with multiple variables, each of them is treated as a multi-dimensional object. Within this new dimension, SVM searches for the linear optimal separating hyperplane (that is, a "decision boundary" separating the students of one class from another). Data from two classes can always be separated by a hyperplane (H1 and H2). The SVM finds this hyperplane using support vectors ("critical" training data samples) and margins (large and small margins, which are defined by the support vectors).
Advantages:
Many data analysts argue that this method performs very slowly during its training phase, but its accuracy is very high, especially for a small number of support vectors, which is independent of the high dimensionality of the objects. Thus, SVM is a highly suitable method for classifying a meagre amount of training samples with numerous parameters. When we compare every selected parameter of one student with other students to predict the performance category into which those students fall, we can use a non-linear approach. SVM is also much less prone to over-fitting than other methods.
Since we had already clustered the students under three different performance measures, it was easy to classify the groups further, with n constraints.
3.5.2.1. SVM- Linearly Separable:
So far, the existing approaches have concentrated on classifying the sample data set into two major classes, either ("yes" and "no") or ("1" and "0"), because they use linearly separable samples. When the dimension of the objects (students) is increased, the SVM searches for the maximum marginal hyperplane (MMH). There is an infinite number of separating lines among the objects; among them, the best separator, the one with minimum classification error, has to be found.
The training samples that fall on the margin hyperplanes are known as support vectors; they are the samples closest to the MMH. SVM is best suited to classifying small data sets, fewer than 2000 training samples. Any software package can be used to find the support vectors and the MMH. This is the reason behind the selection of SVM for this work, with a sample count of one hundred students.
A Lagrangian formulation is used to define the MMH; it can be referred to in [16]:
d(X^T) = \sum_{i=1}^{l} y_i \alpha_i X_i \cdot X^T + b_0        (2)
Where,
1. y_i is the class label of support vector X_i;
2. X^T is a test tuple;
3. α_i and b_0 are numeric parameters determined automatically by the optimization (SVM) algorithm; and
4. l is the number of support vectors.
As already mentioned, when we use SVM the dimensionality of the student data is not important; only the number of support vectors matters. In short, the work is independent of the data dimension, and a well-generalized result can be produced even when the dimensionality of the data samples is high.
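Equation (2) can be evaluated directly once the support vectors are known: the side of the MMH on which a test tuple falls is given by the sign of the sum. The support vectors, alphas, and b_0 below are invented to illustrate the arithmetic, not values from a trained model.

```python
def decision(x_test, support_vectors, labels, alphas, b0):
    """d(X^T) of Eq. (2): sum over the l support vectors plus the bias b0."""
    d = b0
    for x_i, y_i, a_i in zip(support_vectors, labels, alphas):
        dot = sum(p * q for p, q in zip(x_i, x_test))  # X_i . X^T
        d += a_i * y_i * dot
    return d

sv     = [[2.0, 1.0], [0.0, 1.0]]  # support vectors X_i (hypothetical)
labels = [+1, -1]                  # class labels y_i
alphas = [0.5, 0.5]                # Lagrange multipliers alpha_i
b0     = -0.5

print(decision([3.0, 1.0], sv, labels, alphas, b0))   # 2.5, positive side
print(decision([-1.0, 1.0], sv, labels, alphas, b0))  # -1.5, negative side
```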
3.5.2.2. SVM-Linearly Inseparable:
We cannot always expect to have yes-or-no classes; sometimes we may need to classify the samples into more than two classes. In our work we obtained three classes of students, classified into high, medium, and low performance states, which is actually a non-linear case. It is handled in two steps: (i) the original input data is transformed into a higher-dimensional space, and (ii) in that space, a search is conducted for a linearly separating hyperplane. But this process is costly, and it is difficult to find a non-linear mapping to a higher-dimensional space, so a kernel function is applied to the original input data. A user-specified upper bound is also used, such as the maximum value for the internal marks and CGPA (the two essential parameters for predicting the performance of the students).
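The two steps above can be sketched with scikit-learn's SVC, where the kernel function replaces the explicit mapping and the user-specified upper bound appears as the regularization parameter C. The XOR-like points below are a standard textbook example of data no straight line can separate; they are not the authors' data.

```python
import numpy as np
from sklearn.svm import SVC

# XOR labelling: no linear boundary in the original 2-D space separates them
X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y = [0, 0, 1, 1]

# the RBF kernel implicitly maps to a higher-dimensional space where a
# separating hyperplane exists; C is the user-specified upper bound
clf = SVC(kernel="rbf", C=10.0, gamma=2.0).fit(X, y)
print(clf.predict(X))
```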
Advantages of SVM in non-linear environment:
SVM usually finds a global solution, whereas other approaches such as neural networks can get stuck in local minima. Another important advantage of SVM is that it is well suited to the multi-class case, where the classification has to produce a result with more than two classes. Thus, it was appropriate to go ahead with the student performance prediction process [16].
4. EXPERIMENTS AND RESULTS
The aim of this work is to identify the students who are at the edge of the performance level in a post-graduate programme. For that purpose, we collected input data from students of both the first year and the second (final) year. Since the number of students is always smaller than at the under-graduate level, the available data sample count was also limited, to one hundred students. The students' personal information was collected from their database. The attendance status was also collected from the same student database, because this parameter plays a vital role in determining the performance of a student. Every institution/university imposes the constraint that each of its students should attend at least a minimum margin of tutorial hours for every subject. Suppose a student is not attending classes regularly because of a high level of involvement in other academic activities: with high probability, this low attendance will affect his/her academic performance, so such students should be warned by their respective tutors to keep their attendance in the "safe" zone. These details were collected from the students through questionnaires; the format of the questionnaire is given below. These data are the input to the system.
1) Model of a Questionnaire:
Your Name:
Father's Name:
Father's Occupation:
Mother's Name:
Mother's Occupation:
ADDRESS:
DOB:
Email id:
Gender:
Age:
Marital status:
PG Course:
Year of Passing (PG):
College (PG):
UG Course:
Year of Passing (UG):
College (UG):
SGPA:
CGPA:
Arrear? Yes No
From the input data, only a selected set of samples was retained through the pre-processing step. Variables that are not relevant for the prediction, for example DOB and e-mail ID, are not considered, since they are not part of any equation for partitioning or classification. Sometimes the collected data values may not be suitable, or may not have been provided; all samples of this type are excluded from consideration. Many parameters, such as whether the parents are educated, may improve the performance state of the student indirectly, but they play a very small role here, so they too were omitted.
Figure 3 List of phases of the tool
The processes should be executed in the order given in Figure 3. The pre-processing step, as already mentioned, removed the noisy and irrelevant data samples that were not fully complete; it is shown in Figure 4. The filtered samples are then ready for clustering. Before clustering starts, the features of the data samples have to be selected; the reason for this is that certain parameters have a direct impact on the performance of the student, and selecting them ensures the expected accuracy level of the system. The feature selection process is shown in Figure 5.
Figure 4 Pre-processing stage.
Figure 5 Feature Selection
Once the feature selection process was done, the data samples were sent to the clustering phase. Here, the center of a group was selected randomly from the CGPA, the consolidated internal marks for each subject, and the attendance percentage of a student; these three values are important in deciding the attitude of a student. To assign samples to centers we used the Euclidean distance. For an n-dimensional space, the distance is defined as

d(p, q) = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2 + \cdots + (p_n - q_n)^2}        (3)

where p and q are the points between which the similarity or dissimilarity is measured.
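Equation (3) can be computed directly; a short sketch with invented points p and q:

```python
from math import sqrt

def euclidean(p, q):
    """Euclidean distance of Eq. (3) between n-dimensional points p and q."""
    return sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# e.g. distance between one student's (CGPA, internal mark, attendance %)
# vector and a cluster centre (values illustrative)
print(euclidean([8.0, 90.0, 95.0], [7.0, 85.0, 90.0]))  # sqrt(51) ~ 7.14
```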
Figure 6 Clustering the samples
Then the clustered data samples were passed on to the SVM classifier, where additional information is added to the features of the samples. We considered the parameters that would affect the student's present state in the partitioning. Sometimes a student who had been considered normal could be moved to high, because of his overall performance across all areas of the academic programme. Here, parameters such as participation in sports, social activities, and department activities were used, and it was checked how the students performed in examinations even though they participated in these kinds of activities. Suppose, for instance, that a student who did well in sports, NCC, and cultural activities also obtained a CGPA of 7.0 or above; then he/she would be moved to the HIGH class even though other students obtained a higher CGPA. The SVM classification process is exhibited in Figure 7.
Figure 7 SVM Classification
5. ANALYSIS
When a prediction is made for a single student or for a whole class, it should not deviate from the original accuracy score. The tool designed for this work predicted the results with 96.7% accuracy, which matches the overall assessment done by the tutors of both classes. The remaining deviation is due to the change in the counts of the HIGH, NORMAL, and LOW groups once they were classified with SVM. For example, the graphs in Figure 8 and Figure 9 show that the juniors had a higher count than the seniors in the HIGH class, because more of them were participating in academic activities while also obtaining good results in their examinations.
Figure 8 Results of Classifier on Seniors
Figure 9 Results of Classifier on Juniors
As mentioned in the previous section, if a student concentrates heavily on sports, his/her performance in other activities and academics may be compromised. The graphs in Figure 10, Figure 11, and Figure 12 show the results of students' individual achievement.
Figure 10 Student under LOW classification
Figure 11 Student under Normal classification
Once the training phase was over, we tested the tool with a sample data set; the results are shown in the following graph (Figure 12):
Figure 12 Result of Testing sample
Both individual and class-wise data samples were run through the tool, and the results were compared against the original data set.
6. CONCLUSIONS
The data samples were collected directly from the tutors and students and fed into the system for partitioning and classification. We implemented the rules defined in the SVM algorithm to predict the students' final grades, and the results were checked against the original set of data. Some of the samples were omitted because of irrelevant answers or unfilled fields; this may even reduce the accuracy, since the sample count was reduced. As a future enhancement, it is planned to add more parameters at the clustering level itself. We are still lacking in finding the reason behind the performance loss of a student or a class. We can include psychological parameters, social-behaviour parameters, family circumstances, and parental-care-related parameters. This set of parameters would increase the accuracy and help to reason out the lack of performance of students, so that we can give them a helping hand and guide them on the "right" path, since that is our duty as teachers.
ACKNOWLEDGEMENT
We would like to thank all the tutors, students and staff of Department of CSE, SRC,
SASTRA, who gave us support and encouragement to accomplish this work.
REFERENCES
[1] Oyelade, O. J., Oladipupo, O. O. and Obagbuwa, I. C., Application of k-Means Clustering Algorithm for Prediction of Students' Academic Performance, (IJCSIS) International Journal of Computer Science and Information Security, Vol. 7, No. 1, 2010.
[2] Carlos Villagrá-Arnedo, Predicting Academic performance from Behavioural and
Learning data, Int. J. of Design & Nature and Ecodynamics. Vol. 11 No. 3 (2016) 239-
249.
[3] Dr. Saurabh Pal, Mining Educational Data Using Classification to Decrease Dropout Rate
of Students, International Journal Of Multidisciplinary Sciences and Engineering, Vol. 3,
No. 5, May 2012.
[4] M. Ramaswami and R. Bhaskaran, A Study on Feature Selection Techniques in
Educational Data Mining, Journal of Computing, Volume 1, Issue 1, December 2009,
ISSN: 2151-9617.
[5] Guodong Zhao and Jing Bai, Effective feature selection using feature vector graph for
classification, Journal of Computing-Science Direct(2015).
[6] Mrinal Pandey, S. Taruna, Towards the integration of multiple classifiers pertaining to the
Student‟s performance prediction, Journal of Computing-Science Direct(2016).
[7] Zhiyun Ren ,George Karypis, Predicting Student Performance Using Personalized
Analytics Predicting Student Performance Using Personalized Analytics, IEEE(April
2016).
[8] Chih-Wei Hsu, Chih-Chung Chang and Chih-Jen Lin, A Practical Guide to Support Vector Classification, Technical Report, National Taiwan University, 2003.
[9] J.K. Jothi Kalpana, Intellectual Performance Analysis of Students by Using Data Mining
Techniques, International Journal of Innovative Research in Science, Engineering and
Technology Volume 3, Special Issue 3, March 2014.
[10] Wilhelmiina Hamalainen and Mikko Vinni, Comparison of Machine Learning Methods
for Intelligent Tutoring Systems, Springer-Verlag Berlin Heidelberg 2006.
[11] Andreas Jahn and Andreas Zell, Interpreting linear support vector machine models with
heat map molecule coloring, Journal of Cheminformatics 2011.
[12] Pooja Thakar, Performance Analysis and Prediction in Educational Data Mining: A
Research Travelogue, International Journal of Computer Applications (0975 – 8887)
Volume 110 – No. 15, January 2015.
[13] P. Usha, Predicting Student Performance Using Genetic and SVM Classifier, International
Journal Of Computer Engineering, July-Dec 2011, Volume 3, Number 2, pp. 97-102.
[14] Gabriel Barata and Sandra Gama, Early Prediction of Student Profiles Based on
Performance and Gaming Preferences, IEEE Transactions on Learning Technologies, Vol.
9, No. 3, July-Sep 2016.
[15] Hector M. Romero Ugalde, Computational cost improvement of neural network models in
black box nonlinear system identification, Journal – Science Direct (Neuro Computing)
2015.
[16] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann Publishers, 2006.
[17] Dr. T. Arumuga Maria Devi and S. Mariammal, SVM Based Performance of IRIS
Detection, Segmentation, Normalization, Classification and Authentication Using
Histogram Morphological Techniques, International Journal of Computer Engineering and
Technology, 7(4), 2016, pp. 1–11.
[18] Dr. M. Renuka Devi and S. Sridevi, Short-Term Wind Power Forecasting and
Predominant Wind Direction Using SVM Kernel Function, International Journal of Civil
Engineering and Technology (IJCIET) Volume 8, Issue 7, July 2017, pp. 256–263.
[19] Sandip S. Patil and Asha P. Chaudhari, Classification of Emotions from Text Using Svm
Based Opinion Mining, International Journal of Computer Engineering & Technology
(IJCET), Volume 3, Issue 1, January- June (2012), pp. 330-338
[20] V.Anandhi and Dr.R.Manicka Chezian, Comparison of the Forecasting Techniques –
Arima, Ann and Svm - A Review, International Journal of Computer Engineering &
Technology (IJCET), Volume 4, Issue 3, May-June (2013), pp. 370-376