
Research Article
MOOC Dropout Prediction Using a Hybrid Algorithm Based on Decision Tree and Extreme Learning Machine
Mathematical Problems in Engineering, vol. 2019, Article ID 8404653, 11 pages. https://doi.org/10.1155/2019/8404653

Jing Chen,1 Jun Feng,1 Xia Sun,1 Nannan Wu,1 Zhengzheng Yang,1 and Sushing Chen2

1School of Information Science and Technology, Northwest University, Xi'an, China
2Computer Information Science and Engineering, University of Florida, Gainesville, FL, USA

Correspondence should be addressed to Xia Sun; raindy@nwu.edu.cn

Received 1 November 2018; Revised 14 January 2019; Accepted 4 February 2019; Published 18 March 2019

Academic Editor: Eric Lefevre

Copyright © 2019 Jing Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Massive Open Online Courses (MOOCs) have boomed in recent years because learners can arrange learning at their own pace. High dropout rate is a universal but unsolved problem in MOOCs. Dropout prediction has received much attention recently. A previous study reported the problem of learning behavior discrepancy leading to a wide range of fluctuation of prediction results. Besides, previous methods require iterative training, which is time intensive. To address these problems, we propose DT-ELM, a novel hybrid algorithm combining decision tree and extreme learning machine (ELM), which requires no iterative training. The decision tree selects features with good classification ability. Further, it determines enhanced weights of the selected features to strengthen their classification ability. To achieve accurate prediction results, we optimize the ELM structure by mapping the decision tree to ELM based on the entropy theory. Experimental results on the benchmark KDD 2015 dataset demonstrate the effectiveness of DT-ELM, which is 12.78%, 22.19%, and 6.87% higher than baseline algorithms in terms of accuracy, AUC, and F1-score, respectively.

1. Introduction

MOOCs emerged as the natural solution for offering distance education, with online learning changing enormously over the past years. MOOCs are widely used because of their potentially unlimited enrollment, lack of geographical limitations, free accessibility for the majority of courses, and structural resemblance to traditional lectures [1, 2]. Simply put, they allow learners to learn anytime and anywhere at their own pace. With MOOCs booming in popularity [3, 4], the number of enrolled participants increased rapidly from 8 million in 2013 to 101 million in 2018 [5, 6].

However, one critical problem that should not be neglected is that only an extremely low percentage of participants complete courses [7–10]. Meanwhile, due to the high learner-to-instructor ratio in online learning environments [8], it is unrealistic for instructors to track the learning behaviors that result in dropout or retention. Many educational institutions would benefit from accurate dropout prediction: it helps to improve course design, content, and teaching quality [11–13]. On the other hand, it also helps instructors supply learners with effective interventions, such as personalized recommendations of educational resources and guiding suggestions.

Dropout prediction has recently received much attention. Previous studies applied traditional machine learning algorithms to it, including logistic regression [14–18], support vector machines [19], decision trees [20], boosted decision trees [2, 21, 22], and hidden Markov models [23]. However, these suffer from low accuracy, leading to misidentification of at-risk learners, i.e., those who may quit courses.

Most recently, deep learning has become the state-of-the-art machine learning technique, with great potential for dropout prediction [24]. Jacob et al. applied a deep, fully connected feedforward neural network which capitalized on nonlinear feature representations automatically. Fei et al. utilized a recurrent neural network model with long short-term memory (LSTM) cells, which encoded features into continuous states [25].


Although deep learning achieves higher accuracy than traditional machine learning methods, it should be noted that deep neural networks need iterative training and a large amount of training data.

Moreover, due to the design discrepancy of MOOC platforms, current research utilizes different learning behaviors in dropout prediction [26]. The lack of a uniform definition and understanding of learning behaviors in online learning environments [27, 28] leads to inconsistent conclusions about which behavior features have better classification ability, and prediction results fluctuate widely as a result of this learning behavior discrepancy. Feature selection is therefore essential in dropout prediction. Nevertheless, little attention has been devoted to it, and most related studies utilize as many features as possible. The genetic algorithm is one of the most commonly used feature selection methods, with good scalability and easy combination with other algorithms [29, 30]. However, it needs iterative training.

The goal of our approach is to incorporate feature selection and fast training to realize accurate dropout prediction. To address feature selection, we adopt the decision tree algorithm due to its tree structure and theoretical basis. Further, the selected features are enhanced with different weights depending on the decision tree structure. The aim is to strengthen the features with good classification ability.

To realize fast training, we choose the ELM algorithm for dropout prediction. ELM is a single-hidden-layer feedforward neural network which improves on gradient-based algorithms and requires no parameter updating through repeated iterations [31, 32]. However, a theoretical rule guiding the determination of the ELM structure is lacking, and different structures lead to different prediction results.

To achieve accurate dropout prediction results, we map the decision tree structure to the ELM structure based on entropy theory. The mapping rule takes full account of the impact of internal nodes on leaf nodes in the decision tree. It determines not only the neuron numbers of each layer in ELM but also the connections between the input layer and the hidden layer. In this way, reasonable information assignment is realized at the initial stage of ELM.

In line with common practice in dropout prediction, we extract behavior features from raw learning records. Unlike past approaches, feature selection and enhancement are realized by the decision tree. Then the decision tree is incorporated with ELM to realize fast training and accurate prediction. Meanwhile, it is noteworthy that we utilize the same tree structure to solve these different problems. The core of our proposed algorithm is the design of the mapping rule that determines the structure of ELM.

The main contributions of this paper can be summarized as follows. Firstly, we define and extract several interpretive behavior features from raw learning behavior records. Secondly, we propose a novel hybrid algorithm combining decision tree and ELM for dropout prediction. It solves the problems of behavior discrepancy, iterative training, and structure initialization of ELM, and it successfully makes full use of the same decision tree structure as a warm start for the whole algorithm. Finally, we verify the effectiveness of our proposed algorithm by conducting experiments on the benchmark KDD 2015 dataset, where it performs much better than baseline algorithms on multiple criteria.

2. Method

2.1. Problem Statement. There are three definitions of MOOC dropout prediction in the current studies. The first is whether a learner will still participate in the last week of the course [33–35]. The second is whether the current week is the last week a learner has activities [17, 19, 36]. Those two definitions are similar because they are related to the final state of a learner, and the dropout label cannot be determined until the end of the course. The third definition is whether a learner will still participate in the coming week of the course, which is related to the ongoing state of a learner [25, 37]. Under it, the dropout label can be determined based on the behavior of the current week, which helps instructors make interventions in a timely manner. Thus, the third definition is used in our paper.

The expectation confirmation model explains why users continue to use an information system [38], and it has been extended to explain why learners continue to use MOOCs [39]. These studies find several significant factors which can influence continued usage, such as confirmation of prior use, perceived usefulness, and learners' satisfaction. Accordingly, the current learning may have the most impact on the intention of continued usage. For most learners, if they confirm the usefulness of and feel satisfied with the current week's learning, they may have a strong intention to continue learning in the next week.

Therefore, the goal of this paper is to predict who may stop learning in the coming week based on the learning behaviors of the current week, which helps instructors better track the learning state of the learner and take corresponding interventions. Assume there are $r$ behavior features extracted from learning behavior records for the current week, represented as an $r$-dimensional vector $x_i \in R^r$; $y_i \in R^m$ is the corresponding dropout label. If there are activities associated with the $i$th learner in the coming week, the dropout label of this learner is $y_i = 0$, which indicates that the learner will continue to learn. Otherwise, the dropout label is $y_i = 1$, which means the learner will quit the course next week.

2.2. Framework of MOOC Dropout Prediction. To address the problem of MOOC dropout prediction, we propose the framework shown in Figure 1. To be specific, the first module designs and extracts several features from learners' learning behavior records. Feature quantification is realized by counting the occurrences of each feature, which reflects the engagement of learners. The outputs of this module are the feature matrix and the label matrix.

The second module implements dropout prediction using the DT-ELM algorithm based on the extracted behavior features. The decision layer is designed to select features and determine the ELM structure based on the decision tree. It outputs the tree structure to the mapping layer and the selected features to the enhancement layer. The enhancement layer targets the strengthening of the classification ability of the selected features and outputs the enhanced features to the improved ELM.


Figure 1: Overall framework of dropout prediction. Raw learning behavior records pass through feature extraction (feature design and quantification, feature extraction processing, and matrix formation) to produce the feature and label matrices, which feed the DT-ELM module (decision layer, enhancement layer, mapping layer, and improved ELM) to output dropout or retention.

The function of the mapping layer is to determine the ELM structure according to the tree structure. It outputs the neuron numbers of each layer and the connections between layers. Through the improved ELM, the dropout or retention prediction is obtained.

2.3. Feature Extraction. We extract features from learning behavior records. Courses generally launch weekly, so it is better to utilize the numbers of learning behavior records per week as features [40]. The set of records for each type of learning behavior is represented as $R_t$, where $t$ denotes the $t$th type of learning behavior and $t = 1, 2, \cdots, r$. The number of records of the $t$th type of learning behavior over the duration of a course is expressed as a vector $x_{it} = [x_{it}^{(1)}, x_{it}^{(2)}, \cdots, x_{it}^{(q)}, \cdots, x_{it}^{(W)}]$, where $i$ represents the $i$th learner, $x_{it}^{(q)}$ is the number of learning behavior records in the $q$th week, and $W$ is the number of weeks a course lasts. The feature extraction process is shown in Algorithm 1. It outputs the feature matrix and the label matrix.

After feature extraction, the feature matrix $x = [x_1, x_2, \cdots, x_e]$ is obtained, where $e$ is the number of enrolled learners and $x_i = [x_{i1}, x_{i2}, \ldots, x_{ir}]$ represents the behavior features of the $i$th learner. $y = [y_1, y_2, \cdots, y_e]$ is the label matrix, where $y_i = [y_{i1}, y_{i2}, \ldots, y_{im}] \in R^m$ is the dropout label of the $i$th learner.

Effective learning time is another kind of behavior feature; it represents the actual time a learner spends on learning. In practice, a learner may click a video and then leave to do something else. Therefore, we set a threshold on the gap between two activity clicks; time exceeding the threshold is not counted.
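As a rough illustration (our sketch, not the authors' code; the 30-minute cutoff is an assumed value, since the paper does not state the threshold), the effective learning time of one learner in one week could be computed as follows:

```python
# Sketch of the effective-learning-time heuristic: sum the gaps between
# consecutive clicks, discarding any gap that exceeds the threshold.
THRESHOLD = 30 * 60  # seconds; assumed cutoff between two activity clicks

def effective_learning_time(click_times, threshold=THRESHOLD):
    """click_times: sorted timestamps (seconds) of one learner's clicks."""
    total = 0
    for prev, curr in zip(click_times, click_times[1:]):
        gap = curr - prev
        if gap <= threshold:  # only gaps within the threshold count as learning
            total += gap
    return total
```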

2.4. Dropout Prediction Based on DT-ELM Algorithm. Decision Layer. The decision layer implements feature selection using a decision tree based on the maximum information gain ratio [41]. $D$ is the input of the decision layer. Each instance in $D$ is represented as $x_i = [x_{i1}, x_{i2}, \ldots, x_{ir}] \in R^r$, and $y_i = [y_{i1}, y_{i2}, \ldots, y_{im}] \in R^m$ is the class label of $x_i$. $D'$ is the output of the decision layer, containing only the selected features. Each instance in $D'$ is represented as $x_i^{new} = [x_{i1}, x_{i2}, \ldots, x_{in}]$, which means there are $n$ selected features in $D'$.

The decision tree is constructed by recursively partitioning $D$ into smaller subsets $D_1, D_2, \ldots, D_K$ until a specified stopping criterion is reached, for example, that all subsets belong to a single class. A single-feature split is recursively defined for every node of the tree using some criterion; the information gain ratio is one of the most widely used criteria for decision trees. The entropy, which comes from information theory, is described as

$$\text{Info}(D) = -\sum_{c=1}^{m} p_c \log_2(p_c) \tag{1}$$

where $m$ represents the number of classes and $p_c$ is the probability that an instance $x_i$ belongs to class $c$.


Inputs:
  $R$: learning behavior records of a course
  $e$: enrollment number of learners
  $r$: number of behavior features
  $d$: duration of the course
Outputs:
  $x$: feature matrix of size $e \times r$
  $y$: label matrix of size $e \times m$
(1) $R$ is the set of learning behavior records for the course, grouped by behavior type. Let $R = [R_1, R_2, \cdots, R_t, \cdots, R_r]$ be the record set of learning behaviors, where $t = 1, 2, \cdots, r$.
(2) Divide the duration of the course into $W$ weeks.
(3) For each learning behavior record in $R_t$:
(4)   If this record occurred in week $q$ and was generated by learner $i$:
(5)     $x_{it}^{(q)} \mathrel{+}= 1$
(6) For each learner, the $t$th learning behavior feature $x_{it} = [x_{it}^{(1)}, x_{it}^{(2)}, \cdots, x_{it}^{(q)}, \cdots, x_{it}^{(W)}]$ is obtained.
(7) Form the feature matrix $x = [x_1, x_2, \cdots, x_e]$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{ir}]$ contains the $r$ types of behavior features of the $i$th learner.
(8) Form the label matrix $y = [y_1, y_2, \cdots, y_e]$, where $y_i = [y_{i1}, y_{i2}, \ldots, y_{im}] \in R^m$.

Algorithm 1: Feature Extraction Processing.
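A minimal Python sketch of Algorithm 1 follows; the record layout is an assumption, since the paper does not prescribe one:

```python
# Count each learner's behavior records per type and per week (Algorithm 1).
import numpy as np

def extract_features(records, e, r, W):
    """records: iterable of (learner, behavior_type, week) index triples
    (0-based); e learners, r behavior types, W weeks."""
    x = np.zeros((e, r, W))
    for i, t, q in records:
        x[i, t, q] += 1            # x_it^(q) += 1
    return x.reshape(e, r * W)     # one row of weekly features per learner
```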

The split rule is defined by the information gain, which represents the expected reduction in entropy after splitting according to a given feature. The information gain is described as follows:

$$Gain(A) = \text{Info}(D) - \text{Info}_A(D) \tag{2}$$

$\text{Info}_A(D)$ is the conditional entropy, which represents the entropy of $D$ based on the partitioning by feature $A$. It is computed as the weighted average over all subsets resulting from the split, as shown in (3), where $|D_k|/|D|$ acts as the weight of the $k$th partition:

$$\text{Info}_A(D) = \sum_{k=1}^{K} \frac{|D_k|}{|D|} \times \text{Info}(D_k) \tag{3}$$

The information gain ratio extends the information gain by applying a kind of normalization to it, using a "split information" value:

$$\text{SplitInfo}_A(D) = -\sum_{k=1}^{K} \frac{|D_k|}{|D|} \times \log_2\left(\frac{|D_k|}{|D|}\right) \tag{4}$$

The feature with the maximum information gain ratio is selected as the splitting feature, defined as follows:

$$GainRatio(A) = \frac{Gain(A)}{\text{SplitInfo}_A(D)} \tag{5}$$

The decision tree is constructed in this way. Each internal node of the decision tree corresponds to one of the selected features, and each terminal node represents a specific class of the categorical variable.
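A small NumPy sketch of the gain-ratio computation in (1)-(5), for a single candidate split (our illustration, not the paper's code):

```python
import numpy as np

def entropy(y):
    """Info(D): entropy of the class labels of a subset, eq. (1)."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(y, subsets):
    """subsets: nonempty label arrays D_1..D_K produced by splitting on A."""
    weights = np.array([len(s) / len(y) for s in subsets])
    info_a = sum(w * entropy(s) for w, s in zip(weights, subsets))  # eq. (3)
    gain = entropy(y) - info_a                                      # eq. (2)
    split_info = -np.sum(weights * np.log2(weights))                # eq. (4)
    return gain / split_info if split_info > 0 else 0.0             # eq. (5)
```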

Enhancement Layer. Because the classification ability of each selected feature is different, the impact of each feature on each leaf node is different.

Table 1: Mapping rules between DT and ELM.

DT                  | ELM
Internal nodes      | Input neurons
Terminal nodes      | Hidden neurons
Distinct classes    | Output neurons
Paths between nodes | Connections between input layer and hidden layer

The root node has the best classification ability, and it connects to all leaf nodes; that means the root node has the greatest impact on all leaf nodes. Each internal node except the root connects to fewer leaf nodes and has less impact on the connected leaf nodes. Each value of a selected feature is multiplied by a number $l_s$ equal to the number of leaf nodes it connects to in the decision tree:

$$x_{is}^{new} = x_{is}^{ori} \times l_s, \quad s = 1, 2, \ldots, n \tag{6}$$

By this step, we enhance the impact of the selected features on the leaf nodes based on the tree structure.
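In code, the enhancement step of equation (6) amounts to a per-column scaling (a sketch; `leaf_counts` is the vector of $l_s$ values read off the tree):

```python
import numpy as np

def enhance_features(X_selected, leaf_counts):
    """X_selected: (instances, n) matrix of the n selected features;
    leaf_counts: l_s for s = 1..n, leaves connected to each feature's node."""
    return X_selected * np.asarray(leaf_counts)  # broadcasts over rows, eq. (6)
```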

Mapping Layer. In the mapping layer, inspired by the entropy net [42], we map the decision tree to the ELM. Table 1 shows the corresponding mapping rules between nodes in the decision tree and neurons in ELM.

The number of internal nodes in the decision tree equals the number of neurons in the input layer of ELM. Each leaf node in the decision tree is mapped to a corresponding neuron in the hidden layer of ELM. The number of distinct classes in the decision tree equals the number of neurons in the output layer of ELM. The paths between nodes in the decision tree decide the connections between the input layer and the hidden layer of ELM. This mapping principle thus determines the number of neurons in each layer and, meanwhile, improves ELM with fewer connections; a sketch of reading the structure off a trained tree follows.
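As an illustration of these rules (assuming a scikit-learn tree; the paper does not name an implementation), the layer sizes and the input-to-hidden connection mask can be derived as follows:

```python
# Map a trained decision tree to an ELM structure per Table 1:
# internal nodes -> input neurons, leaves -> hidden neurons, and
# internal-node-to-leaf paths -> a 0/1 connection mask.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_to_elm_structure(tree: DecisionTreeClassifier):
    t = tree.tree_
    left, right = t.children_left, t.children_right
    internal = [n for n in range(t.node_count) if left[n] != -1]
    leaves = [n for n in range(t.node_count) if left[n] == -1]

    def leaves_under(node):  # leaf nodes reachable from a given node
        if left[node] == -1:
            return [node]
        return leaves_under(left[node]) + leaves_under(right[node])

    # mask[j, i] = 1 iff leaf j lies in the subtree of internal node i
    leaf_index = {leaf: j for j, leaf in enumerate(leaves)}
    mask = np.zeros((len(leaves), len(internal)))
    for i, node in enumerate(internal):
        for leaf in leaves_under(node):
            mask[leaf_index[leaf], i] = 1.0
    return len(internal), len(leaves), mask  # input size, hidden size, links
```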


Figure 2: An illustration of mapping the decision tree to ELM. The internal nodes, leaf nodes, and distinct classes in the decision tree are mapped to the corresponding neurons in the input layer, hidden layer, and output layer of ELM, respectively. The paths between nodes determine the connections between the input layer and the hidden layer.


Figure 2 shows an illustration of mapping the decision tree to ELM. The first neuron of the input layer is mapped from the root node of the decision tree. It connects to all hidden neurons, which are mapped from the leaf nodes; that means the first neuron has an impact on every hidden neuron. The second neuron of the input layer connects to four hidden neurons according to the decision tree structure, meaning the second neuron has an impact on those four hidden neurons. The dashed lines show where no corresponding paths exist between internal nodes and leaf nodes in the decision tree; therefore, no connections exist between the corresponding input-layer and hidden-layer neurons of ELM.

Improved ELM. Once the structure of ELM is determined, the enhanced features are input into the ELM. The connectionless weights between the input layer and hidden layer are initialized with zero or extremely small values very close to zero; the other connection weights, as well as the biases of the hidden layer, are initialized randomly. A unique optimal solution can be obtained once the number of hidden neurons and the initialized parameters are determined.

There are $N$ random instances $(x_i, y_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in R^n$ and $y_i = [y_{i1}, y_{i2}, \ldots, y_{im}]^T \in R^m$. An SLFN with $L$ hidden neurons can be represented as follows:

$$\sum_{j=1}^{L} \beta_j g(w_j \cdot x_i + b_j) = o_i, \quad i = 1, 2, \ldots, N \tag{7}$$

where $g(x)$ is the activation function of the hidden neurons, $w_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$ is the weight vector of the input neurons connecting to the $j$th hidden neuron, and $w_j \cdot x_i$ denotes the inner product of $w_j$ and $x_i$. The bias of the $j$th hidden neuron is $b_j$, and the weight vector connecting the $j$th hidden neuron to the output neurons is $\beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^T$.

The target of a standard SLFN is to approximate these $N$ instances with zero error, as represented in (8), where $o_i$ is the actual network output and $y_i$ is the desired output:

$$\sum_{i=1}^{N} \left\| o_i - y_i \right\| = 0 \tag{8}$$

In other words, there exist proper $w_j$, $b_j$, and $\beta_j$ such that

$$\sum_{j=1}^{L} \beta_j g(w_j \cdot x_i + b_j) = y_i, \quad i = 1, 2, \ldots, N \tag{9}$$

Equation (9) can be written compactly as

$$H\beta = Y \tag{10}$$

where

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L} \tag{11}$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m} \tag{12}$$

$$Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_N^T \end{bmatrix}_{N \times m} \tag{13}$$


(1) Given the training set $T = \{(x_i, y_i) \mid x_i \in R^n, y_i \in R^m\}$, the activation function $g(x)$, and the number of hidden neurons $L$:
(2) Randomly assign the input weight vector $w_j$ and the bias $b_j$, setting the connectionless weights between the input layer and hidden layer to zero.
(3) Calculate the hidden layer output matrix $H$.
(4) Calculate the output weight vector $\beta = H^{\dagger} Y$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of matrix $H$.
(5) Obtain the predicted values based on the input variables.

Algorithm 2: Improved ELM.
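A compact sketch of Algorithm 2 (our illustration; the sigmoid activation and uniform initialization are assumptions, as the paper does not specify them):

```python
import numpy as np

def train_improved_elm(X, Y, mask, seed=0):
    """X: (N, n) enhanced features; Y: (N, m) 0/1 label matrix;
    mask: (L, n) connection matrix from the decision-tree mapping."""
    rng = np.random.default_rng(seed)
    L, n = mask.shape
    W = rng.uniform(-1, 1, (L, n)) * mask   # zero the connectionless weights
    b = rng.uniform(-1, 1, L)               # random hidden-layer biases
    H = 1 / (1 + np.exp(-(X @ W.T + b)))    # hidden output matrix, eq. (11)
    beta = np.linalg.pinv(H) @ Y            # beta = H^dagger Y (step 4)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1 / (1 + np.exp(-(X @ W.T + b)))
    return H @ beta                         # class scores; argmax gives label
```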

Table 2: Dataset description.

Information | Description
Enrollment  | Records denoting which course each learner has enrolled in (120,542 entries)
Object      | Relationships of courses and modules
Log         | Learning behavior records (8,157,277 entries)
Date        | The start and end time of courses
Truth       | Labels indicating whether learners drop out or complete the whole course


Once $w_j$ and $b_j$ are determined, the hidden layer output matrix $H$ is uniquely calculated. The training process can then be transformed into finding a least-squares solution of the linear system in (10). The algorithm is described in Algorithm 2.

3. Experimental Results

3.1. Dataset and Preprocessing. The effectiveness of our proposed algorithm is tested on the benchmark KDD 2015 dataset, which contains five kinds of information. A detailed description of the dataset is shown in Table 2. From the raw data, we define and extract several behavior features, described in Table 3.

The next step is to label each record according to the behavior features. If a learner has activities in the coming week, the dropout label is 0; otherwise, the dropout label is 1. A learner may begin learning in a later week rather than the first week; such a learner is not labeled as a dropout in those first several weeks, and the week in which the learner begins learning is treated as that learner's first actual learning week.

Table 3: Extracted behavior features.

Features  | Description
x1        | The number of participating objects of the course per week
x2–x8     | The numbers of access, page close, problem, video, discussion, navigating, and wiki behaviors per week
x9–x10    | The total and average numbers of all behaviors per week
x11       | The number of active days per week
x12       | The time consumption per week
x13–x16   | The numbers of access, page close, problem, and video behaviors from the browser per week, respectively
x17–x21   | The numbers of access, discussion, problem, navigating, and wiki behaviors from the server per week, respectively
x22–x23   | The numbers of all behaviors from the browser and server per week, respectively

The data of those first several weeks are deleted, and the data of the remaining weeks are then labeled.
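A minimal sketch of this labeling rule (the data layout is assumed):

```python
# Label week w as 0 if the learner has any activity in week w + 1, else 1;
# weeks before the learner's first activity are dropped rather than labeled.
def weekly_labels(active_weeks, num_weeks):
    """active_weeks: set of 1-based week indices with activity.
    Returns {week: label} from the first active week to num_weeks - 1."""
    first = min(active_weeks)
    return {w: 0 if (w + 1) in active_weeks else 1
            for w in range(first, num_weeks)}
```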

3.2. Experimental Setting and Evaluation. Experiments are carried out in MATLAB R2016b and Python 2.7 on a desktop computer with an Intel 2.5 GHz CPU and 8 GB RAM. The LIBSVM library [43] and the Keras library [44] are used to implement the support vector machine and the LSTM, respectively.

To evaluate the effectiveness of the proposed algorithm, accuracy, area under the curve (AUC), F1-score, and training time are used as evaluation criteria. Accuracy is the proportion of correct predictions, covering both dropout and retention. Precision is the proportion of learners correctly predicted as dropouts among all predicted dropout learners; recall is the proportion of dropout learners predicted correctly among all real dropout learners. The F1-score is the harmonic mean of precision and recall.

AUC depicts the degree to which a classifier distinguishes between positive and negative samples and is invariant to imbalanced data [45]. The receiver operating characteristic (ROC) curve plots the trained classifier's true positive rate against its false positive rate, and the AUC is the integral of the ROC curve over the interval [0, 1]. The closer this number is to 1, the better the classification performance.
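For reference, the three criteria can be computed with scikit-learn (an assumption; the paper does not state which implementation it used):

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(y_true, y_score, threshold=0.5):
    """y_true: 0/1 dropout labels; y_score: predicted dropout scores
    (both NumPy arrays)."""
    y_pred = (y_score >= threshold).astype(int)
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),  # rank-based, threshold-free
        "f1": f1_score(y_true, y_pred),         # harmonic mean of P and R
    }
```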


Figure 3: The overall performance of DT-ELM, traditional machine learning algorithms (TA), deep learning (DL), and the optimization algorithm (OA) weekly. DT-ELM is about 12.78%, 22.19%, and 6.87% higher than the baseline algorithms in terms of overall accuracy, AUC, and F1-score, respectively.

Table 4: The overall average training time (s) of DT-ELM, traditional machine learning algorithms (TA), the deep learning algorithm (DL), and the optimization algorithm (OA).

Algorithms | DT-ELM | TA    | DL   | OA
Time (s)   | 1.6383 | 2.427 | >100 | 33.649

3.3. Overall Performance. We choose ten courses for the experiments; their enrollment numbers range from several hundred to about ten thousand. We divide the baselines into three categories: traditional machine learning, deep learning, and optimization algorithms. The traditional machine learning algorithms include logistic regression, support vector machine, decision tree, back propagation neural network, and entropy net. LSTM is adopted as the deep learning algorithm. The combination of the genetic algorithm and ELM (GA-ELM) is adopted as the optimization algorithm, aiming to improve ELM.

The overall performance in terms of accuracy, AUC, and F1-score is shown in Figure 3. The overall average training time is shown in Table 4.

Although course enrollments vary over a wide range, the proposed DT-ELM algorithm performs much better than the three categories of baseline algorithms. DT-ELM achieves 89.28%, 85.86%, and 91.48%, which is about 12.78%, 22.19%, and 6.87% higher than the baseline algorithms in terms of overall accuracy, AUC, and F1-score, respectively.

To be specific, the traditional machine learning algorithms perform the worst on the three criteria. The results of the deep learning algorithm are much better; however, it has the longest training time. Although the optimization algorithm performs better than the deep learning algorithm, it does not perform as well as DT-ELM. DT-ELM performs the best in terms of accuracy, AUC, and F1-score while requiring the least training time, owing to its noniterative training process. These results show that DT-ELM reaches the goal of predicting dropout accurately and in a timely manner.

Another observation is that the last two weeks yield better performance than the other weeks. To identify the reason, we performed a weekly statistical analysis of the dropout rate and found that, compared to the first three weeks, the average dropout rate of the last two weeks of courses is higher. This means the behavior of learners is more likely to follow a pattern. On the other hand, it also illustrates the importance of dropout prediction: dropout rates in the later stages of courses are generally higher than in the initial stage, so it is better to find at-risk learners early in order to make effective interventions.

3.4. Impact of Feature Selection. To verify the effectiveness of feature selection, we compare DT-ELM with plain ELM. The results are shown in Figure 4: DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively. This shows that feature selection improves the prediction results; choosing as many features as possible may not be appropriate for dropout prediction. According to the entropy theory mentioned previously, features with different gain ratios have different classification ability, and features with low gain ratios may weaken the overall classification ability.

Although each course has different behavior features, two conclusions can be drawn. First, the average number of selected features is 12, which is less than the number of extracted features; this shows that using fewer features for dropout prediction can achieve better results than using all extracted features. Second, we find that discussion, active days, and time consumption are the three most important factors affecting prediction results.

3.5. Impact of Feature Enhancement and Connection Initialization. To verify the impact of feature enhancement, we compare DT-ELM with itself without feature enhancement (Without-FE). Similarly, to verify the impact of connection initialization, we compare DT-ELM with itself without connection initialization (Without-IC). The results of the three algorithms are shown in Figure 5.

The results of Without-FE and Without-IC are not as good as those of DT-ELM. DT-ELM is about 0.94%, 1.21%, and 0.9% higher than Without-IC in terms of accuracy, AUC, and F1-score, respectively; it is also about 2.13%, 2.25%, and 1.98% higher than Without-FE. The values of Without-IC are higher than those of Without-FE on the three criteria, which indicates that feature enhancement plays a more important role than connection initialization.


Figure 4: Impact of feature selection. DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively.

Figure 5: Impact of feature enhancement and connection initialization. Without-IC means DT-ELM without connection initialization, and Without-FE means DT-ELM without feature enhancement. The results indicate that feature enhancement plays a more important role than connection initialization.

Figure 6: Comparison of different scales of enrollment in terms of accuracy, AUC, and F1-score. DT-ELM-SE contains courses with smaller numbers of enrollments, and DT-ELM-LE contains courses with larger numbers of enrollments.

That is because features with better classification ability are enhanced according to how much impact each feature has on other neurons.

3.6. Comparison of Different Numbers of Enrollments. To verify the effectiveness of DT-ELM on different numbers of enrollments, two groups of experiments are conducted; the results are shown in Figure 6. The first group (DT-ELM-SE) contains courses with smaller numbers of enrollments, ranging from several hundred to about one thousand. The second group (DT-ELM-LE) contains courses with larger numbers of enrollments, ranging from several thousand to about ten thousand.

Generally, the more data used for training, the better the classification results. DT-ELM-LE is about 2.59%, 3.41%, and 2.54% higher than DT-ELM-SE in terms of accuracy, AUC, and F1-score, respectively. The average training times of DT-ELM-SE and DT-ELM-LE are 1.6014 s and 1.6752 s, respectively. Although courses with fewer enrollments achieve lower values on the three criteria than courses with more enrollments, the proposed algorithm still performs better than the other algorithms. That means dropout can be predicted accurately and in a timely manner in courses with different numbers of enrollments.


Table 5: The average accuracy of different algorithms weekly, including logistic regression (LR), support vector machine (SVM), back propagation neural network (BP), decision tree (DT), entropy net (EN), LSTM, ELM, and GA-ELM.

Week  | DT-ELM | LR     | SVM    | BP     | DT     | EN     | LSTM   | ELM    | GA-ELM
Week1 | 0.8303 | 0.6633 | 0.5643 | 0.6992 | 0.6876 | 0.7067 | 0.7067 | 0.8103 | 0.8149
Week2 | 0.8568 | 0.686  | 0.6581 | 0.705  | 0.6957 | 0.7157 | 0.7218 | 0.8367 | 0.8462
Week3 | 0.872  | 0.7284 | 0.7218 | 0.7443 | 0.7325 | 0.757  | 0.7758 | 0.8401 | 0.8467
Week4 | 0.9642 | 0.8726 | 0.827  | 0.8892 | 0.8627 | 0.8773 | 0.8773 | 0.9492 | 0.9507
Week5 | 0.941  | 0.8498 | 0.8618 | 0.8447 | 0.8526 | 0.8656 | 0.8656 | 0.9041 | 0.9154

Table 6: The average AUC of different algorithms weekly.

Week  | DT-ELM | LR     | SVM    | BP     | DT     | EN     | LSTM   | ELM    | GA-ELM
Week1 | 0.8182 | 0.632  | 0.6272 | 0.6189 | 0.5979 | 0.6256 | 0.6295 | 0.7984 | 0.8066
Week2 | 0.8357 | 0.6255 | 0.5914 | 0.6216 | 0.6243 | 0.6264 | 0.6844 | 0.8181 | 0.8243
Week3 | 0.8383 | 0.6527 | 0.6109 | 0.6586 | 0.6438 | 0.667  | 0.7113 | 0.8176 | 0.8261
Week4 | 0.9412 | 0.7103 | 0.6454 | 0.6117 | 0.6507 | 0.6246 | 0.7698 | 0.9079 | 0.9263
Week5 | 0.8596 | 0.6652 | 0.6918 | 0.7148 | 0.6954 | 0.7166 | 0.7701 | 0.8268 | 0.8413

Table 7: The average F1-score of different algorithms weekly.

Week  | DT-ELM | LR     | SVM    | BP     | DT     | EN     | LSTM   | ELM    | GA-ELM
Week1 | 0.8542 | 0.7378 | 0.5907 | 0.7719 | 0.798  | 0.788  | 0.8025 | 0.8342 | 0.8453
Week2 | 0.8956 | 0.7913 | 0.7341 | 0.8091 | 0.8    | 0.8238 | 0.8287 | 0.8677 | 0.8893
Week3 | 0.9021 | 0.8264 | 0.8107 | 0.8392 | 0.8321 | 0.8511 | 0.8508 | 0.8751 | 0.8878
Week4 | 0.9667 | 0.9315 | 0.8982 | 0.9348 | 0.9222 | 0.9358 | 0.9301 | 0.9488 | 0.9565
Week5 | 0.9558 | 0.9118 | 0.9185 | 0.9092 | 0.9149 | 0.905  | 0.9191 | 0.9389 | 0.9457


3.7. Performance with Different Algorithms. The detailed results of the different algorithms are shown in Tables 5–7. Observing the results, logistic regression and support vector machine achieve lower values in accuracy, AUC, and F1-score than the other algorithms. The back propagation neural network and the decision tree perform better than logistic regression and support vector machine. The entropy net utilizes a decision tree to optimize performance and performs better than the back propagation neural network; different from the entropy net, we utilize the decision tree to optimize ELM. Although the performance of LSTM is much better than that of the traditional algorithms, it is time intensive, as the previous results show.

It is obvious that the ELM-based algorithms perform better than the other algorithms. That is because ELM can achieve the smallest training error by calculating the least-squares solution for the network output weights. GA-ELM achieves good results due to its feature selection function; however, it needs iterative training and lacks structure initialization of the ELM. DT-ELM optimizes the structure of ELM and performs much better than both ELM and GA-ELM.

4. Discussion

Sufficient experiments are designed and implemented from various perspectives. In overall performance, DT-ELM achieves the best results compared to the different algorithms, and we explain why the last two weeks yield better performance than the other weeks. The experimental results on feature selection demonstrate its importance: features with different gain ratios have different classification ability, which helps achieve higher performance. Feature enhancement and connection initialization both contribute to the results, with feature enhancement playing the more important role, as shown by its larger improvement on the three criteria. The results for different numbers of enrollments demonstrate the universality of our algorithm.

The experimental results have demonstrated the effectiveness and universality of our algorithm. However, an important question still needs to be considered: do instructors need to make interventions for all dropout learners? Our goal is to find the at-risk learners who may stop learning in the coming week and to help instructors take corresponding interventions. From the perspective of behavior, a break means dropout.


However, interventions would not be taken for all dropout learners, because besides the behavior factor, some other factors [46], such as age, occupation, motivation, and learner type [47], should be taken into consideration when making interventions. Our future work is to make interventions for at-risk learners based on learners' behaviors and background information.

5. Conclusion

Dropout prediction is an essential prerequisite for making interventions for at-risk learners. To address this issue, we propose a hybrid algorithm combining a decision tree and ELM, which successfully settles previously unsolved problems, including behavior discrepancy, iterative training, and structure initialization.

Compared to evolutionary methods, DT-ELM selects features based on entropy theory. Different from neural-network-based methods, it can analytically determine the output weights without updating parameters through repeated iterations. Experiments on the benchmark dataset under multiple criteria demonstrate the effectiveness of DT-ELM, and the results show that our algorithm makes dropout predictions accurately.

Data Availability

The dataset used to support this study is the open KDD 2015 dataset, which is available from http://data-mining.philippe-fournier-viger.com/the-kddcup-2015-dataset-download-link.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research for this paper is supported by the National Natural Science Foundation of China (Grant no. 61877050), the open project fund of the Shaanxi Province Key Lab of Satellite and Terrestrial Network Tech., and Shaanxi Province financed projects for scientific and technological activities of overseas students (Grant no. 202160002).

References

[1] M. Vitiello, S. Walk, D. Helic, V. Chang, and C. Guetl, "Predicting dropouts on the successive offering of a MOOC," in Proceedings of the 2017 International Conference MOOC-MAKER, pp. 11–20, Guatemala, November 2017.
[2] M. Vitiello, S. Walk, V. Chang, R. Hernandez, D. Helic, and C. Guetl, "MOOC dropouts: a multi-system classifier," in European Conference on Technology Enhanced Learning, pp. 300–314, Springer, 2017.
[3] E. J. Emanuel, "Online education: MOOCs taken by educated few," Nature, vol. 503, no. 7476, p. 342, 2013.
[4] G. Creed-Dikeogu and C. Carolyn, "Are you MOOC-ing yet? A review for academic libraries," Kansas Library Association College & University Libraries, vol. 3, no. 1, pp. 9–13, 2013.
[5] S. Dhawal, "By the numbers: MOOCs in 2013," Class Central, 2013.
[6] S. Dhawal, "By the numbers: MOOCs in 2018," Class Central, 2018.
[7] H. Khalil and M. Ebner, "MOOCs completion rates and possible methods to improve retention - a literature review," in World Conference on Educational Multimedia, Hypermedia and Telecommunications, 2014.
[8] D. F. O. Onah, J. E. Sinclair, and R. Boyatt, "Dropout rates of massive open online courses: behavioural patterns," in International Conference on Education and New Learning Technologies, pp. 5825–5834, 2014.
[9] R. Rivard, Measuring the MOOC dropout rate, Inside Higher Ed, 8, 2013.
[10] J. Qiu, J. Tang, T. X. Liu et al., "Modeling and predicting learning behavior in MOOCs," in ACM International Conference on Web Search and Data Mining, pp. 93–102, 2016.
[11] Y. Zhu, L. Pei, and J. Shang, "Improving video engagement by gamification: a proposed design of MOOC videos," in International Conference on Blended Learning, pp. 433–444, Springer, 2017.
[12] R. Bartoletti, "Learning through design: MOOC development as a method for exploring teaching methods," Current Issues in Emerging eLearning, vol. 3, no. 1, p. 2, 2016.
[13] S. L. Watson, J. Loizzo, W. R. Watson, C. Mueller, J. Lim, and P. A. Ertmer, "Instructional design, facilitation, and perceived learning outcomes: an exploratory case study of a human trafficking MOOC for attitudinal change," Educational Technology Research and Development, vol. 64, no. 6, pp. 1273–1300, 2016.
[14] S. Halawa, "Attrition and achievement gaps in online learning," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 57–66, March 2015.
[15] K. R. Koedinger, E. A. McLaughlin, J. Kim, J. Z. Jia, and N. L. Bier, "Learning is not a spectator sport: doing is better than watching for learning from a MOOC," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 111–120, Canada, March 2015.
[16] C. Robinson, M. Yeomans, J. Reich, C. Hulleman, and H. Gehlbach, "Forecasting student achievement in MOOCs with natural language processing," in Proceedings of the 6th International Conference on Learning Analytics and Knowledge, LAK 2016, pp. 383–387, UK, April 2016.
[17] C. Taylor, K. Veeramachaneni, and U. M. O'Reilly, "Likely to stop? Predicting stopout in massive open online courses," Computer Science, 2014.
[18] C. Ye and G. Biswas, "Early prediction of student dropout and performance in MOOCs using higher granularity temporal information," Journal of Learning Analytics, vol. 1, no. 3, pp. 169–172, 2014.
[19] M. Kloft, F. Stiehler, Z. Zheng, and N. Pinkwart, "Predicting MOOC dropout over weeks using machine learning methods," in EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 60–65, October 2014.
[20] S. Nagrecha, J. Z. Dillon, and N. V. Chawla, "MOOC dropout prediction: lessons learned from making pipelines interpretable," in Proceedings of the 26th International Conference on World Wide Web Companion, pp. 351–359, International World Wide Web Conferences Steering Committee, 2017.
[21] D. Peng and G. Aggarwal, "Modeling MOOC dropouts," Entropy, vol. 10, no. 114, pp. 1–5, 2015.
[22] J. Liang, J. Yang, Y. Wu, C. Li, and L. Zheng, "Big data application in education: dropout prediction in edX MOOCs," in 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 440–443, IEEE, April 2016.
[23] G. Balakrishnan and D. Coetzee, "Predicting student retention in massive open online courses using hidden Markov models," Electrical Engineering and Computer Sciences, University of California at Berkeley, 2013.
[24] C. Lang, G. Siemens, A. Wise, and D. Gasevic, Handbook of Learning Analytics, Society for Learning Analytics Research (SoLAR), 2017.
[25] M. Fei and D.-Y. Yeung, "Temporal models for predicting student dropout in massive open online courses," in IEEE International Conference on Data Mining Workshop, pp. 256–263, USA, November 2015.
[26] W. Li, M. Gao, H. Li, Q. Xiong, J. Wen, and Z. Wu, "Dropout prediction in MOOCs using behavior features and multi-view semi-supervised learning," in Proceedings of the 2016 International Joint Conference on Neural Networks, IJCNN 2016, pp. 3130–3137, Canada, July 2016.
[27] M. Wen and C. P. Rose, "Identifying latent study habits by mining learner behavior patterns in massive open online courses," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014, pp. 1983–1986, China, November 2014.
[28] S. Lei, A. I. Cristea, M. S. Awan, C. Stewart, and M. Hendrix, "Towards understanding learning behavior patterns in social adaptive personalized e-learning," in Proceedings of the Nineteenth Americas Conference on Information Systems, pp. 15–17, 2003.
[29] J. H. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," IEEE Intelligent Systems & Their Applications, vol. 13, no. 2, pp. 44–48, 1998.
[30] A. K. Das, S. Das, and A. Ghosh, "Ensemble feature selection using bi-objective genetic algorithm," Knowledge-Based Systems, vol. 123, pp. 116–127, 2017.
[31] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in 2004 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985–990, IEEE, July 2004.
[32] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
[33] A. Ramesh, D. Goldwasser, B. Huang, H. Daume, and L. Getoor, "Learning latent engagement patterns of students in online courses," in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
[34] J. He, J. Bailey, B. I. P. Rubinstein, and R. Zhang, "Identifying at-risk students in massive open online courses," in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[35] D. Yang, T. Sinha, D. Adamson, and C. P. Rose, "Turn on, tune in, drop out: anticipating student dropouts in massive open online courses," in Proceedings of the 2013 NIPS Data-Driven Education Workshop, vol. 11, p. 14, 2013.
[36] M. Sharkey and R. Sanders, "A process for predicting MOOC attrition," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 50–54, Doha, Qatar, October 2014.
[37] F. Wang and L. Chen, "A nonlinear state space model for identifying at-risk students in open online courses," in Proceedings of the 9th International Conference on Educational Data Mining, vol. 16, pp. 527–532, 2016.
[38] A. Bhattacherjee, "Understanding information systems continuance: an expectation-confirmation model," MIS Quarterly, vol. 25, no. 3, pp. 351–370, 2001.
[39] Z. Junjie, "Exploring the factors affecting learners' continuance intention of MOOCs for online collaborative learning: an extended ECM perspective," Australasian Journal of Educational Technology, vol. 33, no. 5, pp. 123–135, 2017.
[40] J. K. T. Tang, H. Xie, and T.-L. Wong, "A big data framework for early identification of dropout students in MOOC," Communications in Computer and Information Science, vol. 559, pp. 127–132, 2015.
[41] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[42] I. K. Sethi, "Entropy nets: from decision trees to neural networks," Proceedings of the IEEE, vol. 78, no. 10, pp. 1605–1613, 1990.
[43] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[44] C. Francois et al., Keras: The Python Deep Learning Library, Astrophysics Source Code Library, 2018.
[45] J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley, "MOOC dropout prediction: how to measure accuracy?" in Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, pp. 161–164, ACM, 2017.
[46] J. DeBoer, G. S. Stump, D. Seaton, and L. Breslow, "Diversity in MOOC students' backgrounds and behaviors in relationship to performance in 6.002x," in Proceedings of the Sixth Learning International Networks Consortium Conference, vol. 4, pp. 16–19, 2013.
[47] P. G. de Barba, G. E. Kennedy, and M. D. Ainley, "The role of students' motivation and participation in predicting performance in a MOOC," Journal of Computer Assisted Learning, vol. 32, no. 3, pp. 218–231, 2016.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 2: MOOC Dropout Prediction Using a Hybrid Algorithm Based ...MathematicalProblemsinEngineering 12345 Week 0.6 0.7 0.8 0.9 1.0 Accuracy DT-ELM TA DL OA Week DT-ELM TA Week DT-ELM TA DL

2 Mathematical Problems in Engineering

into continuous states [25] Although deep learning achievesmore accuracy than traditional machine learning methodsit should be noted that deep neural networks need iterativetraining and a large amount of training data

Moreover due to the design discrepancy of MOOCplatforms current research utilizes different learning behav-iors in dropout prediction [26] Lack of uniform definitionand understanding of learning behaviors in online learningenvironment [27 28] will lead to un-unified conclusion onbehavior features with better classification ability The rangeof result fluctuation is widely resulting from the learningbehavior discrepancy Feature selection is essential in dropoutprediction Nevertheless little attention has been devotedto it and most related studies utilize as many features aspossible Genetic algorithm is one of the common usedfeature selection methods with good scalability combiningwith other algorithms easily [29 30] However it needsiterative training

The goal of our approach is incorporating feature selec-tion and fast training to realize accurate dropout predictionTo address feature selection we adopt the decision treealgorithm due to its tree structure and theoretical basisFurther the selected features are enhanced with differentweights depending on the decision tree structure The aim isto strengthen the features with good classification ability

To realize fast training we choose the ELM algorithm fordropout prediction ELM is a single hidden layer feed forwardneural network which improves the gradient algorithm andrequires no updating parameters by repeated iterations [3132] However a theoretical guiding rule to determine thestructure of ELM is lacking Different structures lead todifferent prediction results

To achieve accurate results of drop prediction we mapthe decision tree structure to the ELM structure based onthe entropy theory The mapping rule takes full account ofthe impact of internal nodes on leaf nodes in the decisiontree It determines not only the neuron numbers of each layerin ELM but also the connections between input layer andhidden layer By this way reasonable information assignmentis realized at the initial stage of ELM

In line with common practice in dropout prediction weextract behavior features from raw learning records Unlikepast approaches feature selection and enhancement arerealized by decision tree Then decision tree is incorporatedwith ELM to realize fast training and accurate predictionMeanwhile it is noteworthy that we utilize the same treestructure to solve the different problems The core of ourproposed algorithm is how to design the mapping rule todetermine the structure of ELM

The main contributions of this paper can be summarized as follows. Firstly, we define and extract several interpretive behavior features from raw learning behavior records. Secondly, we propose a novel hybrid algorithm combining decision tree and ELM for dropout prediction. It solves the problems of behavior discrepancy, iterative training, and structure initialization of ELM, and it makes full use of the same decision tree structure as a warm start to the whole algorithm. Finally, we verify the effectiveness of the proposed algorithm by conducting experiments on the benchmark KDD 2015 dataset, where it performs much better than baseline algorithms in multiple criteria.

2. Method

2.1. Problem Statement. There are three definitions of MOOC dropout prediction in the current studies. The first is whether a learner will still participate in the last week of the course [33–35]. The second is whether the current week is the last week a learner has activities [17, 19, 36]. Those two definitions are similar because they are related to the final state of a learner, and the dropout label cannot be determined until the end of the course. The third definition is whether a learner will still participate in the coming week of the course, which is related to the ongoing state of a learner [25, 37]. Under this definition the dropout label can be determined based on the behavior of the current week, which helps the instructors to make interventions in a timely manner. Thus, the third definition is used in our paper.

The expectation confirmation model explains why users continue to use an information system [38], and it has been extended to explain why learners continue to use MOOCs [39]. These studies find several significant factors that influence continued usage, such as confirmation of prior use, perceived usefulness, and learners' satisfaction. Accordingly, the current learning may have more impact on the intention of continued usage: for most learners, if they confirm the usefulness of the current week's learning and feel satisfied with it, they may have a strong intention to continue learning in the next week.

Therefore, the goal of this paper is to predict who may stop learning in the coming week based on the learning behaviors of the current week, which helps instructors better track the learning state of the learner and take corresponding interventions. Assume there are $r$ behavior features extracted from learning behavior records for the current week, represented as an $r$-dimensional vector $x_i \in R^r$, and let $y_i \in R^m$ be the corresponding dropout label. If there are activities associated with the $i$th learner in the coming week, the dropout label of this learner is $y_i = 0$, which indicates that the learner will continue to learn. Otherwise, the dropout label is $y_i = 1$, which means the learner will quit the course next week.
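As a minimal illustration of this labeling rule (a sketch, not the authors' code; `week_activity` is a hypothetical per-learner map from week index to the number of behavior records in that week):

```python
def dropout_label(week_activity, current_week):
    """Return 0 (retention) if the learner has any activity in the
    coming week, and 1 (dropout) otherwise."""
    return 0 if week_activity.get(current_week + 1, 0) > 0 else 1

# Example: active in weeks 1-2 only, so the label for week 2 is 1
print(dropout_label({1: 12, 2: 5}, current_week=2))  # -> 1
```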

2.2. Framework of MOOC Dropout Prediction. To address the problem of MOOC dropout prediction, we propose the framework shown in Figure 1. The first module designs and extracts several features from learners' learning behavior records. Feature quantification is realized by counting the occurrences of each feature, which reflects the engagement of learners. The outputs of this module are the feature matrix and the label matrix.

The second module implements dropout prediction using the DT-ELM algorithm based on the extracted behavior features. The decision layer is designed to select features and determine the ELM structure based on the decision tree. It outputs the tree structure to the mapping layer and the selected features to the enhancement layer. The enhancement layer targets the strengthening of the classification ability of the selected features and outputs the enhanced features to the improved ELM.


Figure 1: Overall framework of dropout prediction. Raw learning behavior records pass through feature extraction (feature design and quantification, forming the feature and label matrices) and then through the DT-ELM module (decision layer, enhancement layer, mapping layer, and improved ELM), which outputs dropout or retention.

The function of the mapping layer is to determine the ELM structure according to the tree structure. It outputs the neuron numbers of each layer and the connections between layers. The improved ELM then outputs dropout or retention.

2.3. Feature Extraction. We extract features from learning behavior records. Courses generally launch weekly, so it is natural to utilize the weekly numbers of learning behavior records as features [40]. The set of records for each type of learning behavior is represented as $R_t$, where $t$ denotes the $t$th type of learning behavior and $t = 1, 2, \cdots, r$. The record numbers of the $t$th type of learning behavior over the duration of a course are expressed as a vector $x_{it} = [x_{it}^{(1)}, x_{it}^{(2)}, \cdots, x_{it}^{(q)}, \cdots, x_{it}^{(W)}]$, where $i$ represents the $i$th learner, $x_{it}^{(q)}$ is the number of learning behavior records in the $q$th week, and $W$ is the number of weeks a course lasts. The feature extraction process is shown in Algorithm 1. It outputs the feature matrix and the label matrix.

After feature extraction, the feature matrix $x = [x_1, x_2, \cdots, x_e]$ is obtained, where $e$ is the number of enrolled learners and $x_i = [x_{i1}, x_{i2}, \ldots, x_{ir}]$ represents the behavior features of the $i$th learner. $y = [y_1, y_2, \cdots, y_e]$ is the label matrix, where $y_i = [y_{i1}, y_{i2}, \ldots, y_{im}] \in R^m$ is the dropout label of the $i$th learner.

Effective learning time is another kind of behavior feature; it represents the actual time that a learner spends on learning. In practice, a learner may click a video and then leave for something else. Therefore, we set a threshold on the gap between two activity clicks: time exceeding the threshold is not counted.
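A minimal sketch of this thresholding (the 30-minute threshold value is an assumption for illustration; the paper does not state the value used):

```python
def effective_learning_time(click_times, threshold=1800):
    """Sum the gaps between consecutive clicks (timestamps in seconds),
    ignoring any gap longer than the threshold, during which the
    learner is assumed to have left."""
    return sum(curr - prev
               for prev, curr in zip(click_times, click_times[1:])
               if curr - prev <= threshold)

# Three clicks close together, then a long break that is not counted
print(effective_learning_time([0, 600, 900, 10000]))  # -> 900
```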

2.4. Dropout Prediction Based on DT-ELM Algorithm. Decision Layer. The decision layer implements feature selection using a decision tree based on the maximum information gain ratio [41]. $D$ is the input of the decision layer. Each instance in $D$ is represented as $x_i = [x_{i1}, x_{i2}, \ldots, x_{ir}] \in R^r$, and $y_i = [y_{i1}, y_{i2}, \ldots, y_{im}] \in R^m$ is the class label of $x_i$. $D'$ is the output of the decision layer, containing only the selected features. Each instance in $D'$ is represented as $x_i^{new} = [x_{i1}, x_{i2}, \ldots, x_{in}]$, which means there are $n$ selected features in $D'$.

The decision tree is constructed by recursively partitioning $D$ into smaller subsets $D_1, D_2, \ldots, D_K$ until a specified stopping criterion is reached, for example, that all the subsets belong to a single class. A single-feature split is recursively defined for all nodes of the tree using some criterion; the information gain ratio is one of the most widely used criteria for decision trees. The entropy, which comes from information theory, is described as

$\text{Info}(D) = -\sum_{c=1}^{m} p_c \log_2(p_c)$ (1)

where $m$ represents the number of classes and $p_c$ is the probability that an instance $x_i$ belongs to the class $c$.


Inputs:
  R: learning behavior records of a course
  e: enrollment number of learners
  r: number of behavior features
  d: duration of the course
Outputs:
  x: feature matrix of size e × r
  y: label matrix of size e × m
1. R is the set of learning behavior records for the course, grouped by behavior type. Let R = [R_1, R_2, ..., R_t, ..., R_r] be the record set of learning behaviors, where t = 1, 2, ..., r.
2. Divide the duration of this course into W weeks.
3. For each learning behavior record in R_t:
4.   If this record occurred in week q and was generated by learner i:
5.     x_{it}^{(q)} += 1
6. For each learner, the t-th learning behavior feature x_{it} = [x_{it}^{(1)}, x_{it}^{(2)}, ..., x_{it}^{(q)}, ..., x_{it}^{(W)}] is obtained.
7. Form the feature matrix x = [x_1, x_2, ..., x_e], where x_i = [x_{i1}, x_{i2}, ..., x_{ir}] holds the r types of behavior features of the i-th learner.
8. Form the label matrix y = [y_1, y_2, ..., y_e], where y_i = [y_{i1}, y_{i2}, ..., y_{im}] ∈ R^m.

Algorithm 1: Feature Extraction Processing.
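A compact Python sketch of Algorithm 1, under the assumption that each raw record has already been parsed into a (learner, behavior_type, week) triple with zero-based indices:

```python
import numpy as np

def extract_features(records, e, r, W):
    """Count weekly records per learner and behavior type (Algorithm 1,
    steps 3-7); records yields (learner, behavior_type, week) triples."""
    counts = np.zeros((e, r, W))
    for learner, behavior, week in records:
        counts[learner, behavior, week] += 1
    # Each learner's r weekly count vectors form one row of the matrix
    return counts.reshape(e, r * W)

records = [(0, 0, 0), (0, 0, 1), (1, 2, 0)]
x = extract_features(records, e=2, r=3, W=2)  # shape (2, 6)
```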

The split rule is defined by the information gain, which represents the expected reduction in entropy after a split according to a given feature:

$\text{Gain}(A) = \text{Info}(D) - \text{Info}_A(D)$ (2)

$\text{Info}_A(D)$ is the conditional entropy, which represents the entropy of $D$ after partitioning by feature $A$. It is computed as the weighted average over all subsets resulting from the split, shown in (3), where $|D_k|/|D|$ acts as the weight of the $k$th partition:

$\text{Info}_A(D) = \sum_{k=1}^{K} \frac{|D_k|}{|D|} \times \text{Info}(D_k)$ (3)

The information gain ratio extends the information gain by applying a kind of normalization to it, using a "split information" value:

$\text{SplitInfo}_A(D) = -\sum_{k=1}^{K} \frac{|D_k|}{|D|} \times \log_2\left(\frac{|D_k|}{|D|}\right)$ (4)

The feature with the maximum information gain ratio is selected as the splitting feature, defined as follows:

$\text{GainRatio}(A) = \frac{\text{Gain}(A)}{\text{SplitInfo}_A(D)}$ (5)

The decision tree is constructed in this way. Each internal node of the decision tree corresponds to one of the selected features, and each terminal node represents a specific class of the categorical variable.
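To make equations (1)–(5) concrete, the following NumPy sketch scores one candidate split by its information gain ratio (partition membership is assumed to be precomputed):

```python
import numpy as np

def entropy(labels):
    """Info(D) in equation (1)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(labels, partition_ids):
    """GainRatio(A) in equation (5); partition_ids assigns each
    instance to one of the K partitions produced by feature A."""
    n = len(labels)
    cond_info = split_info = 0.0
    for k in np.unique(partition_ids):
        subset = labels[partition_ids == k]
        w = len(subset) / n                  # |D_k| / |D|
        cond_info += w * entropy(subset)     # Info_A(D), equation (3)
        split_info -= w * np.log2(w)         # SplitInfo_A(D), equation (4)
    gain = entropy(labels) - cond_info       # Gain(A), equation (2)
    return gain / split_info if split_info > 0 else 0.0

labels = np.array([1, 1, 0, 0, 1, 0])
partition_ids = np.array([0, 0, 0, 1, 1, 1])
print(gain_ratio(labels, partition_ids))
```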

Enhancement Layer. Because the classification ability of each selected feature is different, the impact of each feature on the leaf nodes is different. The root node has the best classification ability and connects to all leaf nodes; that is, the root node has the greatest impact on all leaf nodes.

Table 1: Mapping rules between DT and ELM.

DT | ELM
Internal nodes | Input neurons
Terminal nodes | Hidden neurons
Distinct classes | Output neurons
Paths between nodes | Connections between input layer and hidden layer

Each internal node except the root connects to fewer leaf nodes and has less impact on the connected leaf nodes. Each value of a selected feature is therefore multiplied by a number $l_s$, equal to the number of leaf nodes that the feature's node connects to in the decision tree:

$x_{is}^{new} = x_{is}^{ori} \times l_s, \quad s = 1, 2, \ldots, n$ (6)

By this step, we enhance the impact of the selected features on the leaf nodes based on the tree structure.
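Equation (6) amounts to a single elementwise scaling; a sketch, where `leaf_counts` holds the values $l_s$ read off the fitted tree:

```python
import numpy as np

def enhance(x_selected, leaf_counts):
    """x_is_new = x_is_ori * l_s (equation (6)); x_selected has one
    column per selected feature, leaf_counts one entry per feature."""
    return x_selected * np.asarray(leaf_counts, dtype=float)

x_new = enhance(np.array([[1.0, 2.0], [3.0, 4.0]]), leaf_counts=[5, 4])
```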

Mapping Layer. In the mapping layer, inspired by the entropy net [42], we map the decision tree to the ELM. Table 1 shows the corresponding mapping rules between nodes in the decision tree and neurons in the ELM.

The number of internal nodes in the decision tree equals the number of neurons in the input layer of the ELM. Each leaf node in the decision tree is mapped to a corresponding neuron in the hidden layer of the ELM. The number of distinct classes in the decision tree equals the number of neurons in the output layer of the ELM. The paths between nodes in the decision tree decide the connections between the input layer and the hidden layer of the ELM. This mapping principle determines the numbers of neurons in each layer.


Figure 2: An illustration of mapping the decision tree to ELM. The internal nodes, leaf nodes, and distinct classes in the decision tree are mapped to the corresponding neurons in the input layer, hidden layer, and output layer of ELM, respectively. The paths between nodes determine the connections between the input layer and the hidden layer.

Meanwhile, the mapping improves the ELM with fewer connections.

Figure 2 shows an illustration of mapping the decision tree to the ELM. The first neuron of the input layer is mapped from the root node of the decision tree. It connects to all hidden neurons, which are mapped from all leaf nodes; that means the first neuron has an impact on every hidden neuron. The second neuron of the input layer connects to four hidden neurons according to the decision tree structure, so it has an impact on those four hidden neurons. The dashed lines show where no corresponding paths exist between internal nodes and leaf nodes in the decision tree; for those pairs, there are no connections between the corresponding neurons in the input layer and the hidden layer of the ELM.
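The paths in Figure 2 can be encoded as a 0/1 connection mask between input and hidden neurons; a sketch, where `paths[j]` is assumed to list the leaf (hidden-neuron) indices reachable from the internal node mapped to input neuron j:

```python
import numpy as np

def connection_mask(paths, n_inputs, n_hidden):
    """Build the input-to-hidden connection mask from the tree paths;
    pairs with no corresponding path stay at zero."""
    mask = np.zeros((n_inputs, n_hidden))
    for j, leaves in enumerate(paths):
        mask[j, leaves] = 1.0
    return mask

# As in Figure 2: the root reaches all hidden neurons, while the
# second input neuron reaches only four of them
mask = connection_mask([[0, 1, 2, 3, 4], [1, 2, 3, 4]], n_inputs=2, n_hidden=5)
```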

Improved ELM. Once the structure of the ELM is determined, the enhanced features are input into the ELM. The connectionless weights between the input layer and the hidden layer are initialized with zero or extremely small values very close to zero. The other connection weights, as well as the biases of the hidden layer, are initialized randomly. A unique optimal solution can be obtained once the number of hidden neurons and the initialized parameters are determined.

There are $N$ random instances $(x_i, y_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in R^n$ and $y_i = [y_{i1}, y_{i2}, \ldots, y_{im}]^T \in R^m$. A SLFN with $L$ hidden neurons can be represented as follows:

$\sum_{j=1}^{L} \beta_j g(w_j \cdot x_i + b_j) = o_i, \quad i = 1, 2, \ldots, N$ (7)

where $g(x)$ is the activation function of the hidden neurons and $w_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$ is the weight vector of the input neurons connecting to the $j$th hidden neuron. The inner product of $w_j$ and $x_i$ is $w_j \cdot x_i$. The bias of the $j$th hidden neuron is $b_j$. The weight vector of the $j$th hidden neuron connecting to the output neurons is $\beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^T$.

The target of a standard SLFN is to approximate these $N$ instances with zero error, which is represented as (8), where the desired output is $o_i$ and the actual output is $y_i$:

$\sum_{i=1}^{N} \|o_i - y_i\| = 0$ (8)

In other words, there exist proper $w_j$, $b_j$, and $\beta_j$ such that

$\sum_{j=1}^{L} \beta_j g(w_j \cdot x_i + b_j) = y_i, \quad i = 1, 2, \ldots, N$ (9)

Equation (9) can be written compactly as

$H\beta = Y$ (10)

where

$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}$ (11)


1. Give the training set $T = \{(x_i, y_i) \mid x_i \in R^n, y_i \in R^m\}$, the activation function $g(x)$, and the number of hidden neurons $L$.
2. Randomly assign the input weight vectors $w_j$ and the biases $b_j$, except the connectionless weights between the input layer and the hidden layer, which are set to zero.
3. Calculate the hidden layer output matrix $H$.
4. Calculate the output weight vector $\beta = H^{\dagger}Y$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of matrix $H$.
5. Obtain the predicted values based on the input variables.

Algorithm 2: Improved ELM.
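A minimal NumPy sketch of Algorithm 2, assuming a sigmoid activation and the connection mask built from the tree (here, weights without a corresponding path are held exactly at zero rather than at small near-zero values):

```python
import numpy as np

def train_improved_elm(X, Y, mask, seed=0):
    """X: (N, n) enhanced features; Y: (N, m) label matrix;
    mask: (n, L) 0/1 input-to-hidden connections (Algorithm 2)."""
    rng = np.random.default_rng(seed)
    n, L = mask.shape
    W = rng.standard_normal((n, L)) * mask   # step 2: zero out missing paths
    b = rng.standard_normal(L)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # step 3: hidden layer output
    beta = np.linalg.pinv(H) @ Y             # step 4: Moore-Penrose solution
    return W, b, beta

def predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                          # step 5: predicted outputs
```

No iterative weight updates are involved: the only "training" step is the single pseudoinverse solve, which is what makes the method fast.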

Table 2: Dataset description.

Information | Description
Enrollment | Records denoting which course each learner has enrolled in (120,542 entries)
Object | Relationships of courses and modules
Log | Learning behavior records (8,157,277 entries)
Date | The start and end times of courses
Truth | Labels indicating whether learners drop out or complete the whole course

$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}$ (12)

$Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_N^T \end{bmatrix}_{N \times m}$ (13)

Once $w_j$ and $b_j$ are determined, the output matrix of the hidden layer $H$ is uniquely calculated. The training process is thereby transformed into finding a least-squares solution of the linear system in (9). The algorithm is described in Algorithm 2.

3. Experimental Results

3.1. Dataset and Preprocessing. The effectiveness of our proposed algorithm is tested on the benchmark dataset KDD 2015, which contains five kinds of information. The detailed description of the dataset is shown in Table 2. From the raw data we define and extract several behavior features, described in Table 3.

The next step is to label each record according to the behavior features. If a learner has activities in the coming week, the dropout label is 0; otherwise, the dropout label is 1. A learner may begin learning in a later week rather than the first week; in the first several weeks such a learner will not be labeled as dropout, and the week when the learner begins learning is treated as the first actual learning week for this learner.

Table 3: Extracted behavior features.

Features | Description
x1 | The number of participating objects of the course per week
x2–x8 | The numbers of access, page close, problem, video, discussion, navigating, and wiki behaviors per week
x9–x10 | The total and average numbers of all behaviors per week
x11 | The number of active days per week
x12 | The time consumption per week
x13–x16 | The numbers of access, page close, problem, and video behaviors from the browser per week, respectively
x17–x21 | The numbers of access, discussion, problem, navigating, and wiki behaviors from the server per week, respectively
x22–x23 | The numbers of all behaviors from the browser and the server per week, respectively

The data of the first several weeks are deleted, and the data of the other weeks are then labeled.

3.2. Experimental Setting and Evaluation. Experiments are carried out in MATLAB R2016b and Python 2.7 on a desktop computer with an Intel 2.5 GHz CPU and 8 GB RAM. The LIBSVM library [43] and the Keras library [44] are used to implement the support vector machine and the LSTM, respectively.

In order to evaluate the effectiveness of the proposed algorithm, accuracy, area under the curve (AUC), F1-score, and training time are used as evaluation criteria. Accuracy is the proportion of correct predictions, covering both dropout and retention. Precision is the proportion of learners predicted as dropouts who really drop out, and recall is the proportion of real dropout learners who are predicted correctly. F1-score is the harmonic mean of precision and recall.

AUC depicts the degree to which a classifier distinguishes between positive and negative samples and is invariant to imbalanced data [45]. The receiver operating characteristic (ROC) curve plots the trained classifier's true positive rate against its false positive rate, and the AUC is the integral of the ROC curve over the interval [0, 1]. The closer the number is to 1, the better the classification performance.
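These criteria can be computed directly, for example with scikit-learn (a sketch; the toy labels and scores below are illustrative only):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

y_true = np.array([1, 0, 1, 1, 0])             # 1 = dropout, 0 = retention
scores = np.array([0.9, 0.2, 0.6, 0.4, 0.1])   # predicted dropout probability
y_pred = (scores >= 0.5).astype(int)

print(accuracy_score(y_true, y_pred))  # proportion of correct predictions
print(roc_auc_score(y_true, scores))   # area under the ROC curve
print(f1_score(y_true, y_pred))        # harmonic mean of precision and recall
```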


Figure 3: The overall weekly performance of DT-ELM, traditional machine learning algorithms (TA), deep learning (DL), and the optimization algorithm (OA), in terms of accuracy, AUC, and F1-score. DT-ELM is about 12.78%, 22.19%, and 6.87% higher than baseline algorithms in terms of overall accuracy, AUC, and F1-score, respectively.

Table 4: The overall average training time (s) of DT-ELM, traditional machine learning algorithms (TA), the deep learning algorithm (DL), and the optimization algorithm (OA).

Algorithms | DT-ELM | TA | DL | OA
Time (s) | 1.6383 | 2.427 | >100 | 3.3649

3.3. Overall Performance. We choose ten courses for the experiments; their enrollment numbers range from several hundred to about ten thousand. We divide the baselines into three categories: traditional machine learning, deep learning, and optimization algorithm. The traditional machine learning algorithms include logistic regression, support vector machine, decision tree, back propagation neural network, and entropy net. LSTM is adopted as the deep learning algorithm. The genetic algorithm combined with ELM (GA-ELM) serves as the optimization algorithm, aiming to improve the ELM.

The overall performance in terms of accuracy, AUC, and F1-score is shown in Figure 3, and the overall average training time is shown in Table 4.

Although course enrollments vary over a wide range, the proposed DT-ELM algorithm performs much better than the three categories of baseline algorithms. DT-ELM reaches 89.28%, 85.86%, and 91.48%, which is about 12.78%, 22.19%, and 6.87% higher than the baseline algorithms in terms of overall accuracy, AUC, and F1-score, respectively.

To be specific, the traditional machine learning algorithms perform the worst on the three criteria. The results of the deep learning algorithm are much better; however, it has the longest training time. Although the optimization algorithm performs better than the deep learning algorithm, it does not perform as well as DT-ELM. DT-ELM performs the best in terms of accuracy, AUC, and F1-score, and it requires the least training time owing to its noniterative training process. These results show that DT-ELM achieves dropout prediction both accurately and quickly.

Another observation is that the last two weeks obtain better performance than the other weeks. To identify the reason, we performed a statistical analysis of the weekly dropout rate. Compared to the first three weeks, the average dropout rate of the last two weeks of courses is higher, which means the behavior of learners is more likely to follow a pattern. On the other hand, this also illustrates the importance of dropout prediction: the dropout rates in the later stage of courses are generally higher than in the initial stage, so it is better to find at-risk learners early in order to make effective interventions.

3.4. Impact of Feature Selection. To verify the effectiveness of feature selection, we compare DT-ELM with plain ELM. The results are shown in Figure 4. DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively. This shows that feature selection improves the prediction results: choosing as many features as possible may not be appropriate for dropout prediction. According to the entropy theory mentioned previously, features with different gain ratios have different classification abilities, and features with low gain ratios may weaken the overall classification ability.

Although each course has different behavior features, two conclusions can be drawn. First, the average number of selected features is 12, which is less than the number of extracted features; this shows that using fewer features for dropout prediction can achieve better results than using all extracted features. Second, we find that discussion, active days, and time consumption are the three most important factors affecting the prediction results.

3.5. Impact of Feature Enhancement and Connection Initialization. To verify the impact of feature enhancement, we compare DT-ELM with a variant without feature enhancement (Without-FE). Similarly, to verify the impact of connection initialization, we compare DT-ELM with a variant without connection initialization (Without-IC). The results of the three algorithms are shown in Figure 5.

The results of Without-FE and Without-IC are not as good as those of DT-ELM. DT-ELM is about 0.94%, 1.21%, and 0.9% higher than Without-IC in terms of accuracy, AUC, and F1-score, respectively, and about 2.13%, 2.25%, and 1.98% higher than Without-FE. The values of Without-IC are higher than those of Without-FE on the three criteria, which indicates that feature enhancement plays a more important role than connection initialization.


Figure 4: Impact of feature selection, in terms of weekly accuracy, AUC, and F1-score. DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively.

Figure 5: Impact of feature enhancement and connection initialization. Without-IC means DT-ELM without connection initialization, and Without-FE means DT-ELM without feature enhancement. The results indicate that feature enhancement plays a more important role than connection initialization.

Figure 6: Comparison of different scales of enrollment in terms of accuracy, AUC, and F1-score. DT-ELM-SE contains courses with smaller numbers of enrollments, and DT-ELM-LE contains courses with larger numbers of enrollments.

That is because the features with better classification ability are enhanced according to how much impact each feature has on the other neurons.

3.6. Comparison of Different Numbers of Enrollments. To verify the effectiveness of DT-ELM on different numbers of enrollments, two groups of experiments are conducted, with results shown in Figure 6. The first group (DT-ELM-SE) contains courses with smaller numbers of enrollments, ranging from several hundred to about one thousand. The second group (DT-ELM-LE) contains courses with larger numbers of enrollments, ranging from several thousand to about ten thousand.

Generally, the more data used for training, the better the classification results. DT-ELM-LE is about 2.59%, 3.41%, and 2.54% higher than DT-ELM-SE in terms of accuracy, AUC, and F1-score, respectively. The average training times of DT-ELM-SE and DT-ELM-LE are 1.6014 s and 1.6752 s, respectively. Although courses with fewer enrollments achieve lower values than courses with more enrollments on the three criteria, the proposed algorithm still performs better than the other algorithms. That means dropout can be predicted


Table 5: The average accuracy of different algorithms weekly, including logistic regression (LR), support vector machine (SVM), back propagation neural network (BP), decision tree (DT), entropy net (EN), LSTM, ELM, and GA-ELM.

Week | DT-ELM | LR | SVM | BP | DT | EN | LSTM | ELM | GA-ELM
Week 1 | 0.8303 | 0.6633 | 0.5643 | 0.6992 | 0.6876 | 0.7067 | 0.7067 | 0.8103 | 0.8149
Week 2 | 0.8568 | 0.686 | 0.6581 | 0.705 | 0.6957 | 0.7157 | 0.7218 | 0.8367 | 0.8462
Week 3 | 0.872 | 0.7284 | 0.7218 | 0.7443 | 0.7325 | 0.757 | 0.7758 | 0.8401 | 0.8467
Week 4 | 0.9642 | 0.8726 | 0.827 | 0.8892 | 0.8627 | 0.8773 | 0.8773 | 0.9492 | 0.9507
Week 5 | 0.941 | 0.8498 | 0.8618 | 0.8447 | 0.8526 | 0.8656 | 0.8656 | 0.9041 | 0.9154

Table 6: The average AUC of different algorithms weekly.

Week | DT-ELM | LR | SVM | BP | DT | EN | LSTM | ELM | GA-ELM
Week 1 | 0.8182 | 0.632 | 0.6272 | 0.6189 | 0.5979 | 0.6256 | 0.6295 | 0.7984 | 0.8066
Week 2 | 0.8357 | 0.6255 | 0.5914 | 0.6216 | 0.6243 | 0.6264 | 0.6844 | 0.8181 | 0.8243
Week 3 | 0.8383 | 0.6527 | 0.6109 | 0.6586 | 0.6438 | 0.667 | 0.7113 | 0.8176 | 0.8261
Week 4 | 0.9412 | 0.7103 | 0.6454 | 0.6117 | 0.6507 | 0.6246 | 0.7698 | 0.9079 | 0.9263
Week 5 | 0.8596 | 0.6652 | 0.6918 | 0.7148 | 0.6954 | 0.7166 | 0.7701 | 0.8268 | 0.8413

Table 7: The average F1-score of different algorithms weekly.

Week | DT-ELM | LR | SVM | BP | DT | EN | LSTM | ELM | GA-ELM
Week 1 | 0.8542 | 0.7378 | 0.5907 | 0.7719 | 0.798 | 0.788 | 0.8025 | 0.8342 | 0.8453
Week 2 | 0.8956 | 0.7913 | 0.7341 | 0.8091 | 0.8 | 0.8238 | 0.8287 | 0.8677 | 0.8893
Week 3 | 0.9021 | 0.8264 | 0.8107 | 0.8392 | 0.8321 | 0.8511 | 0.8508 | 0.8751 | 0.8878
Week 4 | 0.9667 | 0.9315 | 0.8982 | 0.9348 | 0.9222 | 0.9358 | 0.9301 | 0.9488 | 0.9565
Week 5 | 0.9558 | 0.9118 | 0.9185 | 0.9092 | 0.9149 | 0.905 | 0.9191 | 0.9389 | 0.9457

accurately and in a timely manner in courses with different numbers of enrollments.

3.7. Performance with Different Algorithms. The detailed results of the different algorithms are shown in Tables 5–7. Logistic regression and support vector machine achieve lower accuracy, AUC, and F1-score than the other algorithms. Back propagation neural network and decision tree perform better than logistic regression and support vector machine. The entropy net utilizes a decision tree to optimize performance and performs better than the back propagation neural network; in contrast to the entropy net, we utilize the decision tree to optimize the ELM. Although the performance of LSTM is much better than that of the traditional algorithms, it is time intensive, as the previous results show.

It is obvious that the ELM-based algorithms perform better than the other algorithms. That is because ELM obtains the smallest training error by calculating the least-squares solution of the network output weights. GA-ELM achieves good results owing to its feature selection; however, it still needs iterative training and lacks structure initialization of the ELM. DT-ELM optimizes the structure of the ELM and performs much better than ELM and GA-ELM.

4. Discussion

Sufficient experiments are designed and implemented from various perspectives. For the overall performance, DT-ELM achieves the best results compared to the different algorithms, and we also explain why the last two weeks obtain better performance than the other weeks. The experimental results on feature selection demonstrate its importance: features with different gain ratios have different classification abilities, which helps achieve higher performance. Feature enhancement and connection initialization both contribute to the results, with feature enhancement playing the more important role given its larger improvement on the three criteria. The results for different numbers of enrollments demonstrate the universality of our algorithm.

The experimental results have shown the effectiveness and universality of our algorithm. However, an important question still needs to be considered: do instructors need to make interventions for all dropout learners? Our goal is to find the at-risk learners who may stop learning in the coming week and to help instructors take corresponding interventions. From the perspective of behavior, a break means dropout. However, interventions


would not be taken for all dropout learners, because when making interventions, besides the behavior factor, some other factors [46], such as age, occupation, motivation, and learner type [47], should be taken into consideration. Our future work is to make interventions for at-risk learners based on learners' behaviors and background information.

5. Conclusion

Dropout prediction is an essential prerequisite for making interventions for at-risk learners. To address this issue, we propose a hybrid algorithm combining the decision tree and ELM, which settles the previously unsolved problems of behavior discrepancy, iterative training, and structure initialization.

Compared to evolutionary methods, DT-ELM selects features based on the entropy theory. Different from neural-network-based methods, it can analytically determine the output weights without updating parameters through repeated iterations. Experiments on the benchmark dataset under multiple criteria demonstrate the effectiveness of DT-ELM, and the results show that our algorithm can make dropout predictions accurately.

Data Availability

The dataset used to support this study is the open dataset KDD 2015, which is available from http://data-mining.philippe-fournier-viger.com/the-kddcup-2015-dataset-download-link.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research for this paper is supported by the National Natural Science Foundation of China (Grant no. 61877050), the open project fund of the Shaanxi Province Key Lab of Satellite and Terrestrial Network Technology, and the Shaanxi Province financed projects for scientific and technological activities of overseas students (Grant no. 202160002).

References

[1] M. Vitiello, S. Walk, D. Helic, V. Chang, and C. Guetl, "Predicting dropouts on the successive offering of a MOOC," in Proceedings of the 2017 International Conference MOOC-MAKER, pp. 11–20, Guatemala, November 2017.
[2] M. Vitiello, S. Walk, V. Chang, R. Hernandez, D. Helic, and C. Guetl, "MOOC dropouts: a multi-system classifier," in European Conference on Technology Enhanced Learning, pp. 300–314, Springer, 2017.
[3] E. J. Emanuel, "Online education: MOOCs taken by educated few," Nature, vol. 503, no. 7476, p. 342, 2013.
[4] G. Creed-Dikeogu and C. Carolyn, "Are you MOOC-ing yet? A review for academic libraries," Kansas Library Association College & University Libraries, vol. 3, no. 1, pp. 9–13, 2013.
[5] S. Dhawal, "By the numbers: MOOCs in 2013," Class Central, 2013.
[6] S. Dhawal, "By the numbers: MOOCs in 2018," Class Central, 2018.
[7] H. Khalil and M. Ebner, "MOOCs completion rates and possible methods to improve retention - a literature review," in World Conference on Educational Multimedia, Hypermedia and Telecommunications, 2014.
[8] D. F. O. Onah, J. E. Sinclair, and R. Boyatt, "Dropout rates of massive open online courses: behavioural patterns," in International Conference on Education and New Learning Technologies, pp. 5825–5834, 2014.
[9] R. Rivard, Measuring the MOOC Dropout Rate, Inside Higher Ed, 2013.
[10] J. Qiu, J. Tang, T. X. Liu et al., "Modeling and predicting learning behavior in MOOCs," in ACM International Conference on Web Search and Data Mining, pp. 93–102, 2016.
[11] Y. Zhu, L. Pei, and J. Shang, "Improving video engagement by gamification: a proposed design of MOOC videos," in International Conference on Blended Learning, pp. 433–444, Springer, 2017.
[12] R. Bartoletti, "Learning through design: MOOC development as a method for exploring teaching methods," Current Issues in Emerging eLearning, vol. 3, no. 1, p. 2, 2016.
[13] S. L. Watson, J. Loizzo, W. R. Watson, C. Mueller, J. Lim, and P. A. Ertmer, "Instructional design, facilitation, and perceived learning outcomes: an exploratory case study of a human trafficking MOOC for attitudinal change," Educational Technology Research and Development, vol. 64, no. 6, pp. 1273–1300, 2016.
[14] S. Halawa, "Attrition and achievement gaps in online learning," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 57–66, March 2015.
[15] K. R. Koedinger, E. A. McLaughlin, J. Kim, J. Z. Jia, and N. L. Bier, "Learning is not a spectator sport: doing is better than watching for learning from a MOOC," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 111–120, Canada, March 2015.
[16] C. Robinson, M. Yeomans, J. Reich, C. Hulleman, and H. Gehlbach, "Forecasting student achievement in MOOCs with natural language processing," in Proceedings of the 6th International Conference on Learning Analytics and Knowledge, LAK 2016, pp. 383–387, UK, April 2016.
[17] C. Taylor, K. Veeramachaneni, and U. M. O'Reilly, "Likely to stop? Predicting stopout in massive open online courses," Computer Science, 2014.
[18] C. Ye and G. Biswas, "Early prediction of student dropout and performance in MOOCs using higher granularity temporal information," Journal of Learning Analytics, vol. 1, no. 3, pp. 169–172, 2014.
[19] M. Kloft, F. Stiehler, Z. Zheng, and N. Pinkwart, "Predicting MOOC dropout over weeks using machine learning methods," in EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 60–65, October 2014.
[20] S. Nagrecha, J. Z. Dillon, and N. V. Chawla, "MOOC dropout prediction: lessons learned from making pipelines interpretable," in Proceedings of the 26th International Conference on World Wide Web Companion, pp. 351–359, International World Wide Web Conferences Steering Committee, 2017.
[21] D. Peng and G. Aggarwal, "Modeling MOOC dropouts," Entropy, vol. 10, no. 114, pp. 1–5, 2015.
[22] J. Liang, J. Yang, Y. Wu, C. Li, and L. Zheng, "Big data application in education: dropout prediction in edX MOOCs," in 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 440–443, IEEE, April 2016.
[23] G. Balakrishnan and D. Coetzee, "Predicting student retention in massive open online courses using hidden Markov models," Electrical Engineering and Computer Sciences, University of California at Berkeley, 2013.
[24] C. Lang, G. Siemens, A. Wise, and D. Gasevic, Handbook of Learning Analytics, Society for Learning Analytics Research (SoLAR), 2017.
[25] M. Fei and D.-Y. Yeung, "Temporal models for predicting student dropout in massive open online courses," in IEEE International Conference on Data Mining Workshop, pp. 256–263, USA, November 2015.
[26] W. Li, M. Gao, H. Li, Q. Xiong, J. Wen, and Z. Wu, "Dropout prediction in MOOCs using behavior features and multi-view semi-supervised learning," in Proceedings of the 2016 International Joint Conference on Neural Networks, IJCNN 2016, pp. 3130–3137, Canada, July 2016.
[27] M. Wen and C. P. Rose, "Identifying latent study habits by mining learner behavior patterns in massive open online courses," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014, pp. 1983–1986, China, November 2014.
[28] S. Lei, A. I. Cristea, M. S. Awan, C. Stewart, and M. Hendrix, "Towards understanding learning behavior patterns in social adaptive personalized e-learning," in Proceedings of the Nineteenth Americas Conference on Information Systems, pp. 15–17, 2013.
[29] J. H. Yang and V. Honavar, "Feature subset selection using genetic algorithm," IEEE Intelligent Systems & Their Applications, vol. 13, no. 2, pp. 44–48, 1998.
[30] A. K. Das, S. Das, and A. Ghosh, "Ensemble feature selection using bi-objective genetic algorithm," Knowledge-Based Systems, vol. 123, pp. 116–127, 2017.
[31] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in 2004 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985–990, IEEE, July 2004.
[32] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
[33] A. Ramesh, D. Goldwasser, B. Huang, H. Daume, and L. Getoor, "Learning latent engagement patterns of students in online courses," in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
[34] J. He, J. Bailey, B. I. P. Rubinstein, and R. Zhang, "Identifying at-risk students in massive open online courses," in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[35] D. Yang, T. Sinha, D. Adamson, and C. P. Rose, "Turn on, tune in, drop out: anticipating student dropouts in massive open online courses," in Proceedings of the 2013 NIPS Data-Driven Education Workshop, vol. 11, p. 14, 2013.
[36] M. Sharkey and R. Sanders, "A process for predicting MOOC attrition," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 50–54, Doha, Qatar, October 2014.
[37] F. Wang and L. Chen, "A nonlinear state space model for identifying at-risk students in open online courses," in Proceedings of the 9th International Conference on Educational Data Mining, vol. 16, pp. 527–532, 2016.
[38] A. Bhattacherjee, "Understanding information systems continuance: an expectation-confirmation model," MIS Quarterly, vol. 25, no. 3, pp. 351–370, 2001.
[39] Z. Junjie, "Exploring the factors affecting learners' continuance intention of MOOCs for online collaborative learning: an extended ECM perspective," Australasian Journal of Educational Technology, vol. 33, no. 5, pp. 123–135, 2017.
[40] J. K. T. Tang, H. Xie, and T.-L. Wong, "A big data framework for early identification of dropout students in MOOC," Communications in Computer and Information Science, vol. 559, pp. 127–132, 2015.
[41] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[42] I. K. Sethi, "Entropy nets: from decision trees to neural networks," Proceedings of the IEEE, vol. 78, no. 10, pp. 1605–1613, 1990.
[43] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[44] F. Chollet et al., Keras: The Python Deep Learning Library, Astrophysics Source Code Library, 2018.
[45] J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley, "MOOC dropout prediction: how to measure accuracy?" in Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, pp. 161–164, ACM, 2017.
[46] J. DeBoer, G. S. Stump, D. Seaton, and L. Breslow, "Diversity in MOOC students' backgrounds and behaviors in relationship to performance in 6.002x," in Proceedings of the Sixth Learning International Networks Consortium Conference, vol. 4, pp. 16–19, 2013.
[47] P. G. de Barba, G. E. Kennedy, and M. D. Ainley, "The role of students' motivation and participation in predicting performance in a MOOC," Journal of Computer Assisted Learning, vol. 32, no. 3, pp. 218–231, 2016.


Page 3: MOOC Dropout Prediction Using a Hybrid Algorithm Based ...MathematicalProblemsinEngineering 12345 Week 0.6 0.7 0.8 0.9 1.0 Accuracy DT-ELM TA DL OA Week DT-ELM TA Week DT-ELM TA DL

Mathematical Problems in Engineering 3

Learning BehaviorRecords

Raw Data

Dropout orRetention

Label

Feature Extraction

Feature Design andQuantification

Featuredefinitions Feature Extraction

Processing

Form Matrices

Feature Matrix1 r m1

1

Feature

rFeature

Feature

1Featur Label

Label Label

mLabele

Output Matrices

Dropout Prediction Based on DT-ELM Algorithm

Decision LayerSelectedFeatures Enhancement

Layer

TreeStructure

EnhancedFeatures

Mapping LayerNeuron Numbers

Of LayersConnections

Between LayersImproved ELM

Figure 1 Overall framework of dropout prediction

ELM The function of mapping layer is to determine theELM structure according to the tree structure It outputsthe neuron numbers of each layer and connections betweenlayers By the improved ELM the dropout or retention can beobtained

23 Feature Extraction We extract features from learningbehavior records Courses generally launch weekly It isbetter to utilize the numbers of learning behavior recordsby week as features [40] The set for each type of learn-ing behavior records is represented as 119877푡 Here 119905 denotesthe 119905푡ℎ type of learning behavior and 119905 = 1 2 sdot sdot sdot 119903The record number of 119905푡ℎ type of learning behavior dur-ing the duration for each course is expressed as a vector119909푖푡 = [119909(1)푖푡 119909(2)푖푡 sdot sdot sdot 119909(푞)푖푡 sdot sdot sdot 119909(푊)푖푡 ] where 119894 represents the 119894푡ℎleaner 119909(푞)푖푡 is the number of learning behavior records in the119902푡ℎ week and119882 is the number of weeks a course lasts Thefeature extraction process is shown in Algorithm 1 It outputsthe feature matrix and label matrix

After feature extraction the feature matrix 119909 =[1199091 1199092 sdot sdot sdot 119909푒] is obtained where 119890 is the number of enroll-ment learners 119909푖 = [119909푖1 119909푖2 119909푖푟] represents the behaviorfeatures of the 119894푡ℎ learner 119910 = [1199101 1199102 sdot sdot sdot 119910푒] is the labelmatrix where 119910푖 = [119910푖1 119910푖2 119910푖푚] isin 119877푚 is the dropoutlabel of the 119894푡ℎ learner

Effective learning time is another kind of behavior featureand represents the actual time that a learner spends on

learning In practice a learner may click a video and thenleaves for something else Therefore we set a thresholdbetween two activity clicksThe time exceeding the thresholdwill not be counted

24 Dropout Prediction Based on DT-ELM Algorithm Deci-sion Layer The decision layer implements the feature selec-tion using decision tree based on the maximum informationgain ratio [41]119863 is the input of decision layer Each instancein 119863 is represented as 119909푖 = [119909푖1 119909푖2 119909푖푟] isin 119877푟 119910푖 =[119910푖1 119910푖2 119910푖푚] isin 119877푚 is the class label of 119909푖 119863耠 is theoutput of decision layer only containing the selected featuresEach instance in119863耠 is represented as 119909푖119899119890119908 = [119909푖1 119909푖2 119909푖푛]which means there are 119899 selected features in119863耠

Decision tree is constructed by recursive partitioning119863 into smaller subsets 1198631 1198632 119863퐾 until reaching thespecified stopping criterion for example that all the subsetsbelong to a single class A single feature split is recursivelydefined for all nodes of the tree using some criterionInformation gain ratio is one of the most widely used criteriafor decision treeThe entropy which comes from informationtheory is described as

Info (119863) = minus 푚sum푐=1

119901푐 log2 (119901푐) (1)

where 119898 represents the number of classes 119901푐 is theprobability that an instance 119909푖 belongs to the class 119888 The

4 Mathematical Problems in Engineering

Inputs119877 Learning behavior records of a course119890 Enrollment number of learners119903 Number of behavior features119889 Duration of the courseOutputs119909 Feature matrix with size of 119890 times 119903119910 Label matrix with size of 119890 times 1198981 119877 is the set of learning behavior records for each course It is grouped by

the behavior types Let 119877 = [1198771 1198772 sdot sdot sdot 119877푡 sdot sdot sdot 119877푢] be the record set oflearning behaviors where 119905 = 1 2 sdot sdot sdot 119903

2 Divide the duration of this course into119882 weeks3 For each learning behavior record in 119877푡4 If this record occurred in week 119902 generated by learner 1198945 119909(푞)푖푡 + = 16 For each learner the 119905푡ℎ learning behavior feature119909푖푡 = [119909(1)푖푡 119909(2)푖푡 sdot sdot sdot 119909(푞)푖푡 sdot sdot sdot 119909(푊)푖푡 ] is obtained7 Form the feature matrix 119909 = [1199091 1199092 sdot sdot sdot 119909푒] where 119909푖 = [119909푖1 119909푖2 119909푖푟] is 119903

types of behavior features of the 119894푡ℎ learner8 Form the label matrix 119910 = [1199101 1199102 sdot sdot sdot 119910푒] where 119910푖 = [119910푖1 119910푖2 119910푖푚] isin 119877푚

Algorithm 1 Feature Extraction Processing

split rule is defined by information gain which represents theexpected reduction in entropy after the split according to agiven feature The information gain is described as follows

119866119886119894119899 (119860) = Info (119863) minus Info퐴 (119863) (2)

Info퐴(119863) is the conditional entropy which represents theentropy of 119863 based on the partitioning by feature 119860 It iscomputed by the weighted average over all sets resulting fromthe split shown in (3) where |119863푘||119863| acts as the weight of the119896푡ℎ partition

Info퐴 (119863) =퐾sum푘=1

1003816100381610038161003816119863푘1003816100381610038161003816|119863| times Info (119863푘) (3)

The information gain ratio extends the information gainwhich applies a kind of normalization to information gainusing a ldquosplit informationrdquo value

SplitInfo퐴 (119863) = minus퐾sum푘=1

1003816100381610038161003816119863퐾1003816100381610038161003816|119863| times log2 (1003816100381610038161003816119863퐾1003816100381610038161003816|119863| ) (4)

The feature with the maximum information gain ratio isselected as the splitting feature which is defined as follows

119866119886119894119899119877119886119905119894119900 (119860) = 119866119886119894119899 (119860)SplitInfo퐴 (119863) (5)

The decision tree is constructed by this way Each internalnode of the decision tree corresponds to one of the selectedfeatures Each terminal node represents the specific class of acategorical variable

Enhancement Layer Because the classification ability ofeach selected feature is different the impact of this featureon each leaf node is different The root node has the best

Table 1 Mapping rules between DT and ELM

DT ELMInternal nodes Input neuronsTerminal nodes Hidden neuronsDistinct classes Output neurons

Paths between nodes Connections between inputlayer and hidden layer

classification ability and it connects to all leaf nodes Thatmeans the root node has the greatest impact on all leaf nodesEach internal node except the root node connects to fewerleaf nodes and has less impact on the connected leaf nodesEach value of the selected feature is multiplied by a number119897푠 which is equal to the number of leaf nodes it connected toin the decision tree It is represented as follows

119909푖푠119899119890119908 = 119909푖푠119900119903119894 times 119897푠 119904 = 1 2 119899 (6)

By this step we enhance the impact of the selectedfeatures on leaf nodes based on the tree structure

Mapping Layer In the mapping layer inspired by theentropy net [42] wemap the decision tree to the ELM Table 1shows the corresponding mapping rules between nodes indecision tree and neurons in ELM

The number of internal nodes in decision tree equals thenumber of neurons in the input layer of ELM Each leaf nodein decision tree is mapped to a corresponding neuron inthe hidden layer of ELM The number of distinct classes indecision tree equals the number of neurons in the output layerof ELMThe paths between nodes in decision tree decide theconnections between input layer and hidden layer of ELMThe result of this mapping principle determines the numbers

Mathematical Problems in Engineering 5

x1get1 x1get1

x4 get4

x3 get4

x1get3x1get3

x2ge t2 x2ge t2

nn internal l terminal m distinct n input neurons l hidden neurons m output neurons

Improved ELMDecision Tree

Mapping

odes nodes classes

xi isin Rn yi isin Rm

(wb)

Figure 2 An illustration of mapping the decision tree to ELM The internal nodes leaf nodes and distinct classes in the decision tree aremapped to the corresponding neurons in the input layer hidden layer and output layer of ELM respectively The paths between nodesdetermine the connections between input layer and hidden layer

of neurons in each layer Meanwhile it improves ELM withfewer connections

Figure 2 shows an illustration of mapping the decisiontree to ELM The first neuron of the input layer is mappedfrom the root node of the decision tree It connects to allhidden neurons mapped from all leaf nodes That meansthe first neuron has impact on every hidden neuron Thesecond neuron of the input layer connects to the four hiddenneurons according to the decision tree structure That meansthe second neuron has impact on the four hidden neuronsThedashed lines show that there exist no corresponding pathsbetween the internal nodes and leaf nodes in the decisiontree Therefore there exist no connections between the cor-responding neurons in the input layer and the correspondingneurons in the hidden layer of ELM

Improved ELMOnce the structure of ELM is determinedthe enhanced features are input into the ELM The connec-tionless weights between input layer and hidden layer areinitialized with zero or extremely small values very close tozero Other connection weights as well as biases of the hiddenlayer are initialized randomlyUnique optimal solution can beobtained once the numbers of hidden neurons and initializedparameters are determined

There are 119873 random instances (119909푖 119910푖) where 119909푖 =[119909푖1 119909푖2 119909푖푛]푇 isin 119877푛 119910푖 = [119910푖1 119910푖2 119910푖푚]푇 isin 119877푚 ASLFN with 119871 hidden neurons can be represented as follows

퐿sum푗=1

120573푗119892 (119908푗 sdot 119909푖 + 119887푗) = 119900푖 119894 = 1 2 119873 (7)

where 119892(119909) is the activation function of hidden neuron119908푗 = [119908푗1 119908푗2 119908푗푛]푇 is the weight vector of inputneurons connecting to 119894푡ℎ hidden neuron The inner product

of 119908푗 and 119909푖 is 119908푗 sdot 119909푖 The bias of the 119895푡ℎ hidden neuron is 119887푗The weight vector of the 119895푡ℎ hidden neuron connecting to theoutput neurons is 120573푗 = [120573푗1 120573푗2 120573푗푚]푇

The target of a standard SLFN is to approximate these119873instances with zero error which is represented as (8) wherethe desired output is 119900푖 and the actual output is 119910푖

푁sum푖=1

1003817100381710038171003817119900푖 minus 119910푖1003817100381710038171003817 = 0 (8)

In other words there exist proper 119908푗 119887푗 and 120573j such that

퐿sum푗=1

120573푗119892 (119908푗 sdot 119909푖 + 119887푗) = 119910푖 119894 = 1 2 119873 (9)

Equation (9) can be represented completely as follows

119867120573 = 119884 (10)

where

119867 = [[[[[

119892 (1199081 sdot 1199091 + 1198871) sdot sdot sdot 119892 (119908퐿 sdot 1199091 + 119887퐿) 119892 (1199081 sdot 119909푁 + 119887퐿) sdot sdot sdot 119892 (119908퐿 sdot 119909N + 119887퐿)

]]]]]푁times퐿

(11)

6 Mathematical Problems in Engineering

1 Give the training set 119879 = (119909푖 119910푖) | 119909푖 isin 119877푛 119910푖 isin 119877푚 activationfunction 119892(119909) number of hidden neurons 119871

2Randomly assign input weight vector 119908푗 and the bias 119887푗except the connectionless weights and biases betweeninput layer and hidden layer with zero

3 Calculate the hidden layer output matrix1198674 Calculate the output weight vector 120573 = Hdagger119884 where Hdagger is

the Moore-Penrose generalized inverse of matrix1198675 Obtain the predicted values based on the input variables

Algorithm 2 Improved ELM

Table 2 Dataset description

Information Description

Enrollment Records denoting what course each learner hasenrolled (120542 entries)

Object Relationships of courses and modulesLog Learning behavior records (8157277 entries)Date The start and end time of courses

Truth Labels indicating whether learners dropout orcomplete the whole courses

120573 = [[[[[

120573푇1120573푇퐿

]]]]]퐿times푀

(12)

119884 = [[[[[

119910푇1119910푇푁

]]]]]푁times푀

(13)

Once 119908푗 and 119887푗 are determined the output matrix of thehidden layer 119867 is uniquely calculated The training processcan be transformed into finding a least-squares solution ofthe linear system in (9) The algorithm can be described as inAlgorithm 2

3 Experimental Results

31 Dataset and Preprocess The effectiveness of our pro-posed algorithm is tested on the benchmark dataset KDD2015 which contains five kinds of information The detaileddescription of the dataset is shown in Table 2 From the rawdata we define and extract several behavior features Thedescription is shown in Table 3

The next step is to label each record according to thebehavior features If a learner has activities in the comingweek the dropout label is 0 Otherwise the dropout label is1 A learner may begin learning in the later week but not thefirst week and in the first several weeks the learnerwill not belabeled as dropout the weekwhen the learner begins learningis seen as the first actually learning week for this learner The

Table 3 Extracted behavior features

Features Description

1199091 The number of participating objects of course perweek

1199092-1199098 The behavior numbers of access page close problemvideo discussion navigating wiki per week

1199099-11990910 The total and average numbers of all behaviors perweek

11990911 The number of active days per week11990912 The time consumption per week

11990913-11990916 The behavior numbers of access page closeproblem video from browser per week respectively

11990917-11990921The behavior numbers of access discussion

problem navigating wiki from server per weekrespectively

11990922-11990923 The numbers of all behaviors from browser andserver per week respectively

first several weeks data will be deleted and then other weeksdata will be labeled

32 Experimental Setting and Evaluation Experiments arecarried out in MATLAB R2016b and Python 27 under adesktop computer with Intel 25GHz CPU and 8G RAMThe LIBSVM library [43] and Keras library [44] are used toimplement the support vector and LSTM respectively

In order to evaluate the effectiveness of the proposedalgorithm accuracy area under curve (AUC) F1-score andtraining time are used as evaluation criteria Accuracy isthe proportion of correct prediction including dropout andretention Precision is the proportion of dropout learnerspredicted correctly by the classifier in all predicted dropoutlearners Recall is the proportion of dropout learners pre-dicted correctly by the classifier in all real dropout learnersF1-score is the harmonic mean of precision and recall

AUC depicts the degree to which a classifier distinguishes between positive and negative samples, and it is invariant to imbalanced data [45]. The receiver operating characteristic (ROC) curve plots the trained classifier's true positive rate against its false positive rate. The AUC is the integral of the ROC curve over the interval [0, 1]; the closer the value is to 1, the better the classification performance.
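For reference, these criteria can be computed with scikit-learn; this is a small sketch with hypothetical placeholder vectors, not the paper's own MATLAB/LIBSVM/Keras pipeline.

```python
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

y_true = [1, 0, 1, 1, 0]                       # 0/1 dropout labels
y_score = [0.9, 0.2, 0.7, 0.4, 0.1]            # e.g., network output scores
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # thresholded predictions

print(accuracy_score(y_true, y_pred))          # proportion of correct predictions
print(roc_auc_score(y_true, y_score))          # area under the ROC curve
print(f1_score(y_true, y_pred))                # harmonic mean of precision and recall
```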


Figure 3: The overall weekly performance (weeks 1-5) of DT-ELM, traditional machine learning algorithms (TA), deep learning (DL), and optimization algorithm (OA), shown as accuracy, AUC, and F1-score panels. DT-ELM is about 12.78%, 22.19%, and 6.87% higher than the baseline algorithms in terms of overall accuracy, AUC, and F1-score, respectively.

Table 4: The overall average training time (s) of DT-ELM, traditional machine learning algorithms (TA), deep learning algorithm (DL), and optimization algorithm (OA).

Algorithms   DT-ELM    TA      DL     OA
Time (s)     1.6383    2.427   >100   33.649

3.3. Overall Performance. We choose ten courses for the experiments. Their enrollment numbers range from several hundred to about ten thousand. We divide the baselines into three categories: traditional machine learning, deep learning, and optimization algorithms. The traditional machine learning algorithms include logistic regression, support vector machine, decision tree, back propagation neural network, and entropy net. LSTM is adopted as the deep learning algorithm. A genetic algorithm combined with ELM (GA-ELM) serves as the optimization algorithm, aiming to improve the ELM.

The results of overall performance in terms of accuracy, AUC, and F1-score are shown in Figure 3. The results of overall average training time are shown in Table 4.

Although there exists a wide range of course enrollments, the proposed DT-ELM algorithm performs much better than the three categories of baseline algorithms. DT-ELM reaches 89.28%, 85.86%, and 91.48%, which is about 12.78%, 22.19%, and 6.87% higher than the baseline algorithms in terms of overall accuracy, AUC, and F1-score, respectively.

To be specific, the traditional machine learning algorithms perform the worst on the three criteria. The results of the deep learning algorithm are much better; however, it has the longest training time. Although the optimization algorithm performs better than the deep learning algorithm, it does not perform as well as DT-ELM. DT-ELM performs the best in terms of accuracy, AUC, and F1-score, and meanwhile requires the least training time thanks to its noniterative training process. These results prove that DT-ELM reaches the goal of predicting dropout accurately and in a timely manner.

Another conclusion is that the last two weeks achieve better performance than the other weeks. To identify the reason, we make a statistical analysis of the weekly dropout rate. We find that, compared to the first three weeks, the average dropout rate in the last two weeks of courses is higher. This means the behavior of learners is more likely to follow a pattern. On the other hand, it also illustrates the importance of dropout prediction: the dropout rates in the later stages of courses are generally higher than in the initial stage, so it is better to find at-risk learners early in order to make effective interventions.

3.4. Impact of Feature Selection. To verify the effectiveness of feature selection, we compare DT-ELM with plain ELM. The results are shown in Figure 4. DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively, which proves that feature selection improves the prediction results. Choosing as many features as possible may not be appropriate for dropout prediction: according to the entropy theory mentioned previously, features with different gain ratios have different classification ability, and features with low gain ratios may weaken the overall classification ability.

Although each course has different behavior features, two conclusions can be drawn. First, the average number of selected features is 12, which is less than the number of extracted features; this shows that using fewer, well-chosen features for dropout prediction achieves better results than using all extracted features. Second, we find that discussion, active days, and time consumption are the three most important factors affecting the prediction results.
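As a concrete reference for the entropy-based ranking behind this selection, the following is a minimal sketch of the C4.5-style gain ratio (Gain(A) = Info(D) - Info_A(D), divided by SplitInfo_A(D)); it assumes a discretized feature and is illustrative, not the paper's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    # Info(D): Shannon entropy of the class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    n = len(labels)
    info_d = entropy(labels)
    info_a = split_info = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        p = len(subset) / n
        info_a += p * entropy(subset)      # conditional entropy Info_A(D)
        split_info -= p * math.log2(p)     # SplitInfo_A(D)
    return (info_d - info_a) / split_info if split_info > 0 else 0.0

# Example: a feature that separates the classes perfectly gets gain ratio 1.0.
print(gain_ratio(["a", "a", "b", "b"], [1, 1, 0, 0]))  # -> 1.0
```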

3.5. Impact of Feature Enhancement and Connection Initialization. To verify the impact of feature enhancement, we compare DT-ELM with a variant of itself without feature enhancement (Without-FE). Similarly, to verify the impact of connection initialization, we compare DT-ELM with a variant without connection initialization (Without-IC). The results of the three algorithms are shown in Figure 5.

Neither Without-FE nor Without-IC performs as well as DT-ELM. DT-ELM is about 0.94%, 1.21%, and 0.9% higher than Without-IC in terms of accuracy, AUC, and F1-score, respectively, and about 2.13%, 2.25%, and 1.98% higher than Without-FE. The values of Without-IC are higher than those of Without-FE on all three criteria, which indicates that feature enhancement plays a more important role than connection initialization.


Figure 4: Impact of feature selection, shown as weekly (weeks 1-5) accuracy, AUC, and F1-score of DT-ELM versus ELM. DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively.

Figure 5: Impact of feature enhancement and connection initialization, shown as weekly (weeks 1-5) accuracy, AUC, and F1-score. Without-IC means DT-ELM without connection initialization, and Without-FE means DT-ELM without feature enhancement. The results indicate that feature enhancement plays a more important role than connection initialization.

Figure 6: Comparison of different scales of enrollment in terms of weekly (weeks 1-5) accuracy, AUC, and F1-score. DT-ELM-SE contains courses with smaller numbers of enrollments, and DT-ELM-LE contains courses with larger numbers of enrollments.

Feature enhancement plays the larger role because features with better classification ability are strengthened according to how much impact each feature has on the neurons it connects to.

3.6. Comparison of Different Numbers of Enrollments. To verify the effectiveness of DT-ELM on different numbers of enrollments, two groups of experiments are conducted; the results are shown in Figure 6. The first group (DT-ELM-SE) contains courses with smaller numbers of enrollments, ranging from several hundred to about one thousand. The second group (DT-ELM-LE) contains courses with larger numbers of enrollments, ranging from several thousand to about ten thousand.

Generally, the more data used for training, the better the classification results. DT-ELM-LE is about 2.59%, 3.41%, and 2.54% higher than DT-ELM-SE in terms of accuracy, AUC, and F1-score, respectively. The average training times of DT-ELM-SE and DT-ELM-LE are 1.6014 s and 1.6752 s, respectively. Although courses with fewer enrollments achieve lower values on the three criteria than courses with more enrollments, the proposed algorithm still performs better than the other algorithms. That means dropout can be predicted accurately and in a timely manner in courses with different numbers of enrollments.


Table 5: The average accuracy of different algorithms weekly, including logistic regression (LR), support vector machine (SVM), back propagation neural network (BP), decision tree (DT), entropy net (EN), LSTM, ELM, and GA-ELM.

Week     DT-ELM   LR       SVM      BP       DT       EN       LSTM     ELM      GA-ELM
Week 1   0.8303   0.6633   0.5643   0.6992   0.6876   0.7067   0.7067   0.8103   0.8149
Week 2   0.8568   0.6860   0.6581   0.7050   0.6957   0.7157   0.7218   0.8367   0.8462
Week 3   0.8720   0.7284   0.7218   0.7443   0.7325   0.7570   0.7758   0.8401   0.8467
Week 4   0.9642   0.8726   0.8270   0.8892   0.8627   0.8773   0.8773   0.9492   0.9507
Week 5   0.9410   0.8498   0.8618   0.8447   0.8526   0.8656   0.8656   0.9041   0.9154

Table 6: The average AUC of different algorithms weekly.

Week     DT-ELM   LR       SVM      BP       DT       EN       LSTM     ELM      GA-ELM
Week 1   0.8182   0.6320   0.6272   0.6189   0.5979   0.6256   0.6295   0.7984   0.8066
Week 2   0.8357   0.6255   0.5914   0.6216   0.6243   0.6264   0.6844   0.8181   0.8243
Week 3   0.8383   0.6527   0.6109   0.6586   0.6438   0.6670   0.7113   0.8176   0.8261
Week 4   0.9412   0.7103   0.6454   0.6117   0.6507   0.6246   0.7698   0.9079   0.9263
Week 5   0.8596   0.6652   0.6918   0.7148   0.6954   0.7166   0.7701   0.8268   0.8413

Table 7: The average F1-score of different algorithms weekly.

Week     DT-ELM   LR       SVM      BP       DT       EN       LSTM     ELM      GA-ELM
Week 1   0.8542   0.7378   0.5907   0.7719   0.7980   0.7880   0.8025   0.8342   0.8453
Week 2   0.8956   0.7913   0.7341   0.8091   0.8000   0.8238   0.8287   0.8677   0.8893
Week 3   0.9021   0.8264   0.8107   0.8392   0.8321   0.8511   0.8508   0.8751   0.8878
Week 4   0.9667   0.9315   0.8982   0.9348   0.9222   0.9358   0.9301   0.9488   0.9565
Week 5   0.9558   0.9118   0.9185   0.9092   0.9149   0.9050   0.9191   0.9389   0.9457


3.7. Performance with Different Algorithms. The detailed results of the different algorithms are shown in Tables 5-7. Logistic regression and support vector machine achieve lower values in accuracy, AUC, and F1-score than the other algorithms. Back propagation neural network and decision tree perform better than logistic regression and support vector machine. Entropy net utilizes a decision tree to optimize performance, and it performs better than the back propagation neural network; different from entropy net, we utilize the decision tree to optimize ELM. Although the performance of LSTM is much better than that of the traditional algorithms, it is time intensive, as the previous results show.

It is obvious that the ELM-based algorithms perform better than the other algorithms. That is because ELM attains the smallest training error by calculating the least-squares solution for the network output weights. GA-ELM achieves good results thanks to its feature selection function; however, it still needs iterative training and lacks structure initialization of the ELM. DT-ELM optimizes the structure of ELM and performs much better than both ELM and GA-ELM.

4. Discussion

Sufficient experiments are designed and implemented from various perspectives. On overall performance, DT-ELM achieves the best results compared to the different baseline algorithms, and we explain why the last two weeks achieve better performance than the other weeks. The experimental results on feature selection demonstrate its importance: features with different gain ratios have different classification ability, which helps achieve higher performance. Feature enhancement and connection initialization both contribute to the results, with feature enhancement playing the more important role, given its larger improvement on the three criteria. The results on different numbers of enrollments prove the universality of our algorithm.

The experimental results have proved the effectiveness and universality of our algorithm. However, there is still an important question to consider: do instructors need to intervene with every dropout learner? Our goal is to find the at-risk learners who may stop learning in the coming week and to help instructors take corresponding interventions. From the perspective of behavior, a break means dropout.


However, interventions would not be taken for all dropout learners, because interventions should consider, besides the behavior factor, other factors [46] such as age, occupation, motivation, and learner type [47]. Our future work is to make interventions for at-risk learners based on learners' behaviors and background information.

5. Conclusion

Dropout prediction is an essential prerequisite for making interventions for at-risk learners. To address this issue, we propose a hybrid algorithm combining the decision tree and ELM, which successfully settles the previously unsolved problems of behavior discrepancy, iterative training, and structure initialization.

Compared to the evolutionary methods, DT-ELM selects features based on the entropy theory. Different from the neural-network-based methods, it can analytically determine the output weights without updating parameters through repeated iterations. The results on the benchmark dataset under multiple criteria demonstrate the effectiveness of DT-ELM and show that our algorithm can predict dropout accurately.

Data Availability

The dataset used to support this study is the open KDD 2015 dataset, which is available from http://data-mining.philippe-fournier-viger.com/the-kddcup-2015-dataset-download-link.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research for this paper is supported by the National Natural Science Foundation of China (Grant no. 61877050), the open project fund of the Shaanxi Province Key Lab of Satellite and Terrestrial Network Technology, and Shaanxi Province financed projects for scientific and technological activities of overseas students (Grant no. 202160002).

References

[1] M. Vitiello, S. Walk, D. Helic, V. Chang, and C. Guetl, "Predicting dropouts on the successive offering of a MOOC," in Proceedings of the 2017 International Conference MOOC-MAKER, pp. 11-20, Guatemala, November 2017.
[2] M. Vitiello, S. Walk, V. Chang, R. Hernandez, D. Helic, and C. Guetl, "MOOC dropouts: a multi-system classifier," in European Conference on Technology Enhanced Learning, pp. 300-314, Springer, 2017.
[3] E. J. Emanuel, "Online education: MOOCs taken by educated few," Nature, vol. 503, no. 7476, p. 342, 2013.
[4] G. Creed-Dikeogu and C. Carolyn, "Are you MOOC-ing yet? A review for academic libraries," Kansas Library Association College & University Libraries, vol. 3, no. 1, pp. 9-13, 2013.
[5] S. Dhawal, "By the numbers: MOOCs in 2013," Class Central, 2013.
[6] S. Dhawal, "By the numbers: MOOCs in 2018," Class Central, 2018.
[7] H. Khalil and M. Ebner, "MOOCs completion rates and possible methods to improve retention - a literature review," in World Conference on Educational Multimedia, Hypermedia and Telecommunications, 2014.
[8] D. F. O. Onah, J. E. Sinclair, and R. Boyatt, "Dropout rates of massive open online courses: behavioural patterns," in International Conference on Education and New Learning Technologies, pp. 5825-5834, 2014.
[9] R. Rivard, "Measuring the MOOC dropout rate," Inside Higher Ed, 8, 2013.
[10] J. Qiu, J. Tang, T. X. Liu et al., "Modeling and predicting learning behavior in MOOCs," in ACM International Conference on Web Search and Data Mining, pp. 93-102, 2016.
[11] Y. Zhu, L. Pei, and J. Shang, "Improving video engagement by gamification: a proposed design of MOOC videos," in International Conference on Blended Learning, pp. 433-444, Springer, 2017.
[12] R. Bartoletti, "Learning through design: MOOC development as a method for exploring teaching methods," Current Issues in Emerging eLearning, vol. 3, no. 1, p. 2, 2016.
[13] S. L. Watson, J. Loizzo, W. R. Watson, C. Mueller, J. Lim, and P. A. Ertmer, "Instructional design, facilitation, and perceived learning outcomes: an exploratory case study of a human trafficking MOOC for attitudinal change," Educational Technology Research and Development, vol. 64, no. 6, pp. 1273-1300, 2016.
[14] S. Halawa, "Attrition and achievement gaps in online learning," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 57-66, March 2015.
[15] K. R. Koedinger, E. A. McLaughlin, J. Kim, J. Z. Jia, and N. L. Bier, "Learning is not a spectator sport: doing is better than watching for learning from a MOOC," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 111-120, Canada, March 2015.
[16] C. Robinson, M. Yeomans, J. Reich, C. Hulleman, and H. Gehlbach, "Forecasting student achievement in MOOCs with natural language processing," in Proceedings of the 6th International Conference on Learning Analytics and Knowledge, LAK 2016, pp. 383-387, UK, April 2016.
[17] C. Taylor, K. Veeramachaneni, and U. M. O'Reilly, "Likely to stop? Predicting stopout in massive open online courses," Computer Science, 2014.
[18] C. Ye and G. Biswas, "Early prediction of student dropout and performance in MOOCs using higher granularity temporal information," Journal of Learning Analytics, vol. 1, no. 3, pp. 169-172, 2014.
[19] M. Kloft, F. Stiehler, Z. Zheng, and N. Pinkwart, "Predicting MOOC dropout over weeks using machine learning methods," in EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 60-65, October 2014.
[20] S. Nagrecha, J. Z. Dillon, and N. V. Chawla, "MOOC dropout prediction: lessons learned from making pipelines interpretable," in Proceedings of the 26th International Conference on World Wide Web Companion, pp. 351-359, International World Wide Web Conferences Steering Committee, 2017.
[21] D. Peng and G. Aggarwal, "Modeling MOOC dropouts," Entropy, vol. 10, no. 114, pp. 1-5, 2015.
[22] J. Liang, J. Yang, Y. Wu, C. Li, and L. Zheng, "Big data application in education: dropout prediction in Edx MOOCs," in 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 440-443, IEEE, April 2016.


[23] G. Balakrishnan and D. Coetzee, "Predicting student retention in massive open online courses using hidden Markov models," Electrical Engineering and Computer Sciences, University of California at Berkeley, 2013.
[24] C. Lang, G. Siemens, A. Wise, and D. Gasevic, Handbook of Learning Analytics, Society for Learning Analytics Research (SoLAR), 2017.
[25] M. Fei and D.-Y. Yeung, "Temporal models for predicting student dropout in massive open online courses," in IEEE International Conference on Data Mining Workshop, pp. 256-263, USA, November 2015.
[26] W. Li, M. Gao, H. Li, Q. Xiong, J. Wen, and Z. Wu, "Dropout prediction in MOOCs using behavior features and multi-view semi-supervised learning," in Proceedings of the 2016 International Joint Conference on Neural Networks, IJCNN 2016, pp. 3130-3137, Canada, July 2016.
[27] M. Wen and C. P. Rose, "Identifying latent study habits by mining learner behavior patterns in massive open online courses," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014, pp. 1983-1986, China, November 2014.
[28] S. Lei, A. I. Cristea, M. S. Awan, C. Stewart, and M. Hendrix, "Towards understanding learning behavior patterns in social adaptive personalized e-learning," in Proceedings of the Nineteenth Americas Conference on Information Systems, pp. 15-17, 2013.
[29] J. H. Yang and V. Honavar, "Feature subset selection using genetic algorithm," IEEE Intelligent Systems & Their Applications, vol. 13, no. 2, pp. 44-48, 1998.
[30] A. K. Das, S. Das, and A. Ghosh, "Ensemble feature selection using bi-objective genetic algorithm," Knowledge-Based Systems, vol. 123, pp. 116-127, 2017.
[31] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in 2004 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985-990, IEEE, July 2004.
[32] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1-3, pp. 489-501, 2006.
[33] A. Ramesh, D. Goldwasser, B. Huang, H. Daume, and L. Getoor, "Learning latent engagement patterns of students in online courses," in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
[34] J. He, J. Bailey, B. I. P. Rubinstein, and R. Zhang, "Identifying at-risk students in massive open online courses," in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[35] D. Yang, T. Sinha, D. Adamson, and C. P. Rose, "Turn on, tune in, drop out: anticipating student dropouts in massive open online courses," in Proceedings of the 2013 NIPS Data-Driven Education Workshop, vol. 11, p. 14, 2013.
[36] M. Sharkey and R. Sanders, "A process for predicting MOOC attrition," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 50-54, Doha, Qatar, October 2014.
[37] F. Wang and L. Chen, "A nonlinear state space model for identifying at-risk students in open online courses," in Proceedings of the 9th International Conference on Educational Data Mining, vol. 16, pp. 527-532, 2016.
[38] A. Bhattacherjee, "Understanding information systems continuance: an expectation-confirmation model," MIS Quarterly, vol. 25, no. 3, pp. 351-370, 2001.
[39] Z. Junjie, "Exploring the factors affecting learners' continuance intention of MOOCs for online collaborative learning: an extended ECM perspective," Australasian Journal of Educational Technology, vol. 33, no. 5, pp. 123-135, 2017.
[40] J. K. T. Tang, H. Xie, and T.-L. Wong, "A big data framework for early identification of dropout students in MOOC," Communications in Computer and Information Science, vol. 559, pp. 127-132, 2015.
[41] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81-106, 1986.
[42] I. K. Sethi, "Entropy nets: from decision trees to neural networks," Proceedings of the IEEE, vol. 78, no. 10, pp. 1605-1613, 1990.
[43] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[44] F. Chollet et al., Keras: The Python Deep Learning Library, Astrophysics Source Code Library, 2018.
[45] J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley, "MOOC dropout prediction: how to measure accuracy?" in Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, pp. 161-164, ACM, 2017.
[46] J. DeBoer, G. S. Stump, D. Seaton, and L. Breslow, "Diversity in MOOC students' backgrounds and behaviors in relationship to performance in 6.002x," in Proceedings of the Sixth Learning International Networks Consortium Conference, vol. 4, pp. 16-19, 2013.
[47] P. G. de Barba, G. E. Kennedy, and M. D. Ainley, "The role of students' motivation and participation in predicting performance in a MOOC," Journal of Computer Assisted Learning, vol. 32, no. 3, pp. 218-231, 2016.


Page 4: MOOC Dropout Prediction Using a Hybrid Algorithm Based ...MathematicalProblemsinEngineering 12345 Week 0.6 0.7 0.8 0.9 1.0 Accuracy DT-ELM TA DL OA Week DT-ELM TA Week DT-ELM TA DL

4 Mathematical Problems in Engineering

Inputs119877 Learning behavior records of a course119890 Enrollment number of learners119903 Number of behavior features119889 Duration of the courseOutputs119909 Feature matrix with size of 119890 times 119903119910 Label matrix with size of 119890 times 1198981 119877 is the set of learning behavior records for each course It is grouped by

the behavior types Let 119877 = [1198771 1198772 sdot sdot sdot 119877푡 sdot sdot sdot 119877푢] be the record set oflearning behaviors where 119905 = 1 2 sdot sdot sdot 119903

2 Divide the duration of this course into119882 weeks3 For each learning behavior record in 119877푡4 If this record occurred in week 119902 generated by learner 1198945 119909(푞)푖푡 + = 16 For each learner the 119905푡ℎ learning behavior feature119909푖푡 = [119909(1)푖푡 119909(2)푖푡 sdot sdot sdot 119909(푞)푖푡 sdot sdot sdot 119909(푊)푖푡 ] is obtained7 Form the feature matrix 119909 = [1199091 1199092 sdot sdot sdot 119909푒] where 119909푖 = [119909푖1 119909푖2 119909푖푟] is 119903

types of behavior features of the 119894푡ℎ learner8 Form the label matrix 119910 = [1199101 1199102 sdot sdot sdot 119910푒] where 119910푖 = [119910푖1 119910푖2 119910푖푚] isin 119877푚

Algorithm 1 Feature Extraction Processing

split rule is defined by information gain which represents theexpected reduction in entropy after the split according to agiven feature The information gain is described as follows

119866119886119894119899 (119860) = Info (119863) minus Info퐴 (119863) (2)

Info퐴(119863) is the conditional entropy which represents theentropy of 119863 based on the partitioning by feature 119860 It iscomputed by the weighted average over all sets resulting fromthe split shown in (3) where |119863푘||119863| acts as the weight of the119896푡ℎ partition

Info퐴 (119863) =퐾sum푘=1

1003816100381610038161003816119863푘1003816100381610038161003816|119863| times Info (119863푘) (3)

The information gain ratio extends the information gainwhich applies a kind of normalization to information gainusing a ldquosplit informationrdquo value

SplitInfo퐴 (119863) = minus퐾sum푘=1

1003816100381610038161003816119863퐾1003816100381610038161003816|119863| times log2 (1003816100381610038161003816119863퐾1003816100381610038161003816|119863| ) (4)

The feature with the maximum information gain ratio isselected as the splitting feature which is defined as follows

119866119886119894119899119877119886119905119894119900 (119860) = 119866119886119894119899 (119860)SplitInfo퐴 (119863) (5)

The decision tree is constructed by this way Each internalnode of the decision tree corresponds to one of the selectedfeatures Each terminal node represents the specific class of acategorical variable

Enhancement Layer Because the classification ability ofeach selected feature is different the impact of this featureon each leaf node is different The root node has the best

Table 1 Mapping rules between DT and ELM

DT ELMInternal nodes Input neuronsTerminal nodes Hidden neuronsDistinct classes Output neurons

Paths between nodes Connections between inputlayer and hidden layer

classification ability and it connects to all leaf nodes Thatmeans the root node has the greatest impact on all leaf nodesEach internal node except the root node connects to fewerleaf nodes and has less impact on the connected leaf nodesEach value of the selected feature is multiplied by a number119897푠 which is equal to the number of leaf nodes it connected toin the decision tree It is represented as follows

119909푖푠119899119890119908 = 119909푖푠119900119903119894 times 119897푠 119904 = 1 2 119899 (6)

By this step we enhance the impact of the selectedfeatures on leaf nodes based on the tree structure

Mapping Layer In the mapping layer inspired by theentropy net [42] wemap the decision tree to the ELM Table 1shows the corresponding mapping rules between nodes indecision tree and neurons in ELM

The number of internal nodes in decision tree equals thenumber of neurons in the input layer of ELM Each leaf nodein decision tree is mapped to a corresponding neuron inthe hidden layer of ELM The number of distinct classes indecision tree equals the number of neurons in the output layerof ELMThe paths between nodes in decision tree decide theconnections between input layer and hidden layer of ELMThe result of this mapping principle determines the numbers

Mathematical Problems in Engineering 5

x1get1 x1get1

x4 get4

x3 get4

x1get3x1get3

x2ge t2 x2ge t2

nn internal l terminal m distinct n input neurons l hidden neurons m output neurons

Improved ELMDecision Tree

Mapping

odes nodes classes

xi isin Rn yi isin Rm

(wb)

Figure 2 An illustration of mapping the decision tree to ELM The internal nodes leaf nodes and distinct classes in the decision tree aremapped to the corresponding neurons in the input layer hidden layer and output layer of ELM respectively The paths between nodesdetermine the connections between input layer and hidden layer

of neurons in each layer Meanwhile it improves ELM withfewer connections

Figure 2 shows an illustration of mapping the decisiontree to ELM The first neuron of the input layer is mappedfrom the root node of the decision tree It connects to allhidden neurons mapped from all leaf nodes That meansthe first neuron has impact on every hidden neuron Thesecond neuron of the input layer connects to the four hiddenneurons according to the decision tree structure That meansthe second neuron has impact on the four hidden neuronsThedashed lines show that there exist no corresponding pathsbetween the internal nodes and leaf nodes in the decisiontree Therefore there exist no connections between the cor-responding neurons in the input layer and the correspondingneurons in the hidden layer of ELM

Improved ELMOnce the structure of ELM is determinedthe enhanced features are input into the ELM The connec-tionless weights between input layer and hidden layer areinitialized with zero or extremely small values very close tozero Other connection weights as well as biases of the hiddenlayer are initialized randomlyUnique optimal solution can beobtained once the numbers of hidden neurons and initializedparameters are determined

There are 119873 random instances (119909푖 119910푖) where 119909푖 =[119909푖1 119909푖2 119909푖푛]푇 isin 119877푛 119910푖 = [119910푖1 119910푖2 119910푖푚]푇 isin 119877푚 ASLFN with 119871 hidden neurons can be represented as follows

퐿sum푗=1

120573푗119892 (119908푗 sdot 119909푖 + 119887푗) = 119900푖 119894 = 1 2 119873 (7)

where 119892(119909) is the activation function of hidden neuron119908푗 = [119908푗1 119908푗2 119908푗푛]푇 is the weight vector of inputneurons connecting to 119894푡ℎ hidden neuron The inner product

of 119908푗 and 119909푖 is 119908푗 sdot 119909푖 The bias of the 119895푡ℎ hidden neuron is 119887푗The weight vector of the 119895푡ℎ hidden neuron connecting to theoutput neurons is 120573푗 = [120573푗1 120573푗2 120573푗푚]푇

The target of a standard SLFN is to approximate these119873instances with zero error which is represented as (8) wherethe desired output is 119900푖 and the actual output is 119910푖

푁sum푖=1

1003817100381710038171003817119900푖 minus 119910푖1003817100381710038171003817 = 0 (8)

In other words there exist proper 119908푗 119887푗 and 120573j such that

퐿sum푗=1

120573푗119892 (119908푗 sdot 119909푖 + 119887푗) = 119910푖 119894 = 1 2 119873 (9)

Equation (9) can be represented completely as follows

119867120573 = 119884 (10)

where

119867 = [[[[[

119892 (1199081 sdot 1199091 + 1198871) sdot sdot sdot 119892 (119908퐿 sdot 1199091 + 119887퐿) 119892 (1199081 sdot 119909푁 + 119887퐿) sdot sdot sdot 119892 (119908퐿 sdot 119909N + 119887퐿)

]]]]]푁times퐿

(11)

6 Mathematical Problems in Engineering

1 Give the training set 119879 = (119909푖 119910푖) | 119909푖 isin 119877푛 119910푖 isin 119877푚 activationfunction 119892(119909) number of hidden neurons 119871

2Randomly assign input weight vector 119908푗 and the bias 119887푗except the connectionless weights and biases betweeninput layer and hidden layer with zero

3 Calculate the hidden layer output matrix1198674 Calculate the output weight vector 120573 = Hdagger119884 where Hdagger is

the Moore-Penrose generalized inverse of matrix1198675 Obtain the predicted values based on the input variables

Algorithm 2 Improved ELM

Table 2 Dataset description

Information Description

Enrollment Records denoting what course each learner hasenrolled (120542 entries)

Object Relationships of courses and modulesLog Learning behavior records (8157277 entries)Date The start and end time of courses

Truth Labels indicating whether learners dropout orcomplete the whole courses

120573 = [[[[[

120573푇1120573푇퐿

]]]]]퐿times푀

(12)

119884 = [[[[[

119910푇1119910푇푁

]]]]]푁times푀

(13)

Once 119908푗 and 119887푗 are determined the output matrix of thehidden layer 119867 is uniquely calculated The training processcan be transformed into finding a least-squares solution ofthe linear system in (9) The algorithm can be described as inAlgorithm 2

3 Experimental Results

31 Dataset and Preprocess The effectiveness of our pro-posed algorithm is tested on the benchmark dataset KDD2015 which contains five kinds of information The detaileddescription of the dataset is shown in Table 2 From the rawdata we define and extract several behavior features Thedescription is shown in Table 3

The next step is to label each record according to thebehavior features If a learner has activities in the comingweek the dropout label is 0 Otherwise the dropout label is1 A learner may begin learning in the later week but not thefirst week and in the first several weeks the learnerwill not belabeled as dropout the weekwhen the learner begins learningis seen as the first actually learning week for this learner The

Table 3 Extracted behavior features

Features Description

1199091 The number of participating objects of course perweek

1199092-1199098 The behavior numbers of access page close problemvideo discussion navigating wiki per week

1199099-11990910 The total and average numbers of all behaviors perweek

11990911 The number of active days per week11990912 The time consumption per week

11990913-11990916 The behavior numbers of access page closeproblem video from browser per week respectively

11990917-11990921The behavior numbers of access discussion

problem navigating wiki from server per weekrespectively

11990922-11990923 The numbers of all behaviors from browser andserver per week respectively

first several weeks data will be deleted and then other weeksdata will be labeled

32 Experimental Setting and Evaluation Experiments arecarried out in MATLAB R2016b and Python 27 under adesktop computer with Intel 25GHz CPU and 8G RAMThe LIBSVM library [43] and Keras library [44] are used toimplement the support vector and LSTM respectively

In order to evaluate the effectiveness of the proposedalgorithm accuracy area under curve (AUC) F1-score andtraining time are used as evaluation criteria Accuracy isthe proportion of correct prediction including dropout andretention Precision is the proportion of dropout learnerspredicted correctly by the classifier in all predicted dropoutlearners Recall is the proportion of dropout learners pre-dicted correctly by the classifier in all real dropout learnersF1-score is the harmonic mean of precision and recall

AUC depicts the degree to which a classifier makesa distinction between positive and negative samples It isinvariant to imbalanced data [45] The receiver operatingcharacteristics (ROC) plot the trained classifierrsquos true positiverate against the false positive rate The AUC is the integralover the interval [0 1] of the ROC curve The closer thenumber to 1 the better the classification performance

Mathematical Problems in Engineering 7

1 2 3 4 5Week

06

07

08

09

10Ac

cura

cy

DT-ELMTA

DLOA

WeekDT-ELMTA

DLOA

WeekDT-ELMTA

DLOA

1 2 3 4 506

07

08

09

10

AUC

1 2 3 4 506

07

08

09

10

F1-s

core

Figure 3 The overall performance of DT-ELM traditional machine learning algorithms (TA) deep learning (DL) and optimizationalgorithm (OA) weekly DT-ELM is about 1278 2219 and 687 higher than baseline algorithms in terms of overall accuracy AUCand F1-score respectively

Table 4 The overall average training time (s) of DT-ELM tradi-tional machine learning algorithms (TA) deep learning algorithm(DL) and optimization algorithm (OA)

Algorithms DT-ELM TA DL OATime 16383 2427 gt100 33649

33 Overall Performance We choose ten courses for experi-mentsThe enrollment number ranges from several hundredsto about ten thousands We divide the baselines into threecategories separately traditional machine learning deeplearning and optimization algorithm Traditional machinelearning algorithms include logistic regression support vec-tor machine decision tree back propagation neural networkand entropy net LSTM is adopted as the deep learning algo-rithm Genetic algorithm and ELM (GA-ELM) are combinedas the optimization algorithm aiming to improve the ELM

The results of overall performance in terms of accuracyAUC and F1-score are shown in Figure 3 The results ofoverall average training time are shown in Table 4

Although there exists a wide range of course enrollmentsthe proposed DT-ELM algorithm performs much betterthan the three categories of baseline algorithms DT-ELM is8928 8586 and 9148 and about 1278 2219 and687 higher than baseline algorithms in terms of overallaccuracy AUC and F1-score respectively

To be specific the traditionalmachine learning algorithmperforms the worst in terms of the three criteria The resultsof the deep learning algorithm are much better Howeverthe deep learning algorithm has the longest training timeAlthough the optimization algorithm performs better thanthe deep learning algorithm it does not perform as good asDT-ELM DT-ELM performs the best in terms of accuracyAUC and F1-score Meanwhile it requires the least trainingtime due to noniterative training process The results haveproved that DT-ELM reaches the goal of dropout predictionaccurately and timely

Another conclusion is that the last two weeks get betterperformance than the other weeks To identify the reasonwe make a statistical analysis of dropout rate weekly Wefind that compared to the first three weeks the averagedropout rate of the last two weeks of courses is higher It

means the behavior of learners is more likely to follow apattern On the other hand it also illustrates the importanceof dropout prediction The dropout rates in the later stageof courses are higher than the initial stage generally So it isbetter to find at-risk learners early in order to make effectiveinterventions

34 Impact of Feature Selection To verify the effectivenessof feature selection we make a comparison between DT-ELM and ELM The results are shown in Figure 4 DT-ELM is about 278 287 and 241 higher than ELM interms of accuracy AUC and F1-score respectively It provesthat feature selection has promoted the prediction resultsChoosing asmany features as possiblemay not be appropriatefor dropout prediction According to the entropy theorymentioned previously features with different gain ratios havedifferent classification ability Features with low gain ratiosmay weaken the classification ability

Although each course has different behavior featurestwo conclusions can be obtained The average number ofselected features is 12 which is less than the number ofextracted features It proves that using fewer features fordropout prediction can achieve better results than using allextracted features Moreover we find that discussion activedays and time consumption are the three most importantfactors affecting prediction results

35 Impact of Feature Enhancement and Connection Initial-ization To verify the impact of feature enhancement wemake a comparison between DT-ELM and itself withoutfeature enhancement (Without-FE) Similarly to verity theimpact of connection initialization we make a comparisonbetween DT-ELM with itself without connection initializa-tion (Without-IC) The results of the three algorithms areshown in Figure 5

The results of Without-FE and Without-IC are not asgood as DT-ELM DT-ELM is about 094 121 and 09higher than Without-IC in terms of accuracy AUC andF1-score respectively It is also about 213 225 and198 higher thanWithout-FEThe values of Without-IC arehigher than Without-FE in terms of the three criteria whichindicates that feature enhancement plays a more important

8 Mathematical Problems in Engineering

1 2 3 4 5Week

080

085

090

095

100Ac

cura

cy

DT-ELMELM

1 2 3 4 5Week

070

080

085

090

095

AUC

DT-ELMELM

1 2 3 4 5Week

080

085

090

095

100

F1-s

core

DT-ELMELM

Figure 4 Impact of feature selection DT-ELM is about 278 287 and 241 higher than ELM in terms of accuracy AUC and F1-scorerespectively

1 2 3 4 5Week

080

085

090

095

100

Accu

racy

1 2 3 4 5Week

080

085

090

095

AUC

DT-ELMWithout-ICWithout-EF

DT-ELMWithout-ICWithout-EF

DT-ELMWithout-ICWithout-EF

1 2 3 4 5Week

080

085

090

095

100

F1-s

core

Figure 5 Impact of feature enhancement and connection initialization Without-IC means DT-ELM without connection initialization andWithout-FE means DT-ELM without feature enhancement The results indicate that feature enhancement plays a more important role thanconnection initialization

1 2 3 4 5075

080

085

090

095

100

Accu

racy

1 2 3 4 5075

080

085

090

095

AUC

1 2 3 4 5Week

075

080

085

090

095

100

F1-s

core

DT-ELM-SEDT-ELM-LE

Week

DT-ELM-SEDT-ELM-LE

Week

DT-ELM-SEDT-ELM-LE

Figure 6 Comparison of different scales of enrollment in terms of accuracy AUC and F1-score DT-ELM-SE contains courses with smallernumbers of enrollments and DT-ELM-LE contains courses with larger numbers of enrollments

role than connection initialization That is because featureswith better classification ability are enhanced depending onhow much impact each feature has on other neurons

36 Comparison of Different Numbers of Enrollments Toverify the effectiveness of DT-ELM on different numbers ofenrollments two groups of experiments are conducted andthe results are shown in Figure 6 The first group (DT-ELM-SE) contains courses with smaller numbers of enrollmentsranging from several hundreds to about one thousand Thesecond group (DT-ELM-LE) contains courses with larger

numbers of enrollments ranging from several thousands toabout then thousands

Generally the more the data used for training the betterthe classification results DT-ELM-LE is about 259 341and 254 higher than DT-ELM-SE in terms of accuracyAUC and F1-score respectively The average training timeof DT-ELM-SE and DT-ELM-LE is 16014s and 16752srespectively Although courses with less enrollments achievelower values than courses withmore enrollments for the threecriteria the proposed algorithm still performs better thanthe other algorithms That means dropout can be predicted

Mathematical Problems in Engineering 9

Table 5 The average accuracy of different algorithms weekly including logistic regression (LR) support vector machine (SVM) backpropagation neural network (BP) decision tree (DT) entropy net (EN) LSTM ELM and GA-ELM

WeekAlgorithms

DT-ELM LR SVM BP DT EN LSTM ELM GA-ELMWeek1 08303 06633 05643 06992 06876 07067 07067 08103 08149Week2 08568 0686 06581 0705 06957 07157 07218 08367 08462Week3 0872 07284 07218 07443 07325 0757 07758 08401 08467Week4 09642 08726 0827 08892 08627 08773 08773 09492 09507Week5 0941 08498 08618 08447 08526 08656 08656 09041 09154

Table 6 The average AUC of different algorithms weekly

WeekAlgorithms

DT-ELM LR SVM BP DT EN LSTM ELM GA-ELMWeek1 08182 0632 06272 06189 05979 06256 06295 07984 08066Week2 08357 06255 05914 06216 06243 06264 06844 08181 08243Week3 08383 06527 06109 06586 06438 0667 07113 08176 08261Week4 09412 07103 06454 06117 06507 06246 07698 09079 09263Week5 08596 06652 06918 07148 06954 07166 07701 08268 08413

Table 7 The average F1-score of different algorithms weekly

WeekAlgorithms

DT-ELM LR SVM BP DT EN LSTM ELM GA-ELMWeek1 08542 07378 05907 07719 0798 0788 08025 08342 08453Week2 08956 07913 07341 08091 08 08238 08287 08677 08893Week3 09021 08264 08107 08392 08321 08511 08508 08751 08878Week4 09667 09315 08982 09348 09222 09358 09301 09488 09565Week5 09558 09118 09185 09092 09149 0905 09191 09389 09457

accurately and timely in courses with different numbers ofenrollments

37 Performance with Different Algorithms The detailedresults of different algorithms are shown in Tables 5ndash7Observing the results logistic regression and support vectormachine achieve lower values in accuracy AUC and F1-scorethan the other algorithms Back propagation neural networkand decision tree perform better than logistic regressionand support vector machine Entropy net utilizes decisiontree to optimize performance and it performs better thanback propagation neural network Different from entropynet we utilize decision tree to optimize ELM Although theperformance of LSTM is much better than the traditionalalgorithms it is time intensive based on previous results

It is obvious that ELM-based algorithms perform betterthan other algorithmsThat is because ELM can get the small-est training error by calculating the least-squares solutions ofthe network output weights GA-ELM achieves good resultsdue to its function of feature selection However it also needsiterative training and lacks structure initialization of the ELMDT-ELMoptimizes the structure of ELM and performsmuchbetter than ELM and GA-ELM

4 Discussion

Sufficient experiments are designed and implemented fromvarious perspectives For the overall performance it achievesthe best performance compared to different algorithmsBesides we also explain why the last two weeks get betterperformance than the other weeks The experimental resultsof feature selection demonstrate the importance of the featureselectionThe reason is that features with different gain ratioshave different classification ability which helps get a higherperformance Feature enhancement and connection initial-ization both contribute to results and feature enhancementplays a more important role than connection initializationdue to its higher promotion on three criteria The results ofdifferent numbers of enrollments prove the universality ofour algorithm

The experimental results have proved the effectivenessand universality of our algorithm However there is stillan important question that needs to be considered Dothe instructors need to make interventions for all dropoutlearners Our goal is to find the at-risk learners who maystop learning in the coming week and help instructorsto take corresponding interventions From the perspectiveof behavior break means dropout However interventions

10 Mathematical Problems in Engineering

would not be taken for all dropout learners because whilemaking interventions besides the behavior factor some otherfactors [46] such as age occupation motivation and learnertype [47] should be taken into consideration Our futurework is to make interventions for at-risk learners based onlearnersrsquo behaviors and background information

5 Conclusion

Dropout prediction is an essential prerequisite to makeinterventions for at-risk learners To address this issue wepropose a hybrid algorithmwhich combines the decision treeand ELM which successfully settles the unsolved problemsincluding behavior discrepancy iterative training and struc-ture initialization

Compared to the evolutionary methods DT-ELM selectsfeatures based on the entropy theory Different from theneuron network basedmethods it can analytically determinethe output weights without updating parameters by repeatediterationsThebenchmark dataset inmultiple criteria demon-strates the effectiveness of DT-ELM and the results show thatour algorithm can make dropout prediction accurately

Data Availability

The dataset used to support this study is the open datasetKDD2015which is available fromhttpdata-miningphilippe-fournier-vigercomthe-kddcup-2015-dataset-download-link

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The research for this paper is supported by National NaturalScience Foundation of China (Grant no 61877050) and openproject fund of Shaanxi Province Key Lab of Satellite andTerrestrial Network Tech Shaanxi province financed projectsfor scientific and technological activities of overseas students(Grant no 202160002)

References

[1] M Vitiello S Walk D Helic V Chang and C Guetl ldquoPredict-ing dropouts on the successive offering of a MOOCrdquo in Pro-ceedings of the 2017 International Conference MOOC-MAKERMOOC-MAKER 2017 pp 11ndash20 Guatemala November 2017

[2] M Vitiello S Walk V Chang R Hernandez D Helic and CGuetl ldquoMOOCdropouts amulti-system classifierrdquo inEuropeanConference on Technology Enhanced Learning pp 300ndash314Springer 2017

[3] E J Emanuel ldquoOnline education MOOCs taken by educatedfewrdquo Nature vol 503 no 7476 p 342 2013

[4] G Creeddikeogu and C Carolyn ldquoAre you mooc-ing yeta review for academic librariesrdquo Kansas Library AssociationCollege amp University Libraries vol 3 no 1 pp 9ndash13 2013

[5] S Dhawal ldquoBy the numbers Moocs in 2013rdquo Class Central2013

[6] S Dhawal ldquoBy the numbers Moocs in 2018rdquo Class Central2018

[7] H Khalil and M Ebner ldquoMoocs completion rates and pos-sible methods to improve retention - a literature reviewrdquo inWorld Conference on Educational Multimedia Hypermedia andTelecommunications 2014

[8] D F O Onah J E Sinclair and R Boyatt ldquoDropout rates ofmassive open online courses Behavioural patternsrdquo in Interna-tional Conference on Education and New Learning Technologiespp 5825ndash5834 2014

[9] R Rivard Measuring the mooc dropout rate Inside Higher Ed8 2013 2013

[10] J Qiu J Tang T X Liu et al ldquoModeling and predicting learningbehavior in MOOCsrdquo in ACM International Conference onWebSearch and Data Mining pp 93ndash102 2016

[11] Y Zhu L Pei and J Shang ldquoImproving video engagementby gamification a proposed Design of MOOC videosrdquo inInternational Conference on Blended Learning pp 433ndash444Springer 2017

[12] R Bartoletti ldquoLearning through design Mooc development asa method for exploring teaching methodsrdquo Current Issues inEmerging eLearning vol 3 no 1 p 2 2016

[13] S L Watson J Loizzo W R Watson C Mueller J Lim andP A Ertmer ldquoInstructional design facilitation and perceivedlearning outcomes an exploratory case study of a human traf-ficking MOOC for attitudinal changerdquo Educational TechnologyResearch and Development vol 64 no 6 pp 1273ndash1300 2016

[14] S Halawa ldquoAttrition and achievement gaps in online learningrdquoin Proceedings of the Second (2015) ACMConference on Learning Scale pp 57ndash66 March 2015

[15] K R Koedinger E A McLaughlin J Kim J Z Jia and N LBier ldquoLearning is not a spectator sport doing is better thanwatching for learning from a MOOCrdquo in Proceedings of theSecond (2015) ACMConference on Learning Scale pp 111ndash120Canada March 2015

[16] C Robinson M Yeomans J Reich C Hulleman and HGehlbach ldquoForecasting student achievement in MOOCs withnatural language processingrdquo in Proceedings of the 6th Interna-tional Conference on Learning Analytics and Knowledge LAK2016 pp 383ndash387 UK April 2016

[17] C Taylor K Veeramachaneni and U M OrsquoReilly ldquoLikelyto stop predicting stopout in massive open online coursesrdquoComputer Science 2014

[18] C Ye and G Biswas ldquoEarly prediction of student dropoutand performance inMOOCs using higher granularity temporalinformationrdquo Journal of Learning Analytics vol 1 no 3 pp 169ndash172 2014

[19] M Kloft F Stiehler Z Zheng and N Pinkwart ldquoPredictingMOOC dropout over weeks using machine learning methodsrdquoin EMNLP 2014 Workshop on Analysis of Large Scale SocialInteraction in Moocs pp 60ndash65 October 2014

[20] S Nagrecha J Z Dillon and N V Chawla ldquoMooc dropout pre-diction Lessons learned from making pipelines interpretablerdquoin Proceedings of the 26th International Conference on WorldWide Web Companion pp 351ndash359 International World WideWeb Conferences Steering Committee 2017

[21] D Peng and G Aggarwal ldquoModeling mooc dropoutsrdquo Entropyvol 10 no 114 pp 1ndash5 2015

[22] J Liang J Yang YWuC Li andL Zheng ldquoBig data applicationin education dropout prediction in edx MOOCsrdquo in 2016IEEE Second International Conference on Multimedia Big Data(BigMM) pp 440ndash443 IEEE April 2016

Mathematical Problems in Engineering 11

[23] G Balakrishnan and D Coetzee ldquoPredicting student retentionin massive open online courses using hidden markov modelsrdquoElectrical Engineering and Computer Sciences University ofCalifornia at Berkeley 2013

[24] C Lang G Siemens A Wise and D Gasevic Handbook ofLearning Analytics Society for Learning Analytics Research(SoLAR) 2017

[25] M Fei and D-Y Yeung ldquoTemporal models for predictingstudent dropout in massive open online coursesrdquo in IEEEInternational Conference on Data Mining Workshop pp 256ndash263 USA November 2015

[26] W Li M Gao H Li Q Xiong J Wen and Z Wu ldquoDropoutprediction in MOOCs using behavior features and multi-view semi-supervised learningrdquo in Proceedings of the 2016International Joint Conference on Neural Networks IJCNN 2016pp 3130ndash3137 Canada July 2016



Figure 2: An illustration of mapping the decision tree to ELM. The internal nodes, leaf nodes, and distinct classes in the decision tree are mapped to the corresponding neurons in the input layer, hidden layer, and output layer of ELM, respectively. The paths between nodes determine the connections between the input layer and the hidden layer.

The mapping thus determines the number of neurons in each layer. Meanwhile, it improves ELM with fewer connections.

Figure 2 shows an illustration of mapping the decision tree to ELM. The first neuron of the input layer is mapped from the root node of the decision tree. It connects to all hidden neurons mapped from the leaf nodes; that is, the first neuron has an impact on every hidden neuron. The second neuron of the input layer connects to four hidden neurons according to the decision tree structure, so the second neuron has an impact on those four hidden neurons. The dashed lines show where no corresponding paths exist between internal nodes and leaf nodes in the decision tree; consequently, there are no connections between the corresponding neurons of the input layer and the hidden layer in ELM.
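To make the mapping concrete, the following sketch derives such a connection mask from a decision tree trained with scikit-learn. The function name and the use of scikit-learn's tree internals are our illustrative assumptions, since the paper does not publish code.

# Sketch: derive the input-to-hidden connection mask from a trained decision tree.
# Assumes scikit-learn's tree internals; illustrative only, not the authors' code.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_connection_mask(tree: DecisionTreeClassifier) -> np.ndarray:
    """Return mask[i, k] = 1 if internal node i lies on the path to leaf k.

    Internal nodes become input neurons and leaves become hidden neurons,
    mirroring the mapping in Figure 2.
    """
    t = tree.tree_
    internal, leaves, paths = [], [], {}

    def walk(node, path):
        if t.children_left[node] == -1:      # -1 marks a leaf in scikit-learn
            leaves.append(node)
            paths[node] = list(path)
            return
        internal.append(node)
        walk(t.children_left[node], path + [node])
        walk(t.children_right[node], path + [node])

    walk(0, [])                              # start from the root node
    index = {node: i for i, node in enumerate(internal)}
    mask = np.zeros((len(internal), len(leaves)))
    for k, leaf in enumerate(leaves):
        for node in paths[leaf]:
            mask[index[node], k] = 1.0
    return mask

With this mask, the root's input neuron (row 0) connects to every hidden neuron, while deeper nodes connect only to the leaves beneath them; the zero entries correspond to the dashed lines in Figure 2.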

Improved ELM. Once the structure of ELM is determined, the enhanced features are input into the ELM. The connectionless weights between the input layer and the hidden layer are initialized with zero or values extremely close to zero. The other connection weights, as well as the biases of the hidden layer, are initialized randomly. A unique optimal solution can be obtained once the number of hidden neurons and the initialized parameters are determined.

There are $N$ random instances $(x_i, y_i)$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in R^n$ and $y_i = [y_{i1}, y_{i2}, \ldots, y_{im}]^T \in R^m$. An SLFN with $L$ hidden neurons can be represented as follows:

$$\sum_{j=1}^{L} \beta_j\, g(w_j \cdot x_i + b_j) = o_i, \quad i = 1, 2, \ldots, N, \tag{7}$$

where $g(x)$ is the activation function of the hidden neurons and $w_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T$ is the weight vector of the input neurons connecting to the $j$th hidden neuron. The inner product of $w_j$ and $x_i$ is $w_j \cdot x_i$. The bias of the $j$th hidden neuron is $b_j$. The weight vector of the $j$th hidden neuron connecting to the output neurons is $\beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jm}]^T$.

The target of a standard SLFN is to approximate these $N$ instances with zero error, which is represented as (8), where $o_i$ is the actual output of the network and $y_i$ is the desired output:

$$\sum_{i=1}^{N} \|o_i - y_i\| = 0. \tag{8}$$

In other words, there exist proper $w_j$, $b_j$, and $\beta_j$ such that

$$\sum_{j=1}^{L} \beta_j\, g(w_j \cdot x_i + b_j) = y_i, \quad i = 1, 2, \ldots, N. \tag{9}$$

Equation (9) can be written compactly as

$$H\beta = Y, \tag{10}$$

where

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L}, \tag{11}$$


$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \tag{12}$$

$$Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_N^T \end{bmatrix}_{N \times m}. \tag{13}$$

Once $w_j$ and $b_j$ are determined, the hidden layer output matrix $H$ is uniquely calculated. The training process can then be transformed into finding a least-squares solution of the linear system (10). The procedure is summarized in Algorithm 2.

Table 2: Dataset description.
Enrollment: records denoting which course each learner has enrolled in (120,542 entries).
Object: relationships of courses and modules.
Log: learning behavior records (8,157,277 entries).
Date: the start and end times of courses.
Truth: labels indicating whether learners drop out of or complete the whole course.

Algorithm 2: Improved ELM.
(1) Given the training set $T = \{(x_i, y_i) \mid x_i \in R^n, y_i \in R^m\}$, the activation function $g(x)$, and the number of hidden neurons $L$.
(2) Randomly assign the input weight vectors $w_j$ and the biases $b_j$, setting the connectionless weights between the input layer and the hidden layer to zero.
(3) Calculate the hidden layer output matrix $H$.
(4) Calculate the output weight vector $\beta = H^{\dagger} Y$, where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix $H$.
(5) Obtain the predicted values based on the input variables.
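As a minimal NumPy sketch of Algorithm 2 (the function names, the sigmoid activation, and the uniform initialization range are our illustrative assumptions; the paper does not publish code):

# Sketch of Algorithm 2: ELM training with connectionless weights fixed at zero.
import numpy as np

def train_improved_elm(X, Y, mask, seed=None):
    """X: (N, n) inputs, Y: (N, m) targets, mask: (n, L) connection mask."""
    rng = np.random.default_rng(seed)
    n, L = mask.shape
    W = rng.uniform(-1.0, 1.0, size=(n, L)) * mask   # zero out missing connections (step 2)
    b = rng.uniform(-1.0, 1.0, size=L)               # hidden-layer biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden output matrix, eq. (11) (step 3)
    beta = np.linalg.pinv(H) @ Y                     # beta = H^dagger Y (step 4)
    return W, b, beta

def predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta                                   # predicted values (step 5)

Because $\beta$ is obtained in one pseudoinverse step, no iterative weight updates are needed, which is where the training-time advantage reported later in Table 4 comes from.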

3. Experimental Results

3.1. Dataset and Preprocessing. The effectiveness of the proposed algorithm is tested on the benchmark dataset KDD 2015, which contains five kinds of information. A detailed description of the dataset is given in Table 2. From the raw data, we define and extract several behavior features, which are described in Table 3.

The next step is to label each record according to the behavior features. If a learner has activities in the coming week, the dropout label is 0; otherwise, the dropout label is 1. A learner may begin learning in a later week rather than the first week; in the first several weeks, such a learner is not labeled as a dropout, and the week in which the learner begins learning is treated as that learner's first actual learning week. The data of those first several weeks are deleted, and the data of the remaining weeks are then labeled.
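To make the labeling rule concrete, a hedged pandas sketch follows. The frame layout and column names (enrollment_id, week) are assumptions about the preprocessed log, not the dataset's actual schema.

# Sketch of the weekly labeling rule: label 0 if the learner is active in the
# following week, 1 otherwise. Weeks are assumed 0-indexed; in the real data
# the final week's label would come from activity after the course horizon.
import pandas as pd

def label_weeks(activity: pd.DataFrame, n_weeks: int) -> pd.DataFrame:
    """activity: one row per (enrollment_id, week) with any recorded behavior."""
    active = activity.groupby("enrollment_id")["week"].apply(set)
    rows = []
    for eid, weeks in active.items():
        first = min(weeks)                    # first actual learning week
        for w in range(first, n_weeks):       # weeks before `first` are dropped
            rows.append({"enrollment_id": eid, "week": w,
                         "dropout": 0 if (w + 1) in weeks else 1})
    return pd.DataFrame(rows)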

Table 3: Extracted behavior features.
x1: the number of participating course objects per week.
x2–x8: the numbers of access, page close, problem, video, discussion, navigating, and wiki behaviors per week, respectively.
x9–x10: the total and average numbers of all behaviors per week.
x11: the number of active days per week.
x12: the time consumption per week.
x13–x16: the numbers of access, page close, problem, and video behaviors from the browser per week, respectively.
x17–x21: the numbers of access, discussion, problem, navigating, and wiki behaviors from the server per week, respectively.
x22–x23: the numbers of all behaviors from the browser and the server per week, respectively.

3.2. Experimental Setting and Evaluation. Experiments are carried out in MATLAB R2016b and Python 2.7 on a desktop computer with an Intel 2.5 GHz CPU and 8 GB RAM. The LIBSVM library [43] and the Keras library [44] are used to implement the support vector machine and LSTM, respectively.

In order to evaluate the effectiveness of the proposed algorithm, accuracy, area under the curve (AUC), F1-score, and training time are used as evaluation criteria. Accuracy is the proportion of correct predictions, covering both dropout and retention. Precision is the proportion of dropout learners predicted correctly by the classifier among all predicted dropout learners. Recall is the proportion of dropout learners predicted correctly by the classifier among all real dropout learners. F1-score is the harmonic mean of precision and recall.
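Formally, with dropout as the positive class and TP, TN, FP, and FN denoting true/false positives and negatives, these criteria read

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Recall} = \frac{TP}{TP + FN}, \quad F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.$$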

AUC depicts the degree to which a classifier distinguishes between positive and negative samples; it is invariant to imbalanced data [45]. The receiver operating characteristic (ROC) curve plots the trained classifier's true positive rate against its false positive rate. The AUC is the integral of the ROC curve over the interval [0, 1]; the closer the value is to 1, the better the classification performance.
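Equivalently, writing the ROC curve as the true positive rate TPR as a function of the false positive rate $f$,

$$\mathrm{AUC} = \int_0^1 \mathrm{TPR}(f)\, df.$$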



Figure 3: The weekly overall performance of DT-ELM, traditional machine learning algorithms (TA), deep learning (DL), and the optimization algorithm (OA). DT-ELM is about 12.78%, 22.19%, and 6.87% higher than the baseline algorithms in terms of overall accuracy, AUC, and F1-score, respectively.

Table 4: The overall average training time (s) of DT-ELM, traditional machine learning algorithms (TA), the deep learning algorithm (DL), and the optimization algorithm (OA).

Algorithm   DT-ELM   TA      DL     OA
Time (s)    1.6383   2.427   >100   3.3649

3.3. Overall Performance. We choose ten courses for the experiments; the enrollment numbers range from several hundred to about ten thousand. We divide the baselines into three categories: traditional machine learning, deep learning, and optimization algorithms. The traditional machine learning algorithms include logistic regression, support vector machine, decision tree, back propagation neural network, and entropy net. LSTM is adopted as the deep learning algorithm. A genetic algorithm combined with ELM (GA-ELM) serves as the optimization algorithm, aiming to improve the ELM.

The overall performance results in terms of accuracy, AUC, and F1-score are shown in Figure 3. The overall average training times are shown in Table 4.

Although the course enrollments span a wide range, the proposed DT-ELM algorithm performs much better than the three categories of baseline algorithms. DT-ELM reaches 89.28%, 85.86%, and 91.48%, which is about 12.78%, 22.19%, and 6.87% higher than the baseline algorithms, in terms of overall accuracy, AUC, and F1-score, respectively.

To be specific, the traditional machine learning algorithms perform the worst on the three criteria. The results of the deep learning algorithm are much better; however, the deep learning algorithm has the longest training time. Although the optimization algorithm performs better than the deep learning algorithm, it does not perform as well as DT-ELM. DT-ELM performs the best in terms of accuracy, AUC, and F1-score, and it requires the least training time due to its noniterative training process. The results show that DT-ELM achieves the goal of accurate and timely dropout prediction.

Another observation is that the last two weeks achieve better performance than the other weeks. To identify the reason, we performed a statistical analysis of the weekly dropout rate. We find that, compared to the first three weeks, the average dropout rate of the last two weeks of courses is higher, which means the behavior of learners is more likely to follow a pattern. On the other hand, this also illustrates the importance of dropout prediction: the dropout rates in the later stages of courses are generally higher than in the initial stage, so it is better to find at-risk learners early in order to make effective interventions.

3.4. Impact of Feature Selection. To verify the effectiveness of feature selection, we compare DT-ELM with ELM. The results are shown in Figure 4. DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively, which shows that feature selection improves the prediction results. Choosing as many features as possible may not be appropriate for dropout prediction: according to the entropy theory mentioned previously, features with different gain ratios have different classification ability, and features with low gain ratios may weaken the classification ability.
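For reference, the gain ratio from C4.5 [41] that drives this selection is

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)} = \frac{H(Y) - H(Y \mid A)}{-\sum_{v} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}},$$

where $H(Y)$ is the entropy of the class labels, $H(Y \mid A)$ is the conditional entropy after splitting on feature $A$, and $S_v$ is the subset of instances taking value $v$ on $A$.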

Although each course has different behavior features, two conclusions can be drawn. First, the average number of selected features is 12, which is less than the number of extracted features; this shows that using fewer, well-chosen features for dropout prediction can achieve better results than using all extracted features. Second, we find that discussion, active days, and time consumption are the three most important factors affecting the prediction results.

3.5. Impact of Feature Enhancement and Connection Initialization. To verify the impact of feature enhancement, we compare DT-ELM with itself without feature enhancement (Without-FE). Similarly, to verify the impact of connection initialization, we compare DT-ELM with itself without connection initialization (Without-IC). The results of the three algorithms are shown in Figure 5.

The results of Without-FE and Without-IC are not as good as those of DT-ELM. DT-ELM is about 0.94%, 1.21%, and 0.9% higher than Without-IC in terms of accuracy, AUC, and F1-score, respectively, and about 2.13%, 2.25%, and 1.98% higher than Without-FE. The values of Without-IC are higher than those of Without-FE on the three criteria, which indicates that feature enhancement plays a more important role than connection initialization. That is because features with better classification ability are enhanced according to how much impact each feature has on other neurons.



Figure 4: Impact of feature selection. DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively.


Figure 5: Impact of feature enhancement and connection initialization. Without-IC denotes DT-ELM without connection initialization, and Without-FE denotes DT-ELM without feature enhancement. The results indicate that feature enhancement plays a more important role than connection initialization.


Figure 6: Comparison of different scales of enrollment in terms of accuracy, AUC, and F1-score. DT-ELM-SE contains courses with smaller numbers of enrollments, and DT-ELM-LE contains courses with larger numbers of enrollments.

3.6. Comparison of Different Numbers of Enrollments. To verify the effectiveness of DT-ELM on different numbers of enrollments, two groups of experiments are conducted, and the results are shown in Figure 6. The first group (DT-ELM-SE) contains courses with smaller numbers of enrollments, ranging from several hundred to about one thousand. The second group (DT-ELM-LE) contains courses with larger numbers of enrollments, ranging from several thousand to about ten thousand.

Generally, the more data used for training, the better the classification results. DT-ELM-LE is about 2.59%, 3.41%, and 2.54% higher than DT-ELM-SE in terms of accuracy, AUC, and F1-score, respectively. The average training times of DT-ELM-SE and DT-ELM-LE are 1.6014 s and 1.6752 s, respectively. Although courses with fewer enrollments achieve lower values on the three criteria than courses with more enrollments, the proposed algorithm still performs better than the other algorithms. That means dropout can be predicted accurately and in a timely manner for courses with different numbers of enrollments.


Table 5: The average accuracy of different algorithms weekly, including logistic regression (LR), support vector machine (SVM), back propagation neural network (BP), decision tree (DT), entropy net (EN), LSTM, ELM, and GA-ELM.

Week    DT-ELM  LR      SVM     BP      DT      EN      LSTM    ELM     GA-ELM
Week 1  0.8303  0.6633  0.5643  0.6992  0.6876  0.7067  0.7067  0.8103  0.8149
Week 2  0.8568  0.6860  0.6581  0.7050  0.6957  0.7157  0.7218  0.8367  0.8462
Week 3  0.8720  0.7284  0.7218  0.7443  0.7325  0.7570  0.7758  0.8401  0.8467
Week 4  0.9642  0.8726  0.8270  0.8892  0.8627  0.8773  0.8773  0.9492  0.9507
Week 5  0.9410  0.8498  0.8618  0.8447  0.8526  0.8656  0.8656  0.9041  0.9154

Table 6: The average AUC of different algorithms weekly.

Week    DT-ELM  LR      SVM     BP      DT      EN      LSTM    ELM     GA-ELM
Week 1  0.8182  0.6320  0.6272  0.6189  0.5979  0.6256  0.6295  0.7984  0.8066
Week 2  0.8357  0.6255  0.5914  0.6216  0.6243  0.6264  0.6844  0.8181  0.8243
Week 3  0.8383  0.6527  0.6109  0.6586  0.6438  0.6670  0.7113  0.8176  0.8261
Week 4  0.9412  0.7103  0.6454  0.6117  0.6507  0.6246  0.7698  0.9079  0.9263
Week 5  0.8596  0.6652  0.6918  0.7148  0.6954  0.7166  0.7701  0.8268  0.8413

Table 7: The average F1-score of different algorithms weekly.

Week    DT-ELM  LR      SVM     BP      DT      EN      LSTM    ELM     GA-ELM
Week 1  0.8542  0.7378  0.5907  0.7719  0.7980  0.7880  0.8025  0.8342  0.8453
Week 2  0.8956  0.7913  0.7341  0.8091  0.8000  0.8238  0.8287  0.8677  0.8893
Week 3  0.9021  0.8264  0.8107  0.8392  0.8321  0.8511  0.8508  0.8751  0.8878
Week 4  0.9667  0.9315  0.8982  0.9348  0.9222  0.9358  0.9301  0.9488  0.9565
Week 5  0.9558  0.9118  0.9185  0.9092  0.9149  0.9050  0.9191  0.9389  0.9457

3.7. Performance with Different Algorithms. The detailed results of the different algorithms are shown in Tables 5–7. Logistic regression and support vector machine achieve lower values in accuracy, AUC, and F1-score than the other algorithms. Back propagation neural network and decision tree perform better than logistic regression and support vector machine. Entropy net utilizes a decision tree to optimize performance and therefore outperforms back propagation neural network; different from entropy net, we utilize the decision tree to optimize ELM. Although the performance of LSTM is much better than that of the traditional algorithms, it is time intensive, as the earlier results show.

ELM-based algorithms clearly perform better than the other algorithms. That is because ELM attains the smallest training error by calculating the least-squares solution for the network output weights. GA-ELM achieves good results due to its feature selection. However, it still needs iterative training and lacks structure initialization of the ELM. DT-ELM optimizes the structure of ELM and performs much better than both ELM and GA-ELM.
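In closed form, the output weights are the minimum-norm least-squares solution of (10),

$$\hat{\beta} = H^{\dagger} Y, \qquad \|H\hat{\beta} - Y\| = \min_{\beta} \|H\beta - Y\|,$$

as established in the ELM theory [32], which is why no iterative updates are required.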

4. Discussion

Sufficient experiments are designed and implemented from various perspectives. In terms of overall performance, DT-ELM achieves the best results among the compared algorithms, and we explain why the last two weeks achieve better performance than the other weeks. The experimental results on feature selection demonstrate its importance: features with different gain ratios have different classification ability, which helps achieve higher performance. Feature enhancement and connection initialization both contribute to the results, with feature enhancement playing the more important role, given its larger improvement on the three criteria. The results for different numbers of enrollments prove the universality of our algorithm.

The experimental results have demonstrated the effectiveness and universality of our algorithm. However, an important question still needs to be considered: do instructors need to intervene for all dropout learners? Our goal is to find the at-risk learners who may stop learning in the coming week and to help instructors take corresponding interventions. From the perspective of behavior, a break means dropout. However, interventions would not be taken for all dropout learners, because, besides the behavior factor, some other factors [46], such as age, occupation, motivation, and learner type [47], should be taken into consideration when intervening. Our future work is to design interventions for at-risk learners based on learners' behaviors and background information.

5. Conclusion

Dropout prediction is an essential prerequisite for making interventions for at-risk learners. To address this issue, we propose a hybrid algorithm combining the decision tree and ELM, which successfully settles the previously unsolved problems of behavior discrepancy, iterative training, and structure initialization.

Compared to the evolutionary methods, DT-ELM selects features based on the entropy theory. Different from the neural-network-based methods, it can analytically determine the output weights without updating parameters through repeated iterations. The results on the benchmark dataset under multiple criteria demonstrate the effectiveness of DT-ELM and show that our algorithm makes dropout predictions accurately.

Data Availability

The dataset used to support this study is the open dataset KDD 2015, which is available from http://data-mining.philippe-fournier-viger.com/the-kddcup-2015-dataset-download-link/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research for this paper is supported by the National Natural Science Foundation of China (Grant no. 61877050), the open project fund of the Shaanxi Province Key Lab of Satellite and Terrestrial Network Technology, and Shaanxi Province financed projects for scientific and technological activities of overseas students (Grant no. 202160002).

References

[1] M. Vitiello, S. Walk, D. Helic, V. Chang, and C. Guetl, "Predicting dropouts on the successive offering of a MOOC," in Proceedings of the 2017 International Conference MOOC-MAKER, pp. 11–20, Guatemala, November 2017.
[2] M. Vitiello, S. Walk, V. Chang, R. Hernandez, D. Helic, and C. Guetl, "MOOC dropouts: a multi-system classifier," in European Conference on Technology Enhanced Learning, pp. 300–314, Springer, 2017.
[3] E. J. Emanuel, "Online education: MOOCs taken by educated few," Nature, vol. 503, no. 7476, p. 342, 2013.
[4] G. Creed-Dikeogu and C. Carolyn, "Are you MOOC-ing yet? A review for academic libraries," Kansas Library Association College & University Libraries, vol. 3, no. 1, pp. 9–13, 2013.
[5] S. Dhawal, "By the numbers: MOOCs in 2013," Class Central, 2013.
[6] S. Dhawal, "By the numbers: MOOCs in 2018," Class Central, 2018.
[7] H. Khalil and M. Ebner, "MOOCs completion rates and possible methods to improve retention - a literature review," in World Conference on Educational Multimedia, Hypermedia and Telecommunications, 2014.
[8] D. F. O. Onah, J. E. Sinclair, and R. Boyatt, "Dropout rates of massive open online courses: behavioural patterns," in International Conference on Education and New Learning Technologies, pp. 5825–5834, 2014.
[9] R. Rivard, "Measuring the MOOC dropout rate," Inside Higher Ed, 2013.
[10] J. Qiu, J. Tang, T. X. Liu et al., "Modeling and predicting learning behavior in MOOCs," in ACM International Conference on Web Search and Data Mining, pp. 93–102, 2016.
[11] Y. Zhu, L. Pei, and J. Shang, "Improving video engagement by gamification: a proposed design of MOOC videos," in International Conference on Blended Learning, pp. 433–444, Springer, 2017.
[12] R. Bartoletti, "Learning through design: MOOC development as a method for exploring teaching methods," Current Issues in Emerging eLearning, vol. 3, no. 1, p. 2, 2016.
[13] S. L. Watson, J. Loizzo, W. R. Watson, C. Mueller, J. Lim, and P. A. Ertmer, "Instructional design, facilitation, and perceived learning outcomes: an exploratory case study of a human trafficking MOOC for attitudinal change," Educational Technology Research and Development, vol. 64, no. 6, pp. 1273–1300, 2016.
[14] S. Halawa, "Attrition and achievement gaps in online learning," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 57–66, March 2015.
[15] K. R. Koedinger, E. A. McLaughlin, J. Kim, J. Z. Jia, and N. L. Bier, "Learning is not a spectator sport: doing is better than watching for learning from a MOOC," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 111–120, Canada, March 2015.
[16] C. Robinson, M. Yeomans, J. Reich, C. Hulleman, and H. Gehlbach, "Forecasting student achievement in MOOCs with natural language processing," in Proceedings of the 6th International Conference on Learning Analytics and Knowledge, LAK 2016, pp. 383–387, UK, April 2016.
[17] C. Taylor, K. Veeramachaneni, and U. M. O'Reilly, "Likely to stop? Predicting stopout in massive open online courses," Computer Science, 2014.
[18] C. Ye and G. Biswas, "Early prediction of student dropout and performance in MOOCs using higher granularity temporal information," Journal of Learning Analytics, vol. 1, no. 3, pp. 169–172, 2014.
[19] M. Kloft, F. Stiehler, Z. Zheng, and N. Pinkwart, "Predicting MOOC dropout over weeks using machine learning methods," in EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 60–65, October 2014.
[20] S. Nagrecha, J. Z. Dillon, and N. V. Chawla, "MOOC dropout prediction: lessons learned from making pipelines interpretable," in Proceedings of the 26th International Conference on World Wide Web Companion, pp. 351–359, International World Wide Web Conferences Steering Committee, 2017.
[21] D. Peng and G. Aggarwal, "Modeling MOOC dropouts," Entropy, vol. 10, no. 114, pp. 1–5, 2015.
[22] J. Liang, J. Yang, Y. Wu, C. Li, and L. Zheng, "Big data application in education: dropout prediction in edX MOOCs," in 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 440–443, IEEE, April 2016.
[23] G. Balakrishnan and D. Coetzee, "Predicting student retention in massive open online courses using hidden Markov models," Electrical Engineering and Computer Sciences, University of California at Berkeley, 2013.
[24] C. Lang, G. Siemens, A. Wise, and D. Gasevic, Handbook of Learning Analytics, Society for Learning Analytics Research (SoLAR), 2017.
[25] M. Fei and D.-Y. Yeung, "Temporal models for predicting student dropout in massive open online courses," in IEEE International Conference on Data Mining Workshop, pp. 256–263, USA, November 2015.
[26] W. Li, M. Gao, H. Li, Q. Xiong, J. Wen, and Z. Wu, "Dropout prediction in MOOCs using behavior features and multi-view semi-supervised learning," in Proceedings of the 2016 International Joint Conference on Neural Networks, IJCNN 2016, pp. 3130–3137, Canada, July 2016.
[27] M. Wen and C. P. Rose, "Identifying latent study habits by mining learner behavior patterns in massive open online courses," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014, pp. 1983–1986, China, November 2014.
[28] S. Lei, A. I. Cristea, M. S. Awan, C. Stewart, and M. Hendrix, "Towards understanding learning behavior patterns in social adaptive personalized e-learning," in Proceedings of the Nineteenth Americas Conference on Information Systems, pp. 15–17, 2013.
[29] J. H. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," IEEE Intelligent Systems & Their Applications, vol. 13, no. 2, pp. 44–48, 1998.
[30] A. K. Das, S. Das, and A. Ghosh, "Ensemble feature selection using bi-objective genetic algorithm," Knowledge-Based Systems, vol. 123, pp. 116–127, 2017.
[31] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in 2004 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985–990, IEEE, July 2004.
[32] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.
[33] A. Ramesh, D. Goldwasser, B. Huang, H. Daume, and L. Getoor, "Learning latent engagement patterns of students in online courses," in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.
[34] J. He, J. Bailey, B. I. P. Rubinstein, and R. Zhang, "Identifying at-risk students in massive open online courses," in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[35] D. Yang, T. Sinha, D. Adamson, and C. P. Rose, "'Turn on, tune in, drop out': anticipating student dropouts in massive open online courses," in Proceedings of the 2013 NIPS Data-Driven Education Workshop, vol. 11, p. 14, 2013.
[36] M. Sharkey and R. Sanders, "A process for predicting MOOC attrition," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 50–54, Doha, Qatar, October 2014.
[37] F. Wang and L. Chen, "A nonlinear state space model for identifying at-risk students in open online courses," in Proceedings of the 9th International Conference on Educational Data Mining, vol. 16, pp. 527–532, 2016.
[38] A. Bhattacherjee, "Understanding information systems continuance: an expectation-confirmation model," MIS Quarterly, vol. 25, no. 3, pp. 351–370, 2001.
[39] Z. Junjie, "Exploring the factors affecting learners' continuance intention of MOOCs for online collaborative learning: an extended ECM perspective," Australasian Journal of Educational Technology, vol. 33, no. 5, pp. 123–135, 2017.
[40] J. K. T. Tang, H. Xie, and T.-L. Wong, "A big data framework for early identification of dropout students in MOOC," Communications in Computer and Information Science, vol. 559, pp. 127–132, 2015.
[41] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.
[42] I. K. Sethi, "Entropy nets: from decision trees to neural networks," Proceedings of the IEEE, vol. 78, no. 10, pp. 1605–1613, 1990.
[43] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.
[44] F. Chollet et al., Keras: The Python Deep Learning Library, Astrophysics Source Code Library, 2018.
[45] J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley, "MOOC dropout prediction: how to measure accuracy?" in Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, pp. 161–164, ACM, 2017.
[46] J. DeBoer, G. S. Stump, D. Seaton, and L. Breslow, "Diversity in MOOC students' backgrounds and behaviors in relationship to performance in 6.002x," in Proceedings of the Sixth Learning International Networks Consortium Conference, vol. 4, pp. 16–19, 2013.
[47] P. G. de Barba, G. E. Kennedy, and M. D. Ainley, "The role of students' motivation and participation in predicting performance in a MOOC," Journal of Computer Assisted Learning, vol. 32, no. 3, pp. 218–231, 2016.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 6: MOOC Dropout Prediction Using a Hybrid Algorithm Based ...MathematicalProblemsinEngineering 12345 Week 0.6 0.7 0.8 0.9 1.0 Accuracy DT-ELM TA DL OA Week DT-ELM TA Week DT-ELM TA DL

6 Mathematical Problems in Engineering

1 Give the training set 119879 = (119909푖 119910푖) | 119909푖 isin 119877푛 119910푖 isin 119877푚 activationfunction 119892(119909) number of hidden neurons 119871

2Randomly assign input weight vector 119908푗 and the bias 119887푗except the connectionless weights and biases betweeninput layer and hidden layer with zero

3 Calculate the hidden layer output matrix1198674 Calculate the output weight vector 120573 = Hdagger119884 where Hdagger is

the Moore-Penrose generalized inverse of matrix1198675 Obtain the predicted values based on the input variables

Algorithm 2 Improved ELM

Table 2 Dataset description

Information Description

Enrollment Records denoting what course each learner hasenrolled (120542 entries)

Object Relationships of courses and modulesLog Learning behavior records (8157277 entries)Date The start and end time of courses

Truth Labels indicating whether learners dropout orcomplete the whole courses

120573 = [[[[[

120573푇1120573푇퐿

]]]]]퐿times푀

(12)

119884 = [[[[[

119910푇1119910푇푁

]]]]]푁times푀

(13)

Once 119908푗 and 119887푗 are determined the output matrix of thehidden layer 119867 is uniquely calculated The training processcan be transformed into finding a least-squares solution ofthe linear system in (9) The algorithm can be described as inAlgorithm 2

3 Experimental Results

31 Dataset and Preprocess The effectiveness of our pro-posed algorithm is tested on the benchmark dataset KDD2015 which contains five kinds of information The detaileddescription of the dataset is shown in Table 2 From the rawdata we define and extract several behavior features Thedescription is shown in Table 3

The next step is to label each record according to thebehavior features If a learner has activities in the comingweek the dropout label is 0 Otherwise the dropout label is1 A learner may begin learning in the later week but not thefirst week and in the first several weeks the learnerwill not belabeled as dropout the weekwhen the learner begins learningis seen as the first actually learning week for this learner The

Table 3 Extracted behavior features

Features Description

1199091 The number of participating objects of course perweek

1199092-1199098 The behavior numbers of access page close problemvideo discussion navigating wiki per week

1199099-11990910 The total and average numbers of all behaviors perweek

11990911 The number of active days per week11990912 The time consumption per week

11990913-11990916 The behavior numbers of access page closeproblem video from browser per week respectively

11990917-11990921The behavior numbers of access discussion

problem navigating wiki from server per weekrespectively

11990922-11990923 The numbers of all behaviors from browser andserver per week respectively

first several weeks data will be deleted and then other weeksdata will be labeled

32 Experimental Setting and Evaluation Experiments arecarried out in MATLAB R2016b and Python 27 under adesktop computer with Intel 25GHz CPU and 8G RAMThe LIBSVM library [43] and Keras library [44] are used toimplement the support vector and LSTM respectively

In order to evaluate the effectiveness of the proposedalgorithm accuracy area under curve (AUC) F1-score andtraining time are used as evaluation criteria Accuracy isthe proportion of correct prediction including dropout andretention Precision is the proportion of dropout learnerspredicted correctly by the classifier in all predicted dropoutlearners Recall is the proportion of dropout learners pre-dicted correctly by the classifier in all real dropout learnersF1-score is the harmonic mean of precision and recall

AUC depicts the degree to which a classifier makesa distinction between positive and negative samples It isinvariant to imbalanced data [45] The receiver operatingcharacteristics (ROC) plot the trained classifierrsquos true positiverate against the false positive rate The AUC is the integralover the interval [0 1] of the ROC curve The closer thenumber to 1 the better the classification performance

Mathematical Problems in Engineering 7

1 2 3 4 5Week

06

07

08

09

10Ac

cura

cy

DT-ELMTA

DLOA

WeekDT-ELMTA

DLOA

WeekDT-ELMTA

DLOA

1 2 3 4 506

07

08

09

10

AUC

1 2 3 4 506

07

08

09

10

F1-s

core

Figure 3 The overall performance of DT-ELM traditional machine learning algorithms (TA) deep learning (DL) and optimizationalgorithm (OA) weekly DT-ELM is about 1278 2219 and 687 higher than baseline algorithms in terms of overall accuracy AUCand F1-score respectively

Table 4 The overall average training time (s) of DT-ELM tradi-tional machine learning algorithms (TA) deep learning algorithm(DL) and optimization algorithm (OA)

Algorithms DT-ELM TA DL OATime 16383 2427 gt100 33649

33 Overall Performance We choose ten courses for experi-mentsThe enrollment number ranges from several hundredsto about ten thousands We divide the baselines into threecategories separately traditional machine learning deeplearning and optimization algorithm Traditional machinelearning algorithms include logistic regression support vec-tor machine decision tree back propagation neural networkand entropy net LSTM is adopted as the deep learning algo-rithm Genetic algorithm and ELM (GA-ELM) are combinedas the optimization algorithm aiming to improve the ELM

The results of overall performance in terms of accuracyAUC and F1-score are shown in Figure 3 The results ofoverall average training time are shown in Table 4

Although there exists a wide range of course enrollmentsthe proposed DT-ELM algorithm performs much betterthan the three categories of baseline algorithms DT-ELM is8928 8586 and 9148 and about 1278 2219 and687 higher than baseline algorithms in terms of overallaccuracy AUC and F1-score respectively

To be specific the traditionalmachine learning algorithmperforms the worst in terms of the three criteria The resultsof the deep learning algorithm are much better Howeverthe deep learning algorithm has the longest training timeAlthough the optimization algorithm performs better thanthe deep learning algorithm it does not perform as good asDT-ELM DT-ELM performs the best in terms of accuracyAUC and F1-score Meanwhile it requires the least trainingtime due to noniterative training process The results haveproved that DT-ELM reaches the goal of dropout predictionaccurately and timely

Another conclusion is that the last two weeks get betterperformance than the other weeks To identify the reasonwe make a statistical analysis of dropout rate weekly Wefind that compared to the first three weeks the averagedropout rate of the last two weeks of courses is higher It

means the behavior of learners is more likely to follow apattern On the other hand it also illustrates the importanceof dropout prediction The dropout rates in the later stageof courses are higher than the initial stage generally So it isbetter to find at-risk learners early in order to make effectiveinterventions

34 Impact of Feature Selection To verify the effectivenessof feature selection we make a comparison between DT-ELM and ELM The results are shown in Figure 4 DT-ELM is about 278 287 and 241 higher than ELM interms of accuracy AUC and F1-score respectively It provesthat feature selection has promoted the prediction resultsChoosing asmany features as possiblemay not be appropriatefor dropout prediction According to the entropy theorymentioned previously features with different gain ratios havedifferent classification ability Features with low gain ratiosmay weaken the classification ability

Although each course has different behavior featurestwo conclusions can be obtained The average number ofselected features is 12 which is less than the number ofextracted features It proves that using fewer features fordropout prediction can achieve better results than using allextracted features Moreover we find that discussion activedays and time consumption are the three most importantfactors affecting prediction results

35 Impact of Feature Enhancement and Connection Initial-ization To verify the impact of feature enhancement wemake a comparison between DT-ELM and itself withoutfeature enhancement (Without-FE) Similarly to verity theimpact of connection initialization we make a comparisonbetween DT-ELM with itself without connection initializa-tion (Without-IC) The results of the three algorithms areshown in Figure 5

The results of Without-FE and Without-IC are not asgood as DT-ELM DT-ELM is about 094 121 and 09higher than Without-IC in terms of accuracy AUC andF1-score respectively It is also about 213 225 and198 higher thanWithout-FEThe values of Without-IC arehigher than Without-FE in terms of the three criteria whichindicates that feature enhancement plays a more important

8 Mathematical Problems in Engineering

1 2 3 4 5Week

080

085

090

095

100Ac

cura

cy

DT-ELMELM

1 2 3 4 5Week

070

080

085

090

095

AUC

DT-ELMELM

1 2 3 4 5Week

080

085

090

095

100

F1-s

core

DT-ELMELM

Figure 4 Impact of feature selection DT-ELM is about 278 287 and 241 higher than ELM in terms of accuracy AUC and F1-scorerespectively

1 2 3 4 5Week

080

085

090

095

100

Accu

racy

1 2 3 4 5Week

080

085

090

095

AUC

DT-ELMWithout-ICWithout-EF

DT-ELMWithout-ICWithout-EF

DT-ELMWithout-ICWithout-EF

1 2 3 4 5Week

080

085

090

095

100

F1-s

core

Figure 5 Impact of feature enhancement and connection initialization Without-IC means DT-ELM without connection initialization andWithout-FE means DT-ELM without feature enhancement The results indicate that feature enhancement plays a more important role thanconnection initialization

1 2 3 4 5075

080

085

090

095

100

Accu

racy

1 2 3 4 5075

080

085

090

095

AUC

1 2 3 4 5Week

075

080

085

090

095

100

F1-s

core

DT-ELM-SEDT-ELM-LE

Week

DT-ELM-SEDT-ELM-LE

Week

DT-ELM-SEDT-ELM-LE

Figure 6 Comparison of different scales of enrollment in terms of accuracy AUC and F1-score DT-ELM-SE contains courses with smallernumbers of enrollments and DT-ELM-LE contains courses with larger numbers of enrollments

role than connection initialization That is because featureswith better classification ability are enhanced depending onhow much impact each feature has on other neurons

36 Comparison of Different Numbers of Enrollments Toverify the effectiveness of DT-ELM on different numbers ofenrollments two groups of experiments are conducted andthe results are shown in Figure 6 The first group (DT-ELM-SE) contains courses with smaller numbers of enrollmentsranging from several hundreds to about one thousand Thesecond group (DT-ELM-LE) contains courses with larger

numbers of enrollments ranging from several thousands toabout then thousands

Generally the more the data used for training the betterthe classification results DT-ELM-LE is about 259 341and 254 higher than DT-ELM-SE in terms of accuracyAUC and F1-score respectively The average training timeof DT-ELM-SE and DT-ELM-LE is 16014s and 16752srespectively Although courses with less enrollments achievelower values than courses withmore enrollments for the threecriteria the proposed algorithm still performs better thanthe other algorithms That means dropout can be predicted

Mathematical Problems in Engineering 9

Table 5 The average accuracy of different algorithms weekly including logistic regression (LR) support vector machine (SVM) backpropagation neural network (BP) decision tree (DT) entropy net (EN) LSTM ELM and GA-ELM

WeekAlgorithms

DT-ELM LR SVM BP DT EN LSTM ELM GA-ELMWeek1 08303 06633 05643 06992 06876 07067 07067 08103 08149Week2 08568 0686 06581 0705 06957 07157 07218 08367 08462Week3 0872 07284 07218 07443 07325 0757 07758 08401 08467Week4 09642 08726 0827 08892 08627 08773 08773 09492 09507Week5 0941 08498 08618 08447 08526 08656 08656 09041 09154

Table 6 The average AUC of different algorithms weekly

WeekAlgorithms

DT-ELM LR SVM BP DT EN LSTM ELM GA-ELMWeek1 08182 0632 06272 06189 05979 06256 06295 07984 08066Week2 08357 06255 05914 06216 06243 06264 06844 08181 08243Week3 08383 06527 06109 06586 06438 0667 07113 08176 08261Week4 09412 07103 06454 06117 06507 06246 07698 09079 09263Week5 08596 06652 06918 07148 06954 07166 07701 08268 08413

Table 7 The average F1-score of different algorithms weekly

WeekAlgorithms

DT-ELM LR SVM BP DT EN LSTM ELM GA-ELMWeek1 08542 07378 05907 07719 0798 0788 08025 08342 08453Week2 08956 07913 07341 08091 08 08238 08287 08677 08893Week3 09021 08264 08107 08392 08321 08511 08508 08751 08878Week4 09667 09315 08982 09348 09222 09358 09301 09488 09565Week5 09558 09118 09185 09092 09149 0905 09191 09389 09457

accurately and timely in courses with different numbers ofenrollments

37 Performance with Different Algorithms The detailedresults of different algorithms are shown in Tables 5ndash7Observing the results logistic regression and support vectormachine achieve lower values in accuracy AUC and F1-scorethan the other algorithms Back propagation neural networkand decision tree perform better than logistic regressionand support vector machine Entropy net utilizes decisiontree to optimize performance and it performs better thanback propagation neural network Different from entropynet we utilize decision tree to optimize ELM Although theperformance of LSTM is much better than the traditionalalgorithms it is time intensive based on previous results

It is obvious that ELM-based algorithms perform betterthan other algorithmsThat is because ELM can get the small-est training error by calculating the least-squares solutions ofthe network output weights GA-ELM achieves good resultsdue to its function of feature selection However it also needsiterative training and lacks structure initialization of the ELMDT-ELMoptimizes the structure of ELM and performsmuchbetter than ELM and GA-ELM

4 Discussion

Sufficient experiments are designed and implemented fromvarious perspectives For the overall performance it achievesthe best performance compared to different algorithmsBesides we also explain why the last two weeks get betterperformance than the other weeks The experimental resultsof feature selection demonstrate the importance of the featureselectionThe reason is that features with different gain ratioshave different classification ability which helps get a higherperformance Feature enhancement and connection initial-ization both contribute to results and feature enhancementplays a more important role than connection initializationdue to its higher promotion on three criteria The results ofdifferent numbers of enrollments prove the universality ofour algorithm

The experimental results have proved the effectivenessand universality of our algorithm However there is stillan important question that needs to be considered Dothe instructors need to make interventions for all dropoutlearners Our goal is to find the at-risk learners who maystop learning in the coming week and help instructorsto take corresponding interventions From the perspectiveof behavior break means dropout However interventions

10 Mathematical Problems in Engineering

would not be taken for all dropout learners because whilemaking interventions besides the behavior factor some otherfactors [46] such as age occupation motivation and learnertype [47] should be taken into consideration Our futurework is to make interventions for at-risk learners based onlearnersrsquo behaviors and background information

5 Conclusion

Dropout prediction is an essential prerequisite to makeinterventions for at-risk learners To address this issue wepropose a hybrid algorithmwhich combines the decision treeand ELM which successfully settles the unsolved problemsincluding behavior discrepancy iterative training and struc-ture initialization

Compared to the evolutionary methods DT-ELM selectsfeatures based on the entropy theory Different from theneuron network basedmethods it can analytically determinethe output weights without updating parameters by repeatediterationsThebenchmark dataset inmultiple criteria demon-strates the effectiveness of DT-ELM and the results show thatour algorithm can make dropout prediction accurately

Data Availability

The dataset used to support this study is the open datasetKDD2015which is available fromhttpdata-miningphilippe-fournier-vigercomthe-kddcup-2015-dataset-download-link

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The research for this paper is supported by National NaturalScience Foundation of China (Grant no 61877050) and openproject fund of Shaanxi Province Key Lab of Satellite andTerrestrial Network Tech Shaanxi province financed projectsfor scientific and technological activities of overseas students(Grant no 202160002)

References

[1] M Vitiello S Walk D Helic V Chang and C Guetl ldquoPredict-ing dropouts on the successive offering of a MOOCrdquo in Pro-ceedings of the 2017 International Conference MOOC-MAKERMOOC-MAKER 2017 pp 11ndash20 Guatemala November 2017

[2] M Vitiello S Walk V Chang R Hernandez D Helic and CGuetl ldquoMOOCdropouts amulti-system classifierrdquo inEuropeanConference on Technology Enhanced Learning pp 300ndash314Springer 2017

[3] E J Emanuel ldquoOnline education MOOCs taken by educatedfewrdquo Nature vol 503 no 7476 p 342 2013

[4] G Creeddikeogu and C Carolyn ldquoAre you mooc-ing yeta review for academic librariesrdquo Kansas Library AssociationCollege amp University Libraries vol 3 no 1 pp 9ndash13 2013

[5] S Dhawal ldquoBy the numbers Moocs in 2013rdquo Class Central2013

[6] S Dhawal ldquoBy the numbers Moocs in 2018rdquo Class Central2018

[7] H Khalil and M Ebner ldquoMoocs completion rates and pos-sible methods to improve retention - a literature reviewrdquo inWorld Conference on Educational Multimedia Hypermedia andTelecommunications 2014

[8] D F O Onah J E Sinclair and R Boyatt ldquoDropout rates ofmassive open online courses Behavioural patternsrdquo in Interna-tional Conference on Education and New Learning Technologiespp 5825ndash5834 2014

[9] R Rivard Measuring the mooc dropout rate Inside Higher Ed8 2013 2013

[10] J Qiu J Tang T X Liu et al ldquoModeling and predicting learningbehavior in MOOCsrdquo in ACM International Conference onWebSearch and Data Mining pp 93ndash102 2016

[11] Y Zhu L Pei and J Shang ldquoImproving video engagementby gamification a proposed Design of MOOC videosrdquo inInternational Conference on Blended Learning pp 433ndash444Springer 2017

[12] R Bartoletti ldquoLearning through design Mooc development asa method for exploring teaching methodsrdquo Current Issues inEmerging eLearning vol 3 no 1 p 2 2016

[13] S L Watson J Loizzo W R Watson C Mueller J Lim andP A Ertmer ldquoInstructional design facilitation and perceivedlearning outcomes an exploratory case study of a human traf-ficking MOOC for attitudinal changerdquo Educational TechnologyResearch and Development vol 64 no 6 pp 1273ndash1300 2016

[14] S Halawa ldquoAttrition and achievement gaps in online learningrdquoin Proceedings of the Second (2015) ACMConference on Learning Scale pp 57ndash66 March 2015

[15] K R Koedinger E A McLaughlin J Kim J Z Jia and N LBier ldquoLearning is not a spectator sport doing is better thanwatching for learning from a MOOCrdquo in Proceedings of theSecond (2015) ACMConference on Learning Scale pp 111ndash120Canada March 2015

[16] C Robinson M Yeomans J Reich C Hulleman and HGehlbach ldquoForecasting student achievement in MOOCs withnatural language processingrdquo in Proceedings of the 6th Interna-tional Conference on Learning Analytics and Knowledge LAK2016 pp 383ndash387 UK April 2016

[17] C Taylor K Veeramachaneni and U M OrsquoReilly ldquoLikelyto stop predicting stopout in massive open online coursesrdquoComputer Science 2014

[18] C Ye and G Biswas ldquoEarly prediction of student dropoutand performance inMOOCs using higher granularity temporalinformationrdquo Journal of Learning Analytics vol 1 no 3 pp 169ndash172 2014

[19] M Kloft F Stiehler Z Zheng and N Pinkwart ldquoPredictingMOOC dropout over weeks using machine learning methodsrdquoin EMNLP 2014 Workshop on Analysis of Large Scale SocialInteraction in Moocs pp 60ndash65 October 2014

[20] S Nagrecha J Z Dillon and N V Chawla ldquoMooc dropout pre-diction Lessons learned from making pipelines interpretablerdquoin Proceedings of the 26th International Conference on WorldWide Web Companion pp 351ndash359 International World WideWeb Conferences Steering Committee 2017

[21] D. Peng and G. Aggarwal, "Modeling MOOC dropouts," Entropy, vol. 10, no. 114, pp. 1–5, 2015.

[22] J. Liang, J. Yang, Y. Wu, C. Li, and L. Zheng, "Big data application in education: dropout prediction in edX MOOCs," in 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 440–443, IEEE, April 2016.

[23] G. Balakrishnan and D. Coetzee, "Predicting student retention in massive open online courses using hidden Markov models," Electrical Engineering and Computer Sciences, University of California at Berkeley, 2013.

[24] C. Lang, G. Siemens, A. Wise, and D. Gasevic, Handbook of Learning Analytics, Society for Learning Analytics Research (SoLAR), 2017.

[25] M. Fei and D.-Y. Yeung, "Temporal models for predicting student dropout in massive open online courses," in IEEE International Conference on Data Mining Workshop, pp. 256–263, USA, November 2015.

[26] W. Li, M. Gao, H. Li, Q. Xiong, J. Wen, and Z. Wu, "Dropout prediction in MOOCs using behavior features and multi-view semi-supervised learning," in Proceedings of the 2016 International Joint Conference on Neural Networks, IJCNN 2016, pp. 3130–3137, Canada, July 2016.

[27] M. Wen and C. P. Rose, "Identifying latent study habits by mining learner behavior patterns in massive open online courses," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014, pp. 1983–1986, China, November 2014.

[28] S. Lei, A. I. Cristea, M. S. Awan, C. Stewart, and M. Hendrix, "Towards understanding learning behavior patterns in social adaptive personalized e-learning," in Proceedings of the Nineteenth Americas Conference on Information Systems, pp. 15–17, 2013.

[29] J. H. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," IEEE Intelligent Systems & Their Applications, vol. 13, no. 2, pp. 44–48, 1998.

[30] A. K. Das, S. Das, and A. Ghosh, "Ensemble feature selection using bi-objective genetic algorithm," Knowledge-Based Systems, vol. 123, pp. 116–127, 2017.

[31] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in 2004 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985–990, IEEE, July 2004.

[32] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[33] A. Ramesh, D. Goldwasser, B. Huang, H. Daume, and L. Getoor, "Learning latent engagement patterns of students in online courses," in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

[34] J. He, J. Bailey, B. I. P. Rubinstein, and R. Zhang, "Identifying at-risk students in massive open online courses," in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[35] D. Yang, T. Sinha, D. Adamson, and C. P. Rose, "Turn on, tune in, drop out: anticipating student dropouts in massive open online courses," in Proceedings of the 2013 NIPS Data-Driven Education Workshop, vol. 11, p. 14, 2013.

[36] M. Sharkey and R. Sanders, "A process for predicting MOOC attrition," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 50–54, Doha, Qatar, October 2014.

[37] F. Wang and L. Chen, "A nonlinear state space model for identifying at-risk students in open online courses," in Proceedings of the 9th International Conference on Educational Data Mining, vol. 16, pp. 527–532, 2016.

[38] A. Bhattacherjee, "Understanding information systems continuance: an expectation-confirmation model," MIS Quarterly, vol. 25, no. 3, pp. 351–370, 2001.

[39] Z. Junjie, "Exploring the factors affecting learners' continuance intention of MOOCs for online collaborative learning: an extended ECM perspective," Australasian Journal of Educational Technology, vol. 33, no. 5, pp. 123–135, 2017.

[40] J. K. T. Tang, H. Xie, and T.-L. Wong, "A big data framework for early identification of dropout students in MOOC," Communications in Computer and Information Science, vol. 559, pp. 127–132, 2015.

[41] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[42] I. K. Sethi, "Entropy nets: from decision trees to neural networks," Proceedings of the IEEE, vol. 78, no. 10, pp. 1605–1613, 1990.

[43] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.

[44] F. Chollet et al., Keras: The Python Deep Learning Library, Astrophysics Source Code Library, 2018.

[45] J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley, "MOOC dropout prediction: how to measure accuracy?" in Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, pp. 161–164, ACM, 2017.

[46] J. DeBoer, G. S. Stump, D. Seaton, and L. Breslow, "Diversity in MOOC students' backgrounds and behaviors in relationship to performance in 6.002x," in Proceedings of the Sixth Learning International Networks Consortium Conference, vol. 4, pp. 16–19, 2013.

[47] P. G. de Barba, G. E. Kennedy, and M. D. Ainley, "The role of students' motivation and participation in predicting performance in a MOOC," Journal of Computer Assisted Learning, vol. 32, no. 3, pp. 218–231, 2016.


Figure 3: The overall performance of DT-ELM, traditional machine learning algorithms (TA), deep learning (DL), and the optimization algorithm (OA) weekly, shown as accuracy, AUC, and F1-score per week. DT-ELM is about 12.78%, 22.19%, and 6.87% higher than the baseline algorithms in terms of overall accuracy, AUC, and F1-score, respectively.

Table 4: The overall average training time (s) of DT-ELM, traditional machine learning algorithms (TA), the deep learning algorithm (DL), and the optimization algorithm (OA).

Algorithms  DT-ELM  TA     DL    OA
Time (s)    1.6383  2.427  >100  33.649

3.3. Overall Performance. We choose ten courses for the experiments. The enrollment number ranges from several hundred to about ten thousand. We divide the baselines into three categories: traditional machine learning, deep learning, and optimization algorithm. The traditional machine learning algorithms include logistic regression, support vector machine, decision tree, back propagation neural network, and entropy net. LSTM is adopted as the deep learning algorithm. A genetic algorithm combined with ELM (GA-ELM) serves as the optimization algorithm, aiming to improve the ELM.
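As a rough illustration only, and not the configuration used in our experiments, the traditional baselines could be instantiated with standard library defaults; every hyperparameter below is a placeholder.

```python
# A minimal sketch of the baseline categories; hyperparameters are
# illustrative placeholders, not the settings used in the experiments.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

baselines = {
    # Traditional machine learning (TA)
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True),                   # LIBSVM-backed solver [43]
    "DT": DecisionTreeClassifier(),
    "BP": MLPClassifier(hidden_layer_sizes=(64,)),  # back propagation network
}
# The deep learning baseline (DL) is an LSTM, e.g., built with Keras [44];
# the optimization baseline (OA) combines a genetic algorithm with ELM.
```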

The results of the overall performance in terms of accuracy, AUC, and F1-score are shown in Figure 3. The results of the overall average training time are shown in Table 4.

Although there exists a wide range of course enrollments, the proposed DT-ELM algorithm performs much better than the three categories of baseline algorithms. DT-ELM achieves 89.28% accuracy, 85.86% AUC, and 91.48% F1-score, which is about 12.78%, 22.19%, and 6.87% higher than the baseline algorithms, respectively.

To be specific, the traditional machine learning algorithms perform the worst on all three criteria. The results of the deep learning algorithm are much better; however, it has the longest training time. Although the optimization algorithm performs better than the deep learning algorithm, it does not perform as well as DT-ELM. DT-ELM performs the best in terms of accuracy, AUC, and F1-score. Meanwhile, it requires the least training time due to its noniterative training process. The results prove that DT-ELM meets the goal of accurate and timely dropout prediction.
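For reference, the three criteria can be computed per week from predicted dropout probabilities. The following is a generic sketch using scikit-learn, not the evaluation code used in this study.

```python
# Per-week computation of the three criteria; a generic sketch, not the
# authors' evaluation code. y_true holds 0/1 dropout labels and y_prob the
# predicted dropout probabilities for one week.
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

def weekly_scores(y_true, y_prob, threshold=0.5):
    y_prob = np.asarray(y_prob)
    y_pred = (y_prob >= threshold).astype(int)      # binarize the scores
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),       # AUC uses the raw scores
        "f1": f1_score(y_true, y_pred),
    }
```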

Another conclusion is that the last two weeks achieve better performance than the other weeks. To identify the reason, we make a statistical analysis of the dropout rate weekly. We find that, compared to the first three weeks, the average dropout rate of the last two weeks of courses is higher. It means the behavior of learners is more likely to follow a pattern. On the other hand, it also illustrates the importance of dropout prediction. The dropout rates in the later stage of courses are generally higher than in the initial stage, so it is better to find at-risk learners early in order to make effective interventions.

3.4. Impact of Feature Selection. To verify the effectiveness of feature selection, we make a comparison between DT-ELM and ELM. The results are shown in Figure 4. DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively. This proves that feature selection improves the prediction results. Choosing as many features as possible may not be appropriate for dropout prediction. According to the entropy theory mentioned previously, features with different gain ratios have different classification ability, and features with low gain ratios may weaken the classification ability.
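The selection criterion can be sketched as follows, assuming discretized behavior features; the helper names and the threshold are illustrative, not those of DT-ELM.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """Information gain of a discretized feature, normalized by split information."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    cond_entropy = sum(w * entropy(labels[feature == v])
                       for v, w in zip(values, weights))
    split_info = -np.sum(weights * np.log2(weights))
    if split_info == 0:                 # feature takes a single value
        return 0.0
    return (entropy(labels) - cond_entropy) / split_info

def select_features(X, y, threshold=0.05):
    """Keep features whose gain ratio exceeds a threshold (illustrative)."""
    ratios = np.array([gain_ratio(X[:, j], y) for j in range(X.shape[1])])
    return np.flatnonzero(ratios > threshold), ratios
```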

Although each course has different behavior features, two conclusions can be obtained. First, the average number of selected features is 12, which is less than the number of extracted features. This proves that using fewer features for dropout prediction can achieve better results than using all extracted features. Moreover, we find that discussion, active days, and time consumption are the three most important factors affecting the prediction results.

3.5. Impact of Feature Enhancement and Connection Initialization. To verify the impact of feature enhancement, we make a comparison between DT-ELM and itself without feature enhancement (Without-FE). Similarly, to verify the impact of connection initialization, we make a comparison between DT-ELM and itself without connection initialization (Without-IC). The results of the three algorithms are shown in Figure 5.

The results of Without-FE and Without-IC are not as good as those of DT-ELM. DT-ELM is about 0.94%, 1.21%, and 0.9% higher than Without-IC in terms of accuracy, AUC, and F1-score, respectively. It is also about 2.13%, 2.25%, and 1.98% higher than Without-FE. The values of Without-IC are higher than those of Without-FE on the three criteria, which indicates that feature enhancement plays a more important role than connection initialization.

Figure 4: Impact of feature selection, shown as accuracy, AUC, and F1-score per week. DT-ELM is about 2.78%, 2.87%, and 2.41% higher than ELM in terms of accuracy, AUC, and F1-score, respectively.

Figure 5: Impact of feature enhancement and connection initialization. Without-IC means DT-ELM without connection initialization, and Without-FE means DT-ELM without feature enhancement. The results indicate that feature enhancement plays a more important role than connection initialization.

Figure 6: Comparison of different scales of enrollment in terms of accuracy, AUC, and F1-score. DT-ELM-SE contains courses with smaller numbers of enrollments, and DT-ELM-LE contains courses with larger numbers of enrollments.

That is because features with better classification ability are enhanced, depending on how much impact each feature has on the hidden neurons.
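One plausible reading of feature enhancement, given only as an illustration and not as the exact DT-ELM rule, is to scale each selected feature's input connections by its gain ratio before computing the hidden layer, so that more discriminative features contribute more strongly.

```python
import numpy as np

def enhance_input_weights(W, gain_ratios):
    """Scale each feature's row of ELM input weights by its gain ratio.

    W           : (n_features, n_hidden) input-weight matrix
    gain_ratios : length-n_features array from the decision tree
    Illustrative enhancement rule, not the paper's exact formula.
    """
    return W * np.asarray(gain_ratios)[:, None]
```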

3.6. Comparison of Different Numbers of Enrollments. To verify the effectiveness of DT-ELM on different numbers of enrollments, two groups of experiments are conducted, and the results are shown in Figure 6. The first group (DT-ELM-SE) contains courses with smaller numbers of enrollments, ranging from several hundred to about one thousand. The second group (DT-ELM-LE) contains courses with larger numbers of enrollments, ranging from several thousand to about ten thousand.

Generally, the more data used for training, the better the classification results. DT-ELM-LE is about 2.59%, 3.41%, and 2.54% higher than DT-ELM-SE in terms of accuracy, AUC, and F1-score, respectively. The average training time of DT-ELM-SE and DT-ELM-LE is 1.6014 s and 1.6752 s, respectively. Although courses with fewer enrollments achieve lower values on the three criteria than courses with more enrollments, the proposed algorithm still performs better than the other algorithms. This means dropout can be predicted accurately and in a timely manner in courses with different numbers of enrollments.


Table 5: The average accuracy of different algorithms weekly, including logistic regression (LR), support vector machine (SVM), back propagation neural network (BP), decision tree (DT), entropy net (EN), LSTM, ELM, and GA-ELM.

Week    DT-ELM  LR      SVM     BP      DT      EN      LSTM    ELM     GA-ELM
Week 1  0.8303  0.6633  0.5643  0.6992  0.6876  0.7067  0.7067  0.8103  0.8149
Week 2  0.8568  0.6860  0.6581  0.7050  0.6957  0.7157  0.7218  0.8367  0.8462
Week 3  0.8720  0.7284  0.7218  0.7443  0.7325  0.7570  0.7758  0.8401  0.8467
Week 4  0.9642  0.8726  0.8270  0.8892  0.8627  0.8773  0.8773  0.9492  0.9507
Week 5  0.9410  0.8498  0.8618  0.8447  0.8526  0.8656  0.8656  0.9041  0.9154

Table 6: The average AUC of different algorithms weekly.

Week    DT-ELM  LR      SVM     BP      DT      EN      LSTM    ELM     GA-ELM
Week 1  0.8182  0.6320  0.6272  0.6189  0.5979  0.6256  0.6295  0.7984  0.8066
Week 2  0.8357  0.6255  0.5914  0.6216  0.6243  0.6264  0.6844  0.8181  0.8243
Week 3  0.8383  0.6527  0.6109  0.6586  0.6438  0.6670  0.7113  0.8176  0.8261
Week 4  0.9412  0.7103  0.6454  0.6117  0.6507  0.6246  0.7698  0.9079  0.9263
Week 5  0.8596  0.6652  0.6918  0.7148  0.6954  0.7166  0.7701  0.8268  0.8413

Table 7: The average F1-score of different algorithms weekly.

Week    DT-ELM  LR      SVM     BP      DT      EN      LSTM    ELM     GA-ELM
Week 1  0.8542  0.7378  0.5907  0.7719  0.7980  0.7880  0.8025  0.8342  0.8453
Week 2  0.8956  0.7913  0.7341  0.8091  0.8000  0.8238  0.8287  0.8677  0.8893
Week 3  0.9021  0.8264  0.8107  0.8392  0.8321  0.8511  0.8508  0.8751  0.8878
Week 4  0.9667  0.9315  0.8982  0.9348  0.9222  0.9358  0.9301  0.9488  0.9565
Week 5  0.9558  0.9118  0.9185  0.9092  0.9149  0.9050  0.9191  0.9389  0.9457


3.7. Performance with Different Algorithms. The detailed results of the different algorithms are shown in Tables 5–7. Observing the results, logistic regression and support vector machine achieve lower values in accuracy, AUC, and F1-score than the other algorithms. Back propagation neural network and decision tree perform better than logistic regression and support vector machine. Entropy net utilizes a decision tree to optimize performance, and it performs better than back propagation neural network. Different from entropy net, we utilize the decision tree to optimize ELM. Although the performance of LSTM is much better than that of the traditional algorithms, it is time intensive, as the previous results show.

It is obvious that the ELM-based algorithms perform better than the other algorithms. That is because ELM can obtain the smallest training error by calculating the least-squares solution of the network output weights. GA-ELM achieves good results due to its feature selection. However, it still needs iterative training and lacks structure initialization of the ELM. DT-ELM optimizes the structure of ELM and performs much better than ELM and GA-ELM.

4. Discussion

Extensive experiments are designed and implemented from various perspectives. For the overall performance, DT-ELM achieves the best results compared to the other algorithms. Besides, we also explain why the last two weeks achieve better performance than the other weeks. The experimental results of feature selection demonstrate its importance: features with different gain ratios have different classification ability, which helps achieve higher performance. Feature enhancement and connection initialization both contribute to the results, and feature enhancement plays a more important role than connection initialization due to its larger improvement on the three criteria. The results on different numbers of enrollments prove the universality of our algorithm.

The experimental results have proved the effectiveness and universality of our algorithm. However, there is still an important question that needs to be considered: do instructors need to make interventions for all dropout learners? Our goal is to find the at-risk learners who may stop learning in the coming week and to help instructors take corresponding interventions. From the perspective of behavior, a break means dropout. However, interventions would not be taken for all dropout learners, because when making interventions, besides the behavior factor, some other factors [46] such as age, occupation, motivation, and learner type [47] should be taken into consideration. Our future work is to make interventions for at-risk learners based on learners' behaviors and background information.

5. Conclusion

Dropout prediction is an essential prerequisite for making interventions for at-risk learners. To address this issue, we propose a hybrid algorithm which combines the decision tree and ELM and which successfully settles the previously unsolved problems of behavior discrepancy, iterative training, and structure initialization.

Compared to the evolutionary methods, DT-ELM selects features based on the entropy theory. Different from the neural-network-based methods, it can analytically determine the output weights without updating parameters through repeated iterations. Experiments on the benchmark dataset under multiple criteria demonstrate the effectiveness of DT-ELM, and the results show that our algorithm can make dropout prediction accurately.

Data Availability

The dataset used to support this study is the open dataset KDD Cup 2015, which is available from http://data-mining.philippe-fournier-viger.com/the-kddcup-2015-dataset-download-link/.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research for this paper is supported by the National Natural Science Foundation of China (Grant no. 61877050), the open project fund of the Shaanxi Province Key Lab of Satellite and Terrestrial Network Technology, and Shaanxi Province funded projects for scientific and technological activities of overseas students (Grant no. 202160002).




OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 10: MOOC Dropout Prediction Using a Hybrid Algorithm Based ...MathematicalProblemsinEngineering 12345 Week 0.6 0.7 0.8 0.9 1.0 Accuracy DT-ELM TA DL OA Week DT-ELM TA Week DT-ELM TA DL

10 Mathematical Problems in Engineering

would not be taken for all dropout learners, because, when making interventions, some factors beyond behavior [46], such as age, occupation, motivation, and learner type [47], should also be taken into consideration. Our future work is to make interventions for at-risk learners based on learners' behaviors and background information.

5. Conclusion

Dropout prediction is an essential prerequisite for making interventions for at-risk learners. To address this issue, we propose a hybrid algorithm that combines the decision tree and ELM, which successfully settles the previously unsolved problems of behavior discrepancy, iterative training, and structure initialization.
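To make the decision-tree side of the hybrid concrete, the sketch below shows the entropy-based information gain that trees in the ID3/C4.5 family [41] use to rank features; it is our own minimal illustration, not the paper's implementation, and the toy feature and labels are invented.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D label array."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels, threshold):
    """Entropy reduction from splitting a numeric feature at a threshold."""
    left = labels[feature <= threshold]
    right = labels[feature > threshold]
    weighted = (left.size * entropy(left) + right.size * entropy(right)) / labels.size
    return entropy(labels) - weighted

# Toy check: a feature that cleanly separates dropouts earns the full 1 bit.
y = np.array([0, 0, 0, 1, 1, 1])              # 0 = retained, 1 = dropout
x = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0])  # e.g., a normalized behavior count
print(information_gain(x, y, threshold=0.5))  # -> 1.0
```

Features with high gain are exactly those "with good classification ability" that the tree keeps and passes on to the network.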

Compared with evolutionary methods, DT-ELM selects features based on entropy theory. Unlike neural-network-based methods, it determines the output weights analytically, without updating parameters through repeated iterations. Experiments on the benchmark dataset under multiple criteria demonstrate the effectiveness of DT-ELM, and the results show that our algorithm makes accurate dropout predictions.
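To illustrate what "analytically determine the output weights" means in practice, here is a minimal ELM sketch in Python/NumPy (a simplified illustration under our own naming, not the DT-ELM code): the random hidden layer stays fixed, and the output weights come from a single Moore-Penrose pseudoinverse, with no iterative training [31, 32].

```python
import numpy as np

def elm_train(X, T, n_hidden, seed=0):
    """Train a basic single-hidden-layer ELM in closed form.

    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets.
    Hidden-layer weights are random and fixed; only the output weights
    beta are learned, via the pseudoinverse: beta = pinv(H) @ T.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # fixed input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # fixed biases
    H = np.tanh(X @ W + b)                                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # one-shot analytic solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage with invented data: 10 behavior features, binary dropout label.
X = np.random.rand(200, 10)
T = (X.sum(axis=1) > 5).astype(float).reshape(-1, 1)
W, b, beta = elm_train(X, T, n_hidden=32)
dropout_pred = (elm_predict(X, W, b, beta) > 0.5).astype(int)
```

Because beta comes from one linear solve, training cost is dominated by a single pseudoinverse rather than many gradient epochs, which is the "no iterative training" property referred to above; DT-ELM additionally initializes the hidden structure from the decision tree instead of purely at random.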

Data Availability

The dataset used to support this study is the open dataset KDD 2015, which is available from http://data-mining.philippe-fournier-viger.com/the-kddcup-2015-dataset-download-link.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The research for this paper is supported by the National Natural Science Foundation of China (Grant no. 61877050), the open project fund of the Shaanxi Province Key Lab of Satellite and Terrestrial Network Tech, and Shaanxi Province financed projects for scientific and technological activities of overseas students (Grant no. 202160002).

References

[1] M. Vitiello, S. Walk, D. Helic, V. Chang, and C. Guetl, "Predicting dropouts on the successive offering of a MOOC," in Proceedings of the 2017 International Conference MOOC-MAKER, MOOC-MAKER 2017, pp. 11–20, Guatemala, November 2017.

[2] M. Vitiello, S. Walk, V. Chang, R. Hernandez, D. Helic, and C. Guetl, "MOOC dropouts: a multi-system classifier," in European Conference on Technology Enhanced Learning, pp. 300–314, Springer, 2017.

[3] E. J. Emanuel, "Online education: MOOCs taken by educated few," Nature, vol. 503, no. 7476, p. 342, 2013.

[4] G. Creed-Dikeogu and C. Carolyn, "Are you MOOC-ing yet? A review for academic libraries," Kansas Library Association College & University Libraries, vol. 3, no. 1, pp. 9–13, 2013.

[5] S. Dhawal, "By the numbers: MOOCs in 2013," Class Central, 2013.

[6] S. Dhawal, "By the numbers: MOOCs in 2018," Class Central, 2018.

[7] H. Khalil and M. Ebner, "MOOCs completion rates and possible methods to improve retention - a literature review," in World Conference on Educational Multimedia, Hypermedia and Telecommunications, 2014.

[8] D. F. O. Onah, J. E. Sinclair, and R. Boyatt, "Dropout rates of massive open online courses: behavioural patterns," in International Conference on Education and New Learning Technologies, pp. 5825–5834, 2014.

[9] R. Rivard, "Measuring the MOOC dropout rate," Inside Higher Ed, 8, 2013.

[10] J. Qiu, J. Tang, T. X. Liu et al., "Modeling and predicting learning behavior in MOOCs," in ACM International Conference on Web Search and Data Mining, pp. 93–102, 2016.

[11] Y. Zhu, L. Pei, and J. Shang, "Improving video engagement by gamification: a proposed design of MOOC videos," in International Conference on Blended Learning, pp. 433–444, Springer, 2017.

[12] R. Bartoletti, "Learning through design: MOOC development as a method for exploring teaching methods," Current Issues in Emerging eLearning, vol. 3, no. 1, p. 2, 2016.

[13] S. L. Watson, J. Loizzo, W. R. Watson, C. Mueller, J. Lim, and P. A. Ertmer, "Instructional design, facilitation, and perceived learning outcomes: an exploratory case study of a human trafficking MOOC for attitudinal change," Educational Technology Research and Development, vol. 64, no. 6, pp. 1273–1300, 2016.

[14] S. Halawa, "Attrition and achievement gaps in online learning," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 57–66, March 2015.

[15] K. R. Koedinger, E. A. McLaughlin, J. Kim, J. Z. Jia, and N. L. Bier, "Learning is not a spectator sport: doing is better than watching for learning from a MOOC," in Proceedings of the Second (2015) ACM Conference on Learning @ Scale, pp. 111–120, Canada, March 2015.

[16] C. Robinson, M. Yeomans, J. Reich, C. Hulleman, and H. Gehlbach, "Forecasting student achievement in MOOCs with natural language processing," in Proceedings of the 6th International Conference on Learning Analytics and Knowledge, LAK 2016, pp. 383–387, UK, April 2016.

[17] C. Taylor, K. Veeramachaneni, and U. M. O'Reilly, "Likely to stop? Predicting stopout in massive open online courses," Computer Science, 2014.

[18] C. Ye and G. Biswas, "Early prediction of student dropout and performance in MOOCs using higher granularity temporal information," Journal of Learning Analytics, vol. 1, no. 3, pp. 169–172, 2014.

[19] M. Kloft, F. Stiehler, Z. Zheng, and N. Pinkwart, "Predicting MOOC dropout over weeks using machine learning methods," in EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 60–65, October 2014.

[20] S. Nagrecha, J. Z. Dillon, and N. V. Chawla, "MOOC dropout prediction: lessons learned from making pipelines interpretable," in Proceedings of the 26th International Conference on World Wide Web Companion, pp. 351–359, International World Wide Web Conferences Steering Committee, 2017.

[21] D. Peng and G. Aggarwal, "Modeling MOOC dropouts," Entropy, vol. 10, no. 114, pp. 1–5, 2015.

[22] J. Liang, J. Yang, Y. Wu, C. Li, and L. Zheng, "Big data application in education: dropout prediction in edX MOOCs," in 2016 IEEE Second International Conference on Multimedia Big Data (BigMM), pp. 440–443, IEEE, April 2016.

[23] G. Balakrishnan and D. Coetzee, "Predicting student retention in massive open online courses using hidden Markov models," Electrical Engineering and Computer Sciences, University of California at Berkeley, 2013.

[24] C. Lang, G. Siemens, A. Wise, and D. Gasevic, Handbook of Learning Analytics, Society for Learning Analytics Research (SoLAR), 2017.

[25] M. Fei and D.-Y. Yeung, "Temporal models for predicting student dropout in massive open online courses," in IEEE International Conference on Data Mining Workshop, pp. 256–263, USA, November 2015.

[26] W. Li, M. Gao, H. Li, Q. Xiong, J. Wen, and Z. Wu, "Dropout prediction in MOOCs using behavior features and multi-view semi-supervised learning," in Proceedings of the 2016 International Joint Conference on Neural Networks, IJCNN 2016, pp. 3130–3137, Canada, July 2016.

[27] M. Wen and C. P. Rose, "Identifying latent study habits by mining learner behavior patterns in massive open online courses," in Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014, pp. 1983–1986, China, November 2014.

[28] S. Lei, A. I. Cristea, M. S. Awan, C. Stewart, and M. Hendrix, "Towards understanding learning behavior patterns in social adaptive personalized e-learning," in Proceedings of the Nineteenth Americas Conference on Information Systems, pp. 15–17, 2013.

[29] J. H. Yang and V. Honavar, "Feature subset selection using a genetic algorithm," IEEE Intelligent Systems & Their Applications, vol. 13, no. 2, pp. 44–48, 1998.

[30] A. K. Das, S. Das, and A. Ghosh, "Ensemble feature selection using bi-objective genetic algorithm," Knowledge-Based Systems, vol. 123, pp. 116–127, 2017.

[31] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: a new learning scheme of feedforward neural networks," in 2004 IEEE International Joint Conference on Neural Networks, vol. 2, pp. 985–990, IEEE, July 2004.

[32] G. B. Huang, Q. Y. Zhu, and C. K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1–3, pp. 489–501, 2006.

[33] A. Ramesh, D. Goldwasser, B. Huang, H. Daume, and L. Getoor, "Learning latent engagement patterns of students in online courses," in Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014.

[34] J. He, J. Bailey, B. I. P. Rubinstein, and R. Zhang, "Identifying at-risk students in massive open online courses," in Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[35] D. Yang, T. Sinha, D. Adamson, and C. P. Rose, "Turn on, tune in, drop out: anticipating student dropouts in massive open online courses," in Proceedings of the 2013 NIPS Data-Driven Education Workshop, vol. 11, p. 14, 2013.

[36] M. Sharkey and R. Sanders, "A process for predicting MOOC attrition," in Proceedings of the EMNLP 2014 Workshop on Analysis of Large Scale Social Interaction in MOOCs, pp. 50–54, Doha, Qatar, October 2014.

[37] F. Wang and L. Chen, "A nonlinear state space model for identifying at-risk students in open online courses," in Proceedings of the 9th International Conference on Educational Data Mining, vol. 16, pp. 527–532, 2016.

[38] A. Bhattacherjee, "Understanding information systems continuance: an expectation-confirmation model," MIS Quarterly, vol. 25, no. 3, pp. 351–370, 2001.

[39] Z. Junjie, "Exploring the factors affecting learners' continuance intention of MOOCs for online collaborative learning: an extended ECM perspective," Australasian Journal of Educational Technology, vol. 33, no. 5, pp. 123–135, 2017.

[40] J. K. T. Tang, H. Xie, and T.-L. Wong, "A big data framework for early identification of dropout students in MOOC," Communications in Computer and Information Science, vol. 559, pp. 127–132, 2015.

[41] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, no. 1, pp. 81–106, 1986.

[42] I. K. Sethi, "Entropy nets: from decision trees to neural networks," Proceedings of the IEEE, vol. 78, no. 10, pp. 1605–1613, 1990.

[43] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, p. 27, 2011.

[44] F. Chollet et al., Keras: The Python Deep Learning Library, Astrophysics Source Code Library, 2018.

[45] J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley, "MOOC dropout prediction: how to measure accuracy?" in Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, pp. 161–164, ACM, 2017.

[46] J. DeBoer, G. S. Stump, D. Seaton, and L. Breslow, "Diversity in MOOC students' backgrounds and behaviors in relationship to performance in 6.002x," in Proceedings of the Sixth Learning International Networks Consortium Conference, vol. 4, pp. 16–19, 2013.

[47] P. G. de Barba, G. E. Kennedy, and M. D. Ainley, "The role of students' motivation and participation in predicting performance in a MOOC," Journal of Computer Assisted Learning, vol. 32, no. 3, pp. 218–231, 2016.
