ICMSCSME 2015 ISBN 978-602-72198-2-3
PROCEEDING
"Exploring Mathematics and its Application in the Future"
2-3 October 2015
Hasanuddin University
Makassar, Indonesia
International
Conference on
Mathematics, Statistics,
Computer Sciences,
and Mathematics
Education
Proceeding of International Conference on Mathematics, Statistics, Computer
Sciences, and Mathematics Education (ICMSCSME) 2015
ISBN 978-602-72198-2-3
Proceeding
International Conference on Mathematics,
Statistics, Computer Sciences, and Mathematics
Education
(ICMSCSME) 2015
Editors
Dr. Nurdin, S.Si., M.Si.
Sri Astuti Thamrin, S.Si., M.Stat., Ph.D.
Edy Saputra, S.Si.
Reviewer
Prof. Dr. S. Arumugam
Dr. Kiki A. Sugeng
Dr. Loeky Haryanto, M.S., M.Sc., M.A.T.
Utami Dyah Syafitri, Ph.D.
Sri Astuti Thamrin, S.Si., M.Stat., Ph.D.
___________________________________________________________
Publisher: Fakultas MIPA UNHAS
Address: Jl. Perintis Kemerdekaan KM 10 Tamalanrea 90245 Makassar
___________________________________________________________
FOREWORD FROM CHAIRPERSON OF ICMSCSME 2015
Assalamu 'alaikum warahmatullahi wabarakatuh
And sincere greetings to all.
It is my great pleasure to welcome all our invited speakers and participants to the International Conference on Mathematics, Statistics, Computer Sciences, and Mathematics Education 2015 (ICMSCSME 2015), jointly organized by the Mathematics Department, Faculty of Mathematics & Natural Sciences, Hasanuddin University, and the Indonesian Mathematical Society (IndoMS) Sulawesi Region. The conference is attended by around 200 participants from Nepal, the Philippines, India, Slovakia, Malaysia, and Indonesia.
It is hoped that ICMSCSME 2015 will catalyze and increase academic and research collaborations between the institutions involved, both internationally and locally. I sincerely hope that this will spur further advancement of scientific research and fruitful collaborations between organizations.
I would like to congratulate all the speakers and participants on their participation in ICMSCSME 2015. On behalf of the conference organizing committee, I would like to take this opportunity to thank all who have contributed, either directly or indirectly, to the success of the event for their generous contributions.
Finally, to all of the ICMSCSME 2015 committee: thumbs up for a job well done.
May Allah's blessing be upon you, Aamin.
Thank you,
Wassalam,
Dr. Nurdin
Chair of ICMSCSME 2015
FOREWORD BY DEAN OF MATHEMATICS AND NATURAL
SCIENCES FACULTY HASANUDDIN UNIVERSITY
I would like to congratulate the Mathematics Department, Mathematics and Natural Sciences Faculty, Hasanuddin University and the Indonesian Mathematical Society (IndoMS) Sulawesi Region for successfully organizing this joint conference of the International Conference on Mathematics, Statistics, Computer Sciences, and Mathematics Education 2015 (ICMSCSME 2015) and the South East Asian Mathematical Society School (SEAMS School) on Coding and Graphs 2015.
It gives me great pleasure to welcome all distinguished guests, invited speakers, invited lecturers, and participants to UNHAS and Makassar, Indonesia. For some of you, this may be your first visit to Makassar, and I wish you SELAMAT DATANG (welcome). I hope your brief visit to Makassar will be a memorable and fruitful one.
UNHAS is committed to fulfilling the strategy set forth in the National Higher Education Plan for Indonesian higher education institutions. This conference demonstrates the commitment of UNHAS to promoting internationalization as one of its main agendas. This commitment to international research collaboration includes collaboration in research, teaching and learning, and service activities, creating opportunities for collaborative efforts and thus enhancing research and possible research exchange.
It is the aspiration of UNHAS to be an established research university, and UNHAS is continuously promoting international research collaboration. I sincerely hope that this joint conference will be a platform where international research collaborations can be fostered and subsequently nurtured.
I hope these proceedings will be of benefit to all readers.
Yours faithfully,
Dr.Eng. Amiruddin
TABLE OF CONTENTS
COVER .......... i
PREFACE .......... iii
TABLE OF CONTENTS .......... v
KEYNOTE SPEECH
Ismail Mohd and Ahmad Kadri Junoh, Non Usury Model For Conventional And Islamic Banking System .......... 1-19
Stephanus Suwarsono, Involving Culture in the Teaching and Learning of Mathematics, as an Approach for Exploring and Understanding the Applications of Mathematics .......... 20-28
ORAL PRESENTATION
I. MATHEMATICS
M.1. Idha Sihwaningrum, Ari Wardayani, Suroto. Weak Type Inequality for Maximal Operators on Morrey Spaces over Metric Measure Spaces .......... 29-33
M.2. M. Imran, Naimah Aris. Existence of global attractor in strongly continuous semigroups {S_t} that has Lyapunov function of a metric space .......... 34-40
M.3. Marjo-Anne B. Acob, Jean O. Loyola. An Algorithm for Propagating Graceful Trees Using the Adjacency Matrix of a Given Graceful Caterpillar .......... 41-48
M.4. Naimah Aris, A. Muh. Amil Siddik, Muh. Nur. The Existence of Global Attractor for a Strongly Continuous Semigroup in Metric Space .......... 49-54
M.5. Nur Erawaty. Integer Solutions for Pell's Equation .......... 55-62
M.6. Nur Ilmiyah Djalal, Armin Lawi, Aidawayati Rangkuti. Supply Chain Management 3-Echelon .......... 63-67
M.7. Ratianingsih, R, Jaya, A.I, Santule, M.B. The Predator-Prey Model of Fishery Cultivation in Conservation Zone with Top-Predator Attack .......... 68-73
M.8. Usman Pagalay, Budimawan, Silvia Anggraini. The Stability Of Harvesting Logistic Model On Fishery .......... 74-78
II. STATISTICS
S.1. Erna Tri Herdiani. Variance Vector Control Chart with Mean Square Successive Difference Method (MSSD) To Monitoring Variability Multivariate Control Process .......... 79-84
S.2. Fatimah Ashara, Erna Tri Herdiani, and M. Saleh. Estimation Parameter of Vector Autoregressive Model Using by Two Stage Least Square Method .......... 85-88
S.3. Herianti, Anisa, Ladpoje. Multiple Imputation with PMM Method To Estimate Missing Data On Nonresponse Item .......... 89-96
S.4. Miftahuddin, Anisa K., Asma G. A Review of the Time Effects in the SST Data using Modified GamboostLSS Models .......... 97-107
S.5. Muflihah, Armin Lawi, Erna Tri Herdiani. Prediction of Rainfall by State Space Model For Missing Data .......... 108-112
III. COMPUTER SCIENCE
SC.1. Heliawaty Hamrul, Hardiana. Data Warehouse Software To Support Arrangement Of Standart 3 Borang Accreditation .......... 113-121
SC.2. Loeky Haryanto, Nurdin. An algorithm for searching the total edge-irregularity strength of the corona graph Pm ⊙ Pn with minimal weights on the edges of its subgraph Pm .......... 122-132
SC.3. Monika S Rahayu, Armin Lawi, Sri Astuti Thamrin. The Comparison Multiclass Classification Using Support Vector Machine .......... 133-144
SC.4. Octavian, Supri Bin Hj Amir, Armin Lawi. Determining of the Relations of Specific Variables in Massive Database Using Association Rule Learning .......... 145-152
SC.5. Rahmawati, Sri Astuti Thamrin, Armin Lawi. Kernel Bayesian-Based Classification For Microarray Data .......... 153-159
SC.6. Rini Anggraini, Armin Lawi, Sri Astuti Thamrin. Ensemble Support Vector Machine Optimization Using Adaboost Algorithm .......... 160-166
IV. MATHEMATICS EDUCATION
ME.1. Budi Nurwahyu, Tatag Y.E.S, St. Suwarsono. Students' Concept Image of Permutation and Combination viewed from Difference of Gender with High Ability in Basic Mathematics .......... 167-185
ME.2. Ety Tejo Dwi Cahyowati. The Reduction of Student Misperception in Set Topic through Cognitive Conflic .......... 186-191
ME.3. Georgina Maria Tinungki. The Role of Cooperative Learning Type Team Assisted Individualization to Improve the Students' Mathematics Communication Ability in the Subject of Probability Theory .......... 192-199
ME.4. Masduki, Rita Pramujiyanti Khotimah. An Error Analysis of Students to Solve The First Order Differential Equations .......... 200-205
ME.5. Muslimin, Muh. Hajarul Aswad A. Alternative Completion of Poverty in Indonesia through Mudharabah .......... 206-210
ME.6. Nasrullah. Using Daily Problems to Measure Math Literacy and Characterise Mathematical Abilities for Students in South Sulawesi .......... 211-218
ME.7. Nining Setyaningsih, Sri Rejeki and Sri Sutarni. Developing A Mathematics Instructional Model Based on RAKIR (Child Friendly, Innovative, Creative and Realistics) At Junior High School .......... 219-227
ME.8. Rita Pramujiyanti Khotimah, Masduki. Problem Solving Ability of Students to Solve Ordinary Differential Equations .......... 228-235
ME.9. Saleh Haji, M. Ilham Abdullah. Developing Students' Mathematical Communication Through Realistic Mathematics Learning .......... 236-244
ME.10. Sitti Busyrah Muchsin. Dwi's Concept Understanding Concerning Operation on Integer .......... 245-250
ME.11. Sitti Maesarah. Analysis Mastery of Mathematics Teacher of Implementation Curriculum 2013 in the Junior School .......... 251-256
ME.12. Tedy Machmud, Sumarno Ismail, Nursia Bito. Development of PCL Approach in Mathematics Learning Integrated with Character Education at Junior High Schools in Gorontalo Province .......... 257-263
ME.13. Yuda Satria Nugraha. The Effectiveness of Graph Theory's Learning Model Based on Decision-Making System Using Analytical Hierarchy Process (AHP) (Case Study of Semester IV-C Students Academic Year 2014/2015) .......... 264-270
ME.14. Yumiati. The Application of Connecting, Organizing, Reflecting, and Extending Learning in Enhancing Students' Algebraic Thinking Skill .......... 271-283
Variance Vector Control Chart with Mean Square Successive
Difference Method (MSSD) To Monitoring Variability
Multivariate Control Process
Erna Tri Herdiani1
1Department of Mathematics, Faculty of Mathematics and Natural Sciences, Hasanuddin University
Jl. Perintis Kemerdekaan Km 10 Tamalanrea
email: [email protected]
ABSTRACT
Multivariate control charts for the variance-covariance matrix typically rely on statistics based on the matrix determinant and the inverse matrix. When the data have a large number of variables, these statistics are difficult to compute. To overcome this, the vector variance has been proposed as an alternative to the existing Box's M and Jennrich statistics. In general, however, the variance-covariance matrix involved in the vector variance is estimated using the full data set (FDS). In this paper we investigate the vector variance control chart when the variance-covariance matrix is estimated using the Mean Square Successive Difference (MSSD) method. The results are applied to weather data for the city of Makassar from 2003 to 2012.
Key Words: multivariate control charts, control charts, vector variance, variance-covariance matrix, mean square successive difference
1. Introduction
A multivariate control chart is used to monitor several variables of a process jointly. The quantities usually controlled are the mean vector and the variance-covariance matrix. Charts for the variance-covariance matrix are generally based on statistics that use the matrix determinant and the inverse matrix, see [7], [6]. For data with a large number of variables, the vector variance was proposed as an alternative to the Box's M and Jennrich statistics, see [3], [4]. In practice the variance-covariance matrix is estimated from a sample. It can be estimated by the maximum likelihood method, which involves all available sample results; this approach is called the full data set (FDS) method [1]. In this paper we investigate the construction of vector variance charts in which the variance-covariance matrix is estimated using the Mean Square Successive Difference (MSSD) method [5].
2. Estimation Of Variance-Covariance Matrix
Let $X_1, X_2, \ldots, X_n$ be random vectors from a multivariate normal distribution with $p$ variables, mean vector $\mu$, and variance-covariance matrix $\Sigma$. The variance-covariance matrix can be estimated with the full data set (FDS) method or the Mean Square Successive Difference (MSSD) method. One estimator of $\Sigma$ that involves all $n$ observations, obtained by the maximum likelihood method, is as follows:
\[ S_1 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T \]
The second estimator uses the differences between successive pairs of observations, $v_i = X_i - X_{i-1}$, $i = 2, 3, \ldots, n$. Let $X_1, X_2, X_3, \ldots, X_n$ be multivariate random vectors with $p$ variables; the $j$-th random vector is
\[ X_j = \begin{bmatrix} X_{1j} \\ X_{2j} \\ \vdots \\ X_{pj} \end{bmatrix}, \quad j = 1, 2, \ldots, n, \]
where $p$ is the number of quality characteristics and $n$ is the number of samples, so the estimate of the mean vector is
\[ \hat{\mu} = \bar{X} = \begin{bmatrix} \bar{X}_1 \\ \bar{X}_2 \\ \vdots \\ \bar{X}_p \end{bmatrix}, \]
where $\bar{X}_k = \frac{1}{n} \sum_{j=1}^{n} X_{kj}$, $k = 1, 2, \ldots, p$. The successive differences are collected in the matrix
\[ V = \begin{bmatrix} v_2' \\ v_3' \\ \vdots \\ v_n' \end{bmatrix} \]
so
\[ S_2 = \frac{1}{2} \, \frac{V'V}{n}. \]
Both estimates, $S_1$ and $S_2$, will be used as the sample variance-covariance matrix in the vector variance statistic.
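The two estimators can be sketched in plain Python, assuming the data arrive as a list of $n$ rows with $p$ values each. The function names `cov_fds` and `cov_mssd` and the small data set are illustrative only, and the MSSD divisor $2n$ follows the formula as written in this section; some references divide by $2(n-1)$ instead.

```python
def mean_vector(X):
    """Column means of an n x p data matrix given as a list of rows."""
    n = len(X)
    return [sum(row[k] for row in X) / n for k in range(len(X[0]))]


def outer_sum(rows):
    """Sum of the outer products r r' over the given rows (a p x p matrix)."""
    p = len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]


def cov_fds(X):
    """Full-data-set estimate S1 = sum (x_i - xbar)(x_i - xbar)' / (n - 1)."""
    n = len(X)
    xbar = mean_vector(X)
    centered = [[x - m for x, m in zip(row, xbar)] for row in X]
    return [[v / (n - 1) for v in line] for line in outer_sum(centered)]


def cov_mssd(X):
    """MSSD estimate S2 = V'V / (2n); the rows of V are x_i - x_{i-1}."""
    n = len(X)
    V = [[b - a for a, b in zip(X[i - 1], X[i])] for i in range(1, n)]
    return [[v / (2 * n) for v in line] for line in outer_sum(V)]


# Hypothetical data: n = 4 observations on p = 2 variables.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0], [4.0, 8.0]]
S1 = cov_fds(X)
S2 = cov_mssd(X)
```

Because $S_2$ is built from differences between neighbouring observations, a slow drift in the process level inflates $S_1$ much more than $S_2$, which is why the two estimators can lead to different control limits.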
3. Vector Variance Control Chart
Let $X$ be a random vector formed by stacking $X^{(1)}$ and $X^{(2)}$, of dimensions $p$ and $q$ respectively, $X = (X^{(1)t}, X^{(2)t})^t$. Suppose also that $\mu^{(i)} = E(X^{(i)})$, $i = 1, 2$, and $\Sigma_{ij} = E\left[(X^{(i)} - \mu^{(i)})(X^{(j)} - \mu^{(j)})^t\right]$, $i, j = 1, 2$. The covariance matrix of $X$, denoted $\Sigma$, can then be written in partitioned form
\[ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}. \]
[3] notes that [2] uses $\mathrm{Tr}(\Sigma_{12}\Sigma_{21})$ to measure the linear relationship between the two random vectors $X^{(1)}$ and $X^{(2)}$. This parameter is called the vector covariance; it is the sum of all diagonal elements of $\Sigma_{12}\Sigma_{21}$. Accordingly, as proposed in [3], $\mathrm{Tr}(\Sigma_{11}^2)$ and $\mathrm{Tr}(\Sigma_{22}^2)$ are called the vector variance (VV) of $X^{(1)}$ and $X^{(2)}$, respectively. If $p = q = 1$, the vector covariance is the square of the classical covariance, while the vector variance is the square of the classical variance. In what follows, the vector variance is written $\mathrm{Tr}(\Sigma^2)$. The variance-covariance matrix is estimated by the sample variance-covariance matrix $S$, and the sample vector variance is denoted by $\mathrm{Tr}(S^2)$.
[3] states that the asymptotic distribution of the sample vector variance is
\[ \mathrm{Tr}(S^2) \xrightarrow{d} N\!\left(\mathrm{Tr}(\Sigma^2), \frac{8\,\mathrm{Tr}(\Sigma^4)}{n-1}\right), \]
that is, normal with mean $\mathrm{Tr}(\Sigma^2)$ and variance $\frac{8\,\mathrm{Tr}(\Sigma^4)}{n-1}$. This variance is used to establish the control limits of the vector variance control chart.
Theorem 1
Let $X_1, X_2, \ldots, X_n$ be a random sample from the multivariate normal distribution $N_p(\mu, \Sigma)$. If $\mathrm{Tr}(S^2) \xrightarrow{d} N\!\left(\mathrm{Tr}(\Sigma^2), \frac{8\,\mathrm{Tr}(\Sigma^4)}{n-1}\right)$, then the vector variance control chart has
\[ \text{Upper Control Limit (UCL)}: \ \mathrm{Tr}(\Sigma^2) + 3\sqrt{\frac{8\,\mathrm{Tr}(\Sigma^4)}{n-1}}, \qquad \text{Lower Control Limit (LCL)}: \ \mathrm{Tr}(\Sigma^2) - 3\sqrt{\frac{8\,\mathrm{Tr}(\Sigma^4)}{n-1}}. \]
These limits correspond to a significance level $\alpha = 0.0027$.
If the variance-covariance matrix is estimated by FDS, the vector variance control chart takes the form given in Theorem 2.
Theorem 2
If the variance-covariance matrix $\Sigma$ is estimated by the full data set (FDS) method as
\[ S_1 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T, \]
then the vector variance control chart has the following control limits:
\[ \text{UCL}: \ \mathrm{Tr}(S_1^2) + 3\sqrt{\frac{8\,\mathrm{Tr}(S_1^4)}{n-1}}, \qquad \text{LCL}: \ \mathrm{Tr}(S_1^2) - 3\sqrt{\frac{8\,\mathrm{Tr}(S_1^4)}{n-1}}. \]
If the variance-covariance matrix is estimated by MSSD, the vector variance control chart takes the form given in Theorem 3.
Theorem 3
If the variance-covariance matrix $\Sigma$ is estimated by the Mean Square Successive Difference (MSSD) method as
\[ S_2 = \frac{1}{2} \, \frac{V'V}{n}, \]
then the vector variance control chart has the following control limits:
\[ \text{UCL}: \ \mathrm{Tr}(S_2^2) + 3\sqrt{\frac{8\,\mathrm{Tr}(S_2^4)}{n-1}}, \qquad \text{LCL}: \ \mathrm{Tr}(S_2^2) - 3\sqrt{\frac{8\,\mathrm{Tr}(S_2^4)}{n-1}}. \]
Theorems 2 and 3 are applied to the weather data for Makassar City from 2003 to 2012.
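The limits in Theorems 1 to 3 share one form, so they can be sketched with a single function. This is a minimal illustration in plain Python; the function name `vv_control_limits`, the helper names, and the example matrix are assumptions made for the sketch, and `S` stands for whichever estimate ($S_1$ or $S_2$) is plugged in.

```python
import math


def matmul(A, B):
    """Matrix product of two square list-of-rows matrices."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def trace(A):
    """Sum of the diagonal elements."""
    return sum(A[i][i] for i in range(len(A)))


def vv_control_limits(S, n):
    """Return (LCL, CL, UCL) = Tr(S^2) -/+ 3 * sqrt(8 Tr(S^4) / (n - 1))."""
    S_sq = matmul(S, S)            # S^2
    S_4th = matmul(S_sq, S_sq)     # S^4
    center = trace(S_sq)
    half_width = 3.0 * math.sqrt(8.0 * trace(S_4th) / (n - 1))
    return center - half_width, center, center + half_width


def in_control(value, lcl, ucl):
    """True when an observed Tr(S^2) lies inside the control limits."""
    return lcl <= value <= ucl


# Hypothetical 2 x 2 covariance estimate from a sample of size n = 9.
S = [[2.0, 0.0], [0.0, 1.0]]
lcl, cl, ucl = vv_control_limits(S, 9)
```

For this example the center line is $\mathrm{Tr}(S^2) = 5$ and the half-width is $3\sqrt{17}$, so the lower limit is negative even though $\mathrm{Tr}(S^2)$ itself can never be negative. The same effect is visible in the negative LCL values reported for the weather data.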
4. Results and Discussion: Weather Data in Makassar City, 2003 to 2012
This case study uses yearly data on air temperature ($X_1$), solar irradiation ($X_2$), humidity ($X_3$), and wind speed ($X_4$) in Makassar City from 2003 to 2012. Each year forms one subgroup $q$, $q = 1, 2, \ldots, m$. There are $p = 4$ quality characteristics: temperature (Celsius), solar irradiation (percent), humidity (percent), and wind speed (knots), denoted by $X_1$, $X_2$, $X_3$, and $X_4$.
Table 1. Values of $\mathrm{Tr}(S_1^2)$ and $\mathrm{Tr}(S_2^2)$ for the weather data of Makassar City, 2003 to 2012
Year    Tr(S_1^2)    Tr(S_2^2)
2003    304340       19874.27
2004    236560       17392.14
2005    146460       7621.045
2006    293190       16591.38
2007    211490       22240.75
2008    708460       825.891
2009    257300       16574.13
2010    73927        19950.08
2011    325540       13309.43
2012    99778        7806.186
Source: Data processing, 2015
The data in Table 1 are shown in Figure 1.
Figure 1. Values of $\mathrm{Tr}(S_1^2)$ and $\mathrm{Tr}(S_2^2)$ by year (horizontal axis: year; vertical axis: value of Tr(S^2))
The control limits of the variance vector control chart under the two estimates of the variance-covariance matrix are given in Table 2.
Table 2. Control limits of the variance vector control chart
        FDS         MSSD
UCL     888,750     143,700
CL      233,240     38,048
LCL     -422,280    -67,599
Source: Data processing, 2015
Both control charts are presented together in Figure 2.
Figure 2. Variance Vector Control Chart (horizontal axis: year; vertical axis: value of Tr(S^2); limits shown: UCL FDS = 888,750, UCL MSSD = 143,700, LCL MSSD = -67,599, LCL FDS = -422,280)
Based on Figure 2, the variance vector control chart with the FDS method shows that all observations fall within the control limits, whereas the chart with the MSSD method shows one observation outside the control limits, namely the data for 2008. It is therefore necessary to recompute the MSSD chart with the 2008 data removed, repeating until all data fall within the control limits.
With the 2008 data removed, the variance vector control chart with the MSSD method gives the following limits:
Table 3. Control limits of the vector variance control chart by the MSSD method, before and after omitting the 2008 data
        Before omitting 2008    After omitting 2008
UCL     143,700                 160,090
CL      38,048                  40,531
LCL     -67,599                 -79,031
Source: Data processing, 2015
The results obtained after the 2008 data are removed can be seen in Figure 3.
Figure 3. Vector variance control chart with the MSSD method after the 2008 data have been omitted (horizontal axis: year; vertical axis: value of Tr(S^2); limits shown: UCL MSSD = 160,090, LCL MSSD = -79,031)
These are the vector variance control charts for the weather data of Makassar from 2003 to 2012. Estimating the variance-covariance matrix by FDS and by MSSD produces two different control charts: the MSSD method flags the 2008 data as an outlier, while the FDS method does not. Further research can examine which of the two, FDS or MSSD, performs better.
5. Conclusion
The vector variance control chart depends on how the sample variance-covariance matrix is estimated, here by FDS and by MSSD. The MSSD-based chart detects the 2008 data as out of control, while the FDS-based chart does not. To determine which chart performs better, the two charts should be compared using the Average Run Length (ARL).
REFERENCES
[1] Anderson, T.W. (2003). An Introduction to Multivariate Statistical Analysis. Third Edition, pp. 251-282. Stanford University.
[2] Cleroux, R. (1987). Multivariate Association and Inference Problems in Data Analysis. Proceedings of the Fifth International Symposium on Data Analysis and Informatics, Vol. 1, Versailles, France.
[3] Djauhari, M.A. (2007). A Measure of Data Concentration. Journal of Probability and Statistics, Vol. 2, No. 2, 139-155.
[4] Herdiani, E.T. and Djauhari, M.A. (2013). Asymptotic Distribution of Vector Variance Standardized Variable Without Duplication. Journal of Concrete and Applicable Mathematics (JCAAM), Vol. 11, No. 1, January 2013, 87-95.
[5] Levinson, W.A., Holmes, D.S., and Mergen, A.E. (2002). Variation Charts for Multivariate Processes. Quality Engineering, Vol. 14, No. 4, 539-545.
[6] Sindelar, M.F. (2007). Multivariate Statistical Process Control for Correlation Matrices. Pittsburgh: University of Pittsburgh.
[7] Tang, G.Y.N. (1998). The Intertemporal Stability of the Covariance and Correlation Matrices of Hong Kong Stock Returns. Applied Financial Economics, 8, pp. 359-365.
Estimation Parameter of Vector Autoregressive Model
Using by Two Stage Least Square Method
Fatimah Ashara1, Erna Tri Herdiani2, and M. Saleh3
Department of Mathematics, Faculty of Mathematics and Natural Sciences,
Hasanuddin University,
Jl. Perintis Kemerdekaan Km.10 Tamalanrea, Makassar, Indonesia
*Corresponding Author: [email protected]
ABSTRACT
The Vector Autoregressive (VAR) model is an analytical tool that is very useful for understanding the interrelationships between economic variables and for building structured economic models. Time series data must satisfy important assumptions before a VAR model can be formed, namely stationarity and errors that are normally distributed and mutually independent. The method usually used to estimate the parameters of a VAR model is Ordinary Least Squares (OLS); here the author extends this approach by using the Two Stage Least Squares (TSLS) method. TSLS consists of two stages. In the first stage, the parameters are estimated using OLS; in the second stage, the results of the first stage are used to compute the TSLS estimate. This work shows that, in addition to OLS, the VAR model can also be estimated using the TSLS method.
Keywords: Vector Autoregressive models, OLS, TSLS, regression analysis
1. Introduction
A forecasting method is a way to predict or estimate, quantitatively or qualitatively, what will happen in the future based on relevant past data. Forecasting methods are therefore expected to provide greater objectivity. Forecasting requires a specific method, and the choice of method depends on the data and information to be predicted and on the objectives to be achieved. One frequently used forecasting approach is time series analysis.
Time series analysis is used to analyze data while accounting for the influence of time. Data collected periodically in time order, whether in hours, days, weeks, months, or years, can be analyzed using time series methods. The analysis can be carried out not only for one variable (univariate) but also for many variables (multivariate).
One model for multivariate data is the Vector Autoregressive (VAR) model. VAR was first introduced by C.A. Sims (1972) as a development of the ideas of Granger (1969). One use of the VAR model is forecasting, especially short-term forecasting. VAR models can also be used to see how a change in one variable in the system dynamically affects another variable (Juanda and Junaidi, 2012). VAR analysis can be paired with a simultaneous equation model because it takes several dependent variables into account together in one model. The VAR model can therefore be estimated using the two stage least squares (TSLS) method, which is generally used to estimate simultaneous equations whose variables are interrelated, and whose estimates are consistent and efficient.
2. Vector Autoregressive (VAR) Model
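VAR models can be used to determine causal relationships. Within econometrics, the VAR model is one of the topics of multivariate time series analysis. A time series $Y_t$ follows a VAR($p$) model if it satisfies
\[ Y_t = \phi_0 + \Phi_1 Y_{t-1} + \cdots + \Phi_p Y_{t-p} + \varepsilon_t, \quad p > 0, \]
where:
$Y_t = (y_{1t}, \ldots, y_{nt})'$ has size $(n \times 1)$,
$\Phi_i$ is a coefficient matrix of size $(n \times n)$,
$\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{nt})'$ is $n$-dimensional white noise,
$n$ is the number of variables.
Because $\phi_0$ is assumed to be zero, the VAR model can be written as
\[ Y_t = \Phi_1 Y_{t-1} + \cdots + \Phi_p Y_{t-p} + \varepsilon_t. \]
For a single lag, this can be written in matrix form as
\[ \begin{bmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{nt} \end{bmatrix} = \begin{bmatrix} \phi_{11} & \phi_{12} & \cdots & \phi_{1n} \\ \phi_{21} & \phi_{22} & \cdots & \phi_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{n1} & \phi_{n2} & \cdots & \phi_{nn} \end{bmatrix} \begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \\ \vdots \\ y_{n,t-1} \end{bmatrix} + \begin{bmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \vdots \\ \varepsilon_{nt} \end{bmatrix}. \]
Collecting all observations, the model can be written in the general form
\[ Y = BZ + U, \]
where the columns of $Y$ are the observations $Y_t$, $B = (\Phi_1, \ldots, \Phi_p)$, the columns of $Z$ stack the corresponding lagged values, and $U$ collects the errors $\varepsilon_t$.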
3. Results and Discussion
In general, VAR models can be estimated with Ordinary Least Squares (OLS). VAR analysis can be paired with a simultaneous equation model because it considers several dependent variables together in one model. VAR models can therefore also be estimated by the Two Stage Least Squares (TSLS) method. Following [2], OLS estimation starts from the matrix form of the VAR model,
\[ Y = BZ + U, \]
and minimizes the sum of squared errors
\[ U'U = (Y - BZ)'(Y - BZ), \]
which gives the estimator
\[ \hat{B} = YZ'(ZZ')^{-1}. \]
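As a small illustration of the OLS estimator $\hat{B} = YZ'(ZZ')^{-1}$, the sketch below builds $Y$ and $Z$ for a VAR(1) model and recovers the coefficient matrix. It is written in plain Python under several assumptions made only for this example: a single lag, two variables, noise-free data generated from a hypothetical matrix `Phi`, and helper names (`transpose`, `matmul`, `inverse`, `var1_ols`) that do not come from the paper.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


def var1_ols(series):
    """OLS estimate B_hat = Y Z'(ZZ')^{-1} for a VAR(1) without intercept.

    series is a list of observation vectors y_0, ..., y_T.
    """
    Y = transpose(series[1:])    # columns y_1, ..., y_T
    Z = transpose(series[:-1])   # columns y_0, ..., y_{T-1}
    Zt = transpose(Z)
    return matmul(matmul(Y, Zt), inverse(matmul(Z, Zt)))


# Hypothetical coefficient matrix and a noise-free trajectory generated from it.
Phi = [[0.5, 0.1], [0.2, 0.3]]
series = [[1.0, 2.0]]
for _ in range(6):
    prev = series[-1]
    series.append([sum(Phi[i][j] * prev[j] for j in range(2)) for i in range(2)])

B_hat = var1_ols(series)
```

Because the example data contain no noise, `B_hat` reproduces `Phi` up to rounding error; with real data the estimate would only approximate the underlying coefficients.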
TSLS parameter estimation follows [4]. The TSLS estimate is obtained in two stages, and its estimator is also found by least squares, applied to the equation
\[ A = \gamma \hat{Y} + e, \]
so that
\[ e'e = (A - \gamma \hat{Y})'(A - \gamma \hat{Y}), \]
which gives the estimator
\[ \hat{\gamma} = A\hat{Y}'(\hat{Y}\hat{Y}')^{-1}. \]
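The two least-squares steps quoted above can be sketched as one small routine: stage 1 regresses $Y$ on $Z$ and forms the fitted values $\hat{Y} = \hat{B}Z$, and stage 2 regresses $A$ on $\hat{Y}$. This is only an illustration in plain Python. The function names, the matrices `Y`, `Z`, `A`, and the coefficient matrix `gamma_true` are all hypothetical, and `A` is constructed to be an exact linear function of the fitted values so that the second stage has a known answer.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


def least_squares(Y, Z):
    """Coefficient matrix Y Z'(ZZ')^{-1} of the regression of Y on Z."""
    Zt = transpose(Z)
    return matmul(matmul(Y, Zt), inverse(matmul(Z, Zt)))


def two_stage(A, Y, Z):
    """Stage 1: B_hat and Y_fit = B_hat Z.  Stage 2: regress A on Y_fit."""
    B_hat = least_squares(Y, Z)
    Y_fit = matmul(B_hat, Z)
    gamma_hat = least_squares(A, Y_fit)
    return gamma_hat, Y_fit


# Hypothetical data: 2 variables observed over 4 periods.
Y = [[1.0, 2.0, 3.0, 5.0], [0.0, 1.0, 4.0, 2.0]]
Z = [[1.0, 0.0, 2.0, 1.0], [1.0, 1.0, 0.0, 3.0]]
gamma_true = [[2.0, -1.0], [0.5, 3.0]]

# Build A as an exact linear function of the stage-1 fitted values.
Y_fit_stage1 = matmul(least_squares(Y, Z), Z)
A = matmul(gamma_true, Y_fit_stage1)

gamma_hat, Y_fit = two_stage(A, Y, Z)
```

In this constructed case `gamma_hat` equals `gamma_true` up to rounding, which checks the formula $\hat{\gamma} = A\hat{Y}'(\hat{Y}\hat{Y}')^{-1}$ without saying anything about its statistical properties.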
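What remains is to derive the VAR parameter estimates through the TSLS method. The VAR model in general form is
\[ Y_t = \Phi_1 Y_{t-1} + \cdots + \Phi_p Y_{t-p} + \varepsilon_t, \]
which in matrix form is
\[ Y = BZ + U \]
or, in vectorized form with $\mathbf{y} = \mathrm{vec}(Y)$, $\boldsymbol{\beta} = \mathrm{vec}(B)$, and $\boldsymbol{\varepsilon} = \mathrm{vec}(U)$,
\[ \mathbf{y} = (Z' \otimes I_n)\boldsymbol{\beta} + \boldsymbol{\varepsilon}. \]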
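3.1 VAR model parameter estimation, stage 1: OLS
The OLS estimator is obtained by minimizing the sum of squared errors. With
\[ \boldsymbol{\varepsilon} = \mathbf{y} - (Z' \otimes I_n)\boldsymbol{\beta}, \]
we have
\[ \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = \left(\mathbf{y} - (Z' \otimes I_n)\boldsymbol{\beta}\right)'\left(\mathbf{y} - (Z' \otimes I_n)\boldsymbol{\beta}\right), \]
so
\[ S(\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} + \boldsymbol{\beta}'(ZZ' \otimes I)\boldsymbol{\beta} - 2\boldsymbol{\beta}'(Z \otimes I)\mathbf{y}. \]
The OLS estimate is obtained by setting the partial derivative of the sum of squared errors to zero:
\[ \left.\frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}\right|_{\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}} = 2(ZZ' \otimes I)\hat{\boldsymbol{\beta}} - 2(Z \otimes I)\mathbf{y} = 0, \]
that is, $(ZZ' \otimes I)\hat{\boldsymbol{\beta}} = (Z \otimes I)\mathbf{y}$, or
\[ \hat{\boldsymbol{\beta}} = \left[(ZZ')^{-1} \otimes I^{-1}\right](Z \otimes I)\mathbf{y} = \left((ZZ')^{-1}Z \otimes I\right)\mathbf{y}. \]
Substituting the model for $\mathbf{y}$ gives
\[ \hat{\boldsymbol{\beta}} = \left((ZZ')^{-1}Z \otimes I\right)\left[(Z' \otimes I)\boldsymbol{\beta} + \boldsymbol{\varepsilon}\right], \]
or, in terms of the original matrices,
\[ \mathrm{vec}(\hat{B}) = \hat{\boldsymbol{\beta}} = \left((ZZ')^{-1}Z \otimes I\right)\mathrm{vec}(Y), \]
so
\[ \hat{B} = YZ'(ZZ')^{-1}. \]
Next, the fitted values $\hat{Y}$, which supply the data for the TSLS stage, are determined:
\[ \hat{Y} = \hat{B}Z = YZ'(ZZ')^{-1}Z = P_{xy}Z, \]
where $P_{xy} = YZ'(ZZ')^{-1}$.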
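The step from the vectorized estimator to the matrix estimator relies on the identity $\mathrm{vec}(ABC) = (C' \otimes A)\,\mathrm{vec}(B)$. The sketch below checks numerically, in plain Python, that $\left((ZZ')^{-1}Z \otimes I\right)\mathrm{vec}(Y)$ equals $\mathrm{vec}\!\left(YZ'(ZZ')^{-1}\right)$ and then forms the fitted values $\hat{Y} = \hat{B}Z$. The matrices `Y` and `Z` and all helper names are hypothetical and chosen only for the check.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (A assumed nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [row[n:] for row in M]


def kron(A, B):
    """Kronecker product A (x) B."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def vec(A):
    """Stack the columns of A into one long list."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]


def matvec(M, v):
    """Product of a matrix with a vector stored as a flat list."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Hypothetical 2-variable example with 3 time points.
Y = [[1.0, 2.0, 3.0], [0.0, 1.0, 4.0]]
Z = [[1.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

ZZt_inv = inverse(matmul(Z, transpose(Z)))

# Matrix form: B_hat = Y Z'(ZZ')^{-1}
B_hat = matmul(matmul(Y, transpose(Z)), ZZt_inv)

# Vectorized form: beta_hat = ((ZZ')^{-1} Z (x) I) vec(Y)
beta_hat = matvec(kron(matmul(ZZt_inv, Z), I2), vec(Y))

# Fitted values used as input to the second stage: Y_fit = B_hat Z
Y_fit = matmul(B_hat, Z)
```

Comparing `beta_hat` with `vec(B_hat)` entry by entry confirms that the two forms of the estimator agree for this example.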
3.2 VAR model parameter estimation stage 2, called by TSLS.
Suppose there is a vector ๐ด size (๐ ร ๐ก) to be regressed to the matrix ๐, then formed
a model
๐ด = ๐พ๏ฟฝฬ๏ฟฝ + ๐
or ๐ = (๏ฟฝฬ๏ฟฝโฒโจ๐ผ)๐ธ + ๐บ
Where ๐พ is the coefficient for the ๐ parameter.
Furthermore, the partial derivative is used to minimize the sum of squared error
Where ๐บ = ๐ โ (๐โฒโจ๐ผ๐)๐ธ,
- 88 -
Proceeding of International Conference on Mathematics, Statistics, Computer
Sciences, and Mathematics Education (ICMSCSME) 2015
ISBN 978-602-72198-2-3
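then $\varepsilon'\varepsilon = (m - (\hat X' \otimes I_k)\boldsymbol{\gamma})'(m - (\hat X' \otimes I_k)\boldsymbol{\gamma})$,
so that $S(\boldsymbol{\gamma}) = \varepsilon'\varepsilon = m'm - 2\boldsymbol{\gamma}'(\hat X \otimes I)m + \boldsymbol{\gamma}'(\hat X\hat X' \otimes I)\boldsymbol{\gamma}$.
The TSLS estimator is obtained by setting the partial derivative of the sum of squared errors to zero:
$\left.\frac{\partial S(\boldsymbol{\gamma})}{\partial \boldsymbol{\gamma}}\right|_{\boldsymbol{\gamma}=\hat{\boldsymbol{\gamma}}} = \left.\frac{\partial\,[\,m'm - 2\boldsymbol{\gamma}'(\hat X \otimes I)m + \boldsymbol{\gamma}'(\hat X\hat X' \otimes I)\boldsymbol{\gamma}\,]}{\partial \boldsymbol{\gamma}}\right|_{\boldsymbol{\gamma}=\hat{\boldsymbol{\gamma}}} = 0$.
The partial derivative is
$\frac{\partial S(\boldsymbol{\gamma})}{\partial \boldsymbol{\gamma}} = -2(\hat X \otimes I)m + 2(\hat X\hat X' \otimes I)\boldsymbol{\gamma}$,
or
$\hat{\boldsymbol{\gamma}} = ((\hat X\hat X')^{-1} \otimes I^{-1})(\hat X \otimes I)m$,
and then
$\hat{\boldsymbol{\gamma}} = ((\hat X\hat X')^{-1}\hat X \otimes I)m$,
so that $\hat{\boldsymbol{\gamma}} = \mathrm{vec}\big((M\hat X')(\hat X\hat X')^{-1}\big)$, that is,
$\hat\gamma = (M\hat X')(\hat X\hat X')^{-1}$.
Because $\hat X = P_{xy}X$, this can be written
$\hat\gamma = M(P_{xy}X)'\big((P_{xy}X)(P_{xy}X)'\big)^{-1}$.
Since $P_{xy}' = P_{xy}$ because $P_{xy}$ is a symmetric matrix, this can be written
$\hat\gamma = M(X'P_{xy})\big(P_{xy}X(X'P_{xy}')\big)^{-1}$
$\hat\gamma = MX'(XX')^{-1}P_{xy}^{-1}$ (4.14)
where $P_{xy} = YX'(XX')^{-1}$, so the TSLS estimator is
$\hat\gamma = MX'(XX')^{-1}\big(YX'(XX')^{-1}\big)^{-1}$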
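As a rough numerical illustration of the two-stage procedure, the following Python sketch (using NumPy) regresses a first-stage response on the regressors, keeps the fitted values, and then regresses the second-stage response on those fitted values. It also compares the result with the closed form in (4.14). The simulated matrices, dimensions, noise levels, and the function names `ols_coefficients` and `two_stage_least_squares` are assumptions made for illustration and are not taken from the paper; `X` plays the role of the first-stage regressors, `Y` the first-stage response, and `M` the second-stage response.

```python
import numpy as np

def ols_coefficients(response, regressors):
    """Least squares matrix C in: response = C @ regressors + error.

    response is (k x t) and regressors is (r x t); the result is (k x r),
    i.e. response @ regressors.T @ inv(regressors @ regressors.T).
    """
    gram = regressors @ regressors.T
    # Solve C @ gram = response @ regressors.T without an explicit inverse.
    return np.linalg.solve(gram.T, (response @ regressors.T).T).T

def two_stage_least_squares(M, Y, X):
    """Stage 1: regress Y on X. Stage 2: regress M on the stage 1 fitted values."""
    P_xy = ols_coefficients(Y, X)            # stage 1 coefficient matrix
    X_hat = P_xy @ X                         # fitted values used in stage 2
    gamma_hat = ols_coefficients(M, X_hat)   # stage 2 coefficient matrix
    return gamma_hat, P_xy

# Simulated data (illustrative assumption, not the paper's data).
rng = np.random.default_rng(0)
k, t = 2, 200
X = rng.normal(size=(k, t))
B = np.array([[0.5, 0.1], [0.2, 0.4]])
Y = B @ X + 0.1 * rng.normal(size=(k, t))
G = np.array([[1.0, -0.3], [0.2, 0.8]])
M = G @ Y + 0.1 * rng.normal(size=(k, t))

gamma_hat, P_xy = two_stage_least_squares(M, Y, X)

# Closed form of (4.14); it needs P_xy to be square and invertible.
closed_form = M @ X.T @ np.linalg.inv(X @ X.T) @ np.linalg.inv(P_xy)
print(np.allclose(gamma_hat, closed_form))
```

The closed form requires an invertible first-stage coefficient matrix, which holds here because the first-stage response and the regressors have the same number of rows.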
4. Conclusion
Based on the results of this theoretical study, it can be concluded that the TSLS estimator for the VAR model is
$\hat\gamma = MX'(XX')^{-1}\big(YX'(XX')^{-1}\big)^{-1}$
REFERENCES
[1] Laub, Alan J. 2005. Matrix Analysis for Scientists and Engineers. University of California. California.
[2] Lutkepohl, Helmut. 1991. New Introduction To Multiple Time Series Analysis. European University. New York.
[3] Susilawati, Sumarni. 2014. Estimasi Parameter Model Vector Autoregressive Generalized Space Time Autoregressive Menggunakan Metode Two Stage Least Squares. Tesis, Universitas Hasanuddin, Makassar.
[4] Wang, S. & Hsiao, C. 2006. Modified Two Stage Least Squares Estimator for The Estimation Structural Vector Autoregressive Integrated Process. Journal of Econometrics. 135: 427-463.
[5] Wei, William W.S. 1994. Time Series Analysis. Addison-Wesley Publishing Company: California.
Multiple Imputation with PMM Method to Estimate Missing Data on Nonresponse Items
Herianti1, Anisa2, Ladpoje3
1Department of Mathematics, Faculty of Mathematics and Natural Sciences, Hasanuddin University
Jl. Perintis Kemerdekaan Km.10 Tamalanrea, Makassar, Indonesia
email: [email protected]
2,3Department of Mathematics, Faculty of Mathematics and Natural Sciences, Hasanuddin University
Jl. Perintis Kemerdekaan Km.10 Tamalanrea, Makassar, Indonesia
ABSTRACT
Missing data are incomplete information commonly found in surveys, censuses, and experiments. Several statistical approaches have been developed to deal with missing data, such as Multiple Imputation (MI), Maximum Likelihood, and weighting methods. One MI approach is the Predictive Mean Matching (PMM) method. PMM is a technique that fills in missing data with a set of values imputed from the observed values nearest to the model predictions. This paper reviews the application of the PMM method to data that are not normally distributed and to data that are normally distributed. Both data sets are complete. Values are deliberately omitted so as to form a monotone missing data pattern and to fulfil the MAR (Missing At Random) missing data mechanism. The simulated missing data are analyzed with the PMM method using up to ten imputations. The imputation results are used to find the Relative Efficiency (RE). The RE values for the non-normally distributed data converge to 1 faster than those for the normally distributed data. This research therefore finds that the PMM method works well on data that are not normally distributed.
Key Words: Multiple Imputation, PMM, Monotone Missing Data Pattern, MAR, Relative Efficiency.
1. Introduction
Many researchers have developed imputation methods, which are statistical procedures for dealing with missing data. Imputation methods are divided into single imputation and MI. Filling in a single value for each missing datum is called single imputation. MI is a technique that replaces each missing value with two or more plausible values that represent the missing value [1]. PMM is one of the multiple imputation methods. Its advantage is that it ensures that the imputed values remain reasonable when the assumption of normality is violated [2].
The MI procedure with the PMM method predicts the missing values from the other variables; the missing data are then filled in m times to generate m complete data
sets. The m complete data sets are analyzed using a standard procedure (e.g. regression). The results from the m complete data sets are then combined to find the RE for inference.
Basuki [3], using the IBS 2007 East Java survey data with a univariate missing data pattern, indicated that the PMM method works better on data that are not normally distributed. This paper therefore investigates the application of the PMM method to non-normally distributed and normally distributed data with a monotone missing data pattern.
2.1 Patterns and Mechanisms of Missing Data
The missing data pattern describes the configuration of the missing values observed in a data set. Little and Rubin [4] describe the following missing data patterns. The first is the general pattern, in which the missing values are arranged arbitrarily. The second, the univariate pattern, has a single incomplete variable. The third, the multivariate pattern, occurs when a subset of the sample does not complete the question sheet because of loss of contact, refusal, or other reasons. The fourth, the monotone pattern, occurs when the number of complete observations on the first question item is larger than on the second, the number on the second is larger than on the third, and so on. For example, a respondent moves house before the end of the study and the researchers are unable to reach the new location. The fifth, the file matching pattern, is a planned missing data pattern. It is useful for collecting a large number of question sheet items at once while reducing the burden on respondents. The sixth is the factor analysis pattern, where the first item is a variable $X$ of size n × k that is completely missing and the second item is a variable $Y$ of size n × p that is completely observed, with k < p. It can be seen as a multivariate regression analysis in which none of the predictor variables is observed.
Missing data mechanisms describe possible relationships between the measured variables and the probability of missing data [5]. Little and Rubin [4] distinguish three types of missing data mechanism. Missing Completely at Random (MCAR) occurs when the probability of missing data on a variable is unrelated to the other observed variables and also unrelated to the value of the variable itself. Missing at Random (MAR) occurs when the probability of missing data on a variable is related to other observed variables but not to the missing value itself. As an illustration, consider a study that measures the weight and height of students in Makassar city: female respondents tend to refuse to answer the question about their weight. Not Missing at Random (NMAR) occurs when the probability of missing data on a variable depends on the value of the variable itself.
2.2 MI with the Predictive Mean Matching Method
An imputation method fills in missing values to deal with nonresponse items. Imputation methods are divided into single imputation and multiple imputation. A method that imputes a single value for each nonresponse item is called single imputation. At the data analysis stage, a value obtained from single imputation is treated as if it were real data. The disadvantage of single imputation is that the value used to replace the missing datum does not reflect the variability of the sample values under the nonresponse model. This disadvantage can be resolved by using multiple imputation [6].
MI is a technique that replaces each missing value with two or more acceptable values that represent a probability distribution. Each missing datum receives m values, which form m completed data sets [7]. The composite method was first introduced by Rubin in 1987 and then developed by Little in 1988 to handle multivariate nonresponse. Little's composite method is called Predictive Mean Matching. Predictive Mean Matching is basically the same as the regression method; the difference is that each missing value is imputed from the observed value nearest to the model prediction [6].
An analysis of missing data using MI with PMM needs to take into account the missing data pattern, the missing data mechanism, the variable types, and the distribution of the data. The PMM method assumes that the missing data mechanism is MAR, and it works on a monotone missing data pattern.
2.3 Relative Efficiency of Multiple Imputation
The relative efficiency of the imputation results indicates how good the estimates of the population parameters are. It depends on the amount of missing data and on the number m of imputations. According to Bruin [9], when the amount of missing data is very low, efficiency can be achieved with only a few imputations. If the amount of missing data is larger, more imputations are usually required to achieve sufficient efficiency. Some of the literature uses 3 to 5 imputations; Schafer [10], however, recommends 3 to 10 imputations when a large amount of information is missing. A method is said to be efficient if its RE value is equal to one [6].
The parameter $b' = (\hat\beta_0, \hat\beta_1, \hat\beta_2)^T$ is the vector of regression coefficients estimated from the completed (imputed) data. The point estimate of each component of $b'$, say $b$, is the average $\bar b$ over the m imputations, calculated by the formula:
$\bar b = \frac{1}{m}\sum_{i=1}^{m} b_i$ (1)
where $b_i$ is the point estimate of a component of $b'$ from the i-th imputation, $\bar b$ is the average of the $b_i$, and m is the number of imputations.
Let $W_i$ be the variance obtained from the variance-covariance matrix of the regression parameters for the i-th imputation, that is, (Mean Square Error) × $(X^TX)^{-1}$. Variance estimation in multiple imputation is partitioned into the within-imputation variance and the between-imputation variance. $\bar W$ is the average within-imputation variance over the m imputations. According to Yuan [8], the within-imputation variance is
$\bar W = \frac{1}{m}\sum_{i=1}^{m} W_i$ (2)
where $W_i$ is the within-imputation variance of the i-th imputation and $\bar W$ is its average.
The between-imputation variance is given by the following formula:
$B = \frac{1}{m-1}\sum_{i=1}^{m}(b_i - \bar b)^2$ (3)
where B is the between-imputation variance.
According to Yuan [8], the total variance combines the two variances by the following formula:
$T = \bar W + \left(1 + \frac{1}{m}\right)B$ (4)
where T is the total variance.
Because the statistic $(b - \bar b)T^{-1/2}$ is approximately t-distributed, the degrees of freedom [8] can be written as follows:
$df = (m - 1)\left[1 + \frac{m\bar W}{(m+1)B}\right]^2$ (5)
where df is the degrees of freedom.
The statistic r is defined as the relative increase in variance due to nonresponse [8]. Its formula is as follows:
$r = \frac{(1 + m^{-1})B}{\bar W}$ (6)
where r is the relative increase in variance.
A large value of m gives a small value of r, and the degrees of freedom df can become very large, so that the distribution approaches the normal distribution [6]. Another very useful statistic related to nonresponse is the fraction. The fraction is a value which affects the
speed of convergence to a value: the larger the fraction, the more slowly the estimates converge. The fraction can be calculated using the formula:
$\gamma = \frac{r + 2/(df + 3)}{r + 1}$ (7)
where $\gamma$ is the fraction.
Relative efficiency is the efficiency obtained by using m imputations. Its value is obtained from m and $\gamma$ by the following formula [8]:
$RE = \left(1 + \frac{\gamma}{m}\right)^{-1}$ (8)
where RE is the relative efficiency.
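The combining rules in equations (1) to (8) can be traced with a small calculation. The following Python sketch is only an illustration: the function name `pool_estimates` and the five estimates and variances are made-up values, not results from the paper.

```python
from statistics import mean, variance

def pool_estimates(estimates, within_variances):
    """Combine m point estimates b_i and their variances W_i, equations (1) to (8).

    Requires m >= 2 and estimates that are not all identical (B > 0).
    """
    m = len(estimates)
    b_bar = mean(estimates)                     # (1) pooled point estimate
    w_bar = mean(within_variances)              # (2) average within-imputation variance
    b_var = variance(estimates)                 # (3) between-imputation variance, divisor m - 1
    total = w_bar + (1 + 1 / m) * b_var         # (4) total variance
    df = (m - 1) * (1 + m * w_bar / ((m + 1) * b_var)) ** 2   # (5) degrees of freedom
    r = (1 + 1 / m) * b_var / w_bar             # (6) relative increase in variance
    gamma = (r + 2 / (df + 3)) / (r + 1)        # (7) fraction
    re = 1 / (1 + gamma / m)                    # (8) relative efficiency
    return {"b_bar": b_bar, "W": w_bar, "B": b_var, "T": total,
            "df": df, "r": r, "gamma": gamma, "RE": re}

# Hypothetical estimates of one coefficient from m = 5 imputed data sets.
pooled = pool_estimates([2.0, 2.2, 1.9, 2.1, 2.3],
                        [0.04, 0.05, 0.045, 0.05, 0.04])
print(pooled)
```

For these made-up inputs the total variance is 0.075, the degrees of freedom are 25, and the relative efficiency is about 0.92.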
3. Methodology
This paper uses two groups of data for simulation. The first group is not normally distributed and the second is normally distributed. Each group has three variables. Let $X_1$, $X_2$, and $Y_1$ be the variables of the non-normally distributed data, and $X_3$, $X_4$, and $Y_2$ the variables of the normally distributed data. Both are complete data sets from surveys conducted by Ilham Nurhidayah [11] and Agustina Karoma [12], students of Hasanuddin University.
In each data group, values are omitted at random so as to form a monotone missing data pattern and to fulfil the MAR (Missing At Random) missing data mechanism, as shown in Table 1. Three omission simulations are carried out on each data group: the first omits 2% on $X_2$ and 5% on $Y_1$, the second omits 5% on $X_2$ and 10% on $Y_1$, and the last omits 10% on $X_2$ and 20% on $Y_1$.
Table 1. Simulation of Omission on Data: 2% on $X_2$ and 5% on $Y_1$
Number of Sample: 1, 2, ..., 26, 27, ..., 33, ..., 38
Variables: $X_1$, $X_2$, $Y_1$
Information: each cell of the table is marked as either missing data or complete data.
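One way to produce such an omission design is sketched below in Python. This is an assumption about how the omission could be carried out, not the authors' procedure: rows are chosen uniformly at random (which is MCAR, a special case that also satisfies MAR), and every row that loses its second covariate also loses its response, which gives the monotone pattern. The function name `omit_monotone`, the dictionary keys, and the toy data are hypothetical.

```python
import random

def omit_monotone(rows, frac_x2, frac_y1, seed=0):
    """Blank out 'x2' and 'y1' so that missing x2 implies missing y1.

    rows is a list of dicts with keys 'x1', 'x2', 'y1'; None marks a missing value.
    """
    rng = random.Random(seed)
    n = len(rows)
    n_x2 = round(frac_x2 * n)
    n_y1 = max(round(frac_y1 * n), n_x2)
    order = list(range(n))
    rng.shuffle(order)
    miss_y1 = set(order[:n_y1])
    miss_x2 = set(order[:n_x2])          # a subset of miss_y1, hence monotone
    result = []
    for i, row in enumerate(rows):
        new_row = dict(row)
        if i in miss_x2:
            new_row["x2"] = None
        if i in miss_y1:
            new_row["y1"] = None
        result.append(new_row)
    return result

# Toy complete data with 38 rows, the sample size used in the paper.
complete = [{"x1": i, "x2": 2 * i, "y1": 3 * i} for i in range(38)]
incomplete = omit_monotone(complete, frac_x2=0.02, frac_y1=0.05)
print(sum(r["x2"] is None for r in incomplete),
      sum(r["y1"] is None for r in incomplete))
```

With 38 rows, 2% and 5% round to one and two omitted values, respectively.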
The missing values are estimated by the PMM method. Each missing value is replaced using m = 3 to m = 10 imputations.
The steps of the PMM method are [6]:
a. Model the complete data with the regression equation $Y = X\beta + \varepsilon$.
b. Estimate the parameters of the equation $Y = X\beta + \varepsilon$ by the least squares method. Once the parameter estimates have been found, calculate the estimated error variance by the formula:
$\hat\sigma^2 = \frac{\varepsilon^T\varepsilon}{n - p}$ (9)
where $\hat\sigma^2$ is the estimated error variance, $\varepsilon^T\varepsilon$ is the sum of squares due to error, n is the number of complete observations, and p is the number of parameters.
c. The estimates obtained in step b are used in the imputation stage with the following steps:
1. Calculate $\sigma_{*i}^2 = \hat\sigma^2(n - p)/g_i$, where $i = 1, 2, \ldots, m$ and $g_i$ is a random variable generated from the chi-square distribution with $n - p$ degrees of freedom ($\chi^2_{n-p}$).
2. Calculate the new parameter values by $\beta_{*i} = \hat\beta + \sigma_{*i}V_h^T Z$, where $i = 1, 2, \ldots, m$, $V_h$ is the upper triangular matrix in the Cholesky decomposition $(X^TX)^{-1} = V_h^T V_h$, and $Z$ is a vector of p random variables generated from the standard normal distribution, $N(0,1)$.
3. The predicted value for each respondent with a nonresponse item is $y_{*i} = X\beta_{*i}$.
4. Impute by taking the observed value closest to the value of $y_{*i}$. The procedure is repeated m times.
The results of the m complete data sets are combined to find the RE for inference (Excel 2010 was used for all the simulations).
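The steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' Excel implementation: it matches on predicted means (each missing case receives the observed response of the complete case whose predicted value is nearest), it uses a single nearest donor, and the function name `pmm_impute` and the simulated data are hypothetical.

```python
import numpy as np

def pmm_impute(X_obs, y_obs, X_mis, m=5, seed=0):
    """Return an (m x n_mis) array of values imputed by predictive mean matching."""
    rng = np.random.default_rng(seed)
    n, p = X_obs.shape
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y_obs              # step b: least squares estimate
    resid = y_obs - X_obs @ beta_hat
    sigma2_hat = resid @ resid / (n - p)              # equation (9)
    V_h = np.linalg.cholesky(xtx_inv).T               # upper triangular, xtx_inv = V_h.T @ V_h
    pred_obs = X_obs @ beta_hat                       # predicted means of the complete cases
    imputations = np.empty((m, X_mis.shape[0]))
    for i in range(m):
        g = rng.chisquare(n - p)                      # step c.1: chi-square draw
        sigma_star = np.sqrt(sigma2_hat * (n - p) / g)
        z = rng.standard_normal(p)
        beta_star = beta_hat + sigma_star * V_h.T @ z # step c.2: perturbed coefficients
        y_star = X_mis @ beta_star                    # step c.3: predictions for missing cases
        # step c.4: index of the complete case with the closest predicted mean
        nearest = np.abs(pred_obs[None, :] - y_star[:, None]).argmin(axis=1)
        imputations[i] = y_obs[nearest]
    return imputations

# Simulated example with 38 cases, 4 of which have a missing response.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=38)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=38)
X = np.column_stack([np.ones(38), x])
missing = np.zeros(38, dtype=bool)
missing[:4] = True
imputed = pmm_impute(X[~missing], y[~missing], X[missing], m=5)
print(imputed.shape)
```

Because every imputed value is copied from an observed case, the imputations always stay within the range of the observed data, which is the property that makes PMM attractive when normality is doubtful.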
4. Result and Discussion
The main results of the simulation are presented in Table 2 and Table 3. Table 2 displays the relative efficiency of the PMM imputation results for the data that are not normally distributed. In the first column, with 2% missing on $X_2$ and 5% on $Y_1$, the parameter estimates $\hat\beta_{*i}$ are obtained using the PMM method with 3 to 10 imputations. The other relative efficiency columns are obtained similarly. The relative efficiency values in the columns are similar to each other. The PMM method is said to be efficient if the relative efficiency is equal to 1. All three columns of relative efficiency values converge towards 1 fairly quickly. Although each column corresponds to a different amount of missing data, the RE value does not change significantly. This means that, for data that are not normally distributed, 3 to 10 imputations are sufficient to estimate the missing data with unbiased estimates.
Table 2. Relative Efficiency of Imputation Results of PMM Method on Not Normal Distribution Data
Number of     Parameter           Relative Efficiency
Imputations                       Missing 2% on $X_2$   Missing 5% on $X_2$   Missing 10% on $X_2$
                                  and 5% on $Y_1$       and 10% on $Y_1$      and 20% on $Y_1$
3             $\hat\beta_{*0}$    0.9999                0.9998                0.9960
              $\hat\beta_{*1}$    0.9999                0.9999                0.9999
              $\hat\beta_{*2}$    0.9999                0.9999                0.9999
5             $\hat\beta_{*0}$    0.9999                0.9995                0.9995
              $\hat\beta_{*1}$    0.9999                0.9999                0.9999
              $\hat\beta_{*2}$    0.9999                0.9999                0.9999
10            $\hat\beta_{*0}$    0.9999                0.9999                0.9994
              $\hat\beta_{*1}$    0.9999                0.9999                0.9999
              $\hat\beta_{*2}$    0.9999                0.9999                0.9999
Source: The results of data analysis, 2015
Table 3. Relative Efficiency of Imputation Results of PMM Method on Normal Distribution Data
Number of     Parameter           Relative Efficiency
Imputations                       Missing 2% on $X_2$   Missing 5% on $X_2$   Missing 10% on $X_2$
                                  and 5% on $Y_1$       and 10% on $Y_1$      and 20% on $Y_1$
3             $\hat\beta_{*0}$    0.9998                0.9859                0.9801
              $\hat\beta_{*1}$    0.9939                0.9994                0.9636
              $\hat\beta_{*2}$    0.9969                0.9999                0.9767
5             $\hat\beta_{*0}$    0.9997                0.9981                0.9951
              $\hat\beta_{*1}$    0.9871                0.9999                0.9964
              $\hat\beta_{*2}$    0.9922                0.9999                0.9977
10            $\hat\beta_{*0}$    0.9999                0.9997                0.9988
              $\hat\beta_{*1}$    0.9988                0.9999                0.9996
              $\hat\beta_{*2}$    0.9993                0.9999                0.9997
Source: The results of data analysis, 2015
However, the results in Table 3 differ from those in Table 2. Table 3 shows the relative efficiency of the PMM imputation results for the normally distributed data. Its relative efficiency values converge slowly. Each column has different RE values: the greater the amount of missing data, the more slowly the RE approaches 1. This means that, for normally distributed data, the PMM method needs more than 10 imputations to yield unbiased estimates.
5. Conclusion
This paper discusses the PMM method on data that are not normally distributed and on normally distributed data. The sample size is 38. The sample is complete data from which values are omitted by simulation, so that the resulting nonresponse data follow a monotone pattern and the MAR mechanism. The missing data are estimated with 3 to 10 imputations for each data group. Based on the simulation, the PMM method works better on data that are not normally distributed than on normally distributed data. However, this conclusion is likely to hold only when the sample size and the amount of missing data are small. Future research should consider larger samples with larger amounts of missing data.
REFERENCES
[1] Durrant, B. Gabriele (2005). Imputation Methods for Handling Item-Nonresponse in the Social Science: A Methodological Review. National Centre for Research Methods Working Paper Series.
[2] Horton, N., and Lipsitz, S. (2001). Multiple imputation in practice: Comparison of software package for regression model with missing variables. Journal American Statistical Association 55: 244-255.
[3] Basuki, R. (2009). Imputasi Berganda Menggunakan Metode Regresi dan Metode Predictive Mean Matching untuk Menangani Missing Data. Thesis. Institute of Technology Sepuluh November, Surabaya.
[4] Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis With Missing Data. Cambridge: John Wiley & Sons, Inc. pp. 3-8.
[5] Enders, Craig K. (2010). Applied Missing Data Analysis. New York: A Division of Guilford Publications, Inc.
[6] Mardiah, Hafti (2010). Imputasi Missing Value pada Data yang Mengandung Outlier. Thesis. University of Padjadjaran, Bandung. pp. 18-24.
[7] Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. Canada: John Wiley & Sons, Inc.
[8] Yuan, Yang C. (2000). Multiple Imputation for Missing Data: Concepts and New Development. Rockville, MD: SAS Institute Inc. pp. 5.
[9] Bruin, J. (2006). Statistical Computing Seminars Missing Data in SAS Part 1. UCLA. Statistical Consulting Group. http://www.ats.ucla.edu/stat/sas/seminars/missing_data/mi_new_1.htm
[10] Schafer, Joseph L. (1999). Multiple Imputation: A Primer. Journal Statistical Methods in Medical Research, 8: 3-15.
[11] Ilham, Nurhidayah (2014). Analisis Faktor-Faktor Yang Mempengaruhi Laba Usaha Dagang Pada Pasar Tradisional Di Kabupaten Pangkep. Mini Thesis. Hasanuddin University, Makassar.
[12] Karoma, Agustina R. (2013). Analisis Faktor-Faktor Yang Mempengaruhi Pola Konsumsi Mahasiswa Indekos Di Kota Makassar. Mini Thesis. Hasanuddin University, Makassar.
A Review of the Time Effects in the SST Data
using Modified GamboostLSS Models
Miftahuddin*, Anisa K., Asma G.
Department of Mathematics and Statistics, Faculty of Mathematics and Natural
Sciences,
Syiah Kuala University, Hasanuddin University, Benazir Bhutto University
Jl. T.M. Abdul Rauf No.6 Darussalam, Banda Aceh, Indonesia
*Corresponding Author: [email protected]
ABSTRACT
To predict Sea Surface Temperature (SST) data precisely, it is of utmost importance to investigate the time effects on monthly and yearly bases, together with the effects of several climate features, e.g. humidity, temperature and rainfall, on SST. Various approaches are used to investigate the effects of covariates on the SST data. We propose generalized additive models for location, scale, and shape fitted by boosting that account for AR(1) autocorrelation (called gamboostLSS-AR(1)). The proposed method is applied to SST data, and the results indicate that there are significant relationships between the covariates and the response. GamboostLSS-AR(1) models are used to examine the effects of trends in the data.
Keyword: Annual and seasonal effects, gamboostLSS-AR(1), MPI-AR(1).
1. Introduction
Climate data sets have been used to quantify parameter uncertainties, and various parameter effects, such as global, local, marginal or partial aggregate levels and correlation effects, have also been found (Magnus et al. [1]). Recently
in 2012, Magnus et al. [1] proposed a climate model to investigate the effects of solar
radiation and the greenhouse effect on global warming. Their analysis is based on the
data from land stations only and does not consider the relationship between sea and
land data. Global warming interacts with SST patterns [2]. The increase in global
temperature has a significant impact on the earth's climate. The earth's climate system
is influenced by a large number of parameters. Sea Surface Temperature (SST) is one
of them. It affects the regional climate that influences the global climate variability,
specifically in the tropical Indian Ocean [3,4]. SST data is very useful in getting an
indication of the earth's climate, its variability, and the tropical climate variability [3,
5, 6, 7].
In this paper the SST data set is used to model the relationship between sea and land variables. The variables have different measurement scales. The observed ranges are: SST (27-31 degrees Celsius), air temperature (23-29 degrees Celsius), relative
humidity (70-100 percent) and rainfall (0-400 millimeters). The SST data come from one buoy in the Indian Ocean at position 1.5N90E over the period 2006-2012, with 2263 daily observations. Three climate features have missing values: 4.1 percent of the air temperature, 0.044 percent of the humidity and 4.286 percent of the rainfall covariate.
The SST data obtained from the sea buoy are used in the modelling. Issues in fitting SST data include gaps (missing observations) and autocorrelation in the available data. We proposed modified gamboostLSS models using a penalized spline (P-spline) basis function in [8] to overcome these problems in fitting SST data. The marginal prediction interval (MPI) was investigated in [9] for generalized additive models for location, scale and shape (GAMLSS) without considering autocorrelation in the model fitting. In this paper we investigate the MPI-AR(1), with autocorrelation at lag 1, for gamboostLSS model fitting. A model that accounts for the time autocorrelation effect provides many useful insights. The hyper-parameters such as location, scale, and shape provide more detailed information. The proposed models have a flexible and smooth structure that incorporates many covariate effects. They can also be used to investigate further the effects of the time covariates on the location, scale and shape parameters.
2. GamboostLSS Models Considering Autocorrelation
Autocorrelation, or serial correlation of the errors in the data, can affect the response over time. The SST data are collected from different locations and therefore vary. The SST data have two types of autocorrelation: spatial and time autocorrelation. Time autocorrelation can arise from periodic time units, such as a daily, monthly, seasonal, or annual basis. Generally, autocorrelation and heteroscedasticity problems arise when the following assumptions are violated: (a) $E[\varepsilon\varepsilon'|X] = \sigma^2 I_n$ and (b) $E[\varepsilon_t\varepsilon_{t-1}] = 0$ [10, 11, 12].
We suggest an autoregressive AR(1) model, in which a generalized differencing approach is used to handle autocorrelation in the data by incorporating an autoregressive process.
Consider an additive model (AM) $y = f + \varepsilon$, where
$Y_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \varepsilon_i$, $i = 1, \ldots, n$ (1)
Then the errors $\varepsilon_t$ and $\varepsilon_{t-1}$ are $\varepsilon_t = y_t - f_t$ and $\varepsilon_{t-1} = y_{t-1} - f_{t-1}$. Referring to equation (1), we use the following AR(1) formulation in our experiments:
$\varepsilon_t = \phi\,\varepsilon_{t-1} + u_t$, $t = 1, 2, \ldots, n$ (2)
If we assume that the $u_t$ are uncorrelated random errors with zero mean and constant variance, then
$E[u_t] = 0$, $Var[u_t] = \sigma_u^2$, and $Cov[u_t, u_s] = 0$ for $t \neq s$, (3)
and we assume that $\varepsilon \sim N(0, \sigma^2\Lambda)$, where $\Lambda$ is a correlation matrix defined through an AR(1) process with parameter $\phi$. The generalized additive model (GAM) structure is given as
$g(\mu) = g(E[Y|X_1, X_2, \ldots, X_p])$ (4)
where $g(\cdot)$ is known as the link function. From equation (1) we have:
$f^*(X) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)$
We assume that the response Y is univariate and continuous and that the loss function $\rho$ is differentiable with respect to $f^*(X)$ [13, 14, 15]. The function $f^*(X)$ is estimated by minimizing the expected loss $\rho(\cdot)$, such that
$f^*(\cdot) = \arg\min_{f(\cdot)} E_{Y,X}[\rho(Y_i, f(X_i))]$ (5)
based on training data $(y_i, x_i)$, $i = 1, \ldots, n$. Also suppose that $f^*(X, \theta)$ is an approximating function with a set of parameters $\theta$. Because the expectation in (5) is unknown, it is minimized approximately by a gradient boosting algorithm. The function $f(\cdot)$ can then be estimated by minimizing the empirical risk (ER) as follows:
$ER = \frac{1}{n}\sum_{i=1}^{n}\rho(y_i, f(x_i))$ (6)
which is implemented by a gradient boosting algorithm, e.g. functional gradient descent [16].
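The idea of minimizing the empirical risk by functional gradient descent can be illustrated with a small, self-contained Python sketch. It is not the gamboostLSS implementation: it assumes a squared-error loss, simple linear base-learners fitted one covariate at a time, and a fixed step length `nu`; the function name `boost_linear` and the toy data are hypothetical.

```python
def boost_linear(X, y, n_iter=500, nu=0.1):
    """Component-wise gradient boosting with squared-error loss.

    X is a list of covariate columns (each a list of n values, not constant),
    y is a list of n responses. Returns the intercept, one coefficient per
    covariate, and the empirical risk after every iteration.
    """
    n = len(y)
    means = [sum(col) / n for col in X]
    centered = [[v - mu for v in col] for col, mu in zip(X, means)]
    offset = sum(y) / n
    coefs = [0.0] * len(X)
    fit = [offset] * n
    risks = []
    for _ in range(n_iter):
        # For the loss 0.5 * (y - f)^2 the negative gradient is the residual.
        grad = [yi - fi for yi, fi in zip(y, fit)]
        best = None
        for j, col in enumerate(centered):
            slope = sum(c * g for c, g in zip(col, grad)) / sum(c * c for c in col)
            sse = sum((g - slope * c) ** 2 for c, g in zip(col, grad))
            if best is None or sse < best[0]:
                best = (sse, j, slope)
        _, j, slope = best
        # Update only the best-fitting base-learner, shrunk by the step length.
        coefs[j] += nu * slope
        fit = [fi + nu * slope * c for fi, c in zip(fit, centered[j])]
        risks.append(sum((yi - fi) ** 2 for yi, fi in zip(y, fit)) / (2 * n))
    intercept = offset - sum(b * mu for b, mu in zip(coefs, means))
    return intercept, coefs, risks

# Toy data generated from y = 1 + 2 * x1 - 3 * x2 without noise.
x1 = [float(i) for i in range(10)]
x2 = [float(i % 2 == 0) for i in range(10)]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]
intercept, coefs, risks = boost_linear([x1, x2], y)
print(round(intercept, 3), [round(c, 3) for c in coefs])
```

In practice the number of iterations acts as the main tuning parameter: stopping early leaves the coefficients shrunk towards zero.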
To observe the functional and distributional effects in the construction of a model with time covariates, consider the GAMLSS (Generalized Additive Models for Location, Scale, and Shape) without random effects as follows:
$g_d(\theta_d) = \beta_{0\theta_d} + \sum_{j=1}^{p} f_{j\theta_d}(x_{dj}) = \eta_{\theta_d}$, $d = 1, 2, 3, 4$. (7)
The above model consists of two terms:
$\beta_{0\theta_d}$, $d = 1, 2, 3, 4$, are the intercept terms of the four submodels;
$\sum_{j=1}^{p} f_{j\theta_d}(x_{dj}) = X_d\beta_d$ is a parametric term;
$\theta_d$ and $\eta_{\theta_d} = \eta_d$ are vectors of length n;
$f_{j\theta_d}$ is the type of effect that covariate j has on the distribution parameter $\theta_d$;
$\beta_d^T = (\beta_{1d}, \ldots, \beta_{p_d' d})$ is a parameter vector of length $p_d'$;
$X_d$ is a known design matrix of order $n \times p_d'$.
For instance, $f_{j\theta_d}(x_{dj})$ may be a linear or smooth effect, a categorical effect, or another effect depending on the characteristics of the covariates [8,17]. Each distribution has a fitting function. Through a link function as in equation (4), precision can be achieved in the fitting process [18]. From the GAMLSS equation (7), $g_d(\cdot)$ is a monotonic link function that relates the distribution parameter $\theta_d$ to the predictor $\eta_d$:
$g_1(\mu) = \eta_\mu = \beta_{0\mu} + \sum_{j=1}^{p} f_{j\mu}(x_j) = X_1\beta_1$; $g_2(\sigma) = \eta_\sigma = \beta_{0\sigma} + \sum_{j=1}^{p} f_{j\sigma}(x_j) = X_2\beta_2$ (8a)
$g_3(\nu) = \eta_\nu = \beta_{0\nu} + \sum_{j=1}^{p} f_{j\nu}(x_j) = X_3\beta_3$; $g_4(\tau) = \eta_\tau = \beta_{0\tau} + \sum_{j=1}^{p} f_{j\tau}(x_j) = X_4\beta_4$ (8b)
For the GAMLSS in equation (7), the data are observations $(y_i, x_i^T)$, $i = 1, 2, \ldots, n$, where $y_i$ is the response variable and $x_i = (x_{i1}, \ldots, x_{ip})^T$ is the covariate vector. The conditional density function (cdf) $f_Y(y_i|\theta_i)$ depends on
$\theta_i = (\mu, \sigma, \nu, \tau)$ (9)
a vector of four distribution parameters, where $\mu$, $\sigma$, $\nu$ and $\tau$ are the location, scale, skewness and kurtosis parameters, respectively [8,17,19,20]. In general, each distribution parameter is modelled through its own additive predictor $\eta_{\theta_d}$ and depends on additive covariate effects, such as nonlinear, smooth, and interaction effects [17,18]. The location parameter refers to the center of the distribution and the scale parameter to its variance or dispersion, while the shape parameters refer to skewness and kurtosis. The optimization of the distribution parameters of the cdf in equation (9) for gamboostLSS models is
$(\hat\mu, \hat\sigma, \hat\nu, \hat\tau) = \arg\min_{\eta_\mu, \eta_\sigma, \eta_\nu, \eta_\tau} E_{Y,X}[\rho(Y, \eta_\mu(X), \eta_\sigma(X), \eta_\nu(X), \eta_\tau(X))]$ (10)
with the loss function $\rho = -L$, the negative log-likelihood of the response distribution, based on the training data. As in equation (6), a gradient boosting approach is used to minimize the ER,
$ER = \frac{1}{n}\sum_{i=1}^{n}\rho(y_i, \eta_{\theta_i})$ (11)
The P-spline criterion with autocorrelated errors is:
$PLS(\beta) = (u - B\beta)^T V^{-1}(u - B\beta) + \lambda W(\beta, m)$ (12)
where the correlation matrix is $V = [v_{ij}]$, as suggested in [21].
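A penalized criterion of the form (12) has a closed-form minimizer when the penalty is quadratic. The following NumPy sketch illustrates this under explicit assumptions that are not stated in the paper: the penalty $W(\beta, m)$ is taken to be the squared m-th order differences of the coefficients, the correlation matrix has the AR(1) form $v_{ij} = \phi^{|i-j|}$, and the basis consists of degree-1 B-splines (hat functions) on equally spaced knots. All function names and the simulated data are hypothetical.

```python
import numpy as np

def hat_basis(x, n_basis):
    """Degree-1 B-spline (hat function) basis on equally spaced knots."""
    knots = np.linspace(x.min(), x.max(), n_basis)
    width = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / width, 0.0, None)

def ar1_correlation(n, phi):
    """Correlation matrix with entries phi ** |i - j|."""
    idx = np.arange(n)
    return phi ** np.abs(idx[:, None] - idx[None, :])

def pls_objective(beta, u, B, V_inv, lam, D):
    """Criterion (12) with W(beta, m) assumed to be the sum of squared differences."""
    resid = u - B @ beta
    return resid @ V_inv @ resid + lam * np.sum((D @ beta) ** 2)

def fit_pspline_ar1(u, B, phi, lam, m=2):
    """Minimize the criterion in closed form via the penalized normal equations."""
    V_inv = np.linalg.inv(ar1_correlation(len(u), phi))
    D = np.diff(np.eye(B.shape[1]), n=m, axis=0)   # m-th order difference matrix
    lhs = B.T @ V_inv @ B + lam * D.T @ D
    beta = np.linalg.solve(lhs, B.T @ V_inv @ u)
    return beta, V_inv, D

# Simulated smooth signal with noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
u = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=100)
B = hat_basis(x, 10)
beta, V_inv, D = fit_pspline_ar1(u, B, phi=0.88, lam=1.0)
best = pls_objective(beta, u, B, V_inv, 1.0, D)
shifted = pls_objective(beta + 0.1, u, B, V_inv, 1.0, D)
print(best < shifted)
```

Because the criterion is strictly convex in this setting, any perturbation of the solution, such as the shifted coefficient vector above, gives a larger value.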
3. Methodology
To apply gamboostLSS model fitting to autocorrelated data, we use the AR(1) autocorrelation model. We then use the gamboostLSS-AR(1) model to fit the SST data set. In general, the procedure for gamboostLSS-AR(1) model fitting is as follows:
a). Determine the autocorrelation parameter $\phi$ using the generalized least squares technique.
b). Decide on the parameters of the continuous and time covariates in the base-learner specification.
c). Determine the distributional assumption for the gamboostLSS-AR(1) model.
d). Apply the single autocorrelation coefficient $\phi$ in the gamboostLSS-AR(1) model fitting.
e). Determine a suitable fit of the gamboostLSS-AR(1) model to obtain an appropriate global model fit, which produces submodels. The global model is related to the response data, while a submodel, also called a local fit, is related to the covariates. By tuning the hyper-parameters we can fit the time covariates in the model to obtain an appropriate global model fit.
f). Select the appropriate model fit to obtain the optimal global and local model fits by cross-validation of the final risk (CVrisk).
g). Specify the MPI-AR(1) of the local model fits, mainly for the time effects. We use step-length factors ν = 0.01 to 0.05 and 0.1 and different stopping iterations (mstop) to obtain the MPI-AR(1).
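The role of the autocorrelation coefficient in the first and fourth steps of the procedure can be illustrated with a short Python sketch. It is a simplified stand-in, not the generalized least squares routine used by the authors: the coefficient is estimated by least squares regression of each residual on the previous one, and generalized differencing then removes the AR(1) dependence. The function names and the simulated errors are hypothetical.

```python
import random

def lag1_autocorrelation(residuals):
    """Estimate phi in e_t = phi * e_{t-1} + u_t by least squares through the origin."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals[:-1])
    return num / den

def generalized_difference(series, phi):
    """Return series_t - phi * series_{t-1}; the first observation is lost."""
    return [series[t] - phi * series[t - 1] for t in range(1, len(series))]

# Simulate AR(1) errors with a known coefficient (illustrative value).
rng = random.Random(42)
phi_true = 0.88
errors = [rng.gauss(0, 1)]
for _ in range(1999):
    errors.append(phi_true * errors[-1] + rng.gauss(0, 1))

phi_hat = lag1_autocorrelation(errors)
u_hat = generalized_difference(errors, phi_hat)
print(round(phi_hat, 3), round(lag1_autocorrelation(u_hat), 3))
```

After differencing, the lag-1 autocorrelation of the transformed series should be close to zero, which is the property the transformed model relies on.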
Figure 1. Scatterplot of Sea Surface Temperature data in the period 2006-2015 from buoy at position 1.5N90E,
which is found in: www.pmel.noaa.gov/tao/.
Figure 1 shows that the data have characteristics, such as gaps, irregular peaks,
periodicity and autocorrelation. We summarized the SST data of buoy in the following
table.
Table 1. Univariate description of SST dataset during the period of 2006-2012
Variable Minimum Q1 Median Mean Q3 Maximum
SST 27.90 28.78 29.09 29.13 29.42 30.87
Temperature 23.57 25.90 26.47 26.45 27.00 30.95
Humidity 74.00 86.00 89.00 89.15 92.27 102.23
Rainfall 0.00 0.00 0.90 12.18 11.00 414.00
Table 1 displays the statistical description of the SST climate features from the buoy. For each variable the measures of central tendency (median and mean) are almost the same, except for the rainfall covariate. The ranges (maximum minus minimum) of the variables are as follows: SST (2.97 °C), temperature (7.38 °C), humidity (28.23%), and rainfall (414 mm).
4. Result and Discussion
In this study, we examined the SST data from the Tropical Atmosphere Ocean (TAO) moored ocean buoy positioned at 1.5N90E in the Indian Ocean for the period 2006-2012, with 2066 complete-case daily observations. The three climate parameters from the Meulaboh land station cover the same period.
Preliminary analysis of the SST data (Figure 2a) reveals residual autocorrelation in
the autocorrelation function (ACF). The estimated autocorrelation coefficient for the buoy
is ρ = 0.8835944, which falls in the high-autocorrelation category.
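As an illustration of how such a coefficient is obtained, the sketch below computes the sample autocorrelation at a given lag for a gap-free series. The function name and the toy series are hypothetical; the source does not state how ρ was estimated, so the usual sample ACF definition is assumed here.

```python
def sample_acf(series, lag):
    """Sample autocorrelation at the given lag (assumes no missing values)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom


# Toy daily temperatures (hypothetical values, not the buoy data).
toy_sst = [28.9, 29.0, 29.2, 29.1, 29.3, 29.4, 29.2, 29.0, 28.8, 28.9]
rho_hat = sample_acf(toy_sst, 1)
print(rho_hat)
```

Applied to the complete-case SST series, the lag-1 value would play the role of ρ in the AR(1) error model.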
Figure 2. GamboostLSS-AR(1) model fitting for the SST data from the buoy at position
1.5N90E.
The ACF plot can be used to detect the pattern of autocorrelated errors in the SST
data. It also displays the changes in the pattern of peaks and magnitudes over various
periods. Figure 2(b) displays a smooth gamboostLSS-AR(1) model fit that takes the
time autocorrelation into account. The model produces 12 submodels, as seen in Figure
3.
Figure 3. GamboostLSS-AR(1) model fitting produces 12 submodels.
Figure 3 shows the local model fitting of the SST dataset using the gamboostLSS-AR(1)
model. It consists of panels for the climate features, namely temperature, humidity,
and rainfall, each of which represents a submodel of the gamboostLSS-AR(1) model.
The mu parameters of temperature and rainfall show similar curves, which are linear
in accordance with the linear base-learner. In contrast, the mu and sigma parameters
of the temperature and rainfall covariates run in opposite directions.
The mu parameter of temperature in the smooth base-learner has a distinctive curve,
whereas the mu and sigma parameters of humidity have upward and downward curves. In
this figure, we also include Nrdays and Doy as time covariates to determine the
annual and seasonal effects, respectively, in the fitted model of the SST data. For the
annual effects, the mu parameter shows a decreasing trend before 1000 days and then
increases to a peak at about 1500 days. The annual effects in the mu parameter then
decrease before the gap and are fairly stable after the gap.
The sigma parameter shows a different trend. The annual effects decrease at
about 500 days, increase slightly before the gap, and increase drastically after the
gap. For the seasonal term, the mu parameter shows a peak at about 150 days (or April),
whereas the sigma parameter is shaped like the letter "V".
Among studies of climate features, the analysis of rainfall variability related to SST
variability in [22] states that there is no clear relationship between rainfall and SST,
mainly from October to March, in region B (which includes Sumatra Island), whereas
December to February shows high precipitation and low SST. Although [22] used monthly
observations and different time periods, our study shows that the relationship between
SST and the seasonal effects is downward from April to August, but at a different level
(in magnitude) from July to August, with January as the baseline, as captured in
Figure 3. The graph also shows an increase in the monthly effects from October to
April, but at a different level (in magnitude) from October to February.
The SST data experiment shows how the P-spline smoothing property and gradient
boosting in the gamboostLSS-AR(1) model can help to uncover the underlying variability
structure of the time covariates and other covariates. The result reflects a trade-off
between appropriate global and local fitting and detailed visualization of the location,
scale, and shape parameters. Other approaches address the interpretability of
covariates related to the spatial nature of climate features, such as local
variation through econometric models with a kernel technique [1] and generalized linear
models (GLMs) [23].
In addition, because they inherit the properties of gamboostLSS [8,9], gamboostLSS-AR(1)
models are computationally cheap, are suitable for high-dimensional and large data
sets, perform variable selection, handle complex data structures, and are flexible
enough to cover common issues of the SST data. In this model, gradient boosting is
central to the fitting process: it provides prediction accuracy, handles various risk
functions, carries out model fitting and variable selection simultaneously, and
addresses multicollinearity issues [13,15,16].
We therefore consider 80% and 95% confidence intervals for the MPI of the SST
data. The figures show that the resulting models, which have different values of
ν and mstop, have similar MPI-AR(1) patterns. This is interesting because the
different values of the boosting control parameters do not change the MPI-AR(1) patterns.
However, we do not present plots of the MPI-AR(1) patterns because they are
structurally similar to those obtained from the gamboostLSS-AR(1) model fitting.
(a) MPI-AR(1) of seasonal effects (b) MPI-AR(1) of annual effects
Figure 4. The MPI-AR(1) of the seasonal and annual effects of the gamboostLSS-AR(1)
model fitting for the SST data with ρ = 0.8835944, step-length factor ν = 0.01, and
mstop = 110000.
Furthermore, the results are presented using the step-length factor ν = 0.01, as
depicted in Figure 4. As can be seen from Figure 4, the annual effects curves (see
Figure 4 (a) and (b)) seem wider where data are available. In other words, the curves
over the missing data (gap) are closer to each other. The MPI-AR(1) of the seasonal
effects at the buoy shows a bimodal curve. Removing the autocorrelation has a
significant effect on the MPI-AR(1).
The smoothness of the SST data fit depends on the selection of the hyper-parameters
for the base-learners of the gamboostLSS-AR(1) model. These parameters include
the degrees of freedom df, the number of knots, the stopping iteration mstop, and the
autocorrelation coefficient ρ. This selection also determines the number of submodels
produced by the model.
5. Conclusion
One of the issues in SST data is the presence of autocorrelation. Therefore,
we have proposed the gamboostLSS-AR(1) model to deal with this common issue of the SST
data. We applied the generalized differencing technique to reduce the time
autocorrelation of the SST data in the fitting process.
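For readers unfamiliar with generalized differencing, the sketch below shows the standard transformation for AR(1) errors, y*_t = y_t - ρ y_{t-1}, applied to one series. The function name is hypothetical, and the sketch drops the first observation; whether the transformation was applied to the response only or also to the covariates, and how the first observation was treated, is not shown here and is left as an assumption.

```python
def generalized_difference(series, rho):
    """Return y_t - rho * y_{t-1} for t = 2..n (first observation dropped)."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Toy series (hypothetical) that follows y_t = 0.5 * y_{t-1} exactly,
# so differencing with rho = 0.5 removes all of the serial dependence.
toy = [1.0, 0.5, 0.25, 0.125]
print(generalized_difference(toy, 0.5))
```

With the ρ estimated from the ACF, the transformed series has approximately uncorrelated errors, so a model that assumes independent errors can then be fitted to it.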
Removing autocorrelation with the AR(1) model has a large impact on global and local
model fitting. By tuning the hyper-parameters, we can achieve appropriate
gamboostLSS-AR(1) models that give flexible and interpretable estimates of annual
and seasonal effects in climate features. The proposed model can be used in further
investigation of the effects of the time covariates on the location, scale, and shape
parameters.
We also computed the MPI-AR(1) for the gamboostLSS-AR(1) model fitting. The choice of
hyper-parameters in the model affects the MPI-AR(1). Through the MPI-AR(1) of
gamboostLSS-AR(1), the missing data in the gap can be estimated with confidence
intervals.
REFERENCES
[1] Magnus, J. R., Melenberg, B., and Muris, C. Global Warming and Local Dimming: The
Statistical Evidence. Journal of the American Statistical Association, vol. 106: 452-
464, Taylor and Francis, 2012.
[2] Xie, S. P., Deser, C., Vecchi, G. A., Ma, J., Teng, H., and Wittenberg, A. T. Global
warming pattern formation: Sea Surface Temperature and Rainfall. Journal of American
Meteorological Society, vol. 23: 966-986, 2010.
[3] Schott, F. A., Xie, S. P., and McCreary, J. P. Indian Ocean Circulation and Climate
Variability. Reviews of Geophysics, vol. 47: 1-46, American Geophysical Union, 2009.
[4] Dommenget, D. and Jansen, M. Notes and Correspondence: Prediction of Indian Ocean
SST Indices with a Simple Statistical Model: A Null Hypothesis. Journal of Climate,
vol. 22: 4930-4938, American Meteorological Society, 2009.
[5] North, G.R. and Stevens, M.J. Detecting climate signals in the surface temperature record.
Journal of climate, vol. 11: 563-577, 1998.
[6] Deser, C., Alexander, M. A., Xie, S. P. and Phillips, A. S. Sea Surface Temperature
Variability: Patterns and Mechanisms. The Annual Review of Marine Science, vol. 2:
115-143, 2010.
[7] B. P. Kumar, J. Vialard, M. Lengaigne, V. S. N. Murty, M. J. McPhaden, M. F. Cronin,
and K. G. Reddy. Evaluation of Air-sea heat and momentum fluxes for the tropical oceans
and introduction of TropFlux. CLIVAR, vol 58: 1-9, 2012.
[8] Mayr, A., Fenske, N., Hofner, B., Kneib, T., and Schmid, M. Generalized Additive Models
for Location, Scale and Shape for High Dimensional Data: a Flexible Approach Based
on Boosting. Journal of the Royal Statistical Society: Series C (Applied Statistics), 2012.
[9] Hofner, B., Mayr, A., and Schmid, M. GamboostLSS: An R Package for Model
Building and Variable Selection in the GAMLSS Framework, CRAN, 2014.
[10] Greene, W. H. Econometric Analysis. Prentice Hall, 2011.
[11] Hsiao, C. Analysis of Panel Data. Cambridge University Press, 2003.
[12] Baltagi, B. H. Econometric Analysis of Panel Data. John Wiley and Sons, 2005.
[13] Schmid, M. and Hothorn, T. Boosting additive models using component-wise P-splines.
Technical Report, no. 002: 1-21, 2007.
[14] Hennig, C. and Kutlukaya, M. Some thoughts about the design of loss functions.
REVSTAT-Statistical Journal, vol. 5, no. 1: 19-39, 2007.
[15] Natekin, A. and Knoll, A. Gradient boosting machines, a tutorial. Frontiers in
Neurorobotics, vol. 7, 2013.
[16] Buhlmann, P. and Yu, B. Boosting with the L2 loss: regression and classification. Journal
of the American Statistical Association, vol. 98: 324-339, 2003.
[17] Mayr, A., Fenske, N., Hofner, B., Kneib, T., and Schmid, M. GAMLSS for High-
Dimensional Data-a Flexible Approach Based on Boosting. Ludwig-Maximilians-
Universitat Munchen: 1-29, 2010.
[18] Rigby, B. and Stasinopoulos, M. A flexible regression approach using GAMLSS in R.
University of Athens, 2010.
[19] Rigby, R. A. and Stasinopoulos, D. M. Generalized Additive Models for Location, Scale
and Shape. The Journal of the Royal Statistical Society: Series C (Applied Statistics),
vol. 54: 507-554, 2005.
[20] Stasinopoulos, D. M. and Rigby, R. A. Generalized Additive Models for Location Scale
and Shape GAMLSS in R. Journal of Statistical Software. American Statistical
Association, vol. 23: 1-46, 2007.
[21] Diggle, P. J. and Hutchinson, M. F. On spline smoothing with autocorrelated errors.
Australian and New Zealand Journal of Statistics, vol. 31, no. 1: 166-182, 1989.
[22] Aldrian, E. and Susanto, R. D. Identification of three dominant rainfall regions within
Indonesia and their relationship to sea surface temperature. International Journal of
Climatology, vol. 23: 1435-1452, 2003.
[23] Chandler, R. E. On the use of generalized linear models for interpreting climate
variability. Environmetrics, vol. 16: 699-715, 2005.
Prediction of Rainfall by State Space Model For Missing Data
Muflihah*, Armin Lawi, Erna Tri Herdiani
Department of Mathematics, Faculty of Mathematics and Natural Sciences, Hasanuddin
University,
Jl. Perintis Kemerdekaan Km.10 Tamalanrea, Makassar, Indonesia
*Corresponding Author: [email protected]
ABSTRACT
A state space model is the model used by the Kalman filter equations and consists of
observation and transition equations. This paper aims to apply the state space model to
the estimation of missing data and to determine the accuracy of the state space model's
estimates of the missing data. The model was applied to time series data of
monthly rainfall. In a simulation on the rainfall data, 19 values were treated as
missing and then estimated using the state space model. The estimation results were
analysed using a paired-samples t-test. The state space model estimation gives a
Theil's U statistic of 0.0742, which indicates that the model is valid and
feasible to use.
Keywords: state space model, Kalman filter, missing data, rainfall.
1. Introduction
Missing data are undesirable for researchers because they may cause difficulties in
analysis and decision-making. Cryer (1986) [2] stated that if data are lost in the
middle of a series, the missing values need to be estimated and filled in.
The study in [4] estimated missing data using several models, one of which was the
state space model, applied to purse-snatching data from Chicago, and produced
optimal estimates for the missing data. A common treatment for missing data is to
fill in the missing values with the average of the data series. This method has many
deficiencies: it reduces the variability of the data, which may bias the correlations
in the data, so it is no longer considered feasible.
Other methods often used to handle missing data are listwise deletion and pairwise
deletion. However, if missing values occur in the majority of the data, the
information in the data is wasted, so these methods are not appropriate either.
In 2006, David [4] conducted research on estimating missing data using several models,
one of which was the state space model. The study used data on theft of money in
Chicago, which are not stationary, with data missing at random positions five times in a
sequence, and produced estimates close to the actual values. The state space model is a
new approach in time series analysis. This modeling approach can incorporate multiple time series
models such as ARIMA Box-Jenkins and structural time series. This model is
general and flexible: general in the sense that it can be applied to all time series
modeling with the Box-Jenkins method, and flexible because it can be applied to
univariate and multivariate time series. In the ARIMA model, this approach can
facilitate the handling of missing data because, with only one prediction, the
estimate can be updated easily.
In addition, Mirawati et al. (2013) used the Kalman filter to make predictions on
rainfall data and obtained values that describe the actual rainfall patterns fairly
well. Daniel and Leo (2013) [3] conducted research on the estimation of missing data
using state space models applied to nonlinear and AR(1) data, with satisfactory
results. The Kalman filter is a recursive algorithm that gives optimal estimates
based on a series of information and knowledge about the parameters of the state space
model (Mumtaz, 2009). Its main purpose is to estimate the state vector $\theta_t$. The
process involves two steps, called the predict stage and the update stage. The Kalman
filter provides two prediction equations.
These studies motivated the idea of applying the state space model, through the
Kalman filter, to handle missing data in time series data of rainfall.
2. Methodology
The data in this study are secondary data obtained from the Meteorology, Climatology,
and Geophysics Agency (BMKG) Region IV Makassar, in the form of rainfall data for the
Tanete Rilau district of Barru for the period 1980 to 2014.
The analysis method used in this study is the estimation of missing data using the
state space model, applied to the Tanete Rilau, Barru rainfall data.
The state space model is one of the modeling approaches in time series analysis. It
consists of two equations, the observation equation and the transition equation. The observation
equation is
$Y_t = H^{T}\theta_t + \varepsilon_t, \quad \varepsilon_t \sim \text{iid } N(0, R_t)$ (1)
and the transition equation is
$\theta_t = G\theta_{t-1} + K\eta_t, \quad \eta_t \sim \text{iid } N(0, Q_t)$ (2)
The advantage of the state space model is that different types of time series models
can be put into the state space formulation.
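To make the observation and transition equations concrete, the sketch below runs the Kalman filter on the simplest special case, a scalar local-level model in which all system matrices equal 1, and fills missing observations with the predicted state. It is an illustration only: the function name, the fixed variances obs_var and state_var, and the toy series are assumptions, and it is not the na.StructTS procedure, which estimates the variances from the data and also applies a smoother.

```python
def kalman_fill(y, obs_var=1.0, state_var=1.0, a0=0.0, p0=1e7):
    """Fill None entries of y using a scalar local-level Kalman filter.

    Observation: y_t = theta_t + eps_t,        eps_t ~ N(0, obs_var)
    Transition:  theta_t = theta_{t-1} + eta_t, eta_t ~ N(0, state_var)
    a0, p0 give a nearly diffuse prior for the initial state.
    """
    a, p = a0, p0
    filled = []
    for obs in y:
        # Predict stage: state mean is carried forward, uncertainty grows.
        p = p + state_var
        if obs is None:
            # Missing value: skip the update and report the prediction.
            filled.append(a)
        else:
            # Update stage: correct the prediction with the observation.
            gain = p / (p + obs_var)
            a = a + gain * (obs - a)
            p = p * (1.0 - gain)
            filled.append(obs)
    return filled


# Toy series with two simulated gaps (hypothetical values).
toy_rain = [120.0, 135.0, None, 128.0, 110.0, None, 95.0]
print(kalman_fill(toy_rain))
```

Because the Gaussian model puts no lower bound on the state, nothing in this recursion prevents a filled value from being negative when the neighbouring observations are near zero.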
The estimation is carried out numerically in RStudio using the function na.StructTS
from the zoo package; Microsoft Excel and SPSS for Windows version 20 are also used for
graphing and calculating error rates. The work stages of this study are as follows.
The first phase is the identification of the data. In this phase, we
prepare the data in the form of a time series in Microsoft Excel, then convert the
data from the Microsoft Excel format to csv and import the csv file into RStudio.
The second phase is the estimation of the missing data (the Kalman filter process). This
phase begins by loading the zoo package in RStudio and then calling na.StructTS
on the formatted data. Once the results appear, the original data and the estimated
data are plotted, and the estimation is repeated on simulated missing data, that is,
with some observations treated as missing (emptying some observed values).
The third phase is model verification. Its first step is to test the estimates of the
simulated missing data against the actual observed data, and then to determine the
accuracy of the state space model's estimates of the missing data. Together, the steps
above produce the estimated values of the missing data.
3. Result and Discussion
The rainfall data used are monthly rainfall data for Tanete Rilau, Barru. Based on the
normal averages, the rainfall in Tanete Rilau is of the monsoon type. The estimation of
missing data with the state space model used 180 data points, of which 19, chosen at
random, were treated as missing. The estimates of the simulated missing data obtained
with the state space model are shown in Table 1.
Table 1. Comparison of the actual values and the state space model estimates
NO   TIME       ACTUAL   ESTIMATED
1.   Feb 1982   574      300
2.   Oct 1982   9        69
3.   Jan 1985   458      497
4.   Apr 1990   201      222
5.   Feb 1991   257      350
6.   Mar 1993   254      309
7.   Jul 1993   15       51
8.   Sep 1994   0        -27
9.   Jan 1995   485      461
10.  May 1995   99       99
11.  Sep 1995   16       48
12.  Nov 1999   386      329
13.  Aug 2003   7        1
14.  Jun 2006   130      78
15.  Aug 2008   0        -26
16.  Mar 2008   433      385
17.  Dec 2008   677      690
18.  Apr 2014   209      210
19.  Sep 2014   0        -9
Source: BMKG Wilayah IV Makassar (Actual)
Table 1 shows that some of the state space model estimates are negative, whereas
rainfall values must be non-negative, with a lowest value of 0. The actual value for
every negative estimate is 0. This indicates that the state space model has difficulty
predicting the value 0, so the negative estimates are set to 0: the
lowest estimates from the state space model are negative, while the
lowest rainfall value is 0.
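The adjustment described here amounts to truncating the estimates at zero. A minimal sketch using the ESTIMATED column of Table 1 (the variable names are illustrative):

```python
# ESTIMATED column of Table 1, in table order.
estimated = [300, 69, 497, 222, 350, 309, 51, -27, 461, 99,
             48, 329, 1, 78, -26, 385, 690, 210, -9]

# Rainfall cannot be negative, so negative estimates are set to 0.
clipped = [max(0, value) for value in estimated]
n_adjusted = sum(1 for value in estimated if value < 0)
print(clipped)
print(n_adjusted)
```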
The comparison between the actual data and the state space model estimates is
presented in the following charts.
a. Column chart b. Line chart
Figure 1. Comparison of the actual rainfall data with the state space model estimates
Next, the validity of the state space model was tested to determine the
appropriateness of the model, with the results shown in Table 2.
Table 2. Theil's U statistics for the state space model estimation
U       UM      US      UC      U (UM+US+UC)
0.074   0.003   0.006   0.997   0.999
Based on Table 2, the Theil's U value of 0.074 is close to 0, the value of UM,
0.0031, is below 0.2, and the value of UC is 0.997, with U (UM + US + UC) equal to 0.999.
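The formulas behind these statistics are not stated in the text. The sketch below assumes the common definitions: U is the root mean squared error divided by the sum of the root mean squares of the actual and predicted series, and UM, US, and UC are the bias, variance, and covariance proportions of the mean squared error. The function name is hypothetical. Evaluated on the 19 rows of Table 1 alone, these formulas need not reproduce Table 2, since it is not stated which observations or formula variant were used there.

```python
from math import sqrt


def theil_u(actual, predicted):
    """Theil's U and the bias/variance/covariance proportions of the MSE."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    mse = sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n
    # Population standard deviations and covariance (divide by n).
    sd_a = sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    sd_p = sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    rms_a = sqrt(sum(a * a for a in actual) / n)
    rms_p = sqrt(sum(p * p for p in predicted) / n)
    u = sqrt(mse) / (rms_a + rms_p)
    um = (mean_p - mean_a) ** 2 / mse       # bias proportion
    us = (sd_p - sd_a) ** 2 / mse           # variance proportion
    uc = 2.0 * (sd_p * sd_a - cov) / mse    # covariance proportion
    return u, um, us, uc


# ACTUAL and ESTIMATED columns of Table 1, in table order.
actual = [574, 9, 458, 201, 257, 254, 15, 0, 485, 99,
          16, 386, 7, 130, 0, 433, 677, 209, 0]
estimated = [300, 69, 497, 222, 350, 309, 51, -27, 461, 99,
             48, 329, 1, 78, -26, 385, 690, 210, -9]
u, um, us, uc = theil_u(actual, estimated)
print(u, um, us, uc, um + us + uc)
```

Under these definitions the three proportions always sum to 1, so a reported sum close to 1 is a consistency check rather than evidence of fit; the usual reading is that a good model has small UM and US, with most of the error falling in UC.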
4. Conclusion
Estimation of missing data using the state space model is adequate and valid
for use according to the Theil's U criteria. For future studies, we suggest using
distributional assumptions with positive support to avoid negative estimated values.
REFERENCES
[1] Aswi and Sukarna (2006). Analisis Deret Waktu. Makassar: Andira
Publisher.
[2] Cryer, J. D. (1986). Time Series Analysis. Boston: PWS-KENT Publishing
Company.
[3] Daniel, B. K. and Leo, O. O. (2013). Generalized Estimation of Missing
Observation in Nonlinear Time Series Model using State Space
Representation. American Journal of Theoretical and Applied Statistics,
2(2): 21-28.
[4] David, Sheung Chi Fung (2006). Methods for the Estimation of Missing
Values in Time Series. Australia: Faculty of Communications, Health and
Science, Edith Cowan University.
[5] Janacek, G. and Swift, L. (1992). Time Series Forecasting Simulation &
Application. West Sussex, England: Ellis Horwood Limited.
[6] Mumtaz, H. (2009). State Space Models and The Kalman Filter. (Online),
(http://www.pftac.org/filemanager/files/Macro_Training/CCBS_2009/3_kalmanfilter.pdf,
accessed 17 January 2014).
[7] Zivot, Eric (2006). State Space Models and The Kalman Filter. (Online),
(http://faculty.washington.edu/ezivot/econ584/notes/statespacemodels.pdf,
accessed 26 April 2013).