  • Categorical Data Analysis - STK654 (Final Exam Material)

    Dr. Kusman Sadik, M.Si

    Master's Program in Applied Statistics

    Department of Statistics, IPB, Odd Semester 2019/2020

    IPB University ─ Bogor, Indonesia ─ Inspiring Innovation with Integrity

    Ordinal Logit Regression Model (Multicategory Ordinal Response Variable)

  • 2

    The main feature of the ordinal logistic models is that they predict the log odds, odds, or probability of a response occurring at or below any given outcome category.

    For example, ordering the educational attainment categories from lowest to highest (less than high school, high school, junior college, bachelor's degree, graduate degree), we can use this model to predict the probability of being (for example) at the bachelor's level or below from age at first marriage.

  • 6

    The slopes are assumed to be the same for all logits and, under this assumption, the model is known as the proportional odds model.

    The underlying assumption of equivalent slopes across all logits can, and should, be tested to verify that this model is appropriate.

    If this assumption appears to be violated, then one could fit the nominal model or a more complicated alternative model.
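
    The equal-slopes assumption can be written out explicitly. The following is a sketch in the sign convention that the R output later in these slides follows (the slope enters as −β); the symbols α_j, β, and J are notation assumed here for illustration.

$$
\operatorname{logit} P(Y \le j \mid x) = \ln\frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \alpha_j - \beta x, \qquad j = 1, \dots, J-1
$$

$$
\frac{P(Y \le j \mid x+1)\,/\,P(Y > j \mid x+1)}{P(Y \le j \mid x)\,/\,P(Y > j \mid x)} = e^{-\beta} \quad \text{for every } j
$$

    Only the intercepts α_j depend on the category, so a one-unit change in x multiplies every cumulative odds by the same factor, which is where the name "proportional odds" comes from.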

  • 7

    We use data from the 2006 GSS to predict a respondent's educational attainment level (degree), measured as either less than high school, high school, junior college, bachelor's degree, or graduate degree, from the respondent's age when first married (agewed).

    The outcome variable (educational attainment level) is treated as ordinal, so the proportional odds model is used.

  • 8

    # Ordinal logistic model for the GSS data (Azen, section 10.5)

    # Response data: must be ordered

    dataku

  • 9

    # Estimate the probability of each category

    prediksi

  • 10

          degree          degree.order  agewed
    1     HIGH SCHOOL     2             22
    2     HIGH SCHOOL     2             23
    3     HIGH SCHOOL     2             24
    4     HIGH SCHOOL     2             22
    5     LT HIGH SCHOOL  1             28
    6     LT HIGH SCHOOL  1             21
    7     HIGH SCHOOL     2             29
    8     LT HIGH SCHOOL  1             19
    9     LT HIGH SCHOOL  1             28
    10    LT HIGH SCHOOL  1             29
    .
    .
    .
    1158  HIGH SCHOOL     2             21
    1159  HIGH SCHOOL     2             22
    1160  BACHELOR        4             28

    Note: "degree.order" is used rather than "degree", because "degree" is not yet ordered.

  • 11

                    degree.order
    degree            1    2    3    4    5
    LT HIGH SCHOOL  195    0    0    0    0
    BACHELOR          0    0    0  185    0
    GRADUATE          0    0    0    0  104
    HIGH SCHOOL       0  590    0    0    0
    JUNIOR COLLEGE    0    0   86    0    0
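
    The cross-tabulation shows that the stored order of "degree" is not the educational order, which is why the numeric codes are needed. As a small illustration (in Python rather than the R used in the course, with variable names chosen here for the example), the counts can be sorted by their ordinal code and turned into observed cumulative proportions:

```python
# Counts taken from the cross-tabulation above; the integer codes follow
# the degree.order column (1 = lowest attainment, 5 = highest).
ORDER = {
    "LT HIGH SCHOOL": 1,
    "HIGH SCHOOL": 2,
    "JUNIOR COLLEGE": 3,
    "BACHELOR": 4,
    "GRADUATE": 5,
}
counts = {
    "LT HIGH SCHOOL": 195,
    "BACHELOR": 185,
    "GRADUATE": 104,
    "HIGH SCHOOL": 590,
    "JUNIOR COLLEGE": 86,
}

# Sort the labels by their ordinal code rather than by their stored order.
ordered_labels = sorted(counts, key=ORDER.get)

# Observed cumulative proportions P(Y <= j) in the sample.
total = sum(counts.values())
running = 0
cum_props = []
for label in ordered_labels:
    running += counts[label]
    cum_props.append(running / total)

print(ordered_labels)
print([round(p, 4) for p in cum_props])
```

    These sample proportions are what the cumulative logits of the model describe as a function of agewed.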

  • 12

    Coefficients:
            Value    Std. Error  t value
    agewed  0.05059  0.01031     4.908

    Intercepts:
         Value    Std. Error  t value
    1|2  -0.4549  0.2431      -1.8711
    2|3   1.9226  0.2501       7.6886
    3|4   2.2940  0.2530       9.0670
    4|5   3.5242  0.2682      13.1389

    Residual Deviance: 3096.156
    AIC: 3106.156
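
    Two of the reported quantities can be checked by hand. The sketch below (Python, with names chosen here for illustration) recomputes the t value as estimate divided by standard error and the AIC as deviance plus twice the number of parameters. The two-sided p-value uses a normal approximation, which is an assumption of this example and not something printed in the output.

```python
import math

# Values copied from the output above.
slope, slope_se = 0.05059, 0.01031
residual_deviance = 3096.156
n_params = 1 + 4  # one slope plus four intercepts

# t value = estimate / standard error (small differences are rounding).
t_value = slope / slope_se

# AIC = deviance + 2 * (number of estimated parameters).
aic = residual_deviance + 2 * n_params

# Assumed normal approximation for a two-sided Wald test of slope = 0.
p_value = math.erfc(abs(t_value) / math.sqrt(2))

print(round(t_value, 3), round(aic, 3), p_value)
```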

  • 13

          degree.order  agewed  P.Y.1        P.Y.2       P.Y.3       P.Y.4      P.Y.5
    1     2             22      0.172538473  0.51950698  0.07310156  0.1525423  0.08231071
    2     2             23      0.165435593  0.51572572  0.07477403  0.1578513  0.08621336
    3     2             24      0.158569083  0.51150679  0.07640620  0.1632351  0.09028283
    4     2             22      0.172538473  0.51950698  0.07310156  0.1525423  0.08231071
    5     1             28      0.133396465  0.49051546  0.08241161  0.1853397  0.10833676
    6     1             21      0.179880581  0.52283962  0.07139470  0.1473156  0.07856954
    7     2             29      0.127656350  0.48431357  0.08375223  0.1909568  0.11332101
    8     1             19      0.195291690  0.52812158  0.06790088  0.1371355  0.07155037
    9     1             28      0.133396465  0.49051546  0.08241161  0.1853397  0.10833676
    10    1             29      0.127656350  0.48431357  0.08375223  0.1909568  0.11332101
    11    4             30      0.122128426  0.47776346  0.08501689  0.1965871  0.11850410
    12    2             21      0.179880581  0.52283962  0.07139470  0.1473156  0.07856954
    13    4             24      0.158569083  0.51150679  0.07640620  0.1632351  0.09028283
    14    2             18      0.203364066  0.53005548  0.06612507  0.1321935  0.06826190
    15    4             52      0.043717274  0.28635273  0.08659973  0.2930019  0.29032839
    16    4             26      0.145531789  0.50180589  0.07952560  0.1741929  0.09894384
    17    1             29      0.127656350  0.48431357  0.08375223  0.1909568  0.11332101
    18    2             19      0.195291690  0.52812158  0.06790088  0.1371355  0.07155037
    19    1             25      0.151935686  0.50686240  0.07799206  0.1686853  0.09452453
    20    4             18      0.203364066  0.53005548  0.06612507  0.1321935  0.06826190
    21    1             16      0.220246789  0.53248085  0.06254204  0.1226288  0.06210152
    22    2             20      0.187464338  0.52571395  0.06965925  0.1421779  0.07498452
    .
    .
    .
    1160  4             28      0.133396465  0.49051546  0.08241161  0.1853397  0.10833676
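
    Each row of this table depends only on agewed, so it can be reproduced from the five reported estimates. The following Python sketch assumes the fit follows the form logit P(Y ≤ j) = intercept_j − slope × agewed; the function name `category_probs` is hypothetical and used only for this illustration.

```python
import math

# Estimates as reported in the R output (assumed to come from a
# proportional odds fit: logit P(Y <= j) = intercept_j - slope * agewed).
SLOPE = 0.05059
INTERCEPTS = [-0.4549, 1.9226, 2.2940, 3.5242]  # 1|2, 2|3, 3|4, 4|5


def category_probs(agewed):
    """Return [P(Y=1), ..., P(Y=5)] for a given age at first marriage."""
    # Cumulative probabilities P(Y <= j) for j = 1..4, then P(Y <= 5) = 1.
    cum = [1.0 / (1.0 + math.exp(-(a - SLOPE * agewed))) for a in INTERCEPTS]
    cum.append(1.0)
    # Successive differences give the individual category probabilities.
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


probs_22 = category_probs(22)
print([round(p, 4) for p in probs_22])
```

    For agewed = 22 the result agrees with the first row of the table up to the rounding of the printed coefficients.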

  • 14

    SAS Output: Compare with the R Output

  • 16

    Model Differences between R, SPSS, and SAS

    R and SPSS

    SAS
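
    The difference lies in the sign attached to the slope. As a sketch, with α_j and β as assumed notation, and with the SAS form referring to the default cumulative logit parameterization of PROC LOGISTIC:

$$
\text{R and SPSS:}\quad \ln\frac{P(Y \le j)}{P(Y > j)} = \alpha_j - \beta x
\qquad\qquad
\text{SAS:}\quad \ln\frac{P(Y \le j)}{P(Y > j)} = \alpha_j + \beta x
$$

    Under these two forms the intercepts should agree across programs, while the slope reported by SAS should equal the negative of the slope reported by R or SPSS.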

  • 18

    Parameter Interpretation and Testing

  • 19

    Illustration of Parameter Interpretation (R Output)

    Coefficients:
            Value    Std. Error  t value
    agewed  0.05059  0.01031     4.908

    Intercepts:
         Value    Std. Error  t value
    1|2  -0.4549  0.2431      -1.8711
    2|3   1.9226  0.2501       7.6886
    3|4   2.2940  0.2530       9.0670
    4|5   3.5242  0.2682      13.1389

    For example, for Y = 1:

$$
\ln\frac{P(Y \le 1)}{P(Y > 1)} = \ln\frac{P(Y \le 1)}{1 - P(Y \le 1)} = -0.455 - 0.051\,(\mathit{agewed})
$$

    Note that the slope in this equation, −0.051, is the negative of the reported β.
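
    The slope is easier to read on the odds scale. The sketch below (Python, with variable names chosen here for illustration) exponentiates the reported coefficient, assuming the R sign convention in which the cumulative logit contains −β × agewed.

```python
import math

beta = 0.05059  # slope for agewed as reported in the R output

# R reports beta, but the cumulative logit uses -beta * agewed, so the
# odds of being at or below any category change by exp(-beta) per year.
or_at_or_below = math.exp(-beta)

# Equivalently, the odds of being above any category change by exp(beta).
or_above = math.exp(beta)

print(round(or_at_or_below, 4), round(or_above, 4))
```

    Read this way, each additional year of age at first marriage is associated with roughly 5% higher odds of falling in a higher education category, whichever cut point is used.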

  • 20

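    Determining the Cumulative Probabilities

    Derivation for Y = 1:

$$
\ln\frac{P(Y \le 1)}{P(Y > 1)} = \ln\frac{P(Y \le 1)}{1 - P(Y \le 1)} = -0.455 - 0.051x
$$

$$
\Leftrightarrow\ \frac{P(Y \le 1)}{1 - P(Y \le 1)} = e^{-0.455 - 0.051x}
$$

$$
\Leftrightarrow\ P(Y \le 1) = \bigl(1 - P(Y \le 1)\bigr)\, e^{-0.455 - 0.051x}
$$

$$
\Leftrightarrow\ P(Y \le 1) = \frac{e^{-0.455 - 0.051x}}{1 + e^{-0.455 - 0.051x}}
$$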

  • 21

    Determining the Cumulative Probabilities

    Substituting x = 20:

$$
P(Y \le 1) = \frac{e^{-0.455 - 0.051x}}{1 + e^{-0.455 - 0.051x}} = \frac{e^{-0.455 - 0.051(20)}}{1 + e^{-0.455 - 0.051(20)}} = \frac{0.2288}{1.2288} = 0.1862
$$

$$
P(Y \le 2) = \frac{e^{1.923 - 0.051x}}{1 + e^{1.923 - 0.051x}} = \frac{e^{1.923 - 0.051(20)}}{1 + e^{1.923 - 0.051(20)}} = \frac{2.4670}{3.4670} = 0.7116
$$

    P(Y ≤ 3) and P(Y ≤ 4) can be computed in the same way.

  • 22

    Determining the Probability of Each Category

$$
P(Y = 4) = P(Y \le 4) - P(Y \le 3)
$$

    The probability of every category of Y, namely P(Y = 1), P(Y = 2), P(Y = 3), P(Y = 4), and P(Y = 5), can be computed in the same way.

  • 23

    Conclusion

    Based on the estimated probabilities of each category, if a person is known to have been 20 years old at marriage, what is that person's predicted highest level of education?
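
    One way to work through this question is to chain the steps from the preceding slides: cumulative probabilities, then category probabilities, then a decision rule. The Python sketch below uses the rounded estimates from the worked example. Picking the category with the largest estimated probability is a rule assumed here for illustration, and the variable names are not from the original material.

```python
import math

# Rounded estimates used in the worked example (R sign convention assumed:
# logit P(Y <= j) = alpha_j - 0.051 * x).
alphas = [-0.455, 1.923, 2.294, 3.524]
beta = 0.051
x = 20  # age at first marriage

labels = ["LT HIGH SCHOOL", "HIGH SCHOOL", "JUNIOR COLLEGE", "BACHELOR", "GRADUATE"]

# Step 1: cumulative probabilities P(Y <= j), j = 1..4.
cum = []
for a in alphas:
    odds = math.exp(a - beta * x)
    cum.append(odds / (1.0 + odds))

# Step 2: category probabilities by differencing, with P(Y <= 5) = 1.
bounds = [0.0] + cum + [1.0]
probs = [bounds[j + 1] - bounds[j] for j in range(5)]

# Step 3: predict the category with the largest estimated probability.
best = max(range(5), key=lambda j: probs[j])
print(labels[best], round(probs[best], 4))
```

    Under these assumptions the largest probability falls on HIGH SCHOOL, at a little over one half.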

  • 24

  • 25

    1. Use R for the Mental Impairment data (Agresti, section 7.2.4, p. 279).

    a. Compare the results with the SAS output in Agresti's book, and interpret each estimated model parameter.

    b. Based on the results in (a), determine the estimated values of P(Y = 1), P(Y = 3), and P(Y > 2).

    c. Determine the best model.

    d. Suppose an individual is known to have Life Events (x1 = 8) and SES (x2 = 1). Based on the model in (c), determine the predicted "Mental Impairment".

  • 26

  • 27

    2. Work Problem 10.7 (Azen, 2011).

  • 28

  • 29

    3. Work Problem 10.8 (Azen, 2011).

  • 30

    References

    1. Azen, R. and Walker, C.M. (2011). Categorical Data Analysis for the Behavioral and Social Sciences. New York: Routledge, Taylor and Francis Group.

    2. Agresti, A. (2002). Categorical Data Analysis, 2nd ed. New York: Wiley.

    3. Other relevant references.

  • 31

    Available for download at kusmansadik.wordpress.com

  • 32

    Thank You