lashkia automatic identification of organizational structure in … · 2014. 9. 1. · 1 automatic...

23
1 Automatic Identification of Automatic Identification of Organizational Structure in Writing Organizational Structure in Writing using Machine Learning using Machine Learning Laurence Anthony and George V. Laurence Anthony and George V. Lashkia Lashkia Dept. of Computer Science, Faculty of Engineering Dept. of Computer Science, Faculty of Engineering Okayama Univ. of Science, 1 Okayama Univ. of Science, 1 - - 1 1 Ridai Ridai - - cho cho , Okayama , Okayama [email protected] [email protected] [email protected] [email protected] http://antpc1.ice.ous.ac.jp http://antpc1.ice.ous.ac.jp

Upload: others

Post on 18-Aug-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

1

Auto

mat

ic I

dent

ifica

tion

of

Auto

mat

ic I

dent

ifica

tion

of

Org

aniz

atio

nal S

truc

ture

in W

ritin

g O

rgan

izat

iona

l Str

uctu

re in

Writ

ing

usin

g M

achi

ne L

earn

ing

usin

g M

achi

ne L

earn

ing

Laur

ence

Ant

hony

and

Geo

rge

V.

Laur

ence

Ant

hony

and

Geo

rge

V. L

ashk

iaLa

shki

aD

ept.

of C

ompu

ter

Scie

nce,

Fac

ulty

of E

ngin

eerin

gD

ept.

of C

ompu

ter

Scie

nce,

Fac

ulty

of E

ngin

eerin

gO

kaya

ma

Univ

. of S

cien

ce, 1

Oka

yam

a Un

iv. o

f Sci

ence

, 1-- 1

1 Ri

dai

Rida

i -- chocho ,

Oka

yam

a, O

kaya

ma

anth

ony@

ice.

ous.

ac.jp

anth

ony@

ice.

ous.

ac.jp

lash

kia@

ice.

ous.

ac.jp

lash

kia@

ice.

ous.

ac.jp

http

://a

ntpc

1.ic

e.ou

s.ac

.jpht

tp:/

/ant

pc1.

ice.

ous.

ac.jp

Page 2: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

2

Pres

enta

tion

Out

line

Pres

enta

tion

Out

line

Back

grou

ndBa

ckgr

ound

Rese

arch

Aim

Rese

arch

Aim

Syst

em D

esig

n (O

verv

iew

)Sy

stem

Des

ign

(Ove

rvie

w)

Appl

icat

ion

to R

esea

rch

Abst

ract

sAp

plic

atio

n to

Res

earc

h Ab

stra

cts

Resu

lts (

Accu

racy

)Re

sults

(Ac

cura

cy)

Res

ults

(Ef

fect

iven

ess

in t

he C

lass

room

)Res

ults

(Ef

fect

iven

ess

in t

he C

lass

room

)So

ftw

are

Dem

onst

ratio

nSo

ftw

are

Dem

onst

ratio

nCo

nclu

sion

sCo

nclu

sion

s

Page 3: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

3

Back

grou

ndBa

ckgr

ound

Impo

rtan

ce o

f Te

xt S

truc

ture

Impo

rtan

ce o

f Te

xt S

truc

ture

Swal

es (

1981

, 199

0),

Swal

es (

1981

, 199

0),

Carr

ell

Carr

ell (

1982

)(1

982)

Hin

ds (

1982

, 198

3),

Hin

ds (

1982

, 198

3),

Hoe

yH

oey

(199

4), W

inte

r (1

994)

(199

4), W

inte

r (1

994)

Stud

ies

on T

ext

Stru

ctur

eSt

udie

s on

Tex

t St

ruct

ure

TITL

ESTI

TLES

--D

udle

yD

udle

y --Ev

ans

(199

4), A

ntho

ny (

2001

)Ev

ans

(199

4), A

ntho

ny (

2001

)A

BST

RA

CTS

AB

STR

AC

TS--

Ayer

s (1

993)

, Pos

tegu

illo

(199

6)Ay

ers

(199

3), P

oste

guill

o (1

996)

INST

RO

DU

CTI

ON

SIN

STR

OD

UC

TIO

NS

--Sw

ales

(19

90),

Ant

hony

(19

99)

Swal

es (

1990

), A

ntho

ny (

1999

)D

ISC

USS

ION

SD

ISC

USS

ION

S--

Hop

kins

& D

udle

yH

opki

ns &

Dud

ley --

Evan

s (1

988)

Evan

s (1

988)

PA

TEN

TSP

ATE

NTS

--Ba

zerm

an (

1994

)Ba

zerm

an (

1994

)G

RA

NT

PR

OP

OSA

LSG

RA

NT

PR

OP

OSA

LS--

Conn

or &

Co

nnor

& M

aura

nen

Mau

rane

n(1

999)

(199

9)LE

GA

L W

RIT

ING

LEG

AL

WR

ITIN

G--

Bhat

ia (

1993

)Bh

atia

(19

93)

Page 4: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

4

Back

grou

ndBa

ckgr

ound

Prob

lem

s w

ith A

naly

zing

Tex

t St

ruct

ure

Prob

lem

s w

ith A

naly

zing

Tex

t St

ruct

ure

We

need

a la

rge

corp

us o

f te

xt d

ata

We

need

a la

rge

corp

us o

f te

xt d

ata

(The

tex

t da

ta m

ust

(The

tex

t da

ta m

ust

‘‘ ACU

RATE

LYAC

URA

TELY

’’ rep

rese

nt w

hat

we

hope

to

repr

esen

t w

hat

we

hope

to

stud

y)st

udy)

We

need

a lo

t of

res

earc

h tim

eW

e ne

ed a

lot

of r

esea

rch

time

(We

mus

t an

alyz

e a

lot

of t

exts

)(W

e m

ust

anal

yze

a lo

t of

tex

ts)

We

need

goo

d va

lidat

ion

and

relia

bilit

y te

sts

We

need

goo

d va

lidat

ion

and

relia

bilit

y te

sts

(Bec

ause

eva

luat

ing

stru

ctur

e ca

n be

ver

y su

bjec

tive)

(Bec

ause

eva

luat

ing

stru

ctur

e ca

n be

ver

y su

bjec

tive)

Mos

t Te

xt S

truc

ture

Stu

dies

are

Mos

t Te

xt S

truc

ture

Stu

dies

are

‘‘ Sm

all S

cale

Smal

l Sca

le’’

Page 5: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

5

Back

grou

ndBa

ckgr

ound

Hen

ry e

t al

. (20

01)

Hen

ry e

t al

. (20

01)

40 A

pplic

atio

n Le

tter

s40

App

licat

ion

Lett

ers

Taro

neTa

rone

et a

l. (2

000)

et a

l. (2

000)

2 Ph

ysic

s Res

earc

h Ar

ticle

s2

Phys

ics

Res

earc

h Ar

ticle

s

Conn

or e

t al

. (19

99)

Conn

or e

t al

. (19

99)

34 G

rant

Pro

posa

ls34

Gra

nt P

ropo

sals

Will

iam

s (1

999)

Will

iam

s (1

999)

5 M

edic

al R

esea

rch

Artic

les

5 M

edic

al R

esea

rch

Artic

les

Anth

ony

(199

9)An

thon

y (1

999)

12 C

ompu

ter

Scie

nce

Rese

arch

Art

icle

12

Com

pute

r Sc

ienc

e Re

sear

ch A

rtic

le

Intr

oduc

tions

Intr

oduc

tions

Page 6: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

6

Res

earc

h Ai

mRes

earc

h Ai

m

Dev

elop

a C

ompu

ter

Syst

em t

o Pr

oces

s D

evel

op a

Com

pute

r Sy

stem

to

Proc

ess

Text

s an

d An

alyz

e Te

xt S

truc

ture

Te

xts

and

Anal

yze

Text

Str

uctu

re

Auto

mat

ical

lyAu

tom

atic

ally

––A A

‘‘ Mac

hine

Lea

rnin

g Sy

stem

Mac

hine

Lea

rnin

g Sy

stem

’’fo

r te

xt

for

text

st

ruct

ure

stru

ctur

eEa

sy t

o pr

oces

s a

larg

e co

rpus

of

text

dat

aEa

sy t

o pr

oces

s a

larg

e co

rpus

of

text

dat

aFa

stFa

stTh

e an

alyt

ic p

roce

ss w

ould

be

clea

rly d

efin

edTh

e an

alyt

ic p

roce

ss w

ould

be

clea

rly d

efin

edEa

sy t

o te

st t

he r

elia

bilit

y an

d va

lidity

Ea

sy t

o te

st t

he r

elia

bilit

y an

d va

lidity

Page 7: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

7

Syst

em D

esig

n (O

verv

iew

)Sy

stem

Des

ign

(Ove

rvie

w)

Mac

hine

Lea

rnin

g: U

nsup

ervi

sed

? M

achi

ne L

earn

ing:

Uns

uper

vise

d ?

Supe

rvis

ed L

earn

ing

Supe

rvis

ed L

earn

ing ’’

??In

Sup

ervi

sed

Lear

ning

,In

Sup

ervi

sed

Lear

ning

,G

ive

the

syst

em a

str

uctu

ral m

odel

(se

t of

cla

sses

)G

ive

the

syst

em a

str

uctu

ral m

odel

(se

t of

cla

sses

)G

ive

the

syst

em e

xam

ples

of

the

mod

elG

ive

the

syst

em e

xam

ples

of

the

mod

elTe

ll th

e sy

stem

wha

t Te

ll th

e sy

stem

wha

t ‘‘ fe

atur

esfe

atur

es’’ i

n th

e ex

ampl

es a

re

in t

he e

xam

ples

are

im

port

ant

impo

rtan

tD

efin

e a

rela

tion

betw

een

the

clas

ses

and

the

Def

ine

a re

latio

n be

twee

n th

e cl

asse

s an

d th

e fe

atur

esfe

atur

esCl

assi

fy n

ew t

ext

exam

ples

by

com

parin

g its

Cl

assi

fy n

ew t

ext

exam

ples

by

com

parin

g its

fe

atur

es w

ith t

hose

in e

ach

clas

sfe

atur

es w

ith t

hose

in e

ach

clas

s

Page 8: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

8

Syst

em D

esig

n (O

verv

iew

)Sy

stem

Des

ign

(Ove

rvie

w)

Prob

lem

sPr

oble

ms

We

need

a

We

need

a ‘‘ g

ood

good

’’ mod

el o

f st

ruct

ure

mod

el o

f st

ruct

ure

––Bu

t th

ere

are

man

y m

odel

s of

str

uctu

re in

the

lite

ratu

reBu

t th

ere

are

man

y m

odel

s of

str

uctu

re in

the

lite

ratu

re

We

need

a s

et o

f W

e ne

ed a

set

of

‘‘ labe

led

exam

ples

labe

led

exam

ples

’’––

But

man

y sy

stem

s w

ork

wel

l with

onl

y a

few

labe

led

But

man

y sy

stem

s w

ork

wel

l with

onl

y a

few

labe

led

exam

ples

exam

ples

We

need

a

We

need

a ‘‘ g

ood

good

’’ set

of

feat

ures

set

of f

eatu

res

––Bu

t la

ngua

ge c

onta

ins

a LO

T of

noi

se w

ords

!Bu

t la

ngua

ge c

onta

ins

a LO

T of

noi

se w

ords

!(e

.g. a

, the

, of,

in, a

t, b

ut?,

tho

ugh?

, (e

.g. a

, the

, of,

in, a

t, b

ut?,

tho

ugh?

, ……

))––

Build

ing

a lis

t of

fea

ture

s by

han

d is

infe

asib

leBu

ildin

g a

list

of f

eatu

res

by h

and

is in

feas

ible

We

need

a

We

need

a ‘‘ g

ood

good

’’ rel

atio

n be

twee

n th

e cl

asse

s an

d re

latio

n be

twee

n th

e cl

asse

s an

d th

e fe

atur

esth

e fe

atur

es

Page 9: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

9

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Giv

e th

e sy

stem

a s

truc

ture

mod

el:

Giv

e th

e sy

stem

a s

truc

ture

mod

el:

‘‘ Mod

ifie

dM

odif

ied ’’

CAR

S M

odel

(Sw

ales

, 199

0: A

ntho

ny, 1

999)

CAR

S M

odel

(Sw

ales

, 199

0: A

ntho

ny, 1

999)

Mov

e 1

Mov

e 1

Esta

blis

hing

Es

tabl

ishi

ng

1.1

1.1

Cla

imin

g c

entr

alit

yC

laim

ing

cen

tral

ity

a Te

rrito

ry

a Te

rrito

ry

1.2

1.2

Mak

ing

top

ic g

ener

aliz

atio

ns

Mak

ing

top

ic g

ener

aliz

atio

ns

1.3

1.3

Rev

iew

ing

item

s of

pre

viou

s re

sear

chR

evie

win

g it

ems

of p

revi

ous

rese

arch

Mov

e 2

Mov

e 2

Esta

blis

hing

Es

tabl

ishi

ng

2.1

A2

.1A

Cou

nte

r cl

amin

gC

oun

ter

clam

ing

a ni

che

a ni

che

2.1

B

2.1

B

Ind

icat

ing

a g

apIn

dic

atin

g a

gap

2.1

C

2.1

C

Qu

esti

on r

aisi

ng

Qu

esti

on r

aisi

ng

2.1

D

2.1

D

Con

tin

uin

g a

tra

diti

onC

onti

nu

ing

a t

radi

tion

Mov

e 3

Mov

e 3

Occ

upyi

ng

Occ

upyi

ng

3.1

A

3.1

A

Ou

tlin

ing

purp

ose

Ou

tlin

ing

purp

ose

the

nich

eth

e ni

che

3.1

B

3.1

B

An

nou

nci

ng

pres

ent

rese

arch

An

nou

nci

ng

pres

ent

rese

arch

3.2

3.2

An

nou

nci

ng

prin

cipa

l fin

din

gsA

nn

oun

cin

g pr

inci

pal f

ind

ings

3.3

3.3

Eval

uat

ion

of

rese

arch

Eval

uat

ion

of

rese

arch

3.4

3.4

Ind

icat

ing

RA

str

uct

ure

Ind

icat

ing

RA

str

uct

ure

Page 10: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

10

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Giv

e th

e sy

stem

exa

mpl

es o

f th

e m

odel

Giv

e th

e sy

stem

exa

mpl

es o

f th

e m

odel

100

Abst

ract

s (I

EEE

Tran

s. o

n PD

S) d

ivid

ed in

to10

0 Ab

stra

cts

(IEE

E Tr

ans.

on

PDS)

div

ided

into

692

labe

led

69

2 la

bele

d ‘‘ S

teps

Uni

tsSt

eps

Uni

ts’’ (

only

exa

mpl

es f

rom

6 c

lass

es)

(onl

y ex

ampl

es f

rom

6 c

lass

es)

554

Step

Uni

ts (

80%

) us

ed f

or

554

Step

Uni

ts (

80%

) us

ed f

or ‘‘ t

rain

ing

trai

ning

’’ the

sys

tem

the

syst

em13

8 St

ep U

nits

(20

%)

used

for

13

8 St

ep U

nits

(20

%)

used

for

‘‘ tes

ting

test

ing ’’

the

syst

emth

e sy

stem

Tell

the

syst

em w

hat

Tell

the

syst

em w

hat

‘‘ feat

ures

feat

ures

’’ to

look

at

to lo

ok a

tAl

l wor

d cl

uste

rs (

chun

ks)

up t

o 5

wor

ds lo

ngAl

l wor

d cl

uste

rs (

chun

ks)

up t

o 5

wor

ds lo

ngPo

sitio

n of

ste

p un

it in

abs

trac

t (i.

e. 1

Posi

tion

of s

tep

unit

in a

bstr

act

(i.e.

1stst

line,

2lin

e, 2

ndndlin

e,

line,

……))

(Red

uce

(Red

uce

‘‘ Noi

seN

oise

’’ in

Feat

ures

)in

Fea

ture

s)Au

tom

atic

ally

ran

k w

ords

by

Auto

mat

ical

ly r

ank

wor

ds b

y ‘‘ im

port

ance

impo

rtan

ce’’ u

sing

:us

ing:

raw

fre

quen

cy,

raw

fre

quen

cy,

Info

rmat

ion

Gai

nIn

form

atio

n G

ain

Use

onl

y hi

gh r

anke

d w

ords

Use

onl

y hi

gh r

anke

d w

ords

Page 11: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

11

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

““ In

this

pap

er, w

e pr

opos

e a

new

sys

tem

.In

thi

s pa

per,

we

prop

ose

a ne

w s

yste

m. ””

––1

wor

d ch

unks

1 w

ord

chun

ksin

/ th

is/

pape

r/ w

e/ p

ropo

se/

a/ n

ew/

syst

emin

/ th

is/

pape

r/ w

e/ p

ropo

se/

a/ n

ew/

syst

em

––2

wor

d ch

unks

2 w

ord

chun

ksin

thi

s/ t

his

pape

r /

pape

r w

e/ w

e pr

opos

e/

in t

his/

thi

s pa

per

/ pa

per

we/

we

prop

ose/

pr

opos

e a/

a n

ew/

new

sys

tem

prop

ose

a/ a

new

/ ne

w s

yste

m

––3

wor

ds c

hunk

s3

wor

ds c

hunk

sin

thi

s pa

per

/ th

is p

aper

we/

pap

er w

e pr

opos

e/in

thi

s pa

per

/ th

is p

aper

we/

pap

er w

e pr

opos

e/w

e pr

opos

e a/

pro

pose

a n

ew/

a ne

w s

yste

mw

e pr

opos

e a/

pro

pose

a n

ew/

a ne

w s

yste

m

––……

Page 12: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

12

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

““ In

this

pap

er, w

e pr

opos

e a

new

sys

tem

.In

thi

s pa

per,

we

prop

ose

a ne

w s

yste

m. ””

––1

wor

d ch

unks

1 w

ord

chun

ksinin

/ / th

isth

is/ /

pape

rpa

per /

/ w

ew

e / / pr

opos

epr

opos

e / / aa /

/ ne

wne

w/ /

syst

emsy

stem

––2

wor

d ch

unks

2 w

ord

chun

ksin

thi

s/

in t

his/

thi

s pa

per

this

pap

er/

pape

r w

e/

/ pa

per

we/

we

prop

ose

we

prop

ose /

/ pr

opos

e a/

a n

ew/

prop

ose

a/ a

new

/ ne

w s

yste

mne

w s

yste

m

––3

wor

d ch

unks

3 w

ord

chun

ksin

thi

s pa

per

in t

his

pape

r/

this

pap

er w

e/ p

aper

we

prop

ose/

/ th

is p

aper

we/

pap

er w

e pr

opos

e/w

e pr

opos

e a

we

prop

ose

a / p

ropo

se a

new

/ /

prop

ose

a ne

w/

a ne

w s

yste

ma

new

sys

tem

––……

Page 13: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

13

Info

rmat

ion

Gai

n (I

G)

Info

rmat

ion

Gai

n (I

G)

jj

pp

DEntropy

c j2

log

)(

1−

≡∑ =

whe

re

whe

re pp

jjis

the

pro

port

ion

of d

ata

(is

the

pro

port

ion

of d

ata

( DD)

in a

cla

ss

) in

a c

lass

jjfr

om

from

th

e se

t of

cla

sses

th

e se

t of

cla

sses

CC..

)(

||

||

)(

),

()

(v

v

wValues

vD

Entropy

DDD

Entropy

wD

Gain

∑∈

−≡

whe

re

whe

re V

alue

sVa

lues

(w(w))is

the

set

of

all p

ossi

ble

valu

es f

or

is t

he s

et o

f al

l pos

sibl

e va

lues

for

w

ord

wor

d w

,w

,an

d an

d DD

vvis

the

sub

set

of

is t

he s

ubse

t of

DDfo

r w

hich

wor

d fo

r w

hich

wor

d ww

has

has

a va

lue

a va

lue

vv ..

Page 14: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

14

Info

rmat

ion

Gai

n (I

G)

Info

rmat

ion

Gai

n (I

G)

Proc

ess

310

task

_mig

ratio

n2

9di

ffic

ult

18

not

and

7of

ten

is6

trans

mitt

ing

of5

is_o

ften

in4

diff

icul

t_to

to3

2_ho

wev

era

2ho

wev

erth

e1

Info

rmat

ion

Gai

n (I

G)

Raw

Fre

quen

cyR

ank

Page 15: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

15

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Def

ine

a re

latio

n be

twee

n fe

atur

es a

nd c

lass

esD

efin

e a

rela

tion

betw

een

feat

ures

and

cla

sses

––U

se p

roba

bilit

y of

eac

h cl

ass

and

the

prob

abili

ty o

f U

se p

roba

bilit

y of

eac

h cl

ass

and

the

prob

abili

ty o

f fe

atur

es (

clus

ters

) be

ing

in e

ach

clas

sfe

atur

es (

clus

ters

) be

ing

in e

ach

clas

s(( A

NA

A N

AÏÏ V

E BA

YES

Clas

sifie

r)VE

BAY

ES C

lass

ifier

)

Clas

s 1

(Cla

imin

g Ce

ntra

lity)

Clas

s 1

(Cla

imin

g Ce

ntra

lity)

Clas

s 2

(Mak

ing

topi

c ge

nera

lizat

ions

)Cl

ass

2 (M

akin

g to

pic

gene

raliz

atio

ns)

Clas

s 3

(Ind

icat

ing

a ga

p)

Clas

s 3

(Ind

icat

ing

a ga

p)

Clas

s 4

(Out

linin

g pu

rpos

e)Cl

ass

4 (O

utlin

ing

purp

ose)

Clas

s 5

(Ann

ounc

ing

prin

cipa

l fin

ding

s)

Clas

s 5

(Ann

ounc

ing

prin

cipa

l fin

ding

s)

Clas

s 6

(Eva

luat

ion

of r

esea

rch)

Cl

ass

6 (E

valu

atio

n of

res

earc

h)

Clas

s 1:

C

lass

1 P

rob.

Clas

s 1:

C

lass

1 P

rob.

Feat

. 1 p

rob.

Feat

. 2 p

rob.

Feat

. 3 p

rob.

Fe

at. 1

pro

b.

Fe

at. 2

pro

b.

Fe

at. 3

pro

b. ……

Clas

s 2:

C

lass

2 P

rob.

Clas

s 2:

C

lass

2 P

rob.

Feat

. 1 p

rob.

Feat

. 2 p

rob.

Feat

. 3 p

rob.

Fe

at. 1

pro

b.

Fe

at. 2

pro

b.

Fe

at. 3

pro

b. ……

Clas

s 3:

C

lass

3 P

rob.

Cl

ass

3:

Cla

ss 3

Pro

b.

Feat

. 1 p

rob.

Feat

. 2 p

rob.

Feat

. 3 p

rob.

Fe

at. 1

pro

b.

Fe

at. 2

pro

b.

Fe

at. 3

pro

b. ……

Clas

s 4:

C

lass

4 P

rob.

Cl

ass

4:

Cla

ss 4

Pro

b.

Feat

. 1 p

rob.

Feat

. 2 p

rob.

Feat

. 3 p

rob.

Fe

at. 1

pro

b.

Fe

at. 2

pro

b.

Fe

at. 3

pro

b. ……

Clas

s 5:

C

lass

5 P

rob.

Cl

ass

5:

Cla

ss 5

Pro

b.

Feat

. 1 p

rob.

Feat

. 2 p

rob.

Feat

. 3 p

rob.

Fe

at. 1

pro

b.

Fe

at. 2

pro

b.

Fe

at. 3

pro

b. ……

Clas

s 6:

C

lass

6 P

rob.

Cl

ass

6:

Cla

ss 6

Pro

b.

Feat

. 1 p

rob.

Feat

. 2 p

rob.

Feat

. 3 p

rob.

Fe

at. 1

pro

b.

Fe

at. 2

pro

b.

Fe

at. 3

pro

b. ……

Page 16: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

16

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Clas

sify

the

str

uctu

re o

f ne

w t

ext

exam

ples

Clas

sify

the

str

uctu

re o

f ne

w t

ext

exam

ples

––Ch

oose

the

mos

t pr

obab

le c

lass

con

tain

ing

the

Choo

se t

he m

ost

prob

able

cla

ss c

onta

inin

g th

e fe

atur

es in

eac

h st

ep u

nit.

feat

ures

in e

ach

step

uni

t.““ 2

thi

s pa

per

is a

n ef

fort

in t

he s

ame

dire

ctio

n2

this

pap

er is

an

effo

rt in

the

sam

e di

rect

ion ““

(Ste

p 3.

1B

(Ste

p 3.

1B --

Anno

unci

ng P

rese

nt R

esea

rch

Anno

unci

ng P

rese

nt R

esea

rch ””

))

Feat

ures

Con

tain

ed in

Tra

inin

g D

ata

Feat

ures

Con

tain

ed in

Tra

inin

g D

ata

––pa

per

(c3)

, pa

per

(c3)

, th

is_p

aper

this

_pap

er(c

4), i

s (c

14)

this

(c1

8) t

he (

c39)

(c4)

, is

(c14

) th

is (

c18)

the

(c3

9)2

(c10

3)

2 (c

103)

is_a

nis

_an

(c36

4) in

(c5

71)

(c36

4) in

(c5

71)

Step

1.1

Pro

b.

=

Step

1.1

Pro

b.

=

-- 2.9

498

+

2.94

98 +

--7.

0449

+

7.04

49 +

--7.

0449

+

7.04

49 +

--4.

3368

+

4.33

68 +

……+

+

--4.

4058

=

4.40

58 =

--48

.769

048

.769

0St

ep 1

.2 P

rob.

=St

ep 1

.2 P

rob.

=-- 1

.839

8 +

1.

8398

+ --

7.48

99 +

7.

4899

+ --

7.48

99 +

7.

4899

+ --

3.85

23 +

3.

8523

+ ……

+

+ --

3.87

90 =

3.

8790

= --

45.5

972

45.5

972

Step

2.1

B Pr

ob.

=

Step

2.1

B Pr

ob.

=

-- 3.1

391

+

3.13

91 +

--6.

9157

+

6.91

57 +

--6.

9157

+

6.91

57 +

--4.

3507

+

4.35

07 +

……+

+

--4.

2076

=

4.20

76 =

--47

.082

647

.082

6St

ep 3

.1B

Prob

.

=

Step

3.1

B Pr

ob.

=

-- 1

.333

5 +

1.

3335

+ --

4.15

66 +

4.

1566

+ --

4.24

36 +

4.

2436

+ --

4.84

97 +

4.

8497

+ ……

+

+ --

3.91

69 =

3.

9169

= --

39.0

836

39.0

836

Step

3.2

Pro

b.

=

Step

3.2

Pro

b.

=

-- 1.8

398

+

1.83

98 +

--6.

3677

+

6.36

77 +

--6.

3677

+

6.36

77 +

--3.

6936

+

3.69

36 +

……+

+

--3.

7837

=

3.78

37 =

--40

.844

840

.844

8St

ep 3

.3 P

rob.

=St

ep 3

.3 P

rob.

=-- 1

.580

9 +

1.

5809

+ --

6.61

78 +

6.

6178

+ --

6.61

78 +

6.

6178

+ --

3.78

46 +

3.

7846

+ ……

+

+ --

4.05

28 =

4.

0528

= --

43.2

638

43.2

638

Mos

t Pr

obab

le S

tep

Mos

t Pr

obab

le S

tep

……

Page 17: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

17

Appl

icat

ion

of S

yste

m t

o Ap

plic

atio

n of

Sys

tem

to

Res

earc

h Ab

stra

cts

Res

earc

h Ab

stra

cts

Clas

sify

the

str

uctu

re o

f ne

w t

ext

exam

ples

Clas

sify

the

str

uctu

re o

f ne

w t

ext

exam

ples

––Ch

oose

the

mos

t pr

obab

le c

lass

con

tain

ing

the

Choo

se t

he m

ost

prob

able

cla

ss c

onta

inin

g th

e fe

atur

es in

eac

h st

ep u

nit.

feat

ures

in e

ach

step

uni

t.““ 2

thi

s pa

per

is a

n ef

fort

in t

he s

ame

dire

ctio

n2

this

pap

er is

an

effo

rt in

the

sam

e di

rect

ion ““

(Ste

p 3.

1B

(Ste

p 3.

1B --

Anno

unci

ng P

rese

nt R

esea

rch

Anno

unci

ng P

rese

nt R

esea

rch ””

))

Feat

ures

Con

tain

ed in

Tra

inin

g D

ata

Feat

ures

Con

tain

ed in

Tra

inin

g D

ata

––pa

per

(c3)

, pa

per

(c3)

, th

is_p

aper

this

_pap

er(c

4), i

s (c

14)

this

(c1

8) t

he (

c39)

(c4)

, is

(c14

) th

is (

c18)

the

(c3

9)2

(c10

3)

2 (c

103)

is_a

nis

_an

(c36

4) in

(c5

71)

(c36

4) in

(c5

71)

Step

1.1

Pro

b.

=

Step

1.1

Pro

b.

=

-- 2.9

498

+

2.94

98 +

--7.

0449

+

7.04

49 +

--7.

0449

+

7.04

49 +

--4.

3368

+

4.33

68 +

……+

+

--4.

4058

=

4.40

58 =

--48

.769

048

.769

0St

ep 1

.2 P

rob.

=St

ep 1

.2 P

rob.

=-- 1

.839

8 +

1.

8398

+ --

7.48

99 +

7.

4899

+ --

7.48

99 +

7.

4899

+ --

3.85

23 +

3.

8523

+ ……

+

+ --

3.87

90 =

3.

8790

= --

45.5

972

45.5

972

Step

2.1

B Pr

ob.

=

Step

2.1

B Pr

ob.

=

-- 3.1

391

+

3.13

91 +

--6.

9157

+

6.91

57 +

--6.

9157

+

6.91

57 +

--4.

3507

+

4.35

07 +

……+

+

--4.

2076

=

4.20

76 =

--47

.082

647

.082

6St

ep 3

.1B

Pro

b. =

St

ep 3

.1B

Pro

b. =

-- 1

.333

5 +

1

.333

5 +

--4

.15

66 +

4

.15

66 +

--4

.24

36 +

4

.24

36 +

--4

.84

97 +

4

.84

97 +

……+

+

--3

.91

69 =

3

.91

69 =

--3

9.0

836

39

.08

36St

ep 3

.2 P

rob.

=St

ep 3

.2 P

rob.

=-- 1

.839

8 +

1.

8398

+ --

6.36

77 +

6.

3677

+ --

6.36

77 +

6.

3677

+ --

3.69

36 +

3.

6936

+ ……

+

+ --

3.78

37 =

3.

7837

= --

40.8

448

40.8

448

Step

3.3

Pro

b.

=

Step

3.3

Pro

b.

=

-- 1.5

809

+

1.58

09 +

--6.

6178

+

6.61

78 +

--6.

6178

+

6.61

78 +

--3.

7846

+

3.78

46 +

……+

+

--4.

0528

=

4.05

28 =

--43

.263

843

.263

8

Mos

t Pr

obab

le S

tep

= h

M

ost

Prob

able

Ste

p =

h s

tep

3.1B

step

3.1

B=

=

--39

.083

639

.083

6(D

ecis

ion

is S

tep

3.1

B

(Dec

isio

n is

Ste

p 3

.1B

““A

nn

oun

cin

g P

rese

nt

Res

earc

hA

nn

oun

cin

g P

rese

nt

Res

earc

h”” ))

Page 18: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

18

Res

ults

(Cl

assi

ficat

ion

Accu

racy

)Res

ults

(Cl

assi

ficat

ion

Accu

racy

)

Clas

sific

atio

n Ac

cura

cy (

Ove

rall)

Clas

sific

atio

n Ac

cura

cy (

Ove

rall)

554

Step

Uni

ts u

sed

for

554

Step

Uni

ts u

sed

for

‘‘ trai

ning

trai

ning

’’ the

sys

tem

(80

% o

f en

tire

data

)th

e sy

stem

(80

% o

f en

tire

data

)13

8 St

ep U

nits

use

d fo

r 13

8 St

ep U

nits

use

d fo

r ‘‘ te

stin

gte

stin

g ’’th

e sy

stem

(20

% o

f en

tire

data

)th

e sy

stem

(20

% o

f en

tire

data

)

No.

of F

eatu

res

Acc

urac

y(R

aw F

requ

ency

)A

ccur

acy

(Inf

orm

atio

n G

ain)

2208

(all)

56 %

-10

0051

%70

%70

056

%70

%50

059

%69

%30

059

%69

%10

054

%-

Not

e: R

ando

m g

uess

ing

has

an a

ccur

acy

of 1

6.66

% (

NO

T 50

%!)

Not

e: R

ando

m g

uess

ing

has

an a

ccur

acy

of 1

6.66

% (

NO

T 50

%!)

Choo

sing

the

mos

t co

mm

on c

lass

= 2

6%

Choo

sing

the

mos

t co

mm

on c

lass

= 2

6%

Page 19: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

19

Res

ults

(Cl

assi

ficat

ion

Accu

racy

)Res

ults

(Cl

assi

ficat

ion

Accu

racy

)

Clas

sific

atio

n Ac

cura

cy (

Each

Ste

p U

nit)

Clas

sific

atio

n Ac

cura

cy (

Each

Ste

p U

nit)

Num

ber

of f

eatu

res

= 7

00N

umbe

r of

fea

ture

s =

700

Rank

ed b

y Ra

nked

by

Info

rmat

ion

Gai

nIn

form

atio

n G

ain

mea

sure

mea

sure

Accu

racy

(ov

eral

l) =

70%

Accu

racy

(ov

eral

l) =

70%

Clas

sSt

ep 1.

1St

ep 1.

2St

ep 2.

1bSt

ep 3

.1bSt

ep 3.

2St

ep 3.

3St

ep 1.

12

(43

%)

40

01

0St

ep 1.

20

17 (7

7 %

)0

04

1St

ep 2.

1b0

21

(17

%)

02

1St

ep 3.

1b0

00

34 (9

2 %

)3

0St

ep 3.

20

20

225

(66

%)

9St

ep 3.

30

10

28

17 (6

1 %

)

Not

e: C

lass

ifica

tions

cor

resp

ond

with

CAR

S M

odel

N

ote:

Cla

ssifi

catio

ns c

orre

spon

d w

ith C

ARS

Mod

el ‘‘ m

oves

mov

es’’

(Acc

urac

y=88

% w

hen

usin

g (A

ccur

acy=

88%

whe

n us

ing

‘‘ sec

ond

opin

ion

seco

nd o

pini

on’’ ))

Page 20: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

20

Res

ults

(In

the

cla

ssro

om)

Res

ults

(In

the

cla

ssro

om)

A A ‘‘ W

indo

ws

Win

dow

s ’’In

terf

ace

Inte

rfac

eTo

ena

ble

rese

arch

ers,

tea

cher

s an

d st

uden

ts t

o To

ena

ble

rese

arch

ers,

tea

cher

s an

d st

uden

ts t

o us

e th

e sy

stem

it n

eeds

to

be e

asily

acc

essi

ble

via

use

the

syst

em it

nee

ds t

o be

eas

ily a

cces

sibl

e vi

a a a

‘‘ win

dow

sw

indo

ws ’’

inte

rfac

ein

terf

ace

A A ‘‘ w

indo

ws

win

dow

s ’’sy

stem

has

bee

n bu

ilt u

sing

the

sy

stem

has

bee

n bu

ilt u

sing

the

pr

ogra

mm

ing

lang

uage

PER

L 5.

6 an

d PE

RL/

prog

ram

min

g la

ngua

ge P

ERL

5.6

and

PERL

/ Tk

Tk

Page 21: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

21

Res

ults

(In

the

cla

ssro

om)

Res

ults

(In

the

cla

ssro

om)

Mat

eria

ls S

elec

tion

by N

onM

ater

ials

Sel

ectio

n by

Non

-- Nat

ive

Teac

her

Nat

ive

Teac

her

““ The

dec

isio

ns a

re f

ast.

The

deci

sion

s ar

e fa

st. ””

““ It

is s

impl

e an

d ea

sy t

o co

mpl

ete

the

task

.It

is s

impl

e an

d ea

sy t

o co

mpl

ete

the

task

. ””““ I

rel

y to

o m

uch

on t

he s

oftw

are

and

stop

fee

ling

I re

ly t

oo m

uch

on t

he s

oftw

are

and

stop

fee

ling

like

doin

g th

e an

alys

is m

ysel

f.lik

e do

ing

the

anal

ysis

mys

elf.

””

Com

men

tsCo

mm

ents

1/7

1/7

2/7

2/7

Erro

rsEr

rors

28 m

in.

28 m

in.

(1 m

in. f

or a

naly

sis

plus

(1

min

. for

ana

lysi

s pl

us

time

to c

heck

res

ults

)tim

e to

che

ck r

esul

ts)

100

min

.10

0 m

in.

Tim

e to

com

plet

e ta

sks

Tim

e to

com

plet

e ta

sks

Usi

ng S

yste

mU

sing

Sys

tem

By h

and

By h

and

Sele

ctio

n of

7 t

exts

Se

lect

ion

of 7

tex

ts

from

10

text

cor

pus

from

10

text

cor

pus

Page 22: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

22

Res

ults

(In

the

cla

ssro

om)

Res

ults

(In

the

cla

ssro

om)

Text

Ana

lysi

s by

Non

Text

Ana

lysi

s by

Non

-- Nat

ive

Stud

ent

Nat

ive

Stud

ent

““ ItIt’’ s

ver

y fa

st.

s ve

ry f

ast.

””““ T

he s

truc

ture

is n

ow v

ery

clea

r.Th

e st

ruct

ure

is n

ow v

ery

clea

r.””

““ The

sys

tem

has

cle

arly

ana

lyze

d th

e st

ruct

ure,

Th

e sy

stem

has

cle

arly

ana

lyze

d th

e st

ruct

ure,

w

hat

you

shou

ld d

o is

cor

rect

onl

y th

e pa

rt t

hat

is

wha

t yo

u sh

ould

do

is c

orre

ct o

nly

the

part

tha

t is

st

rang

e. S

o th

e w

ork

is li

ttle

.st

rang

e. S

o th

e w

ork

is li

ttle

. ””

Com

men

tsCo

mm

ents

0/4

0/4

2/4

2/4

Erro

rsEr

rors

15 m

in.

15 m

in.

(1 m

in. f

or a

naly

sis

plus

(1

min

. for

ana

lysi

s pl

us

time

to c

heck

res

ults

)tim

e to

che

ck r

esul

ts)

38 m

in.

38 m

in.

Tim

e to

com

plet

e ta

sks

Tim

e to

com

plet

e ta

sks

Usi

ng S

yste

mU

sing

Sys

tem

By h

and

By h

and

Sele

ctio

n of

4 t

exts

Se

lect

ion

of 4

tex

ts

from

10

text

cor

pus

from

10

text

cor

pus

Page 23: Lashkia Automatic Identification of Organizational Structure in … · 2014. 9. 1. · 1 Automatic Identification of Organizational Structure in Writing . using Machine Learning

23

Conc

lusi

ons

Conc

lusi

ons

A co

mpu

ter

syst

em w

as d

evel

oped

to

A co

mpu

ter

syst

em w

as d

evel

oped

to

anal

yze

text

str

uctu

rean

alyz

e te

xt s

truc

ture

Lear

ning

met

hod:

Le

arni

ng m

etho

d: ‘‘ S

uper

vise

d Le

arni

ngSu

perv

ised

Lea

rnin

g ’’Ac

cura

cy 7

0% (

88%

whe

n us

ing

seco

nd o

pini

on)

Accu

racy

70%

(88

% w

hen

usin

g se

cond

opi

nion

)

Syst

em e

rror

s co

rres

pond

ed w

ith C

ARS

Syst

em e

rror

s co

rres

pond

ed w

ith C

ARS

Mod

el

Mod

el ‘‘ m

oves

mov

es’’

Effe

ctiv

e in

the

cla

ssro

om f

or u

se b

y Ef

fect

ive

in t

he c

lass

room

for

use

by

teac

hers

and

stu

dent

ste

ache

rs a

nd s

tude

nts

Run

s in

Win

dow

s en

viro

nmen

tRun

s in

Win

dow

s en

viro

nmen

t