lashkia automatic identification of organizational structure in … · 2014. 9. 1. · 1 automatic...
TRANSCRIPT
1
Auto
mat
ic I
dent
ifica
tion
of
Auto
mat
ic I
dent
ifica
tion
of
Org
aniz
atio
nal S
truc
ture
in W
ritin
g O
rgan
izat
iona
l Str
uctu
re in
Writ
ing
usin
g M
achi
ne L
earn
ing
usin
g M
achi
ne L
earn
ing
Laur
ence
Ant
hony
and
Geo
rge
V.
Laur
ence
Ant
hony
and
Geo
rge
V. L
ashk
iaLa
shki
aD
ept.
of C
ompu
ter
Scie
nce,
Fac
ulty
of E
ngin
eerin
gD
ept.
of C
ompu
ter
Scie
nce,
Fac
ulty
of E
ngin
eerin
gO
kaya
ma
Univ
. of S
cien
ce, 1
Oka
yam
a Un
iv. o
f Sci
ence
, 1-- 1
1 Ri
dai
Rida
i -- chocho ,
Oka
yam
a, O
kaya
ma
anth
ony@
ice.
ous.
ac.jp
anth
ony@
ice.
ous.
ac.jp
lash
kia@
ice.
ous.
ac.jp
lash
kia@
ice.
ous.
ac.jp
http
://a
ntpc
1.ic
e.ou
s.ac
.jpht
tp:/
/ant
pc1.
ice.
ous.
ac.jp
2
Pres
enta
tion
Out
line
Pres
enta
tion
Out
line
Back
grou
ndBa
ckgr
ound
Rese
arch
Aim
Rese
arch
Aim
Syst
em D
esig
n (O
verv
iew
)Sy
stem
Des
ign
(Ove
rvie
w)
Appl
icat
ion
to R
esea
rch
Abst
ract
sAp
plic
atio
n to
Res
earc
h Ab
stra
cts
Resu
lts (
Accu
racy
)Re
sults
(Ac
cura
cy)
Res
ults
(Ef
fect
iven
ess
in t
he C
lass
room
)Res
ults
(Ef
fect
iven
ess
in t
he C
lass
room
)So
ftw
are
Dem
onst
ratio
nSo
ftw
are
Dem
onst
ratio
nCo
nclu
sion
sCo
nclu
sion
s
3
Back
grou
ndBa
ckgr
ound
Impo
rtan
ce o
f Te
xt S
truc
ture
Impo
rtan
ce o
f Te
xt S
truc
ture
Swal
es (
1981
, 199
0),
Swal
es (
1981
, 199
0),
Carr
ell
Carr
ell (
1982
)(1
982)
Hin
ds (
1982
, 198
3),
Hin
ds (
1982
, 198
3),
Hoe
yH
oey
(199
4), W
inte
r (1
994)
(199
4), W
inte
r (1
994)
Stud
ies
on T
ext
Stru
ctur
eSt
udie
s on
Tex
t St
ruct
ure
TITL
ESTI
TLES
--D
udle
yD
udle
y --Ev
ans
(199
4), A
ntho
ny (
2001
)Ev
ans
(199
4), A
ntho
ny (
2001
)A
BST
RA
CTS
AB
STR
AC
TS--
Ayer
s (1
993)
, Pos
tegu
illo
(199
6)Ay
ers
(199
3), P
oste
guill
o (1
996)
INST
RO
DU
CTI
ON
SIN
STR
OD
UC
TIO
NS
--Sw
ales
(19
90),
Ant
hony
(19
99)
Swal
es (
1990
), A
ntho
ny (
1999
)D
ISC
USS
ION
SD
ISC
USS
ION
S--
Hop
kins
& D
udle
yH
opki
ns &
Dud
ley --
Evan
s (1
988)
Evan
s (1
988)
PA
TEN
TSP
ATE
NTS
--Ba
zerm
an (
1994
)Ba
zerm
an (
1994
)G
RA
NT
PR
OP
OSA
LSG
RA
NT
PR
OP
OSA
LS--
Conn
or &
Co
nnor
& M
aura
nen
Mau
rane
n(1
999)
(199
9)LE
GA
L W
RIT
ING
LEG
AL
WR
ITIN
G--
Bhat
ia (
1993
)Bh
atia
(19
93)
4
Back
grou
ndBa
ckgr
ound
Prob
lem
s w
ith A
naly
zing
Tex
t St
ruct
ure
Prob
lem
s w
ith A
naly
zing
Tex
t St
ruct
ure
We
need
a la
rge
corp
us o
f te
xt d
ata
We
need
a la
rge
corp
us o
f te
xt d
ata
(The
tex
t da
ta m
ust
(The
tex
t da
ta m
ust
‘‘ ACU
RATE
LYAC
URA
TELY
’’ rep
rese
nt w
hat
we
hope
to
repr
esen
t w
hat
we
hope
to
stud
y)st
udy)
We
need
a lo
t of
res
earc
h tim
eW
e ne
ed a
lot
of r
esea
rch
time
(We
mus
t an
alyz
e a
lot
of t
exts
)(W
e m
ust
anal
yze
a lo
t of
tex
ts)
We
need
goo
d va
lidat
ion
and
relia
bilit
y te
sts
We
need
goo
d va
lidat
ion
and
relia
bilit
y te
sts
(Bec
ause
eva
luat
ing
stru
ctur
e ca
n be
ver
y su
bjec
tive)
(Bec
ause
eva
luat
ing
stru
ctur
e ca
n be
ver
y su
bjec
tive)
Mos
t Te
xt S
truc
ture
Stu
dies
are
Mos
t Te
xt S
truc
ture
Stu
dies
are
‘‘ Sm
all S
cale
Smal
l Sca
le’’
5
Back
grou
ndBa
ckgr
ound
Hen
ry e
t al
. (20
01)
Hen
ry e
t al
. (20
01)
40 A
pplic
atio
n Le
tter
s40
App
licat
ion
Lett
ers
Taro
neTa
rone
et a
l. (2
000)
et a
l. (2
000)
2 Ph
ysic
s Res
earc
h Ar
ticle
s2
Phys
ics
Res
earc
h Ar
ticle
s
Conn
or e
t al
. (19
99)
Conn
or e
t al
. (19
99)
34 G
rant
Pro
posa
ls34
Gra
nt P
ropo
sals
Will
iam
s (1
999)
Will
iam
s (1
999)
5 M
edic
al R
esea
rch
Artic
les
5 M
edic
al R
esea
rch
Artic
les
Anth
ony
(199
9)An
thon
y (1
999)
12 C
ompu
ter
Scie
nce
Rese
arch
Art
icle
12
Com
pute
r Sc
ienc
e Re
sear
ch A
rtic
le
Intr
oduc
tions
Intr
oduc
tions
6
Res
earc
h Ai
mRes
earc
h Ai
m
Dev
elop
a C
ompu
ter
Syst
em t
o Pr
oces
s D
evel
op a
Com
pute
r Sy
stem
to
Proc
ess
Text
s an
d An
alyz
e Te
xt S
truc
ture
Te
xts
and
Anal
yze
Text
Str
uctu
re
Auto
mat
ical
lyAu
tom
atic
ally
––A A
‘‘ Mac
hine
Lea
rnin
g Sy
stem
Mac
hine
Lea
rnin
g Sy
stem
’’fo
r te
xt
for
text
st
ruct
ure
stru
ctur
eEa
sy t
o pr
oces
s a
larg
e co
rpus
of
text
dat
aEa
sy t
o pr
oces
s a
larg
e co
rpus
of
text
dat
aFa
stFa
stTh
e an
alyt
ic p
roce
ss w
ould
be
clea
rly d
efin
edTh
e an
alyt
ic p
roce
ss w
ould
be
clea
rly d
efin
edEa
sy t
o te
st t
he r
elia
bilit
y an
d va
lidity
Ea
sy t
o te
st t
he r
elia
bilit
y an
d va
lidity
7
Syst
em D
esig
n (O
verv
iew
)Sy
stem
Des
ign
(Ove
rvie
w)
Mac
hine
Lea
rnin
g: U
nsup
ervi
sed
? M
achi
ne L
earn
ing:
Uns
uper
vise
d ?
Supe
rvis
ed L
earn
ing
Supe
rvis
ed L
earn
ing ’’
??In
Sup
ervi
sed
Lear
ning
,In
Sup
ervi
sed
Lear
ning
,G
ive
the
syst
em a
str
uctu
ral m
odel
(se
t of
cla
sses
)G
ive
the
syst
em a
str
uctu
ral m
odel
(se
t of
cla
sses
)G
ive
the
syst
em e
xam
ples
of
the
mod
elG
ive
the
syst
em e
xam
ples
of
the
mod
elTe
ll th
e sy
stem
wha
t Te
ll th
e sy
stem
wha
t ‘‘ fe
atur
esfe
atur
es’’ i
n th
e ex
ampl
es a
re
in t
he e
xam
ples
are
im
port
ant
impo
rtan
tD
efin
e a
rela
tion
betw
een
the
clas
ses
and
the
Def
ine
a re
latio
n be
twee
n th
e cl
asse
s an
d th
e fe
atur
esfe
atur
esCl
assi
fy n
ew t
ext
exam
ples
by
com
parin
g its
Cl
assi
fy n
ew t
ext
exam
ples
by
com
parin
g its
fe
atur
es w
ith t
hose
in e
ach
clas
sfe
atur
es w
ith t
hose
in e
ach
clas
s
8
Syst
em D
esig
n (O
verv
iew
)Sy
stem
Des
ign
(Ove
rvie
w)
Prob
lem
sPr
oble
ms
We
need
a
We
need
a ‘‘ g
ood
good
’’ mod
el o
f st
ruct
ure
mod
el o
f st
ruct
ure
––Bu
t th
ere
are
man
y m
odel
s of
str
uctu
re in
the
lite
ratu
reBu
t th
ere
are
man
y m
odel
s of
str
uctu
re in
the
lite
ratu
re
We
need
a s
et o
f W
e ne
ed a
set
of
‘‘ labe
led
exam
ples
labe
led
exam
ples
’’––
But
man
y sy
stem
s w
ork
wel
l with
onl
y a
few
labe
led
But
man
y sy
stem
s w
ork
wel
l with
onl
y a
few
labe
led
exam
ples
exam
ples
We
need
a
We
need
a ‘‘ g
ood
good
’’ set
of
feat
ures
set
of f
eatu
res
––Bu
t la
ngua
ge c
onta
ins
a LO
T of
noi
se w
ords
!Bu
t la
ngua
ge c
onta
ins
a LO
T of
noi
se w
ords
!(e
.g. a
, the
, of,
in, a
t, b
ut?,
tho
ugh?
, (e
.g. a
, the
, of,
in, a
t, b
ut?,
tho
ugh?
, ……
))––
Build
ing
a lis
t of
fea
ture
s by
han
d is
infe
asib
leBu
ildin
g a
list
of f
eatu
res
by h
and
is in
feas
ible
We
need
a
We
need
a ‘‘ g
ood
good
’’ rel
atio
n be
twee
n th
e cl
asse
s an
d re
latio
n be
twee
n th
e cl
asse
s an
d th
e fe
atur
esth
e fe
atur
es
9
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Giv
e th
e sy
stem
a s
truc
ture
mod
el:
Giv
e th
e sy
stem
a s
truc
ture
mod
el:
‘‘ Mod
ifie
dM
odif
ied ’’
CAR
S M
odel
(Sw
ales
, 199
0: A
ntho
ny, 1
999)
CAR
S M
odel
(Sw
ales
, 199
0: A
ntho
ny, 1
999)
Mov
e 1
Mov
e 1
Esta
blis
hing
Es
tabl
ishi
ng
1.1
1.1
Cla
imin
g c
entr
alit
yC
laim
ing
cen
tral
ity
a Te
rrito
ry
a Te
rrito
ry
1.2
1.2
Mak
ing
top
ic g
ener
aliz
atio
ns
Mak
ing
top
ic g
ener
aliz
atio
ns
1.3
1.3
Rev
iew
ing
item
s of
pre
viou
s re
sear
chR
evie
win
g it
ems
of p
revi
ous
rese
arch
Mov
e 2
Mov
e 2
Esta
blis
hing
Es
tabl
ishi
ng
2.1
A2
.1A
Cou
nte
r cl
amin
gC
oun
ter
clam
ing
a ni
che
a ni
che
2.1
B
2.1
B
Ind
icat
ing
a g
apIn
dic
atin
g a
gap
2.1
C
2.1
C
Qu
esti
on r
aisi
ng
Qu
esti
on r
aisi
ng
2.1
D
2.1
D
Con
tin
uin
g a
tra
diti
onC
onti
nu
ing
a t
radi
tion
Mov
e 3
Mov
e 3
Occ
upyi
ng
Occ
upyi
ng
3.1
A
3.1
A
Ou
tlin
ing
purp
ose
Ou
tlin
ing
purp
ose
the
nich
eth
e ni
che
3.1
B
3.1
B
An
nou
nci
ng
pres
ent
rese
arch
An
nou
nci
ng
pres
ent
rese
arch
3.2
3.2
An
nou
nci
ng
prin
cipa
l fin
din
gsA
nn
oun
cin
g pr
inci
pal f
ind
ings
3.3
3.3
Eval
uat
ion
of
rese
arch
Eval
uat
ion
of
rese
arch
3.4
3.4
Ind
icat
ing
RA
str
uct
ure
Ind
icat
ing
RA
str
uct
ure
10
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Giv
e th
e sy
stem
exa
mpl
es o
f th
e m
odel
Giv
e th
e sy
stem
exa
mpl
es o
f th
e m
odel
100
Abst
ract
s (I
EEE
Tran
s. o
n PD
S) d
ivid
ed in
to10
0 Ab
stra
cts
(IEE
E Tr
ans.
on
PDS)
div
ided
into
692
labe
led
69
2 la
bele
d ‘‘ S
teps
Uni
tsSt
eps
Uni
ts’’ (
only
exa
mpl
es f
rom
6 c
lass
es)
(onl
y ex
ampl
es f
rom
6 c
lass
es)
554
Step
Uni
ts (
80%
) us
ed f
or
554
Step
Uni
ts (
80%
) us
ed f
or ‘‘ t
rain
ing
trai
ning
’’ the
sys
tem
the
syst
em13
8 St
ep U
nits
(20
%)
used
for
13
8 St
ep U
nits
(20
%)
used
for
‘‘ tes
ting
test
ing ’’
the
syst
emth
e sy
stem
Tell
the
syst
em w
hat
Tell
the
syst
em w
hat
‘‘ feat
ures
feat
ures
’’ to
look
at
to lo
ok a
tAl
l wor
d cl
uste
rs (
chun
ks)
up t
o 5
wor
ds lo
ngAl
l wor
d cl
uste
rs (
chun
ks)
up t
o 5
wor
ds lo
ngPo
sitio
n of
ste
p un
it in
abs
trac
t (i.
e. 1
Posi
tion
of s
tep
unit
in a
bstr
act
(i.e.
1stst
line,
2lin
e, 2
ndndlin
e,
line,
……))
(Red
uce
(Red
uce
‘‘ Noi
seN
oise
’’ in
Feat
ures
)in
Fea
ture
s)Au
tom
atic
ally
ran
k w
ords
by
Auto
mat
ical
ly r
ank
wor
ds b
y ‘‘ im
port
ance
impo
rtan
ce’’ u
sing
:us
ing:
raw
fre
quen
cy,
raw
fre
quen
cy,
Info
rmat
ion
Gai
nIn
form
atio
n G
ain
Use
onl
y hi
gh r
anke
d w
ords
Use
onl
y hi
gh r
anke
d w
ords
11
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
““ In
this
pap
er, w
e pr
opos
e a
new
sys
tem
.In
thi
s pa
per,
we
prop
ose
a ne
w s
yste
m. ””
––1
wor
d ch
unks
1 w
ord
chun
ksin
/ th
is/
pape
r/ w
e/ p
ropo
se/
a/ n
ew/
syst
emin
/ th
is/
pape
r/ w
e/ p
ropo
se/
a/ n
ew/
syst
em
––2
wor
d ch
unks
2 w
ord
chun
ksin
thi
s/ t
his
pape
r /
pape
r w
e/ w
e pr
opos
e/
in t
his/
thi
s pa
per
/ pa
per
we/
we
prop
ose/
pr
opos
e a/
a n
ew/
new
sys
tem
prop
ose
a/ a
new
/ ne
w s
yste
m
––3
wor
ds c
hunk
s3
wor
ds c
hunk
sin
thi
s pa
per
/ th
is p
aper
we/
pap
er w
e pr
opos
e/in
thi
s pa
per
/ th
is p
aper
we/
pap
er w
e pr
opos
e/w
e pr
opos
e a/
pro
pose
a n
ew/
a ne
w s
yste
mw
e pr
opos
e a/
pro
pose
a n
ew/
a ne
w s
yste
m
––……
12
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
““ In
this
pap
er, w
e pr
opos
e a
new
sys
tem
.In
thi
s pa
per,
we
prop
ose
a ne
w s
yste
m. ””
––1
wor
d ch
unks
1 w
ord
chun
ksinin
/ / th
isth
is/ /
pape
rpa
per /
/ w
ew
e / / pr
opos
epr
opos
e / / aa /
/ ne
wne
w/ /
syst
emsy
stem
––2
wor
d ch
unks
2 w
ord
chun
ksin
thi
s/
in t
his/
thi
s pa
per
this
pap
er/
pape
r w
e/
/ pa
per
we/
we
prop
ose
we
prop
ose /
/ pr
opos
e a/
a n
ew/
prop
ose
a/ a
new
/ ne
w s
yste
mne
w s
yste
m
––3
wor
d ch
unks
3 w
ord
chun
ksin
thi
s pa
per
in t
his
pape
r/
this
pap
er w
e/ p
aper
we
prop
ose/
/ th
is p
aper
we/
pap
er w
e pr
opos
e/w
e pr
opos
e a
we
prop
ose
a / p
ropo
se a
new
/ /
prop
ose
a ne
w/
a ne
w s
yste
ma
new
sys
tem
––……
13
Info
rmat
ion
Gai
n (I
G)
Info
rmat
ion
Gai
n (I
G)
jj
pp
DEntropy
c j2
log
)(
1−
≡∑ =
whe
re
whe
re pp
jjis
the
pro
port
ion
of d
ata
(is
the
pro
port
ion
of d
ata
( DD)
in a
cla
ss
) in
a c
lass
jjfr
om
from
th
e se
t of
cla
sses
th
e se
t of
cla
sses
CC..
)(
||
||
)(
),
()
(v
v
wValues
vD
Entropy
DDD
Entropy
wD
Gain
∑∈
−≡
whe
re
whe
re V
alue
sVa
lues
(w(w))is
the
set
of
all p
ossi
ble
valu
es f
or
is t
he s
et o
f al
l pos
sibl
e va
lues
for
w
ord
wor
d w
,w
,an
d an
d DD
vvis
the
sub
set
of
is t
he s
ubse
t of
DDfo
r w
hich
wor
d fo
r w
hich
wor
d ww
has
has
a va
lue
a va
lue
vv ..
14
Info
rmat
ion
Gai
n (I
G)
Info
rmat
ion
Gai
n (I
G)
Proc
ess
310
task
_mig
ratio
n2
9di
ffic
ult
18
not
and
7of
ten
is6
trans
mitt
ing
of5
is_o
ften
in4
diff
icul
t_to
to3
2_ho
wev
era
2ho
wev
erth
e1
Info
rmat
ion
Gai
n (I
G)
Raw
Fre
quen
cyR
ank
15
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Def
ine
a re
latio
n be
twee
n fe
atur
es a
nd c
lass
esD
efin
e a
rela
tion
betw
een
feat
ures
and
cla
sses
––U
se p
roba
bilit
y of
eac
h cl
ass
and
the
prob
abili
ty o
f U
se p
roba
bilit
y of
eac
h cl
ass
and
the
prob
abili
ty o
f fe
atur
es (
clus
ters
) be
ing
in e
ach
clas
sfe
atur
es (
clus
ters
) be
ing
in e
ach
clas
s(( A
NA
A N
AÏÏ V
E BA
YES
Clas
sifie
r)VE
BAY
ES C
lass
ifier
)
Clas
s 1
(Cla
imin
g Ce
ntra
lity)
Clas
s 1
(Cla
imin
g Ce
ntra
lity)
Clas
s 2
(Mak
ing
topi
c ge
nera
lizat
ions
)Cl
ass
2 (M
akin
g to
pic
gene
raliz
atio
ns)
Clas
s 3
(Ind
icat
ing
a ga
p)
Clas
s 3
(Ind
icat
ing
a ga
p)
Clas
s 4
(Out
linin
g pu
rpos
e)Cl
ass
4 (O
utlin
ing
purp
ose)
Clas
s 5
(Ann
ounc
ing
prin
cipa
l fin
ding
s)
Clas
s 5
(Ann
ounc
ing
prin
cipa
l fin
ding
s)
Clas
s 6
(Eva
luat
ion
of r
esea
rch)
Cl
ass
6 (E
valu
atio
n of
res
earc
h)
Clas
s 1:
C
lass
1 P
rob.
Clas
s 1:
C
lass
1 P
rob.
Feat
. 1 p
rob.
Feat
. 2 p
rob.
Feat
. 3 p
rob.
Fe
at. 1
pro
b.
Fe
at. 2
pro
b.
Fe
at. 3
pro
b. ……
Clas
s 2:
C
lass
2 P
rob.
Clas
s 2:
C
lass
2 P
rob.
Feat
. 1 p
rob.
Feat
. 2 p
rob.
Feat
. 3 p
rob.
Fe
at. 1
pro
b.
Fe
at. 2
pro
b.
Fe
at. 3
pro
b. ……
Clas
s 3:
C
lass
3 P
rob.
Cl
ass
3:
Cla
ss 3
Pro
b.
Feat
. 1 p
rob.
Feat
. 2 p
rob.
Feat
. 3 p
rob.
Fe
at. 1
pro
b.
Fe
at. 2
pro
b.
Fe
at. 3
pro
b. ……
Clas
s 4:
C
lass
4 P
rob.
Cl
ass
4:
Cla
ss 4
Pro
b.
Feat
. 1 p
rob.
Feat
. 2 p
rob.
Feat
. 3 p
rob.
Fe
at. 1
pro
b.
Fe
at. 2
pro
b.
Fe
at. 3
pro
b. ……
Clas
s 5:
C
lass
5 P
rob.
Cl
ass
5:
Cla
ss 5
Pro
b.
Feat
. 1 p
rob.
Feat
. 2 p
rob.
Feat
. 3 p
rob.
Fe
at. 1
pro
b.
Fe
at. 2
pro
b.
Fe
at. 3
pro
b. ……
Clas
s 6:
C
lass
6 P
rob.
Cl
ass
6:
Cla
ss 6
Pro
b.
Feat
. 1 p
rob.
Feat
. 2 p
rob.
Feat
. 3 p
rob.
Fe
at. 1
pro
b.
Fe
at. 2
pro
b.
Fe
at. 3
pro
b. ……
16
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Clas
sify
the
str
uctu
re o
f ne
w t
ext
exam
ples
Clas
sify
the
str
uctu
re o
f ne
w t
ext
exam
ples
––Ch
oose
the
mos
t pr
obab
le c
lass
con
tain
ing
the
Choo
se t
he m
ost
prob
able
cla
ss c
onta
inin
g th
e fe
atur
es in
eac
h st
ep u
nit.
feat
ures
in e
ach
step
uni
t.““ 2
thi
s pa
per
is a
n ef
fort
in t
he s
ame
dire
ctio
n2
this
pap
er is
an
effo
rt in
the
sam
e di
rect
ion ““
(Ste
p 3.
1B
(Ste
p 3.
1B --
Anno
unci
ng P
rese
nt R
esea
rch
Anno
unci
ng P
rese
nt R
esea
rch ””
))
Feat
ures
Con
tain
ed in
Tra
inin
g D
ata
Feat
ures
Con
tain
ed in
Tra
inin
g D
ata
––pa
per
(c3)
, pa
per
(c3)
, th
is_p
aper
this
_pap
er(c
4), i
s (c
14)
this
(c1
8) t
he (
c39)
(c4)
, is
(c14
) th
is (
c18)
the
(c3
9)2
(c10
3)
2 (c
103)
is_a
nis
_an
(c36
4) in
(c5
71)
(c36
4) in
(c5
71)
Step
1.1
Pro
b.
=
Step
1.1
Pro
b.
=
-- 2.9
498
+
2.94
98 +
--7.
0449
+
7.04
49 +
--7.
0449
+
7.04
49 +
--4.
3368
+
4.33
68 +
……+
+
--4.
4058
=
4.40
58 =
--48
.769
048
.769
0St
ep 1
.2 P
rob.
=St
ep 1
.2 P
rob.
=-- 1
.839
8 +
1.
8398
+ --
7.48
99 +
7.
4899
+ --
7.48
99 +
7.
4899
+ --
3.85
23 +
3.
8523
+ ……
+
+ --
3.87
90 =
3.
8790
= --
45.5
972
45.5
972
Step
2.1
B Pr
ob.
=
Step
2.1
B Pr
ob.
=
-- 3.1
391
+
3.13
91 +
--6.
9157
+
6.91
57 +
--6.
9157
+
6.91
57 +
--4.
3507
+
4.35
07 +
……+
+
--4.
2076
=
4.20
76 =
--47
.082
647
.082
6St
ep 3
.1B
Prob
.
=
Step
3.1
B Pr
ob.
=
-- 1
.333
5 +
1.
3335
+ --
4.15
66 +
4.
1566
+ --
4.24
36 +
4.
2436
+ --
4.84
97 +
4.
8497
+ ……
+
+ --
3.91
69 =
3.
9169
= --
39.0
836
39.0
836
Step
3.2
Pro
b.
=
Step
3.2
Pro
b.
=
-- 1.8
398
+
1.83
98 +
--6.
3677
+
6.36
77 +
--6.
3677
+
6.36
77 +
--3.
6936
+
3.69
36 +
……+
+
--3.
7837
=
3.78
37 =
--40
.844
840
.844
8St
ep 3
.3 P
rob.
=St
ep 3
.3 P
rob.
=-- 1
.580
9 +
1.
5809
+ --
6.61
78 +
6.
6178
+ --
6.61
78 +
6.
6178
+ --
3.78
46 +
3.
7846
+ ……
+
+ --
4.05
28 =
4.
0528
= --
43.2
638
43.2
638
Mos
t Pr
obab
le S
tep
Mos
t Pr
obab
le S
tep
……
17
Appl
icat
ion
of S
yste
m t
o Ap
plic
atio
n of
Sys
tem
to
Res
earc
h Ab
stra
cts
Res
earc
h Ab
stra
cts
Clas
sify
the
str
uctu
re o
f ne
w t
ext
exam
ples
Clas
sify
the
str
uctu
re o
f ne
w t
ext
exam
ples
––Ch
oose
the
mos
t pr
obab
le c
lass
con
tain
ing
the
Choo
se t
he m
ost
prob
able
cla
ss c
onta
inin
g th
e fe
atur
es in
eac
h st
ep u
nit.
feat
ures
in e
ach
step
uni
t.““ 2
thi
s pa
per
is a
n ef
fort
in t
he s
ame
dire
ctio
n2
this
pap
er is
an
effo
rt in
the
sam
e di
rect
ion ““
(Ste
p 3.
1B
(Ste
p 3.
1B --
Anno
unci
ng P
rese
nt R
esea
rch
Anno
unci
ng P
rese
nt R
esea
rch ””
))
Feat
ures
Con
tain
ed in
Tra
inin
g D
ata
Feat
ures
Con
tain
ed in
Tra
inin
g D
ata
––pa
per
(c3)
, pa
per
(c3)
, th
is_p
aper
this
_pap
er(c
4), i
s (c
14)
this
(c1
8) t
he (
c39)
(c4)
, is
(c14
) th
is (
c18)
the
(c3
9)2
(c10
3)
2 (c
103)
is_a
nis
_an
(c36
4) in
(c5
71)
(c36
4) in
(c5
71)
Step
1.1
Pro
b.
=
Step
1.1
Pro
b.
=
-- 2.9
498
+
2.94
98 +
--7.
0449
+
7.04
49 +
--7.
0449
+
7.04
49 +
--4.
3368
+
4.33
68 +
……+
+
--4.
4058
=
4.40
58 =
--48
.769
048
.769
0St
ep 1
.2 P
rob.
=St
ep 1
.2 P
rob.
=-- 1
.839
8 +
1.
8398
+ --
7.48
99 +
7.
4899
+ --
7.48
99 +
7.
4899
+ --
3.85
23 +
3.
8523
+ ……
+
+ --
3.87
90 =
3.
8790
= --
45.5
972
45.5
972
Step
2.1
B Pr
ob.
=
Step
2.1
B Pr
ob.
=
-- 3.1
391
+
3.13
91 +
--6.
9157
+
6.91
57 +
--6.
9157
+
6.91
57 +
--4.
3507
+
4.35
07 +
……+
+
--4.
2076
=
4.20
76 =
--47
.082
647
.082
6St
ep 3
.1B
Pro
b. =
St
ep 3
.1B
Pro
b. =
-- 1
.333
5 +
1
.333
5 +
--4
.15
66 +
4
.15
66 +
--4
.24
36 +
4
.24
36 +
--4
.84
97 +
4
.84
97 +
……+
+
--3
.91
69 =
3
.91
69 =
--3
9.0
836
39
.08
36St
ep 3
.2 P
rob.
=St
ep 3
.2 P
rob.
=-- 1
.839
8 +
1.
8398
+ --
6.36
77 +
6.
3677
+ --
6.36
77 +
6.
3677
+ --
3.69
36 +
3.
6936
+ ……
+
+ --
3.78
37 =
3.
7837
= --
40.8
448
40.8
448
Step
3.3
Pro
b.
=
Step
3.3
Pro
b.
=
-- 1.5
809
+
1.58
09 +
--6.
6178
+
6.61
78 +
--6.
6178
+
6.61
78 +
--3.
7846
+
3.78
46 +
……+
+
--4.
0528
=
4.05
28 =
--43
.263
843
.263
8
Mos
t Pr
obab
le S
tep
= h
M
ost
Prob
able
Ste
p =
h s
tep
3.1B
step
3.1
B=
=
--39
.083
639
.083
6(D
ecis
ion
is S
tep
3.1
B
(Dec
isio
n is
Ste
p 3
.1B
““A
nn
oun
cin
g P
rese
nt
Res
earc
hA
nn
oun
cin
g P
rese
nt
Res
earc
h”” ))
18
Res
ults
(Cl
assi
ficat
ion
Accu
racy
)Res
ults
(Cl
assi
ficat
ion
Accu
racy
)
Clas
sific
atio
n Ac
cura
cy (
Ove
rall)
Clas
sific
atio
n Ac
cura
cy (
Ove
rall)
554
Step
Uni
ts u
sed
for
554
Step
Uni
ts u
sed
for
‘‘ trai
ning
trai
ning
’’ the
sys
tem
(80
% o
f en
tire
data
)th
e sy
stem
(80
% o
f en
tire
data
)13
8 St
ep U
nits
use
d fo
r 13
8 St
ep U
nits
use
d fo
r ‘‘ te
stin
gte
stin
g ’’th
e sy
stem
(20
% o
f en
tire
data
)th
e sy
stem
(20
% o
f en
tire
data
)
No.
of F
eatu
res
Acc
urac
y(R
aw F
requ
ency
)A
ccur
acy
(Inf
orm
atio
n G
ain)
2208
(all)
56 %
-10
0051
%70
%70
056
%70
%50
059
%69
%30
059
%69
%10
054
%-
Not
e: R
ando
m g
uess
ing
has
an a
ccur
acy
of 1
6.66
% (
NO
T 50
%!)
Not
e: R
ando
m g
uess
ing
has
an a
ccur
acy
of 1
6.66
% (
NO
T 50
%!)
Choo
sing
the
mos
t co
mm
on c
lass
= 2
6%
Choo
sing
the
mos
t co
mm
on c
lass
= 2
6%
19
Res
ults
(Cl
assi
ficat
ion
Accu
racy
)Res
ults
(Cl
assi
ficat
ion
Accu
racy
)
Clas
sific
atio
n Ac
cura
cy (
Each
Ste
p U
nit)
Clas
sific
atio
n Ac
cura
cy (
Each
Ste
p U
nit)
Num
ber
of f
eatu
res
= 7
00N
umbe
r of
fea
ture
s =
700
Rank
ed b
y Ra
nked
by
Info
rmat
ion
Gai
nIn
form
atio
n G
ain
mea
sure
mea
sure
Accu
racy
(ov
eral
l) =
70%
Accu
racy
(ov
eral
l) =
70%
Clas
sSt
ep 1.
1St
ep 1.
2St
ep 2.
1bSt
ep 3
.1bSt
ep 3.
2St
ep 3.
3St
ep 1.
12
(43
%)
40
01
0St
ep 1.
20
17 (7
7 %
)0
04
1St
ep 2.
1b0
21
(17
%)
02
1St
ep 3.
1b0
00
34 (9
2 %
)3
0St
ep 3.
20
20
225
(66
%)
9St
ep 3.
30
10
28
17 (6
1 %
)
Not
e: C
lass
ifica
tions
cor
resp
ond
with
CAR
S M
odel
N
ote:
Cla
ssifi
catio
ns c
orre
spon
d w
ith C
ARS
Mod
el ‘‘ m
oves
mov
es’’
(Acc
urac
y=88
% w
hen
usin
g (A
ccur
acy=
88%
whe
n us
ing
‘‘ sec
ond
opin
ion
seco
nd o
pini
on’’ ))
20
Res
ults
(In
the
cla
ssro
om)
Res
ults
(In
the
cla
ssro
om)
A A ‘‘ W
indo
ws
Win
dow
s ’’In
terf
ace
Inte
rfac
eTo
ena
ble
rese
arch
ers,
tea
cher
s an
d st
uden
ts t
o To
ena
ble
rese
arch
ers,
tea
cher
s an
d st
uden
ts t
o us
e th
e sy
stem
it n
eeds
to
be e
asily
acc
essi
ble
via
use
the
syst
em it
nee
ds t
o be
eas
ily a
cces
sibl
e vi
a a a
‘‘ win
dow
sw
indo
ws ’’
inte
rfac
ein
terf
ace
A A ‘‘ w
indo
ws
win
dow
s ’’sy
stem
has
bee
n bu
ilt u
sing
the
sy
stem
has
bee
n bu
ilt u
sing
the
pr
ogra
mm
ing
lang
uage
PER
L 5.
6 an
d PE
RL/
prog
ram
min
g la
ngua
ge P
ERL
5.6
and
PERL
/ Tk
Tk
21
Res
ults
(In
the
cla
ssro
om)
Res
ults
(In
the
cla
ssro
om)
Mat
eria
ls S
elec
tion
by N
onM
ater
ials
Sel
ectio
n by
Non
-- Nat
ive
Teac
her
Nat
ive
Teac
her
““ The
dec
isio
ns a
re f
ast.
The
deci
sion
s ar
e fa
st. ””
““ It
is s
impl
e an
d ea
sy t
o co
mpl
ete
the
task
.It
is s
impl
e an
d ea
sy t
o co
mpl
ete
the
task
. ””““ I
rel
y to
o m
uch
on t
he s
oftw
are
and
stop
fee
ling
I re
ly t
oo m
uch
on t
he s
oftw
are
and
stop
fee
ling
like
doin
g th
e an
alys
is m
ysel
f.lik
e do
ing
the
anal
ysis
mys
elf.
””
Com
men
tsCo
mm
ents
1/7
1/7
2/7
2/7
Erro
rsEr
rors
28 m
in.
28 m
in.
(1 m
in. f
or a
naly
sis
plus
(1
min
. for
ana
lysi
s pl
us
time
to c
heck
res
ults
)tim
e to
che
ck r
esul
ts)
100
min
.10
0 m
in.
Tim
e to
com
plet
e ta
sks
Tim
e to
com
plet
e ta
sks
Usi
ng S
yste
mU
sing
Sys
tem
By h
and
By h
and
Sele
ctio
n of
7 t
exts
Se
lect
ion
of 7
tex
ts
from
10
text
cor
pus
from
10
text
cor
pus
22
Res
ults
(In
the
cla
ssro
om)
Res
ults
(In
the
cla
ssro
om)
Text
Ana
lysi
s by
Non
Text
Ana
lysi
s by
Non
-- Nat
ive
Stud
ent
Nat
ive
Stud
ent
““ ItIt’’ s
ver
y fa
st.
s ve
ry f
ast.
””““ T
he s
truc
ture
is n
ow v
ery
clea
r.Th
e st
ruct
ure
is n
ow v
ery
clea
r.””
““ The
sys
tem
has
cle
arly
ana
lyze
d th
e st
ruct
ure,
Th
e sy
stem
has
cle
arly
ana
lyze
d th
e st
ruct
ure,
w
hat
you
shou
ld d
o is
cor
rect
onl
y th
e pa
rt t
hat
is
wha
t yo
u sh
ould
do
is c
orre
ct o
nly
the
part
tha
t is
st
rang
e. S
o th
e w
ork
is li
ttle
.st
rang
e. S
o th
e w
ork
is li
ttle
. ””
Com
men
tsCo
mm
ents
0/4
0/4
2/4
2/4
Erro
rsEr
rors
15 m
in.
15 m
in.
(1 m
in. f
or a
naly
sis
plus
(1
min
. for
ana
lysi
s pl
us
time
to c
heck
res
ults
)tim
e to
che
ck r
esul
ts)
38 m
in.
38 m
in.
Tim
e to
com
plet
e ta
sks
Tim
e to
com
plet
e ta
sks
Usi
ng S
yste
mU
sing
Sys
tem
By h
and
By h
and
Sele
ctio
n of
4 t
exts
Se
lect
ion
of 4
tex
ts
from
10
text
cor
pus
from
10
text
cor
pus
23
Conc
lusi
ons
Conc
lusi
ons
A co
mpu
ter
syst
em w
as d
evel
oped
to
A co
mpu
ter
syst
em w
as d
evel
oped
to
anal
yze
text
str
uctu
rean
alyz
e te
xt s
truc
ture
Lear
ning
met
hod:
Le
arni
ng m
etho
d: ‘‘ S
uper
vise
d Le
arni
ngSu
perv
ised
Lea
rnin
g ’’Ac
cura
cy 7
0% (
88%
whe
n us
ing
seco
nd o
pini
on)
Accu
racy
70%
(88
% w
hen
usin
g se
cond
opi
nion
)
Syst
em e
rror
s co
rres
pond
ed w
ith C
ARS
Syst
em e
rror
s co
rres
pond
ed w
ith C
ARS
Mod
el
Mod
el ‘‘ m
oves
mov
es’’
Effe
ctiv
e in
the
cla
ssro
om f
or u
se b
y Ef
fect
ive
in t
he c
lass
room
for
use
by
teac
hers
and
stu
dent
ste
ache
rs a
nd s
tude
nts
Run
s in
Win
dow
s en
viro
nmen
tRun
s in
Win
dow
s en
viro
nmen
t