
Clinical Psychology Applications of Automated Syntax Analysis

Tova Brooks
[email protected]

Advisor: Dr. Mitch Marcus

April 18, 2008

Abstract

In this research we analyze the contribution of automatic syntax analysis to the field of clinical psychology. Using data from a cognitive study on patients with dementia, sentence grammar structures are extracted from transcribed speech samples and are analyzed for patterns of speech complexity. These are used to explore correlations between cognitive impairment due to differing diagnoses of dementia and speech syntax. The hypothesis is that patients suffering from dementia affecting executive functioning will show significantly less use of syntactic phenomena which require a high level of logical organization. The method chosen is to employ tools that identify specific sentence structures and their rates of occurrence, and use these results to identify significant differences between the group with dementia and the control group. The results support this hypothesis, finding significant differences for certain syntactic patterns.

1 Introduction

Syntax analysis, a field within computational linguistics, is used to identify features of speech that can provide information about the author and tone of a textual segment. To accomplish this, sophisticated techniques are required, including analysis of content word choice, of sentences as a whole, and of within-sentence word relationships.

This kind of analysis has many applications. One example is using syntax to determine sentiment in written text. Popular web sites such as Amazon.com and Yahoo! allow users to write reviews about products that appear on their site. A sentiment analysis tool could create a simple and concise way to view these free-form pieces of text, by automatically labeling them as positive or negative reviews (Pang, Lee and Vaithyanathan, 2002). Thus, automated syntax analysis can be used to streamline online textual information, an increasing concern with the continual growth of digitally available data.


The research presented here is intended to employ syntax analysis tools to approach a problem within clinical psychology, by developing a model to identify correlations of linguistic deficits in a particular manifestation of frontotemporal dementia, one which has previously been defined primarily by social and organizational difficulties. Some forms of dementia include disorganized speech as a symptom; in those cases, however, the speech patterns are easily recognizable by a doctor during a conversation with the patient. We propose here that syntax may be affected by FTD in ways that are not easily discernible, so that stronger tools are required to notice them and simply listening to a patient is not enough.

In this research we focus on frontotemporal dementia; however, these methods can be further applied to research problems in many different areas.

2 Related Work

2.1 Text Categorization

Previous work in textual analysis has focused on categorization of large amounts of textual data into preset categories. One application is determining the author or genre of a piece of text. Classification by overall sentiment, positive or negative valence, is another example. Research by Pang, Lee and Vaithyanathan (2002) focused on categorization of online movie reviews into one of two distinct groups, positive or negative reviews, using a number of different artificial intelligence methods. The task was found to be significantly harder than that of topic classification. Machine learning classifiers nonetheless achieved higher accuracy than rules that humans thought to be important in this type of classification (prompting the question of reliance on prior intuitions).

One of the difficulties arises from the source of the data. These short reviews often use a complex stylistic structure to indicate an overall opinion of a product or movie, such as pointing out the positive elements of the product before declaring that it really is a terrible investment. Since the methods currently used analyze each sentence as a separate unit, it is difficult to extract an overarching opinion.

2.2 Countertransference and Language

Dahl and Teller (1978) raised the idea that one could extract syntactic cues from speech during psychoanalysis sessions, and use these to gain insight into the therapist's underlying mental state during the session. This mental state is part of a Freudian phenomenon called countertransference, which is an important aspect of therapy in Freudian analysis.

In their research, Dahl and Teller used hand-annotated, transcribed sessions between a specific patient and his analyst. Since hand annotation is difficult and time consuming, we avoided this method for our data set and focused on automated tools for speech tagging and parsing. This allows for flexibility in the amount of data and the speed of analysis.


2.3 Cognitive Impairment and Language

Experimental studies involving an oral task to establish cognitive abilities are often the focus of cognitive psychology research. These studies involve a significant amount of speech data, and are likely to benefit from techniques similar to those presented here. Automatic syntax analysis can produce a consistent and objective measure of features of speech output. This enables the experimenters to create a linguistic profile of the subjects, and to observe patterns and trends that can help in diagnosis and in defining a disorder's properties, while side-stepping such issues as between-experimenter discrepancies and observer biases.

A study was conducted which measured syntactic complexity of transcribed speech, where patients with mild cognitive dementia were asked to retell a story from memory (Roark, Mitchell and Hollingshead, 2007). This study differed from previous studies in that automatic parsers were used to parse transcribed spoken samples, rather than requiring a human eye. The hypothesis was that subjects with Mild Cognitive Impairment (MCI) would show less grammatically complex speech than healthy subjects, as evaluated by objective measures of sentence complexity; syntactic complexity could then be considered a diagnostic criterion. Two different measures of syntactic complexity, each a product of parser derivation, were used for evaluation.

The Charniak parser (Charniak, 2000) showed a high correlation between findings using automatic parsing and those using manual parsing. However, the results of the study were not consistent with the hypothesis: the syntactic complexity differences were not in agreement across tests, and in one test the results were contradictory. A possible reason for this outcome may be differences in the complexity metrics used. However, the importance of this work is in its demonstration of the contribution of automatic parsing and use of natural language processing techniques. That is, it was shown that the automatic parsers produced results very similar to those using manual annotation, even though the results were not as hypothesized.

2.4 Frontotemporal Dementia and Language

In this research we focus on a similar experiment done on patients with various forms of frontotemporal dementia (FTD). The goal is to locate a variety of patterns in speech, and thereby construct a suite of measures with which to compare the study subjects from the two sample groups. This differs from the work on MCI, where a single ordinal value was assigned to denote the syntactic complexity of each transcribed session. A profile is built by identifying features of speech whose rates of occurrence differ between the sample groups with statistical significance. These features can then lead to the training of an automated classifier that will be able to assign a new subject to one of the groups.

The chief advantage of the current work over prior research is the ability to generate important findings using automated techniques, whereas previous work focused on hand-annotated data. Although the data set used here was not large, each step of the process was developed to optimize usability in future cases with larger data sets. Thus, using computational tools we can promote research in the field of clinical psychology, and allow experiments on a larger scale to be analyzed easily and efficiently.

3 Technical Approach

3.1 Data

The data set for analysis was collected from audio recordings of a neurological experiment done by Ash, Moore, Antani, McCawley, Work and Grossman (2006). The subjects of this experiment were 35 patients diagnosed with frontotemporal dementia (FTD) and 10 healthy individuals. FTD is a group of disorders with common symptoms, primarily certain behavioral, social and cognitive deficits (Farmer and Grossman, 2005). Subtypes of FTD are identified by clusters of symptoms that co-occur, affecting specific areas of functioning. These symptoms tend to be very similar to those of other disorders, making diagnosis in a clinical setting very difficult.

The subjects of the experiment belonged to one of four categories: progressive nonfluent aphasia (PNFA), semantic dementia (SemD), nonaphasic dementia with a disorder of social comportment and executive functioning (Soc/Exec), and controls of matching age and education who do not suffer from dementia. PNFA manifests primarily in impaired fluency in speech. Patients with PNFA often attempt to verbalize their intent but are not able to construct coherent sentences that follow grammatical rules. SemD patients suffer from difficulty with verbal expression as well; however, for them the difficulty is semantic. Often, they cannot match a word to the item they are speaking about, and their speech is scattered with filler words like "stuff" and "thing". Those in the PNFA and SemD categories were not included in the presented research, both because their disorders are already linked to linguistic deficiencies and because their speech deviates so far from normative grammatical usage that using the parser would not result in helpful information.

Soc/Exec patients are primarily characterized by inappropriate behavior, lack of insight into social interactions, decreased attention and an inability to conceptually organize complex events. The hypothesis studied here is that these symptoms may be manifested in a linguistic context as well. This may include a difficulty in describing a sequence of actions, or in constructing a sentence with multiple related clauses.

Forty-five subjects participated in the study, of which 21 are relevant to our paper (11 Soc/Exec, 10 Normal). In the remainder of the paper, we will refer to the Soc/Exec patients as FTD patients, since this is enough to differentiate them from the control group.

Each participant in the study was given an illustrated children's picture book without text, titled "Frog, Where Are You?", and was asked to narrate the story depicted in the pictures as if he or she were reading it to a child. The narratives were recorded and transcribed by trained transcribers, using a consistent transcription method. This method was intended to capture as much detail as possible about the subject's speech, including coding for pauses and mispronunciations.


3.2 Data Preparation

3.2.1 Dysfluencies

The method of transcribing the text was chosen to incorporate as much data about the patient as possible. Certain codes were used to indicate pauses of different lengths, and mispronunciations and stuttering were faithfully transcribed. However, not only were these nuances irrelevant to our purely syntactic analysis, such extraneous data could confuse the parser and produce incorrect parse trees as a result. Additionally, for similar reasons, speech dysfluencies must be removed from the corpus. These include, for example, instances where a person changes the word they are using mid-word, stutters, or repeats a word while constructing the remainder of the sentence.

This is an important expedient to aid in the parsing process; when tools are available that can handle these dysfluencies, we will no longer need to remove them from the corpus. There is a tagging system that can be used to exclude certain pieces of text in the corpus from being parsed. However, since there were multiple forms of extraneous data, and the total corpus was fairly small, we chose to clean the data by hand. This proved to be the simplest way to assure that the data was in the appropriate format. In the future, it may be necessary to employ a different method of transcription or to develop an automated dysfluency removal system, assuming that the entire corpus used a uniform method of transcribing these dysfluencies.
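To illustrate what such an automated cleanup pass might look like, the following is a minimal sketch. The transcription codes it strips (bracketed annotations and hyphen-marked word fragments) are hypothetical stand-ins rather than the actual conventions used in these transcripts, and the repetition rule only handles immediate word repeats.

import java.util.regex.Pattern;

/** Minimal sketch of a dysfluency/transcription-code stripper (hypothetical codes). */
public class DysfluencyCleaner {
    // Hypothetical transcription codes: bracketed annotations such as [pause].
    private static final Pattern BRACKETED_CODE = Pattern.compile("\\[[^\\]]*\\]");
    // Word fragments transcribed with a trailing hyphen, e.g. "clim- climbs".
    private static final Pattern WORD_FRAGMENT = Pattern.compile("\\b\\w+-\\s");
    // Immediate word repetitions, e.g. "the the frog".
    private static final Pattern REPEATED_WORD =
            Pattern.compile("\\b(\\w+)(\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    public static String clean(String line) {
        String s = BRACKETED_CODE.matcher(line).replaceAll(" ");
        s = WORD_FRAGMENT.matcher(s).replaceAll(" ");
        s = REPEATED_WORD.matcher(s).replaceAll("$1");
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("And the the boy [pause] clim- climbs up in a tree"));
        // prints: And the boy climbs up in a tree
    }
}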

3.2.2 Sentence Boundaries

There are an additional number of preparatory steps required to format the data before using the automated parser. First it is necessary to find sentence boundaries by which to divide the corpus into individual sentences. Parsing occurs at the sentence level, and relationships between sentences are not considered. At first glance this task does not seem too difficult; however, a closer look shows that in English there are no clear rules for sentence breaks, as punctuation can occur in a number of roles. MXTerminator is a probabilistic maximum entropy model developed by Reynar and Ratnaparkhi (1997) to accomplish this task. This system has a high performance version which is given rules about the English language to assist in training, and shows 98.8% accuracy on the WSJ corpus (Reynar and Ratnaparkhi, 1997).

Due to the nature of the data, it was easy enough to split the corpus into single-sentence lines manually while cleaning the corpus of dysfluencies, as discussed above. However, given clean data, MXTerminator can be run to the same end, which is useful for larger sets of data. In our case of 21 short files, the need was not as significant.

3.2.3 Part-of-Speech Tagging

Sentences are then tokenized using a tokenizer available online, written by Robert MacIntyre, to produce Penn Treebank tokenization. This is a sed script which splits off contractions and punctuation marks as separate tokens.
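The sed script itself is not reproduced here; the following Java sketch illustrates the kind of transformation Penn Treebank tokenization performs, splitting off punctuation and common contractions. It covers only a few of the actual script's rules and is intended as illustration only.

/** Minimal sketch of Penn Treebank-style tokenization (illustrative, not the actual sed script). */
public class PtbTokenizerSketch {
    public static String tokenize(String sentence) {
        String s = sentence;
        // Split off clause and sentence-final punctuation as separate tokens.
        s = s.replaceAll("([,;:?!.])", " $1 ");
        // Split common contractions: "boy's" -> "boy 's", "doesn't" -> "does n't".
        s = s.replaceAll("(?i)(\\w)'s\\b", "$1 's");
        s = s.replaceAll("(?i)n't\\b", " n't");
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(tokenize("The boy's frog doesn't stay in the jar."));
        // prints: The boy 's frog does n't stay in the jar .
    }
}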


On this output we use a part-of-speech tagger, MXPOST, to assign each word in a sentence to the appropriate basic syntactic category (Ratnaparkhi, 1996). MXPOST is a maximum entropy model, trained on a large corpus that has been hand-annotated with part-of-speech (POS) tags. Each word has multiple possible POS tags, generated from its appearances with those tags in the training corpus. During training, the model generates feature sets for each word, along with the joint probability of the history of assignments within that sentence and the probability of each possible tag. This is an oversimplification of a complex process, but those details are beyond the scope of this paper.

During tagging, each sentence is evaluated as a single unit. All possible tag assignments are generated for each word in the sentence. A tag is then chosen which has the highest conditional probability given the history of tagging up to that point in the sentence, the same kind of history as created during the training step. There is an iterative step, part of the max-ent paradigm, but the details of this process will be omitted from this discussion. The result of running MXPOST on a corpus is that each token is appended with a POS tag of high probability given the context of the sentence.
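For illustration, the tagger's output pairs each token with a tag. For the first example sentence parsed in section 3.3, the output would be expected to look roughly as follows; the word_TAG format and the particular tags shown are illustrative assumptions rather than verified output from the corpus:

And_CC the_DT little_JJ boy_NN climbs_VBZ up_RP in_IN a_DT tree_NN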

The model used to tag the FTD data was trained on Wall Street Journal data. Ratnaparkhi found very high accuracy, 96.6%, on his test data using this model. The expectation was that, because the source of the data in the present research was speech rather than the written word, we would encounter some difficulties with accurate POS tagging. Upon inspection the results appeared relatively successful; however, the data was not hand-annotated for correct POS tags and therefore the exact accuracy is not known. A further discussion of this difficulty will follow in section 3.7.

Correct parser output relies on accurate tags, and therefore by observing the parser performance, discussed below, we were able to evaluate the performance of the preparatory steps as well.

3.3 Automatic Parsing

Parsing is the process by which we assign grammatical structures to ordered sets of words. Parsing natural language is simple and natural for humans; however, it is far from trivial for computers. A parser requires a formal grammar by which to make decisions, as well as a decision making algorithm and a method for resolving ambiguities.

There is a tradeoff in the choices made when designing a parser: the ideal parser requires as little information as possible about the grammar to start off with, and returns the correct syntactic structure as often as possible. Currently, parsers are built to learn information about the grammar using a sufficiently large text base, called a training corpus, which has been hand-annotated. The parser then performs a probabilistic analysis to produce the grammatical rules that are necessary for it to parse a sentence in a way that matches the hand annotation. The parser retains only the information necessary for analysis, and is thus more efficient.

The challenges a parser faces primarily involve resolving ambiguities. An example problem sentence contains a relative clause that may modify any one of a number of nouns in the sentence. These are feats that the human mind adapts to doing from a young age; however, computational linguists must come up with mathematically complex algorithms for doing this automatically.

This project uses Collins' head-driven statistical parser (Collins, 1999). This model uses a top-down derivation that breaks the sentence into smaller parts and assumes independence between them, parsing each separately. So even though the model is lexicalized, the amount of data required is smaller, since the word strings are shorter. The general idea involves identifying the head of each sequence and generating the probabilities for the head, all of its left modifiers and all of its right modifiers. The specifics of the algorithm and the additional steps required to choose the most likely derivation are beyond the scope of this paper.
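In simplified form, and omitting the distance, subcategorization and stop conditioning of the full model, the generation of a constituent with parent label P, head word h, head child H, left modifiers L_1 ... L_m and right modifiers R_1 ... R_n decomposes roughly as

P(L_m ... L_1 H R_1 ... R_n | P, h) ≈ P_h(H | P, h) · ∏_{i=1..m} P_l(L_i | P, H, h) · ∏_{j=1..n} P_r(R_j | P, H, h)

where each factor is estimated from the training corpus; this is what allows the head, its left modifiers and its right modifiers to be generated separately, as described above.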

The output of the parser is a text file with one sentence per line, where each sentence is formatted as a parse tree. The parser breaks the sentence down into pieces of a formal grammar and builds a tree representing the relationships between these pieces. The leaves of the tree are the tokens comprising the sentence. The raw output represents these relationships by using brackets as delimiters. In the remainder of this paper, when presenting parser output we will format it for readability, using indentation to reflect the depth of a constituent within the tree.

Running the parser on the data the first time resulted in many mistakes. The first iteration of cleaning the data had left the punctuation as it was originally transcribed. However, some punctuation marks were used as transcription codes and not in their correct grammatical function; sections of a sentence that were inaccurately separated by commas were considered to be clauses and evaluated under this assumption. These marks had to be changed before running the parser again. Once the corpus was corrected and the parser was run again, the resulting output was surprisingly accurate. This was surprising because the parser is trained on Wall Street Journal text, which is very different from transcribed speech. Writers can spend time looking over their text multiple times to assure that ambiguities are minimized, whereas speech is constructed 'on the fly', with a lot of room for errors. One such complicated sentence that the parser correctly parsed from the corpus is from an FTD patient's narrative: "And the little boy climbs up in a tree, looks in this hole in the tree to see if the frog is in there." This multi-part sentence was accurately parsed as shown in this parse tree:

(FTD 992)
((S (CC And)
    (NP-SBJ-A (DT the) (JJ little) (NN boy))
    (VP (VP-A (VBZ climbs)
              (PRT (RP up))
              (PP-CLR (IN in)
                      (NP-A (DT a) (NN tree))))
        (, ,)
        (VP-A (VBZ looks)
              (PP-LOC-PRD (IN in)
                          (NP-A (NP (DT this) (NN hole))
                                (PP-LOC (IN in)
                                        (NP-A (NP (DT the) (NN tree))
                                              (, ,)
                                              (SBAR (WHNP-0 (-NONE- 0))
                                                    (S-A (NP-SBJ-0 (-NONE- *T*))
                                                         (VP (TO to)
                                                             (VP-A (VB see)
                                                                   (SBAR-A (IN if) ...

For lack of space the remainder of the SBAR is not presented; however, the parser output is accurate on that clause as well.

Despite this performance, the parser still ran into problems on some lines. The reasons stated above are enough to throw off the parser; add to this the fact that the speakers suffer from FTD, and the utterances are perhaps more likely to contain ambiguities and rare constructions that confuse the parser. As an example, this is a line from an FTD subject, as parsed by the parser: "Here we go with the owl flying overhead and the little boy's out, still in the yard." (FTD 2543)

((S (ADVP-TMP (RB Here))
    (NP-SBJ-A-0 (PRP we))
    (VP (VBP go)
        (PP-CLR (IN with)
                (NP-A (DT the) (NN owl)))
        (S-MNR (NP-SBJ-A-0 (-NONE- *))
               (VP (VBG flying)
                   (NP-A (NP (NN overhead))
                         (CC and)
                         (NP (DT the) (JJ little) (NN boy) (POS 's)))
                   (PP-LOC (IN out)
                           (NP-A (NP (, ,) (RB still))
                                 (PP-LOC (IN in)
                                         (NP-A (DT the) (NN yard))))))))
    (. .)))

Note that part of the parsing trouble comes from the incorrect POS tags. The token "overhead" was tagged as a noun (NN), causing the node "and the little boy" to be joined with "overhead" as an NP conjunction, as opposed to being conjoined with the constituent "the owl". This output does not reflect the meaning intended by the utterance, and trees such as these may mislead some aspects of the analysis presented here.

Given the parse trees from the corpus, we used the raw output without attempting to correct parser errors. Even though this paper works with a small corpus, the hope is that this research can set a path for this type of work on a larger scale in the future. Since no hand annotation was done on the corpus, accuracy for the parser was not calculated, although a sweep over the data indicated that the parser provided correct trees most of the time. Therefore, we assume that the parser data is accurate and see what conclusions become apparent from the data. Without manual corrections, generalizations of this methodology to larger data sets are possible.

3.4 Analysis: Macro View

3.4.1 Detectors

The first method of analyzing the data is to search for clues to the linguistic abilities of the speaker. To uncover syntax phenomena that occur in the data, we use detectors written in Java: pattern matchers that traverse a parse tree and indicate whether it matches a model of a syntactic phenomenon.

When run on a corpus, the detectors return raw counts of the number of sentences which were found to match the pattern. Even though a certain feature could appear multiple times per sentence, we chose a binary method of detection. This allowed us to normalize the counts by number of sentences rather than using a more complex method of normalization. Additionally, some features relate to the sentence as a whole, and having the detectors work in a similar way aided the overall analysis.
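The detectors themselves are not reproduced in this report. The sketch below illustrates the general shape of such a binary, per-sentence detector; its class name is invented, and it uses a simple label test on the bracketed output rather than a full tree traversal.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Illustrative binary per-sentence detector over parser output (one tree per line). */
public class SbarDetectorSketch {

    /** True if the parsed sentence appears to contain a subordinate/relative clause. */
    static boolean matches(String parseTree) {
        // Simplification: a label test on the raw bracketing. This also matches related
        // labels such as SBAR-A and SBARQ, which the real detectors distinguish by
        // actually walking the tree.
        return parseTree.contains("(SBAR");
    }

    public static void main(String[] args) throws IOException {
        List<String> trees = Files.readAllLines(Paths.get(args[0]));
        long hits = trees.stream().filter(SbarDetectorSketch::matches).count();
        // Binary detection per sentence, normalized by the number of sentences.
        double rate = (double) hits / trees.size();
        System.out.printf("SBAR rate: %.3f (%d of %d sentences)%n", rate, hits, trees.size());
    }
}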

This method of analysis is referred to as the macro view, since it is a top-down method. Phenomena that are expected to be relevant are searched for within the data, rather than allowing the data to present trends. This is an important approach, since using a bottom-up method may provide so much detailed data that such patterns are obscured. We will further discuss the other approach later in this work.

3.4.2 Choice of Detectors

The list of possibly relevant cues was generated by two methods: from general knowledge and intuitions about normal linguistic complexity, and from looking at the data. An example of the former is the idea that relative clauses indicate a higher complexity level than a simple declarative. An inability to construct grammatically complex sentences should result in an overuse of simple structures, such as subject-verb-object sentences. This may be a result of a difficulty in incorporating complex ideas and concepts. Usage frequencies of conjunctions, passives and inversions are additional indicators of sentence complexity. The hypothesis tested is that the FTD patients will differ significantly in the syntactic complexity of their speech from the healthy controls, as indicated by the frequency of use of these syntax features.

The following syntax structures were chosen as possibly significant differences between FTD patients' speech and that of the controls:

1. Simple declarative: A simple subject-verb-object construct, without any subordinate or relative clauses. The simplest example of this type of sentence, as stated in the Penn Treebank Manual (Bies, Ferguson, Katz and MacIntyre, 1995), is: Casey threw the ball. This would show up as the following in the parser output:

   (S (NP-SBJ Casey)
      (VP threw
          (NP the ball)))

   (Bies, et al., pp. 15-16)

2. Wh questions: Any sentence that is formatted as a who, what, where, etc. question is counted as a wh-question. These may be more common in data from a different type of experiment, for example a dialogue rather than storytelling. However, we found that some subjects use questions, and we expect that this may be a sign of higher organizational and logical skills. These are indicated by the parser using an SBARQ tag at the root of the sentence or of a subtree which includes the question. (Bies, et al., pp. 22-23)

3. Yes-No questions: Similar to the previous item, these may not be as common in this data. However, if there is a difference in the frequency of usage, this may indicate a lower complexity than the use of wh questions. Yes-No questions are tagged by the parser using the SQ tag. However, the SQ tag is also used to indicate the subject of an SBARQ, a wh-question; therefore the pattern for this detector excludes any SQ under an SBARQ. (Bies, et al., pp. 23-24)

4. Fragment: The parser recognizes when a sentence or clause is not complete, when some important piece is missing. These clauses are then labeled with a FRAG tag. While these may appear in correct speech (e.g., "Yes" as an answer to a question is correct, yet a fragment), a frequent occurrence of FRAGs may be an important indicator of syntax deficiencies. (Bies, et al., pp. 25-26)

5. Top level is not an S-typed category: The parser will assign the most likely tree structure given the input. This detector counts the number of sentences that are rooted with a node which has a category other than S, SQ or SINV. The root of the tree establishes what type of utterance is being parsed. The sentence may be a fragment yet still include a full constituent, as opposed to a sentence that is cut off midway and therefore incomplete. An example from the corpus of a non-S-rooted sentence is taken from an FTD subject describing the scene illustrated on the page: "It's a nice woodland scene. The rocks and some plants." The second sentence actually serves as additional detail modifying the first sentence, and thereby forms a complete constituent. However, each sentence is considered a separate unit by the parser, so the parse tree for the second sentence looks like:

   ((NP (NP (DT The) (NNS rocks))
        (CC and)
        (NP (DT some) (NNS plants))
        (. .)))

6. "And" as a leading modifier: In English, it is incorrect for the first token of the sentence to be a modifier. However, in speech it occurs quite often that one begins a sentence with "And".

7. "But" as a leading modifier: The same as the previous item, with "But" as the leading modifier.

8. Passive sentences: In a passive structure, the object of the verb is missing, and actually refers to the subject of the sentence. In parser output we have an indexed null trace (NP *) placed after the verb phrase, whose index is matched with the index of the subject. We hypothesize that this structure indicates a higher linguistic complexity than that of the simple declarative. (Bies, et al., pp. 69-70)

9. Subject conjunctions: Conjunctions often indicate a more sophisticated idea. Thus, the ability to organize multiple elements within the same utterance may be a significant indicator. A subject conjunction, for example, would be a sentence such as "Tom and Julie have marbles". This requires a sense of inter-subject connection, as opposed to someone who says "Tom has marbles. Julie also has marbles". In parser output, a conjunction word is indicated with a POS tag of "CC", and is sandwiched in between the two elements that are being joined (Bies, et al., pp. 26-28). For example, the following sentence from an FTD patient, "Well, he goes to bed and the frog gets out of the jar", is parsed as follows:

   ((S (INTJ (RB Well))
       (, ,)
       (S-A (NP-SBJ-A (PRP he))
            (VP (VBZ goes)
                (PP-DIR (TO to)
                        (NP-A (NN bed)))))
       (CC and)
       (S-A (NP-SBJ-A (DT the) (NN frog))
            (VP (VBZ gets)
                (PP-CLR (IN out)
                        (PP-A (IN of)
                              (NP-A (DT the) (NN jar))))))))

10. NP Conjunction: As in item (9), with noun phrases joined rather than subjects. An example sentence of this would be "Tom has marbles and playing cards". (Bies, et al., pp. 26-28)

11. VP Conjunctions: As in item (9), with verbs joined. For example: "Tom plans to run and swim today". (Bies, et al., pp. 26-28)

12. Singular noun without a determiner: Some subjects were found to use a common noun as a proper noun, rather than precede the noun with a determiner. For example, instead of saying "Tom played with the frog" or "The boy played with the frog", the subject chose "Boy played with frog". This is grammatically incorrect, and may indicate a simplified logic about the subjects, and therefore the hypothesis suggests that this is significant.

13. Subordinate and relative clauses: Subordinate clauses are expected to be an excellent indicator of logical complexity and organizational ability. Subordinate clauses allow a speaker to modify or supplement constituents in the independent clause. To coherently construct such a sentence requires the ability to understand the relationship between the dependent term and the element which it adjoins, as well as to avoid getting side-tracked by the information in the dependent clause. These clauses are indicated by the parser using an "SBAR" category. To illustrate this, an example of a line from a control subject: "And there's a little groundhog who's watching all this."

   ((S (CC And)
       (NP-SBJ-A (EX there))
       (VP (VBZ 's)
           (NP-PRD-A (NP (DT a) (JJ little) (NN groundhog))
                     (SBAR (WHNP-0 (WP who))
                           (S-A (NP-SBJ-0 (-NONE- *T*))
                                (VP (VBZ 's)
                                    (VP-A (VBG watching)
                                          (NP-A (PDT all) (DT this))))))))
       (. .)))

   (Bies, et al., pp. 28-29, 172)

14. Incomplete sentence: Some subjects stopped an utterance mid-sentence and started again with a new sentence. These were not removed from the corpus, but rather were indicated by omitting final punctuation. These sentences were counted by finding all those sentences without any form of final punctuation.

3.4.3 Composite Detectors

The above detectors comprised the original list of basic patterns that may differentiate between data from FTD patients and that from the controls. However, to gain an additional level of data, we observed that some of the detectors are actually related to one another, and that it may be interesting to view counts for more complex ideas.

1. Leading modifiers of either kind: ”And” or ”But”.

2. Questions of either kind: Wh- or Yes/No questions.

3. Sentence fragment, indicated either by a FRAG tag or by a top level that is not an S.

4. Conjunctions of type S, NP or VP. Since detectors are built on a binary per-sentence basis, we must include sentences that have multiple types of conjunctions only once.

5. FRAG tag that does not appear in an incomplete sentence.

These detectors were included in the analysis as well.

3.4.4 Analysis of Detector Results

To infer conclusions from the raw numerical counts of the detectors, statistical tests are necessary. However, a number of problems arise when attempting to use standard methods. The sample size of both groups is very small for this type of group comparison. Additionally, nothing is known about the distribution of the two data sets. So a simple comparison of means, or a t-distribution hypothesis test, cannot be used, since that requires an assumption of Gaussian distributions.

We can consider the individual observations of running the 19 detectors, normalized by the sentence count, to be a "well-behaved" 19-dimensional random vector for each subject. "Well-behaved" refers to the property that the expected value of the squared vector magnitude converges, that is, E[|V|²] < ∞. This constraint does not pose a problem in our data. So we represent the data for a specific subject i as the vector

x_i = ( X_1i  X_2i  ...  X_19i )^T

where X_ki is the value of observation k (ranging from 1 to 19) for subject i (ranging from 1 to 11 for the FTD patients).

From these, two 19 x 19 covariance matrices can be created, one for the controls and one for the FTD patients. Each term in the matrix is a scalar value indicating the covariance statistic for a pair of features. Let µ be the expected value of an observation, µ = (1/n) ∑_{i=1}^{n} X_i. Then the covariance matrix is calculated as follows:

(1/n) ∑_{k=1}^{n} (X_k − µ)(X_k − µ)^T

This outer product will be a 19 x 19 covariance matrix. Given these two matrices, we can compare the scalar values and extract significance from those differences.

The problem with this method lies in the sparse data. Yet again, our limitation of about 10 subjects per group makes it difficult to come up with sound numbers. To estimate a 19 x 19 matrix using only about 10 small data sets per sample is too much of a stretch and unlikely to provide useful statistics. Therefore, another method was found that does not make any distribution assumptions and does not suffer as much from the small data set. The non-parametric method we chose to employ is the Mann-Whitney-Wilcoxon test, also known as a rank-sum test (Bickel and Doksum, 1977).

The algorithm for the test requires assigning a rank sum to each sample group, per feature. To find this value, all observations from both the FTD and control groups are ranked in ascending order. Then, all FTD ranks are summed for a group ranking of R_FTD. The same is done for the control group, R_Ctrl. Let n_FTD and n_Ctrl be the sample sizes of the FTD and normal groups respectively. In our case, we know that n_FTD = 11 and n_Ctrl = 10.

Care must be taken when deciding on the specifics of the ranking rule. Specifically, a value duplicated between the two lists may result in different rankings depending on the implementation of the U test. If a k-way tie occurs for a value v, with the first appearance of v having rank r, we assign to each sample group with k′ appearances of the value the rank

(k′ / k) · ∑_{i=r}^{r+k−1} i

This way we assure that the sum of ranks remains constant (equivalently, that U_FTD + U_Ctrl = n_FTD · n_Ctrl) without assigning different ranks to identical observations.

The value of the U statistic is defined as

U = min( R_FTD − n_FTD(n_FTD + 1)/2 ,  R_Ctrl − n_Ctrl(n_Ctrl + 1)/2 )

By observation, the U value must fall between 0 and n_FTD · n_Ctrl. The minimum indicates whether it was the FTD patients who scored lower or the controls. To establish significance, the observed U value is compared, as in a normal t-test, with a minimum critical U. These critical values are found in a table, given a significance level α and the sample sizes. To reject the null hypothesis at an α level of 0.05, with n_FTD = 11 and n_Ctrl = 10, we find U_critical = 26, and with α = 0.1, U_critical = 35. Due to the small sample size, we looked at results at both confidence levels. Thus, for any observation for which the minimal U value is found to be less than U_critical, we reject the null hypothesis and claim that the two distributions are significantly different.
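The rank-sum computation described above is simple to implement; the following sketch (written for this report, not the code actually used) assigns tied observations the average of the tied rank positions, which is equivalent to the proportional rule given earlier, and then computes the minimal U value.

import java.util.Arrays;

/** Sketch of the Mann-Whitney-Wilcoxon U statistic with average ranks for ties. */
public class MannWhitneyU {

    /** Returns U = min(U_FTD, U_Ctrl) for one detector's normalized counts. */
    static double uStatistic(double[] ftd, double[] ctrl) {
        int nF = ftd.length, nC = ctrl.length, n = nF + nC;
        double[] all = new double[n];
        System.arraycopy(ftd, 0, all, 0, nF);
        System.arraycopy(ctrl, 0, all, nF, nC);
        double[] sorted = all.clone();
        Arrays.sort(sorted);

        double rFtd = 0, rCtrl = 0;
        for (int i = 0; i < n; i++) {
            int first = firstIndexOf(sorted, all[i]);             // 0-based position of first tie
            int count = lastIndexOf(sorted, all[i]) - first + 1;  // size of the tie group
            double rank = first + 1 + (count - 1) / 2.0;          // average of ranks first+1 .. first+count
            if (i < nF) rFtd += rank; else rCtrl += rank;
        }
        double uFtd = rFtd - nF * (nF + 1) / 2.0;
        double uCtrl = rCtrl - nC * (nC + 1) / 2.0;
        return Math.min(uFtd, uCtrl);
    }

    static int firstIndexOf(double[] sorted, double v) {
        int i = 0;
        while (sorted[i] < v) i++;
        return i;
    }

    static int lastIndexOf(double[] sorted, double v) {
        int i = sorted.length - 1;
        while (sorted[i] > v) i--;
        return i;
    }

    public static void main(String[] args) {
        // Toy numbers only, to show the call; real input is one normalized count per subject.
        double[] ftd = {0.10, 0.20, 0.20, 0.35};
        double[] ctrl = {0.05, 0.20, 0.30};
        System.out.println(uStatistic(ftd, ctrl));  // prints 5.0 for these toy values
    }
}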

3.4.5 Results

Detector               U statistic   α      Low rank group
Leading But            29.0          0.1    FTD
Passive Sentences      32.5          0.1    FTD
VP Conjunctions        26.5          0.1    FTD
Subordinate clause     34.5          0.1    FTD
Incomplete Sentence    21.0          0.05   Ctrl
Top Level not S        26.5          0.1    Ctrl

The table above presents the U statistics for those detectors that were found to be significant, as well as the α level at which the ranks differed. The complete data is presented in appendix B.

There are a number of detectors that are notably significant. At a 95% confidence level, we observe that the control data contains fewer incomplete sentences than that of the FTD subjects. A possible suggestion for why this may be the case is that the lack of organizational skills correlated with FTD will filter into speech, resulting in a lack of fluidity. An abundance of incomplete utterances is one manifestation of this lack of fluidity. We also note that controls showed fewer non-S-rooted trees. As discussed above, this count is related to the sentence completion count: a non-S-rooted tree indicates a sentence that consists of a complete constituent but is not constructed as a complete sentence.

At a lower confidence threshold, there are additional findings. FTD patients show a lower count of passive sentences, subordinate clauses and VP conjunctions. These results are in line with the original hypotheses, that more complex structures are less likely to show up in FTD data than in that of normals. Passive sentences require making use of an implied subject, a shift in focus from the primary actor, and an understanding of the scene as a whole, beyond trivial actors and objects. Additionally, use of subordinate clauses involves an organized train of thought that is able to incorporate information into an utterance without detracting from the correct structure of the sentence. VP conjunctions may appear less often in FTD data since one uses a VP conjunction when actions are viewed as being grouped together. Considering that the data source is the telling of a story, the simplest narrative would focus on each action of the story separately; an increase in VP conjunctions might be connected to a more story-telling outlook rather than mere picture describing. These are possible explanations of how these syntactic features may in fact be a function of the social and executive symptoms of this type of FTD.

One detector did not support the original hypothesis: FTD patients showed fewer sentences beginning with the token "But", a phenomenon that was hypothesized to be correlated with FTD.

Most of the detectors were not found to be significantly different between the two groups. One reason may be the sparse data problem again. As the sample sizes shrink, we require a lower U statistic to assert significance. An additional problem may be the experiment methodology: since both groups were asked to speak as if to children, all subjects may have simplified their speech. This is further discussed in the section on challenges.

3.5 Analysis: Micro View

3.5.1 Spines

This approach is a bottom-up method, intended to collect more detailed data from the subjects' speech. The parser breaks up each sentence into subparts. These parts are referred to as the constituents of the sentence. Each constituent has a main word around which it centers, which is the head of that constituent. For every word in the sentence, the biggest constituent that it heads is referred to as its maximal projection.

This analysis focuses on each word and builds a spine starting from the POS tag of that word and working up through all the projections that the word heads, terminating at the maximal projection. An example from a control subject for clarification: "And they should not worry about him anymore". Note that the angle brackets contain the head for each of the constituents:

[[S 〈 worry 〉 And [NP 〈 they 〉 they] [VP 〈 worry 〉 should not [VP 〈 worry 〉 worry [PP 〈 about 〉 about [NP 〈 him 〉 [NP 〈 him 〉 him] [ADVP 〈 any 〉 anymore]]]]] .]]

In the case of the word "worry", the spine proceeds all the way up to S, because that is the maximal projection of "worry". However, the maximal projection of "they" is just the noun phrase (NP).

The tool used to extract the spines was written by Ryan Gabbard, a UPenn graduate student, using the head rules originally made by Magerman and modified by Collins and himself. Each word's spine is built by traversing up the tree, looking for a higher projection of the word, until the maximal projection is reached. An academic paper has not been published by the developers on this tool.
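Since the tool itself is unpublished, the following sketch only illustrates the core traversal, under the assumption that a head word has already been assigned to every constituent by Magerman/Collins-style head rules (which are omitted). The Node class and the tiny tree in main are illustrative, built from the example sentence above.

import java.util.ArrayList;
import java.util.List;

/** Sketch of spine extraction, assuming head words are already assigned to each node. */
public class SpineSketch {

    static class Node {
        final String label;     // e.g. "NP", "VP", "VB"
        final String headWord;  // head word assigned by Magerman/Collins-style head rules
        final Node parent;
        Node(String label, String headWord, Node parent) {
            this.label = label; this.headWord = headWord; this.parent = parent;
        }
    }

    /** Spine of a word: its POS tag plus every ancestor it heads, up to the maximal projection. */
    static List<String> spine(Node posNode) {
        List<String> path = new ArrayList<>();
        Node current = posNode;
        while (current != null && current.headWord.equals(posNode.headWord)) {
            path.add(current.label);
            current = current.parent;
        }
        return path;
    }

    public static void main(String[] args) {
        // Tiny fragment of the example above: S headed by "worry", its subject NP headed by "they".
        Node s   = new Node("S", "worry", null);
        Node np  = new Node("NP", "they", s);
        Node vp  = new Node("VP", "worry", s);
        Node prp = new Node("PRP", "they", np);
        Node vb  = new Node("VB", "worry", vp);
        System.out.println(spine(prp));  // [PRP, NP]
        System.out.println(spine(vb));   // [VB, VP, S]
    }
}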

3.5.2 Analysis of Results

The complete results of the spine analysis for the two groups are presented in appendix C. The tables below present the spines that showed differences between the FTD and the control groups. The significant spines are grouped by type of maximal projection, as these are the varying expansions that one may choose when constructing a sentence, if one thought in terms of parse trees. By noting the differences in counts of spine types, we can view trends in the specific constituent formats that are used by the subjects.

Spine Type       Total count   FTD   Control   Difference
NNS –> NP        295           169   126       43
EX –> NP         75            52    23        29
POS –> NP        41            27    14        13
DT DEM –> NP     26            18    8         10

NNP –> NP        247           33    214       181
PRP –> NP        674           304   370       66

Table 3.5.2.1: NP Spines

The table contains the differing NP (noun phrase) headed paths that occurred in the data. The first group comprises paths that occurred more frequently in the FTD patients; the second group occurred more frequently in the controls. Controls use many more proper nouns (NNP) than the FTD patients. In the context of telling a story, this may be because characters were given names that were used throughout the narrative. On the other hand, we can suggest that FTD patients avoided names and rather reintroduced the characters with each new sentence, using demonstrative determiners (DT DEM), such as 'this', 'these' and 'that'. Additionally, FTD patients used more expletive NPs, such as 'there are', most likely for similar reasons.

Spine Type                Total count   FTD   Control   Difference
VBZ –> VP –> S            53            39    14        25
VBG –> VP –> S            31            24    7         17

VBD –> VP –> S            67            19    48        29

VBZ –> VP –> S –> SBAR    392           249   143       106
VBG –> VP –> S –> SBAR    293           163   130       33
VBP –> VP –> S –> SBAR    141           79    62        17

VBD –> VP –> S –> SBAR    321           96    225       129
VB –> VP –> S –> SBAR     227           105   122       17

Table 3.5.2.2: S and SBAR Spines

The main clause and SBAR paths showed a difference in the use of tense between the subject groups, as shown in Table 3.5.2.2. FTD patients used progressive forms (VBG) and the simple present much more, whereas the controls tended to use the past tense (VBD) as well. When telling a story, the past tense allows the narrator to create a continuous fabric of events. A primary use of -ing verbs may reflect a separation of the events from each other and portray a less coherent story line.


Spine Type   Total count   FTD   Control   Difference
DT           1677          972   705       267
CC           838           448   390       58
VBZ          208           130   78        52
VBP          82            48    34        14
CD           46            29    17        12

NNP          46            6     40        34
JJ           345           156   189       33
IN           120           47    73        26
VBD          149           53    96        43

Table 3.5.2.3: POS Spines

The data above shows the counts for words that did not project to a higher level, and therefore appear as a part-of-speech tag alone. FTD patients show a higher number of determiners (DT), which goes along with an increased use of common nouns rather than proper nouns. As in the clause data, we see among the FTD patients an increased use of present tense verbs, whereas the controls show more past tense verbs.

An interesting finding is the coordinating conjunctions (CC), which appear more frequently in the FTD data. However, we observed in the data that FTD patients often begin sentences with "And", which will be tagged as a CC despite not functioning as an actual conjunction. A possible explanation for FTD patients' overuse of the leading modifier "And" is that it makes up for a lack of continuity in their narrative, as described above. Rather than connecting the utterances by subject name and a progression of events, they may fabricate a discourse context by using conjunction words to connect each sentence with its predecessor.

In conclusion, the path analysis seems to be very indicative of differing speech patterns. The explanations are conjectures, attempting to connect what is already known about FTD symptoms with the observed syntax phenomena. Within the context of the experiment, the decomposition of the parse trees into maximal projection spines seems to contribute nicely to the understanding of the linguistic manifestations of a Soc/Exec FTD.

3.6 Data Classifier using Machine Learning

3.6.1 Machine Learning Method

An additional tool that may be useful is training a classifier on data of a known source (i.e., control or FTD) and testing it on data from an unknown source. A classifier's ability to tag a new subject relies on the training data containing enough information distinguishing between the groups to establish good rules for the model. The features used for the classifier were the spine counts and detector counts discussed above. The intention is to train on all data that is available and let the machine learning algorithm determine the weights of these features that maximize the likelihood of a correct prediction.

This project made use of mallet, a machine learning toolkit for language (McCallum, 2007). The mallet classifier training uses an iterative, maximum entropy process to create a model that is most likely to make correct predictions. This is done by repeatedly testing on the training data and 'tweaking' the current classifier in the direction that will result in more correct classifications. This continues until the change made by an additional step is negligible.

Each subject's data in the training set generates two feature vectors, one for the 'true' case and one for the 'false' case. In this context, we will use 'true' for FTD and 'false' for normal, or control. The feature vector includes the counts for each feature, that is, each spine or detector type, for that subject. The names of the features in the 'true' vector are appended with '_pos', and those in the 'false' vector are appended with '_neg'. This is so that a feature can have different weights depending on whether it is contributing to a positive instance or a negative instance.

At each iteration of the training, constraints are collected from the training data. The constraints are represented by a set of feature-count pairs. For each subject, the elements in the feature vector that correspond to the correct classification are added to the constraints set. For example, if the first subject is hand-labeled as an FTD and has the feature vector {spine1_pos = 3, spine2_pos = 4}, the constraints will now be {spine1_pos = 3, spine2_pos = 4}. If the next subject is also an FTD, with a feature vector {spine1_pos = 2, spine3_pos = 5}, the new constraints will be {spine1_pos = 5, spine2_pos = 4, spine3_pos = 5}. If a subject is a control, his '_neg' features will be added to the constraints.

Next, for each subject in the training set, the current classifier calculates a confidence level for classifying the subject as 'true'. Let X_pos be the feature vector for the true classification and w be the weight vector for the feature set; then the probability of 'true' is found by the following formula:

P[X = FTD] = e^(X_pos · w) / (e^(X_pos · w) + e^(X_neg · w))

The product of the probabilities assigned by the current model to the correct label for each subject is calculated. This value, between 0 and 1, indicates the accuracy of the model's total predictions. If all predictions were made correctly, with probability 1, this product would be 1; therefore, the goal is to maximize this value. Since the product of a series of decimal figures gets small very fast, we use the log of the likelihoods. Due to the problem of overfitting, we add an additional penalty based on the size of the weight vector.

The expectation vector is then found as E = P(FTD) · X_pos + (1 − P(FTD)) · X_neg. This is like the expected value of an assignment to the subject. Using this vector, mallet finds the gradient along which the model needs to progress in order to approach the ideal model. This gradient is found by subtracting the expectation vector from the vector for the true label. The weights are then adjusted by mallet to move along this gradient.
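The computation just described can be made concrete with a schematic gradient step. The sketch below follows the formulas in the text but is not mallet's actual implementation; it treats the '_pos' and '_neg' features as two parallel vectors over the same index space, and it omits the regularization penalty and the convergence test.

/** Schematic maxent probability and gradient step, following the formulas in the text. */
public class MaxEntStepSketch {

    static double dot(double[] x, double[] w) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += x[i] * w[i];
        return s;
    }

    /** P[X = FTD] = exp(Xpos . w) / (exp(Xpos . w) + exp(Xneg . w)) */
    static double pFtd(double[] xPos, double[] xNeg, double[] w) {
        double ePos = Math.exp(dot(xPos, w));
        double eNeg = Math.exp(dot(xNeg, w));
        return ePos / (ePos + eNeg);
    }

    /** One step along the gradient: (vector for the true label) minus (expectation vector). */
    static void update(double[] w, double[] xPos, double[] xNeg, boolean isFtd, double stepSize) {
        double p = pFtd(xPos, xNeg, w);
        double[] trueVector = isFtd ? xPos : xNeg;
        for (int i = 0; i < w.length; i++) {
            double expectation = p * xPos[i] + (1 - p) * xNeg[i]; // E = P*Xpos + (1-P)*Xneg
            w[i] += stepSize * (trueVector[i] - expectation);
        }
    }
}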

These steps are repeated until a model is created such that an additional iteration will not produce a more accurate model. We then call this model a classifier and can use it to create predictions on new subjects.

Due to the small amount of data available for training, a cross-validation method was used. A model was trained on every subset of the data which excluded exactly one subject, and was then tested only on the excluded subject. The alternative of training on 60% or 90% of the data would result in a model that is overfitted to those cases; in general, training on a small sample does not result in a competent classifier. The results presented in the next section are the confidence levels of classification for each of the 21 subjects.
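A leave-one-out loop of the kind described is straightforward to write; in the sketch below, the Trainer and Model interfaces are placeholders standing in for the mallet training and classification calls, not actual mallet types.

import java.util.ArrayList;
import java.util.List;

/** Sketch of the leave-one-out evaluation; Trainer and Model are placeholder interfaces. */
public class LeaveOneOutSketch {

    interface Model { boolean predictFtd(double[] features); }
    interface Trainer { Model train(List<double[]> features, List<Boolean> labels); }

    /** Trains on all subjects but one, tests on the held-out subject, and counts agreements. */
    static int countCorrect(Trainer trainer, List<double[]> features, List<Boolean> isFtd) {
        int correct = 0;
        for (int held = 0; held < features.size(); held++) {
            List<double[]> trainX = new ArrayList<>(features);
            List<Boolean> trainY = new ArrayList<>(isFtd);
            trainX.remove(held);   // exclude exactly one subject
            trainY.remove(held);
            Model model = trainer.train(trainX, trainY);
            if (model.predictFtd(features.get(held)) == isFtd.get(held)) correct++;
        }
        return correct;
    }
}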

A potential problem that arises when using this method of classification is that the data provided to the classifier is almost exactly half FTD patients and half controls. This will lead the model to a baseline probability of 0.5 that a person has FTD, whereas we know that in the general population this is far from the truth. However, if we consider that those being tested are already a group selected as likely to be diagnosed with dementia, the 0.5 baseline probability may not be as far off.

3.6.2 Analysis of Results

Subject ID   Type      Prediction   Agreement   Probability
19061        FTD       Normal       No          1.0
19106        FTD       Control      No          0.999
19120        FTD       FTD          Yes         0.999
19151        FTD       FTD          Yes         0.999
19188        FTD       FTD          Yes         0.999
19195        FTD       Control      No          0.986
19216        FTD       FTD          Yes         0.999
19219        FTD       FTD          Yes         0.999
19231        FTD       Control      No          0.999
19234        FTD       FTD          Yes         0.999
19269        FTD       FTD          Yes         0.999
42020        Control   Control      Yes         0.999
42041        Control   Control      Yes         0.999
42058        Control   Control      Yes         0.999
42084        Control   Control      Yes         0.999
42108        Control   FTD          No          0.999
42113        Control   Control      Yes         0.953
42114        Control   Control      Yes         0.999
42135        Control   FTD          No          0.999
42149        Control   Control      Yes         0.999
42180        Control   FTD          No          0.999

Table 3.6.1: Classification Results

This table presents the labels assigned to each subject by a model trained on all the remaining subjects. The most glaring detail about this data is the high probabilities of the assignments. For the FTD patients, the models assigned the correct tag 7 times out of 11, and for the controls the model assigned correctly 7 times out of 10. For both the correct and incorrect labels the model gave probabilities close to 1. This high level of confidence is not typical of a machine learning experiment such as this. It appears that the subjects that had the most influence on the trained models were the outliers. It is then possible that those outliers shifted the feature counts of the model so much that the probabilities became extreme as well.

The result of classifying two thirds of the data correctly is encouraging. Although these results do not provide enough confidence to be able to classify new patients with a model trained on this data, they do support the idea that a competent classifier can be created based on the combination of spine and detector data if enough data is available. The successful labels indicate a significant distinction between the two groups that was difficult to see, even when using statistical tests. However, it is very difficult to train on such a small sample, and real conclusions about a diagnostic model will need to wait until more training data becomes available.

3.7 Challenges

A primary perceived difficulty of this project comes from the sparse data. Only 21 subjects were used for the analysis, with sample sizes of about 10. Human speech varies greatly between people, and a small sample size makes it difficult to draw statistically significant conclusions.

The requirement of hand transcription makes scalability difficult, despite the fact that the remainder of the tasks are done using automated tools. As we encountered in this analysis, it is important to match the transcription method to the type of analysis being done. If that is not possible, a uniform way of indicating information that is not relevant to the syntax analysis is important, so that hand formatting of the data will not be necessary as it was in the present case.

Lastly, the structure of the experiment from which the data was gathered may have a significant effect on the phenomena that occur in the subjects' speech. In this case, the method was to give the subjects a picture book and ask them to read it aloud as though reading it to a child. When people are speaking to children, as well as when they are pretending to speak to children, they tend to simplify their speech. We don't expect children to follow the complex logical structures typically found in a Wall Street Journal article. Therefore, if future experiments are done to collect data for this type of analysis, it is important to take into consideration all confounding variables that may produce inaccurate results.

4 Future Work

The work here can be applied to additional problems with a similar challenge: constructing a speaker profile by identifying syntactic patterns in the speaker's speech. One such case is the work mentioned in the introduction by Dahl and Teller (1978) regarding the use of syntax analysis to evaluate countertransference. Countertransference is a phenomenon of Freudian psychoanalysis, the analyst's side of the patient's transference (Frosh, 2002, p. 99). A psychoanalysis session is an intense dialogue where the analyst directs the patient to introspection and to confronting his or her own unconscious thoughts and repressed memories. The process is an intense one for the patient, and the bond with the analyst becomes emotionally charged. This projection of emotions onto the psychotherapist is what Freud originally coined transference. The therapist too may develop strong feelings in response to the patient, stemming from the analytic process, and these constitute the countertransference. The system presented here can be fairly easily modified to assist in this problem. As discussed by Dahl and Teller, a therapist's true feelings may be present in the subconscious, and syntactic expression may be a way to express these emotions in a safe way. Syntax analysis may present information that could not be retrieved using simple hand annotation, because of the personal biases and lack of objectiveness of the psychoanalyst.

Additionally, there are future steps for exploring the topic of this paper further. The results presented here were successful overall, but the detectors on their own did not provide as much information about the specific phenomena as had been hypothesized. This could be a function of the choice of detectors, so a future step could include developing a more complete suite of detectors (a sketch of one possible addition follows below). The difficulty of a small data set challenges us yet again, since using the data itself to suggest important structures would result in overfitting and inaccurate conclusions. If another data set is collected, detectors derived from combing the current data set could be run on the new set and then properly tested for significance.
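The sketch below shows one such additional detector: it counts SBAR (subordinate clause) nodes in Treebank-style bracketed parser output. It is only an illustration of the kind of pattern a richer detector suite might include, not the detector code used in this project.

    # Illustrative detector sketch: count subordinate clauses (SBAR nodes) in a
    # Treebank-style bracketed parse. The example sentence is made up; NLTK's
    # Tree class is used here only for convenience.
    from nltk.tree import Tree

    def count_sbar(bracketed: str) -> int:
        """Count SBAR nodes in a Treebank-style bracketed parse string."""
        tree = Tree.fromstring(bracketed)
        return sum(1 for sub in tree.subtrees() if sub.label() == "SBAR")

    example = "(S (NP (PRP he)) (VP (VBD said) (SBAR (IN that) (S (NP (PRP she)) (VP (VBD left))))))"
    print(count_sbar(example))  # -> 1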

The maximal projection data was very useful with raw counts. Another approach would be to consider proportions of FTD counts to control counts, or to normalize by the total number of counts, rather than using raw counts. It would be important to see whether the results maintained their significance under these circumstances.
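One way to make this concrete, assuming each subject's counts are scaled by his or her total number of elementary trees before the groups are compared, is sketched below; the notation is illustrative rather than taken from the original analysis.

    % r_{i,s}: per-tree rate of structure s for subject i, with raw count c_{i,s}
    % and total elementary-tree count T_i; F and C denote the FTD and control groups.
    \[
      r_{i,s} = \frac{c_{i,s}}{T_i},
      \qquad
      \rho_s = \frac{\tfrac{1}{|F|}\sum_{i \in F} r_{i,s}}
                    {\tfrac{1}{|C|}\sum_{j \in C} r_{j,s}}
    \]
    % A value of rho_s greater than 1 indicates that structure s is relatively
    % more frequent in the FTD group than in the control group.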

Classification still remains somewhat unsolved. Although two-thirds accuracy is not bad, the probabilities the model assigned to its incorrect labels were very high. Thus, the current data does not appear to be enough to train a successful model for diagnosing new subjects. It would be interesting to see what additional data could be added to each subject's feature set to aid the training process. Again, as mentioned many times, the expectation is that more data would greatly improve these results.

5 Conclusion

The topic of this project arose from a unique blend of interests: using computational methods to solve real-world problems, and a love for psychology and the study of people. Without previous experience in linguistics, I undertook the project with the hope that, through the lens of computer science, I could deepen my understanding of linguistics and psychology. An important aspect of the project was the use of real data. It is always easier to use data that is prepared for the specific task, but it is impossible to appreciate the difficulty of a task until one encounters the noise and deviations that invariably occur in this type of research. The ability to deal with these difficulties is an important skill that I believe was further developed through this project.

Additionally, this research incorporates a number of different fields within computer science. To complete the various parts of this paper, I had to learn about parsing, speech transcription, linguistics and machine learning. This presented a new challenge at every turn, but also provided me with exposure to research I had never encountered before. The fact that Penn is a center for computational linguistics contributed immensely to my progress, and I benefited from those with years of experience in the field.

The research orientation was also a factor in choosing this project. The added motivation of searching for something unknown pushed the project along. At each point during development, when we encountered surprising results, or a lack of results, it was our task to generate hypotheses to explain them. From the start it was not clear whether the results would show any significant findings; however, there was excitement in discovering how to show it one way or the other. The results presented here suggest that a publishable paper may follow this report, as they are of interest to the academic community.

I had hoped to hone skills that would allow me to translate my programming and algorithmic abilities to fields outside computer science that could benefit from them. I have enjoyed the interaction between these separate arenas and would like to pursue the path of applying computational methods to real-world problems. Additionally, I learned about linguistics and parse trees, and how to take advantage of the tools that have already been developed to deal with these structures.

This project suited my interests and what I had hoped to gain from a research experience of this magnitude. I have gained many skills that will assist me if I choose to continue in this type of research.

6 Acknowledgements

I would like to extend my gratitude to my advisor, Dr. Mitch Marcus, who assisted me in every aspect of this project: beginning with the search for data, continuing through meeting with me each week to patiently explain the challenge of the week, and finally for his continuous excitement about the project as it developed.

I would also like to thank Ryan Gabbard, a graduate student in the computer science department, who taught me everything I wanted to know about machine learning and classifiers, and remained patient every time I needed a clarification.

A special thanks to Dr. Max Mintz for his help in brainstorming project sources, for directing me to Dr. Marcus, for assisting with determining the correct statistical approach to the data analysis, and for his general motivational support.

7 References

Ash, S., Moore, P., Antani, S., McCawley, G., Work, M. and Grossman, M. (2006). Trying to tell a tale: Discourse impairments in progressive aphasia and frontotemporal dementia. Neurology, 55:1405-1413.

This article presents an original study on patients with frontotemporal dementia. The primary authors of the paper are PhDs and MDs in linguistics and psychology. The article was published in Neurology, a scholarly, peer-reviewed journal whose audience is physicians and clinical neurologists. The publication is very recent, and further work on this data and this set of studies is still in progress. These are good indicators for the reliability and quality of the research presented in the article.

This work provided the grounding for this project. The data used for the syntax analysis was that collected by Ash et al. in the above study. Additionally, direct discussions with the authors in light of their work contributed to the completeness and success of this project.

The information presented in the article can be considered objective and reliable, and it was directly relevant to the idea and development of the current research.

Bickel, P. J. and Doksum, K. A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics (pp. 349-363). San Francisco: Holden-Day.

This statistics textbook contributed to the understanding of the non-parametric U statistic used to evaluate results from two sample groups whose distribution is unknown. The book is a popular choice for mathematical statistics. Both co-authors are professors of statistics at the University of California, Berkeley, and both have published many papers in the area of mathematical statistics.

Bies, A., Ferguson, M., Katz, K. and MacIntyre, R. (1995). Bracketing Guidelines for Treebank II Style, Penn Treebank Project. Unpublished.

The Penn Treebank manual is a comprehensive guide to the conventions used in representing parse trees as text. It was used as a guide to the parser output, guiding the design of the detector patterns. This work is unpublished, but it was written by those involved in the Penn Treebank and is intended for use with Treebank parse trees.

Charniak, E. (2000). A Maximum-Entropy-Inspired Parser. Proceedings of NAACL. Retrieved November 12, 2007, from the ACL Anthology database (NAACL). http://acl.ldc.upenn.edu/A/A00/A00-2018.pdf

This scholarly article presents a unique approach to the parsing problem. The work was published in the proceedings of the NAACL conference, run by the Association for Computational Linguistics, the leading professional society for computational linguists. The author has a PhD in computer science from MIT and is currently a professor in the computer science department at Brown University. He has written multiple books on artificial intelligence and computational linguistics, and is one of the most cited authors on parsing. Despite the existence of other parsers, this one is a standard choice in the field due to its high accuracy.

Work by Roark and Hollingshead made use of the Charniak parser for their analysis of patients with Mild Cognitive Impairment. The MCI study was similar in methodology and construction to that presented in this report, so understanding their choice of tools was important. Both the Charniak and Collins parsers show high accuracy, but for the present work Collins' parser was chosen because it has been modified to make use of dash tags and null elements.

Collins, M. (1999). Head-Driven Statistical Models for Natural Language Processing. PhD Thesis. Retrieved October 28, 2007, from http://people.csail.mit.edu/mcollins/publications.html.

Michael Collins is highly regarded in the field in terms of parser development. This paper was Collins' PhD dissertation at the University of Pennsylvania. Although the paper is not recent, and further work on parser development has been done since its publication, Collins' parser is considered an excellent choice in the field. Modifications have been made to the original parser, and the current version can handle dash tags and null elements, which contribute significantly to this project.

The fact that the work was accepted as a PhD thesis, together with the prominence of the author in the field of statistical linguistics, assures the article's reliability and objectivity. The thesis presents a statistical parsing technique based on a top-down approach. The methodology is fully explained, although it is beyond the scope of this report to delve into those details. The contribution of the parser to the syntax-analysis pipeline is invaluable; without the accuracy of this parser, any form of analysis would have been very difficult.

Dahl, H., Teller, V., Moss, D. and Trujillo, M. (1978). Countertransference Examples of the Syntactic Expression of Warded-Off Contents. The Psychoanalytic Quarterly, 47:339-363.

This scholarly article was published in a long-established, peer-reviewed journal, The Psychoanalytic Quarterly. The journal targets psychiatrists from different branches of psychoanalysis. The work is not current, as it was published 30 years ago. However, not much work has been done on the connection between Freudian psychoanalysis and syntax, and because the study's methodology was not technology-based, its age does not negatively impact the relevance of this work. The paper presents original work done by Dahl, a psychoanalytic researcher, and Teller, a linguistics professor in the department of computer science at Hunter College. The authors are each highly regarded in their respective fields.

The original work discussed in this paper used hand annotation and manual syntax analysis to find patterns in speech that reveal underlying emotional states. This paper provided the original incentive to approach the analysis of syntax in transcribed speech of psychology patients. Teller provided us with the data behind their work, as well as additional data; however, due to constraints discussed in this report, analysis of that data was not possible.

Farmer and Grossman (2005). Frontotemporal Dementia: An Overview. Alzheimer's Care Quarterly, 6(3), 225-232.

This article presents a comprehensive summary of what is known about frontotemporal dementia, including the decomposition into subtypes described in this report and a specification of symptoms.

The article was published in a journal about which sparse information is available. However, a primary author of the article is the neurologist who headed the experiment that provided the data for this research. Additionally, a paper of a similar character published in a peer-reviewed journal incorporates much of the same information (see Peele and Grossman, 2007).

The unique contribution of this paper is that it is written in a less technical manner. This made it easy to learn about the subject at the start, when first gaining expertise on frontotemporal dementia.

Frosh, S. (2002). Key Concepts in Psychoanalysis (pp. 99-103). New York: New York University Press.

The author of this book on psychoanalysis is a professor of psychology at Birkbeck College, University of London. The book was used to elucidate the concepts brought forth in the discussion of syntax in psychoanalysis data (see Dahl et al., 1978). The information is a summary of the history of the field rather than original work, and it is intended for readers not well-read in Freudian analysis.

McCallum, A. K. (2002). "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu.

MALLET is an open-source tool, available under the Common Public License. Work on MALLET is supported by numerous grants, including from the Center for Intelligent Information Retrieval and the NSF. MALLET is a popular toolkit that uses a maximum-entropy algorithm for training and using classifiers. This was relevant to the present work when attempting to create a model that could classify new subjects by training on the current data set. MALLET is commonly used in NLP research and was chosen here due to its prevalence in research at the University of Pennsylvania.

Pang, B., Lee, L. and Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, pp. 79-86.

This original research was presented at a scholarly conference held by SIGDAT, a sub-group of the ACL specializing in linguistic data and corpus-based approaches to language processing. The conference is competitive and peer-reviewed, which reinforces the reliability of the source.

This article presents a method of sentiment analysis, a relative of syntax analysis. The authors provide a clear explanation of their study and conclude with an in-depth analysis of the difficulty of the problem.

This work is important as a precedent for classification based on automated text analysis. However, the authors attempted to classify free text such as movie reviews, which proved very difficult due to the way people typically write such reviews. In our work, we chose a different method to side-step some of the difficulties they encountered. This paper informed the analysis of the results in this report.

Peele, J. E. and Grossman, M. (2007). Language Processing in Frontotemporal Dementia: A Brief Review. Language and Linguistics Compass, Volume 1.

This article provides an in-depth summary of the linguistic aspects of frontotemporal dementia. It was published in a peer-reviewed journal that focuses on reviews of current research. One of the two authors is Dr. Grossman, a primary author of the research on which this report is based (see Ash et al., 2006). This paper is the most current work on the subject and provides important information for understanding what is already known about the correlations between linguistic deficits and frontotemporal dementia. This highlights the area that this report deals with: the specific connections between social/executive frontotemporal dementia and syntactic phenomena.

Ratnaparkhi, A. (1996). A Maximum Entropy Model for Part-of-Speech Tagging. Retrieved November 12, 2007, from the ACL Anthology database (Workshops), http://acl.ldc.upenn.edu/W/W96/W96-0213.pdf

The author of this research was a graduate student at the University of Pennsylvania. His work on part-of-speech tagging describes the statistical algorithm behind the tagging tool he developed. Despite the distant date of publication, this tool is still available and is being used at Penn as well as in other research on computational linguistics. The paper was published as part of an ACL workshop. The tool developed in conjunction with the article provided accurate results on the data and contributed to the ability to analyze the data for syntactic structures.

Reynar, J. and Ratnaparkhi, A. (1997). A Maximum Entropy Approach to Identifying Sentence Boundaries. Proceedings of the Fifth Conference on Applied Natural Language Processing.

The authors of this research were graduate students at the University of Pennsylvania. Their paper describes the statistical algorithm behind the sentence-boundary detection tool they developed. The reliability and usefulness of this source are nearly identical to those of Ratnaparkhi (1996). The paper was published in the proceedings of an ACL-hosted conference.

The tool developed in conjunction with the article was not ultimately used in the analysis of data for the current report, because the transcription style of the data required manual changes to the text. However, the tool was incorporated as part of the preparatory pipeline and would be available for use in future analyses.

Roark, B., Mitchell, M. and Hollingshead, K. (2007). Syntactic Complexity Measures for Detecting Mild Cognitive Impairment. Retrieved November 12, 2007, from the ACL Anthology database (Workshops), http://acl.ldc.upenn.edu/W/W07/W07-1001.pdf.

The authors of this paper are a faculty member, a PhD candidate and a staff member of the Center for Spoken Language Understanding at Oregon Health & Science University. The paper was published very recently as part of an ACL workshop.

This work bears many similarities to the work done in this report. A crucial difference, however, was its focus on cross-validating the automated analyses against the manual ones. Additionally, rather than considering a large number of individual syntactic phenomena, this research used two measures that mapped performance to a scalar value, providing the ability to compare subjects, though leaving out potentially important information not captured by those measures.

The paper provided insight into the methodology involved in this type of research. Additionally, its results demonstrated the reliability of using automated methods for text analysis, which lends weight to all that is done in the context of this report.

8 Appendix A: List of Part of Speech Tags

CC    Coordinating conjunction
CD    Cardinal number
DT    Determiner
EX    Existential there
FW    Foreign word
IN    Preposition or subordinating conjunction
JJ    Adjective
JJR   Adjective, comparative
JJS   Adjective, superlative
LS    List item marker
MD    Modal
NN    Noun, singular or mass
NNS   Noun, plural
NNP   Proper noun, singular
NNPS  Proper noun, plural
PDT   Predeterminer
POS   Possessive ending
PRP   Personal pronoun
PRP$  Possessive pronoun
RB    Adverb
RBR   Adverb, comparative
RBS   Adverb, superlative
RP    Particle
SYM   Symbol
TO    to
UH    Interjection
VB    Verb, base form
VBD   Verb, past tense
VBG   Verb, gerund or present participle
VBN   Verb, past participle
VBP   Verb, non-3rd person singular present
VBZ   Verb, 3rd person singular present
WDT   Wh-determiner
WP    Wh-pronoun
WP$   Possessive wh-pronoun
WRB   Wh-adverb

As retrieved on April 16, 2008 from http://www.ling.upenn.edu/courses/Fall_2007/ling001/penn_treebank_pos.html (used due to a broken link from the Penn Treebank Project website).

9 Appendix B: U Statistics for Detectors

Detector                            U statistic   α      Low rank group

Sentence Count                      45.0          —      —
Simple Declaratives                 42.5          —      —
Wh-questions                        43.0          —      —
Yes No Questions                    49.0          —      —
Fragment Tag                        53.0          —      —
Top Level not S                     26.5          0.1    Ctrl
Leading And                         46.5          —      —
Leading But                         29.0          0.1    FTD
Passive Sentences                   32.5          0.1    FTD
S Conjunctions                      49.5          —      —
NP Conjunctions                     41.0          —      —
VP Conjunctions                     26.5          0.1    FTD
NN without Determiner               43.0          —      —
Subordinate clause                  34.5          0.1    FTD
Incomplete Sentence                 21.0          0.05   Ctrl
All Leading Modifiers               51.0          —      —
All questions                       46.0          —      —
All Fragments                       35.5          —      —
All conjunctions                    52.0          —      —
Fragments in Complete Sentences     43.0          —      —

10 Appendix C: Spine Analysis Data

Table 1: Spine Analysis Output

Spine Type                                          Total count   FTD Count   Control Count

Total trees                                         985           529         456
Number excluded                                     32            17          15

Elementary trees rooted by RB
  RB –> PRT                                         1             0           1
  RB –> NP                                          4             1           3
  RB –> ADVP –> S –> SBAR                           4             2           2
  RB                                                235           122         113
  RB –> ADVP                                        420           217         203

Elementary trees rooted by UH
  UH –> NP –> S –> SBAR                             1             0           1
  UH –> NP                                          3             1           2
  UH                                                6             2           4
  UH –> INTJ                                        28            10          18

Elementary trees rooted by WP$
  WP$ –> WHNP                                       1             0           1

Elementary trees rooted by CD
  CD –> QP                                          1             0           1
  CD –> NP                                          29            15          14
  CD                                                46            29          17

Elementary trees rooted by MORE
  MORE                                              3             1           2

Elementary trees rooted by JJ LIKE
  JJ LIKE –> ADJP                                   16            9           7

Elementary trees rooted by JJ
  JJ –> ADJP –> S –> SBAR                           7             1           6
  JJ –> NP                                          21            11          10
  JJ –> ADJP                                        132           53          79
  JJ                                                345           156         189

Elementary trees rooted by VBG
  VBG –> VP –> S –> SBAR –> S –> SBAR               1             0           1
  VBG –> PP                                         1             1           0
  VBG –> NP                                         2             1           1
  VBG –> VP –> S                                    31            24          7
  VBG                                               36            20          16
  VBG –> VP                                         45            32          13
  VBG –> VP –> S –> SBAR                            293           163         130

Elementary trees rooted by MD
  MD –> VP –> S –> SBAR                             2             0           2
  MD                                                51            26          25

Elementary trees rooted by NN
  NN –> NP –> S –> SBAR                             7             1           6
  NN                                                152           78          74
  NN –> NP                                          1857          1092        765

Elementary trees rooted by VBP
  VBP –> VP                                         1             1           0
  VBP –> VP –> S                                    16            12          4
  VBP                                               82            48          34
  VBP –> VP –> S –> SBAR                            141           79          62

Elementary trees rooted by WRB
  WRB                                               1             1           0
  WRB –> WHNP                                       2             1           1
  WRB –> WHADVP                                     38            10          28

Elementary trees rooted by RB CLOSE
  RB CLOSE –> ADJP                                  1             1           0

Elementary trees rooted by RP
  RP                                                1             0           1
  RP –> PRT                                         188           82          106

Elementary trees rooted by IN
  IN –> PP –> NP                                    2             1           1
  IN                                                120           47          73
  IN –> PP                                          1025          567         458

Elementary trees rooted by VBD
  VBD –> VP                                         17            6           11
  VBD –> VP –> S                                    67            19          48
  VBD                                               149           53          96
  VBD –> VP –> S –> SBAR                            321           96          225

Elementary trees rooted by TO
  TO –> VP –> S –> SBAR                             3             2           1
  TO –> PP                                          85            38          47
  TO                                                126           59          67

Elementary trees rooted by VBN
  VBN –> NP                                         1             0           1
  VBN                                               5             2           3
  VBN –> ADJP                                       10            5           5
  VBN –> VP –> S                                    12            6           6
  VBN –> VP                                         18            7           11
  VBN –> VP –> S –> SBAR                            82            39          43

Elementary trees rooted by RBR
  RBR                                               1             0           1

Elementary trees rooted by DT
  DT –> QP                                          2             2           0
  DT –> NP                                          30            18          12
  DT                                                1677          972         705

Elementary trees rooted by VB
  VB –> INTJ                                        1             0           1
  VB –> NP                                          1             1           0
  VB –> VP –> S –> SBAR –> VP –> S –> SBAR          1             1           0
  VB –> VP                                          2             1           1
  VB –> VP –> S                                     4             4           0
  VB                                                28            10          18
  VB –> VP –> S –> SBAR                             227           105         122

Elementary trees rooted by VBZ
  VBZ –> VP –> S –> SBAR –> S –> SBAR               1             0           1
  VBZ –> VP                                         13            6           7
  VBZ –> VP –> S                                    53            39          14
  VBZ                                               208           130         78
  VBZ –> VP –> S –> SBAR                            392           249         143

Elementary trees rooted by RB AS
  RB AS                                             3             0           3
  RB AS –> ADVP                                     4             3           1

Elementary trees rooted by NNS
  NNS –> VP –> S                                    1             1           0
  NNS –> VP –> S –> SBAR                            1             1           0
  NNS                                               10            7           3
  NNS –> NP                                         295           169         126

Elementary trees rooted by JJS
  JJS –> NP                                         1             0           1
  JJS                                               2             2           0

Elementary trees rooted by NNP
  NNP –> ADJP –> NP                                 1             0           1
  NNP –> NP –> S –> SBAR                            2             0           2
  NNP                                               46            6           40
  NNP –> NP                                         247           33          214

Elementary trees rooted by CC
  CC                                                838           448         390

Elementary trees rooted by POS
  POS –> NP                                         41            27          14

Elementary trees rooted by PRP$
  PRP$ –> NP                                        1             1           0
  PRP$                                              197           102         95

Elementary trees rooted by JJR
  JJR                                               1             0           1
  JJR –> ADJP                                       4             2           2

Elementary trees rooted by PDT
  PDT                                               8             2           6

Elementary trees rooted by WP
  WP –> WHNP –> SBAR                                1             1           0
  WP –> NP                                          1             0           1
  WP –> WHNP                                        53            25          28

Elementary trees rooted by WDT
  WDT –> WHNP –> WHPP                               1             0           1
  WDT –> WHNP                                       2             1           1

Elementary trees rooted by EX
  EX –> NP                                          75            52          23

Elementary trees rooted by PRP
  PRP                                               2             1           1
  PRP –> NP –> S –> SBAR                            3             1           2
  PRP –> NP                                         674           304         370

Elementary trees rooted by DT DEM
  DT DEM –> PP                                      1             1           0
  DT DEM –> WHNP                                    23            13          10
  DT DEM –> NP                                      26            18          8
  DT DEM                                            49            20          29
