Quantitative and Network Co-Occurrences Analysis in Literature Teaching, by Luca Cinacchio

Upload: luke-staredsky

Post on 10-Apr-2018


  • 8/8/2019 Quantitative and Network Co-Occurences Analysis in Literature Teaching, by Luca Cinacchio


    Quantitative and Network Co-Occurrences Analysis in Literature Teaching

    Presentation at Mathematica UserGroup Meeting Italia 2010

    by Luca Cinacchio

    Università di Torino, Corso di Laurea in Fisica


    Abstract

    For many high school students, literature is a boring discipline.

    Often, what bores them most is the apparent arbitrariness of the judgments made in the analysis of a book.

    Clear evidence supporting a judgment can be very useful in helping students understand the critics.

    This is where a quantitative analysis of the text can play a useful role.

    2 cinacchio_Quantitative and Network Co Occurences Analysis in Literature Teaching.nb


    Teaching Literature only with a traditional approach...

    For many high school students, literature is a boring discipline.

    Often, what bores them most is the apparent arbitrariness of the judgments made in the analysis of a book.

    For example, analyzing a novel that tells the story of a family during the birth of their country, a critic can say: "The book is a celebration of the family and its values."

    And a student, especially one who has not read the book, may think: "Why is the book a celebration of the family? Why is it not a celebration of the roots of this family's country?"


    ...and also with a quantitative approach

    Clear evidence supporting a judgment can be very useful in helping students understand the critics.

    This is where a quantitative analysis of the text can play a useful role.

    For example, to the judgment "The book is a celebration of the family and its values" we can add some quantitative information, such as "in fact, the word 'family' is the most frequent one in the text". This can be a good starting point in helping the student understand why that judgment was expressed.


    The past...

    Quantitative analyses of texts were carried out before computers existed, but they took a long time.

    Even the simplest analysis, the ranking of occurrences (how many times each word occurs, and where), required people to patiently compile large numbers of index cards and annotate each occurrence.


    ...and the present

    With the computer everything has become simpler: many different kinds of quantitative analysis can be done in just a few seconds (or a fraction of a second!) using software dedicated to the task.

    Many different quantitative analyses can be performed on a text, and there is a huge bibliography on the subject.


    Resources

    The availability of a large number of electronic literary texts has increased the attractiveness of quantitative approaches: it is now easy to find on the internet a good collection of the major works of any classic author.

    For Italian literature, a good starting point is the Progetto Manuzio at the URL http://www.liberliber.it/biblioteca/.

    There you will find a large collection of Italian classics. The books can be downloaded in different formats: plain text (txt), HyperText Markup Language (HTML) or Acrobat (pdf).

    For use with the utilities provided in this paper, I recommend the txt format.


    Playing with the text

    What is sometimes underestimated is how helpful this kind of approach can be in school, as a complement to the more traditional one, of course.

    What we are looking for is a data-centric approach to novels, one that makes use of graphs, maps, and charts.

    By doing quantitative analysis on a text, students can feel that they are in charge of the analysis. They become active participants rather than passive subjects who have to blindly trust what is written in the course textbook.

    The need for some kind of comparative norm suggests that counting more than one text will often be required, and the nature of the research will dictate the appropriate comparison text. In some cases, other texts by the same author will be selected, or texts by contemporary authors.

    Having at hand a set of tools that let students perform different kinds of analysis quickly and easily can lead to a playful, almost recreational approach to the text.


    The genesis of MathText

    There are many dedicated programs for quantitative text analysis. Unfortunately, some of those that allow sophisticated analyses, such as co-occurrence networks, are not free.

    Two years ago, two friends of mine wanted to do some quantitative analysis on two different texts: one on the full corpus of the TV series Lost, the other on an obscure old French text, Hypolite by Gabriel Gilbert. The first is still a work in progress; the second resulted in a Tesi di Laurea at the Università di Torino, Facoltà di Lingue e letterature straniere.

    So I wrote in Mathematica a collection of small utilities for quantitative text analysis, and I called them MathText.


    What MathText can do

    Some basic data cleansing operations

    Almost any item, feature, or characteristic of a text that can be reliably identified can be counted. Decisions about what to count can be obvious, problematic, or extremely difficult, and poor initial choices can lead to wasted effort and worthless results. Even careful planning leaves room for surprises, fortunately often of the happy sort that call for further or different quantification.

    For example, counting the articles and including them in the analysis is not very useful: we already know that in an English text the article "the" will be ranked first, and in some analyses, such as co-occurrence networks, it will clutter the graphics.

    So it can be a wise choice to exclude from the analysis words or symbols such as:

    articles
    prepositions (simple and, for Italian, articulated)
    punctuation
    numbers (although for some texts they can be useful)
    conjunctions
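
    As a rough illustration of this kind of exclusion, here is a minimal sketch. It is not the MathText code: the sample sentence and the short stopWords list are invented for the example, and it filters a list of words instead of rewriting the text with string replacements.

```mathematica
(* Invented sample text and a deliberately short, hypothetical stop-word list *)
text = "The family and the nation: the story of a family.";
stopWords = {"the", "and", "of", "a"};

(* Lower-case the text and split it at every run of non-letter characters,
   which removes punctuation and digits at the same time *)
words = StringSplit[ToLowerCase[text], Except[LetterCharacter] ..];

(* Drop every word that appears in the stop-word list *)
kept = DeleteCases[words, Alternatives @@ stopWords]
(* {"family", "nation", "story", "family"} *)
```

    A real analysis would need a much longer list, chosen for the language of the text.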


    What MathText can do (cont.)

    MathText provides simple tools for this kind of basic data cleansing.

    The data cleansing code is very basic and rather oddly written, but this way even Mathematica beginners will be able to customize this section according to their needs.

    Unfortunately, one thing that MathText cannot do is reduce different tenses and/or different persons of the same verb to a common root.

    E.g., mangio and mangiavo (different tenses) will be considered 2 different words.

    E.g., mangio and mangiano (different persons) will be considered 2 different words.

    Writing this kind of tool was beyond my skill. I know that there are utilities that accomplish this task, and I hope that somebody will implement it in a future, better version of MathText.

    Another thing you must be aware of is that, at present, MathText considers the singular and the plural form of a word to be 2 different words.

    E.g., home and homes are counted separately; casa and case (Italian) are counted separately.

    (Here let me open a short digression: as I said, MathText was written for two friends of mine. My skill in Mathematica programming is very basic, so the result is not as professional as it would be if it had been written by one of the Mathematica geeks who are here today. But everybody can share it and, above all, improve it: if you do, please redistribute your improved version!)


    What MathText can do (cont.)

    Count of the words in the text
    Count of the different words in the text
    Index of "vocabulary richness"

    The last one is the ratio of the count of different words to the count of all the words in the text.

    The theoretical maximum of the index is 1, which represents a text where all the words are different.

    It can be useful in comparative analysis: e.g., do all the works of this author have roughly the same index? How does the author's index compare with that of similar authors? And so on.
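
    To make the index concrete, here is a small sketch on an invented sentence. The variable names are chosen for the example and are not those used in MathText.

```mathematica
(* Invented sample sentence, split into lower-case words *)
words = StringSplit[ToLowerCase["the cat saw the dog and the dog saw the cat"]];

total = Length[words];                       (* 11 words in all *)
distinct = Length[DeleteDuplicates[words]];  (* 5 different words *)

(* Vocabulary richness: different words divided by total words *)
richness = N[distinct/total]
(* 0.454545 *)
```

    A value close to 1 means that few words are repeated. Longer texts tend to score lower simply because common words recur, so the comparison is most meaningful between texts of similar length.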


    What MathText can do (cont.)

    A table with all the words contained in the text, in alphabetical order, and the number of occurrences of each word.

    A very large output was generated. Here is a sample of it:

    {{", 1}, {abbagliante, 2}, {abbaglianti, 1}, {abbaiamenti, 3},
     {abbaiando, 3}, {abbaiano, 1}, {abbaiare, 5}, {abbaiava, 3}, {abbai, 3},
     {abbandonare, 2}, {abbandonarla, 1}, {abbandonarono, 1}, {abbandonato, 5},
     {abbandonava, 1}, <<9139>>, {zampe, 5}, {zanne, 3}, {zanzare, 1},
     {zanzariera, 1}, {zeppa, 1}, {zigzag, 1}, {zitto, 15}, {zucchero, 2},
     {zuffolato, 1}, {zuffolava, 1}, {zuffoli, 1}, {zuffolo, 1}, {zuppa, 1}}


    What MathText can do (cont.)

    A ranked table of occurrences (number of occurrences, word, rank)

    A very large output was generated. Here is a sample of it:

    {{844, si, 1}, {829, non, 2}, {608, tremalnaik, 3}, {378, ma, 4},
     {327, era, 5}, {310, kammamuri, 6}, {278, disse, 7}, {268, tu, 8},
     {268, , 9}, {266, pi, 10}, {249, mi, 11}, <<9144>>, {1, abbandon, 9156},
     {1, abbandono, 9157}, {1, abbandoni, 9158}, {1, abbandoner, 9159},
     {1, abbandoneremo, 9160}, {1, abbandonava, 9161}, {1, abbandonarono, 9162},
     {1, abbandonarla, 9163}, {1, abbaiano, 9164}, {1, abbaglianti, 9165}, {1, ", 9166}}
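
    The two tables can be sketched on a toy word list as follows. This is only an illustration with invented data and names, written more compactly than the MathText cell shown later in the notebook.

```mathematica
(* Toy word list standing in for the imported text *)
words = {"b", "a", "b", "c", "a", "b"};

(* Alphabetical table of {word, number of occurrences} *)
counts = Sort[Tally[words]]
(* {{"a", 2}, {"b", 3}, {"c", 1}} *)

(* Ranked table of {number of occurrences, word, rank}, most frequent word first *)
ranked = MapIndexed[{#1[[2]], #1[[1]], First[#2]} &, Reverse[SortBy[counts, Last]]]
(* {{3, "b", 1}, {2, "a", 2}, {1, "c", 3}} *)
```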


    What MathText can do (cont.)

    Co-occurrences with triples

    You select a word. MathText splits the whole text into overlapping triples (units of 3 words), then extracts and presents all the triples that contain the selected word.

    Here is an example with the word "barba":

    {{barba, nera, arruffata}, {barba, occhi, scintillanti}, {barba, nera, ma},
     {barba, nera, occhi}, {barba, nerissima, folta}, {barba, quattro, uomini},
     {barba, grigia, cav}, {piccola, barba, nera}, {nera, barba, occhi},
     {lunga, barba, nera}, {folta, barba, nera}, {quarant'anni, barba, nerissima},
     {mordeva, barba, quattro}, {mare, barba, grigia}, {coperto, piccola, barba},
     {lunga, nera, barba}, {d'una, lunga, barba}, {feroce, folta, barba},
     {statura, quarant'anni, barba}, {si, mordeva, barba}, {lupo, mare, barba}}


    What MathText can do (cont.)

    Co-occurrences with triples (cont.)

    MathText then shows a list of co-occurrences: all the words that occur together with your selected word inside the triples, each with its count.

    Be aware: this produces a sort of "weight" for each word's occurrences relative to your word. Because the triples overlap, a word directly next to your word falls into two triples and is counted twice, while a word that is still in a triple but two positions away from your word is counted only once.

    The first row represents your selected word: its value is again computed from the triples, and is simply the number of triples that contain it. Having your word in the first row can be useful in further computations, to quickly identify which word a given list of lists refers to.

    Here is an example with the same word, "barba":

    {{barba, 21}, {nera, 8}, {occhi, 3}, {folta, 3}, {lunga, 3}, {nerissima, 2},
     {quattro, 2}, {grigia, 2}, {piccola, 2}, {quarant'anni, 2}, {mordeva, 2},
     {mare, 2}, {arruffata, 1}, {scintillanti, 1}, {ma, 1}, {uomini, 1}, {cav, 1},
     {coperto, 1}, {d'una, 1}, {feroce, 1}, {statura, 1}, {si, 1}, {lupo, 1}}
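
    The weighting can be reproduced on a short invented sentence. The sketch below assumes that the counts come from tallying every word in the selected triples, which is consistent with the "barba" figures above. The sentence and variable names are made up for the example.

```mathematica
(* Invented sentence; the selected word is "beard" *)
words = StringSplit["a long black beard covered his chest"];

(* Overlapping triples: window of 3 words, shifted by 1 word each time *)
triples = Partition[words, 3, 1];

(* Keep only the triples that contain the selected word *)
selected = Select[triples, MemberQ[#, "beard"] &];

(* Tally every word in the selected triples, highest count first *)
weights = SortBy[Tally[Flatten[selected]], -Last[#] &]
(* {{"beard", 3}, {"black", 2}, {"covered", 2}, {"his", 1}, {"long", 1}} *)
```

    "black" and "covered", which sit next to "beard", get a count of 2, while "long" and "his", two positions away, get 1.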


    What MathText can do: co-occurrence networks

    Co-occurrence networks are the collective interconnection of terms based on their paired presence within a specified unit of text.

    Rules to define co-occurrence within a text corpus can be set according to the desired criteria.

    The criterion that I used works as follows:

    You select a list of words, chosen according to the hypothesis that you would like to explore.

    Let me give you an example (it's pure fantasy!). Imagine that we are analyzing a corpus of speeches by a politician.

    We can start by creating an occurrences ranking: which words does he use most often?

    We discover that these words are family, nation and communist.

    Now we can use a list of these 3 words to see what connections link them in the politician's speeches.


    What MathText can do: co-occurrence networks (cont.)

    For each occurrence of each word in your list, a "window", or lexical unit, is created from a specified number of words to the left (before) and to the right (after).

    E.g., if you are looking for the word "range" and this word is contained in the sentence

    "It unifies a broad range of programming paradigms"

    and you choose 2 as the parameter for the window (or lexical unit), this list of terms is created:

    {a broad, broad range, range of, of programming}
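
    The window step can be sketched as follows. The function windowPairs is a hypothetical helper written for this illustration, not part of MathText. It assumes that the terms are the pairs of adjacent words inside the window, as in the list above.

```mathematica
(* Hypothetical helper: for every occurrence of target, take n words on each
   side (fewer near the ends of the text) and return the adjacent word pairs *)
windowPairs[words_List, target_String, n_Integer] :=
  Flatten[
    Table[
      Partition[Take[words, {Max[1, p - n], Min[Length[words], p + n]}], 2, 1],
      {p, Flatten[Position[words, target]]}],
    1]

words = StringSplit["It unifies a broad range of programming paradigms"];
windowPairs[words, "range", 2]
(* {{"a", "broad"}, {"broad", "range"}, {"range", "of"}, {"of", "programming"}} *)
```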


    What MathText can do: co-occurrence networks (cont.)

    We can think of a network like this:

    [Figure: network built from the terms extracted around "range"]


    What MathText can do: co-occurrence networks (cont.)

    Now let us assume that our word "range" is also contained in another sentence of our text:

    "There are many things inside range strongly connected with love"

    This time we can think of a network like this:

    [Figure: network built from the terms extracted around "range" in the second sentence]


    What MathText can do: co-occurrence networks (cont.)

    If our whole text consisted of these two sentences, and our analysis were limited to the word "range", the final network would look like this:

    [Figure: the two networks joined at the node "range"]


    What MathText can do: co-occurrence networks (cont.)

    This was a really simple example. What is done in practice is a little more complex.

    In fact, we also look for links between the words contained inside the extracted lexical units.

    So, imagine that we have one more sentence in our corpus:

    "[...] inside broad range [...]"

    Now the network will look like this:

    [Figure: the previous network with the terms of the third sentence added]


    What MathText can do: co-occurrence networks (cont.)

    As you can see, one more link has been added, between "inside" and "broad".
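
    The three example sentences can be turned into a small network with the sketch below. It is an illustration only, not the MathText code. windowPairs is a hypothetical helper, links are assumed to join adjacent words inside each window, and Graph requires Mathematica 8 or later.

```mathematica
(* Hypothetical helper: adjacent word pairs inside a window of n words
   on each side of every occurrence of target *)
windowPairs[words_List, target_String, n_Integer] :=
  Flatten[
    Table[
      Partition[Take[words, {Max[1, p - n], Min[Length[words], p + n]}], 2, 1],
      {p, Flatten[Position[words, target]]}],
    1]

sentences = {
   "It unifies a broad range of programming paradigms",
   "There are many things inside range strongly connected with love",
   "inside broad range"};

(* Collect the pairs from every sentence *)
pairs = Flatten[windowPairs[StringSplit[#], "range", 2] & /@ sentences, 1];

(* One undirected link per distinct pair; sorting each pair makes
   {"a", "b"} and {"b", "a"} count as the same link *)
links = DeleteDuplicates[UndirectedEdge @@@ (Sort /@ pairs)];

Graph[links, VertexLabels -> "Name"]
```

    With these sentences the graph has nine distinct links. The third sentence contributes only the link between "inside" and "broad", because "broad range" is already present from the first sentence.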


    MathText: the code


    MathTExt

    Some utilities for text analysis
    by Luca Cinacchio - Università di Torino - Corso di Laurea in Fisica

    Importing, data cleansing, exporting and re-importing

    MathTExt works with any plain-text (txt) file. It has been tested on large texts with no problems.

    To get the file into the proper format I used a dirty trick: after importing the file and doing some cleansing, I export it as txt and immediately re-import it with the option "Words".

    Data cleansing is very basic and very inelegant, but this way even Mathematica beginners will be able to customize this section according to their needs.

    If you scroll through the StringReplace list, you will find a section that is entirely inside a (* comment *): these are string-deleting instructions for the ENGLISH LANGUAGE only!

    Be careful, since each word in the list will be erased from the original text. These were the settings my friend used for the example analysis of Lost in this notebook.

    Take care: you must set the full path of your txt file, and also change the path of the exported and re-imported file.

    temp = Import["C:\\mathematicafiles\\salgarimisteri.txt"]; (* change the path to your own; the file should be in ".txt" format *)

    (* Punctuation. A few more single-character rules of the same kind are illegible in this transcript and are not reproduced. *)
    temp = StringReplace[temp, {"," -> "", "." -> " ", ";" -> "", "?" -> ""}];

    (* Upper case A-Z to lower case, digits 0-9 removed *)
    temp = StringReplace[temp, Thread[CharacterRange["A", "Z"] -> CharacterRange["a", "z"]]];
    temp = StringReplace[temp, DigitCharacter -> ""];

    (* START ITALIAN SECTION *)
    (* Each word is erased in its own pass, in list order, like the original chain of StringReplace calls. One last rule of this section is illegible in the transcript. *)
    temp = Fold[StringReplace[#1, " " <> #2 <> " " -> " "] &, temp,
      {"gli", "il", "lo", "la", "le", "i", "che", "a", "a'", "di", "da", "in", "con", "su", "per", "tra", "fra",
       "del", "dello", "della", "delle", "degli", "dei", "al", "allo", "alla", "agli", "alle", "ai",
       "sul", "sullo", "sulla", "sulle", "sui", "sugli", "dal", "dallo", "dalla", "dalle", "dai", "dagli",
       "nel", "nello", "nella", "nelle", "negli", "nei", "e", "ed", "un", "una", "uno"}];
    (* END OF ITALIAN SECTION *)

    (* START ENGLISH SECTION: inside this comment, some string replacements for the ENGLISH LANGUAGE only. Be careful, since each word in the list will be erased from the original text. These were the settings for the example analysis of Lost used in this notebook.
    temp = Fold[StringReplace[#1, " " <> #2 <> " " -> " "] &, temp,
      {"a", "an", "little", "few", "the", "this", "these", "that", "those", "than", "as", "one", "ones", "many", "much",
       "all", "each", "every", "both", "neither", "either", "some", "any", "no", "none", "everyone", "everybody",
       "everything", "else", "anybody", "another", "who", "whose", "whom", "which", "what", "why", "when", "where", "how",
       "'s", "'d", "'ve", "'re", "my", "mine", "yours", "your", "you", "his", "her", "its", "hers", "ours", "our",
       "theirs", "their", "me", "us", "we", "they", "them", "'m", "it", "of", "at", "most", "to", "too", "for", "from",
       "on", "by", "before", "in", "since", "during", "till", "until", "afterwards", "after", "into", "onto",
       "off", "out", "out of", "above", "over", "under", "below", "beneath"}];
    END OF ENGLISH SECTION *)

    (* Final symbols. Here too some rules, apparently whitespace clean-up, are illegible in the transcript. *)
    temp = StringReplace[temp, {"&" -> " ", ":" -> ""}];
    Export["C:\\mathematicafiles\\cleanfile.txt", temp]; (* change with your own path *)

    testo = Import["C:\\mathematicafiles\\cleanfile.txt", "Words"]; (* if you've changed the previous path, substitute the right one *)

    Occurrences

    Execute this cell, and the results will be printed.


    Print["The length of the text is ", Length[testo], " words"]
    tabellaricorrenze = Sort[Tally[Flatten[testo]]];
    vettorericorrenze = Flatten[Table[Part[tabellaricorrenze, i, 2], {i, 1, Length[tabellaricorrenze]}]];
    (* tabellafrequenze: pairs {number of occurrences, number of different words occurring that many times} *)
    tabellafrequenze = Tally[Reverse[Sort[vettorericorrenze]]];
    tabellafrequenze2 = Transpose[{Last /@ tabellafrequenze, First /@ tabellafrequenze}];
    Print["The text contains ", Length[tabellaricorrenze], " different words"]
    Print["The text has a 'vocabulary richness' of ", N[Length[tabellaricorrenze]/Length[testo]], " (1 corresponds to the maximum theoretical index)"]
    Print["Here is the occurrences table. Its data are stored in the variable tabellaricorrenze"]
    tabellaricorrenze
    Print["Here is the frequencies table. Its data are stored in the variable tabellafrequenze"]
    tabellafrequenze

    The length of the text is 49250 words

    The text contains 9166 different words

    The text has a 'vocabulary richness' of 0.186112 (1 corresponds to the maximum theoretical index)

    Here is the occurrences table. Its data are stored in the variable tabellaricorrenze

    A very large output was generated. Here is a sample of it:

    {{", 1}, {abbagliante, 2}, {abbaglianti, 1}, {abbaiamenti, 3},
     {abbaiando, 3}, {abbaiano, 1}, {abbaiare, 5}, {abbaiava, 3},
     {abbai, 3}, {abbandonare, 2}, {abbandonarla, 1}, {abbandonarono, 1},
     {abbandonato, 5}, {abbandonava, 1}, {abbandoneremo, 1}, <<9136>>,
     {zagaglia, 1}, {zampaccie, 1}, {zampe, 5}, {zanne, 3}, {zanzare, 1},
     {zanzariera, 1}, {zeppa, 1}, {zigzag, 1}, {zitto, 15}, {zucchero, 2},
     {zuffolato, 1}, {zuffolava, 1}, {zuffoli, 1}, {zuffolo, 1}, {zuppa, 1}}

    Here is the frequencies table. Its data are stored in the variable tabellafrequenze

    {{844, 1}, {829, 1}, {608, 1}, {378, 1}, {327, 1}, {310, 1}, {278, 1}, {268, 2},
     {266, 1}, {249, 1}, {248, 1}, {235, 1}, {231, 1}, {229, 2}, {215, 1}, {212, 1},
     {193, 1}, {189, 1}, {187, 1}, {185, 1}, {171, 1}, {167, 1}, {166, 1}, {164, 1},
     {163, 1}, {161, 1}, {154, 1}, {148, 1}, {141, 1}, {140, 1}, {138, 1}, {137, 1},
     {134, 2}, {132, 1}, {129, 1}, {126, 1}, {125, 1}, {123, 1}, {119, 1}, {117, 3},
     {114, 1}, {113, 1}, {109, 1}, {107, 1}, {105, 1}, {103, 2}, {100, 2}, {99, 2},
     {98, 3}, {97, 2}, {94, 2}, {92, 4}, {91, 1}, {89, 1}, {87, 1}, {86, 1},
     {85, 1}, {84, 2}, {83, 2}, {82, 1}, {81, 1}, {80, 1}, {79, 3}, {78, 3}, {77, 2},
     {76, 5}, {75, 1}, {74, 2}, {73, 1}, {72, 2}, {71, 2}, {70, 2}, {69, 2}, {68, 2},
     {67, 5}, {66, 2}, {65, 1}, {64, 4}, {63, 2}, {62, 3}, {60, 1}, {59, 2}, {58, 3},
     {57, 3}, {56, 2}, {55, 6}, {54, 1}, {53, 3}, {52, 1}, {51, 3}, {50, 2}, {49, 4},
     {48, 3}, {47, 2}, {46, 9}, {45, 8}, {44, 2}, {43, 8}, {42, 4}, {41, 8}, {40, 3},
     {39, 6}, {38, 1}, {37, 7}, {36, 8}, {35, 9}, {34, 7}, {33, 11}, {32, 11},
     {31, 7}, {30, 12}, {29, 8}, {28, 7}, {27, 10}, {26, 11}, {25, 8}, {24, 11},
     {23, 14}, {22, 21}, {21, 15}, {20, 20}, {19, 20}, {18, 27}, {17, 25}, {16, 36},
     {15, 41}, {14, 45}, {13, 45}, {12, 47}, {11, 82}, {10, 91}, {9, 91}, {8, 132},
     {7, 140}, {6, 206}, {5, 310}, {4, 472}, {3, 695}, {2, 1444}, {1, 4812}}

    Occurrencies Ranking

    tabellaricorrenze2 = Transpose[{Last /@ tabellaricorrenze, First /@ tabellaricorrenze}];
    rank = Table[i, {i, 1, Length[tabellaricorrenze2]}];
    (* sort the {occurrences, word} pairs in decreasing order *)
    tabellaricorrenze3 = Reverse[Sort[tabellaricorrenze2]];
    (* append the rank to each pair *)
    tabellaricorrenze4 = Table[Append[Part[tabellaricorrenze3, i], i], {i, 1, Length[tabellaricorrenze3]}];
    Print["The ranking table shows, in order: number of occurrences, word, rank. Its data are stored in the variable tabellaricorrenze4"]
    tabellaricorrenze4

    The ranking table shows, in order: number of occurrences, word, rank. Its data are stored in the variable tabellaricorrenze4

    A very large output was generated. Here is a sample of it:

    {{844, si, 1}, {829, non, 2}, {608, tremalnaik, 3}, {378, ma, 4}, {327, era, 5},
     {310, kammamuri, 6}, {278, disse, 7}, {268, tu, 8}, {268, , 9},
     {266, pi, 10}, {249, mi, 11}, {248, come, 12}, {235, io, 13}, <<9141>>,
     {1, abbassando, 9155}, {1, abbandon, 9156}, {1, abbandono, 9157},
     {1, abbandoni, 9158}, {1, abbandoner, 9159}, {1, abbandoneremo, 9160},
     {1, abbandonava, 9161}, {1, abbandonarono, 9162}, {1, abbandonarla, 9163},
     {1, abbaiano, 9164}, {1, abbaglianti, 9165}, {1, ", 9166}}
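The ranking step can be sketched in Python as follows. This is a minimal illustration, not the notebook's code: `ranking_table` is a hypothetical name, and Python's string ordering may break ties slightly differently from Mathematica's `Sort`.

```python
from collections import Counter

def ranking_table(words):
    """Return (occurrences, word, rank) triples, most frequent word first."""
    counts = Counter(words)
    # Sort (count, word) pairs in decreasing order, like Reverse[Sort[...]].
    ordered = sorted(((c, w) for w, c in counts.items()), reverse=True)
    return [(c, w, rank) for rank, (c, w) in enumerate(ordered, start=1)]

sample = "la barba nera la barba la notte".split()
for row in ranking_table(sample):
    print(row)
```

As in the notebook's output, words with the same number of occurrences end up in reverse alphabetical order.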


    Co-Occurrences for a single word

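    (* all the overlapping triples of consecutive words in the text *)
    triplette = Partition[testo, 3, 1]; triplette[[1 ;; 11]];
    (* triples that contain word in first, second or third position *)
    selecttriples[m_, word_] :=
      Join[Select[m, MatchQ[#[[1]], word] &], Select[m, MatchQ[#[[2]], word] &], Select[m, MatchQ[#[[3]], word] &]]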

    Usage example

    Write your word in the following cell (e.g. "destiny"). Don't forget to write it as a string, enclosed in " ".

    The result will be a list of overlapping triples, each containing your selected word. These are all the triples of the text with your word inside.

    q = selecttriples[triplette, "barba"]

    {{barba, nera, arruffata}, {barba, occhi, scintillanti}, {barba, nera, ma},
     {barba, nera, occhi}, {barba, nerissima, folta}, {barba, quattro, uomini},
     {barba, grigia, cav}, {piccola, barba, nera}, {nera, barba, occhi},
     {lunga, barba, nera}, {folta, barba, nera}, {quarant'anni, barba, nerissima},
     {mordeva, barba, quattro}, {mare, barba, grigia}, {coperto, piccola, barba},
     {lunga, nera, barba}, {d'una, lunga, barba}, {feroce, folta, barba},
     {statura, quarant'anni, barba}, {si, mordeva, barba}, {lupo, mare, barba}}

    Executing the following cell produces the list of co-occurrences of all the words that occur with your selected word.

    HOW IT WORKS: I use the triples produced by the previous instruction and count the frequency of each word in them. This gives a sort of "weight" to each word's occurrences relative to your word. Because the triples overlap, a word directly beside your word appears in two of them and is counted twice. A word that is still in a triple, but two positions away from your word, is counted only once.

    The first row represents your selected word: its numeric value is again computed from the triples, and it is simply the number of times the word is contained in them. Having your word in the first row can be useful for further computations, if you want to quickly identify which word the resulting list of lists refers to.

    Reverse[Sort[Tally[Flatten[q]], #1[[2]] < #2[[2]] &]]

    {{barba, 21}, {nera, 8}, {occhi, 3}, {folta, 3}, {lunga, 3}, {nerissima, 2},
     {quattro, 2}, {grigia, 2}, {piccola, 2}, {quarant'anni, 2}, {mordeva, 2},
     {mare, 2}, {arruffata, 1}, {scintillanti, 1}, {ma, 1}, {uomini, 1}, {cav, 1},
     {coperto, 1}, {d'una, 1}, {feroce, 1}, {statura, 1}, {si, 1}, {lupo, 1}}
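The weighting that comes out of the overlapping triples can be checked on a toy text. The Python sketch below is only an illustration of the idea: the function names are hypothetical, and the seven-word "text" is invented for the example.

```python
from collections import Counter

def overlapping_triples(words):
    """All windows of three consecutive words, like Partition[testo, 3, 1]."""
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def select_triples(triples, word):
    """Triples with `word` in first, then second, then third position."""
    return [t for pos in range(3) for t in triples if t[pos] == word]

def cooccurrence_weights(words, word):
    """Tally every word found in the selected triples, most frequent first."""
    selected = select_triples(overlapping_triples(words), word)
    return Counter(w for t in selected for w in t).most_common()

# Invented toy text: "x" is the selected word.
text = "a b c x d e f".split()
weights = cooccurrence_weights(text, "x")
print(weights)
```

Here "x" itself gets 3 (it is in three triples), its direct neighbours "c" and "d" get 2, and "b" and "e", two positions away, get 1.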


    Co-occurrences table based on a list of selected words

    You can choose as many words as you want. For each of them a list of co-occurrences will be produced, and all the data will be aggregated in a co-occurrences table, with all the words co-occurring with your selected words on the columns and your selected words on the rows. The cell where a row and a column cross gives the number of co-occurrences for that pair.

    Again, for the co-occurrences I use the triples, counting the frequency of each word. This gives a sort of "weight" to each word's occurrences relative to your word. A word directly beside your word is counted twice. A word that is still in a triple, but two positions away from your word, is counted only once.

    Insert your words here (don't forget the " "):

    vecparolescelte = {"famiglia", "sposa", "moglie", "figlio", "figli", "figlia"};
    selectparola[lista_, parola_] := Select[lista, MatchQ[#[[2]], parola] &]
    (* check that every chosen word occurs in the text, otherwise abort *)
    Table[
      If[MatchQ[selectparola[tabellaricorrenze3, vecparolescelte[[i]]], {}],
        Print[" WARNING MESSAGE
    One of the chosen words is not in the text. The co-occurrences
    table requires that all the words are present
    in the text. Aborting procedure."]; Abort[],
        Print["Check words passed"]],
      {i, 1, Length[vecparolescelte]}];

    Check words passed

    Check words passed

    Check words passed

    Check words passed

    Check words passed

    Check words passed


    posparola[lista_, parola_] := Flatten[Position[lista, parola]]
    selecttriples[m_, word_] :=
      Join[Select[m, MatchQ[#[[1]], word] &], Select[m, MatchQ[#[[2]], word] &], Select[m, MatchQ[#[[3]], word] &]]
    righecollect = {};
    (* the triples selected for each chosen word *)
    tripletemp = Table[selecttriples[triplette, vecparolescelte[[i]]], {i, 1, Length[vecparolescelte], 1}];
    Print["The list of the 'words space' associated to your chosen words"]
    listparole = Union[Sort[Flatten[tripletemp]]]

    The list of the 'words space' associated to your chosen words

    acque, ad, ada, all'ultimo, amato, ancora, andato, avuto, baleni, bengalese,

    bevanda, bravi, capitano, capriccio, chiamava, chiese, ci, cibo, colpo, comanda,

    compresi, conta, corishant, darei, dell'india, dinanzi, disse, diventar, dov',

    d'un, d'una, , ella, empio, entro, era, erro, esclam, famiglia, farebbe,

    ferma, figli, figlia, figlio, finalmente, fu, gatto, giammai, gl'indiani,

    gridando, guardava, ha, intera, inviato, io, irremovibile, jungla, kl,

    l'hai, liberi, l'indiano, lui, ma, mai, mano, me, meglio, mia, miei, minaccia,

    moglie, morire, morta, n, nome, non, notte, o, oh, ordinai, ordinate, palla,

    parlo, patria, pietrificato, piombo, poi, povera, prode, punto, pure, rapire,

    rapita, rinchiusi, ripet, rispose, ritorna, s'accorse, sacre, salve, sar,

    sarai, saremo, sar, scomparve, scompose, scorsi, sdegnosamente, se, selvaggio,

    si, s, siete, spasimo, spilla, sposa, stata, stessa, sua, suoi, suyodhana,

    taci, t'amo, thugs, tremalnaik, tu, tua, uomo, va', vago, vecchio, vostra


    righecollect = {}; Clear[riga1];
    initvaluevector = Table[0, {i, 1, Length[listparole]}]; valuevector = initvaluevector;
    Table[
      valuevector = initvaluevector;
      (* write each tallied count at the position of its word in listparole *)
      Table[
        valuevector[[posparola[listparole,
            Reverse[Sort[Tally[Flatten[tripletemp[[k]]]], #1[[2]] < #2[[2]] &]][[i, 1]]]]] =
          Reverse[Sort[Tally[Flatten[tripletemp[[k]]]], #1[[2]] < #2[[2]] &]][[i, 2]];
        riga1 = valuevector,
        {i, 1, Length[Reverse[Sort[Tally[Flatten[tripletemp[[k]]]], #1[[2]] < #2[[2]] &]]], 1}];
      righecollect = Append[righecollect, riga1],
      {k, 1, Length[vecparolescelte], 1}];
    (* one row per chosen word, with the list of column words as first row *)
    cooccurences = Insert[Table[righecollect[[i]], {i, 1, Length[vecparolescelte], 1}], listparole, 1];
    Print["The co-occurrences table, computed with the requested words"]
    cooccurences // TableForm

    The co-occurrences table, computed with the requested words

    acque  ad  ada  all'ultimo  amato  ancora  andato  avuto  baleni  bengalese  bevanda  bravi  c
    0      1   0    0           0      0       0       0      0       0          1        0      0
    0      0   0    2           0      0       0       0      0       0          0        0      0
    0      0   0    0           0      0       0       0      0       0          0        0      0
    11     0   0    0           1      0       2       0      1       2          0        0      0
    0      0   0    0           0      0       0       0      0       0          0        2      0
    0      0   3    0           0      1       0       2      0       0          0        0      1
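For readers who want to see how such a table is assembled, here is a rough Python sketch under the same triple-based weighting. It is not the notebook's code: `cooccurrence_table` is a hypothetical name and the five-word text is invented.

```python
from collections import Counter

def cooccurrence_table(words, selected_words):
    """Return (columns, rows): one row of weights per selected word."""
    triples = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    tallies = {}
    for target in selected_words:
        if target not in words:
            # Same idea as the notebook's check: every chosen word must occur.
            raise ValueError(f"{target!r} is not in the text")
        hits = [t for pos in range(3) for t in triples if t[pos] == target]
        tallies[target] = Counter(w for t in hits for w in t)
    # The columns are all the words found in any of the selected triples.
    columns = sorted(set().union(*tallies.values()))
    rows = [[tallies[t][c] for c in columns] for t in selected_words]
    return columns, rows

# Invented toy text with two selected words, "x" and "y".
columns, rows = cooccurrence_table("a x b y c".split(), ["x", "y"])
print(columns)
for row in rows:
    print(row)
```

Each row follows the order of the selected words, so the first row belongs to the first word of the list, as in the table above.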

    Network of co-occurrences of the text

    Plot of the co-occurrences of the text. Only directly connected words are taken into account (a word is connected with two neighbours: the one before and the one after it).

    Warning!!! Applying this analysis to the full text produces unreadable networks with too many points, and quite often an out-of-memory kernel quit. This is why you can specify a sort of "window" for the analysis, setting the positions in the text of the first and the last word of your window.

    Each node has a "tooltip": hovering over it with the mouse shows a label.


    numini = 1000; (* specify the number of the initial word from which you want to start *)
    numfin = 1600; (* specify the number of the ending word at which you want to stop. With more than 5000 words you risk an 'out of memory' warning *)
    (* link each word to the word that follows it *)
    qsx = Map[# -> "a" &, testo];
    qsx[[All, 2]] = Insert[Drop[testo, 1], Last[testo], -1];
    qsx = Drop[qsx, -1];
    part = Take[qsx, {numini, numfin}];
    GraphPlot[part, VertexLabeling -> Tooltip]
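The edge list behind this plot can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the notebook's code: the function names are hypothetical, and `start` and `end` play the role of numini and numfin as 1-based positions in the list of links.

```python
from collections import defaultdict

def adjacency_edges(words, start, end):
    """Links between consecutive words, restricted to links start..end (1-based, inclusive)."""
    edges = list(zip(words, words[1:]))   # (word, next word)
    return edges[start - 1:end]

def adjacency_map(edges):
    """Undirected neighbour sets, one per word that appears in the edges."""
    graph = defaultdict(set)
    for left, right in edges:
        graph[left].add(right)
        graph[right].add(left)
    return dict(graph)

words = "a b c d e".split()
window = adjacency_edges(words, 2, 3)
print(window)
print(adjacency_map(window))
```

Restricting the window keeps the number of nodes small, which is the reason the notebook asks for a start and an end position.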

    Network of co-occurrences with a list of words and free length for the lexical unit

    Here a graph of the co-occurrences for your list of selected words is produced.

    For each occurrence of each of your words a "window", or lexical unit, is created, with the num words to its left and the num words to its right.

    So, if you are looking for the word "range" and this word is contained in the sentence "It unifies a broad range of programming paradigms and uses its unique concept of symbolic programming", if you choose num = 2, these nodes and links will be generated for the network: {a broad, broad range, range of, of programming}

    Tips: play with the number n, associated with strong or weak deletion of simple words (e.g. the, a, of, ...) in the data-cleansing section.

    With num = 1, if you delete the highest-ranked words, you usually get a network with separate components, and it can be harder to catch meaningful relationships in the text.

    Using a greater num allows you to erase part of the most common words and still retain a net of relationships between the words.

    You can also insert a single word instead of a list of words, still inside the { } and still inside the " ".
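The "range" example can be reproduced with a small Python sketch. It is illustrative only: `window_edges` is a hypothetical name, and clipping the window at the start and end of the text is an assumption of this sketch, not something stated in the notebook.

```python
def window_edges(words, selected, num):
    """Links between consecutive words inside a window of num words on each side."""
    edges = []
    for pos, word in enumerate(words):
        if word not in selected:
            continue
        lo = max(0, pos - num)                # assumption: clip at the text start
        hi = min(len(words), pos + num + 1)   # assumption: clip at the text end
        window = words[lo:hi]
        edges.extend(zip(window, window[1:]))
    return edges

sentence = ("It unifies a broad range of programming paradigms and uses "
            "its unique concept of symbolic programming").split()
edges = window_edges(sentence, {"range"}, 2)
print(edges)
```

With num = 2 the links are exactly the four of the example: a-broad, broad-range, range-of, of-programming.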


    num = 3; (* insert here the number of words that will be considered before and after each selected word *)
    vecparolescelte2 = {"figlio", "padre"}; (* insert here your words *)
    selectparola[lista_, parola_] := Select[lista, MatchQ[#[[2]], parola] &]
    (* check that every chosen word occurs in the text, otherwise abort *)
    Table[
      If[MatchQ[selectparola[tabellaricorrenze3, vecparolescelte2[[i]]], {}],
        Print[" WARNING MESSAGE
    One of the chosen words is not in the text. The co-occurrences
    table requires that all the words are present
    in the text. Aborting procedure."]; Abort[],
        Print["Check words passed"]],
      {i, 1, Length[vecparolescelte2]}];
    (* positions in the text of every occurrence of the chosen words *)
    listposition = Flatten[Table[Position[testo, vecparolescelte2[[k]]], {k, 1, Length[vecparolescelte2]}]];
    (* positions of the window around a single occurrence *)
    createsingolnpla[posizione_, num_] :=
      Flatten[{Reverse[posizione - Range[num]], posizione, posizione + Range[num]}]
    createallnple[posizione_, num_] := Table[createsingolnpla[i, num], {i, Flatten[posizione]}]
    (* pairs of consecutive positions inside each window *)
    couples = Flatten[Table[Partition[createallnple[listposition, num][[k]], 2, 1], {k, 1, Length[listposition]}], 1];
    grafdatabis = Table[Take[testo, {couples[[i, 1]], couples[[i, 2]]}], {i, 1, Length[couples]}];
    grafdata2bis = Map[#[[1]] -> #[[2]] &, grafdatabis];
    GraphPlot[grafdata2bis, VertexLabeling -> Tooltip]
    GraphPlot[grafdata2bis, VertexLabeling -> True, ImageSize -> 900]

    Check words passed

    Check words passed


    [Figure: labeled network of co-occurrences for the selected words, produced by GraphPlot. Visible vertex labels include padre, io, tu, ada, pagoda, sacra, corishant, strangolatore, negapatnan, giovanetta.]
