

QUITA: Quantitative Index Text Analyzer

Miroslav Kubát, Vladimír Matlach

Department of General Linguistics, Palacký University, Czech Republic

Acknowledgement: QUITA was supported by the student project IGA (no. FF_2013_031) of the Palacký University, Olomouc.

oltk.upol.cz/software

Our aim is to provide a user-friendly tool for quantitative text analysis for researchers from various disciplines (linguistics, criticism, history, sociology, psychology, politics, biology, etc.). QUITA combines the main stages of quantitative research in one program: obtaining results, statistical testing, and graphical visualization. No additional software, such as spreadsheet applications or specialized statistical programs, is needed.

INDICATORS TO COMPUTE

Frequency Structure indicators
o Type-Token Ratio (TTR)
o h-point (h)
o Vocabulary Richness (R1)
o Repeat Rate (RR)
o Relative Repeat Rate of McIntosh (RRmc)
o Hapax Legomenon Percentage (HL)
o Lambda (Λ)
o Gini Coefficient (G)
o Vocabulary Richness (R4)
o Curve length (L)
o Curve length Indicator (R)
o Entropy (H)
o Adjusted Modulus (A)
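The poster lists these indicators without formulas. As a rough illustration, the sketch below computes three of them using their common textbook definitions: TTR as types divided by tokens, Repeat Rate as the sum of squared relative frequencies, and entropy as Shannon entropy in bits. Whether QUITA uses exactly these definitions is an assumption, and the function name `frequency_indicators` is hypothetical. For hapax legomena only the raw count is returned, because the poster does not state how the percentage is normalized.

```python
import math
from collections import Counter


def frequency_indicators(tokens):
    """Compute a few frequency-structure indicators from a list of tokens.

    Definitions are the common textbook ones (an assumption, not
    necessarily QUITA's exact formulas).
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    freqs = Counter(tokens)          # type -> absolute frequency
    v = len(freqs)                   # number of distinct types
    probs = [f / n for f in freqs.values()]
    return {
        "TTR": v / n,                                   # types / tokens
        "RR": sum(p * p for p in probs),                # sum of squared relative frequencies
        "H": -sum(p * math.log2(p) for p in probs),     # Shannon entropy in bits
        "hapax_count": sum(1 for f in freqs.values() if f == 1),
    }


sample = "the cat saw the dog and the dog saw the cat".split()
result = frequency_indicators(sample)
print(result)
```

All three values depend on text length, which is one reason to compare texts of similar size or to reduce them to a common length first.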

Miscellaneous indicators
o Verb Distances (VD)
o Activity (Q) & Descriptivity (D)
o Writer’s View (α)
o Average Token Length (ATL)
o Thematic Concentration (TC)
o Secondary Thematic Concentration (STC)
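Two of these indicators are simple enough to sketch. Activity is commonly defined, following Busemann, as Q = V / (V + A), where V is the number of verbs and A the number of adjectives, with descriptivity as the complement D = A / (V + A). Average token length is the mean number of characters per token. That QUITA follows these definitions is an assumption. The tag names `VERB` and `ADJ`, the function names, and the pre-tagged input are all hypothetical.

```python
def activity_descriptivity(tagged_tokens):
    """Return (Q, D) from (word, tag) pairs.

    Assumes Q = V / (V + A) and D = A / (V + A); the tag names
    "VERB" and "ADJ" are placeholders for whatever the tagger emits.
    """
    verbs = sum(1 for _, tag in tagged_tokens if tag == "VERB")
    adjectives = sum(1 for _, tag in tagged_tokens if tag == "ADJ")
    total = verbs + adjectives
    if total == 0:
        raise ValueError("text contains no verbs or adjectives")
    return verbs / total, adjectives / total


def average_token_length(tokens):
    """Mean token length in characters."""
    if not tokens:
        raise ValueError("need at least one token")
    return sum(len(t) for t in tokens) / len(tokens)


tagged = [("quick", "ADJ"), ("fox", "NOUN"), ("jumps", "VERB"),
          ("runs", "VERB"), ("away", "ADV")]
q, d = activity_descriptivity(tagged)
atl = average_token_length([word for word, _ in tagged])
print(q, d, atl)
```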

TEXT-PROCESSING

Pre-processing
o Tokenizer (word, line, char, DNA Triplet, DNA Nucleotide)
o Multilingual lemmatizer (AR, CZ, DE, DK, EN, ES, FI, FR, IT, NL, PT, RO, RU, SE)
o POS Tagger (distinguishes parts of speech in a text)
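The tokenizer modes listed above can be pictured with a minimal sketch. The function `tokenize`, its mode names, and every rule in it (lowercasing words, skipping whitespace characters, dropping an incomplete trailing triplet) are assumptions made for illustration; the poster does not describe how QUITA's tokenizer behaves in these cases.

```python
import re


def tokenize(text, mode="word"):
    """Split text into units; mode names are hypothetical."""
    if mode == "word":
        # Runs of word characters, lowercased.
        return re.findall(r"\w+", text.lower())
    if mode == "line":
        # Non-empty lines.
        return [line for line in text.splitlines() if line.strip()]
    if mode == "char":
        # Every non-whitespace character.
        return [ch for ch in text if not ch.isspace()]
    if mode in ("dna_nucleotide", "dna_triplet"):
        # Keep only the four nucleotide letters.
        seq = re.sub(r"[^ACGT]", "", text.upper())
        if mode == "dna_nucleotide":
            return list(seq)
        # Non-overlapping triplets; an incomplete tail is dropped.
        return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    raise ValueError(f"unknown mode: {mode}")


print(tokenize("Hello, world!", "word"))
print(tokenize("ATG GCC TA", "dna_triplet"))
```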

Post-processing
o N-grams (QUITA can build n-grams from characters, words, or any other token type)
o Text length reduction
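Because n-grams are built from whatever units the tokenizer produced, one generic routine covers characters, words, and DNA triplets alike. The sketch below is a minimal illustration under that assumption; the function name `ngrams` is hypothetical and nothing here reflects QUITA's internal implementation.

```python
def ngrams(units, n):
    """Return all contiguous n-grams over a sequence of units as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The range is empty when the sequence is shorter than n.
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]


word_bigrams = ngrams(["a", "b", "c", "d"], 2)
char_trigrams = ngrams(list("abc"), 3)
print(word_bigrams)
print(char_trigrams)
```

The resulting tuples can then be counted like ordinary tokens, so the frequency indicators apply to n-grams unchanged.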

STATISTICAL COMPARISON

CREATING CHARTS
