Introduction to NLTK

Getting Started with NLTK: An Introduction to NLTK. Sreejith S, [email protected], @tweet2sree. FOSSMeet 2011, NIC Calicut, 06 February 2011.

DESCRIPTION

The Natural Language Toolkit (NLTK), written in Python for natural language processing

TRANSCRIPT

  • Getting Started with NLTK: An Introduction to NLTK

    Sreejith S [email protected]

    @tweet2sree

    FOSSMeet 2011,NIC Calicut

    06 February 2011

    Sreejith S Getting Started with NLTK

  • Just a word about me !!

    Working in Natural Language Processing (NLP), Machine Learning, and Text Mining

    Active member of ilugcbe, http://ilugcbe.techstud.org

    Works for 365Media Pvt. Ltd., Coimbatore, India

    @tweet2sree, [email protected]

  • Introduction - NLP

    Natural Language Processing

    NLP is an interdisciplinary subject

    Computer Science, Linguistics, Statistics, etc.

    NLP is a subfield of Artificial Intelligence

    NLP - any kind of computer manipulation of natural language

    It is a rapidly developing field of study

    Everyday applications of NLP

    Handwriting recognition, machine translation, question-answering systems, spell checkers, grammar checkers, etc.

  • Natural Language Toolkit (NLTK)

    A collection of Python programs, modules, data sets, and tutorials to support research and development in Natural Language Processing (NLP)

    Written by Steven Bird, Edward Loper, and Ewan Klein

    NLTK is

    Free and open source, easy to use, modular, well documented, simple and extensible

    http://www.nltk.org

  • What You Will Learn

    How simple programs can help you manipulate and analyze language data, and how to write these programs

    How key concepts from NLP and linguistics are used to describe and analyze language

    How data structures and algorithms are used in NLP

    How language data is stored in standard formats, and how data can be used to evaluate the performance of NLP techniques

  • Installation of NLTK

    Make sure that Python 2.4, 2.5, or 2.6 is available on your system

    Install the Python Tkinter package

    Install NumPy, Matplotlib, Prover9, MaltParser, and MegaM

    Download NLTK and install it

    If you are installing NLTK from source:

    Download http://nltk.googlecode.com/files/nltk-2.0b9.zip

    Unzip it; this creates the directory nltk-2.0b9

    Open a terminal, cd into this directory, become superuser, and run: python setup.py install

    To install the data, start the Python interpreter:

    >>> import nltk

    >>> nltk.download()

    Now you are ready to play with NLTK!
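
    Before downloading the data, it can help to confirm that the packages from the checklist are visible to the interpreter. The following is only a minimal sketch: it assumes a current Python 3 interpreter rather than the Python 2.x versions named on the slide, and the helper name missing_packages is hypothetical, not part of NLTK.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found by this interpreter."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Package names taken from the installation checklist on the slide.
    missing = missing_packages(["nltk", "numpy", "matplotlib", "tkinter"])
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All packages found; the next step is nltk.download()")
```

    The check only looks the modules up; it does not import them, so it is safe to run on a machine where some of the packages are still missing.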

  • NLTK Modules

    NLTK Modules                   Functionality

    nltk.corpus                    Corpus
    nltk.tokenize, nltk.stem       Tokenizers, stemmers
    nltk.collocations              t-test, chi-squared, mutual information
    nltk.tag                       n-gram, backoff, Brill, HMM, TnT
    nltk.classify, nltk.cluster    Decision tree, Naive Bayes, k-means
    nltk.chunk                     Regex, n-gram, named entity
    nltk.parse                     Parsing
    nltk.sem, nltk.inference       Semantic interpretation
    nltk.metrics                   Evaluation metrics
    nltk.probability               Probability and estimation
    nltk.app, nltk.chat            Applications

  • Let us start the game

    To access the data for working through the examples in the book, start the Python interpreter

    Some basic exercises from the book

    Concordance

    >>> from nltk.book import *

    >>> text1.concordance("monstrous")

    Similar

    >>> text1.similar("monstrous")

    Dispersion plot - positional information

    >>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])

    >>> text4.dispersion_plot(["and", "to", "of", "with", "the"])

    What is it? Why?
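
    To make the ideas behind concordance and dispersion concrete, here is a plain-Python sketch that does not use NLTK or its data. It is an illustration of the concepts only, not NLTK's implementation: the function names concordance and offsets, the width parameter, the bracketed output format, and the choice to ignore case are all assumptions made for this example.

```python
def concordance(tokens, word, width=3):
    """Return each occurrence of `word` with up to `width` tokens of context."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


def offsets(tokens, word):
    """Return the token positions of `word`, the data a dispersion plot draws."""
    target = word.lower()
    return [i for i, token in enumerate(tokens) if token.lower() == target]


# A tiny hand-made token list standing in for a real corpus.
sample = "the monstrous whale and the monstrous sea".split()
for line in concordance(sample, "monstrous", width=2):
    print(line)
print(offsets(sample, "monstrous"))
```

    A dispersion plot is essentially these offsets drawn along the length of the text, one row per word, so a word that occurs throughout the text fills its row while a rarer word leaves gaps.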

  • Continued...

    Some basic exercises from the book

    Generate

    >>> text3.generate()

    Counting vocabulary

    >>> len(text3)

    List of distinct words, sorted in dictionary order

    >>> sorted(set(text3))

    Count occurrences of a particular word in a text

    >>> text3.count("and")

    What percentage of the text is taken up by a specific word

    >>> 100.0 * text3.count("and") / len(text3)
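
    The same counts work on any list of tokens, since they rely only on built-in list and set operations. The sketch below assumes a small hand-made token list in place of text3, and the helper names vocabulary and percentage are hypothetical. The 100.0 forces floating-point division, which matters on Python 2, where dividing two integers truncates.

```python
def vocabulary(tokens):
    """Return the distinct tokens in dictionary order."""
    return sorted(set(tokens))


def percentage(tokens, word):
    """Return the share of `tokens` taken up by `word`, as a percentage."""
    # 100.0 keeps the division in floating point on Python 2 as well.
    return 100.0 * tokens.count(word) / len(tokens)


# A stand-in for a real text: ten tokens, eight of them distinct.
tokens = "in the beginning god created the heaven and the earth".split()

print(len(tokens))               # total number of tokens
print(vocabulary(tokens))        # distinct words, sorted
print(tokens.count("the"))       # occurrences of one word
print(percentage(tokens, "the")) # share of the text taken by that word
```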

  • Collocation & Bigram

    Collocation

    A collocation is a sequence of words that occur together unusually often, e.g. red wine, strong tea. But strong computer is not a collocation.

    >>> text4.collocations()
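    One common way to put a number on "unusually often" is pointwise mutual information (PMI), which compares how often a pair occurs with how often it would occur if the two words were independent. The sketch below is only an illustration of that idea on a made-up token list; it is not claimed to be the measure that text4.collocations() uses, and `pmi` is a hypothetical helper name.

```python
import math
from collections import Counter

# Made-up toy corpus, already split into tokens.
tokens = ("red wine and strong tea . she likes red wine . "
          "he likes strong tea and red wine .").split()

unigrams = Counter(tokens)
pairs = Counter(zip(tokens, tokens[1:]))   # adjacent word pairs
n = len(tokens)


def pmi(w1, w2):
    """log2 of observed pair probability over the probability expected by chance."""
    p_pair = pairs[(w1, w2)] / (n - 1)
    p1 = unigrams[w1] / n
    p2 = unigrams[w2] / n
    return math.log2(p_pair / (p1 * p2))


# Keep pairs seen at least twice, then rank them by PMI.
candidates = [pair for pair, count in pairs.items() if count >= 2]
ranked = sorted(candidates, key=lambda pair: pmi(*pair), reverse=True)
print(ranked)
```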

    Bigrams

    List of word pairs

    >>> text = "sreejith is talking about NLTK"

    >>> wordlist = text.split()

    >>> list(bigrams(wordlist))

    What happens if we pass the string itself instead of the word list?

    >>> list(bigrams(text))
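    A bigram function simply pairs each item of a sequence with its successor, so the answer depends on what the items are. The sketch below uses a hypothetical plain-Python stand-in, `adjacent_pairs`, rather than NLTK's own function. Because a Python string is a sequence of characters, passing the raw string produces character pairs instead of word pairs.

```python
def adjacent_pairs(sequence):
    """Pair every item with the item that follows it (hypothetical helper)."""
    items = list(sequence)
    return list(zip(items, items[1:]))


text = "sreejith is talking about NLTK"

word_pairs = adjacent_pairs(text.split())   # items are words
char_pairs = adjacent_pairs(text)           # items are single characters

print(word_pairs)
print(char_pairs[:3])
```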

  • Work with our own data

    Load our own corpus with NLTK and analyse it

    >>> from nltk.corpus import PlaintextCorpusReader as ptr

    >>> corpus = "/home/developer/Desktop/Sreejith"

    >>> wordlist = ptr(corpus, ".*")

    >>> wordlist.fileids()

    Let us find out how to count the number of characters, words and sentences in the corpus

    >>> for fid in wordlist.fileids():
    ...     print(len(wordlist.raw(fid)))

    >>> for fid in wordlist.fileids():
    ...     print(len(wordlist.words(fid)))

    >>> for fid in wordlist.fileids():
    ...     print(len(wordlist.sents(fid)))
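    To see what these three counts measure, here is a rough standard-library sketch that does not use NLTK at all. The function name `corpus_stats`, the `*.txt` pattern and the sample file are assumptions made for the illustration. Words are split on whitespace and sentences on end punctuation, which is cruder than NLTK's tokenizers, so the numbers will usually differ from what the corpus reader reports.

```python
import re
import tempfile
from pathlib import Path


def corpus_stats(root):
    """Map each .txt file name under `root` to (characters, words, sentences)."""
    stats = {}
    for path in sorted(Path(root).glob("*.txt")):
        raw = path.read_text(encoding="utf-8")
        words = raw.split()   # whitespace tokens only
        # Crude sentence split: break after ., ! or ? followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", raw.strip()) if s]
        stats[path.name] = (len(raw), len(words), len(sentences))
    return stats


# Demo on a throwaway directory holding one made-up file.
with tempfile.TemporaryDirectory() as root:
    Path(root, "sample.txt").write_text("NLTK is fun. Is it easy? Yes!",
                                        encoding="utf-8")
    demo = corpus_stats(root)

print(demo)
```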

  • Continued...

    Plotting a conditional frequency distribution

    >>> text = "sreejith is talking about NLTK"

    >>> words = text.split()

    >>> big = bigrams(words)

    >>> gd = nltk.ConditionalFreqDist(big)

    >>> gd.plot()

    Tabulate CFD

    >>> gd.tabulate()

    Plot frequency distribution

    >>> fdist = FreqDist(text1)

    >>> fdist.plot(50,cumulative=True)
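    A conditional frequency distribution built from bigrams counts, for each first word (the condition), how often each second word follows it. The sketch below shows that bookkeeping with the standard library only. `conditional_freq` is a hypothetical helper and the sentence is made up so that one word has more than one follower.

```python
from collections import Counter, defaultdict


def conditional_freq(pairs):
    """For each condition (first item), count how often each outcome follows."""
    cfd = defaultdict(Counter)
    for condition, outcome in pairs:
        cfd[condition][outcome] += 1
    return cfd


words = "the cat sat on the mat and the cat slept".split()
cfd = conditional_freq(zip(words, words[1:]))

# Followers of "the", most frequent first.
print(cfd["the"].most_common())
```

    With a five-word sentence in which no word repeats, every count is 1, so a longer text gives a more informative plot or table.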

  • Normalizing Text

    Stemming

    Stemming is the process of reducing inflected (or sometimes derived) words to their stem, base or root form, generally a written word form

    >>> porter = nltk.PorterStemmer()

    >>> word = "running"

    >>> porter.stem(word)

    >>> lancaster = nltk.LancasterStemmer()

    >>> lancaster.stem(tok[2])
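    The core idea of a stemmer is rule-based suffix stripping. The sketch below is a deliberately tiny toy, not the Porter or Lancaster algorithm: the suffix list and the minimum stem length are made-up assumptions. Its crude results (for instance a leftover doubled consonant) hint at why real stemmers carry many more rules.

```python
SUFFIXES = ("ing", "ed", "es", "s")   # made-up list, checked in this order


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [toy_stem(w) for w in ["running", "talked", "boxes", "cats", "is"]]
print(stems)
```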

  • Normalizing Text

    Lemmatization

    Like stemming, but also makes sure that the resulting form is a known word in a dictionary

    >>> wnl = nltk.WordNetLemmatizer()

    >>> wnl.lemmatize(word)
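    The difference from stemming can be sketched as "strip a suffix, but accept the result only if it is a known word". The code below is a toy illustration of that idea and not how WordNetLemmatizer is implemented. `KNOWN_WORDS`, `IRREGULAR` and `SUFFIXES` are tiny made-up stand-ins for a real dictionary.

```python
KNOWN_WORDS = {"run", "talk", "box", "woman", "go"}   # stand-in dictionary
IRREGULAR = {"women": "woman", "went": "go"}          # made-up exception list
SUFFIXES = ("ing", "es", "ed", "s")


def toy_lemmatize(word):
    """Return a dictionary form of `word` if one is found, else `word` unchanged."""
    if word in KNOWN_WORDS:
        return word
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # Try the bare stem, then the stem minus its last letter
            # (covers doubled consonants such as "runn" -> "run").
            for candidate in (stem, stem[:-1]):
                if candidate in KNOWN_WORDS:
                    return candidate
    return word


lemmas = [toy_lemmatize(w) for w in ["running", "women", "boxes", "lying"]]
print(lemmas)
```

    A word with no known base form, such as "lying" here, is returned unchanged instead of being cut down to a non-word.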

  • POS Tagging

    POS Tagging

    The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, or POS tagging

    >>> text = nltk.word_tokenize("we are attending FOSS meet at NIC calicut")

    >>> nltk.pos_tag(text)
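    The result of tagging is a list of (word, tag) pairs. The sketch below imitates that output shape with a hand-made lookup table and a default tag. It only illustrates the data structure: nltk.pos_tag relies on a trained model, and `LEXICON` and `toy_tag` are hypothetical names with made-up entries.

```python
# Hand-made word -> tag table using Penn Treebank style tags (assumed entries).
LEXICON = {"we": "PRP", "are": "VBP", "attending": "VBG",
           "meet": "NN", "at": "IN"}


def toy_tag(tokens, default="NN"):
    """Tag each token by dictionary lookup, falling back to `default`."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]


tagged = toy_tag("we are attending FOSS meet at NIC calicut".split())
print(tagged)
```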

  • Parsing

    Sentence Parsing

    Analyzing sentence structure and creating a parse tree

    >>> sentence = [("the", "DT"), ("little", "JJ"),
    ...     ("yellow", "JJ"), ("dog", "NN"), ("barked", "VBD"),
    ...     ("at", "IN"), ("the", "DT"), ("cat", "NN")]

    >>> grammar = "NP: {<DT>?<JJ>*<NN>}"

    >>> cp = nltk.RegexpParser(grammar)

    >>> result = cp.parse(sentence)

    >>> print(result)

    >>> result.draw()
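    A noun-phrase chunk rule of the kind used here (an optional determiner, any number of adjectives, then a noun) can be followed by hand. The sketch below is a plain-Python illustration of that single rule, not the RegexpParser implementation. `chunk_noun_phrases` is a hypothetical helper, and it returns nested tuples instead of an NLTK tree.

```python
def chunk_noun_phrases(tagged):
    """Group each optional-DT, any-JJ, one-NN run into an ("NP", [...]) chunk."""
    result, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":                           # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":    # any adjectives
            j += 1
        if j < len(tagged) and tagged[j][1] == "NN":       # required noun
            result.append(("NP", tagged[i:j + 1]))
            i = j + 1
        else:
            result.append(tagged[i])   # token stays outside any chunk
            i += 1
    return result


sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

chunks = chunk_noun_phrases(sentence)
print(chunks)
```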

  • Machine Translation

    Babelizer Shell

    Translating a sentence from its source language to a specified language. NLTK provides the babelize shell.

    >>> babelize_shell()

    Babel> hello how are you?

    Babel> german

    Babel> run

    Also try Google Translate or Yahoo Babel Fish

  • What can you do?

    Contribute to NLTK

    GSOC

    NLP Training

    Real time research

  • Reference

    Steven Bird, Ewan Klein and Edward Loper, Natural Language Processing with Python

    Jacob Perkins, Python Text Processing with NLTK 2.0 Cookbook

    http://www.nltk.org

  • Questions

  • And finally...

    Sreejith.S
