Natural Language Toolkit Guide
Modules and Submodules of NLTK


Introduction

NLTK is a platform for building Python programs that work with human language data. The vision behind natural language libraries is to let software interpret human conversation and respond in a form the user understands. There are many natural language processing libraries available, such as spaCy, Gensim, Pattern, TextBlob, Polyglot and the Natural Language Toolkit (NLTK). This article guides readers through NLTK and its modules and submodules in an effective, visual way. Text mining in Python is an area where research is very active, and it is widely used in sentiment analysis. Please see the different types of libraries that come under NLP.

NLP Libraries

A Brief Introduction to NLTK

This library is used for tokenization, chunking, parsing, semantic reasoning, tagging and classification. I will explain these terms here.

  • Tokenization (nltk.tokenize): Tokenization is the act of breaking a sequence of strings into pieces, such as words, keywords, phrases and symbols, called tokens.

[codesyntax lang=”python” container=”pre_table” strict=”yes”]

import nltk

def tokenization(text):
    token = nltk.word_tokenize(text)
    print("Please see the output of tokenization")
    print(token)
    return token

# Calling the function tokenization using the following
output = tokenization("Welcome to AI Sangam.")

[/codesyntax]

  • Chunking: Chunking means dividing a bigger problem into small pieces, learning the pattern in each piece, and combining all the pieces at the end. This is an effective means of learning.
  • Parsing: It is the process of analysing a sentence to determine its grammatical structure with respect to a grammar.
  • Semantic Reasoning: It refers to inferring logical consequences from a set of asserted facts.
  • Tagging: It refers to labelling words with their parts of speech.
    [codesyntax lang=”python” lines=”normal” container=”div”]

    import nltk

    def tokenization(text):
        token = nltk.word_tokenize(text)
        print("Please see the output of tokenization")
        print(token)
        tag = nltk.pos_tag(token)
        print("Please see the output of tagging at this step")
        print(tag)

    [/codesyntax]
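    The parsing step listed above can also be sketched with NLTK's chart parser. The grammar and sentence below are my own illustrative examples, kept tiny so no extra data downloads are needed:

    ```python
    import nltk

    # A hand-written toy context-free grammar
    grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> DT N
    VP -> V NP
    DT -> 'the'
    N -> 'dog' | 'cat'
    V -> 'chased'
    """)

    parser = nltk.ChartParser(grammar)
    # Parse a sentence covered by the grammar and print its tree
    for tree in parser.parse("the dog chased the cat".split()):
        print(tree)
    ```

    The parser yields every tree the grammar licenses for the sentence; here there is exactly one.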

    Let us start by explaining some of the modules of NLTK, which will give you an overview of the library. Along with that, you can also visualize its submodules in the form of images.

  • nltk.chat module: A class for simple chatbots. These perform simple pattern matching on sentences typed by users and respond with automatically generated sentences.
NLTK Chat Module
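A minimal sketch of this module using `nltk.chat.util.Chat`; the pattern/response pairs are my own toy examples, not ones shipped with NLTK:

```python
from nltk.chat.util import Chat, reflections

# Hypothetical regex pattern -> canned response pairs for a toy chatbot
pairs = [
    (r"hi|hello", ["Hello! How can I help you?"]),
    (r"what is your name\??", ["I am a simple NLTK chatbot."]),
    (r"(.*)", ["Sorry, I did not understand that."]),
]

bot = Chat(pairs, reflections)
print(bot.respond("hello"))
```

`Chat` tries each pattern in order against the user's sentence and returns a response drawn from the matching pair, which is exactly the "simple pattern matching" described above.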
  • nltk.chunk module: This is a very important module because chunking helps anyone remember things in the best possible way. Chunking means splitting information into different parts, trying to find the pattern in each part, and, after learning each part separately, combining the results at the end. This is the best and most effective way of learning.
NLTK Chunk Module
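A small sketch of chunking with `nltk.RegexpParser`; the sentence is hand-tagged here (my own example) so no tagger model needs to be downloaded:

```python
import nltk

# A hand-tagged sentence: (word, part-of-speech) pairs
tagged = [("the", "DT"), ("quick", "JJ"), ("brown", "JJ"),
          ("fox", "NN"), ("jumps", "VBZ"), ("over", "IN"),
          ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]

# Grammar: a noun phrase (NP) is an optional determiner,
# any number of adjectives, then a noun
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = chunker.parse(tagged)
print(tree)
```

The parser groups the tagged words into two NP chunks ("the quick brown fox" and "the lazy dog"), leaving the rest of the sentence unchunked.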
  • nltk.cluster: Considering two different approaches to machine learning, these are classified as learning with annotated data and learning with unannotated data. Annotated data is difficult and expensive to obtain in the quantities required. This problem is common to most natural language processing tasks, fueling the need for quality unsupervised approaches. This module contains many clustering algorithms such as k-means, E-M and the Group Average Agglomerative Clusterer.
Algorithms under nltk.cluster
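As a quick sketch of the k-means algorithm mentioned above, here is `nltk.cluster.KMeansClusterer` on some invented 2-D vectors (my own toy data, chosen to form two obvious groups):

```python
import numpy
from nltk.cluster import KMeansClusterer, euclidean_distance

# Toy 2-D vectors forming two well-separated groups
vectors = [numpy.array(v) for v in
           [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]]

# Cluster into 2 means; repeats re-runs the algorithm to avoid bad starts
clusterer = KMeansClusterer(2, euclidean_distance, repeats=5)
assignments = clusterer.cluster(vectors, assign_clusters=True)
print(assignments)
```

With `assign_clusters=True`, the call returns one cluster index per input vector, so the first three vectors share one label and the last three share the other.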

Conclusion: Natural Language Processing is an effective way of working with human language. I have shown above some glimpses of NLP using Python, focusing on the NLTK library, which is one of the NLP libraries. In this post you have seen some (not all) of the modules that come under NLTK. Please see the feature image for a look at the different modules available with NLTK: tokenization, classification, stemming, tagging, parsing, semantic reasoning, n-grams, frequency distribution, conditional frequency distribution, word counts and finding the most common words in a text file. I hope you liked the post and gathered enough information to get started with NLTK.

You can also view one of the applications made by AI Sangam using the Natural Language Toolkit (NLTK), entitled Sentimental Analysis on E-commerce Products Review System. For more business and services related to AI, machine learning and web integration, please look at our main website www.aisangam.com. You may also follow us on the social media sites linked from the main website. Thanks for sparing some time to read this post.

This Post Has 2 Comments

  1. Aradhana

    I was running the model for chatbot and got the following error. Please help me out how to resolve it

    OSError: [E050] Can’t find model ‘en’. It doesn’t seem to be a shortcut link, a Python package or a valid path to a data directory.

    1. AISangam

      Hello Aradhana,

Hope you are fine and enjoying good health. This error occurs because the spaCy model ‘en’ is not downloaded, so the solution below will work for you:

      python/python3 -m spacy download en
      
