Python NLTK - Stemming


Stemming is the process of stripping common prefixes or suffixes from the beginning or end of a word in order to reduce the word to its stem.

Stemming is a useful Natural Language Processing (NLP) technique that helps clean the input text and reduce its size, because different inflected forms of a word are mapped to a single stem.

The following table gives a simple example: the second column shows the stem of each word in the first column. The word endings (-ing, -y, -es) are the parts that the stemming algorithm strips or rewrites, so all three words reduce to the same stem, studi.

word        stem
studying    studi
study       studi
studies     studi
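To make the idea of suffix stripping concrete, here is a minimal sketch in plain Python. It is not the Porter algorithm and not part of NLTK: toy_stem and its four suffix rules are hypothetical names and rules chosen only so that the words in the table above come out as studi. A real stemmer applies many more rules and conditions.

```python
def toy_stem(word):
    """Naive suffix stripper for illustration only (NOT the Porter algorithm)."""
    # hypothetical rule list: (suffix to strip, text to put in its place)
    rules = (("ies", "i"), ("ing", ""), ("ed", ""), ("s", ""))
    for suffix, replacement in rules:
        # apply the first matching rule, but keep at least 3 leading characters
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: len(word) - len(suffix)] + replacement
            break
    # rewrite a trailing "y" as "i" so that "study" and "studying" agree
    if word.endswith("y") and len(word) > 2:
        word = word[:-1] + "i"
    return word


for w in ["studying", "study", "studies"]:
    print(w, "-", toy_stem(w))
```

Each of the three words prints with the stem studi. The sketch over-stems or under-stems many other English words, which is why a tested implementation such as NLTK's PorterStemmer is preferable in practice.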

To perform stemming with Python NLTK, create a PorterStemmer object and call its stem() method, passing the word as the argument. stem() returns the stem of the word passed to it.

Example 1: NLTK Stemming

In this example, we stem each word in a list by calling stem() inside a Python for loop.

Python Program

from nltk.stem import PorterStemmer

# create the stemmer object
ps = PorterStemmer()

# list of words whose stems we want to find
words = ["study", "studies", "studying", "studied"]

# print each word next to its stem
for w in words:
    print(w, "-", ps.stem(w))

Output

study - studi
studies - studi
studying - studi
studied - studi
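Because all four words share one stem, stemming shrinks the set of distinct terms in a text. The following sketch illustrates that effect, assuming NLTK is installed. The token list is made-up illustrative data, and it is split on whitespace rather than with an NLTK tokenizer so that no extra data download is needed.

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()

# illustrative token list with repeated and inflected forms (made-up data)
tokens = "study studies studying studied study studies".split()

# distinct surface forms before stemming
unique_words = set(tokens)

# distinct stems after stemming every token
unique_stems = {ps.stem(t) for t in tokens}

print(len(unique_words), "distinct words")
print(len(unique_stems), "distinct stem(s):", sorted(unique_stems))
```

Given the output of Example 1, the four distinct words collapse to the single stem studi, which is the size reduction that makes stemming useful as a preprocessing step.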

Summary

In this NLTK tutorial, we learned how to stem words by creating a PorterStemmer object and calling its stem() method.