Python NLTK - Stemming
Stemming is the process of stripping common prefixes or suffixes from a word to reduce it to a base form, called the stem.
Stemming is a useful Natural Language Processing (NLP) technique: because different forms of a word map to the same stem, it helps clean the input text and reduce the number of distinct words in it.
The following is a simple example, in which the second column gives the stem of the word in the first column. The endings that differ between the words (such as -ing and -ies) are the suffixes that a stemming algorithm removes. Note that a stem, such as studi, need not be a dictionary word.
word | stem |
---|---|
studying | studi |
study | studi |
studies | studi |
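To make the idea of suffix stripping concrete, here is a minimal sketch in plain Python. It is not the Porter algorithm and does not use NLTK: the function name naive_stem and the rule table SUFFIX_RULES are hypothetical, chosen only so that the three words in the table above collapse to the same stem.

```python
# Hypothetical rule table of (suffix, replacement) pairs.
# This is an illustration only, not the real Porter rule set.
SUFFIX_RULES = [
    ("ies", "i"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def naive_stem(word):
    """Strip one known suffix, then normalize a trailing 'y' to 'i'."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Apply the first matching rule, but keep at least two characters of stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: len(word) - len(suffix)] + replacement
            break
    # Map a final "y" to "i" so that "study" and "studies" share a stem.
    if len(word) > 2 and word.endswith("y"):
        word = word[:-1] + "i"
    return word


for w in ["studying", "study", "studies"]:
    print(w, "-", naive_stem(w))
```

A real stemmer such as PorterStemmer applies many more rules, with conditions on the shape of the remaining stem, so this sketch should not be expected to agree with it on arbitrary words.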
To perform stemming with Python NLTK, create a PorterStemmer object and call its stem() method, passing the word as the argument. stem() returns the stem of that word.
Example 1: NLTK Stemming
In this example, we stem each word in a list by calling stem() inside a Python for loop.
Python Program
from nltk.stem import PorterStemmer

# create stemmer object
ps = PorterStemmer()

# list of words whose stems we shall find
words = ["study", "studies", "studying", "studied"]

for w in words:
    # print each word alongside its stem
    print(w, "-", ps.stem(w))
Output
study - studi
studies - studi
studying - studi
studied - studi
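The same stem() call can also be applied to running text. The sketch below assumes that splitting on whitespace is good enough to get the words. The helper name stem_text is hypothetical, and a proper tokenizer would handle punctuation better.

```python
from nltk.stem import PorterStemmer


def stem_text(text):
    """Return the Porter stem of each whitespace-separated word in text."""
    ps = PorterStemmer()
    # str.split() is a simplifying assumption; it does not separate punctuation.
    return [ps.stem(word) for word in text.split()]


print(stem_text("study studies studying studied"))
```

Collecting the stems in a list, rather than printing them one by one, makes the result easy to pass on to later processing steps such as counting word frequencies.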
Summary
In this NLTK tutorial, we learned how to perform stemming in Python using the stem() method of PorterStemmer.