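Here are 100+ Python one-liners for common natural language processing (NLP) tasks. They assume that text holds the input string, that the modules they use (nltk, string) are already imported, and that the required NLTK data has been downloaded with nltk.download(). One-liners are concise, but in real-world code readability matters just as much: prefer a clearer multi-line version whenever a one-liner hides what the code is doing.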
Text Tokenization:
1. Word Tokenization:
words = nltk.word_tokenize(text)
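If NLTK isn't available, a regular expression gives a rough, dependency-free approximation. The sketch below is not NLTK's algorithm, and the helper name simple_word_tokenize is hypothetical. It treats each run of word characters as a token and each punctuation mark as its own token. Unlike nltk.word_tokenize, which splits "don't" into "do" and "n't", it splits contractions at the apostrophe.

```python
import re

# Hypothetical helper: a dependency-free approximation of word tokenization.
# Each run of word characters becomes a token, and so does each single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def simple_word_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens (approximate, not NLTK-compatible)."""
    return TOKEN_PATTERN.findall(text)

print(simple_word_tokenize("Hello, world! NLP is fun."))
# → ['Hello', ',', 'world', '!', 'NLP', 'is', 'fun', '.']
```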
2. Sentence Tokenization:
sentences = nltk.sent_tokenize(text)
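To illustrate why a trained sentence tokenizer is useful, here is a naive rule-based splitter. It is a sketch, and the helper simple_sent_tokenize is hypothetical. It breaks after ".", "!" or "?" whenever whitespace follows. That works for simple text, but it mis-splits abbreviations such as "Dr.", which NLTK's pretrained Punkt model is designed to recognize.

```python
import re

# Hypothetical helper: split wherever '.', '!' or '?' is followed by whitespace.
# This naive rule cannot tell sentence ends from abbreviations like "Dr.".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")

def simple_sent_tokenize(text: str) -> list[str]:
    # Drop empty strings (e.g. from empty input) after splitting.
    return [s for s in SENTENCE_BOUNDARY.split(text.strip()) if s]

print(simple_sent_tokenize("NLP is fun. Is it hard? Not really!"))
# → ['NLP is fun.', 'Is it hard?', 'Not really!']
print(simple_sent_tokenize("Dr. Smith arrived."))
# → ['Dr.', 'Smith arrived.']  (wrong split on the abbreviation)
```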
Text Pre-processing:
3. Lowercase Conversion:
lowercase_text = text.lower()
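For non-English text, str.lower() does not always produce matching forms. Python's str.casefold() is a more aggressive, Unicode-aware variant intended for caseless comparison. The German words below are only an illustration.

```python
word_a = "Straße"
word_b = "STRASSE"

# lower() keeps 'ß', so the two spellings still differ.
print(word_a.lower() == word_b.lower())        # False ('straße' vs 'strasse')
# casefold() maps 'ß' to 'ss', so both become 'strasse'.
print(word_a.casefold() == word_b.casefold())  # True
```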
4. Remove Punctuation:
no_punct_text = ''.join([c for c in text if c not in string.punctuation])
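A common alternative to the comprehension above is str.translate with a deletion table built by str.maketrans. It removes the same characters in a single pass. The wrapper name remove_punctuation is illustrative. Note that string.punctuation covers only ASCII punctuation, so typographic characters such as curly quotes are left in place.

```python
import string

# Translation table that deletes every character in string.punctuation (ASCII only).
PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def remove_punctuation(text: str) -> str:
    return text.translate(PUNCT_TABLE)

print(remove_punctuation("Hello, world! (NLP) rocks..."))
# → Hello world NLP rocks
```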
5. Remove Stopwords:
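stop_words = set(stopwords.words('english'))  # requires: from nltk.corpus import stopwords
filtered_words = [word for word in words if word.lower() not in stop_words]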
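The sketch below shows the two details that matter in stopword filtering. Tokens are compared in lowercase, because stopword lists such as NLTK's are lowercase. The list is stored in a set so each membership test is fast. The tiny STOP_WORDS set and the remove_stopwords helper are hypothetical stand-ins. In practice you would load a full list such as stopwords.words('english').

```python
# Hypothetical, deliberately tiny stopword set for illustration only.
STOP_WORDS = frozenset({"a", "an", "the", "is", "of", "and", "to", "in"})

def remove_stopwords(words: list[str], stop_words: frozenset[str] = STOP_WORDS) -> list[str]:
    # Match on the lowercased token but keep the token's original casing in the output.
    return [w for w in words if w.lower() not in stop_words]

tokens = ["The", "cat", "is", "on", "the", "mat"]
print(remove_stopwords(tokens))
# → ['cat', 'on', 'mat']
```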
6. Remove Numbers:
text_no_numbers = ''.join([ch for ch in text if not ch.isdigit()])
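The character-by-character filter removes digits but leaves decimal points behind, so "3.14" becomes ".". Here is a sketch using a regular expression that treats an integer with an optional decimal part as a single number. The pattern and the remove_numbers helper are illustrative assumptions. They do not handle signs, thousands separators or scientific notation.

```python
import re

# Matches an integer optionally followed by a decimal part, e.g. "42" or "3.14".
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")

def remove_numbers(text: str) -> str:
    return NUMBER_PATTERN.sub("", text)

sample = "Pi is 3.14 today"
print(repr(''.join(ch for ch in sample if not ch.isdigit())))  # 'Pi is . today'
print(repr(remove_numbers(sample)))                            # 'Pi is  today'
```

The leftover double space can then be collapsed with the whitespace cleanup in the next step.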
7. Remove Extra Whitespace:
text_clean = ' '.join(text.split())
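This one-liner works because str.split() with no argument splits on runs of any whitespace (spaces, tabs, newlines) and discards leading and trailing whitespace. Rejoining the pieces with a single space therefore normalizes the spacing. The sketch below also shows a regex formulation that should give the same result on typical input. Both helper names are illustrative.

```python
import re

def normalize_whitespace(text: str) -> str:
    # split() with no argument collapses whitespace runs and trims the ends.
    return ' '.join(text.split())

def normalize_whitespace_regex(text: str) -> str:
    # Alternative: replace each whitespace run with one space, then trim the ends.
    return re.sub(r"\s+", " ", text).strip()

messy = "  Hello \t world\n\nfrom   NLP  "
print(repr(normalize_whitespace(messy)))        # 'Hello world from NLP'
print(repr(normalize_whitespace_regex(messy)))  # 'Hello world from NLP'
```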