1. Bag-of-Words (BoW) Using scikit-learn:
# Import necessary libraries
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
# Assuming 'X' is a list of text documents and 'y' is their corresponding labels
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a CountVectorizer to convert text into a matrix of token counts
vectorizer = CountVectorizer()
# Learn the vocabulary from the training text only, then encode both sets with it
X_train_bow = vectorizer.fit_transform(X_train)
X_test_bow = vectorizer.transform(X_test)
# Create a Multinomial Naive Bayes classifier
classifier = MultinomialNB()
# Train the classifier on the training data
classifier.fit(X_train_bow, y_train)
# Predict the labels for the testing data
y_pred = classifier.predict(X_test_bow)
# Calculate and print the accuracy of the classifier
accuracy = accuracy_score(y_test, y_pred)
print("BoW Accuracy:", accuracy)
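- This code is a standard Bag-of-Words baseline: CountVectorizer turns each document into a vector of token counts, and a Multinomial Naive Bayes classifier is trained on those counts. The vocabulary is learned from the training set only (fit_transform) and reused on the test set (transform), so words that appear only in the test set are ignored.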
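- To see what the BoW features look like, the following is a minimal sketch on a made-up three-sentence corpus (the sentences and the variable names docs, vocab, and unseen are illustrative assumptions, not part of the example above). It assumes CountVectorizer's default settings, which lowercase the text and keep tokens of two or more characters.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, used only to make the count matrix easy to read
docs = ["the cat sat", "the dog sat", "the cat ran"]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

# vocabulary_ maps each token to its column index; sort tokens by that index
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
print(vocab)             # ['cat', 'dog', 'ran', 'sat', 'the']
print(counts.toarray())  # rows: [1 0 0 1 1], [0 1 0 1 1], [1 0 1 0 1]

# A token that was not seen during fit ("bird") has no column and is dropped
unseen = vectorizer.transform(["the bird sat"])
print(unseen.toarray())  # [[0 0 0 1 1]]
```

- Each column corresponds to one vocabulary word and each row to one document, which is why the test set has to be encoded with the vectorizer fitted on the training set: both matrices then share the same columns.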