Overview
This module prepares product review data for sentiment analysis by cleaning key columns, generating features, and training a logistic regression classifier on TF-IDF vectors of the review text.
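The overview does not say which library the module uses for the model. In practice a library implementation (for example scikit-learn's TfidfVectorizer and LogisticRegression) would normally be used. The following is only a dependency-free, plain-Python sketch of the same technique so the moving parts are visible. It assumes: a simple lowercase alphanumeric tokenizer, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1 with L2-normalized vectors (the form scikit-learn's TfidfVectorizer uses by default), labels in {-1, 1} as produced by the labeling rule below, and unregularized logistic regression trained by per-sample gradient descent. All function names and the sample reviews are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split into alphanumeric tokens (an assumed, simple tokenizer)
    return re.findall(r"[a-z0-9]+", text.lower())

def fit_idf(docs):
    # Smoothed IDF per term: ln((1 + n) / (1 + df)) + 1
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    # Raw term counts weighted by IDF, then L2-normalized; terms unseen in training are dropped
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def score(x, w, b):
    # Linear score w.x + b over a sparse dict vector
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b

def train_logreg(vectors, labels, epochs=200, lr=0.5):
    # Unregularized logistic regression with labels in {-1, 1},
    # trained by per-sample gradient descent on log(1 + exp(-y * score))
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            margin = y * score(x, w, b)
            # d loss / d score = -y / (1 + exp(margin)); clamp the exponent to avoid overflow
            g = -y / (1.0 + math.exp(min(margin, 50.0)))
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * g * v
            b -= lr * g
    return w, b

def predict(x, w, b):
    # Positive (1) when the score is non-negative, otherwise negative (-1)
    return 1 if score(x, w, b) >= 0 else -1

docs = ["great product love it", "excellent quality love",
        "terrible broke fast", "awful waste of money"]
labels = [1, 1, -1, -1]
idf = fit_idf(docs)
vectors = [tfidf(d, idf) for d in docs]
w, b = train_logreg(vectors, labels)
print([predict(x, w, b) for x in vectors])  # → [1, 1, -1, -1]
```

Terms shared by many reviews (such as "love" here) get a lower IDF weight than rare ones, which is the point of the TF-IDF step; a real pipeline would also evaluate on held-out reviews rather than the training set.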
Product Column Cleanup
- Removed non-alphanumeric characters.
- Trimmed and normalized whitespace.
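A minimal sketch of the product cleanup, assuming "non-alphanumeric" means anything other than ASCII letters and digits, that whitespace is kept so it can be normalized afterwards, and that missing names become an empty string. The helper name clean_product is hypothetical. Note that an ASCII-only pattern would also drop accented letters; a Unicode-aware pattern would be needed if product names contain them.

```python
import re

def clean_product(name):
    """Hypothetical helper: remove non-alphanumeric characters, then normalize whitespace."""
    if name is None:
        return ""
    # Drop every character that is not an ASCII letter, digit, or whitespace
    text = re.sub(r"[^0-9A-Za-z\s]", "", str(name))
    # Collapse runs of whitespace to a single space and trim both ends
    return re.sub(r"\s+", " ", text).strip()

print(clean_product("  Amazon  Echo (2nd Gen)!! "))  # → Amazon Echo 2nd Gen
```

In a pandas pipeline the same function could be applied column-wise with Series.map.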
Feedback Column Creation
- Combined the title and the review feedback into a new feedback column.
- Handled missing values gracefully.
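The source does not say how missing values are handled, so the following is only one plausible sketch: None and float NaN (the marker pandas usually uses for missing text) are treated as empty, and whichever part is present is kept, joined by a single space. The helper names as_text and build_feedback and the space separator are assumptions.

```python
import math

def as_text(value):
    # Treat None and float NaN (pandas' usual missing-value marker) as empty text
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    return str(value).strip()

def build_feedback(title, review):
    """Hypothetical helper: join title and review text, skipping whichever part is missing."""
    parts = [p for p in (as_text(title), as_text(review)) if p]
    return " ".join(parts)

print(build_feedback("Great speaker", "Sound is clear"))  # → Great speaker Sound is clear
print(repr(build_feedback(None, float("nan"))))           # → ''
```

Skipping empty parts avoids stray separators and the literal string "nan" that a naive string concatenation of missing values could produce.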
Category Normalization
- Lowercased and trimmed all category entries.
- Removed noise terms (e.g., "walmart", "retail brand").
- Mapped similar terms to a unified label (e.g., "mobile phones" → "phones").
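The three category steps can be sketched as a single function. The noise list and the synonym map below contain only the examples given above; the module's actual lists are not shown, so NOISE_TERMS, SYNONYMS, and normalize_category are hypothetical. The sketch assumes each entry is a single string and removes noise terms only where they appear as whole words.

```python
import re

# Hypothetical lists built from the examples above; the real module's lists are not shown
NOISE_TERMS = ["walmart", "retail brand"]
SYNONYMS = {"mobile phones": "phones"}

def normalize_category(raw):
    # Lowercase and trim the entry
    text = raw.strip().lower()
    # Remove each noise term where it appears as whole words
    for term in NOISE_TERMS:
        text = re.sub(r"\b" + re.escape(term) + r"\b", " ", text)
    # Re-normalize whitespace left behind by the removals
    text = re.sub(r"\s+", " ", text).strip()
    # Map known variants to a unified label; other entries pass through unchanged
    return SYNONYMS.get(text, text)

print(normalize_category("  Mobile Phones "))    # → phones
print(normalize_category("Walmart Electronics"))  # → electronics
```

Noise removal runs before the synonym lookup so that an entry such as "Retail Brand mobile phones" still reaches the unified "phones" label.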
Sentiment Labeling
- Ratings ≥ 4 → Positive (1)
- Ratings < 4 → Negative (-1)
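The labeling rule maps directly to a one-line function. This sketch assumes ratings are numeric and already present (missing ratings would need to be dropped or handled first); the name sentiment_label is hypothetical. Because the threshold is 4, neutral 3-star reviews fall into the negative class.

```python
def sentiment_label(rating):
    """Map a star rating to a binary label: >= 4 -> 1 (positive), otherwise -1 (negative)."""
    return 1 if rating >= 4 else -1

print([sentiment_label(r) for r in [5, 4, 3.5, 3, 1]])  # → [1, 1, -1, -1, -1]
```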