Column: product
- Cleaned by:
  - Removing trailing spaces.
  - Stripping extra internal spaces.
  - Eliminating special characters.
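
The three cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `clean_product` is hypothetical, and the definition of "special characters" (anything other than ASCII letters, digits, and spaces) is an assumption.

```python
import re


def clean_product(name: str) -> str:
    """Normalize one product name (hypothetical helper)."""
    # Assumption: a "special character" is anything that is not an
    # ASCII letter, digit, or space. Each run is replaced by a space
    # so that neighbouring words do not get glued together.
    text = re.sub(r"[^A-Za-z0-9 ]+", " ", name)
    # Collapse runs of internal whitespace into a single space.
    text = re.sub(r"\s+", " ", text)
    # strip() removes trailing spaces; it also removes leading ones,
    # which goes slightly beyond the steps listed above.
    return text.strip()


print(clean_product('  Fire  Tablet™ (7") '))
```

If the data lives in a pandas DataFrame, a function like this can be applied to the column with `Series.map`.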
Column: feedback (New Column)
- Created by combining the title and reviews_feedback columns.
- This unified text serves as input for the sentiment analysis engine to evaluate the degree of positivity or negativity in customer reviews.
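
One possible way to build the combined text is sketched below. The helper name `build_feedback`, the single-space separator, and the handling of missing values are all assumptions; the source only states that the two columns are combined.

```python
from typing import Optional


def build_feedback(title: Optional[str], review: Optional[str]) -> str:
    """Join a review title and body into one text (hypothetical helper)."""
    # Assumption: missing or blank fields are skipped rather than
    # rendered as "None" or "nan" in the combined text.
    parts = [p.strip() for p in (title, review) if isinstance(p, str) and p.strip()]
    # Assumption: a single space is used as the separator.
    return " ".join(parts)


print(build_feedback("Great tablet", "Battery lasts all day"))
```

Skipping empty fields keeps placeholder strings out of the sentiment input, where they would otherwise be scored as ordinary words.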
Column: categories
- The original category column was highly inconsistent and nested. The following normalization steps were applied:
  - Standardization:
    - All category values converted to lowercase.
    - Whitespace trimmed from both ends.
  - Noise Removal:
    - Identified and removed noisy or irrelevant patterns such as: "retail brand", "walmart", "target", "mazon.co.uk".
  - Category Unification:
    - Grouped semantically similar but differently named categories using a mapping dictionary.
    - Examples: "phones", "mobile phones", "cellular phones" → phone
    - Similar logic applied for other groups (e.g., tablet, charger, etc.)
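
The normalization steps for this column could look roughly like the sketch below. It is an illustration under assumptions, not the original implementation: `NOISE_PATTERNS`, `CATEGORY_MAP`, and `normalize_category` are hypothetical names, the noise patterns are copied verbatim from the list above and treated as literal substrings, and only the phone group of the mapping is shown. Flattening of nested category values is not covered here.

```python
import re

# Noise patterns copied verbatim from the list above (treated as
# literal substrings; this matching rule is an assumption).
NOISE_PATTERNS = ["retail brand", "walmart", "target", "mazon.co.uk"]

# Partial mapping dictionary: only the phone group is given in the text.
CATEGORY_MAP = {
    "phones": "phone",
    "mobile phones": "phone",
    "cellular phones": "phone",
}


def normalize_category(raw: str) -> str:
    """Standardize, de-noise, and unify one category value."""
    # Standardization: lowercase and trim both ends.
    text = raw.lower().strip()
    # Noise removal: drop each known noisy pattern.
    for pattern in NOISE_PATTERNS:
        text = text.replace(pattern, " ")
    # Tidy up whitespace left behind by the removals.
    text = re.sub(r"\s+", " ", text).strip()
    # Category unification: map known variants to one label,
    # leaving unknown values unchanged.
    return CATEGORY_MAP.get(text, text)


print(normalize_category("  Mobile Phones Walmart "))
```

Keeping the mapping in a plain dictionary makes it easy to extend with further groups such as tablet or charger without touching the function itself.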
Prediction & Output
- Saved results to feature_engineering.csv.