Initial Dataset Overview
Data columns (total 9 columns):
 #   Column        Non-Null Count    Dtype
---  ------        --------------    -----
 0   product       27,900 non-null   object
 1   source        34,660 non-null   object
 2   categories    34,660 non-null   object
 3   date          34,621 non-null   object
 4   didPurchase   1 non-null        float64
 5   doRecommend   34,066 non-null   float64
 6   rating        34,627 non-null   float64
 7   reviews       34,658 non-null   object
 8   title         34,654 non-null   object
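The non-null counts above are what pandas reports through df.info(). As a minimal sketch, the same summary can be built as a table, which makes missing-value counts easier to compare across columns. The small DataFrame below is a hypothetical stand-in with invented values, not the actual dataset; only the column names are taken from the overview.

```python
import pandas as pd

# Toy stand-in for the reviews data; all values are invented for illustration.
df = pd.DataFrame({
    "product": ["Widget", None, "Gadget"],
    "didPurchase": [None, None, 1.0],
    "rating": [5.0, 4.0, None],
})

# Per-column non-null and missing counts, plus dtypes:
# the same information df.info() prints, in tabular form.
summary = pd.DataFrame({
    "non_null": df.notna().sum(),
    "missing": df.isna().sum(),
    "dtype": df.dtypes.astype(str),
})
print(summary)
```

Sorting summary by the missing column is a quick way to spot near-empty columns such as didPurchase.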
Dropped Columns
date and didPurchase
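- didPurchase has only 1 non-null value out of 34,660 records, so it carries essentially no information.
- date is well populated (34,621 non-null) but is not relevant to the current analysis scope.
- Both dropped.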
source
- No missing values, but contains only a single unique value: "Target".
- Does not provide additional descriptive power or segmentation utility.
- Dropped.
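The column drops described in this section can be sketched as follows. This is an illustrative example on a hypothetical toy DataFrame with invented values, not the project's actual code; it checks that source is constant before discarding it, then removes the three columns in one call.

```python
import pandas as pd

# Toy frame with invented values; only the column names come from the notes.
df = pd.DataFrame({
    "product": ["Widget", "Gadget", "Widget"],
    "source": ["Target", "Target", "Target"],
    "date": ["2017-01-01", None, "2017-03-05"],
    "didPurchase": [None, None, 1.0],
    "rating": [5.0, 4.0, 3.0],
})

# A column with a single unique value cannot separate records,
# so verify that before dropping it.
n_unique_source = df["source"].nunique(dropna=False)
assert n_unique_source == 1

# Drop the three columns named in this section.
cleaned = df.drop(columns=["date", "didPurchase", "source"])
print(cleaned.columns.tolist())
```

Checking nunique first guards against silently dropping a column that turns out to have more than one value in a refreshed copy of the data.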
Column-Wise Processing
product
- Missing Values: 6,760 of 34,660 records (about 19.5%) have a null product name.
- Reason for drop:
  - Product identification is essential for recommendation and analysis.
  - Product names could in principle be inferred from the title, reviews, or categories columns using an LLM, but that is out of scope for this project.
- Action Taken: Dropped records with missing product names.
- Post-cleaning: 27,900 records and 60 unique products retained.
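A minimal sketch of the product cleaning step, assuming the data is held in a pandas DataFrame. The toy values below are invented; on the real dataset the same calls would be expected to report the counts stated above.

```python
import pandas as pd

# Toy frame with invented values; two rows lack a product name.
df = pd.DataFrame({
    "product": ["Widget", None, "Gadget", "Widget", None],
    "rating": [5.0, 4.0, 3.0, 2.0, 1.0],
})

# Count the rows that will be lost before dropping them.
n_missing = int(df["product"].isna().sum())

# Keep only rows with a product name and renumber the index.
cleaned = df.dropna(subset=["product"]).reset_index(drop=True)

# Number of distinct products that survive the cleaning.
n_unique = cleaned["product"].nunique()
print(f"dropped {n_missing} rows, {n_unique} unique products retained")
```

Using dropna with subset=["product"] removes rows only when the product name is missing, so rows with gaps in other columns are left for their own column-wise handling.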