Fragrance Trends Exploratory Data Analysis¶
Mia M.¶
04/20/2024¶
Overview¶
The following exploratory data analysis examines a dataset sourced from a well-known fragrance database, reflecting the status as of March 10, 2024.
- The dataset includes rankings for 1,000 unique perfumes.
- Rankings are inferred to be the result of a mix of user ratings and engagement patterns.
- Differences may be observed in comparison to other fragrance database records.
Overview of the Data Collection Methodology¶
I developed a series of Python scripts to systematically gather data on the top 1000 perfumes ranked in a well-known online database. The initial script is designed to swiftly compile essential details such as perfume names, brands, and associated URLs into a single document.
Subsequent to this, the second script processes each URL from the initial document to extract detailed information about each perfume. This script enriches the dataset with diverse attributes including perfume accords, notes, seasonal preferences, rating scores, and the number of ratings. It integrates these details into the dataset by adding new columns to the information collected by the first script. This staged approach allows for efficient updates to the perfume rankings.
A third script sorts the data regarding perfume notes and accords into distinct columns, enhancing data clarity and organization. Optionally, I can manually simplify the notes data to merge similar notes into broader categories to eliminate redundancies. Alternatively, a fourth automated script is available to perform this task.
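The column-splitting step can be sketched as follows. This is a minimal illustration and not the original script: the input column name `notes`, the comma delimiter, and the helper names `note_column` and `expand_notes` are assumptions. Only the `note_` prefix and the 0/1 indicator layout are taken from the dataset shown later in this analysis.

```python
import pandas as pd

# Hypothetical scraped input: one comma-separated notes string per perfume.
raw = pd.DataFrame({
    "perfume_name": ["Perfume A", "Perfume B", "Perfume C"],
    "notes": ["Vanilla, Yuzu", "Yuzu", "Teak Wood, Vanilla"],
})

def note_column(name):
    """Turn a note such as 'Teak Wood' into a column name like 'note_teak_wood'."""
    return "note_" + name.strip().lower().replace(" ", "_")

def expand_notes(frame, column="notes"):
    """Replace a delimited notes column with one 0/1 indicator column per note."""
    per_row = frame[column].str.split(",").apply(
        lambda items: {note_column(item) for item in items if item.strip()}
    )
    all_columns = sorted(set().union(*per_row))
    indicators = pd.DataFrame(
        [[int(col in cols) for col in all_columns] for cols in per_row],
        columns=all_columns,
        index=frame.index,
    )
    return pd.concat([frame.drop(columns=[column]), indicators], axis=1)

expanded = expand_notes(raw)
print(expanded)
```

Merging similar notes into broader categories could then be done by mapping several raw note names to one column name inside `note_column` before the indicators are built.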
Additionally, another script updates and rearranges the entire list to reflect the most current rankings based on perfume popularity.
I personally crafted all the scripts used in this process, and the data compiled from these efforts is proprietary and not available to the public. It is for my own personal use.
Rating Count Explained¶
- The 'Rating Count' reflects the number of times users have rated a perfume.
- The influence of automated 'bot' activities on these numbers is uncertain.
- There may be inconsistencies between this count and the number of actual reviews.
Rating Value Insight¶
The 'Rating Value' represents the aggregate score assigned to a perfume. It's worth noting that a higher score doesn't always equate to a higher ranking within the dataset.
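A small sketch makes this concrete. It uses the rank, rating value, and rating count of the five top-ranked perfumes shown in the dataset preview, and compares the database rank with the order the perfumes would have if they were sorted by rating value alone. The column name `rank_by_rating` is introduced here for illustration only.

```python
import pandas as pd

# The five top-ranked perfumes, as they appear in the dataset preview.
head = pd.DataFrame({
    "rank": [1, 2, 3, 4, 5],
    "perfume_name": [
        "Angels' Share",
        "Khamrah",
        "Le Male Le Parfum",
        "Baccarat Rouge 540",
        "Tobacco Vanille",
    ],
    "rating_value": [4.38, 4.40, 4.59, 3.88, 4.24],
    "rating_count": [10342, 6669, 10925, 18718, 20888],
})

# Position each perfume would hold if ordered by rating value only (1 = highest).
head["rank_by_rating"] = (
    head["rating_value"].rank(ascending=False, method="min").astype(int)
)

print(head[["rank", "perfume_name", "rating_value", "rank_by_rating"]])
```

Among these five, the perfume ranked first has only the third-highest rating value, which is consistent with the ranking depending on more than the score.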
Commencing Data Exploration¶
The initial phase of the data analysis will involve loading the dataset, seeking out any null values, and cleaning up the data as required. Once the data is prepped, I'll undertake an exploratory review to set the stage for more in-depth analysis.
import pandas as pd
# Load the dataset
df = pd.read_csv('dataset_analysis.csv')
# Display the first few rows of the dataframe to understand its structure
df.head()
| | rank | perfume_name | perfume_brand | year | gender | perfumers | rating_value | rating_count | sentiments | main_accords | ... | note_teak_wood | note_aldehydes | note_almond_milk | note_mystikal | note_black_currant | note_frangipani | note_yuzu | note_wormwood | note_woodsy_notes | note_petalia |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Angels' Share | By Kilian | 2020.0 | Unisex | Benoist Lapouza | 4.38 | 10342 | {'love': '100', 'like': '35.8974', 'ok': '12.6... | {'woody': '100', 'warm spicy': '97.3637', 'swe... | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | Khamrah | Lattafa Perfumes | 2022.0 | Unisex | Unlisted | 4.40 | 6669 | {'love': '100', 'like': '39.2081', 'ok': '10.9... | {'sweet': '100', 'warm spicy': '79.0738', 'amb... | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3 | Le Male Le Parfum | Jean Paul Gaultier | 2020.0 | Men | Natalie Gracia-Cetto, Quentin Bisch | 4.59 | 10925 | {'love': '100', 'like': '24.389', 'ok': '6.433... | {'warm spicy': '100', 'vanilla': '78.7809', 'a... | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | 4 | Baccarat Rouge 540 | Maison Francis Kurkdjian | 2015.0 | Unisex | Francis Kurkdjian | 3.88 | 18718 | {'love': '100', 'like': '46.4949', 'ok': '19.2... | {'woody': '100', 'amber': '91.8164', 'warm spi... | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | Tobacco Vanille | Tom Ford | 2007.0 | Unisex | Olivier Gillotin | 4.24 | 20888 | {'love': '100', 'like': '49.3364', 'ok': '8.86... | {'vanilla': '100', 'sweet': '94.6943', 'tobacc... | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 498 columns
Checking for missing values and correcting missing data¶
# Check for missing values in the key columns
missing_values = df[['perfume_name', 'perfume_brand', 'year', 'gender', 'rating_value', 'rating_count']].isnull().sum()
# Summary statistics for the year, rating_value, and rating_count
year_distribution = df['year'].describe()
rating_value_distribution = df['rating_value'].describe()
rating_count_distribution = df['rating_count'].describe()
# Distribution by gender
gender_distribution = df['gender'].value_counts()
missing_values, year_distribution, rating_value_distribution, rating_count_distribution, gender_distribution
(perfume_name     0
 perfume_brand    0
 year             0
 gender           0
 rating_value     0
 rating_count     0
 dtype: int64,
 count    1000.000000
 mean     2013.652000
 std        12.677445
 min      1792.000000
 25%      2010.000000
 50%      2017.000000
 75%      2021.000000
 max      2024.000000
 Name: year, dtype: float64,
 count    1000.000000
 mean        4.091390
 std         0.229733
 min         3.080000
 25%         3.950000
 50%         4.105000
 75%         4.260000
 max         4.680000
 Name: rating_value, dtype: float64,
 count     1000.000000
 mean      4599.518000
 std       4467.480919
 min        152.000000
 25%       1631.000000
 50%       3015.500000
 75%       5931.250000
 max      28831.000000
 Name: rating_count, dtype: float64,
 gender
 Unisex    367
 Women     354
 Men       279
 Name: count, dtype: int64)
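The check above reports no missing values in the key columns, so no correction is needed for this snapshot. If a later refresh of the data did introduce gaps, one possible handling is sketched below on a made-up three-row sample. The helper name `report_and_drop_missing` and the choice to drop incomplete rows (rather than impute them) are assumptions, not part of the original workflow.

```python
import pandas as pd

# Made-up sample with gaps, for illustration only.
sample = pd.DataFrame({
    "perfume_name": ["Perfume A", "Perfume B", "Perfume C"],
    "year": [2020.0, None, 2015.0],
    "rating_value": [4.38, 4.40, None],
})

def report_and_drop_missing(frame, required):
    """Count missing values per required column, then drop incomplete rows."""
    missing = frame[required].isnull().sum()
    cleaned = frame.dropna(subset=required).copy()
    # Years load as floats when gaps are present; cast once the gaps are gone.
    cleaned["year"] = cleaned["year"].astype(int)
    return missing, cleaned

missing, cleaned = report_and_drop_missing(sample, ["year", "rating_value"])
print(missing)
print(cleaned)
```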
Ratings vs. Gender Designation¶
Average Rating Value by Gender¶
- Men's perfumes have the highest average rating value at approximately 4.22.
- Unisex perfumes follow with an average rating value of about 4.10.
- Women's perfumes have a slightly lower average rating value of around 3.97.
Average Rating Count by Gender¶
- Women's perfumes have the highest average rating count at approximately 6,136, suggesting they may have a broader appeal or higher engagement.
- Men's perfumes follow with an average rating count of about 4,883.
- Unisex perfumes have the lowest average rating count at around 2,902, which might indicate a more niche market.
These findings suggest that while men's perfumes tend to receive slightly higher rating values, women's perfumes attract more ratings, possibly reflecting higher usage or popularity. The data on unisex perfumes suggests a specialized market with a potentially smaller but dedicated user base.
# Calculate the average rating value and count by gender
average_ratings_by_gender = df.groupby('gender')['rating_value'].mean()
average_count_by_gender = df.groupby('gender')['rating_count'].mean()
average_ratings_by_gender, average_count_by_gender
(gender
 Men       4.224158
 Unisex    4.103787
 Women     3.973898
 Name: rating_value, dtype: float64,
 gender
 Men       4882.906810
 Unisex    2901.604905
 Women     6136.435028
 Name: rating_count, dtype: float64)
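The averages above give every perfume equal weight, regardless of how many ratings it has received. As a complementary view, a rating-count-weighted mean per gender could be computed along these lines. The data below is a made-up toy sample and the function name `weighted_mean_rating` is hypothetical; only the column names match the dataset.

```python
import pandas as pd

# Toy data for illustration; column names follow the dataset.
toy = pd.DataFrame({
    "gender": ["Men", "Men", "Women", "Women"],
    "rating_value": [4.0, 5.0, 3.0, 4.0],
    "rating_count": [100, 300, 200, 200],
})

def weighted_mean_rating(frame):
    """Mean rating value per gender, weighting each perfume by its rating count."""
    weighted_sum = (
        (frame["rating_value"] * frame["rating_count"])
        .groupby(frame["gender"])
        .sum()
    )
    total_count = frame.groupby("gender")["rating_count"].sum()
    return weighted_sum / total_count

simple = toy.groupby("gender")["rating_value"].mean()
weighted = weighted_mean_rating(toy)
print(pd.DataFrame({"simple_mean": simple, "weighted_mean": weighted}))
```

On the real data, `weighted_mean_rating(df)` would show whether the gender gap in rating value persists once heavily rated perfumes count for more.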
import matplotlib.pyplot as plt
import seaborn as sns
# Count the number of ranked fragrances by release year
yearly_distribution = df['year'].value_counts().sort_index()
Visualizing the distribution of the top 1000 fragrances by year¶
import pandas as pd
import plotly.graph_objs as go
# Release years load as floats (e.g. 2020.0); cast to int for clean labels
df['year'] = df['year'].astype(int)
# Per release year: number of ranked perfumes and their combined rating count
summary_df = df.groupby('year').agg(total_ratings=('rating_count', 'sum'), count=('perfume_name', 'count')).reset_index()
summary_df['year_str'] = summary_df['year'].astype(str)
# Preparing hover text
hover_text = []
for i, row in summary_df.iterrows():
    hover_text.append(f'Year: {row["year"]}<br>Number of Fragrances: {row["count"]}<br>Total Rating Count: {row["total_ratings"]}')
# Create the bar chart
fig = go.Figure(data=[
    go.Bar(
        x=summary_df['year_str'],
        y=summary_df['count'],
        text=summary_df['count'],  # This will be displayed on the bar
        hoverinfo='text',  # Will show custom text on hover
        hovertext=hover_text,
        marker=dict(color=summary_df['total_ratings'], coloraxis="coloraxis")
    )
])
# Color scale; the bars are colored by total rating count, so title the colorbar accordingly
fig.update_layout(
    coloraxis=dict(colorscale='Viridis', colorbar=dict(title='Total Rating Count')),
    title='Distribution of Top 1000 Ranked Fragrances by Release Year'
)
# Layout adjustments
fig.update_layout(
    title_x=0.5,
    xaxis=dict(
        title='Year of Release',
        type='category',
        tickangle=-45
    ),
    yaxis=dict(
        title='Total Perfumes by Year'
    ),
    bargap=0.1,  # Adjust this for desired bar thickness
    plot_bgcolor='white',
    paper_bgcolor='white'
)
fig.show()
The Reflection of Fashion Trends in Fragrance¶
The release-year distribution of the top 1000 fragrances suggests several parallels between fragrance and fashion:
Trend Cycles: Just as fashion sees cycles of trends, with certain styles coming back into vogue, fragrances too have their moments of resurgence.
Brand Influence: High-fashion brands often release fragrances as extensions of their brand identity, contributing to the perceived lifestyle they promote. The popularity of fragrances from fashion-forward brands over time, as seen in the brand heatmap later in this analysis, underscores the influence these brands wield in shaping consumer preferences not just in clothing but in lifestyle products like perfumes.
Innovation and Nostalgia: Technological advancements allow for new scent discoveries and creation techniques, mirroring the innovation seen in fashion design and materials. Simultaneously, a sense of nostalgia often influences both fashion and fragrance trends, with certain scents evoking past decades or styles.
Cultural Reflections: Fragrances, like fashion, adapt to reflect cultural shifts. The diversity of top fragrances over the years hints at changing societal values and the global fusion of scent preferences, similar to how global influences reshape fashion trends.
Visualizing the Trend¶
The bar graph showing the distribution of top 1000 fragrances by release year illustrates these points. It highlights the years with heightened activity, which may correlate with significant fashion trends, and it shows how fragrance popularity has evolved. Peaks in the graph may correspond with years when fashion trends heavily influenced fragrance releases, or when iconic fragrances were launched, capturing the essence of the time.
This graph, akin to a timeline, allows us to visualize the ebbs and flows in fragrance popularity, offering a parallel to fashion trends' dynamic nature. It serves as a testament to the fragrance industry's responsiveness to changing tastes, technologies, and cultural shifts, mirroring the ever-evolving world of fashion.
Further Analysis¶
Next, we explore the dynamics between rating counts and values.
Often, perfumes with fewer reviews display extremes in ratings—either very high or very low. This phenomenon could be attributed to dissatisfied customers wanting to caution others or satisfied users eager to express their pleasure. As the number of reviews increases, these extreme values typically converge towards a more moderate average rating.
This pattern underscores the influence of consumer sentiment in shaping overall rating trends.
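One way to check this pattern is to group perfumes into rating-count bins and compare the spread of rating values in each bin. The sketch below is an assumption-laden illustration: the helper name `rating_spread_by_count`, the use of quantile bins, and the eight-row toy sample are not part of the original analysis.

```python
import pandas as pd

def rating_spread_by_count(frame, n_bins=4):
    """Summarize rating values within quantile bins of rating count.

    Bin 0 holds the least-rated perfumes; higher bins hold more-rated ones.
    """
    bins = pd.qcut(frame["rating_count"], q=n_bins, labels=False, duplicates="drop")
    return frame.groupby(bins)["rating_value"].agg(["count", "mean", "std"])

# Toy data for illustration: the least-rated perfumes have the most extreme scores.
toy = pd.DataFrame({
    "rating_count": [10, 20, 30, 40, 50, 60, 70, 80],
    "rating_value": [5.0, 2.0, 4.8, 3.0, 4.2, 3.9, 4.1, 4.0],
})

spread = rating_spread_by_count(toy, n_bins=2)
print(spread)
```

If the convergence described above holds in the real data, `rating_spread_by_count(df)` should show the standard deviation shrinking from the lowest bin to the highest. Note that every perfume in this dataset already has at least 152 ratings, so the effect may be weaker here than in a full catalogue.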
# Calculating the average rating value and count by release year for the top 1000 fragrances
avg_rating_value_by_year = df.groupby('year')['rating_value'].mean()
avg_rating_count_by_year = df.groupby('year')['rating_count'].mean()
# Plot inside a style context so the dark theme applies to this figure only
with plt.style.context('dark_background'):
    fig, ax1 = plt.subplots(figsize=(14, 7))
    sns.lineplot(x=avg_rating_value_by_year.index, y=avg_rating_value_by_year.values,
                 marker='o', label='Average Rating Value', color='purple', legend=False, ax=ax1)
    # Create a second y-axis for the count and plot it in blue, suppress its automatic legend too
    ax2 = ax1.twinx()
    sns.lineplot(x=avg_rating_count_by_year.index, y=avg_rating_count_by_year.values,
                 marker='o', label='Average Rating Count', color='blue', alpha=0.4, ax=ax2, legend=False)
    # Setting titles and labels (on ax1, because the twin axis hides its own x-axis)
    ax1.set_title('Average Rating Value and Count by Release Year for Top 1000 Fragrances', color='white')
    ax1.set_xlabel('Release Year', color='white')
    ax1.set_ylabel('Average Rating Value', color='white')
    ax2.set_ylabel('Average Rating Count', color='white')
    # Fixing the legend to avoid duplicates and set it properly
    lines, labels = ax1.get_legend_handles_labels()
    lines2, labels2 = ax2.get_legend_handles_labels()
    ax2.legend(lines + lines2, labels + labels2, loc='upper left', frameon=False)  # frameon=False for no legend background
    # Change the tick colors to white
    ax1.tick_params(colors='white', which='both')
    ax2.tick_params(colors='white', which='both')
    plt.tight_layout()
    plt.show()
from scipy.stats import pearsonr
# Assuming avg_rating_value_by_year and avg_rating_count_by_year are Series with the same index
correlation_coefficient, p_value = pearsonr(avg_rating_value_by_year.values, avg_rating_count_by_year.values)
print(f"Pearson Correlation Coefficient: {correlation_coefficient}")
print(f"P-Value: {p_value}")
Pearson Correlation Coefficient: -0.4981272019674901
P-Value: 0.00019948059660849693
The analysis of the top 1000 fragrances reveals a moderate negative correlation between the average rating value and the average rating count across release years, with a Pearson correlation coefficient of approximately -0.498. Because the coefficient is computed on per-year averages, it describes release years rather than individual perfumes: years with higher average ratings tend to have lower average rating counts, and vice versa. The p-value of about 0.0002 is well below the conventional significance level of 0.05, suggesting that the observed correlation is unlikely to be due to random chance.
This pattern is visible in the overlaid line graphs, where peaks in average rating value often coincide with troughs in average rating count, and vice versa. While the negative correlation is statistically significant, it is not strong enough to suggest a deterministic relationship, and factors not accounted for in this analysis may influence both variables. Further work could examine additional variables or use a more granular time-series analysis to explore the relationship between rating values and the number of ratings.
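Rating counts in this dataset are strongly right-skewed (median about 3,016 against a maximum of 28,831), and Pearson's coefficient is sensitive to such outliers. As a possible robustness check, a rank-based (Spearman) coefficient can be computed alongside it. The sketch below uses only pandas by correlating the ranks of the two series; the toy values and the function name `pearson_and_spearman` are illustrative assumptions.

```python
import pandas as pd

def pearson_and_spearman(x, y):
    """Return (Pearson r, Spearman rho); Spearman is Pearson applied to ranks."""
    pearson = x.corr(y)
    spearman = x.rank().corr(y.rank())
    return pearson, spearman

# Toy data for illustration: one very large rating count dominates the scale.
toy = pd.DataFrame({
    "rating_count": [100, 200, 400, 800, 30000],
    "rating_value": [4.6, 4.5, 4.4, 4.3, 4.29],
})

pearson, spearman = pearson_and_spearman(toy["rating_count"], toy["rating_value"])
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

On the real data, the same function could be applied to `avg_rating_count_by_year` and `avg_rating_value_by_year`, or directly to `df['rating_count']` and `df['rating_value']` for a perfume-level view.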
Continuing the analysis¶
Brand Trends¶
- Identify which brands have been most popular in certain periods, highlighting shifts in consumer preferences or marketing successes.
New vs. Established Brands¶
- Observe if newer brands are making significant impacts in recent years compared to established ones.
Consistency in Popularity¶
- Determine if certain brands consistently appear in the top list across multiple years, indicating sustained popularity or quality.
Who releases the most fragrances and how do they rank overall?¶
I will begin by finding the perfume brands with the highest counts overall.
Then I will check what share of the top 100 ranks these brands hold.
Note: Top brands in this context are the 10 brands with the most fragrances in the top 1000 list.
# Count the total number of fragrances per brand in the top 1000 list
brand_counts = df['perfume_brand'].value_counts().head(10)
# Identify the top brands for further yearly distribution analysis
top_brands = brand_counts.index.tolist()
# Filter the dataset for only top brands and count the number of fragrances per brand per year
top_brands_yearly = df[df['perfume_brand'].isin(top_brands)].groupby(['year', 'perfume_brand']).size().unstack(fill_value=0)
top_brands_yearly.tail(10) # Display the last 10 years
| year | Chanel | Dior | Dolce&Gabbana | Giorgio Armani | Guerlain | Lattafa Perfumes | Parfums de Marly | Tom Ford | Xerjoff | Yves Saint Laurent |
|---|---|---|---|---|---|---|---|---|---|---|
| 2015 | 1 | 2 | 1 | 1 | 0 | 0 | 1 | 2 | 1 | 1 |
| 2016 | 2 | 2 | 0 | 2 | 1 | 2 | 1 | 1 | 0 | 1 |
| 2017 | 1 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 2 | 1 |
| 2018 | 2 | 2 | 2 | 1 | 1 | 1 | 2 | 3 | 1 | 1 |
| 2019 | 2 | 1 | 1 | 3 | 1 | 0 | 3 | 2 | 3 | 1 |
| 2020 | 1 | 2 | 3 | 3 | 1 | 3 | 2 | 4 | 0 | 1 |
| 2021 | 0 | 6 | 2 | 4 | 0 | 5 | 3 | 3 | 3 | 4 |
| 2022 | 0 | 2 | 1 | 2 | 1 | 8 | 0 | 2 | 0 | 4 |
| 2023 | 0 | 0 | 2 | 2 | 3 | 5 | 2 | 3 | 0 | 2 |
| 2024 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
The table counts, for each of the ten top brands, how many of its fragrances in the top 1000 list were released in each year from 2015 to 2024. Several trends stand out:
Brand Diversity Over Time: The representation of top brands varies year by year, indicating shifts in popularity and possibly consumer preferences or successful new launches.
Emergence of Newer Brands: Brands like Lattafa Perfumes have seen a notable increase in the number of fragrances making it to the top list in recent years, peaking in 2022 with 8 fragrances. This suggests a growing popularity or successful marketing efforts.
Consistent Presence: Established brands such as Dior, Tom Ford, and Yves Saint Laurent appear in most release years from 2015 onward, though the number of their top fragrances varies from year to year. Dior stands out in 2021 with six entries. Chanel, by contrast, has no entries released after 2020.
Shifts in Popularity: The fluctuating numbers for each brand from year to year could reflect the competitive nature of the fragrance industry, with consumer interests shifting towards newer or different brands over time.
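The observations above can be summarized numerically. The sketch below re-enters three columns of the table (Chanel, Dior, and Lattafa Perfumes, release years 2015 to 2024) and computes, per brand, the total number of entries, the number of years with at least one entry, and the peak year. The column names of the summary table are introduced here for illustration.

```python
import pandas as pd

# Values copied from the brand-by-year table above (release years 2015 to 2024).
years = list(range(2015, 2025))
entries = pd.DataFrame(
    {
        "Chanel": [1, 2, 1, 2, 2, 1, 0, 0, 0, 0],
        "Dior": [2, 2, 2, 2, 1, 2, 6, 2, 0, 0],
        "Lattafa Perfumes": [0, 2, 1, 1, 0, 3, 5, 8, 5, 0],
    },
    index=pd.Index(years, name="year"),
)

summary = pd.DataFrame({
    "total_entries": entries.sum(),          # entries across all ten years
    "active_years": (entries > 0).sum(),     # years with at least one entry
    "peak_year": entries.idxmax(),           # first year with the highest count
})
print(summary)
```

With the full `top_brands_yearly` frame in place of `entries`, the same three aggregations would cover all ten brands and every release year.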
Exploring the brands that appear the most with a heat map¶
# Visualizing the data with a heatmap for a clean overview
# Adjusting the year labels on the heatmap to display as integers
years_int = top_brands_yearly.index.astype(int).tolist()
plt.figure(figsize=(14, 8))
# Using the adjusted integer years for the x-axis labels
sns.heatmap(top_brands_yearly.transpose(), cmap="YlGnBu", annot=True, fmt="d", linewidths=.5, xticklabels=years_int)
plt.title('The distribution of top fragrances by brand across selected years')
plt.xlabel('Year')
plt.ylabel('Brand')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
Rapid Growth: Lattafa Perfumes shows a notable increase in the number of top-1000 fragrances from recent release years, underscoring its growing influence and competitiveness in the fragrance industry.
Comparison with Established Brands: When placed alongside established brands like Chanel, Dior, Guerlain, Tom Ford, and Yves Saint Laurent, Lattafa's trajectory highlights its emergence as a significant player, capable of matching or even surpassing the counts of traditional powerhouses in certain years.
Market Dynamics: The visualization showcases the dynamic nature of the fragrance market, with newer brands like Lattafa making substantial inroads and challenging the status quo maintained by longer-established brands.
Note that the heatmap counts ranked fragrances by release year, not a brand's total yearly releases. Read that way, it shows Lattafa's increasing presence in the rankings and its potential to shape future fragrance trends.
Next, a pie chart will be used to visualize the presence of these brands in the top 100 ranks.¶
# The ten brands with the most fragrances in the top 1000 list (same as top_brands above)
selected_brands = ['Chanel', 'Dior', 'Dolce&Gabbana', 'Giorgio Armani', 'Guerlain', 'Lattafa Perfumes', 'Parfums de Marly', 'Tom Ford', 'Xerjoff', 'Yves Saint Laurent']
# Filter the dataset to include only the top 100 fragrances
top_100_df = df[df['rank'] <= 100]
# Count the number of top 100 placements for the selected brands and sort alphabetically
# (brands with no top 100 placement do not appear in the result)
top_100_counts_alphabetical = top_100_df[top_100_df['perfume_brand'].isin(selected_brands)]['perfume_brand'].value_counts().sort_index()
# Generating a color palette with Seaborn
color_palette = sns.color_palette("Paired", len(top_100_counts_alphabetical))
# Generating the pie chart with the color palette applied alphabetically
plt.figure(figsize=(10, 8))
plt.pie(top_100_counts_alphabetical, labels=top_100_counts_alphabetical.index, startangle=100, autopct='%1.1f%%', colors=color_palette)
plt.title('Presence in Top 100 Fragrances List by Brand')
plt.tight_layout()
plt.show()
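A pie chart normalizes over the slices it is given, so each percentage above is a brand's share of the selected brands' combined placements, not its share of all 100 ranks. Brands with no placement also drop out of the chart entirely. The sketch below reports both shares and keeps zero-placement brands visible. The function name `top_n_brand_shares`, the output column names, and the toy data are assumptions for illustration.

```python
import pandas as pd

def top_n_brand_shares(frame, brands, n=100):
    """Placements and shares of the given brands within the top n ranks."""
    top = frame[frame["rank"] <= n]
    # reindex keeps brands with no placements, filling their count with 0
    counts = top["perfume_brand"].value_counts().reindex(brands, fill_value=0)
    return pd.DataFrame({
        "placements": counts,
        "share_of_top_n": counts / len(top),
        "share_among_selected": counts / counts.sum(),
    })

# Toy data for illustration: ten ranked perfumes from four made-up brands.
toy = pd.DataFrame({
    "rank": range(1, 11),
    "perfume_brand": ["A", "A", "B", "C", "C", "C", "D", "D", "A", "B"],
})

shares = top_n_brand_shares(toy, brands=["A", "B", "E"], n=10)
print(shares)
```

Applied to the real data as `top_n_brand_shares(df, selected_brands)`, the `placements` column would list any brand with zero top 100 entries explicitly instead of omitting it.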
Trends in Fragrance Brand Popularity¶
A Shift in Preference Away from Guerlain¶
It's noteworthy to observe the shift in consumer preferences as evidenced by the absence of Guerlain from the top 100 ranked perfumes list, despite a history of frequent releases. Guerlain, once celebrated for creating timeless scents like the renowned Shalimar and Spiritueuse Double Vanille, seems to have experienced a decline in its appeal to the modern fragrance aficionado.
Potential Reasons Behind the Changing Trends¶
Several hypotheses could explain this trend. It's conceivable that the brand's classic appeal may not resonate as strongly with today's market, which often seeks innovation and novelty. Alternatively, the brand may still enjoy prestige and a loyal following in niche circles or among audiences with a preference for vintage styles, potentially reflected in other databases with a demographic that favors such classic fragrances.
Upcoming Analysis on Guerlain¶
Intriguing as these observations are, they prompt a more in-depth analysis of Guerlain's current market position. Before diving into this, however, a closer examination of Lattafa—a brand that has made notable strides in recent rankings—is warranted.
import plotly.express as px
# Filtering the dataset for Lattafa perfumes
lattafa_perfumes = df[df['perfume_brand'] == 'Lattafa Perfumes']
# Plotly scatter plot
fig = px.scatter(
    lattafa_perfumes,
    x='year',
    y='rating_value',
    size='rating_count',  # Point size encodes the number of ratings
    color='rating_value',  # Point color encodes the rating value
    hover_name='perfume_name',  # This will show the perfume name
    hover_data=['rank'],  # Add 'rank' to the hover tooltip
    color_continuous_scale=px.colors.diverging.PiYG,  # Using a diverging color scale
    title='Lattafa Perfumes: Current Rating Value and Count Per Release',
    labels={'year': 'Year of Release', 'rating_value': 'Rating Value', 'rating_count': 'Rating Count', 'rank': 'Rank'},
    height=600,  # Height of the figure
    width=1000,  # Width of the figure
    template='plotly_dark'  # Using a dark theme that inverts the typical color scheme
)
fig.update_layout(
    title={
        'y': 0.9,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top'
    },
    hovermode='closest',
    # A continuous color scale is shown as a colorbar rather than a legend
    coloraxis_colorbar=dict(title='Rating Value')
)
fig.show()
Observing Lattafa's presence and performance within the fragrance market reveals several notable points about the brand's journey and its reception among consumers. Lattafa Perfumes, originally rooted in offering fragrances inspired by the rich traditions of the Middle East, has increasingly captured the attention of a global audience, as evidenced by its representation in fragrance databases and consumer discussions.
Rising Popularity¶
Lattafa's growth trajectory is highlighted by the increasing number of its fragrances that have garnered attention over time. The scatter plot places each fragrance by release year and rating value, with point size showing the rating count and color showing the rating value. It illustrates an upward trend in the number of ranked releases along with growing consumer engagement. Larger points in recent years mark fragrances with many ratings, and those that also sit high on the rating axis combine significant user interaction with a positive reception.
Quality and Engagement¶
The visual representation of Lattafa's fragrances, differentiated by rating value and count, showcases a brand that consistently delivers quality as perceived by consumers, with several fragrances achieving high ratings. The variation in point sizes across different releases also reflects the diverse levels of engagement among the audience, with certain scents sparking more discussions and feedback. This diversity suggests Lattafa's ability to cater to a wide range of preferences and olfactory tastes, contributing to its rising profile in the global fragrance community.
Market Position and Consumer Perception¶
Comparing Lattafa with iconic fashion brands within the top 100 list emphasizes its competitive stance in the industry. Even as a relatively newer entrant, Lattafa has managed to secure a place among established names, a testament to its growing influence and appeal.
Conclusion¶
Lattafa's increasing presence in discussions and top lists reflects a brand on the rise, poised to become a significant player in the global fragrance scene.
Lattafa's journey underscores the importance of market understanding in building a brand that resonates with consumers worldwide. As it continues to release fragrances that capture the imagination of users, its path forward will be interesting to watch for industry observers and fragrance enthusiasts alike.