French Bakery Sales Analysis
Using a dataset from Kaggle, PostgreSQL was used for EDA and various analyses to pull business insights and recommendations. Visualization was done in Microsoft Excel. The dataset includes 234K records of transaction details from a French bakery during the period 1/1/2021 to 9/30/2022.
Snippet of the dataset below:
Table Details
column | data type | description |
---|---|---|
date | date | date of the order |
time | time | time of the order (military time) |
ticket number | text | ticket identifier for each transaction |
article | text | name of the product (in French) |
quantity | bigint | quantity sold |
unit_price | text | price per product (in Euros) |
During EDA, I also noticed the following:
There are 148 different items (article) and 1 item is labeled “.”
The earliest time recorded was 7:01 AM and the latest time recorded was 8:01 PM.
The highest quantity was 200 and the lowest quantity was -200. Negative quantities were assumed to be refunds. The average quantity per item ordered was 1.5
There are 136,451 transactions which made a total of 362,189 gross sales and -2,201 refunds.
There were arbitrary items at 0,00 Euros. These were assumed to be freebies, redemptions, or service recoveries. There weren’t too many of these (~30 count), so I skipped the analysis.
Each item does not always have a fixed price. Some are dynamic and some were changed throughout the year(s).
Pricing Analysis
I took random items and plotted their unit price over time to understand the pricing structure. Diver’s Sandwich was on the more expensive side, so it is using a secondary axis to see the trends on the same scale. It appeared that the Baguette and Croissant both saw price increases on two occasions: February 7, 2022 and June 29, 2022. On the other hand, the Diver’s Sandwich’s pricing was sporadic and unpredictable.
The following assumptions were made:
Price increases could be due to current events and macro trends such as rising inflation and supply chain issues.
Or regardless of the market condition, perhaps the store changed course and targeted higher prices for fewer items (see second chart below)
Diver’s Sandwich could be a custom item in which customers can choose to pay more for add-ons. It could also be that there are different variations of the sandwiches and the prices are based on the market price of the day.
WITH US_prices AS
(
SELECT *,
CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS US_price,
quantity * CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS revenue
FROM bakery_sales
)
SELECT
article,
date,
SUM(quantity) AS quantity,
SUM(revenue) AS revenue,
ROUND(SUM(revenue)/SUM(quantity),2) AS price
FROM US_prices
WHERE article IN ('BAGUETTE', 'CROISSANT', 'DIVERS SANDWICHS')
AND unit_price <> 0
GROUP BY 1, 2
ORDER BY article, date;
WITH US_prices AS
(
SELECT *,
CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS US_price,
quantity * CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS revenue
FROM bakery_sales
)
SELECT
article,
date,
SUM(quantity) AS quantity,
SUM(revenue) AS revenue,
ROUND(SUM(revenue)/SUM(quantity),2) AS price
FROM US_prices
WHERE article IN ('BAGUETTE', 'CROISSANT', 'DIVERS SANDWICHS')
AND unit_price <> 0
GROUP BY 1, 2
ORDER BY article, date;
WITH US_prices AS
(
SELECT *,
CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS US_price,
quantity * CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS revenue
FROM bakery_sales
)
SELECT
EXTRACT(YEAR FROM date) AS year,
EXTRACT(MONTH FROM date) AS month,
COUNT(ticket_number) AS gross_trans,
SUM(revenue) AS gross_revenue
FROM US_prices
WHERE revenue > 0
GROUP BY 1, 2
ORDER BY 1, 2;
WITH US_prices AS
(
SELECT *,
CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS US_price,
quantity * CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS revenue
FROM bakery_sales
)
SELECT
EXTRACT(YEAR FROM date) AS year,
EXTRACT(MONTH FROM date) AS month,
COUNT(ticket_number) AS gross_trans,
SUM(revenue) AS gross_revenue
FROM US_prices
WHERE revenue > 0
GROUP BY 1, 2
ORDER BY 1, 2;
Demand Analysis
The second chart above (“Sales and Price Trends”) gave another great insight - seasonal trends. In both years, July and August saw the highest peak in the number of sales while January and February saw the lowest. For a deeper dive, I broke it down by “day of week” and by “time of day”.
Day of week (1st chart below): Aggregated results by day of week showed that the weekend (Saturday, Sunday) had the highest number of transactions while midweek (Wednesday) saw the lowest.
Time of Day (2nd chart below): Aggregated results by time of day showed that 10:15AM to 12:30PM was the peak time for transactions (>700). While this may be generally true, I needed verify that this was the case for each day of week.
Time of Day by Day of Week (3rd chart below): Each day of week generally followed the trends in the 2nd chart. Each day had its varying peaks during 10AM to 12PM while the second peak from 4PM to 6PM was consistent for all days — except for Sunday as it appears that the store closed early at 2PM.
SELECT
CASE
WHEN dow = 1 THEN 'Monday'
WHEN dow = 2 THEN 'Tuesday'
WHEN dow = 3 THEN 'Wednesday'
WHEN dow = 4 THEN 'Thursday'
WHEN dow = 5 THEN 'Friday'
WHEN dow = 6 THEN 'Saturday'
ELSE 'Sunday'
END AS weekday, gross_sales
FROM
(
SELECT
EXTRACT(ISODOW FROM date) AS dow,
COUNT(ticket_number) AS gross_trans
FROM bakery_sales
WHERE quantity > 0
GROUP BY 1
) dow_sales;
SELECT
CASE
WHEN dow = 1 THEN 'Monday'
WHEN dow = 2 THEN 'Tuesday'
WHEN dow = 3 THEN 'Wednesday'
WHEN dow = 4 THEN 'Thursday'
WHEN dow = 5 THEN 'Friday'
WHEN dow = 6 THEN 'Saturday'
ELSE 'Sunday'
END AS weekday, gross_sales
FROM
(
SELECT
EXTRACT(ISODOW FROM date) AS dow,
COUNT(ticket_number) AS gross_trans
FROM bakery_sales
WHERE quantity > 0
GROUP BY 1
) dow_sales;
SELECT
DISTINCT(time),
COUNT(ticket_number) OVER (PARTITION BY time) AS gross_trans
FROM bakery_sales
WHERE quantity > 0
ORDER BY time;
SELECT
DISTINCT(time),
COUNT(ticket_number) OVER (PARTITION BY time) AS gross_trans
FROM bakery_sales
WHERE quantity > 0
ORDER BY time;
WITH dow_sales AS
(
SELECT *,
CASE
WHEN EXTRACT(ISODOW FROM date) = 1 THEN 'Monday'
WHEN EXTRACT(ISODOW FROM date) = 2 THEN 'Tuesday'
WHEN EXTRACT(ISODOW FROM date) = 3 THEN 'Wednesday'
WHEN EXTRACT(ISODOW FROM date) = 4 THEN 'Thursday'
WHEN EXTRACT(ISODOW FROM date) = 5 THEN 'Friday'
WHEN EXTRACT(ISODOW FROM date) = 6 THEN 'Saturday'
ELSE 'Sunday'
END AS weekday
FROM bakery_sales
)
SELECT
weekday,
time,
COUNT(ticket_number) AS gross_trans
FROM dow_sales
GROUP BY 1, 2
ORDER BY 1, 2;
WITH dow_sales AS
(
SELECT *,
CASE
WHEN EXTRACT(ISODOW FROM date) = 1 THEN 'Monday'
WHEN EXTRACT(ISODOW FROM date) = 2 THEN 'Tuesday'
WHEN EXTRACT(ISODOW FROM date) = 3 THEN 'Wednesday'
WHEN EXTRACT(ISODOW FROM date) = 4 THEN 'Thursday'
WHEN EXTRACT(ISODOW FROM date) = 5 THEN 'Friday'
WHEN EXTRACT(ISODOW FROM date) = 6 THEN 'Saturday'
ELSE 'Sunday'
END AS weekday
FROM bakery_sales
)
SELECT
weekday,
time,
COUNT(ticket_number) AS gross_trans
FROM dow_sales
GROUP BY 1, 2
ORDER BY 1, 2;
Product Analysis
After analyzing the pricing structure and demand trends, I wanted to look at the impact of products and identify which ones drove the business for this bakery. I started by pulling the top ten drivers by volume of business (for the entire given period).
In the 1st chart below, we see that the Traditional Baguette generated the highest number of sales (individual purchases) compared to the rest of the items. This translated to the highest revenues as well even though a Traditional Baguette’s average price was only $1.23 (2nd chart below). On the other hand, Coupe which is the top 4 most popular item falls short in revenues; this is not surprising given the unit price is only $0.15.
Finally, I noticed that the number of unique transactions was nearly half of its sales, which indicated that customers would generally purchase 2 Traditional Baguettes per order. The same goes for Croissant and Pain Au Chocolat. Generally, the rest are 1:1.
WITH US_prices AS
(
SELECT *,
CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS US_price,
quantity * CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS revenue
FROM bakery_sales
)
SELECT
EXTRACT(YEAR FROM date) AS year,
EXTRACT(MONTH FROM date) AS month,
article,
COUNT(ticket_number) AS transactions,
SUM(quantity) AS sales,
SUM(revenue) AS revenue
FROM US_prices
WHERE US_price > 0 AND quantity > 0
GROUP BY 1, 2, 3;
WITH US_prices AS
(
SELECT *,
CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS US_price,
quantity * CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS revenue
FROM bakery_sales
)
SELECT
EXTRACT(YEAR FROM date) AS year,
EXTRACT(MONTH FROM date) AS month,
article,
COUNT(ticket_number) AS transactions,
SUM(quantity) AS sales,
SUM(revenue) AS revenue
FROM US_prices
WHERE US_price > 0 AND quantity > 0
GROUP BY 1, 2, 3;
Since the top items by revenue was a different list than the top items by sales, I created a different visualization to see the top 10 items by revenue in 2022 and their % growth from the previous year.
From the chart below, the store’s most popular item Traditional Baguette saw over 10% growth YoY. This is a good sign since this particular item drives significant business to the store. Almost every item saw positive growth, except for the declines from Banette and Baguette which were marginal. Additionally, Grand Far Breton saw the biggest decline of over -10%.
What stood out from this chart was Traiteur, which translates to “catering”. This item was only a top 9 revenue generator, but the % growth appeared significant since the previous year’s revenues were minimal.
Lastly, I wanted to see which item(s) were causing the refunds (negative quantities). Not surprisingly, the item that was purchased most frequently, Traditional Baguette, also had the highest number of refunds and revenue loss. Total refund was less than $1K which was minimal relative to gross sales revenues. An item like Coupe barely made a dent in revenues due to its low unit price. On the other hand, Formule Sandwich lost almost as much as Traditional Baguette did in revenues and with merely 1/6 of the number of refunds.
WITH US_prices AS
(
SELECT *,
CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS US_price,
quantity * CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS revenue
FROM bakery_sales
)
SELECT
article,
COUNT(ticket_number) AS transactions,
SUM(quantity) AS refunds,
SUM(revenue) AS revenue_loss
FROM US_prices
WHERE quantity < 0
GROUP BY article;
WITH US_prices AS
(
SELECT *,
CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS US_price,
quantity * CAST(REPLACE(REVERSE(SUBSTRING(REVERSE(unit_price),3,6)),',', '.') AS NUMERIC) AS revenue
FROM bakery_sales
)
SELECT
article,
COUNT(ticket_number) AS transactions,
SUM(quantity) AS refunds,
SUM(revenue) AS revenue_loss
FROM US_prices
WHERE quantity < 0
GROUP BY article;
Recommendations
Consider dynamic pricing
This store is run by a consistent pattern of seasonal months, day of week, and time of day. To make up for the slow demand during winter months, mid-week, or late afternoon, consider running premium prices during peak times to optimize overall profit.
Run promotions
Most customers buy 1 to 1.5 of the same item in one transaction. By running something like Buy 2 Get 1 or BOGO 50% Off, customers will be inclined to add that second or third item to their order. Promotions could be selective on less popular items, during slow hours, or right before closing.
Refunds
Refunds should be the last resort for service recovery, especially since these items are perishable and not resellable. Understand what is causing the refund; for example, quality, freshness, taste, or not meeting expectations. If the first two, then revisit operations or put out a non-refundable promo rate on the older items. If the last two, consider putting out a tester plate. Instead of ending up with hundreds of refunds, you can forego 1 or 2 items to use as testers.
Marketing
Catering had seen notable success in 2022. Try promoting the store to nearby companies or use cost-effective methods like social media or website to promote catering services.