Starbucks’ Capstone Challenge

ayman Abd Elsattar metwally
May 16, 2021

Who is engaged by coffee offers?

This is the capstone project of the Data Scientist course at Udacity. The data set contains simulated data that mimics customer behaviour on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. A customer who receives an offer is rewarded when his or her accumulated spending reaches the offer's threshold. The data set includes three files:

  • portfolio, which records the offer types
  • profile, which holds the customer profiles
  • transcript, which records when a person received an offer, viewed it, completed it, or made a purchase.

The goal is to figure out how customers respond to these offers and what kind of person is most responsive to each type of offer.

Problem Statement / Metrics

The problem that I chose to solve is to build a model that predicts whether a customer will respond to an offer. My strategy for solving this problem has the following steps.

1- Clean and combine the offer portfolio, customer profile, and transaction data. Each row of this combined dataset will describe an offer’s attributes, customer demographic data, and whether the offer was successful.

2- Apply different machine learning models and assess their accuracy and F1-score against a baseline. The ‘naive model’ assumes all offers were successful. This provides me with a baseline for evaluating the performance of the models that I construct. Accuracy measures how well a model correctly predicts whether an offer is successful.

However, if the percentage of successful or unsuccessful offers is very low, accuracy is not a good measure of model performance. In that situation, evaluating a model’s precision and recall provides better insight into its performance. Hence, I chose the F1-score metric because it is “a weighted average of the precision and recall metrics” (more precisely, their harmonic mean).

3- Compare the performance of logistic regression, random forest, and gradient boosting models. Finally, refine the parameters of the model that has the highest accuracy and F1-score.
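To make the baseline concrete, here is a minimal sketch of how the naive model's accuracy and F1-score could be computed. The labels are toy values invented for illustration, and the helper names `classification_metrics` and `naive_predictions` are my own, not part of the project code.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = offer successful)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def naive_predictions(n):
    """The naive model: predict that every offer is successful."""
    return [1] * n


# Toy labels, invented for illustration only (not taken from the data set).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
baseline = classification_metrics(y_true, naive_predictions(len(y_true)))
print(baseline)
```

Because the naive model never predicts a failure, its recall is always 1, so any trained model has to beat it mainly on precision and accuracy.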

Data Exploration and Cleaning

There are three types of offer, BOGO (buy one, get one free), discount and informational, sent to customers through mobile, email, social and web channels. Each offer has a different difficulty and reward. For example, a user could receive a discount offer of 2 dollars off a 10-dollar purchase. If the customer then accumulates at least 10 dollars in purchases, the offer is completed and the reward is given. The informational offer is slightly different: it has no reward, so the transcript file contains no offer-completed records for it.

Data Cleaning

  1. Extract the values from the values dictionary in the transcript file.
  2. Replace the ids, including customer ids and offer ids, with more easily recognizable ids.
  3. Separate the transcript file into four tables, one for each of the four event types.
  4. Convert the numerical age and became_member_on data in the profile file to categorical data.
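The four cleaning steps can be sketched roughly as follows. This is only an illustration on toy rows: the event names, the keys inside the nested dictionary, the short id format and the ten-year age bins are all assumptions of mine, not details confirmed by the project code.

```python
from collections import defaultdict

# Toy transcript rows; event names and dictionary keys are assumptions.
transcript = [
    {"person": "a1f", "event": "offer received", "time": 0, "value": {"offer id": "x9"}},
    {"person": "a1f", "event": "offer viewed", "time": 6, "value": {"offer id": "x9"}},
    {"person": "a1f", "event": "transaction", "time": 12, "value": {"amount": 11.5}},
    {"person": "a1f", "event": "offer completed", "time": 12,
     "value": {"offer_id": "x9", "reward": 2}},
]


def flatten_value(row):
    """Step 1: lift the entries of the nested dictionary into the row itself."""
    flat = {key: val for key, val in row.items() if key != "value"}
    for key, val in row["value"].items():
        flat[key.replace(" ", "_")] = val  # treat "offer id" and "offer_id" alike
    return flat


def make_id_mapper(prefix):
    """Step 2: map long raw ids to short readable ones such as person_1."""
    mapping = {}

    def mapper(raw_id):
        if raw_id not in mapping:
            mapping[raw_id] = f"{prefix}_{len(mapping) + 1}"
        return mapping[raw_id]

    return mapper


def split_by_event(rows):
    """Step 3: build one table (list of rows) per event type."""
    tables = defaultdict(list)
    for row in rows:
        tables[row["event"]].append(row)
    return dict(tables)


def age_group(age):
    """Step 4: turn a numerical age into a decade label (bin width is an assumption)."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


person_id = make_id_mapper("person")
offer_id = make_id_mapper("offer")

rows = []
for raw in transcript:
    row = flatten_value(raw)
    row["person"] = person_id(row["person"])
    if "offer_id" in row:
        row["offer_id"] = offer_id(row["offer_id"])
    rows.append(row)

tables = split_by_event(rows)
print(sorted(tables))
print(age_group(37))
```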

As we know, when the amount of accumulated purchases reaches the threshold, an offer-completed event is recorded. However, we can’t know whether the customer made the purchases because of the offer or would have made them anyway. What we can do is set criteria to judge who responds to an offer.

Data pre-processing

A customer who received an offer, viewed it, and spent enough to reach the threshold in time should be counted as a person who responds to the offer.

I merge the offer_received, offer_viewed, and offer_completed tables, which were separated from the transcript, on person and offer ID using an inner join, and name the new table response. I then add the duration variable from the portfolio table by offer ID to check that every record satisfies the criterion.

Identifying the people who respond to informational offers works slightly differently. Because there are no offer-completed records, another criterion was chosen: a person who received and viewed an informational offer, and made more than zero purchases between the time the offer was received and that time plus the offer's duration, is inferred to be responsive to the offer.
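The two response criteria can be written down as small predicates. This is a sketch under stated assumptions: I assume event times are measured in hours and durations in days, that the events must occur in the order received, viewed, completed, and that the view itself must fall inside the offer window. The function names are hypothetical.

```python
HOURS_PER_DAY = 24  # assumption: event times are in hours, offer duration in days


def responded_to_reward_offer(received, viewed, completed, duration_days):
    """BOGO/discount criterion: received, viewed and completed before the offer expires.

    `viewed` and `completed` are None when the event never happened.
    """
    if viewed is None or completed is None:
        return False
    deadline = received + duration_days * HOURS_PER_DAY
    return received <= viewed <= completed <= deadline


def responded_to_informational(received, viewed, purchase_times, duration_days):
    """Informational criterion: viewed, with at least one purchase in the offer window."""
    if viewed is None:
        return False
    deadline = received + duration_days * HOURS_PER_DAY
    if not (received <= viewed <= deadline):
        return False
    purchases_in_window = sum(1 for t in purchase_times if received <= t <= deadline)
    return purchases_in_window > 0


# Toy examples, invented for illustration.
print(responded_to_reward_offer(received=0, viewed=6, completed=12, duration_days=7))
print(responded_to_informational(received=0, viewed=6, purchase_times=[100], duration_days=3))
```

Whether a purchase made before the offer was viewed should count is a design choice the criterion above leaves open; this sketch counts every purchase inside the window.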

Exploratory Data Analysis

1,973 records match the criteria. The overall response rate is 2.59%. Discount offers have the highest response rate at 3.18%, informational offers have the lowest at 1.91%, and the rate for BOGO is 2.32%.

Customers treat these offers similarly at each age level, except for young people. They like discounts, and this is especially true of older customers.

Three-year members are more likely to be influenced by discounts and BOGO offers, perhaps because they are more familiar with the promotion rules. Strangely, the earliest members don’t seem to care much about these offers, possibly because of the small sample size. They are, however, more likely to be influenced by informational offers.

A person with a higher income tends to shop more when receiving promotional offers, although the richest customers don’t care about promotions. Why is a lower-income person less influenced by promotions from Starbucks? One reasonable explanation is that such customers can’t afford to shop at Starbucks frequently: only 13% of customers have an income lower than 40k per year.

Q2: How much will someone spend based on demographics and offer types?

Income should have a strong positive correlation with consumption. Indeed, the correlation between income and the average purchase amount is 0.80 in this data set. We can therefore easily conclude that someone with a high income will, on average, spend more at Starbucks.
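For reference, a correlation like the one quoted above can be computed with the Pearson formula. The sketch below assumes Pearson correlation is the measure meant here, which the text does not state, and uses invented toy numbers rather than the project data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Toy values, invented for illustration: income (in thousands) and average purchase.
income = [35, 48, 60, 75, 90]
avg_purchase = [6.0, 9.5, 11.0, 17.5, 21.0]
print(round(pearson_r(income, avg_purchase), 2))
```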

However, that result is not what I want to know. I care more about whether other factors also contribute positively to consumption at Starbucks.

Data pre-processing

The endogenous variable is the aggregated purchase amount, and the exogenous variables are the demographic and offer-type factors.

The profile file provides the customers and their demographic factors. The offer-type factors come from the response table, so an offer type is attributed only to a person who responded to it. The aggregated purchase amount is derived from the transaction table by computing a subtotal per person.

The other variables are categorical and will be transformed into dummy variables.

Data processing

  1. Merge the received_offer table into the profile table on person id with a left join to identify the type of offer that each customer received.
  2. Eliminate the ids that did not receive any offers.
  3. Calculate the total transaction amount for each person, then left-join it to the profile table on person id.
  4. Convert the categorical data to dummy variables.
  5. Split the data into a train set and a test set.
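Steps 3 to 5 could look roughly like the sketch below. It works on toy rows invented for illustration; the column names, the helper functions, the 25% test fraction and the fixed seed are all assumptions of mine rather than the settings used in the project.

```python
import random
from collections import defaultdict


def total_spend_per_person(transactions):
    """Step 3: sum the transaction amounts for each person."""
    totals = defaultdict(float)
    for person, amount in transactions:
        totals[person] += amount
    return dict(totals)


def one_hot(rows, column):
    """Step 4: replace a categorical column with 0/1 dummy columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {key: val for key, val in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Step 5: shuffle reproducibly, then cut off the test share."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data, invented for illustration.
profiles = [
    {"person": "p1", "gender": "F", "offer_type": "bogo"},
    {"person": "p2", "gender": "M", "offer_type": "discount"},
    {"person": "p3", "gender": "F", "offer_type": "informational"},
    {"person": "p4", "gender": "M", "offer_type": "bogo"},
]
transactions = [("p1", 10.0), ("p1", 5.5), ("p2", 3.0), ("p4", 7.25)]

totals = total_spend_per_person(transactions)
# Left join: people without transactions keep their row with a total of 0.
table = [dict(row, total_amount=totals.get(row["person"], 0.0)) for row in profiles]
for column in ("gender", "offer_type"):
    table = one_hot(table, column)

train, test = split_train_test(table)
print(len(train), len(test))
```

For a linear model it is common to drop one dummy column per category to avoid perfectly collinear features; the sketch keeps all of them for readability.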

Metrics and Model Selection

Supervised machine learning models are used to answer this question. Since the aggregated purchase amount is continuous, three regression models are chosen as candidates: a multifactor linear regression model, random forest regression and support vector regression.

Three metrics are considered:

* R square, which measures how much of the variance in the actual values is explained by the predicted values.

* Overfitting, the difference in R square between the train set and the test set.

* Error distribution for the linear regression model.
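The first two metrics are simple enough to write out by hand. The sketch below uses the standard definition of R square (one minus the ratio of residual to total sum of squares); the function names and the example scores are hypothetical and not taken from the project.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R square is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


def overfitting_gap(r2_train, r2_test):
    """Difference in R square between train and test; a large positive gap hints at overfitting."""
    return r2_train - r2_test


# Toy values, invented for illustration.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(round(r_squared(actual, predicted), 3))
print(overfitting_gap(0.5, 0.25))
```

A model that always predicts the mean of the actual values scores exactly 0, which is a useful reference point when R square is low.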

Results

Our predictive model performs well in that it has a low chance of missing an individual who would respond. Starbucks would not want to miss sending offers to individuals who would respond, and with this model it would rarely do so, so overall business revenue would not be affected. Starbucks would also not mind sending offers to a few individuals who would not respond, as long as it covers most of the individuals who would. Therefore, our predictive model is well suited to this case.

Improvement

Because the purchase amount and income are highly positively skewed, it would be better to transform them so that they are closer to a normal distribution.

Moreover, outliers ought to be eliminated, and the data set should be scaled.

After these changes, the R square values are slightly better than those of the original models, especially on the test set, which implies less overfitting, although under-fitting remains a challenge. Furthermore, the error distribution of the multifactor linear regression model is closer to a normal distribution, with a skewness of 0.3 and a kurtosis of 2.8. Because the metrics are similar and it is easy to interpret, the linear model is selected for analysis.
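To illustrate the idea, here is a small sketch of how skewness and kurtosis can be measured and how a log transform pulls a right-skewed variable towards symmetry. The choice of `log1p` as the transform is my assumption, since the text does not name one, and the kurtosis here uses the convention in which a normal distribution scores 3, which I assume is the one behind the 2.8 quoted above. The numbers are toy values.

```python
from math import log1p


def central_moment(values, k):
    """k-th central moment (population version)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Moment-based skewness; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Non-excess (Pearson) kurtosis; about 3 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


# Toy right-skewed amounts, invented for illustration.
amounts = [1, 3, 7, 15, 31, 63, 127]
logged = [log1p(a) for a in amounts]  # log(1 + x) also copes with zero amounts

print(round(skewness(amounts), 2), round(skewness(logged), 2))
print(round(kurtosis(logged), 2))
```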

The aspect of this project that I liked most was how different data sets (offer data, customer demographic data, and transcript data) were combined to gain insights through predictive modelling and analysis, supporting better business decisions. The toughest part of the analysis was finding the logic and strategy to build a combined dataset based on the period during which each offer was active for a customer.

Parameters analysis

Although R square isn’t good enough, the parameters that the model derives for the factors present more valuable information.

  • The factors of income, gender, became_member_on and offer type are all significantly different from zero, whereas the age factors are not.
  • All offer-type factors make positive contributions, which means that advertising can help raise revenue for Starbucks.
  • The positive parameter value implies that females are likely to spend more than males.

Looking further into the demographic factors:

  • A person older than 30 and younger than 80 is more likely to buy at Starbucks. That makes sense because people in this range have an income; beyond that, age doesn’t matter.
  • As in the visual analysis above, customers who became members in 2015 contributed more consumption.

The offer-type factors tell us:

  • BOGO and discount offers stimulate consumption more than informational offers.
  • A longer duration records more purchases. That raises the question of whether the criteria used to judge which customers respond to these offers are too strict, but I don’t want to dig deeper into that question here.
  • A higher reward attracts more consumption.

Conclusion

The machine learning models suggest that advertising can really help increase revenue, but not necessarily profit. Informational offers have a smaller effect than BOGO and discount offers. Moreover, if Starbucks intends to attract more consumers, promotion costs might offset the extra income when there aren’t enough customer responses.

The visual analysis, however, shows a low response rate. Combining the conclusions of the visual analysis and the machine learning model suggests that advertising to higher-income females who have been members for three years seems a good idea, since they are the most likely to spend more at Starbucks.

As Prof. Andrew Ng put it: “Coming up with features is difficult, time-consuming, and requires expert knowledge. ‘Applied machine learning’ is feature engineering.” Therefore, more feature engineering could be performed on the offer, customer demographic and transaction data to obtain a better model.

1. We can also improve this project by taking up another problem statement: determining how much a customer could spend, based on the offer data and demographic data, using supervised regression algorithms. This in turn would help determine whether the customer would respond, as the ‘total amount’ a customer could spend is the top feature in the best-trained classifier model.

2. We can also improve the project by turning our nearly balanced (slightly imbalanced) dataset into a perfectly balanced one using the techniques in [8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset](https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/). This could further improve the performance of the classification model.

3. We can also perform clustering (behavioural, product-based, or brand-based) to segment customers into groups based on several variables at once. With it, we can target specific demographics and personas for different goals.

Starbucks Capstone Challenge GitHub link: https://github.com/ayman metwally2020/Starbucks_Project.git

Thank you

ayman Abd Elsattar metwally

Accomplished Data Scientist researcher with a passion for delivering valuable data through analytical functions(Telecommunication and Supply Chain Analytics)