Modeling the marketplace using the RStan ecosystem.
Author’s note: The data in this post is 100% simulated; see the code here.
Customers come to us from different places, at different times, for different reasons. We do not control the market; however, it is incumbent on us to know our market to the fullest extent possible. This means knowing our customers, our relationship with them over time, and our broader reach in the community. The purpose of this project is to measure our marketing impact.
The outcome of interest, whether a customer chooses to work with us, exists at the appointment level; that is, did a particular sales call result in a closed deal?
The variables and values included in this example are:
Note: I’ll use the terms “feature,” “variable,” “indicator” and “coefficient” to refer to the above items.
The magnitude of the effect of each feature on the probability of a closed sale can be measured, and the variance partitioned, by using a regression framework.
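For a sense of what that framework looks like before any of the refinements described below, here is a minimal single-level sketch, assuming the close, product, and call_to_estimate variables used later in this post:

# A plain logistic regression: the probability of a closed sale as a
# function of two appointment-level features
baseline_glm <- glm(close ~ product + call_to_estimate,
                    data = dat,
                    family = binomial(link = "logit"))
summary(baseline_glm)  # coefficients are on the log-odds scale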
The problem with regression is that, as more variables are added to a model, fewer observations fall into each particular combination of categories. This data sparsity tends to lead to a problem in regression known as over-fitting.
For example, there may be 50 appointments that resulted from a particular marketing campaign. Within this campaign, however, there may only be 5 appointments attributed to a particular combination of features.
As more variables are added to the model, low sample sizes within subgroups, class imbalances, and an exponentially increasing number of subgroups lead to an increased likelihood that some feature effects are over-estimated.
In order to improve the estimates of the model coefficients, two additional modifications to the traditional logistic regression will be made.
First, the regression will be executed in a Bayesian framework. Weakly informative prior distributions will regularize the estimates to within a reasonable range. Priors prevent over-estimation by assuming that, for example, a marketing campaign would boost sales by 10-20% at most.
Second, where applicable, the variables will have a multi-level structure. A multi-level variable partially pools each group’s estimate toward the overall average. For example, if there were a strong positive response to a particular radio commercial (but only a few observations), the estimate of that response would be pulled back toward the average effect of a radio commercial.
This analysis will be carried out in Stan, an open-source probabilistic programming language that uses Hamiltonian Monte Carlo to draw samples from the joint posterior distribution.
The RStan ecosystem, which can be called from the R computing environment, includes the following packages:
What follows is an overview of the analysis, organized roughly by the packages above. The entire analysis, executed as an R Notebook, is available here.
The full model can be expressed in the following form:
\begin{align}
L_i &\sim \text{Binomial}(1, \text{logit}^{-1}[p_{i\ldots}]) \\
p_{i\ldots} &= \alpha_{i\ldots} + x\beta \\
\alpha_{i\ldots} &\sim N(0, \sigma_{i\ldots}) \\
\beta &\sim \text{Cauchy}(0, 2.5) \\
\sigma_{i\ldots} &\sim \text{Cauchy}(1, 25)
\end{align}
Where i indexes a particular appointment and … represents the customer, lead source, year, city, and sales professional multi-level variables. α represents the varying intercepts for each combination of the multi-level variables, and β represents the appointment-level coefficients for product, n_service, prior_revenue, and call_to_estimate.
As shown in the code below, rstanarm expresses multi-level models in a compact form using lme4 syntax.
library(rstanarm)

mlm_sim <- stan_glmer(
  # Varying intercepts for each multi-level grouping factor, plus
  # appointment-level predictors; "- 1" drops the global intercept
  formula = close ~ (1|customer_id) +
    (1|lead_source) +
    (1|year) +
    (1|city) +
    (1|sales_professional) +
    product +
    n_service +
    prior_revenue +
    call_to_estimate - 1,
  # Weakly informative, regularizing priors; autoscale = FALSE keeps them as specified
  prior = normal(location = 0, scale = 1, autoscale = FALSE),
  prior_intercept = cauchy(location = 0, scale = 2.5, autoscale = FALSE),
  prior_aux = cauchy(location = 0, scale = 25, autoscale = FALSE),
  data = dat,
  family = binomial,
  # Sampler settings: 4 chains run on 3 cores, 1000 iterations each with 150 warmup
  cores = 3,
  chains = 4,
  iter = 1000,
  warmup = 150,
  seed = 1)
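Once sampling finishes, the fit can be inspected directly; a quick sketch:

print(mlm_sim, digits = 2)  # posterior medians and MAD SDs for the coefficients
ranef(mlm_sim)              # point estimates of the varying intercepts for each grouping factor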
Regularizing prior distributions were chosen, and prior predictive distributions were constructed, by taking advantage of the generative nature of a Bayesian model.
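One way to inspect the prior predictive distribution in rstanarm is to refit the model while sampling from the priors only (prior_PD = TRUE). A minimal sketch, assuming the mlm_sim fit above:

# Sample from the priors only, ignoring the likelihood
prior_sim <- update(mlm_sim, prior_PD = TRUE)
# Simulate the outcomes implied by the priors alone
prior_draws <- posterior_predict(prior_sim)   # draws x observations matrix of 0/1 outcomes
hist(rowMeans(prior_draws),
     main = "Prior predictive close rate",
     xlab = "Proportion of appointments closed per draw")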
In order to verify the superior goodness-of-fit of the multi-level model, it was compared to a single-level Bayesian regression with identical priors.
The loo package uses approximate leave-one-out cross-validation to measure a model’s out-of-sample predictive accuracy: for each observation, it estimates how well the model would predict that observation had it been excluded from the fit, and it does this for every observation in the dataset.
Both models were compared on their expected log pointwise predictive density, and the multi-level model performed better.
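A sketch of that comparison, where slm_sim stands in for the single-level model fit with stan_glm() and the same priors:

library(loo)
# Approximate leave-one-out cross-validation for each model
loo_mlm <- loo(mlm_sim)
loo_slm <- loo(slm_sim)
# Rank the models by expected log pointwise predictive density (ELPD)
loo_compare(loo_mlm, loo_slm)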
For a number of reasons, a model can fail to give accurate results even when it runs, so the fit itself needs to be checked. How you program the model is part of the model.
The easiest way to look “under the hood” is ShinyStan, a Shiny application for exploring the output of a Stan model.
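Launching the app on the fitted model takes one line:

library(shinystan)
# Opens an interactive app with sampler diagnostics, posterior summaries, and plots
launch_shinystan(mlm_sim)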
This model passed all diagnostics without any issues and has been shared on my Shiny dashboard.
Bayesplot makes graphical model checking and inference easy by providing a common set of plotting functions for MCMC draws.
It is built on the ggplot2 graphics library, which allows the analyst to fine-tune a plot by adding text, labels, themes, and so on.
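A sketch of what this can look like for the model above; the names passed to pars are assumptions based on the formula and would need to match the parameter names in the fitted object:

library(bayesplot)
library(ggplot2)
color_scheme_set("blue")
# Posterior distributions of the appointment-level coefficients, with 90% intervals shaded
mcmc_areas(as.matrix(mlm_sim),
           pars = c("product", "n_service", "prior_revenue", "call_to_estimate"),
           prob = 0.9) +
  labs(title = "Appointment-level effects",
       subtitle = "Posterior distributions with 90% intervals")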
That’s a brief overview of the project. For more information, including the prior and posterior predictive distributions, see the full R notebook here.