Version: 3.13

Markov model with additional features

A classical Markov attribution model uses only the sequence of channels and ignores any other information available about each touchpoint. In this article, we present an algorithm that performs transaction-level attribution using an ensemble of Markov models, estimating one model per available feature.

The algorithm

Data

Suppose you have the following data:

id_path	channel_pos	channel	region	segment	seconds_to_last_touch	position
0	1	C	v05	w08	1532	first
0	2	C	v02	w11	919	middle
0	3	C	v05	w03	317	middle
0	4	C	v03	w09	0	last
0	5	((CONV))	NA	NA	NA	middle
1	1	C	v01	w13	582	first
1	2	C	v06	w11	0	last
4	1	C	v01	w04	0	first_last
4	2	((CONV))	NA	NA	NA	middle
...	...	...	...	...	...	...

id_path: Identifier of the customer journey.

channel_pos: Indicates the position of the channel within the customer journey.

channel: The marketing channel, where "((CONV))" denotes a conversion event.

region, segment, position: Categorical variables providing additional journey attributes.

seconds_to_last_touch: A numeric variable indicating the number of seconds from a given channel to the last touchpoint.

Discretization

The first step of the algorithm is to discretize the numeric variables. In our example, only seconds_to_last_touch needs to be discretized:

id_path	channel_pos	channel	region	segment	seconds_to_last_touch	position
0	1	C	v05	w08	bin_10	first
0	2	C	v02	w11	bin_8	middle
0	3	C	v05	w03	bin_5	middle
0	4	C	v03	w09	bin_1	last
0	5	((CONV))	NA	NA	NA	middle
1	1	C	v01	w13	bin_5	first
1	2	C	v06	w11	bin_1	last
4	1	C	v01	w04	bin_1	first_last
4	2	((CONV))	NA	NA	NA	middle
...	...	...	...	...	...	...
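The article does not say which binning rule is used, so the following is only a minimal sketch that assumes equal-width bins. The function name `discretize` and the `n_bins` parameter are hypothetical. Because the bin edges here come from the nine example rows alone, the labels will not match the table above, which was presumably binned on the full dataset.

```python
def discretize(values, n_bins=10):
    """Map numeric values to labels bin_1..bin_n using equal-width bins.

    Missing values (None) are passed through unchanged.
    Assumption: equal-width binning; the article does not specify the rule.
    """
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    # Fall back to a width of 1.0 when all values are identical.
    width = (hi - lo) / n_bins or 1.0
    labels = []
    for v in values:
        if v is None:
            labels.append(None)
            continue
        # Index of the bin containing v; the maximum falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin_{idx + 1}")
    return labels


# seconds_to_last_touch column from the example table (None stands for NA).
seconds = [1532, 919, 317, 0, None, 582, 0, 0, None]
labels = discretize(seconds)
print(labels)
```

A quantile-based rule would be an equally reasonable choice when the variable is heavily skewed, as time gaps often are.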

Artificial channel

For each feature, we create an artificial channel by combining channel with that feature. For example, using region, we get:

id_path	channel_pos	channel_region
0	1	C_v05
0	2	C_v02
0	3	C_v05
0	4	C_v03
0	5	((CONV))
1	1	C_v01
1	2	C_v06
4	1	C_v01
4	2	((CONV))
...	...	...

Collapsing each journey into a single path, and counting its conversions and non-conversions (nulls), gives:

path	total_conversions	total_nulls
C_v05 > C_v02 > C_v05 > C_v03	1	0
C_v01 > C_v06	0	1
C_v01	1	0
...	...	...
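The two steps above (building the artificial channel and collapsing each journey into a path) can be sketched in plain Python as follows. The function name `build_paths` and the row layout are assumptions made for illustration; only the rows shown in the example tables are used.

```python
from collections import defaultdict

CONV = "((CONV))"

# Rows: (id_path, channel_pos, channel, region), copied from the example table.
rows = [
    (0, 1, "C", "v05"), (0, 2, "C", "v02"), (0, 3, "C", "v05"), (0, 4, "C", "v03"),
    (0, 5, CONV, None),
    (1, 1, "C", "v01"), (1, 2, "C", "v06"),
    (4, 1, "C", "v01"), (4, 2, CONV, None),
]


def build_paths(rows):
    """Return {path_string: [total_conversions, total_nulls]}."""
    journeys = defaultdict(list)
    for id_path, _pos, channel, feature in sorted(rows, key=lambda r: (r[0], r[1])):
        # The conversion marker is kept as-is; other touchpoints become channel_feature.
        journeys[id_path].append(channel if channel == CONV else f"{channel}_{feature}")

    totals = defaultdict(lambda: [0, 0])
    for steps in journeys.values():
        converted = steps[-1] == CONV
        path = " > ".join(s for s in steps if s != CONV)
        # Index 0 counts conversions, index 1 counts non-conversions (nulls).
        totals[path][0 if converted else 1] += 1
    return dict(totals)


paths = build_paths(rows)
for path, (conversions, nulls) in paths.items():
    print(path, conversions, nulls)
```

Repeating the same call with `segment`, `position`, or the discretized `seconds_to_last_touch` in place of `region` yields the input for the other models of the ensemble.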

Using this data, we fit a classical Markov model and calculate the odds for each channel. For example, the odds for $C\_v05$ are:

$\text{Odds}(C\_v05)=\frac{\text{Prob}(C\_v05)}{1 - \text{Prob}(C\_v05)}$

where $\text{Prob}(C\_v05)$ is the conversion probability of $C\_v05$, estimated from the transition matrix of the Markov model. Odds capture the effectiveness of each channel in driving conversions.
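As a small illustration of the odds formula, the sketch below converts conversion probabilities into odds. The probabilities are hypothetical values chosen for the example, not estimates from a fitted model, and the helper name `odds` is an assumption.

```python
def odds(prob):
    """Convert a conversion probability in [0, 1) into odds."""
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must be in [0, 1)")
    return prob / (1.0 - prob)


# Hypothetical conversion probabilities, for illustration only.
conversion_prob = {"C_v05": 0.2, "C_v02": 0.5}
channel_odds = {channel: odds(p) for channel, p in conversion_prob.items()}
print(channel_odds)
```

Odds grow without bound as the probability approaches 1, so channels with a high conversion probability receive disproportionately more weight than they would on the probability scale.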

These odds are then used as path-level weights to compute the transaction-level attribution:

id_path	channel_pos	attribution_channel_region
0	1	0.15
0	2	0.40
0	3	0.15
0	4	0.30
0	5	((CONV))
1	1	0.20
1	2	0.40
4	1	0.20
4	2	((CONV))
...	...	...

If you want to know more about Odds, please refer to this link

We then evaluate the predictive performance of the model by computing the area under the precision-recall curve (AUC-PR). We prefer the precision-recall curve to the classical ROC curve because datasets in this type of problem are usually imbalanced, with far more non-converting paths than converting ones.
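One common estimator of the area under the precision-recall curve is average precision, sketched below in plain Python. This is an assumption about how the area could be computed, not necessarily the estimator used by the algorithm; the function name `average_precision` and the sample labels and scores are hypothetical. Tied scores are not treated specially.

```python
def average_precision(y_true, scores):
    """Average precision: a step-wise area under the precision-recall curve.

    y_true: 1 for converting paths, 0 for non-converting paths.
    scores: predicted conversion scores (higher means more likely to convert).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one converting path is required")
    # Rank paths from the highest to the lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Recall rises by 1/n_pos at each positive; weight it by the precision at this rank.
            area += (true_pos / rank) / n_pos
    return area


# Hypothetical labels and scores, for illustration only.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(y_true, scores)
print(ap)
```

A model that ranks every converting path above every non-converting one reaches the maximum value of 1.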

Final attribution

We estimate a different Markov model for each combination of (channel × feature) and collect transaction-level attributions for each, along with the model's predictive performance. The final transaction-level attribution is the weighted mean of all the collected attributions, where the weights are a transformation of the predictive performances.

The transformation we use is:

$weight(X) = \begin{cases} \frac{AUC\_PRE\_REC(X)/CR}{1/CR} & \text{if } AUC\_PRE\_REC(X) > CR \\ 0 & \text{otherwise} \end{cases}$

where $X$ represents a generic feature, $AUC\_PRE\_REC(X)$ is the area under the precision-recall curve for feature $X$, and $CR$ is the dataset's conversion rate.

$weight(X)$ lies between 0 and 1 because $AUC\_PRE\_REC(X)$ cannot exceed 1.

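When $AUC\_PRE\_REC(X) \le CR$, then $weight(X)=0$: the model built on that feature ranks paths no better than the conversion-rate baseline (the AUC-PR expected from a random classifier), so the feature is treated as having no significant influence on the target variable.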
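The weighting and the final weighted mean can be sketched as follows. The function names, the AUC-PR values, the conversion rate, and the per-feature attributions are all hypothetical and serve only to show the arithmetic; the weight is implemented exactly as the transformation is written above.

```python
def feature_weight(auc_pr, conversion_rate):
    """Weight of a feature's model, following the transformation above."""
    if auc_pr <= conversion_rate:
        return 0.0
    return (auc_pr / conversion_rate) / (1.0 / conversion_rate)


def final_attribution(attributions, weights):
    """Weighted mean of the per-feature attributions of one touchpoint.

    attributions, weights: dicts keyed by feature name.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no feature performs better than the conversion rate")
    return sum(attributions[f] * weights[f] for f in weights) / total


# Hypothetical conversion rate and AUC-PR values, for illustration only.
conversion_rate = 0.1
auc_pr = {"region": 0.4, "segment": 0.2, "position": 0.05}
weights = {f: feature_weight(a, conversion_rate) for f, a in auc_pr.items()}

# Hypothetical attributions of a single touchpoint under each feature's model.
touch = {"region": 0.15, "segment": 0.30, "position": 0.50}
result = final_attribution(touch, weights)
print(weights, result)
```

In this example `position` scores below the conversion rate, so its model receives zero weight and its attribution does not affect the final value.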
