Version: 3.13

Markov model with additional features

A classical Markov attribution model considers only the sequence of channels and ignores any additional information available about each touchpoint. In this article, we present an algorithm that performs transaction-level attribution using an ensemble of Markov models, estimating one model per available feature.

The algorithm

Data

Suppose you have the following data:

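| id_path | channel_pos | channel | region | segment | seconds_to_last_touch | position |
|---|---|---|---|---|---|---|
| 0 | 1 | C | v05 | w08 | 1532 | first |
| 0 | 2 | C | v02 | w11 | 919 | middle |
| 0 | 3 | C | v05 | w03 | 317 | middle |
| 0 | 4 | C | v03 | w09 | 0 | last |
| 0 | 5 | ((CONV)) | NA | NA | NA | middle |
| 1 | 1 | C | v01 | w13 | 582 | first |
| 1 | 2 | C | v06 | w11 | 0 | last |
| 4 | 1 | C | v01 | w04 | 0 | first_last |
| 4 | 2 | ((CONV)) | NA | NA | NA | middle |
| ... | ... | ... | ... | ... | ... | ... |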

id_path: Identifier of the customer journey (path).

channel_pos: Indicates the position of the channel within the customer journey.

channel: The marketing channel, where "((CONV))" denotes a conversion event.

region, segment, position: Categorical variables providing additional journey attributes.

seconds_to_last_touch: A numeric variable giving the number of seconds between a given touchpoint and the last touchpoint of the journey.

Discretization

The first step of the algorithm is to discretize the numeric variables.

In our example, seconds_to_last_touch is discretized into bins:

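| id_path | channel_pos | channel | region | segment | seconds_to_last_touch | position |
|---|---|---|---|---|---|---|
| 0 | 1 | C | v05 | w08 | bin_10 | first |
| 0 | 2 | C | v02 | w11 | bin_8 | middle |
| 0 | 3 | C | v05 | w03 | bin_5 | middle |
| 0 | 4 | C | v03 | w09 | bin_1 | last |
| 0 | 5 | ((CONV)) | NA | NA | NA | middle |
| 1 | 1 | C | v01 | w13 | bin_5 | first |
| 1 | 2 | C | v06 | w11 | bin_1 | last |
| 4 | 1 | C | v01 | w04 | bin_1 | first_last |
| 4 | 2 | ((CONV)) | NA | NA | NA | middle |
| ... | ... | ... | ... | ... | ... | ... |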
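The binning method is not specified here. As one possible approach, the sketch below assumes quantile-based binning: it takes cut points at evenly spaced empirical quantiles of the observed values and labels each value `bin_1` to `bin_n`, leaving NA values (the conversion rows, represented as `None`) untouched. The helper names `quantile_edges` and `discretize`, and the choice of quantile binning itself, are illustrative assumptions.

```python
import bisect
import math


def quantile_edges(values, n_bins):
    """Return n_bins - 1 cut points at evenly spaced empirical quantiles."""
    data = sorted(values)
    edges = []
    for k in range(1, n_bins):
        # nearest-rank index of the k / n_bins quantile
        idx = min(len(data) - 1, math.ceil(k * len(data) / n_bins) - 1)
        edges.append(data[idx])
    return edges


def discretize(values, n_bins=10):
    """Map numeric values to labels bin_1 ... bin_n; None (NA) stays None."""
    observed = [v for v in values if v is not None]
    if not observed:
        return [None] * len(values)
    edges = quantile_edges(observed, n_bins)
    labels = []
    for v in values:
        if v is None:
            labels.append(None)  # e.g. ((CONV)) rows carry no touchpoint data
        else:
            # bisect_left puts a value equal to a cut point in the lower bin
            labels.append(f"bin_{bisect.bisect_left(edges, v) + 1}")
    return labels
```

For instance, with `n_bins=4` the values 1532, 919, 317, 0, NA, 582, 0, 0, NA from the first table become bin_4, bin_3, bin_2, bin_1, NA, bin_3, bin_1, bin_1, NA. These labels need not match the table above, whose binning method and number of bins are not stated.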

Artificial channel

For each feature, we create an artificial channel by concatenating the channel with the value of that feature. For example, using region, we get:

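| id_path | channel_pos | channel_region |
|---|---|---|
| 0 | 1 | C_v05 |
| 0 | 2 | C_v02 |
| 0 | 3 | C_v05 |
| 0 | 4 | C_v03 |
| 0 | 5 | ((CONV)) |
| 1 | 1 | C_v01 |
| 1 | 2 | C_v06 |
| 4 | 1 | C_v01 |
| 4 | 2 | ((CONV)) |
| ... | ... | ... |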
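A minimal sketch of building the artificial channel, assuming each touchpoint is a dictionary with the column names used in the tables. The function name `add_artificial_channel` is hypothetical; conversion rows keep the `((CONV))` label because they have no feature value.

```python
CONV_LABEL = "((CONV))"


def add_artificial_channel(rows, feature):
    """Return copies of rows with a channel_<feature> column, e.g. C + v05 -> C_v05."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if row["channel"] == CONV_LABEL:
            # conversion events have no feature value and keep their label
            new_row[f"channel_{feature}"] = CONV_LABEL
        else:
            new_row[f"channel_{feature}"] = f"{row['channel']}_{row[feature]}"
        out.append(new_row)
    return out
```

Calling it once per feature (region, segment, the binned seconds_to_last_touch, position) yields one artificial-channel column per future Markov model.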

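Next, we transpose the data: the touchpoints of each journey are joined into a single path, and for each distinct path we count the journeys that end in a conversion (total_conversions) and those that do not (total_nulls):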

| path | total_conversions | total_nulls |
|---|---|---|
| C_v05 > C_v02 > C_v05 > C_v03 | 1 | 0 |
| C_v01 > C_v06 | 0 | 1 |
| C_v01 | 1 | 0 |
| ... | ... | ... |
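The transposition can be sketched as a group-by over `id_path`, assuming touchpoint dictionaries as in the tables and at most one `((CONV))` row per journey. The function name `transpose` is hypothetical.

```python
from collections import defaultdict

CONV_LABEL = "((CONV))"


def transpose(rows, channel_col):
    """Collapse touchpoint rows into {path: (total_conversions, total_nulls)}."""
    journeys = defaultdict(list)
    # order touchpoints within each journey by their position
    for row in sorted(rows, key=lambda r: (r["id_path"], r["channel_pos"])):
        journeys[row["id_path"]].append(row[channel_col])
    counts = defaultdict(lambda: [0, 0])  # path -> [conversions, nulls]
    for steps in journeys.values():
        converted = CONV_LABEL in steps
        path = " > ".join(s for s in steps if s != CONV_LABEL)
        counts[path][0 if converted else 1] += 1
    return {path: tuple(c) for path, c in counts.items()}
```

Applied to the channel_region rows shown earlier, it returns the three paths of the table with the same conversion and null counts.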

Using this data, we fit a classical Markov model and calculate the odds for each channel. For example, the odds for $C\_v05$ are:

$$\text{Odds}(C\_v05) = \frac{\text{Prob}(C\_v05)}{1 - \text{Prob}(C\_v05)}$$

where $\text{Prob}(C\_v05)$ is the conversion probability of $C\_v05$, estimated from the transition matrix of the Markov model. Odds capture the effectiveness of each channel in driving conversions.
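How the conversion probability is read off the transition matrix is not detailed here. The sketch below assumes it is the probability of eventually being absorbed in the conversion state when starting from that channel, a standard quantity for a first-order Markov chain. It estimates transition probabilities from the transposed path counts, solves for the absorption probabilities by fixed-point iteration, and turns them into odds. The state names `(start)`, `(conversion)` and `(null)` and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical names for the start state and the two absorbing states.
START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(paths):
    """Estimate first-order transition probabilities.

    paths: {tuple_of_channels: (total_conversions, total_nulls)}
    """
    counts = defaultdict(lambda: defaultdict(float))
    for steps, (n_conv, n_null) in paths.items():
        n = n_conv + n_null
        if n == 0:
            continue
        seq = [START, *steps]
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += n  # every journey on this path makes this move
        counts[seq[-1]][CONV] += n_conv  # journeys ending in a conversion
        counts[seq[-1]][NULL] += n_null  # journeys ending without one
    return {
        a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
        for a, nxt in counts.items()
    }


def conversion_probs(probs, tol=1e-12, max_iter=10_000):
    """Probability of eventually reaching CONV from each state.

    Solves p(s) = sum_t P(s -> t) * p(t) with p(CONV) = 1 and p(NULL) = 0
    by Gauss-Seidel fixed-point iteration.
    """
    p = {s: 0.0 for s in probs}
    p[CONV], p[NULL] = 1.0, 0.0
    for _ in range(max_iter):
        delta = 0.0
        for s, nxt in probs.items():
            new = sum(w * p[t] for t, w in nxt.items())
            delta = max(delta, abs(new - p[s]))
            p[s] = new
        if delta < tol:
            break
    return p


def odds(prob):
    """Odds = Prob / (1 - Prob); infinite when Prob == 1."""
    return math.inf if prob >= 1 else prob / (1 - prob)


if __name__ == "__main__":
    paths = {
        ("C_v05", "C_v02", "C_v05", "C_v03"): (1, 0),  # from the table above
        ("C_v01", "C_v06"): (0, 1),
        ("C_v01",): (1, 0),
        ("C_v05", "C_v03"): (0, 2),  # hypothetical extra path
    }
    p = conversion_probs(transition_probs(paths))
    for channel in ("C_v05", "C_v02", "C_v03", "C_v01", "C_v06"):
        print(channel, round(p[channel], 4), round(odds(p[channel]), 4))
```

In the demo, the first three paths come from the transposed table, while `C_v05 > C_v03` with two non-conversions is made up so that no channel converts with certainty (a probability of 1 would make the odds infinite). With this input, C_v05 gets a conversion probability of about 1/3 and odds of about 0.5.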

These odds are then used as weights within each path to obtain a transaction-level attribution:

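| id_path | channel_pos | attribution_channel_region |
|---|---|---|
| 0 | 1 | 0.15 |
| 0 | 2 | 0.40 |
| 0 | 3 | 0.15 |
| 0 | 4 | 0.30 |
| 0 | 5 | ((CONV)) |
| 1 | 1 | 0.20 |
| 1 | 2 | 0.40 |
| 4 | 1 | 0.20 |
| 4 | 2 | ((CONV)) |
| ... | ... | ... |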

To learn more about odds, refer to this link.
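The exact rule that turns odds into path-level weights is not spelled out. The sketch below assumes that, within a converting path, each touchpoint receives its channel's odds divided by the sum of the odds of all touchpoints in that path, so the shares sum to 1. The function name `attribute_path` is hypothetical, and the odds are assumed finite.

```python
def attribute_path(channels, channel_odds):
    """Split one conversion across a path's touchpoints in proportion to odds.

    channels: channel labels of one converting path, in order (no ((CONV)) row)
    channel_odds: {channel: odds} from the fitted Markov model (assumed finite)
    """
    weights = [channel_odds[c] for c in channels]
    total = sum(weights)
    if total <= 0:
        raise ValueError("path has no channel with positive odds")
    return [w / total for w in weights]
```

For example, hypothetical odds of 0.3 for C_v05, 0.8 for C_v02 and 0.6 for C_v03 would split path 0 as 0.15, 0.40, 0.15 and 0.30. A repeated channel (C_v05 here) receives a share at each of its touchpoints.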

Next, we evaluate the predictive performance of the model by calculating the area under the precision-recall curve (AUC-PR). We use the precision-recall curve instead of the classical ROC curve because datasets of this kind are usually imbalanced, with many more non-converting paths than converting ones.
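Which path-level scores are fed to the precision-recall curve is not stated here. The sketch below assumes each path has a predicted conversion score and a 0/1 outcome, and computes AUC-PR as average precision: at each distinct score threshold, the precision reached there is weighted by the gain in recall. The helper `average_precision` is hypothetical; scikit-learn's `average_precision_score` computes the same quantity, and trapezoidal integration is a common alternative estimator.

```python
def average_precision(scores, labels):
    """Area under the precision-recall curve as average precision.

    scores: predicted conversion scores, one per path
    labels: 1 (or True) for converting paths, 0 (or False) otherwise
    """
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("at least one converting path is required")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap = 0.0
    i = 0
    while i < len(order):
        # treat all paths sharing the same score as a single threshold
        j, block_tp = i, 0
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            if labels[order[j]]:
                block_tp += 1
            else:
                fp += 1
            j += 1
        tp += block_tp
        # recall gain at this threshold times the precision reached there
        ap += (block_tp / n_pos) * (tp / (tp + fp))
        i = j
    return ap
```

A model that gives every path the same score obtains an AUC-PR equal to the share of converting paths, i.e. the conversion rate, which is what a random classifier achieves in expectation.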

Final attribution

We estimate a separate Markov model for each (channel × feature) combination and, for each model, collect the transaction-level attributions together with its predictive performance. The final transaction-level attribution is the weighted mean of all collected attributions, where the weights are a transformation of the predictive performances.

The transformation we use is:

$$weight(X) = \begin{cases} \dfrac{AUC\_PRE\_REC(X)/CR}{1/CR} & \text{if } AUC\_PRE\_REC(X) > CR \\ 0 & \text{otherwise} \end{cases}$$

where $X$ is a generic feature, $AUC\_PRE\_REC(X)$ is the area under the precision-recall curve of the model built on feature $X$, and $CR$ is the dataset's conversion rate.

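$weight(X)$ lies between 0 and 1: the numerator is the model's AUC-PR relative to the random baseline $CR$, the denominator is the same ratio for a perfect model ($AUC\_PRE\_REC(X) = 1$), and since $AUC\_PRE\_REC(X)$ can be at most 1, the ratio (which simplifies to $AUC\_PRE\_REC(X)$ itself) never exceeds 1.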

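When $AUC\_PRE\_REC(X) \le CR$, $weight(X) = 0$. A classifier that ranks paths at random has an expected AUC-PR equal to the conversion rate, so a model that does not beat $CR$ indicates that the feature has no significant influence on the target variable.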
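The weight transformation and the final weighted mean can be sketched together for a single touchpoint. The function names `feature_weight` and `final_attribution` are hypothetical, inputs are assumed to be plain numbers and dictionaries keyed by feature name, and raising an error when every weight is 0 is an assumption, since that case is not covered above.

```python
def feature_weight(auc_pr, conversion_rate):
    """weight(X) = (AUC_PR(X) / CR) / (1 / CR) if AUC_PR(X) > CR, else 0."""
    if conversion_rate <= 0:
        raise ValueError("conversion rate must be positive")
    if auc_pr <= conversion_rate:
        return 0.0  # no better than a random classifier
    ratio = auc_pr / conversion_rate  # improvement over the random baseline
    max_ratio = 1 / conversion_rate  # the same ratio for a perfect model (AUC-PR = 1)
    return ratio / max_ratio


def final_attribution(attributions, weights):
    """Weighted mean of one touchpoint's attributions across feature models.

    attributions: {feature: attribution from the channel x feature model}
    weights: {feature: weight(feature)}
    """
    total = sum(weights[f] for f in attributions)
    if total == 0:
        # assumption: undefined when no feature model beats the baseline
        raise ValueError("all feature weights are 0")
    return sum(weights[f] * a for f, a in attributions.items()) / total
```

Features whose models score at or below the conversion rate get weight 0 and therefore drop out of the mean entirely.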