
Calculating the exact value of each install is a critical component of free-to-play (F2P) business planning. This is especially true when each install may be costing you $1-2 in marketing. For this reason, many publishers devote substantial resources to developing accurate forecasts of player lifetime value (LTV). Typically these fall into one of three approaches:

1) Model Average Revenue Per Daily Active User (ARPDAU)
2) Model Transactions
3) Model Players

In the first model, the day-to-day run rate is forecast. In the second, the number and value of transactions per player are forecast. Finally, in the third, the historical value of players with the same demographics is used. We will go through the calculation of each of these in detail below, and guide you through when each approach is most suitable.


1) Modelling ARPDAU: The simple approach


The most prevalent approach is to use ARPDAU and retention to estimate a player’s lifetime value. Mathematically:

$$\mathrm{LTV} = \mathrm{ARPDAU} \times \mathrm{TotalDaysPlayed}$$

Typically, the days played by the average install are estimated from the retention curve by fitting a power law to the retention R on day d after install, i.e.

$$R(d) = d^{-\alpha}$$

There are a number of ways to fit for α, but the easiest is to use regression to fit log(R) ∝ log(d). Be aware, however, that plain linear regression is not good practice here, as the errors on the log(R) values are not normally distributed.

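Once the slope of the power law has been estimated, the total days played by day d = D after install is calculated from the integral of the retention power law, i.e.

$$\mathrm{TotalDaysPlayed}(D) = \int_{1}^{D} x^{-\alpha}\,dx = \frac{1 - D^{1-\alpha}}{\alpha - 1} \quad (\alpha \neq 1)$$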


This value can then be combined with the ARPDAU to give an estimate of LTV.

As a worked example, let’s assume that ARPDAU = $0.10 and our D1 to D7 retention is 35%, 28%, 25%, 21%, 18%, 15% and 13%. Fitting with a power law gives α = 1.3, which leads to TotalDaysPlayed = 2.77 days at D = 365 days.

This means our LTV = $0.28.
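
The worked example can be reproduced with a short Python sketch. It assumes the retention curve has the form R(d) = d^(-α) and is integrated from day 1 to day D, an assumption chosen because it reproduces the figures quoted above. It also fits α by a brute-force least-squares scan on the retention values themselves rather than by log-log regression; that is one option, not necessarily the fitting method used for the example. The function names are illustrative.

```python
def fit_alpha(retention, lo=0.5, hi=3.0, step=0.001):
    """Least-squares fit of R(d) = d**(-alpha) by scanning a grid of alpha values.

    retention[i] is the observed retention on day i + 1 after install.
    Day 1 always predicts 1.0 under this form, so it only adds a constant
    to the error and does not influence the chosen alpha.
    """
    def sse(alpha):
        return sum((d ** -alpha - r) ** 2
                   for d, r in enumerate(retention, start=1))

    n_steps = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return min(candidates, key=sse)


def total_days_played(alpha, horizon_days):
    """Integral of d**(-alpha) from day 1 to horizon_days (requires alpha != 1)."""
    return (1.0 - horizon_days ** (1.0 - alpha)) / (alpha - 1.0)


retention_d1_to_d7 = [0.35, 0.28, 0.25, 0.21, 0.18, 0.15, 0.13]
arpdau = 0.10

alpha_fit = fit_alpha(retention_d1_to_d7)
days = total_days_played(1.3, 365)  # use the rounded alpha = 1.3 from the text
ltv = arpdau * days

print(f"fitted alpha ~ {alpha_fit:.2f}")
print(f"TotalDaysPlayed ~ {days:.2f}, LTV ~ ${ltv:.2f}")
```

With these inputs the scan lands close to α = 1.3, and the integral gives roughly 2.77 days and an LTV of about $0.28. Because the result is sensitive to α, it is worth checking how much the projection moves when α is varied within its fitting uncertainty.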

This approach is by far the simplest method for projecting LTV. However, this simplicity comes with a number of drawbacks. First of all, for a long-lived game, ARPDAU will be dominated by existing players who may be more willing to spend than the cohort you are trying to project. In addition, retention for all players tends to be worse than that of spenders. Both of these mean that the typical error in projections using this method is 30-40%, for cohorts of 500+ players.


2) Modelling transactions: Looking at spenders


A more sophisticated approach is to model the number of transactions a player will make in their lifetime.

A statistical model can be constructed to give P(T|D), i.e. the probability that a player will make a transaction D days after install. If N_t is the number of transactions observed at day t, then at some future day D the projected number of transactions is:


Conversion to payer can be modelled similarly, i.e. P(C|D) is the probability that a player will convert to a spender D days after install.

These probability distributions can be described by a number of long-tailed distributions. Different types of games suit different distributions. For example, PC games tend to be well fit by power-laws, while social casino games are fit by gamma distributions.

These models can be fit using numerical methods, and packages in R and Python allow this to be done easily. In R, the fitdistr function (in the MASS package) allows a range of probability distributions to be fit to a dataset by numerically finding the maximum-likelihood parameters. Once the best-fit values for the models are obtained, the LTV at day D can then be calculated by

equation 5

The inputs are the mean IAP value, N_D (the projected number of transactions by day D) and C_D (the projected number of conversions by day D).
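
As a rough Python counterpart to the maximum-likelihood fitting described above, the sketch below fits a continuous power law to the days since install at which transactions occurred. It uses the closed-form maximum-likelihood estimate for a power law with a known lower cut-off rather than a numerical optimiser. The sample data is synthetic and purely hypothetical, and the function names and the cut-off x_min = 1 are assumptions made for illustration.

```python
import math
import random


def fit_power_law_mle(values, x_min=1.0):
    """Maximum-likelihood exponent for a density p(x) ~ x**(-alpha), x >= x_min."""
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, n, x_min=1.0, seed=0):
    """Draw n synthetic values from the same power law by inverse-CDF sampling."""
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


# Hypothetical data: days since install at which transactions occurred.
true_alpha = 2.5
transaction_days = sample_power_law(true_alpha, n=5000)

alpha_hat = fit_power_law_mle(transaction_days)
print(f"true alpha = {true_alpha}, fitted alpha = {alpha_hat:.2f}")
```

Other long-tailed families, such as the gamma distribution, generally have no closed-form estimate and need a numerical fit; in Python, scipy.stats.gamma.fit is one commonly used option. Whichever family is chosen, comparing the fitted curve against held-back data is a sensible check before using it to project transactions.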

This approach has the advantage that it models the behavior of the players who generate LTV: spenders. Accurately modelling conversion is essential for games that convert many players late, e.g. MMOs, while modelling transactions is important for games that have a high number of transactions per spender, e.g. casual puzzle games or those without a premium currency.

While this method should be more accurate than 1), it is still a cohort-based method and so requires a reasonable volume of players to be accurate, i.e. it cannot give you a good prediction of the LTV of an individual user.

Putting this constraint to one side, by using appropriate distributions and testing, this type of model can typically achieve an error of around 20% (for cohorts of 500+ players).


3) Modelling Players: Predicting individual LTV


Ideally, a reliable estimate of the LTV of each individual player would be available. This would not only allow decisions about acquisition and viability to be made, but also change the way the game interacts with players, e.g. low LTV players get more ads, high LTV players get VIP offers.

Predicting LTV at the player level requires more detailed information about the player, both demographic and behavioral. Examples of the metrics that could be used are country, device type, frequency of play, success rate and number of in-game friends.

Using historical datasets, these metrics can be regressed against LTV to build a model. Depending on the underlying metric distributions, a single model can be used, or players can be segmented into groups that have different characteristics, e.g. it may be necessary to use a completely different regression model for players on iOS than for players on Android.

Once these models have been developed, they can be used to predict the LTV of individual players. Cohort LTV can be estimated by taking the average of the individual LTV predictions.

All of these functions can be performed using statistical packages in R or Python. In R, kmeans or hclust could be used to segment the data, while glm would be used to regress the metrics against LTV.
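
The workflow described above (segment, regress, predict per player, then average for the cohort) can be sketched in plain Python. This is a deliberately minimal illustration: it segments on a known attribute (platform) rather than using clustering, it regresses LTV on a single metric with ordinary least squares rather than a GLM, and all data, field names and function names are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_segment_models(history):
    """Fit one regression per platform from (platform, sessions, ltv) rows."""
    by_platform = defaultdict(lambda: ([], []))
    for platform, sessions, ltv in history:
        by_platform[platform][0].append(sessions)
        by_platform[platform][1].append(ltv)
    return {p: fit_line(xs, ys) for p, (xs, ys) in by_platform.items()}


def predict_ltv(models, platform, sessions):
    """Predicted LTV for one player, using that player's segment model."""
    intercept, slope = models[platform]
    return intercept + slope * sessions


# Hypothetical history: (platform, sessions per week, observed LTV in $).
history = [
    ("ios", 2, 0.9), ("ios", 5, 1.5), ("ios", 10, 2.5),
    ("android", 2, 0.3), ("android", 5, 0.6), ("android", 10, 1.1),
]
models = fit_segment_models(history)

# New cohort: predict each player, then average to get the cohort LTV.
new_cohort = [("ios", 4), ("android", 8), ("android", 3)]
predictions = [predict_ltv(models, p, s) for p, s in new_cohort]
cohort_ltv = sum(predictions) / len(predictions)
print([round(p, 2) for p in predictions], round(cohort_ltv, 2))
```

A real model would use many more players and metrics per segment, and would be validated on players held out from the fit.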

This approach has the clear advantage that it produces an individual LTV. However, it has the drawback that it needs extensive, and relevant, historical data for it to be valid. This means that it cannot be used when a new game is launched, or after significant game updates.


Producing accurate models


While all three LTV models can give good results, the question of which model to use really depends on the situation your game is in. If you have a small number of players with a relatively short lifespan (e.g. a few weeks), then model 1) is probably best.

If you have a significant number of players, and expect big variations in the lifespan and spending patterns depending on cohort, then model 2) is needed.

Finally, if you have a well-established game with a stable game version and player base, model 3) can offer significant advantages.

In all cases, the most important thing is to test your models against historical data to ensure that they give good accuracy and that their limitations are well understood. Finally, it is never possible to produce reliable LTV models from a very small sample (e.g. <100 players). In these cases, a ‘rule of thumb’ like LTV = 4 x ARPDAU will likely give a better picture than any of the statistical approaches.
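
Testing against historical data can be as simple as comparing the LTV a model projected for past cohorts with what those cohorts actually went on to generate. The sketch below uses mean absolute percentage error as the accuracy measure, which is an assumption (the error metric behind the percentages quoted earlier is not stated), and the cohort figures are hypothetical.

```python
def mean_absolute_percentage_error(projected, actual):
    """Average of |projected - actual| / actual across cohorts, as a fraction."""
    if len(projected) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - a) / a for p, a in zip(projected, actual)) / len(actual)


# Hypothetical past cohorts: LTV projected at day 7 vs. LTV observed at day 365.
projected_ltv = [0.28, 0.35, 0.22, 0.40]
observed_ltv = [0.25, 0.30, 0.27, 0.38]

error = mean_absolute_percentage_error(projected_ltv, observed_ltv)
print(f"backtest error: {error:.0%}")
```

Running the same check separately on small and large cohorts shows how quickly accuracy degrades as sample size falls.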


