I've found that even if I make no changes to an app, retention and virals fluctuate quite a bit for no apparent reason, and the fluctuations are big enough that long-term forecasting becomes more guesswork than anything else.
Also, there are second order effects that are hard to model as well. For instance, improving virals can improve retention (user A invites friend B, user A stays for longer because their friend uses it).
I've gone through the process of modeling a couple apps, and it quickly gets to a point where the relationships become circular and small variations cause exponential differences down the line.
It is important to make informed decisions about virals and retention, but I don't think such a model is the way to do it. I think it is more important to think about optionality and decision making in opaque environments rather than trying to model the unmodelable.
I think it's just that we write less about it, not so much that we aren't aware of its importance. "How we managed to retain our users for 4 months" sounds admittedly less sexy than "How we got a bazillion users in less than 72 hours", but the truth is a tech startup with no strong retention strategy is basically dead in the water, and generally folks know this.
We denote time by t with units, say, days.
The number of customers at time t is y(t), a real-valued function of a real variable.
We assume that the present is t = 0 and that we know y(0), that is, the current number of customers.
We let the number of customers who will ever try our business be b. That is, b is our intended 'market potential'.
Initially we assume that once we get a person as a customer, we do not ever lose them but keep them forever.
As usual, we let y'(t) = dy(t)/dt be the calculus first derivative of y(t). Then y'(t) is the number of new customers per day, that is, the 'rate' at which we gain customers.
For 'virality' we notice that the rate is proportional to (1) the number of customers y(t) we have 'talking' about our business and (2) the number of people
b - y(t)
yet to be our customers and hearing the talking. Then we have that for some constant of proportionality k
y'(t) = k y(t) (b - y(t))
So we have an initial value problem (that is, we know y(0)) for a first order (we use only the first derivative) ordinary (no partial derivatives) differential equation. Then from calculus,
y(t) = y(0) b exp(bkt) / ( y(0) (exp(bkt) - 1) + b )
So this solution grows (1) initially slowly, (2) then more rapidly, (3) then more slowly and approaches b asymptotically from below.
In case we lose some customers forever at some rate r, then we get a solution of the same form except k and b get adjusted.
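For anyone who wants to reproduce points on such a curve, here is a minimal Python sketch of the closed-form solution above. The function name and the values of y(0), b, and k are made up for illustration; they are not from any real business.

```python
import math

def customers(t, y0, b, k):
    """Closed-form solution of y'(t) = k y(t) (b - y(t)) with y(0) = y0."""
    growth = math.exp(b * k * t)  # overflows for very large b*k*t
    return y0 * b * growth / (y0 * (growth - 1.0) + b)

# Hypothetical values, chosen only for illustration.
y0, b, k = 100.0, 10_000.0, 2e-5

for day in (0, 10, 20, 30, 40, 60):
    print(day, round(customers(day, y0, b, k)))
```

Printing a handful of days like this is enough to see the slow start, the rapid middle, and the flattening toward b.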
Once there was a startup (now a major company) that was struggling and had as an investor a major company with a Board seat and at the startup two representatives, one in finance and the other in aeronautical engineering.
The two representatives had asked for some revenue growth projections.
People around the HQ considered what the startup hoped, intended, thought might happen, etc., but found nothing credible.
One guy who remembered calculus reluctantly got involved, formulated and solved the differential equation above, and showed the solution to a Senior VP of Planning (SVP) who reported to the founder, CEO, COB. The SVP was responsible for the projections. The SVP took the guy's calculus solution as the basis of the projections and on a Friday sat with the guy with a pocket calculator and some graph paper and graphed solutions to the differential equation for selected values of the constant k and picked one of the solutions as the official projection.
The next day, Saturday, at about noon, the guy was in his office working on some other math problems and got a call from a person asking if he knew about the projections for the Board and if he could come over to the HQ? Sure. When the guy arrived, the situation was grim: The two representatives of the major Board Member were standing in the hall with their bags packed with airline tickets back to Texas. The startup was about to die.
The SVP was traveling and out of town.
The person who had called got the graph of projections from the previous day and asked the guy to reproduce a point on the graph. Using the calculator, the solution above, and a few keystrokes, the point on the graph was reproduced. After several more points were reproduced, the area became happier; the two representatives on the Board stayed, and the startup was saved.
Later the person who had called explained that that Saturday was a Board meeting, the growth projection graph was shown, and the two representatives had asked how the projections were calculated. The rest of the company tried to reproduce the graph but could not. The Board meeting stopped. The two representatives lost patience with the startup, got airline tickets back to Texas, returned to their rented rooms, packed their bags, and as a last chance returned to the startup to see if there was an answer to how the projections were calculated.
Ah, one saved startup! One reason to take calculus seriously!
For k, we might fit to past data. For given y(0) and b, all k does is adjust how fast the curve rises to the asymptote. So basically all we are doing is interpolating between y(0) and b.
Otherwise, all viral curves are the same.
So, an advantage of my derivation is a simple, explicit equation for a fairly general solution.
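On fitting k to past data, one simple approach (my assumption, not the only way) uses the fact that the solution implies ln( y(t) / (b - y(t)) ) = ln( y(0) / (b - y(0)) ) + b k t, which is a straight line in t with slope b k. With y(0) and b taken as known, k comes from a least-squares slope through that fixed intercept. The sketch below uses synthetic 'past data' generated from the formula itself, so all names and numbers are hypothetical.

```python
import math

def logistic(t, y0, b, k):
    """Closed-form solution of y'(t) = k y(t) (b - y(t))."""
    g = math.exp(b * k * t)
    return y0 * b * g / (y0 * (g - 1.0) + b)

def fit_k(times, counts, y0, b):
    """Least-squares estimate of k, assuming y0 and b are known.

    Requires 0 < count < b for every observation and at least one t != 0.
    """
    z0 = math.log(y0 / (b - y0))  # fixed intercept of the straight line
    num = sum(t * (math.log(y / (b - y)) - z0) for t, y in zip(times, counts))
    den = sum(t * t for t in times)
    return num / (den * b)  # slope is b*k, so divide by b

# Synthetic observations, for illustration only.
y0, b, true_k = 100.0, 10_000.0, 2e-5
times = [1, 2, 3, 5, 8]
counts = [logistic(t, y0, b, true_k) for t in times]
print(fit_k(times, counts, y0, b))
```

With real data the counts will be noisy, and the estimate is only as good as the guess for b.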
The article has a comment claiming that biology addresses a similar problem and gets a 'logistic' curve. The comment didn't say just what was meant by a logistic curve, but I suspect that my solution here is an example. If so, then here we have an 'axiomatic' derivation of the logistic curve.
It is true that the growth of some products, e.g., TV sets, looks to the eye very much like one of the curves from my solution for selected values of y(0), b, and k.
Could also make a Markov assumption: So, assume that we get new customers (and, if we wish, lose old customers) at some 'rates' and, thus, get a continuous time, discrete state space Markov process. Then as is well known the solution is a matrix exponential. Could evaluate the matrix exponential or just use Monte Carlo to generate a few thousand sample paths. Then could put some confidence limits on the deterministic solution.
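A rough Python sketch of the Monte Carlo route, under assumptions of my own: customers are only gained, never lost, and with n customers the next one arrives at rate k n (b - n), so the time to the next arrival is exponentially distributed with that rate. The function names and parameter values are hypothetical, and the band is just an empirical percentile range over the sample paths.

```python
import random

def simulate_path(y0, b, k, horizon, rng):
    """One sample path of the pure-birth chain; returns the count at `horizon`."""
    n, t = y0, 0.0
    while n < b:
        rate = k * n * (b - n)      # arrival rate with n customers
        t += rng.expovariate(rate)  # exponential holding time
        if t > horizon:
            break
        n += 1
    return n

def confidence_band(y0, b, k, horizon, paths=2000, level=0.90, seed=0):
    """Empirical central `level` band of the customer count at `horizon`."""
    rng = random.Random(seed)
    finals = sorted(simulate_path(y0, b, k, horizon, rng) for _ in range(paths))
    lo = finals[int((1 - level) / 2 * (paths - 1))]
    hi = finals[int((1 + level) / 2 * (paths - 1))]
    return lo, hi

# Small hypothetical market so a few thousand paths run quickly.
print(confidence_band(y0=5, b=200, k=1e-3, horizon=20.0))
```

Starting from only a few customers, the band tends to be wide, since early random delays shift the whole curve in time.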
Since no one guessed the war story, the startup was FedEx, the SVP was Mike Basch, the CEO, of course, was Fred Smith, the person who called on the phone was Roger Frock, and the investor was General Dynamics. The arithmetic was courtesy of an HP-35. So, HP might run an ad saying how they saved FedEx!
I found agent-based modelling much more interesting. For example, mean-field models struggle with non-uniform spaces.
For the interest of persons here attending, I've uploaded my crappy code on this topic. It's 3 years old and not production suitable.