Yet I love writing code in it. So much can be done in so little code. I am always amazed at how little code I write to accomplish a task.
The RStudio IDE ( http://rstudio.org/ ) is a very pleasant environment to write code in.
I’d rewrite this example as something like:
num.steps <- 1000
num.walks <- 100
step.std.dev <- 0.03
start.value <- 15
rand.row <- function() rnorm(num.steps, 1, step.std.dev)
walk <- function() cumprod(rand.row()) * start.value
all.walks <- t(replicate(num.walks, walk()))
plot(colMeans(all.walks), type = "l")
[I’m not an R guy, so that might not be the most typical style ever, but you get the idea...]

For example:
replicate(2, rnorm(2))
creates a 2x2 matrix of independent random values. Whereas:
x <- rnorm(2)
replicate(2, x)
instead creates a 2x2 matrix with a single random value for each row, repeated across all columns. And so if you want to decompose it, you need to do:
x <- function() rnorm(2)
replicate(2, x())
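The difference is easy to check empirically. A quick sketch (column layout assumes the default simplify behavior):

```r
set.seed(1)
a <- replicate(3, rnorm(2))   # expression re-evaluated on each iteration
set.seed(1)
v <- rnorm(2)
b <- replicate(3, v)          # precomputed value reused every time

# columns of a are independent draws; columns of b are all identical
stopifnot(any(a[, 1] != a[, 2]))
stopifnot(all(b[, 1] == b[, 2], b[, 2] == b[, 3]))
```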
Which will re-call x twice inside the replicate function. For reference, replicate is defined as:
replicate <- function (n, expr, simplify = TRUE)
    sapply(integer(n),
           eval.parent(
               substitute(function(...) expr)),
           simplify = simplify)
The replicate function uses R's compute-on-the-language facilities to construct a new function that has expr as its body. A closure's argument will be evaluated at most once.
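The effect of that substitute trick can be sketched with a stripped-down analogue (my_replicate is a made-up name, not part of base R):

```r
# Capture expr unevaluated, wrap it in a function, and let sapply
# call that function n times -- the same pattern replicate uses.
my_replicate <- function(n, expr) {
  f <- eval.parent(substitute(function(...) expr))
  sapply(integer(n), f)
}

set.seed(42)
draws <- my_replicate(5, rnorm(1))
# each of the 5 calls re-evaluates rnorm(1), so all values differ
stopifnot(length(draws) == 5, length(unique(draws)) == 5)
```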
The way he did it is nice and minimal, and in 2 months when you go back to change something you will have no idea what you did.
I've had to do a few things in R and I always dread revisiting my code. Half the time it's easier and faster to just rewrite everything.
But I like dealing with numpy/scipy much, much more than R. Python as a language is, I think, much better designed, and numpy is a really nice tool for interacting with multidimensional arrays. When I write Python code, or read Python code written by anyone competent, I find the program's intent very easy to follow. Most of the R code I’ve seen “in the wild” is kind of a mess, because it is written by non-programmers, many of whom have little experience with or concept of code style. Additionally, as soon as a program has to do anything other than statistical analysis (examples: text munging, internet scraping, network communication, dealing with file formats, user interaction, etc.), Python is miles ahead.
The big advantages of R that I saw: (1) it has become the tool of choice in the academic statistics community, meaning that there is quite a lot of existing code for doing various sophisticated things, some of which you might have to implement yourself in Python, (2) it has some really nice graphing tools, (3) there seemed to be a few examples where a particular few lines of R code were more compact and clearer than the equivalent Python (can’t think of anything off-hand though).
In my mind, Matlab's (Octave's) array based programming makes sense. This? This does nothing that I expected it to do!
replicate seems to pretty randomly take a function. Is that an R thing? I think what confuses me most is that there is nothing about this syntax that tells me that "cumprod(rnorm(1000, 1, 0.03))" hasn't already been evaluated! I could not, for the life of me, figure out why replicate didn't just create 100 exact copies. For example, why does replicate(...) evaluate, but the internals don't? This is driving me crazy!
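It is an R thing: arguments are passed as promises, and a function can grab the unevaluated expression with substitute() instead of (or before) forcing its value. A quick sketch (show_expr is a hypothetical helper, not a base function):

```r
# substitute() on a formal argument returns the caller's expression,
# not its value -- nothing has been evaluated yet.
show_expr <- function(e) substitute(e)

captured <- show_expr(cumprod(rnorm(1000, 1, 0.03)))
stopifnot(is.call(captured))               # an unevaluated call object...
stopifnot(length(eval(captured)) == 1000)  # ...that can be evaluated later
```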
How often do you want to generate random walks of this type where the variance of the process isn't dependent on its current level?
Observe that, as you increase the standard deviation of the random normal (to even .1), your "random walk" always walks to zero.
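That happens because a multiplicative step with mean 1 and standard deviation s has negative drift in log space (E[log X] is roughly -s^2/2), so the cumulative product tends toward zero. A quick check:

```r
set.seed(1)
steps <- rnorm(1e6, mean = 1, sd = 0.1)
drift <- mean(log(steps))   # roughly -0.1^2 / 2 = -0.005
stopifnot(drift < 0)        # negative drift: the product decays over time
```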
I don't mean to be a beady-eyed-pterodactyl but, as cool as clever one-liners sometimes are, often they solve toy problems. I say this as someone who loves R, uses it every day, and is constantly forced to brute-force things with ugly inlined Rcpp code. (Which is fine.)
As for the usefulness of this kind of walk: the process we're modelling is an evolutionary one, where the rate of change is fixed (in this case within the species) and we'd like to detect 'random' (non-selected) evolutionary paths by comparing simulations to historical data.
plot(mean(cumprod(1 + randn(100,1000) .* 0.03, 2) .* 15))
Well, that is assuming one wants a line plot of the mean value of the "location" at each time point across the population of walks. Personally, I find R's rather idiosyncratic approaches to data handling and function wrangling a bit hard to digest.
Here he is computing the history of states of a random walk as a scan: scanl (*) initial updates. If you wanted to simulate branching random walks, you'd work over the data type of multiway trees instead of arrays, but the algorithm to compute the history of intermediate states would otherwise be identical, using the scan function for multiway trees.
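In R terms, that scan over arrays is just cumprod, or equivalently Reduce with accumulate = TRUE. A sketch using the parameters from the example above:

```r
set.seed(1)
updates <- rnorm(1000, 1, 0.03)

# cumprod is a multiplicative scan over the update stream
walk.a <- cumprod(updates) * 15

# the same history as an explicit scan: scanl (*) 15 updates
walk.b <- Reduce(`*`, updates, 15, accumulate = TRUE)

# walk.b carries the initial state as its first element
stopifnot(length(walk.b) == 1001)
stopifnot(isTRUE(all.equal(walk.a, walk.b[-1])))
```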