Forbes Magazine, Names You Need to Know in 2011: R Data Analysis Software: http://blogs.forbes.com/smcnally/2010/11/10/names-you-need-to-know-in-2011-r-data-analysis-software (and links therein)
And some of its developers are suggesting that they scrap it and start over (don't know if the whole "Core Team"'s on board tho'): http://www.stat.auckland.ac.nz/~ihaka/downloads/Compstat-2008-Slides.pdf
Are there parallels like this in the development of other languages/environments/ecosystems (e.g., Python3, Perl6 "revisions")? How do these efforts usually end up (I guess we're still waiting to see about Python3 and Perl6...) -- and how would it affect your business's decision to develop a library in this environment?
Programming in the business world is screwed up beyond all imagination. The more money a given application is responsible for, the more likely it is that it's a house-of-cards (pun intended for MVS nerds). They're always mishmashes of COBOL, SAS, DFSORT, and random proprietary languages that have never been the subject of a third-party book, and were sold to a company that was sold to a company that was sold to CA Technologies back in the 1970s. Whatever these languages can't do is implemented through Escher-painting constructions of Excel references and VBA macros.
So, when people say that R has some issues, I say, "boo fucking hoo".
Most businesses suffer from an unnatural separation between IT and the business end. If business people want something programmed, they call IT. They don't learn Python and do it themselves, because Python is a "programming language". R is the first real language that business people are being encouraged to learn, because it's an "analysis environment". You have no idea how often I have to edit the word "programming" out of my presentations for this reason.
R will win in business because it's decent, and it's been around long enough to not be scary to managers. I'd be cautious about drawing comparisons to other languages that have undergone big design changes, because, as far as I can tell, the existence of a decent language in the business world is entirely without precedent.
(Edit: In case the above came off as sounding like a "non-hackers are idiots" rant, it wasn't meant as such. Many of the people that produce these hideous monstrosities of SAS and VBA code have PhDs in statistics and atmospheric sciences. You can be pretty smart without knowing how to write software well.)
Makes me wonder if it is a UI problem. In the macro spreadsheet, you have data with code tied to it. In the programming world, you generally have code accessing data.
The abstractions might just be wrong for business people and a "simple" change could reduce a lot of IT pain.
I recall that SQL was intended to be used by business people... and it probably has been, sometimes; but I don't think it happens much. The days of early adoption might have differed, through appealing to the more adventurous business people (as R might be now).
One thing I know for a fact: business people use spreadsheets. I think making something that easy to get things done in is an incredible achievement. As an example, I think PHP has approached but not attained it.
I'm familiar with exactly the same phenomenon in the hardware verification world, where new tools and languages keep being invented just so that validation engineers can relax that they're not expected to "program" in a real language.
R is really becoming huge in academia. As far as I can tell, health sciences is the last SAS holdout. I expect it to take over business as well. Biz types will love it because it's so powerful as a scripting environment, but the programmers building and maintaining stuff with it will come to loathe it. R will become the PHP of analysis; ubiquitous but hated, and no one will have the chutzpah to fix it.
Random aside, anyone notice that the Kiwis are all over R? The original creators and the guy who wrote ggplot2 among many others.
Even for stuff that a lot of other programs can do just fine, SAS often has an edge. For example, everybody and their brother can do a logistic regression model... but SAS can give you confidence intervals for all kinds of crazy parts of the model that SPSS won't even bother calculating and that R will only give you point estimates for.
The other great thing about SAS is that a lot of the good statistics books from the last twenty or thirty years include SAS sample code. For example, I'm currently having to do some off-the-beaten-path ANOVA stuff, and the reference I'm using (Edwards' "Analysis of Variance for the Behavioral Sciences") uses SAS as its language of choice.
That said, I personally find the SAS "language" to be alternately bewildering and nostalgia-inducing (the "cards" command, anybody?). SAS is the only language about which I can honestly say "it makes R's syntax look clean and predictable". Also, the Windows version of SAS is an absolute abomination from a UI standpoint. And their licensing schemes are draconian, and installing the damn thing can easily take an entire day, especially if (say, for example) the installer gets confused because you've already got a JDK installed on your computer. Not that I'm bitter, or anything...
Of course, as others have noted, in bioinformatics, R either is already the default or is almost there. I know that in my department's bioinformatics courses, they use R, Python, and Perl almost exclusively, and only break out the SAS when there's something specific they need it to do.
I disagree that no one will have the chutzpah to fix R - I know of at least three groups, including one driven by an extremely serious computer scientist, who are either working on rewrites of the internals or on completely new implementations of the language. Even though R has been around longer than languages like Python and Ruby, it hasn't excited the interest of as many CS people, so it's at an early stage of its evolution - it's only now at a point where serious alternative implementations of the core engine are starting to come out.
Personally, I've been working on making many of the core libraries cleaner and more consistent. I'm completely biased, but I think if you use my packages (ggplot2 for graphics, plyr for apply functions, stringr for strings, lubridate for dates, ...) you'll have many fewer problems. And if you do find inconsistencies, I'm committed to fixing them.
It's just a shame to see a whole language popping out of something that could just be a library.
The idea of rewriting a large body of code in a different language does not make much sense.
Also, being a niche language has some nice consequences:
* R has been there for a long time through its predecessor S
* R is a specialized language: little chance to see it being screwed up by some library which wants to change everything, as happens too often in Python
* Because it is a niche language, its behavior is consistent across platforms (it is just easier to do with R than with Python, or other "real" languages).
Note how being a "real" language goes against those advantages. Also, most researchers are very lousy programmers. Often, their software is super smart, but the code quality is awful and write-only. A less powerful language may mitigate those issues.
"We propose developing an R-like language on top of a Lisp-based engine for statistical computing that provides a paradigm for modern challenges and which leverages the work of a wider community." - Ross Ihaka (co-developed the R statistical programming language with Robert Gentleman) and Duncan Temple Lang (core developer of R)
However, the libraries are great. Anything you'd want in statistics is already there. So I do use it all the time.
Just to say something nice, I do like data frames (a two dimensional matrix, where each column can have a different type).
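For anyone who hasn't run into one, here's a rough sketch of the idea in plain Python (not R, and not a claim about how R implements it): a table stored as equal-length named columns, where each column has its own type. The names `make_frame`, `row`, and `column_types`, and the sample data, are all made up for illustration.

```python
# Hypothetical sketch of the data-frame idea: equal-length named columns,
# each column holding one type, with different types across columns.

def make_frame(**columns):
    """Build a frame as a dict of column name -> list, checking lengths match."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return {name: list(values) for name, values in columns.items()}

def row(frame, i):
    """Return row i as a dict, pulling one value from each column."""
    return {name: values[i] for name, values in frame.items()}

def column_types(frame):
    """Report the Python type name of each column (taken from its first value)."""
    return {name: type(values[0]).__name__ for name, values in frame.items()}

# Made-up sample data: three columns, three different types.
frame = make_frame(
    name=["ann", "bob", "cy"],
    age=[31, 45, 27],
    smoker=[False, True, False],
)

print(column_types(frame))  # one type per column
print(row(frame, 1))        # one value from each column
```

The point is just that you get to think in rows and columns at the same time, which a plain numeric matrix (one type for everything) doesn't give you.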
Looks like there might be a ceiling for prestige, and that income is not as related to it as I would have thought. But, what are the units?
R works for me right now so that's what I'll stick to.
As long as there aren't more than 10 books written about the yet unborn data-cruncher saviour and as long as the brand new alternative isn't adopted in courses, I wouldn't bother -- unless you want to be the saviour's father (i.e. developer) of course.
It's attractive to embed into databases like Netezza, Teradata and other analytic databases, and vastly easier to use than SAS.
Even if it was rewritten in Python, I think that would be unlikely to slow down its adoption, which is driven by grad students, researchers and quants who often have no real programming background (and frequently aren't interested in learning more than they need to generate figures for their publications).