Luke Tierney explains the move from %>% to a native pipe |> here [1]. The native pipe aims to be more efficient and to address issues with the magrittr pipe, such as complex stack traces.
Turns out the |> syntax is also used in Julia, JavaScript and F#.
The lambda syntax (\(x) x + 1) is similar to Haskell's (\x -> x + 1).
[1] https://www.youtube.com/watch?v=X_eDHNVceCU&feature=youtu.be...
Although F# is the best-known early popularizer of |>, the operator originated in Isabelle/ML in 1994, proposed by Tobias Nipkow.
Here is a blog post by Don Syme which embeds the email thread of its invention: https://web.archive.org/web/20190217164203/https://blogs.msd...
It's a fascinating look through time.
Of course, I should note this is the history for the pipe-forward operator for chaining (reverse) function application used in a programming language. The general concept is even earlier, as attested by the shell syntax for chaining anonymous pipes https://en.wikipedia.org/wiki/Pipeline_(Unix)#History.
Metanote: I was surprised that I was unable to find out who invented the (|>) pipe syntax through Google. I could only find this Elixir Forum thread https://elixirforum.com/t/which-language-first-introduced-th... which got close but did not have the answer. I am therefore writing this here to hopefully surface it for future searches and "question answering AIs".
And given that I'm currently staring at Isabelle code most of the day for my Master's thesis at the chair of Prof. Nipkow, it's slightly surreal to learn about this here, heh.
https://stat.ethz.ch/pipermail/r-devel/2020-December/080173....
The reason for announcing the new lambda syntax at the same time seems to be to enable certain workflows that the magrittr pipe supports. The %>% operator, by default, pipes to the first argument of a function. If you want to pipe to a different argument, you can do:
a %>% func(x, arg2 = .)
It seems like the native pipe doesn't support a placement argument, but you can use the new, more concise lambda operator:
a |> \(d) func(x, arg2 = d)
A little more verbose, but it's not a very common use case, it's more general, and I'd happily trade a little more verbosity for the rest of the improvements. (That said, I haven't played around with the magrittr 2.0 improvements yet, so maybe the difference is going to end up being less than the presentation suggests.)
I tend to use it a lot if I'm just piping a vector to base functions (gsub has x as its third argument, grep as its second).
This syntax looks like it makes that a little harder, but the new error messages are going to make everything so much better that I'm totally fine with it.
lm(y ~ ., data = my_dataframe)
already means "regress the variable y on all other columns in `my_dataframe`." For big, interactive regressions, it's really natural to write
my_original_dataframe %>%
  do_a_bunch_of_transformations() %>%
  select(...) %>% # Pull out just the columns you want
  lm(y ~ ., data = .)
and god knows how that last line is going to be interpreted. So disambiguating through some mechanism is necessary anyway. A lambda is much better than some temporary variable that just holds the formula `y ~ .`.
Note that for JS it's still just a proposal and has been stuck in an interminable bikeshedding phase for most of this year.
I'm excited for it, though, and if the partial application syntax `func(a, ?)` gets ratified then we'll have a nice concise way of describing operations.
Personally I think it would be a good idea if you could e.g. configure your keyboard so that AltGr+L produces λ, which you can then use in place of \
But alas, the Haskell community has decided against this:
The proposed lambda syntax for R is `\(x) x+1` so `\` will just be shorthand for `function`.
function(x) {x + 1}
is already logically equivalent to, and from some perspectives arguably a syntax improvement on, \(x) x + 1
Giving everyone two ways of doing one thing just means the tutorials will be fragmented and beginners even more confused.
Tierney mentioned that the tidyverse found function(x) too verbose and uses formula syntax. Given how often the tidyverse uses the "y ~ x" formula notation, this might actually be picking up deficiencies in R's macro system rather than in the function notation, and the problem got misdiagnosed.
\(x) x+1
instead of function(x) x+1
not only saves a few keystrokes; it also produces shorter and clearer lines of code.
What I don’t understand is the reference to “formula syntax”. What is the issue and how does the new syntax solve it?
## Edit
Corrected a notation.
https://www.tidyverse.org/blog/2020/11/magrittr-2-0-is-here/
The pipe `x |> f` is indeed the same.
I used to love using lambdas in Python, along with map/reduce/filter, but for whatever reason the Python community has turned against them. Map and filter can now be nicely done with list comprehensions, although I still haven't found a decent one-line equivalent for reduce (other than importing functools).
Other comments have mentioned that functional idioms make code harder to read for devs unfamiliar with the concepts, but the pipe operator IMHO has no downsides (I am not even sure what it really has to do with functional programming, other than that it happens to be used in more functional languages).
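On the reduce point, here is a minimal sketch of the options in current Python. The data and variable names are made up for illustration; the walrus form needs Python 3.8+ and is shown only because it avoids the import, not because it is recommended.

```python
from functools import reduce
from itertools import accumulate
import operator

numbers = [1, 2, 3, 4]

# The usual form: this is the functools import the comment mentions.
product = reduce(operator.mul, numbers, 1)

# accumulate yields every running result; the last one is the reduction.
# It still needs an import, and the unpacking fails on an empty input.
*_, product_via_accumulate = accumulate(numbers, operator.mul)

# Import-free variant using an assignment expression inside a comprehension.
# It builds a throwaway list and is generally considered harder to read.
acc = 1
product_via_walrus = [acc := acc * n for n in numbers][-1]

print(product, product_via_accumulate, product_via_walrus)
```

All three compute the same product; only the first handles an empty list gracefully (it returns the initial value 1).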
What it has to do with functional programming is, first, that its right-hand operand is a function, and second, that it's a technique for unrolling the deeply nested function calls common in expression-oriented functional programming without resorting to intermediate assignments. Those assignments are natural in statement-oriented imperative programming, but less so in expression-oriented functional programming when the values are single-use and not semantically important on their own.
y(1) |> x would be equivalent to x(y(1)) in Python.
Python's lambda syntax is the equivalent of the first (the anonymous function), but there's no simple syntax providing the same function as the pipe.
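To make the comparison concrete, here is a rough sketch of how the pipe could be emulated in Python. `pipe` is a hypothetical helper written for this example, not a standard-library function, and `x` and `y` are toy stand-ins for the functions named above.

```python
from functools import reduce


def pipe(value, *funcs):
    """Hypothetical helper: pipe(v, f, g) computes g(f(v)), left to right."""
    return reduce(lambda acc, f: f(acc), funcs, value)


def y(n):
    # Toy stand-in for the inner function.
    return n + 1


def x(n):
    # Toy stand-in for the outer function.
    return n * 10


nested = x(y(1))       # ordinary nested call
piped = pipe(1, y, x)  # same computation, read left to right

# A lambda plays the role of R's \(d) ... when the piped value is not
# the first argument of the next call.
placed = pipe("a-b", lambda s: s.split("-"), lambda parts: "+".join(parts))

print(nested, piped, placed)
```

The helper reads in the same order as a pipe chain, but it is a function call rather than syntax, which is the gap the comment describes.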
This is also why pandas API is so bloated. A lot of pandas special-case functionality, like dropping duplicate rows, can be replicated in R by chaining together more flexible and orthogonal primitives.
Never really thought I'd be writing that in a public forum.
I think its functional orientation and the way that for loops are neglected make it seem completely insane to many people. But this is a far smaller fraction of people today than it was in 2000. Back then, Java and C++ didn't even have lambdas. Since then, procedural languages have gained a lot of the "mind-breaking" functional features of Lisp-derived languages. Python and JavaScript have become far more common. All the things that made the language of R "weird" and unusable to the Java/C++ crowd have been adopted elsewhere.
I do like pandas' concept of row indices, which I know Julia (and, I believe, R) lacks.
Julia doesn't have these problems, and I've found it so much nicer to use for data analysis. You can even call Python libs, if you really have to.
[0] https://pandas.pydata.org/pandas-docs/stable/reference/api/p...
I wonder if RStudio will change its `ctrl` + `shift` + m shortcut from the magrittr ( %>% ) style pipes to these new pipes ( |> )
The R Core team is very cautious about breaking anything, including universally acknowledged terrible defaults (I never thought `stringsAsFactors = TRUE` would ever go away). They know the majority of R programmers are not experts in programming. These users just want to write a script, debug it, and then use it for years with complete trust in the results.
Maybe Python just needs a bigger repository with more stringent rules for what will be allowed?
I don't actually know the relation between magrittr and the RStudio shortcut, but I've always assumed a shortcut for typing the pipe characters exists because RStudio employs Hadley Wickham, who in turn is really big on tidyverse and pipes.
Would %>% mean anything in R if you didn't import magrittr?
[1] https://github.com/tidyverse/magrittr/blob/8b3d510f2a333b224...
magrittr created %>%, which, when used as an infix operator in x %>% f(), calls the function on the right side with the value on the left side as its argument, i.e. f(x).
There are packages that provide tons more. For example: https://github.com/moodymudskipper/inops . And you can easily create your own:
`%sum%` <- function(x, y) x + y
1 %sum% 2
[1] 3
%>% does not mean anything in base R.
also just discovered that ctrl shift m in firefox does something weird, looks like mobile view or something..
Same principle in octave: https://github.com/tpapastylianou/chain-ops-octave
> f <- function(expr) eval(substitute( function(x) expr ))
> sapply(1:4, f({ a <- x^2 ; b <- x^3 ; a+b }))
[1] 2 12 36 80
but I'm not sure it would have worked well. The new syntax is of course also more flexible.
DT[..][..][..]
This is not a pipe -René Magritte
Much better to use the OCaml, F# syntax.
Think Julia uses it too.
Whereas what is proposed here is simply syntactic sugar for creating an anonymous function; from the little that is said in the announcement there is no reason to think this syntax would provide any guarantees that state changes due to lexical scoping won't affect the function's output.
Python has comparative advantages over R in production roles. R has comparative advantage in statistical libraries, visualization, and meta programming. Neither are exemplars for production deployment or meta programming (R is an exemplar for stats libraries however).
Yeah, you've probably never heard of it
ggplot is something that I don't think matplotlib is comparable to at all, though. I am so much faster at iterating on a visualization with R/ggplot than Python/matplotlib. Maybe it is my tooling, though. How about others who have used both? What are your experiences?
Once I was a lead on a new project and asked the intern to write some basic ETL code for data in some spreadsheets. I said she could write it in Python if she wanted, because "Python is good for ETL", right?
This intern was not dumb by any means, but she wrote code that took 5 minutes to do something that can be done in <1 second with the obvious dplyr approach.
Also, if your bank analysts pick up dplyr, they can use dbplyr to write SQL for them :)
Personally, I prefer R for my use case which is longitudinal analysis of experimental data.
I teach classes involving data analysis, some in Python and some in R (different topics). The amount of time the Python students spend fighting pandas---looking up errors, trying to parse the docs, trying out new arcane indexing strategies---is obscene. On the other hand, the R students progress rapidly. I'd move everything to R if I could, but Python is still better for NLP pipelines.
Python is wonderful but the cognitive load for switching in industry and academia without a clear cost benefit isn't worth it to most people I know in my shoes. I encourage new coders to learn Python but discounting R feels a bit asinine.
Hadley is still actively doing work for R, which has led to a graphing package that is substantially better than anything in Python (last I checked). I have no doubt that Python will steal it and implement it eventually (as they should), but R is still doing firsts that Python hasn't (note the native implementation of piping; they're late to the party on lambda functions, obviously).
Also I used to love Python... Until I got a full time job and learned why static typing exists.
So for example, I recently saw a paper with a quite complex estimator based on dynamic panels and network (or spatial) interdependence that could identify missing network ties. For that, an R package exists.
If you want to use it in Python, you'd have to replicate a whole estimation infrastructure yourself, starting by extending the basic models in statsmodels.
That example is quite typical in my opinion.
Like I said, I really like to code in Python and I don't like R all that much. But if someone says: "Why would you use R, Python is better", then we can confidently say the person does not know what R is actually used for.