R adds native pipe and lambda syntax

(developer.r-project.org)

250 points · _fnhr · 5y ago · 144 comments

144 comments

wenc · 5y ago

I'm sure some of us who are out of the loop might be wondering: what about the magrittr pipe operator (%>%) that we all know and love?

Luke Tierney explains the move from %>% to a native pipe |> here [1]. The native pipe aims to be more efficient and to address issues with the magrittr pipe, such as complex stack traces.

Turns out the |> syntax is also used in Julia, Javascript and F#.

The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

[1] https://www.youtube.com/watch?v=X_eDHNVceCU&feature=youtu.be...

Cybiote · 5y ago

In the case anyone is curious about the origin of the (|>) pipeline symbol:

Although F# is its best-known early popularizer, it originated in Isabelle/ML in 1994, proposed by Tobias Nipkow.

Here is a blog post by Don Syme which embeds the email thread of its invention: https://web.archive.org/web/20190217164203/https://blogs.msd...

It's a fascinating look through time.

Of course, I should note this is the history for the pipe-forward operator for chaining (reverse) function application used in a programming language. The general concept is even earlier, as attested by the shell syntax for chaining anonymous pipes https://en.wikipedia.org/wiki/Pipeline_(Unix)#History.

Metanote: I was surprised that I was unable to find, through Google, an answer to who invented the (|>) pipe syntax. I could only find this Elixir Forum thread https://elixirforum.com/t/which-language-first-introduced-th... which got close but did not have the answer. I am therefore writing this here to hopefully surface it for future searches and "question answering AIs".

mckirk · 5y ago

Woah, I hadn't known that!

And given that I'm currently staring at Isabelle code most of the day for my Master's thesis at the chair of Prof. Nipkow, it's slightly surreal to learn about this here, heh.

cwyers · 5y ago

Thanks, the video helped explain some things, along with this post from the R-devel list:

https://stat.ethz.ch/pipermail/r-devel/2020-December/080173....

The reason for announcing the new lambda syntax at the same time seems to be to enable certain workflows that the magrittr pipe supports. The %>% operator, by default, pipes to the first argument of a function. If you want to pipe to a different argument, you can do:

    a %>% func(x, arg2 = .)

It seems like the native pipe doesn't support a placement argument, but you can use the new, more concise lambda operator:

    a |> \(d) func(x, arg2 = d)

A little more verbose, but it's not a very common use case, it's more general, and I'd happily trade a little more verbosity for the rest of the improvements. (That said, I haven't played around with the magrittr 2.0 improvements yet, so maybe the difference is going to end up being less than the presentation suggests.)
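
For anyone trying this on a released R, here is a hedged sketch assuming R 4.2.0 or later. The unparenthesized `a |> \(d) ...` form above reflects the R-devel proposal at the time; released R (4.1.0 onward) rejects a bare function expression on the right of `|>`, so the lambda has to be wrapped in parentheses and called. R 4.2.0 also added a `_` placeholder for a named argument, which covers the "pipe into a different argument" case directly. `func`, `x` and `a` are hypothetical stand-ins.

```r
# Hypothetical stand-ins: `func` takes the value we want to pipe
# as its second argument, `arg2`.
func <- function(x, arg2) paste(x, arg2)
x <- "fixed"
a <- "piped"

# R >= 4.1: wrap the anonymous function in parentheses and call it.
via_lambda <- a |> (\(d) func(x, arg2 = d))()

# R >= 4.2: the `_` placeholder must be passed to a named argument.
via_placeholder <- a |> func(x, arg2 = _)

print(via_lambda)       # [1] "fixed piped"
print(via_placeholder)  # [1] "fixed piped"
```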

disgruntledphd2 · 5y ago

The use of "." as an argument is actually probably one of my most common wtf's with pipes in general.

I tend to use it a lot if I'm just piping a vector to base functions (gsub has x as its third argument, grep as its second).

This syntax looks like it makes that a little harder, but the new error messages are going to make everything so much better that I'm totally fine with it.

grayclhn · 5y ago

It is particularly infuriating in R, because

    lm(y ~ ., data = my_dataframe)

already means "regress the variable y on all other columns in `my_dataframe`." For big, interactive regressions, it's really natural to write

    my_original_dataframe %>%
        do_a_bunch_of_transformations() %>%
        select(...) %>% # Pull out just the columns you want
        lm(y ~ ., data = .)

and god knows how that last line is going to be interpreted. So disambiguating through some mechanism is necessary anyway. A lambda is much better than some temporary variable that just holds the formula `y ~ .`.
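
A sketch of that kind of regression with the native pipe and the new lambda, assuming R 4.1 or later; `mtcars` and the chosen columns stand in for the transformed data frame. Inside the lambda the data frame has its own name, `d`, so the `.` in `mpg ~ .` keeps its usual formula meaning.

```r
fit <- mtcars |>
  # Keep only the columns of interest (stand-in for select(...))
  subset(select = c(mpg, cyl, disp, hp)) |>
  # The piped data frame is bound to `d`; `.` means "all other columns"
  (\(d) lm(mpg ~ ., data = d))()

coef(fit)  # intercept plus one slope each for cyl, disp and hp
```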

dash2 · 5y ago

I think magrittr 2.0 has addressed that problem also.

dugmartin · 5y ago

|> is also used in Elixir where it is implemented as a macro so it’s a little less flexible since it can’t be assigned as a value.

crooked-v · 5y ago

> Turns out the |> syntax is also used in Julia, Javascript and F#.

Note that for JS it's still just a proposal and has been stuck in an interminable bikeshedding phase for most of this year.

pkage · 5y ago

Admittedly, the `|>` javascript syntax is complicated by unclear async behavior.

I'm excited for it, though, and if the partial application syntax `func(a, ?)` gets ratified then we'll have a nice concise way of describing operations.

amelius · 5y ago

> The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

Personally I think it would be a good idea if you could e.g. configure your keyboard so that AltGr+L produces λ, which you can then use in place of \

But alas, the Haskell community has decided against this:

https://gitlab.haskell.org/ghc/ghc/-/issues/1102

sieste · 5y ago

> The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

The proposed lambda syntax for R is `\(x) x+1` so `\` will just be shorthand for `function`.
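
A minimal check of that equivalence, assuming R 4.1 or later (the release that shipped `\(x)`):

```r
# `\(x)` is parsed as `function(x)`, so both build the same kind of closure
with_backslash <- sapply(1:3, \(x) x + 1)
with_function  <- sapply(1:3, function(x) x + 1)

identical(with_backslash, with_function)  # TRUE
```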

roenxi · 5y ago

The anonymous function change is probably a (small) mistake.

    function(x) {x + 1}

is already logically equivalent to and from some perspectives an arguable syntax improvement on

    \(x) x + 1

Giving everyone two ways of doing one thing just means the tutorials will be fragmented and beginners even more confused.

Tierney mentioned that the tidyverse found function(x) too verbose and uses formula syntax instead. Given how often the tidyverse uses the "y ~ x" formula notation, this might actually be picking up on deficiencies in R's macro system rather than in the function notation, and the problem got misdiagnosed.

kgwgk · 5y ago

Having the option of writing

    \(x) x+1

instead of

    function(x) x+1

not only saves a few keystrokes.

It will also produce shorter, and clearer, lines of code.

What I don’t understand is the reference to “formula syntax”. What is the issue and how does the new syntax solve it?
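
For context, a hedged sketch of what "formula syntax" refers to: purrr (part of the tidyverse) lets you write `map(1:3, ~ .x + 1)` and converts the one-sided formula into a function behind the scenes. The helper below, `formula_to_fn()`, is hypothetical and only mimics that idea in base R; it is not purrr's implementation. The native `\(x)` form gets the same brevity without any formula-to-function conversion.

```r
# Hypothetical helper: turn a one-sided formula such as ~ .x + 1
# into function(.x) .x + 1, closing over the formula's environment.
formula_to_fn <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 2)
  fn <- function(.x) NULL
  body(fn) <- f[[2]]                 # the formula's right-hand side becomes the body
  environment(fn) <- environment(f)
  fn
}

via_formula <- lapply(1:3, formula_to_fn(~ .x + 1))
via_lambda  <- lapply(1:3, \(x) x + 1)

identical(via_formula, via_lambda)  # TRUE
```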

laretluval · 5y ago

If you’re worried about giving programmers too many options, R is already a nightmarish lost cause...

c06n · 5y ago

Thanks for clearing that up. I was wondering what I have been using all these years in lapply ...

submeta · 5y ago

Beautiful. - Elixir has this as well. Love it. - In Mathematica you'd have to write `data // f // g` to denote `g(f(data))`.

## Edit

Corrected a notation.

nonfamous · 5y ago

I don’t know Mathematica, but wouldn’t data // g // f make more sense for f(g(data)) ?

submeta · 5y ago

You’re right. That was a typo.

dnautics · 5y ago

if you use vscode this is an invaluable vscode snippet:

https://slickb.it/bits/70

z3t4 · 5y ago

It's only a proposal in JS. For long function chains I like to use intermediate variables, as they make the code easier to understand.

antipaul · 5y ago

Genuinely curious why not combine syntax? Do we need 2 different pipes in R? When to use which? Thanks for your thoughts!

sieste · 5y ago

wenc's comment (currently top) links to a video where Luke Tierney explains why the magrittr pipe is not optimal, which is why they are looking for a native solution.

cwyers · 5y ago

The R "userland" pipe magrittr has been altered to make it more compatible with the proposed base pipe, as well:

https://www.tidyverse.org/blog/2020/11/magrittr-2-0-is-here/

burlesona · 5y ago

That backslash syntax is pretty funky, but then again all of R is a little funky. Very nice addition to the language though!

joelthelion · 5y ago

I must say I don't really see the point, given that function() does the same thing, with only a few more characters.

smabie · 5y ago

that's a lot more characters. useless, unnecessary characters I might add.

bachmeier · 5y ago

Borrowed from Haskell. (Which of course does not make it unfunky.)

https://wiki.haskell.org/Anonymous_function

lottin · 5y ago

In R all functions are anonymous functions. Functions are created anonymously and then (usually, but not always) assigned to a variable using the assignment operator.
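
A small illustration of that point, assuming R 4.1 or later for the `\(x)` shorthand; `cube` is just an illustrative name.

```r
# Created and called on the spot, never bound to a name
immediate <- (\(x) x^3)(3)

# The same kind of value, bound to a name by ordinary assignment
cube <- \(x) x^3
named <- cube(3)

c(immediate, named)  # 27 27
```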

f6v · 5y ago

AFAIK the same as in Julia.

kkoncevicius · 5y ago

I think that symbol is used as an approximation of "lambda" character (λ) on a standard US keyboard.

desine · 5y ago

It is, and it's not the first language to use it as such. But for many programmers it always triggers the 'escape' alarm in the mind, and it will always cause slight discomfort seeing it used in the raw.

johannes_ne · 5y ago

Here is a tweet thread explaining this: https://twitter.com/rabaath/status/1335226691304775681?s=20

improbable22 · 5y ago

Julia's lambdas look like `x -> f(x,2)`, with no backslashes.

The pipe `x |> f` is indeed the same.

bluenose69 · 5y ago

These will both come in pretty handy. The first is just a formalization of something that's already available in packages, but the lambda syntax will clean up code a fair bit and make real-time analysis easier to type.

saeranv · 5y ago

Question by someone who is ignorant but interested in functional programming: what is the closest equivalent to these functions in Python? (Or correct me if I'm asking the wrong question).

I used to love using lambdas in Python, along with map/reduce/filter, but for whatever reason the Python community has turned against them. Map and filter can now be nicely done with list comprehensions, although I still haven't found a decent one-line equivalent for reduce (other than importing functools).

valzam · 5y ago

There is none, and after having used Elixir for a year, going back to Python for anything beyond trivial input/output parsing feels really cumbersome.

Other comments have mentioned that functional idioms make code harder to read for devs unfamiliar with the concepts, but the pipe operator IMHO has no downsides (I am not even sure what it really has to do with functional programming, other than that it happens to be used in more functional languages).

dragonwriter · 5y ago

> Other comments have mentioned that functional idioms make code harder to read for devs unfamiliar with the concepts, but the pipe operator IMHO has no downsides (I am not even sure what it really has to do with functional programming, other than that it happens to be used in more functional languages).

What it has to do with functional programming is, first, that its right-hand operand is a function, and, second, that it's a technique for unrolling the deeply nested function calls common in expression-oriented functional programming without resorting to intermediate assignments. Those assignments are natural in statement-oriented imperative programming, but less so in expression-oriented functional programming for single-use values that are not independently semantically important.

valzam · 5y ago

Thanks for the explanation, that makes sense!

RussianCow · 5y ago

I'm not sure what specific "functions" you're talking about, but Python generally encourages a procedural style of programming as opposed to functional. The rationale is that functional code can be really difficult to read if you aren't already familiar with the idioms and terminology, whereas it's pretty easy to mentally parse and understand a `for` loop. So in that sense, list comprehensions are about as far as Python goes in that direction; there is no syntactic equivalent to the pipe operator, and no way to write reduce or similar operations as succinctly as you can in functional languages.

nightski · 5y ago

Having read plenty of python numerical code, I'm not sure "easy to parse and understand" is what exactly comes to mind.

empthought · 5y ago

Most of the problem comes from Pandas which is of course R-inspired.

vizzier · 5y ago

Briefly, here is what the pipe generally does, for languages without the syntax: if you have functions x and y that each take one argument, and |> is the piping syntax, then

y(1) |> x would be equivalent to x(y(1)) in Python.
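
The same equivalence in R's new syntax, as a small sketch assuming R 4.1 or later. One difference from the description above: R's `|>` needs an explicit call on its right-hand side (`sqrt()`, not `sqrt`).

```r
nested <- sum(sqrt(c(1, 4, 9)))
piped  <- c(1, 4, 9) |> sqrt() |> sum()

identical(nested, piped)  # TRUE; both are 6
```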

dragonwriter · 5y ago

> Question by someone who is ignorant but interested in functional programming: what is the closest equivalent to these functions in Python?

The equivalent to the function of the first is python’s lambda syntax, there's no simple syntax providing the same function as the pipe.

NashHallucinate · 5y ago

A pipe merely "pipes" the output of one function as an input to another. For example, | in bash. In Python this can be done the trivial way (by composing) or by using decorators.

dragonwriter · 5y ago

Yes, it can be done the trivial way in most languages. For deep nesting, that's ugly and awkward, which is why some languages have piping/composition operators [or threading macros] (sometimes more than one). Python has no close equivalent of a piping operator or threading macro (decorators don't seem helpful at all here.)

syntonym2 · 5y ago

You might be interested in coconut [0] which extends python with functional concepts and compiles to python.

[0] http://coconut-lang.org/

identity0 · 5y ago

I wish more languages gave us a "|>" operator. Too many languages settle with dot notation, which confuses encapsulation/method calling with syntactical convenience.

civilized · 5y ago

This, along with inferior metaprogramming affordances, is the biggest reason pandas will never be as productive an analyst tool as R's dplyr. In R you can pipe anything into anything. In pandas you're stuck with the methods pandas gives you.

This is also why pandas API is so bloated. A lot of pandas special-case functionality, like dropping duplicate rows, can be replicated in R by chaining together more flexible and orthogonal primitives.

kaitai · 5y ago

As someone who is now bouncing back & forth between Python and R on a weekly basis, I've been surprised (after making fun of R sometimes) how much I miss the piping when I leave R for Python. Pandas seems so inflexible by comparison, so nitpicky for little gain. I've been surprised again and again how much dplyr supports near-effortless fluency and productivity.

Never really thought I'd be writing that in a public forum.

epistasis · 5y ago

Personally, I think R as a language is absolutely beautiful. The libraries have tons of warts, and many different styles. And there are perhaps too many object orientation systems built on it. But that you could even build multiple object oriented systems points to how powerful the language is.

I think its functional orientation and the way that for loops are neglected make it seem completely insane to many people. But this is a far smaller fraction of people today than it was in 2000. Back then, Java and C++ didn't even have lambdas. Since then, procedural languages have gained a lot of the "mind-breaking" functional features of Lisp-derived languages. Python and JavaScript have become far more common. All the things that made the language of R "weird" and unusable to the Java/C++ crowd have been adopted elsewhere.

smabie · 5y ago

I'm not an R user, but you should try Julia for data analysis. It seems as flexible as R (maybe more), while also having blazing performance.

I do like Pandas concept of row indices, which I know Julia (and I believe, R) lack.

smabie · 5y ago

I think the obvious limitations of Python is a big reason, but probably not the main reason why Pandas isn't orthogonal. The reason why Pandas is such an ungodly mess is because it must be, in order to be even halfway efficient. When you do try to compose things, or even have the audacity to use a python lambda or an if statement or whatever, you suddenly suffer a 100x slowdown in performance.

Julia doesn't have these problems, and I've found it so much nicer to use for data analysis. You can even call Python libs, if you really have to.

harryposner · 5y ago

Pandas does have the .pipe() method [0], which allows you to put an arbitrary callable in a method chain, but it is a bit more cumbersome than in R.

[0] https://pandas.pydata.org/pandas-docs/stable/reference/api/p...

smabie · 5y ago

Except you can't actually use it, because it will kill the performance of your program.

wodenokoto · 5y ago

Kind of odd they didn't decide to go with the magrittr syntax, which is in common use and heavily promoted in dplyr / tidyverse.

I wonder if RStudio will change its `ctrl` + `shift` + m shortcut from the magrittr ( %>% ) style pipes to these new pipes ( |> )

vharuck · 5y ago

I'm sure they wanted to not replace magrittr pipes. R is introspective, and some reckless people (like myself) will mess with functions' guts in a few scripts. Replacing the `%>%` function with a syntax symbol will break those scripts. Even with scripts that don't metaphorically shove their hands down the garbage disposal, programmers might've relied on certain behaviors of the magrittr pipe.

The R Core team is very cautious about breaking anything, including universally acknowledged terrible defaults (I never thought `stringsAsFactors = TRUE` would ever go away). They know the majority of R programmers are not experts in programming. These users just want to write a script, debug it, and then use it for years with complete trust in the results.

ImaCake · 5y ago

It seems to have been worth the caution. R has a great reputation for stability. The contrast between my experiences with R and Python data science tools is stark. Pandas syntax has changed wildly since I started learning it, but R and the tidyverse haven’t really changed at all. Admittedly, pandas was in rapid and early development at the time.

disgruntledphd2 · 5y ago

I think the R reputation for stability is entirely driven by CRAN. If your package doesn't build on the latest version of R, it is marked unavailable. This means that people can build on R-current, in a way that simply isn't possible with the state of python packaging.

Maybe Python just needs a bigger repository with more stringent rules for what will be allowed?

johnmyleswhite · 5y ago

Isn't the reason that magrittr was able to use that syntax that it already existed, so your proposal is a breaking change?

wodenokoto · 5y ago

I thought %% was some sort of macro expansion and that this is how magrittr creates pipes. But browsing through the magrittr GitHub repo, it looks like they just define %>% and the other pipes as functions [1].

I don't actually know the relation between magrittr and the RStudio shortcut, but I've always assumed a shortcut for typing the pipe characters exists because RStudio employs Hadley Wickham, who in turn is really big on the tidyverse and pipes.

Would %>% mean anything in R if you didn't import magrittr?

[1] https://github.com/tidyverse/magrittr/blob/8b3d510f2a333b224...

kkoncevicius · 5y ago

All %fun% constructs are just simple functions that can be created by the user and can be used as infix operators. Base R has a few of those: %in%, %o%, %*%, %x%, %%, %/%.

magrittr created %>% which, when used in infix: x %>% f() calls the function on the right side with the argument on the left side f(x).

There are packages that provide many more. For example: https://github.com/moodymudskipper/inops . And you can easily create your own:

    `%sum%` <- function(x, y) x + y

    1 %sum% 2
    [1] 3
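
Because `%...%` operators are ordinary functions, a toy pipe can be defined the same way. The `%then%` operator below is hypothetical and much simpler than magrittr's `%>%`: it takes a function object on the right rather than a call, and it has no `.` placeholder.

```r
# Toy forward pipe: apply the right-hand function to the left-hand value.
# %...% operators are left-associative, so the chain runs left to right.
`%then%` <- function(lhs, rhs) rhs(lhs)

result <- c(1, 4, 9) %then% sqrt %then% sum
result  # 6
```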

kilbuz · 5y ago

To define a new binary operator in R (operators are just functions), its name must be wrapped in % characters.

%>% does not mean anything in base R.

_2d30 · 5y ago

You can't do that if you wanted it to be a symbol of its own. The use of wrapping "%" is native functionality so making `%>%` into a symbol would break the consistency of that.

grayclhn · 5y ago

I'm pretty happy that they didn't, because silently replacing one operation with a similar operation that inevitably has different bugs and different ways of handling edge cases would be pretty frustrating. Letting both live side-by-side as people transition would be my preference.

SubiculumCode · 5y ago

argh. and here I've been typing %>%. thanks!!!

also just discovered that ctrl shift m in firefox does something weird, looks like mobile view or something..

civilized · 5y ago

This sounds promising, but how do we type that pipe easily if we're going to be using it all the time? I actually like %>% because it's easy to reach the keys and hammer it out. Agreed on the ugly stack traces.

ellisv · 5y ago

Most likely via RStudio or VS Code hotkeys. It might replace Ctrl/Cmd+Shift+M

SubiculumCode · 5y ago

Your hands are obviously quite different than mine, because |> is so much easier to reach than %>%

kgwgk · 5y ago

It may be the keyboard layout which is different.

tpoacher · 5y ago

For people asking how one could possibly ever manage to chain operations in python without a pipe operator: https://github.com/tpapastylianou/chain-ops-python

Same principle in octave: https://github.com/tpapastylianou/chain-ops-octave

kgwgk · 5y ago

The lambda thing may be useful. Sometimes I was tempted to do something like this

    > f <- function(expr) eval(substitute(function(x) expr))

    > sapply(1:4, f({ a <- x^2 ; b <- x^3 ; a+b }))
    [1]  2 12 36 80

but I'm not sure it would have worked well. The new syntax is of course also more flexible.
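
For comparison, the same computation written with the new lambda syntax, as a sketch assuming R 4.1 or later; no helper function is needed.

```r
# Multi-statement body in braces, defined inline as an anonymous function
res <- sapply(1:4, \(x) { a <- x^2; b <- x^3; a + b })
res  # 2 12 36 80
```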

iaw · 5y ago

I've literally written something similar 4 times in the last month, I hate having to load the temp function to be sure the next line runs. This is such a useful addition.

clircle · 5y ago

I think I'm going to appreciate the native pipes, it will likely improve the readability of my data.table chains.

dm319 · 5y ago

Do you use the Magrittr pipes? Works well with data.table for me.

clircle · 5y ago

I've seen them used with data.table, but I don't use them myself. Reason being that I don't want to load a lib just to make my chains look a bit better. I usually have chains short enough that it's okay doing

    DT[..][..][..]

arthurcolle · 5y ago

This is awesome. Wonder if they picked up on that specific syntactic operator from Elixir or from another world.

stewbrew · 5y ago

Well, afaik it isn't really a pipe but syntactic sugar. A pipe streams data from one output stream to an input stream. This rewrites the code as if the input were passed as an argument.

_2d30 · 5y ago

That's exactly right. This is useful in the context of a large portion of R's most popular data wrangling packages known as the "Tidyverse". These packages used an equivalent pipe function that was non-native and had some perf and understandability issues.

epistasis · 5y ago

This is not a pipe -René Magritte

r-w · 5y ago

Julia crew, where you at?

ellisv · 5y ago

:wave:

But I’m also part of the R crew so :shrug:

f6v · 5y ago

Better late than never, but dplyr solved the pipe problem a long time ago.

kkoncevicius · 5y ago

There is a big difference between what is allowed from within a package and what is possible in native implementation. As an example - currently this pipe operator seems to be implemented at the parser level. The parser simply takes the pipe expression and translates it into standard nested function call f(g(x)). Which means - there will be almost no cost in speed (%>% was notoriously slow). In addition the user will get the usual error stack in case something within this pipe fails.
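
One way to see that parser-level rewrite, as a sketch assuming R 4.1 or later: quoting a pipe expression already yields the nested call, and no function named `|>` exists at run time (unlike `%>%`, which is an ordinary function once magrittr is loaded). `x`, `f`, `g` and `y` are placeholder symbols and are never evaluated.

```r
# The parser rewrites the pipe before anything is evaluated
piped_call  <- quote(x |> g() |> f(y))
nested_call <- quote(f(g(x), y))

identical(piped_call, nested_call)  # TRUE
deparse(piped_call)                 # "f(g(x), y)"

# There is no `|>` function object to call, pass around, or redefine
exists("|>")                        # FALSE
```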

melling · 5y ago

I’m new to R, but I really prefer |> to %>%.

Much better to use the OCaml, F# syntax.

Think Julia uses it too.

data_ders · 5y ago

Agreed but |> instead of %>% is certainly easier to type! I’ll be interested in trying it out at least

tpoacher · 5y ago

nitpicking, but this is not a lambda. it's an anonymous function.

clircle · 5y ago

Care to elaborate?

tpoacher · 5y ago

A lambda, at least as understood in a functional programming context, is pure.

Whereas what is proposed here is simply syntactic sugar for creating an anonymous function; from the little that is said in the announcement there is no reason to think this syntax would provide any guarantees that state changes due to lexical scoping won't affect the function's output.
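
A small illustration of that point, assuming R 4.1 or later: `\(x)` builds an ordinary closure, so its output can change when a variable it captured from the enclosing environment changes. `offset` and `add_offset` are illustrative names.

```r
offset <- 1
add_offset <- \(x) x + offset  # `offset` is looked up when called, not when defined

before <- add_offset(1)  # 2
offset <- 10
after  <- add_offset(1)  # 11: same input, different output
```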

Huntsecker · 5y ago

Personally, I'm surprised R is still in active development, given that the main reason people used R (at least when I was using it) was statistical analysis. Python with its libraries (a lot of which I believe were ported from R) just does it nicer, and faster.

peatmoss · 5y ago

R vs. Python flamewars always strike me as a Budweiser vs. Miller kind of argument. Neither is really a “craft beer” of programming languages. Neither is super remarkable as a programming language. Both made a bunch of pragmatic tradeoffs to appeal to large audiences that share similar values; both are “average joe” beers.

Python has comparative advantages over R in production roles. R has comparative advantage in statistical libraries, visualization, and meta programming. Neither are exemplars for production deployment or meta programming (R is an exemplar for stats libraries however).

canjobear · 5y ago

Tidyverse absolutely has a hipster craft beer feel to it. I think it's great, but it's true.

ct0 · 5y ago

There is nothing more hip than library(tidyverse) that I've found in python.

civilized · 5y ago

I'm really into this package that lets you manipulate tabular data using dozens of different systems with the exact same code

Yeah, you've probably never heard of it

civilized · 5y ago

Nah, it's not nicer. dplyr is way better than pandas. But there is no end to the supply of Python fanbois who only know Python and assume that whatever's in Python just has to be better

CameronNemo · 5y ago

I don't mind pandas so much, although dplyr is quite nice IMO (feels like natural language and declarative/SQL like, whereas pandas ends up with lots of procedural idioms).

ggplot is something that I don't think matplotlib is comparable to at all, though. I am so much faster at iterating on a visualization with R/ggplot than Python/matplotlib. Maybe it is my tooling, though. How about others who have used both? What are your experiences?

dm319 · 5y ago

No, same here. I tried to recreate some COVID rate graphs in Python. The ggplot code did faceting and fitted a LOESS to the data. Nothing groundbreaking, but it really hit the limits of what seaborn was able to do, and I wasn't able to tinker with it much further. It got to the point where, to make it look good, I needed to calculate all the curves manually.

t_serpico · 5y ago

ggplot >> matplotlib and dplyr >> pandas. its not even close imo.

hated · 5y ago

Pandas is used in some top 10 banks for analytics. Its performance is abysmal at the scale used there. Nobody wants to invest resources in training analysts to write high performance code so here we are. I have never viewed SQL more highly after seeing the mess that analysts make when writing imperative code.

civilized · 5y ago

No surprise there - pandas encourages ugly, inefficient code with its bloated, unintuitive API.

Once I was a lead on a new project and asked the intern to write some basic ETL code for data in some spreadsheets. I said she could write it in Python if she wanted, because "Python is good for ETL", right?

This intern was not dumb by any means, but she wrote code that took 5 minutes to do something that can be done in <1 second with the obvious dplyr approach.

Also, if your bank analysts pick up dplyr, they can use dbplyr to write SQL for them :)

smabie · 5y ago

Pandas/Python is amazingly prevalent at trading firms. And every day, we bitch about the performance, we bitch about the stupid API, we bitch about the GIL, the lack of expressiveness. The list goes on and on. But for some braindead reason, we never switch to Julia. It's masochistic.

fithisux · 5y ago

I use Python at work, but R is the uber weapon.

SubiculumCode · 5y ago

Yeah. Pick one, learn it, and you'll be fine, no matter whether you choose Python or R.

Personally, I prefer R for my use case which is longitudinal analysis of experimental data.

canjobear · 5y ago

About 8 years ago I agreed with this point, but with the development of tidyverse, R has become far superior to Python for anything involving dataframes.

I teach classes involving data analysis, some in Python and some in R (different topics). The amount of time the Python students spend fighting pandas---looking up errors, trying to parse the docs, trying out new arcane indexing strategies---is obscene. On the other hand, the R students progress rapidly. I'd move everything to R if I could, but Python is still better for NLP pipelines.

iaw · 5y ago

I know R because that's what we used at my first company. I would love to switch to Python/Pandas but I'm comfortable with R and it does everything I need it to with one exception over ten years of heavy use.

Python is wonderful but the cognitive load for switching in industry and academia without a clear cost benefit isn't worth it to most people I know in my shoes. I encourage new coders to learn Python but discounting R feels a bit asinine.

Hadley is still actively doing work for R, which has led to a graphing package that is substantially better than anything in Python (last I checked). I have no doubt that Python will steal it and implement it eventually (as they should), but R is still doing firsts that Python hasn't (note the native implementation of piping; they're late to the party on lambda functions, obviously).

Icathian · 5y ago

I made the switch years ago and there is lots that python does better. I really, really wish for a perfect port of dplyr and ggplot2. Those are what I truly miss, everything else I'm pretty happy with.

rjmorris · 5y ago

plotnine isn't a perfect port of ggplot2, but it's pretty close. https://plotnine.readthedocs.io/en/stable/

civilized · 5y ago

It will never happen. Python doesn't trust programmers with the power to make packages like dplyr and ggplot

civilized · 5y ago

R already has a better lambda than Python, simply by virtue of having first class functions. This is just a bit shorter notation for something that already existed.

zwaps · 5y ago

I use Python whenever I can, but R has loads and loads of statistical libraries that Python doesn’t. It is not even close.

Emphere · 5y ago

Yeah, basically this. I assume HN has a higher number of people who work in ML jobs in fields like finance etc. If you're working in any sort of social/public health research, then most new methods seem to be implemented as R packages. I'm thinking of things like new methods for propensity score, sequential trial designs etc. Also seems to be the preferred language on the Stats Stack Exchange posts.

free2OSS · 5y ago

What kind of stat problems?

Also I used to love Python... Until I got a full time job and learned why static typing exists.

zwaps · 5y ago

Any sort of statistical or econometric estimator is typically published as an R package.

So for example, I recently saw a paper with a quite complex estimator based on dynamic panels and network (or spatial) interdependence that could identify missing network ties. For that, an R package exists.

If you want to use it in Python, you'd have to replicate a whole estimation infrastructure yourself, starting by extending the basic models in statsmodels.

That example is quite typical in my opinion.

Like I said, I really like to code in Python and I don't like R all that much. But if someone says: "Why would you use R, Python is better", then we can confidently say the person does not know what R is actually used for.

j / k navigate · click thread line to collapse

cwyers5y ago

Thanks, the video helped explain some things, along with this post from the R-devel list:

https://stat.ethz.ch/pipermail/r-devel/2020-December/080173....

With magrittr you can put the piped value into a later argument using the `.` placeholder:

    a %>% func(x, arg2 = .)

It seems like the native pipe doesn't support a placeholder argument, but you can use the new, more concise lambda operator:

    a |> \(d) func(x, arg2 = d)
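A note for anyone trying the line above on a released R (my understanding, worth checking against the R NEWS): since R 4.1.0 the right-hand side of `|>` must be a call, so a bare `\(d) ...` is rejected and has to be wrapped in parentheses and called; R 4.2.0 then added an `_` placeholder that can be used as a named argument. A minimal sketch, assuming R >= 4.2.0, where `func`, `x`, and `a` are hypothetical stand-ins for the names in the comment:

```r
# Hypothetical stand-ins for func, x and a from the comment above
func <- function(x, arg2) paste(x, arg2)
x <- "first"
a <- "piped"

# magrittr equivalent (requires the magrittr package): a %>% func(x, arg2 = .)

# Native pipe + lambda: wrap the anonymous function in parentheses and call it
r1 <- a |> (\(d) func(x, arg2 = d))()

# R >= 4.2.0: `_` marks where the left-hand side goes; it must be a named argument
r2 <- a |> func(x, arg2 = _)

print(r1)  # [1] "first piped"
```

Both forms end up calling `func(x, arg2 = a)`; the placeholder version simply avoids creating a throwaway function.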

disgruntledphd25y ago

The use of "." as an argument is actually probably one of my most common wtf's with pipes in general.

I tend to use it a lot if I'm just piping a vector to base functions (gsub has x as its third argument, grep as its second).

This syntax looks like it makes that a little harder, but the new error messages are going to make everything so much better that I'm totally fine with it.
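One way to keep piping vectors into gsub()/grep() with the native pipe, which has no dot placeholder: name the arguments that come before `x`, so R's usual argument matching sends the piped value to `x`. A small sketch with made-up input, assuming R >= 4.1.0:

```r
words <- c("a-b", "c-d")

# gsub(pattern, replacement, x): pattern and replacement are named,
# so the piped vector is matched to the first remaining formal, x
res <- words |> gsub(pattern = "-", replacement = "_")
print(res)   # [1] "a_b" "c_d"

# grep(pattern, x): naming pattern works the same way
hits <- words |> grep(pattern = "c")
print(hits)  # [1] 2
```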

grayclhn5y ago

It is particularly infuriating in R, because

    lm(y ~ ., data = my_dataframe)

already means "regress the variable y on all other columns in `my_dataframe`." For big, interactive regressions, it's really natural to write

    my_original_dataframe %>%
        do_a_bunch_of_tranformations() %>%
        select(...) %>% # Pull out just the columns you want
        lm(y ~ ., data = .)
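Since the native pipe has no `.` placeholder, the clash with the formula `.` goes away. A rough native-pipe version of the pipeline above, assuming R >= 4.1.0 and using the built-in `mtcars` data, with `subset()` standing in for the hypothetical transformation steps. Naming `formula` lets the piped data frame fill `lm()`'s next formal argument, `data`:

```r
fit <- mtcars |>
  subset(select = c(mpg, wt, hp)) |>  # pull out just the columns we want
  lm(formula = mpg ~ .)               # becomes lm(<data>, formula = mpg ~ .)

# `.` in the formula expands to every remaining column: wt and hp
print(coef(fit))
```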

dash25y ago

I think magrittr 2.0 has addressed that problem also.

dugmartin5y ago

|> is also used in Elixir where it is implemented as a macro so it’s a little less flexible since it can’t be assigned as a value.

crooked-v5y ago

> Turns out the |> syntax is also used in Julia, Javascript and F#.

Note that for JS it's still just a proposal and has been stuck in an indeterminable bikeshedding phase for most of this year.

pkage5y ago

Admittedly, the `|>` javascript syntax is complicated by unclear async behavior.

I'm excited for it, though, and if the partial application syntax `func(a, ?)` gets ratified then we'll have a nice concise way of describing operations.

amelius5y ago

> The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

Personally I think it would be a good idea if you could e.g. configure your keyboard so that AltGr+L produces λ, which you can then use in place of \

But alas, the Haskell community has decided against this:

https://gitlab.haskell.org/ghc/ghc/-/issues/1102

sieste5y ago

> The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

The proposed lambda syntax for R is `\(x) x+1` so `\` will just be shorthand for `function`.
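A quick check of that claim (assuming R >= 4.1.0): the backslash form builds an ordinary closure, so it compares as identical to the spelled-out `function` version and can be used anywhere a function is expected.

```r
add_one <- \(x) x + 1

# identical() ignores source references by default, so this compares
# the formals, body and environment of the two closures
print(identical(add_one, function(x) x + 1))  # [1] TRUE

# Typical use: inline in an apply-family call
print(sapply(1:3, \(x) x + 1))  # [1] 2 3 4
```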

roenxi5y ago

The anonymous function change is probably a (small) mistake.

    function(x) {x + 1}

is already logically equivalent to, and from some perspectives arguably a syntactic improvement on,

    \(x) x + 1

Giving everyone two ways of doing one thing just means the tutorials will be fragmented and beginners even more confused.

kgwgk5y ago

Having the option of writing

    \(x) x+1

instead of

    function(x) x+1

not only saves a few keystrokes; it also produces shorter, and clearer, lines of code.

What I don’t understand is the reference to “formula syntax”. What is the issue and how does the new syntax solve it?

laretluval5y ago

If you’re worried about giving programmers too many options, R is already a nightmarish lost cause...

c06n5y ago

Thanks for clearing that up. I was wondering what I have been using all these years in lapply ...

submeta5y ago

Beautiful. - Elixir has this as well. Love it. - In Mathematica you'd have to write `data // f // g` to denote `g(f(data))`.

## Edit

Corrected a notation.

nonfamous5y ago

I don’t know Mathematica, but wouldn’t data // g // f make more sense for f(g(data)) ?

submeta5y ago

You're right. That was a typo.

dnautics5y ago

if you use vscode this is an invaluable vscode snippet:

https://slickb.it/bits/70

z3t45y ago

It's only a proposal in JS. For long function chains I like to use intermediate variables, as they make the code easier to understand.

antipaul5y ago

Genuinely curious why not combine syntax? Do we need 2 different pipes in R? When to use which? Thanks for your thoughts!

sieste5y ago

wenc's comment (currently top) links to a video where Luke Tierney explains why the magrittr pipe is not optimal, so they are looking for a native solution.

cwyers5y ago

The R "userland" pipe magrittr has been altered to make it more compatible with the proposed base pipe, as well:

https://www.tidyverse.org/blog/2020/11/magrittr-2-0-is-here/

burlesona5y ago

That backslash syntax is pretty funky, but then again all of R is a little funky. Very nice addition to the language though!

joelthelion5y ago

I must say I don't really see the point, given that function() does the same thing, with only a few more characters.

smabie5y ago

that's a lot more characters. useless, unnecessary characters I might add.

bachmeier5y ago

Borrowed from Haskell. (Which of course does not make it unfunky.)

https://wiki.haskell.org/Anonymous_function

lottin5y ago

In R all functions are anonymous functions. Functions are created anonymously and then (usually, but not always) assigned to a variable using the assignment operator.
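To illustrate the point: `function(...)` is just an expression that evaluates to a function value. Binding it to a name is a separate step, and the value can be called directly or stored like any other object. The names here are made up for the example:

```r
# Called immediately, never bound to a name
print((function(x) x^2)(3))  # [1] 9

# The usual "named" function is the same kind of value bound with <-
square <- function(x) x^2

# Functions are first-class values: store them in a list and map over it
fs <- list(square, sqrt)
print(sapply(fs, function(f) f(16)))  # values 256 and 4
```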

f6v5y ago

AFAIK the same as in Julia.

kkoncevicius5y ago

I think that symbol is used as an approximation of "lambda" character (λ) on a standard US keyboard.

desine5y ago

johannes_ne5y ago

Here is a tweet thread explaining this: https://twitter.com/rabaath/status/1335226691304775681?s=20

improbable225y ago

Julia's lambdas look like `x -> f(x,2)`, with no backslashes.

The pipe `x |> f` is indeed the same.

bluenose695y ago

saeranv5y ago

Question by someone who is ignorant but interested in functional programming: what is the closest equivalent to these functions in Python? (Or correct me if I'm asking the wrong question).

valzam5y ago

There is none, and after having used Elixir for a year, going back to Python for any non-trivial input/output parsing feels really cumbersome now.

dragonwriter5y ago

valzam5y ago

Thanks for the explanation, that makes sense!

RussianCow5y ago

nightski5y ago

Having read plenty of Python numerical code, I'm not sure "easy to parse and understand" is exactly what comes to mind.

empthought5y ago

Most of the problem comes from Pandas which is of course R-inspired.

vizzier5y ago

And briefly, as to what the pipe generally does where the syntax isn't available: if you have functions x and y that each take one argument, and |> is the piping syntax, then

y(1) |> x would be equivalent to x(y(1)) in Python.
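The same relationship in R's own native pipe (R >= 4.1.0), with hypothetical one-argument functions `x` and `y`. Note that R requires the right-hand side to be a call, so it is written `|> x()` rather than `|> x`:

```r
# Hypothetical one-argument functions
y <- function(v) v * 10
x <- function(v) v + 1

piped  <- y(1) |> x()   # rewritten by the parser to x(y(1))
nested <- x(y(1))

print(identical(piped, nested))  # [1] TRUE (both are 11)
```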

dragonwriter5y ago

> Question by someone who is ignorant but interested in functional programming: what is the closest equivalent to these functions in Python?

The equivalent of the first is Python's lambda syntax; there's no simple syntax providing the same function as the pipe.

NashHallucinate5y ago

A pipe merely "pipes" the output of one function as an input to another. For example, | in bash. In Python this can be done the trivial way (by composing) or by using decorators.

dragonwriter5y ago

syntonym25y ago

You might be interested in coconut [0] which extends python with functional concepts and compiles to python.

[0] http://coconut-lang.org/

identity05y ago

I wish more languages gave us a "|>" operator. Too many languages settle with dot notation, which confuses encapsulation/method calling with syntactical convenience.

civilized5y ago

kaitai5y ago

Never really thought I'd be writing that in a public forum.

epistasis5y ago

smabie5y ago

I'm not an R user, but you should try Julia for data analysis. It seems as flexible as R (maybe more), while also having blazing performance.

I do like Pandas concept of row indices, which I know Julia (and I believe, R) lack.

smabie5y ago

Julia doesn't have these problems, and I've found it so much nicer to use for data analysis. You can even call Python libs, if you really have to.

harryposner5y ago

Pandas does have the .pipe() method [0], which allows you to put an arbitrary callable in a method chain, but it is a bit more cumbersome than in R.

[0] https://pandas.pydata.org/pandas-docs/stable/reference/api/p...

smabie5y ago

Except you can't actually use it, because it will kill the performance of your program.

wodenokoto5y ago

Kind of odd they didn't decide to go with the magrittr syntax, which is in common use and heavily promoted in dplyr / tidyverse.

I wonder if RStudio will change its `ctrl` + `shift` + m shortcut from the magrittr ( %>% ) style pipes to these new pipes ( |> )

vharuck5y ago

ImaCake5y ago

disgruntledphd25y ago

Maybe Python just needs a bigger repository with more stringent rules for what will be allowed?

johnmyleswhite5y ago

Isn't the reason that magrittr was able to use that syntax that it already existed, so your proposal is a breaking change?

wodenokoto5y ago

Would %>% mean anything in R if you didn't import magrittr?

kkoncevicius5y ago

All %fun% constructs are just simple functions that can be created by the user and can be used as infix operators. Base R has a few of those: %in%, %o%, %*%, %x%, %%, %/%.

magrittr created %>%, which, when used infix as x %>% f(), calls the function on the right side with the left side as its argument: f(x).

There are packages that provide tons more. For example: https://github.com/moodymudskipper/inops . And you can easily create your own:

    `%sum%` <- function(x, y) x + y

    1 %sum% 2
    [1] 3
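To show how little is needed, here is a toy pipe built from a user-defined infix operator. `%then%` is a made-up name, and unlike magrittr's `%>%` it takes a function value on the right rather than a call:

```r
# Call the right-hand function with the left-hand value as its only argument
`%then%` <- function(lhs, f) f(lhs)

# %op% operators group left to right: (c(1, 4, 9) %then% sqrt) %then% sum
total <- c(1, 4, 9) %then% sqrt %then% sum
print(total)  # [1] 6
```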

kilbuz5y ago

To define a new binary operator in R (which are just functions), it must be between % characters.

%>% does not mean anything in base R.

_2d305y ago

You can't do that if you wanted it to be a symbol of its own. The use of wrapping "%" is native functionality so making `%>%` into a symbol would break the consistency of that.

grayclhn5y ago

SubiculumCode5y ago

argh. and here I've been typing %>%. thanks!!!

also just discovered that ctrl shift m in firefox does something weird, looks like mobile view or something..

civilized5y ago

ellisv5y ago

Most likely via RStudio or VS Code hotkeys. It might replace Ctrl/Cmd+Shift+M

SubiculumCode5y ago

Your hands are obviously quite different than mine, because |> is so much easier to reach than %>%

kgwgk5y ago

It may be the keyboard layout which is different.

tpoacher5y ago

For people asking how one could possibly ever manage to chain operations in python without a pipe operator: https://github.com/tpapastylianou/chain-ops-python

Same principle in octave: https://github.com/tpapastylianou/chain-ops-octave
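I haven't checked how the linked repositories do it, but one common way to chain without a pipe operator is to fold a list of functions over a starting value. A sketch of that idea in base R, where `chain()` is a hypothetical helper built on `Reduce()`:

```r
# Apply each function in turn, left to right, starting from `value`
chain <- function(value, ...) Reduce(function(acc, f) f(acc), list(...), value)

# Equivalent to sum(head(rev(1:10))): 10 + 9 + 8 + 7 + 6 + 5
print(chain(1:10, rev, head, sum))  # [1] 45
```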

kgwgk5y ago

The lambda thing may be useful. Sometimes I was tempted to do something like this

  > f <- function(expr) eval(substitute( function(x) expr ))
  
  > sapply(1:4, f({ a <- x^2 ; b <- x^3 ; a+b }))
  [1]  2 12 36 80

but I'm not sure it would have worked well. The new syntax is of course also more flexible.
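With the new syntax (R >= 4.1.0), the `f()` helper above isn't needed; a braced multi-statement body goes straight into `\(x)`:

```r
res <- sapply(1:4, \(x) { a <- x^2; b <- x^3; a + b })
print(res)  # [1]  2 12 36 80
```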

iaw5y ago

I've literally written something similar 4 times in the last month, I hate having to load the temp function to be sure the next line runs. This is such a useful addition.

clircle5y ago

I think I'm going to appreciate the native pipes, it will likely improve the readability of my data.table chains.

dm3195y ago

Do you use the Magrittr pipes? Works well with data.table for me.

clircle5y ago

DT[..][..][..]

arthurcolle5y ago

This is awesome. Wonder if they picked up on that specific syntactic operator from Elixir or from another world.

stewbrew5y ago

Well, afaik it isn't really a pipe but syntactic sugar. A pipe streams data from one output stream to an input stream. This rewrites the code as if the input were passed as an argument.
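That rewrite is visible directly: `quote()` returns the parsed expression without evaluating it, and for `|>` the parsed form is already the nested call (as far as I know this holds for R >= 4.1.0). `f`, `g`, `x` and `y` don't need to exist, since nothing is evaluated:

```r
e1 <- quote(x |> f(y))
e2 <- quote(x |> f(y) |> g())

print(e1)  # f(x, y)
print(e2)  # g(f(x, y))
```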

_2d305y ago

epistasis5y ago

This is not a pipe -René Magritte

r-w5y ago

Julia crew, where you at?

ellisv5y ago

:wave:

But I’m also part of the R crew so :shrug:

f6v5y ago

Better late than never, but dplyr solved the pipe problem a long time ago.

kkoncevicius5y ago

melling5y ago

i’m new to R but I really prefer the |> to %>%

Much better to use the OCaml, F# syntax.

Think Julia uses it too.

data_ders5y ago

Agreed but |> instead of %>% is certainly easier to type! I’ll be interested in trying it out at least

tpoacher5y ago

nitpicking, but this is not a lambda. it's an anonymous function.

clircle5y ago

Care to elaborate?

tpoacher5y ago

A lambda, at least as understood in a functional programming context, is pure.

Huntsecker5y ago

peatmoss5y ago

canjobear5y ago

Tidyverse absolutely has a hipster craft beer feel to it. I think it's great, but it's true.

ct05y ago

There is nothing more hip than library(tidyverse) that I've found in python.

civilized5y ago

I'm really into this package that lets you manipulate tabular data using dozens of different systems with the exact same code

Yeah, you've probably never heard of it

civilized5y ago

Nah, it's not nicer. dplyr is way better than pandas. But there is no end to the supply of Python fanbois who only know Python and assume that whatever's in Python just has to be better

CameronNemo5y ago

I don't mind pandas so much, although dplyr is quite nice IMO (feels like natural language and declarative/SQL like, whereas pandas ends up with lots of procedural idioms).

dm3195y ago

t_serpico5y ago

ggplot >> matplotlib and dplyr >> pandas. its not even close imo.

hated5y ago

civilized5y ago

No surprise there - pandas encourages ugly, inefficient code with its bloated, unintuitive API.

This intern was not dumb by any means, but she wrote code that took 5 minutes to do something that can be done in <1 second with the obvious dplyr approach.

Also, if your bank analysts pick up dplyr, they can use dbplyr to write SQL for them :)

smabie5y ago

fithisux5y ago

I use Python at work, but R is the uber weapon.

SubiculumCode5y ago

yeah. Pick one, learn it, and you'll be fine, no matter whether you choose Python or R.

Personally, I prefer R for my use case which is longitudinal analysis of experimental data.

canjobear5y ago

About 8 years ago I agreed with this point, but with the development of tidyverse, R has become far superior to Python for anything involving dataframes.

iaw5y ago

Icathian5y ago

rjmorris5y ago

plotnine isn't a perfect port of ggplot2, but it's pretty close. https://plotnine.readthedocs.io/en/stable/

civilized5y ago

It will never happen. Python doesn't trust programmers with the power to make packages like dplyr and ggplot

civilized5y ago

R already has a better lambda than Python, simply by virtue of having first class functions. This is just a bit shorter notation for something that already existed.

zwaps5y ago

I use Python whenever I can, but R has loads and loads of statistical libraries that Python doesn't. It is not even close.
