df = tibble(a = c(1, 2))
and you want to use a dplyr verb to modify it mutate(df, b = a + 1)
the `a` in the above expression refers to the column in `df`, but this means it's hard to reference a variable in the outer scope named `a`. Furthermore, if you have a string referring to the column name `"a"`, you can't simply write mutate(df, b = a_var + 1)
Contrast this with DataFramesMeta.jl, which is a dply-like library for Julia, written with macros. df = DataFrame(a = [1, 2])
@transform df :b = :a .+ 1
Because of the use of Symbols, there is no ambiguity about scopes. To work with a variable referring to column `a` you can write a_str = "a"
@transform df :b = $a_str .+ 1
I won't pretend this isn't more complicated or harder to learn. Some of the complexity is due to Julia's high performance limiting non-standard evaluation in subtle ways. But a core strength of Julia's macros is that it's easy to inspect these expressions and understand exactly what's going on, with `@macroexpand` as shown in the blog post.DataFramesMeta.jl repo: https://github.com/JuliaData/DataFramesMeta.jl
mutate(df, b = .env$a + 1)
And if you have a string (contained in a_var) which identifies a variable you can do mutate(df, b = .data[[a_var]] + 1)
You could argue these feel clumsy, but I wouldn’t say it’s “hard” to do either of these things with dplyr.However, both patterns are another special case how identifiers are resolved in the expression. Aren't `.env` and `.data` both valid variable and column names? So what happens if I have a column named `.data`?
Another example, which is the reason why we chose the `:column` style to refer to columns in `DataFramesMeta.jl` and `DataFrameMacros.jl`:
What happens if you have the expression `mutate(df, b = log(a))`. Both `log` and `a` are symbols, but `log` is not treated as a column. Maybe that's because it's used in a function-like fashion? Maybe because R looks at the value of `log` and `a` in their scope and sees that `log` is a function an `a` isn't?
In Julia DataFrames, it's totally valid to have a column that stores different functions. With the dplyr like syntax rules it would not be possible to express a function call with a function stored in a column, if the pattern really is that function syntax means a symbol is not looked up in the dataframe anymore.
In Julia DataFrameMacros.jl for example, if you had a column named `:func` you could do `@transform(df, :b = :func(:a))` and it would be clear that `:func` resolves to a column.
This particular example might seem like a niche problem, but it's just one of these tradeoffs that you have to make when overloading syntax with a different meaning. I personally like it if there's a small rule set which is then consistently applied. I'd argue that's not always the case with dplyr.
.data[[a_var]]
?But in general yeah, R plays pretty fast and loose with scopes, and lets you capture expressions as arguments and execute them in a different scope from the outside one