The thing that keeps me coming back to Julia is the ability to pipe (or whatever you want to call it). It makes DataFrame operations a lot cleaner since I don't need to modify in place or create new DFs at intermediate steps in a process. Here's a video showing this sort of workflow in R:
Julia has a pipe syntax (|>). But I think the bigger part here is the APIs built around it more generally; people are working on porting tidy syntax to it (https://github.com/TidierOrg/Tidier.jl).
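For a rough idea of what that workflow looks like, here is a minimal sketch assuming DataFrames.jl plus the Chain.jl macro and some toy column names (Tidier.jl's macros give a similar feel):

```julia
using DataFrames, Chain, Statistics

df = DataFrame(region = ["a", "a", "b", "b"], sales = [10, 20, 30, 40])

# each step receives the previous result as its first argument;
# no in-place mutation and no named intermediate frames
summary = @chain df begin
    subset(:sales => ByRow(>(15)))
    groupby(:region)
    combine(:sales => mean => :mean_sales)
end
```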
I'm still not entirely convinced that pipes aren't an anti-pattern. Absolutely an improvement over nested function calls:
a(b(c(d))) vs d |> c |> b |> a
but I'm not convinced pipes are better than more verbose code that explains each step:
step1 = c(d)
step2 = b(step1)
result = a(step2)
I've written a lot of tidy R and do understand the specific use cases where it really doesn't make sense to use the more verbose format, but I generally find that when I'm building complex mathematical models, the verbose method is much easier to understand.
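To make the comparison concrete, here is a toy Julia version of the same three styles (the functions are stand-ins, not anything from the thread):

```julia
# stand-in steps
c(x) = x .+ 1
b(x) = x .* 2
a(x) = sum(x)

d = [1, 2, 3]

nested = a(b(c(d)))         # nested function calls
piped  = d |> c |> b |> a   # pipe syntax

# verbose form with named intermediates
step1  = c(d)
step2  = b(step1)
result = a(step2)
```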
I think having intermediate variables is a sort of 'littering', and it requires extra naming work that might not be necessary. Also, with pipes, you can take out any intermediate step just by commenting out a line or deleting it. You cannot do that with your method above without then going and rewriting many different arguments. I also like piping because you can iterate and build up a solution quickly, certainly quicker than by naming intermediate steps.
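As an illustration of the comment-out point, in a Chain.jl-style block (hypothetical steps, not anyone's real pipeline) you can disable one stage without touching the others; with named intermediates you would have to rewire the downstream variable names instead:

```julia
using DataFrames, Chain

df = DataFrame(x = 1:10)

result = @chain df begin
    subset(:x => ByRow(iseven))
    # transform(:x => ByRow(abs2) => :x2)   # step disabled by commenting it out
    sort(:x, rev = true)
end
```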
Naming intermediate steps requires some non-trivial effort. It can even distract from the main task of getting results.
In programming, code will be read multiple times and good names help future readers. But in data science the calculation will most likely not be reused, so the effort spent naming things can be a waste of time.
I suggest trying to strictly bind output to a symbol only if it will be used in multiple places.
So when I read code and I see some "intermediate" value bound, it tells me immediately "this thing will be used in several spots". That way bindings actually start to convey extra information.
Anyway, it's just something that's worked for me. In all other scenarios I use threading/pipelines (maybe that's Clojure-specific). If steps are confusing or complex, you can make a named local lambda or, in the extreme case, add comments.
If nothing else, you can just pipe the code and then write comments explaining what's left after each step. And the verbose code can be substantially slower, since piping can sometimes be used to perform all these operations lazily instead of materializing every intermediate result.
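On the laziness point, a small Julia sketch of the difference (toy numbers): the verbose version materializes each intermediate array, while a lazy pipeline fuses the steps and only allocates when the final reduction runs.

```julia
# verbose/eager: each step allocates a full intermediate array
evens = filter(iseven, 1:1_000_000)
sqs   = map(x -> x^2, evens)
total = sum(sqs)

# piped/lazy: Iterators.filter and Iterators.map build a lazy pipeline,
# and nothing is materialized until sum consumes it
total_lazy = 1:1_000_000 |>
    (xs -> Iterators.filter(iseven, xs)) |>
    (xs -> Iterators.map(x -> x^2, xs)) |>
    sum
```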
> The thing that keeps me coming back to Julia is the ability to pipe
> Provides link to R.
Is there an example of this in Julia? I use R now, and every time I give Julia a shot I go back to R because of the insane time to first plot (TTFP). I don't use anything remotely close to big data, and the 90-120 second compile times just to replot my small data (using AlgebraOfGraphics.jl in a Pluto notebook) kill me.
Did you try v1.9 or v1.10 yet? From others I'm hearing that the code caching brought Makie's load time from about 70 seconds down to 10 in v1.9, and then the loading-time improvements brought it to something like 5 (unreleased of course, though v1.10 should be branching in a few weeks). Makie load times were of course one of the cases highlighted in the v1.9 release notes: https://julialang.org/blog/2023/04/julia-1.9-highlights/. So while Makie won't be "instant" by v1.10 (<1 second), it was one of the worst offenders before and has gone from "wtf" to "bad but manageable".
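For what it's worth, the piped plotting workflow being asked about looks roughly like this with AlgebraOfGraphics.jl; this is only a sketch with made-up data, where layers compose with * and the final specification is piped into draw:

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames

df = DataFrame(x = randn(200), y = randn(200), group = rand(["a", "b"], 200))

# compose the plot specification, then pipe it into draw
(data(df) * mapping(:x, :y, color = :group) * visual(Scatter)) |> draw
```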
Of course you can. In fact I'm doing just that in two places in the example.
(Yes I know what you mean, but yes you know what I mean!)
In the end, chains are about readability and logical flow. Even if you don't like pre-wrapping steps in more meaningfully named functionals like the example does, and you accept the slight readability cost of the occasional in-spot lambda or partial, I feel this still ends up a lot more readable than the hacky "treat this symbol, unconventionally in this context, as a positional placeholder" syntax.
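In Julia terms, that mix of a pre-wrapped functional plus the occasional in-spot lambda might look like this (a sketch with hypothetical helper and column names):

```julia
using DataFrames, Statistics

df = DataFrame(x = 1:6, g = repeat(["a", "b"], 3))

# meaningfully named wrapper for the step you care about
keep_large(d) = subset(d, :x => ByRow(>(2)))

result = df |>
    keep_large |>
    (d -> groupby(d, :g)) |>                      # one-off in-spot lambda
    (gd -> combine(gd, :x => mean => :mean_x))
```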
The flip side is that in pandas, chaining is less uniform because it is based on methods.
In R you can pipe a data frame into any function from any package or one you just wrote, so you use %>% for any piping that happens. In pandas, you have special pandas methods that don't need the pipe, but to pipe with any other function, you have to write .pipe.
The comparison is not really between %>% and ., it's between "you just use %>% for everything" and "you use . for a bloated, somewhat arbitrary collection of special pandas methods, and .pipe for everything else".
The sad thing about the conventional object-oriented programming paradigm is how it put the really cool syntactic idea of piping/chaining in the straitjacket of classes and objects.
The ability to pipe shouldn't be tied to whether a function is a method of a class.
https://youtu.be/W3e8qMBypSE
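In a language where piping is plain function application, that constraint disappears. A minimal Julia sketch with a just-written, hypothetical helper:

```julia
using DataFrames

# an ad-hoc helper, not a method of any DataFrame "class"
double_rows(df) = vcat(df, df)

df = DataFrame(x = 1:3)

# |> pipes into the one-off function exactly as it does into library functions;
# there is no separate .pipe escape hatch
df |> double_rows |> nrow
```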