The thing that keeps me coming back to Julia is the ability to pipe (or whatever you want to call it). It makes DataFrame operations a lot cleaner since I don't need to modify in place or create new DFs at intermediate steps in a process. Here's a video showing this sort of workflow in R:
Julia has a pipe syntax (|>). But I think the bigger part here is the APIs built around it more generally; people are working on porting tidy syntax to it (https://github.com/TidierOrg/Tidier.jl).
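For a rough idea of what that workflow looks like, here is a minimal sketch assuming DataFrames.jl plus the Chain.jl macro and some toy column names (Tidier.jl's macros give a similar feel):

```julia
using DataFrames, Chain, Statistics

df = DataFrame(region = ["a", "a", "b", "b"], sales = [10, 20, 30, 40])

# each step receives the previous result as its first argument;
# no in-place mutation and no named intermediate frames
summary = @chain df begin
    subset(:sales => ByRow(>(15)))
    groupby(:region)
    combine(:sales => mean => :mean_sales)
end
```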
I'm still not entirely convinced that pipes aren't an anti-pattern. Absolutely an improvement over nested function calls:
a(b(c(d))) vs d |> c |> b |> a
but I'm not convinced pipes are better than more verbose code that explains each step:
step1 = c(d)
step2 = b(step1)
result = a(step2)
I've written a lot of tidy R and do understand the specific use cases where it really doesn't make sense to use the more verbose format, but I generally find that when I'm building complex mathematical models, the verbose method is much easier to understand.
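To make the comparison concrete, here is a toy Julia version of the same three styles (the functions are stand-ins, not anything from the thread):

```julia
# stand-in steps
c(x) = x .+ 1
b(x) = x .* 2
a(x) = sum(x)

d = [1, 2, 3]

nested = a(b(c(d)))         # nested function calls
piped  = d |> c |> b |> a   # pipe syntax

# verbose form with named intermediates
step1  = c(d)
step2  = b(step1)
result = a(step2)
```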
I think having intermediate variables is a sort of 'littering', and it requires extra naming work that might not be necessary. Also, with pipes, you can take out any intermediate step just by commenting out a line or deleting it. You cannot do that with your method above without then going and rewriting many different arguments. I also like piping because you can iterate and build up a solution quickly, certainly quicker than by naming intermediate steps.
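As an illustration of the comment-out point, in a Chain.jl-style block (hypothetical steps, not anyone's real pipeline) you can disable one stage without touching the others; with named intermediates you would have to rewire the downstream variable names instead:

```julia
using DataFrames, Chain

df = DataFrame(x = 1:10)

result = @chain df begin
    subset(:x => ByRow(iseven))
    # transform(:x => ByRow(abs2) => :x2)   # step disabled by commenting it out
    sort(:x, rev = true)
end
```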
Naming intermediate steps requires some non-trivial effort. It can even distract from the main task of getting results.
In programming, code will be read multiple times and good names help future readers. But in data science the calculation will most likely not be reused, so the effort spent naming things can be a waste of time.
I suggest trying to strictly bind output to a symbol only if it will be used in multiple places.
So when I read code and I see some "intermediate" value bound, it tells me immediately "this thing will be used in several spots". That way bindings actually start to convey extra information.
Anyway, it's just something that's worked for me. In all other scenarios I use threading/pipelines (maybe that's Clojure-specific). If steps are confusing or complex, you can make a named local lambda or, in the extreme case, add comments.
If nothing else, you can just pipe the code and then write comments explaining what's left after each step. And the verbose code can be substantially slower, since piping can sometimes be used to perform all these operations lazily instead of materializing every intermediate result.
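On the laziness point, a small Julia sketch of the difference (toy numbers): the verbose version materializes each intermediate array, while a lazy pipeline fuses the steps and only allocates when the final reduction runs.

```julia
# verbose/eager: each step allocates a full intermediate array
evens = filter(iseven, 1:1_000_000)
sqs   = map(x -> x^2, evens)
total = sum(sqs)

# piped/lazy: Iterators.filter and Iterators.map build a lazy pipeline,
# and nothing is materialized until sum consumes it
total_lazy = 1:1_000_000 |>
    (xs -> Iterators.filter(iseven, xs)) |>
    (xs -> Iterators.map(x -> x^2, xs)) |>
    sum
```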
> The thing that keeps me coming back to Julia is the ability to pipe
> Provides link to R.
Is there an example of this in Julia? I use R now, and every time I give Julia a shot I go back to R because of the insane time to first plot (TTFP). I don't use anything remotely close to big data, and the 90-120 second compile times just to replot my small data (using AlgebraOfGraphics.jl in a Pluto notebook) kill me.
Did you try v1.9 or v1.10 yet? From others I'm hearing that the code caching brought Makie's load time from about 70 seconds down to 10 in v1.9, and then the loading-time improvements brought it to something like 5 (unreleased of course, though v1.10 should be branching in a few weeks). Makie load times were of course one of the cases highlighted in the v1.9 release notes: https://julialang.org/blog/2023/04/julia-1.9-highlights/. So while Makie won't be "instant" by v1.10 (<1 second), it was one of the worst offenders before and has gone from "wtf" to "bad but manageable".
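For what it's worth, the piped plotting workflow being asked about looks roughly like this with AlgebraOfGraphics.jl; this is only a sketch with made-up data, where layers compose with * and the final specification is piped into draw:

```julia
using AlgebraOfGraphics, CairoMakie, DataFrames

df = DataFrame(x = randn(200), y = randn(200), group = rand(["a", "b"], 200))

# compose the plot specification, then pipe it into draw
(data(df) * mapping(:x, :y, color = :group) * visual(Scatter)) |> draw
```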
Of course you can. In fact I'm doing just that in two places in the example.
(Yes I know what you mean, but yes you know what I mean!)
In the end, chains are about readability and logical flow. Even if you don't like pre-wrapping steps in more meaningfully named functionals like the example does, and you accept the slight readability cost of the occasional in-spot lambda or partial, I feel this still ends up a lot more readable than the hacky "treat this symbol, unconventionally in this context, as a positional placeholder" syntax.
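In Julia terms, that mix of a pre-wrapped functional plus the occasional in-spot lambda might look like this (a sketch with hypothetical helper and column names):

```julia
using DataFrames, Statistics

df = DataFrame(x = 1:6, g = repeat(["a", "b"], 3))

# meaningfully named wrapper for the step you care about
keep_large(d) = subset(d, :x => ByRow(>(2)))

result = df |>
    keep_large |>
    (d -> groupby(d, :g)) |>                      # one-off in-spot lambda
    (gd -> combine(gd, :x => mean => :mean_x))
```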
The flip side is that in pandas, chaining is less uniform because it is based on methods.
In R you can pipe a data frame into any function from any package or one you just wrote, so you use %>% for any piping that happens. In pandas, you have special pandas methods that don't need the pipe, but to pipe with any other function, you have to write .pipe.
The comparison is not really between %>% and ., it's between "you just use %>% for everything" and "you use . for a bloated, somewhat arbitrary collection of special pandas methods, and .pipe for everything else".
The sad thing about the conventional object-oriented programming paradigm is how it put the really cool syntactic idea of piping/chaining in the straitjacket of classes and objects.
The ability to pipe shouldn't be tied to whether a function is a method of a class.
https://youtu.be/W3e8qMBypSE
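In a language where piping is plain function application, that constraint disappears. A minimal Julia sketch with a just-written, hypothetical helper:

```julia
using DataFrames

# an ad-hoc helper, not a method of any DataFrame "class"
double_rows(df) = vcat(df, df)

df = DataFrame(x = 1:3)

# |> pipes into the one-off function exactly as it does into library functions;
# there is no separate .pipe escape hatch
df |> double_rows |> nrow
```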