
> “Many of the large tech companies are investing in alternative languages such as Swift and Julia in order to build the next iterations of these libraries because of the hard limitations of CPython.”

I would be interested in evidence of this. I work at a big tech company in ML and scientific computing, and have close peers and friends in similar leadership roles of this tech in other big companies, FAANG, finance, biotech, large cloud providers, etc.

I am only hearing about the adoption of CPython skyrocketing and continued investment in tools like numba, llvmlite and Cython. At none of these companies have I heard any interest in Julia, Swift or Rust development for these use cases, and I have heard many, many arguments for why Python wins the trade-off discussion against those options.

In fact, two places I used to work for (one a heavy Haskell shop and one a heavy Scala shop) are moving away from those languages and to Python, for all kinds of reasons related to fundamental limitations of static typing and enforced (rather than provisioned) safety.

I mean, Haskell & Scala are great languages and so are Julia & Swift. But even though in some esoteric corners people have started to disprefer Python, it’s super unrealistic to suggest there’s a large movement away from it. Rather, there’s a large move toward it.

It reminds me of the Alex Martelli quote from Python Interviews,

> “Eventually we bought that little start-up and we found out how 20 developers ran circles around our hundreds of great developers. The solution was very simple! Those 20 guys were using Python. We were using C++. So that was YouTube and still is.”



I work in HFT and now use Julia for all my research, and a couple of my colleagues now do too. Personally I'd rather retire and farm goats than go back to having to write Python professionally: there's just so much that can go wrong that doesn't happen in a typed language, so much unnecessary stuff you have to keep in your head when coding. It also seems incredibly counterproductive to use a language that's 100x slower than necessary just because it's the only language some people know; the difference in research speed between having to wait one second for a result and ten minutes is massive.

Of course, HFT is a somewhat different use case than pure ML, as we work with data in a format that's rarely seen elsewhere (high-frequency orderbook tick data). Python's probably less painful for working with data for which somebody else has already written a C/C++ library with a nice API, as then you don't need to write your own C/C++ library and interface with it. My choice is either: write Python, and the research will take 100x longer; write Python and C++, and the development will take 2-4 times longer; or write Julia, and get similar performance to C++ with even faster development time than Python.


Is it just performance that puts you off Python? If so, did you try writing a native extension to accelerate it?

Where I work, we also have analysis work which involves sequentially reading gigabytes of binary data. We came up with a workflow where a tool written in Java runs a daily job to parse the raw data and write a simplified, fixed-width binary version of it, then we have a small Python C extension which can read that format. We can push a bit of filtering down into the C as well.

This has worked out pretty well! We get to write all the tricky parsing in a fast, typesafe language, we get to write the ever-changing analytics scripts in Python, and we only had to write a small amount of C, which is mostly I/O and VM API glue.

The Java and C parts were both written by an intern in a month or two. He's a sharp guy, admittedly, but if an intern can do it, I bet an HFT developer could too.


>Is it just performance that puts you off Python? If so, did you try writing a native extension to accelerate it?

This is what I did originally, but it was way slower to have to write and maintain C++ and a Python interface for it than to just write Julia. Particularly because some of the business/trading logic basically has to be in the native layer (you can't backtest an HFT algo without running it on HFT data, and that volume of high-frequency tick data is too slow to process in pure Python).


One of the close colleagues I was alluding to also works in HFT and they do in fact use Cython libraries built in house for extremely low latency applications, order book processing, etc.

Their main alternatives are coding in C++ directly or a wholesale switch to Rust, but they prefer Cython. I know they evaluated Julia and found it entirely intractable to use for their production systems.


I'm curious why they found Julia intractable. In my experience it's much quicker to write than Rust, C++ or Cython. It's also much more expressive than Cython.

Is it because they tried embedding it in C++? That can be painful, because it needs its own thread, and can only have one per process, but it's certainly doable.


I’m not sure what you mean by saying Julia is more expressive than Cython, given that Cython is as expressive as C.

In this shop’s particular situation, it’s mostly the switching costs to Julia that cause it to lose the debate. The firm has lots of systems software, data fetchers, offline analytics jobs, research code, etc. With Python & Cython, they easily write all of it in one ecosystem, build shared libraries that span all these use cases, rely on shared testing frameworks, integration pipelines, packaging, virtual envs, etc.

If Julia offered some kind of crazy game changer advantage that required a huge amount of effort to get in Python/Cython, they might consider breaking off some subsystem that has to have new environment management, new tooling, etc., and is not sharable across as many use cases.

But there is no such case. They might get some sort of “5% more generic” or “5% benefit from seamless typing instead of a little rough around the edges typing in Cython”, and these differences would never justify the huge costs of switching or the missing third party packages that are heavily relied on.

I always like to remind people that in any professional setting, ~95% of the software you write is for reporting and testing, and 5% at best is for the actual application. Out of that 5%, another 95% never has serious resource bottlenecks, and writing carefully optimized code for the 5% of the 5% can be done in nearly any language. Choose your ecosystem based on what best solves your problems in the other 99.75% of cases.

This is especially true in HFT and quant finance, which is why so many of those firms use Python for everything except the 0.25% of the code where performance is insanely critical; for that slice they just use whatever plugs most easily into Python, usually C++ or Cython.


>I’m not sure what you mean by saying Julia is more expressive than Cython, given that Cython is as expressive as C.

I mean expressive in the sense of how much you can get done per unit code/time. Perhaps a better way of phrasing it: for most problems X that I encounter in my work, I can write code in Julia to solve X faster than I could write C/C++ to solve X, and also faster than I could write Cython to solve X. Excellent type inference is a big part of this, along with macros, multiple dispatch, and libraries designed with performance in mind (e.g. https://juliacollections.github.io/DataStructures.jl/latest/...).
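To make that concrete, here's a rough sketch of what I mean (the names here are made up, not from any real codebase): one generic definition, plus a one-line method added via multiple dispatch, and the compiler infers concrete types at each call site and emits specialized code.

    # Hypothetical example: a generic function plus a dispatch-based method;
    # no wrapper classes or manual type annotations needed.
    midprice(bid, ask) = (bid + ask) / 2        # works for any numeric types

    struct Quote
        bid::Float64
        ask::Float64
    end
    midprice(q::Quote) = midprice(q.bid, q.ask)  # one-line method for a new type

    midprice(100.0, 101.0)           # 100.5
    midprice(Quote(100.0, 101.0))    # same result, dispatched on the struct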

>In this shop’s particular situation, it’s mostly the switching costs to Julia that cause it to lose the debate. The firm has lots of systems software, data fetchers, offline analytics jobs, research code, etc. With Python & Cython, they easily write all of it in one ecosystem, build shared libraries that span all these use cases, rely on shared testing frameworks, integration pipelines, packaging, virtual envs, etc.

>I always like to remind people that in any professional setting, ~95% of the software you write is for reporting and testing, and 5% at best is for the actual application. Out of that 5%, another 95% never has serious resource bottlenecks, and writing carefully optimized code for the 5% of the 5% can be done in nearly any language. Choose your ecosystem based on what best solves your problems in the other 99.75% of cases.

That makes sense then. In my firm at least (and in my team at least) the case is different: we're mostly full stack, so each member will be responsible for the whole pipeline from research->model_development->backtesting->production_algo_development->algo_testing/initial_trading. In this case 95% of my time is spent writing research code, running research, and writing production code, so if I can double the speed at which my research code runs or double the speed at which I can write it, that translates into a massive increase in my productivity/output.


Did you find something better than tensorflow+python by any chance? I'm desperately looking for something that is mature, stays in loop and does not require me to touch python.


Depends what you're trying to do, but Flux.jl is pretty nice: https://github.com/FluxML/Flux.jl . Failing that, the Julia Python FFI is very good, so it's possible to use PyTorch almost seamlessly in Julia (I previously used Tensorflow 1.x, and it was such a painful experience I'm not brave enough to touch 2.0).
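A minimal sketch of what Flux code looks like (the exact constructor syntax depends on which Flux version you're on):

    using Flux    # assumes Flux.jl is installed

    # A tiny multi-layer perceptron; every layer is plain Julia code,
    # so you can step into it or swap in your own function.
    model = Chain(Dense(10, 32, relu), Dense(32, 1))

    x = rand(Float32, 10, 64)    # 64 samples of 10 features (features x batch)
    y = model(x)                 # 1x64 output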


Thanks. 2nd day playing with it and I guess now I'm hooked on Julia.


Complexity matters. Python originally had success because it was easy to use. It was one of the first languages I learned after C++ to just get stuff done.

People are not leaving Python for Haskell, Rust, Swift and Scala because those languages are too complex to deal with.

However what we see time and again is that languages that are easy to use and which offer real advantages gain adoption. Look at the rising popularity of Go as a good example.

This is why I believe Julia has a real chance against Python. It checks all the right boxes: it is easy to learn, has simple tools, and is highly productive while giving great performance.

Sure, Python is not suddenly going to lose its crown. But the fact is that Julia has far more growth potential. Packages are built faster as there is no need for C/C++. Packages are combined far more easily due to multiple dispatch and the lack of C/C++ dependencies. This is a hard one to explain in a short text, but Julia has a unique ability in how packages can easily combine. Hence with a couple of Julia packages you can squeeze out the same functionality as a dozen Python packages.
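A small illustration of what I mean by packages combining (a sketch, assuming the Measurements.jl package is installed): a number type from one package flows straight through code that knows nothing about it, because everything dispatches on the element type rather than going through a C layer.

    using Measurements    # provides numbers with propagated uncertainty

    x = 5.0 ± 0.1
    y = 2.0 ± 0.05

    # Plain Base functions, yet the uncertainty propagates through automatically.
    sqrt(x^2 + y^2)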


I am a researcher at an algo trading shop. We are moving our core libraries from rust+python to Julia.

It really is amazingly powerful and has great interop with Python (e.g. seamless zero-copy array sharing).

In the past one had to muck about with Cython/C/the NumPy API to speed things up; now one can just write the functionality in Julia and make it available in both ecosystems.
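For example (a rough sketch, assuming PyCall.jl and NumPy are available; the exact conversion calls may differ in your setup), a NumPy array can be wrapped on the Julia side without copying the buffer:

    using PyCall

    np = pyimport("numpy")

    # Ask for a zero-copy PyArray wrapper instead of a converting copy.
    x = pycall(np.arange, PyArray, 0.0, 10.0)

    x[1] = 42.0    # mutates the underlying NumPy buffer in place
    sum(x)         # plain Julia code running over Python-owned memory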

Python will likely remain the premier language to do research in, but more and more work can move to Julia which is much better at anything not immediately vectorizable.


Great news for me! I actually do Julia training. My colleagues do Python training. But we don't seem to be quite at the critical mass yet to get a lot of training requests for Julia. But I am an optimist. There seems to be a shift going on. 2020 could be a breakout year for Julia.

A lot of my job, the way I see it at the moment, is to learn about people's experience switching to Julia so I can explain to others why it would benefit them.

How did you guys actually end up switching to Julia? Was it a careful analysis ordered from the top, or was it more like some Julia evangelists who bugged people until they tried it and realized it was actually quite useful?

I used to work in the oil industry, and tried to convince people to try Julia. It would have been a huge advantage over our Python interface in terms of performance but it was a very hard sell. People are conservative. They are very reluctant to try new things. So I am always curious how other people pull off making a change happen.


Julia was brought in largely organically. We often need performant non-vectorizable code. Numba is actually a real pain to work with if the code is not explicitly numeric only. It works well for optimizing a hot loop here and there, but anything more complex becomes an exercise in frustration.

So we started experimenting with Julia and it was literally the best of both worlds. Compile times can still be a struggle sometimes, but we are happy to eat a few seconds of startup cost when doing research, as most of the time we'd just have an IPython/Julia REPL going and keep the session open for hours/days.

Most people (me included) weren't prepared to pay the mental tax of writing Rust on potentially throwaway experimental code, so as we worked we realized that the core Rust libs can be easily replaced with much simpler Julia code without any loss in performance.

While the PyO3 library is awesome, it was actually quite difficult to reconcile the safety of Rust with the dynamism of Python, and the friction can really be felt in any code dealing with the interop. This is mainly on the Rust side, as it ended up being littered with casts and type checks when communicating with Python. With Rust being compiled AOT, a lot of the power of generics goes out the window in the interop, purely because it is impossible to know at compile time the type of objects coming from Python. This has negative implications for performance, because the Rust code can't be neatly specialized either, but has to resort to dynamic dispatch and trait objects. Julia wins here due to the JIT compiler that auto-specializes code at run time when the types are known.
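As a small sketch of what I mean by auto-specialization: a single generic Julia definition gets compiled into a separate native method for each concrete argument type it's actually called with, which an AOT-compiled extension can't do for types it only sees at run time.

    # One generic definition; Julia compiles a specialized native version
    # for each concrete element type it encounters.
    total(xs) = sum(x -> x * x, xs)

    total(rand(Float64, 1_000))    # triggers a Float64-specialized compile
    total(rand(Float32, 1_000))    # triggers a separate Float32-specialized compile

    # @code_typed total(rand(Float64, 10))   # optional: inspect the specialization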


I really appreciate folks like the parent here who take the time to highlight specific details like these that arose from their experience with different languages and paradigms.

These are the details that matter.


> This is why I believe Julia has a real chance against Python. It checks all the right boxes: it is easy to learn, has simple tools, and is highly productive while giving great performance.

The only thing that makes me anxious about Julia is Dan Luu's complaint that testing is not a big part of the culture. Generally, for code folks are going to rely on to be correct, I want some mechanism for assuring correctness. Types, tests, proofs, whatever, but I want to see them.

Admittedly this is true of a lot of scientific computing, so I may be misplacing my nervousness.


In 2016, yes. But this has been very much addressed. At this point, not only is Julia well-tested, but it's so well-tested that it has to carry around its own patched LLVM and its own math library in order for its full tests to pass, because, for example, it requires things like sin to be correct to 1 ulp, which isn't actually true of a lot of implementations! Then when you go to packages, there's a huge culture of testing. Compare OrdinaryDiffEq's (just the ODE part of DifferentialEquations.jl) test suite:

https://travis-ci.org/JuliaDiffEq/OrdinaryDiffEq.jl

which does convergence tests on every algorithm, along with regression tests on every interpolation and analytical-value unit tests on each feature, etc. Meanwhile, SciPy's integrate module has some of its own algorithms, but only a few regression tests, and most tests just check that things run: https://github.com/scipy/scipy/tree/master/scipy/integrate/t... . Same with other libraries like torchdiffeq: https://github.com/rtqichen/torchdiffeq/tree/master/tests . So Julia's testing culture is now strict enough that things that are commonly accepted in other programming languages would be rejected PRs due to lack of testing! And for good reason: these tests catch what would be "performance regressions" (i.e. regressions where the algorithm doesn't hit its theoretical properties) all the time!
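To give a flavour of what a convergence test looks like (a toy sketch using the Test stdlib, not OrdinaryDiffEq's actual suite): check that the observed order of accuracy matches the theoretical one as the step size shrinks.

    using Test

    # Forward Euler for x' = -x, x(0) = 1, integrated to t = 1;
    # the global error should shrink linearly with the step size.
    function euler(dt)
        x = 1.0
        for _ in 1:round(Int, 1 / dt)
            x += dt * (-x)
        end
        return x
    end

    @testset "Euler converges at order 1" begin
        errs = [abs(euler(dt) - exp(-1.0)) for dt in (1e-2, 1e-3, 1e-4)]
        orders = log10.(errs[1:end-1] ./ errs[2:end])   # observed order per decade of dt
        @test all(o -> isapprox(o, 1.0; atol = 0.2), orders)
    end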


Thank you, this is very heartening.


My counterpoint to this would be that everything you need for proper testing exists already in Julia.

Also, I would argue that Julia is a much easier language to write correct code in than Python. Python suffers from the general OOP problem where it becomes hard to isolate problems due to all the mutation (imperative coding).

While Julia is not a purely functional language, you program in a far more functional style in Julia because it supports it much better.

My own experience working with both languages suggests that I am able to write more pure functions in Julia than in Python. I am able to crank out small isolated functions which I quickly test in my REPL environment as I go.

It is a different way of working. My Python friends write more formalized tests than me as they code. Julia is perhaps more in the Lisp tradition: you are continuously writing and testing as you go in the REPL. Some of those tests make their way into my test suite, but not all.

Because we are generally not writing server code in Julia, testing is less important. If the program crashes, so what? What is important is correctness of numerical output. Yes, you need tests for that.

But I would speculate that the tests needed to verify the correctness of numerical calculations are fewer than the tests needed to ensure uptime for a server service.


Python is a fad right now; I don't think this will last. And I appreciate Python, don't mistake my comment. It just happened to have a simple 'UI', and some people made nice libs in it. The language is too fragile to reach further, IMO. I'd bet on Julia because it has stronger roots, even though few care about it.


"nice libs" are worth a lot; consider Fortran. Although quite possibly nice libs that are often just bindings to stuff in other languages have less staying power.


Seeing how it’s been the standard in scientific computing and ML for decades (plural) at this point, I don’t see how it can be a fad.

It obviously might get replaced by new languages and ecosystems, but that’s a huge difference than it being a fad.


Julia is getting a lot of adoption outside of the traditional CS crowd, i.e. data scientists who might otherwise be using R, MATLAB, or Mathematica. I'm not really sure who is adopting Swift yet, beyond iOS devs.


Yep, it is the current state of things. However, it would be better to have all this huge ecosystem written in a single performant language. People trade perceived simplicity for things that are more beneficial long term.


>However, it would be better to have all this huge ecosystem written in a single performant language.

It's not the language, it's the linkage model. Python glues C components together and that's why experienced professionals don't blink when going all in on Python: the performance can be easily cranked up to C level.


Within the scope of a linked C library, this is true. But when your Python code is shuttling chunks of data between libraries it's less so. Pure Julia has the advantage that it can perform global optimisations that Python-wrapping-C can't.
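A concrete example of the kind of cross-call optimisation I mean (a sketch): broadcast fusion compiles a whole expression into a single loop with one output allocation, where the NumPy equivalent would make several passes over the data and allocate a temporary for each wrapped C routine.

    x = rand(10^6)

    # The @. macro fuses all of these elementwise operations into one loop;
    # no intermediate arrays are materialised between the "library calls".
    y = @. sin(x)^2 + 0.5 * cos(x) - x^2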


I disagree. In practice the things I can do between Cython, llvmlite and numba are very low-effort, wide-reach optimizations. Practically, I have never found a case where a feature of Julia would have made this much easier for me. As a result, the trade-off with Julia continues to be 100% focused on the switching costs, infrastructure maturity, drop-in equivalents of well-worn Python libraries, etc.


Doesn't that "just drop down to C" get complicated when you need to pass callback functions though? Users will want to write them in the original language for convenience and to use closures safely and easily. Wouldn't that kill your performance if the callback function is just Python again but called many times (e.g. an integrand for a numeric integration function, which integrand depends on captured runtime params). This is no problem in Julia, what are people doing cases like this in Python land?


Passing callbacks is so rare that this is definitely not a reason to switch languages and stacks and tool chains.


I would argue against that, because then it could have been Lua in the first place: the fastest C FFI, and a faster, simpler scripting language with natural 1-indexed arrays like in Fortran and MATLAB. It got some traction from the ML community thanks to Torch, but Python already had better batteries at that time.


The problem is that you need C/C++ expertise to build packages, which creates a barrier between users of packages and makers of packages.

This is a big selling point for Julia. One is seeing much faster package development in Julia than in Python because you use the same language on both ends. Package users are often able to contribute to the packages they use.

Also, the Python linkage has major limitations. If you create anything that requires the user to pass a function defined in Python to the C/C++ layer, you take a performance hit. Think about solvers, for example.

For machine learning that is a big deal. In Julia you can write your own scientific models and do automatic differentiation and training on them. If you do the same in Python you get a huge performance penalty. C/C++ cannot really help you.
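As a sketch of what I mean (assuming the Zygote.jl AD package; the model here is made up): the model is ordinary Julia code, and the AD works on it directly, with no need to express it in a C/C++ framework's ops.

    using Zygote

    # A hand-written scientific "model": plain Julia functions, no framework ops.
    predict(w, x) = sin.(w[1] .* x) .+ w[2] .* x .^ 2
    loss(w, x, y) = sum(abs2, predict(w, x) .- y)

    x = collect(0.0:0.1:1.0)
    y = sin.(1.5 .* x) .+ 0.3 .* x .^ 2

    # Gradient of the loss with respect to the parameter vector.
    g, = Zygote.gradient(w -> loss(w, x, y), [1.0, 0.0])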


> The problem is that you need C/C++ expertise to build packages, which creates a barrier between users of packages and makers of packages.

The vast majority of Python packages are pure Python and require no C/C++ skills at all.


Sure, but the large packages which are the big selling point for scientific computing are for the most part C/C++: NumPy, Pandas, TensorFlow, etc.

In contrast the most popular scientific packages in Julia are almost all pure Julia.


Yes, but these large packages represent a very small portion of Python packages. Also, they still include very large amounts of Python code.

So a typical user who knows only Python can still contribute even to these packages, and certainly to the other 99.9% of pure Python packages.

The barriers only exist if you want to contribute to the C++ core of NumPy or TensorFlow, and these parts are so complex and performance critical that even in Julia I'm sure only experts would be touching them.


I am not convinced. It does not reflect the frustrations I have heard from Python developers who ended up switching to Julia.

I don't have personal experience interfacing C/C++ code with Python, but I spent a lot of time with Lua and C/C++, and Lua is considered far easier to integrate than Python.

Yet looking back at those experiences, while I thought it was super easy and cool back then, it was fraught with all sorts of impedance-mismatch problems.

Shuffling large chunks of data back and forth between the interfaces of two languages is especially hard. You cannot take an arbitrary Python data structure and push it into, say, NumPy.

I've seen in practice how many problems that can cause at one of my previous companies. We made a C++ application with a Python interface. We got major performance issues because we shuffled data back and forth as NumPy arrays.

It also limited what we could do with the data. Rather than being rich objects with associated functionality, you are stuck with blobs of NumPy data, which are really just dumb data with generic operations attached to them.

With Julia I can create, e.g., an RGB buffer, where every element is an RGB value, and define functions that work specifically on this type of data. In Python you would be stuck pretending your NumPy array of floats or ints represents an RGB buffer.
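Something like this (a sketch; a real project would probably use the RGB type from ColorTypes.jl rather than defining its own):

    # A custom element type stored inline in an ordinary array,
    # with functions dispatched specifically on it.
    struct RGB
        r::Float32
        g::Float32
        b::Float32
    end

    luminance(c::RGB) = 0.2126f0 * c.r + 0.7152f0 * c.g + 0.0722f0 * c.b

    buffer = [RGB(rand(Float32), rand(Float32), rand(Float32)) for _ in 1:640, _ in 1:480]
    lum = luminance.(buffer)    # broadcasts over the image, no boxing or copying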

But don't take it from me. Talk to people who have switched from Python to Julia and let them describe the benefits. I meet a lot of these people and they talk about how much simpler code they can write and performance gains they get.

Sure, if your Python packages fit your niche great at the moment, there's no need to switch. But too many people twist themselves sideways to continue with the technology they have determined is all they will ever need. Python developers should know that too: lots of Java, C++, Perl, etc. folks at some point resisted going over to Python with lots of lame excuses.



