When I think of "innovation in language design", I don't really think of JIT-compiling highly dynamic behaviour. I think of inventing languages with programming models that can be used to program the heterogeneous hardware of the future (and, well, the present). For example, massive parallelism is needed for GPUs, FPGAs, clusters, and other more exotic systems alike, and the trick becomes how to translate portable, high-level problem descriptions into, say, highly vectorised code for GPUs, deep FPGA pipelines, or asynchronous message-passing cluster code.
You can't solve that with JITs. From what I have seen, JITs are very useful for taking away the overhead of indirect calls and similar dynamic behaviour, but I don't think that's going to be the primary language design challenge of the future. No compiler can (usefully) extract significant amounts of parallelism from an algorithm which has none, and current programming models are not particularly convenient for expressing parallel algorithms. While current languages allow you to express parallelism by using low-level features, the resulting program will invariably be non-portable. High-level parallel languages (or maybe libraries) are the only solution I can see.
An idea I've been banging around for a while: most high-performance DSLs do not have language constructs that you can't find in general-purpose languages. In fact, it's quite the opposite: these DSLs are defined more by what they lack than by what they contain. Take, for example, Halide [0], the image processing DSL. It has you represent image transformations as pure functions (no side effects) over limited data types (float, int, char). These are things you can do in any modern general-purpose programming language (e.g. Haskell, Rust, Scala, etc.), but of course you can also do much more in those environments. When you take away everything but what you need for image processing, you can generate far faster code. If you tried to do the same thing for arbitrary Rust functions, for example, it's considerably more difficult to compile image processing code where each element of your n-D array could be a struct or a pointer.
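To make the contrast concrete, here's a rough Haskell sketch (my own made-up names, nothing to do with Halide's actual API) of the kind of restricted, pure, per-pixel function such a DSL accepts, versus the anything-goes version a general-purpose compiler has to handle:

    -- Restricted form: a pure per-pixel transform over plain floats.
    -- No side effects, no structs, no pointers, so a code generator is
    -- free to vectorise, tile, or fuse it however it likes.
    brighten :: Float -> [[Float]] -> [[Float]]
    brighten gain = map (map (\p -> min 1.0 (p * gain)))

    -- General form: elements can be anything and the transform can have
    -- effects, so far fewer optimisations are valid.
    transformAny :: (a -> IO b) -> [[a]] -> IO [[b]]
    transformAny f = mapM (mapM f)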
My hypothesis is that a common DSL framework would be just an ordinary programming language (e.g. an extended version of Haskell or Rust) that allows library writers to subset its AST and substitute a code generator for their preferred subset. That way, a DSL writer doesn't have to re-implement expression parsing, type checking, error messages, ... but instead just has to do the code generation. Moreover, it would limit fragmentation since the DSLs would, by construction, share the host language's syntax, so users would have a more consistent experience across DSLs instead of working with arbitrarily different syntaxes.
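As a very rough sketch of what "subset the AST and plug in a code generator" could look like (everything here is hypothetical, not an existing framework):

    -- A hypothetical restricted subset of the host language's expressions:
    -- just scalars, arithmetic, and bulk map/reduce over arrays.
    data Expr
      = Lit Double
      | Var String
      | Add Expr Expr
      | Mul Expr Expr
      | MapArr Expr    -- apply the body (over an implicit element variable) to every array element
      | SumArr Expr    -- reduce an array with (+)
      deriving Show

    -- The DSL author only writes the code generator for this subset;
    -- parsing, type checking and error messages come from the host.
    genC :: Expr -> String
    genC (Lit x)    = show x
    genC (Var v)    = v
    genC (Add a b)  = "(" ++ genC a ++ " + " ++ genC b ++ ")"
    genC (Mul a b)  = "(" ++ genC a ++ " * " ++ genC b ++ ")"
    genC (MapArr e) = "map_kernel(" ++ genC e ++ ")"   -- emit a parallel loop
    genC (SumArr e) = "reduce_add(" ++ genC e ++ ")"   -- emit a tree reduction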
I agree that achieving high performance from high-level languages relies crucially on restricting the expressive power of the language. The more powerful the language, the harder it is for a compiler to analyse the essentials of the computation and map them to hardware whose functioning may be very remote from the programmer's mental model of the program. Of course, what is "powerful" to a compiler is not necessarily what is considered "powerful" by a human, so with some cleverness it is possible to restrict languages in ways that vastly help the compiler but seem like fairly minor concessions to a human. That is what I would call the language design problem of the future.
Whether these restricted languages are doable as embedded DSLs in a general-purpose language remains to be seen. There have been attempts that work pretty well (Accelerate for Haskell is the one I know best), but they are limited in expressiveness, and the ergonomics (type errors and conceptual clarity) are not great. Also, even with all the work that has gone into Accelerate, the end result is only useful to Haskell programmers, which is a shame. Perhaps this can be solved with a sufficient amount of shared infrastructure. Access to the real AST, instead of the lifted operations used by Accelerate, would be a good start.
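For reference, Accelerate code looks roughly like this (the dot-product example from its documentation, quoted from memory); everything lives in the lifted Acc type rather than in ordinary Haskell values, which is why the DSL's compiler sees only bulk operations, but also why the ergonomics suffer:

    import Data.Array.Accelerate as A

    -- Dot product against Accelerate's lifted array type. 'Acc' is the
    -- embedded AST: the DSL's compiler sees only these bulk operations,
    -- not arbitrary Haskell code.
    dotp :: Acc (Vector Float) -> Acc (Vector Float) -> Acc (Scalar Float)
    dotp xs ys = A.fold (+) 0 (A.zipWith (*) xs ys)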
> Whether these restricted languages are doable as embedded DSLs in a general-purpose language remains to be seen
I'd argue that the Racket language (or language platform, depending on your view of things) shows it's possible with its #lang mechanism and macro system. For example, though Racket proper is untyped, Typed Racket is implemented as a #lang with macros and includes an optimizing compiler that removes runtime checks on programs that typecheck successfully.
> Whether these restricted languages are doable as embedded DSLs in a general-purpose language remains to be seen. There have been attempts that work pretty well (Accelerate for Haskell is the one I know best), but they are limited in expressiveness, and the ergonomics (type errors and conceptual clarity) are not great.
Ziggurat[0] was an attempt at a general-purpose system to communicate static properties of DSLs past the more general host language (where they don't necessarily apply). It probably falls under the "ergonomics are not great" category though.
A good point that Lisp encapsulates a lot of the extensibility I describe, but I think there are two missing parts: a strong type system and a clear story for FFI. With Haskell (or Rust or Scala or OCaml or ...) you could take advantage of static type guarantees, both for productivity and for generating more efficient code. That isn't to say you can't type check Lisp code, but that you would get it for free with a statically typed language. For FFI, if your eDSL is going to be generating CUDA code or whatever, then you need a clear way to translate values between the two sides. This is easy in a language like Rust and doable in a language like Haskell, but I don't know if anything like that has been tried for Lisp.
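To illustrate the FFI point for Haskell, a boundary to separately generated code might look something like this (the kernel name is invented purely for illustration):

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign.Ptr (Ptr)
    import Foreign.C.Types (CInt (..), CFloat (..))

    -- Hypothetical binding to a kernel that the eDSL generated and compiled
    -- separately ('saxpy_gpu' is a made-up name). The static types on both
    -- sides pin down exactly how values cross the boundary.
    foreign import ccall unsafe "saxpy_gpu"
      saxpyGpu :: CInt -> CFloat -> Ptr CFloat -> Ptr CFloat -> IO ()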
The typing aspect is really not a problem to me, but I get what you mean (also, "strong" typing is different from "static" typing). It also seems that staged compilation is becoming more and more common in statically typed languages. As for interfacing with low-level code, this is basically a major reason why Common Lisp is appreciated. Interfacing with C libraries is done routinely (CFFI: https://common-lisp.net/project/cffi/manual/html_node/index....). You can write shaders in Lisp, interface with CUDA, and so on.
There has been work done on automatically identifying and running ordinary JVM hotspots on the GPU.
That said, I agree that automatic parallelisation is a hard problem. I don't think it's the only interesting area of language innovation, though: most of the innovations making programmers' lives easier in recent years have nothing to do with it.
Both of those projects look like they are about designing a convenient GPU programming API for Java. Which is fine and useful, but you still have to program with an understanding of the GPU model (notably, massive threading), so it's not quite as high-level as what I believe is needed for portable performance.
How can we even get "portable performance" between architectures like ARM and x86/64, let alone from orthogonal paradigms like CPUs and GPUs?
Intel/AMD processors have complex branch predictors, instruction decoders, and SIMD; ARM has conditional execution and its own sort of quasi-SIMD; and GPUs have a weird hybrid of SIMD and massive multithreading (SIMT) with sorta-kinda shared memory. Not only do these architectures implement memory access differently, the way basic control flow operations (conditionals and loops) work is physically different.
That's not even including architectures like SPARC or specialized DSPs. Or FPGAs, which are a whole different universe.
It's not going to be easy, I agree. Matching hand-optimized code written by experts is not realistic for single primitives, of course. But getting close may be. It's all about programming at a high enough level that the compiler doesn't have to think too hard about what the program means, so that it can instead spend its time worrying about how to express that in the machine code of the target platform.
Imperative languages are probably hopeless, but declarative languages may work. Functional languages in their current popular incarnations are about as hard to compile intelligently as imperative code, but I think that functional array programming (where bulk operations are primitives) shows promise. Consider computing the average of every row of an n*m matrix 'a' (a 2D array). In Haskell-ish syntax, we would write this as:
map (/m) (map sum a)
Now the compiler really only has to figure out two high-level things: how to apply a function (division by 'm') to every element of an array, and how to sum equally-sized segments of an array. Summing is again an instance of a more primitive bulk operation, reduction, which is parallel in theory, and has known efficient parallel implementations on most hardware. It's also possible to do some fusion for this small example, but my point is that we will need a programming model where the individual statement or expression works at a much higher level, and has much stronger properties, than in most current languages. You still need to teach the compiler about each specific hardware platform you want to target, of course (possibly by automatic learning or autotuning, as in http://www.lift-project.org/ ).
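To make that a bit more concrete, here's a small self-contained version over plain lists, together with the fused form a compiler for such a restricted language could derive (the function names are mine):

    -- Row averages as a composition of bulk primitives (plain lists here;
    -- a real array language would use flat arrays and primitive map/reduce).
    rowAverages :: Double -> [[Double]] -> [Double]
    rowAverages m a = map (/ m) (map sum a)

    -- Fused form: a single traversal per row. Deriving this rewrite
    -- automatically is feasible precisely because map and sum are
    -- primitives with known algebraic properties.
    rowAveragesFused :: Double -> [[Double]] -> [Double]
    rowAveragesFused m = map (\row -> sum row / m)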
I agree, the headline is incorrect; the whole thing is basically compiler infrastructure for those using Java, not about language design.
And it's true that the hard part is still transforming a representation close to the problem domain into a VM-executable one, as they are worlds apart. Then there is also an ecosystem problem, which forces us into the meta world and generating specific code in a specific format instead of doing anything with a VM. I didn't really see the appeal last time we discussed it and I don't see it now.
Why do you need that outside of gaming and numerical computing? Most software above the system level doesn't need to be super performant, or highly parallel. It's fast enough. We got by with far less resources a decade ago, and still ran a lot of the same software. Why would we need to port all that over to GPUs, etc?
Numerical computing is important for science in general, and really hard. But if you're perfectly fine with the current state of software, then sure, there's no need to change anything. Future CPUs will gain an increasing amount of their performance from parallelism - both coarse- (more cores) and fine-grained (better vector units), but there is no reason we have to exploit it. I suspect many people will want to, however, and they will likely find current programming models somewhat awkward.
It depends what you're optimizing for. If it's for programmer time / inexpensive programmers, sure. But remember the (probably apocryphal) story of Jobs complaining that the Mac took too long to boot, costing human lifetimes worth of waiting per year. Look at all the slow web pages, how slow Word or Excel are, etc -- that's a lot of human lifetimes.
More relevant is power. The more wasted cycles the shorter your battery life. The hotter that device implanted in you becomes. The worse the cooling problem in the datacenter.
It's naive to think that "computers are fast enough".
Progress on Moore's law has always created new markets that weren't predicted, or seemed unlikely/silly.
There are a lot of computational tasks that cannot run on a smartphone today. It'd be great to run an IDE, a few background classification tasks, and a high-quality augmented reality VR headset app on a device that lasts 8 hours on battery under that load and fits in my pocket.
That's not happening without lots of hardware customization.
High level synthesis for FPGAs is coming along, slowly, and dynamic method migration has been around for at least a decade. But I agree that more efforts along those lines would be interesting to see.
Nice article, especially the section on downsides at the end.
A lot of VM design is at the expense of memory consumption over speed. Not just the JVM, but I think v8 memory usage got out of control recently and they had to go back to an interpreter in some cases.
VMs in the future should probably be optimized for power. More memory usage means more power usage. It matters both on mobile devices and in data centers.
The startup time issue is annoying as well. I like the general idea, but making startup time even worse than the JVM itself makes it unattractive.
If you want to reduce the time required to handle an individual request in a given program that you already have there's not much you can do automatically. You can't use a processor that's twice as fast because if you are already using a modern system we don't have one of those. You can't use two processors as we don't really know how to parallelise effectively enough in general cases. But you can double the RAM in a system with reasonable effort and cost.
The startup time issue is being solved with an ahead-of-time Java compiler with whole-world analysis that we are developing. It runs hello-world in our Ruby implementation in about 100ms, which is an order of magnitude faster than the normal JVM.
OK, if those are your design constraints, then that's fair. I'm just questioning how widely those constraints will apply in the future.
I'm pretty sure the v8 changes were motivated by phones, so maybe there is room for another project like Graal and Truffle with a different set of constraints.
Also, for web services in particular (not sure if that is a major use case), I would question the view of making individual requests faster. I recall Steve Souders said he was a back end guy at Yahoo working on latency, and then he switched to front end performance because he found that the front end was eating up all the time.
In my experience this is too true. In other words, if you had a server where requests were taking 100 ms, and you made them take 50 ms through VM design, you would be a hero! But somehow front end engineers are still constructing pages with 10 - 30 full seconds of latency (especially on mobile) out of back end requests that take 100ms. There's just a lot more room to optimize there, whereas sometimes I feel like VMs are trying to squeeze blood out of a stone.
And I agree that there will be demand for such a "black box" approach, where you just take an arbitrary program and make it faster. But I'm interested in overall system performance, and in that case, a little bit of application knowledge can work wonders. One of my pet peeves is that often performance tools like time and space profilers are sort of an afterthought that programmers use only when they have problems. They write this huge spaghetti of allocation, and then 2 years later they start tuning the GC.
This is partly the fault of programmers and partly the fault of the language ecosystem. It would be nicer if the language exposed integrated tools. I don't know that much about the JVM ecosystem, but its tools always feel like a "separate thing" you have to go research and choose and download and install. I think Go's tooling is a good example here.
Anyway, I think this is an interesting project. In particular I think the focus on polyglot programming is fantastic. I too feel the pain of enormous amounts of duplicated work -- e.g. even at say Google with a relatively small set of languages like C++, Java, and Python, there were still all these separate "worlds" and gradually engineers became sequestered in their own world. IMO Go made things worse and not better -- it was yet another native runtime that didn't even interoperate all that well with C++, let alone Java or Python!!!
> "making startup time even worse than the JVM itself makes it unattractive"
Agree.
Do you know if this can generate native code without any JVM dependencies? For instance, could I implement a C compiler that outputs optimized native code just like gcc or clang?
At any rate, it sounds like this research could be used to produce the tools we want, even if this is not it. For instance, PHP could use a JIT, if it didn't cause slow warm-ups and blow up memory usage.
"Graal is designed from the start as a multi-language compiler, but its set of optimisation techniques is especially well suited to compiling programs with high levels of abstraction and dynamism. It runs Java as fast as the existing JVM compilers do, but when applied to Scala programs it runs them about 20% faster. Ruby programs get 400% faster than the best alternative runtime (i.e. not MRI)."
Pretty amazing claims being made here, but I do like the idea of it. Build up your language grammar, then get a compiler, debugger and runtime for basically free. Definitely will have to play around with this soon.
JetBrains has something that is similar to this: the Meta Programming System (MPS) (https://www.jetbrains.com/mps/). It resides exclusively within the context of Java (so, DSLs), and as such it's not as, shall we say, far-reaching as Graal and Truffle claim to be?
It works really well though; it abstracts the complexities of the AST through various methodologies so that designers can do what they do best: design. The shortcomings, I'd imagine, are the same with MPS, in so much as the limitations imposed are defined by the system. And when you run into those boundaries, they are hard and unforgiving -- much like trying to use Python in a non-Pythonic manner.
MPS is rather different. I've played with that too, perhaps it should be the topic of another essay.
MPS is fascinating because it lets you create non-text based programming languages and have them actually interop with text based programming in a useful way.
The main problem, actually, is that it only generates projectional editors, not text editors. So you interact with the language like an Excel spreadsheet, which is pretty awful, to be frank.
Another perspective is that Graal & Truffle prove that a large swath of programming language space is semantically equivalent, suggesting that we shift focus of innovation elsewhere.
The question is: can you create a language like Scala with it? I.e. one with complex type inference? Or Haskell? Because if it constrains you to Java-like languages, I don't see this as "innovation in language design".
This is the execution layer, i.e. how that Scala program is turned into machine code. You could make a Haskell implementation using Truffle, or Rust, etc. Haskell defines the semantics of a program, while Truffle turns those semantics into executable code.
It's actually still hosted by Medium, I just got around to setting up a custom domain as I noticed a few people saying they've learned to associate the medium.com domain with low quality writing. That's a pity, but at least it's easier to move off Medium now when something better comes along (or they go bankrupt).
It's mostly under open source licenses, and such licenses tend to discuss patents, don't they?
The Google lawsuit was partly because they didn't use the actual Java source code so they didn't benefit from those open source licenses. But bear in mind:
• Google won, Oracle lost, multiple times. I'm guessing they won't try that again.
• Google has since switched to the OpenJDK source code, not away from it. They are now actually working more closely with Oracle, not less.
It's Oracle's general attitude. While not going anywhere near their products won't protect you from a patent lawsuit, it is some protection. They can't sneak in some commercial components and bill you later, for example.
That's never going to be sorted out because Oracle intended it that way. Like in 2011, just after they bought Sun, they started bundling payment-required components (e.g. JavaSE Advanced) in their supposedly "free" (as in beer) JavaSE download. Only this year did they start hunting down people who have been using those extra components, with some businesses owing back-charges of $1 million or more to Oracle. See http://www.theregister.co.uk/2016/12/16/oracle_targets_java_...
Based on past behavior, Oracle intends Graal and Truffle to do the same.
Edit: Actually, I suspect the name Graal is a play on JVM-based Grails, which, along with Apache Groovy, has the same culture of drawing users in with false promises then charging them for OCI consulting and G2One conferences to manage the resulting technical debt. Perhaps Graal means Oracle ain't buying theirs, they're building their own.
No, Java SE had commercial features forever and you can't use them accidentally. You have to pass a command line switch called -XX:+UnlockCommercialFeatures. They are free for development use and as such there's no DRM, it works on the honour system. Oracle have simply started stepping up their efforts against people who were, shall we say, not able to handle the honour system.
The commercial features aren't relevant to most developers, so it's all a bit of a storm in a teacup. There's certainly no bait and switch.
For this reason alone, I have refused to use Truffle. Consider the RPython toolchain, from the makers of PyPy; it does a similar thing (turn an interpreter with light annotations into a JIT compiler) but you are in no legal danger.
From what I understand, the JIT compilation RPython does isn't anywhere as sophisticated as Graal, and I can say from personal experience that the toolchain is pretty painful to work with.