Something that drives me bonkers about current tree shaking is that there’s no good way to see what got shaken.
I want a graph that shows me all the branches that got culled.
Even better would be a dependency graph that shows me what is causing things not to be culled.
I had an issue where Antd was pulling in 200KB of icons and I couldn’t figure out why until I spent hours bisecting it by commenting out entire sections of my application.
On a super basic level you can just build with and without tree-shaking and diff the results. If you turn off transpiling and minification, and move your bundled NPM modules to their own separate file, you can see what parts of your own code got shaken out pretty easily.
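That diff could even be scripted. A rough sketch (the regex heuristic, function names, and bundle paths are all made up for illustration) that lists functions present in the full build but missing from the shaken one:

```javascript
// Very rough heuristic: collect `function name(` declarations in a bundle.
// Only works on unminified output; misses arrow functions and methods.
function functionNames(source) {
  const names = new Set();
  for (const match of source.matchAll(/function\s+([A-Za-z_$][\w$]*)\s*\(/g)) {
    names.add(match[1]);
  }
  return names;
}

// Report what the tree-shaken build dropped relative to the full build.
function shakenOut(fullBundle, shakenBundle) {
  const kept = functionNames(shakenBundle);
  return [...functionNames(fullBundle)].filter((name) => !kept.has(name));
}

// Hypothetical usage against two unminified builds:
// const fs = require("fs");
// console.log(shakenOut(
//   fs.readFileSync("dist/full.js", "utf8"),
//   fs.readFileSync("dist/with-shaking.js", "utf8"),
// ));
```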
I'm tempted to have a go at writing a tool to do that actually. It would be useful.
It doesn't have to be perfect; static analysis gets you most of what people want. It won't cover dynamically created keys, but people rarely use those anyway. You'd just have to provide a list of entry points, and then e.g. a VS Code plugin could gray out (or mark with a gray bar) code that gets shaken. It would be a great plugin.
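For illustration, a sketch of the dynamically-created-key case that such static analysis could never catch (all the names here are made up):

```javascript
// Hypothetical module object exposing several formatters.
const formatters = {
  json: (v) => JSON.stringify(v),
  csv: (v) => v.join(","),
  tsv: (v) => v.join("\t"),
};

// Statically analyzable: a tool can see only `json` is referenced...
const viaStatic = formatters.json([1, 2]);

// ...but here the property name only exists at runtime, so static
// analysis has to assume every formatter may be used and keep them all.
const key = ["c", "sv"].join(""); // resolves to "csv" at runtime
const viaDynamic = formatters[key]([1, 2]);
```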
A solution would need to be fast and incremental. Also, while we have gotten much faster bundlers with tree-shaking support recently (for example esbuild, and Parcel adopting SWC), fast incremental correct builds still seem to not be completely there. E.g. cache invalidation would need to be correct in all possible cases, which does not seem to be the case yet, judging by some GitHub issues I read recently.
Otherwise I agree such a tool in VS Code would be great. Note that if you generate sourcemaps you kind of have a similar view in Chrome already. It therefore shouldn't be too difficult to implement for VS Code, in principle.
An interesting thing to notice is that a VS Code plugin probably doesn't have to be fast/incremental. The reason is that you'd want to skip node_modules completely (minus symlinked packages), because it's for the code you're developing, not a full-blown tree shake of the final bundle that has to visit everything.
It's common in JavaScript at least because you're delivering the code across the network, frequently, and at dev time you're probably pulling in large libraries of components and frameworks that you may not use in their entirety. Since load time is one of the core metrics used for SEO, and has been shown to have a significant impact on conversion rates, it's worth the investment to remove code that's not being used.
I'm sure there are other places where it's applicable, but modern JavaScript and webapps have to be the quintessential use case.
Most other compiled languages do dead code elimination, which sounds similar but is a little different. Think of dead code elimination as removing code that provably doesn't change the output, while tree shaking instead starts from nothing and includes only code that could run.
To apply this to python is interesting - if you were creating a packaged version, I could see "compiling" the code to a separate package with only the required imports.
I know that Microsoft is doing some kind of treeshaking in Blazor (basically a .NET runtime in WASM), since they do not want to ship a full .NET base library to the browser. Not JS related, but the problem is the same - you want to ship as little data as needed on websites that are dynamically loaded.
It could be, but I don't see it as too practical there: dependencies are very likely quite transparent and don't themselves pull in many dependencies. But for JS?
There's no reason you can't write JS the same way you would write Java or C++. (And if you were writing "serious" applications in JS that weren't web apps before NodeJS came along, you did. Of course, you still can, too.)
They are not the same, and the term comes from Lisp in the early 90s, not JavaScript. Tree-shaking is actually "live code inclusion" - in other words, it approaches the problem from the other direction.
In Lisp this might work like this (did and still does work in some commercial Lisp systems):
For example, a Lisp system may start from a memory image on disk, which leads to a live running memory heap. The tree shaker then breaks unused links in memory - either automatically or under programmer direction. The garbage collector then frees the unused memory. Lisp then dumps a new (typically smaller) image, which has the unused code and data removed.
They both start at the entry block (or exported symbols if it's a lib) and traverse the program graph, keeping only live branches. There is no “other direction”. Tree shaking is the same as dead code elimination.
All tree shaking is dead code elimination, but not all dead code elimination is tree shaking (in webpack at least).
Tree shaking happens at the import level. If I say `import { abc } from 'some-lib'`, tree shaking won’t include any other objects some-lib may export.
Removing a branch that can never be true, like `if ('production' === 'production') { … } else { /* dead code */ }` is dead code elimination, but webpack wouldn’t consider that tree shaking.
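A minimal sketch of that distinction (the module and function names are hypothetical):

```javascript
// Tree shaking (import level): only `square` is imported, so a bundler
// never includes `cube` from the hypothetical module below.
//
//   // math-utils.js
//   export function square(x) { return x * x; }
//   export function cube(x)   { return x * x * x; }
//
//   // app.js
//   import { square } from './math-utils.js';

// Dead code elimination (statement level): a minifier folds the
// constant condition and deletes the unreachable else branch.
function buildLabel() {
  if ("production" === "production") {
    return "prod build";
  } else {
    return "dev build"; // removed by DCE; webpack wouldn't call this tree shaking
  }
}
```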
Anyone have any experience using a Java treeshaker? I have a Clojure GUI/JFX App that's bloated and I've only managed to slim it down with manual package pruning.
I couldn't get ProGuard to work with Clojure. Granted, ProGuard is quite baroque, so I probably did something wrong. But maybe there is some simpler solution out there for the JVM? Or would this simply be impossible b/c Clojure uses reflection? Though so does JavaScript, from what I understand..
Shouldn’t the JVM already be doing this, as part of dead code analysis during the compile step? I believe tree shaking in JavaScript also only handles static analysis - e.g. imports and function calls.
I imagine, though, that usage of something like exec throws DCE right out the window, poisoning the whole program.
No, the Java compiler does not do this. You can try it yourself and see. All dependencies are bundled into the JAR. If you have some large library as a dependency (like BoofCV or JavaFX for instance) then all of the classes and dependencies are bundled. Even if you don't use 99% of them. That's why a lot of Java libraries are broken up into many smaller sub-libraries which you can then manually pick and choose from. It's also why you sometimes see people re-implement smaller sections of larger libraries - b/c you don't want to be forced to drag in the whole mess of code
As far as I understand (and I'm murky on the details), if your language supports reflection then you can call any dependency at runtime, so this is not amenable to static analysis. So the Java compiler doesn't have the same guarantees as, say, a C++ compiler.
Most people are running JVM on the server so they just don't care about executable size (and on Android people use proguard). But reflection in general is a source of headaches..
If you use something like GraalVM, then it will try to prune dead code, but it will not work well with reflection.
That all being said, on the specifics, I'm actually not entirely sure how you'd use reflection to arbitrarily call library code at runtime. If anyone knows, I'd be curious to see an example.
In Java you could make an instance of a class doing
`c = Class.forName("tld.something.SomeClass").newInstance();`
and if that string is created at runtime, it's impossible to rule out usage of any dependency. So you basically have to tell the tools what to remove and what to keep.
Oh that's cool. I'm not intimately familiar with Java, so it's interesting to see how it's done. I figured it was something like that. Coming from C++, reflection seems like a giant useless code smell - and there are all these issues further down the line because of it. And I get the impression most Java code doesn't even rely on it.
Most Java code actually does rely on it, but behind the scenes. So you could go years between writing any reflection code yourself, but libraries and frameworks use it to dynamically search for known stuff in the classpath and change behavior if it finds other class X, or use it to serialize to and from various formats and java objects, do stuff based on annotations etc.
Java's reflection capabilities are pretty far down the "almost anything is possible" side of things, so yeah - unless you can rule out ALL reflection in the whole application, there's no way to safely shake a tree.
Crazy to think they chose an unshrinkable language for a mobile platform :)
It doesn't seem like that powerful of a language feature (more of a code smell), and it creates this mess down the line. I wonder what fraction of libraries even require it.. I'm guessing it's b/c of the flag soup that I never got ProGuard working. I know the Clojure language/runtime itself relies on some reflection - unfortunately. That probably complicates things.
Java's is extremely powerful, especially when you include stuff like bytecode weaving (e.g. AspectJ: https://www.lenar.io/logging-method-invocations-in-java-with...). Which I mostly include because support for it is effectively built into the language and all major IDEs, whether it's a run-time or compile-time construct (or a blend of both). And with a sufficiently complex custom class loader, you can do darn near anything at runtime, as you have control over how the system loads new code.
Android specifically though: an unbelievable number of libraries use reflection at least a little, if not deeply. E.g. the Android UI framework is absolutely riddled with it - ever used XML to inflate views, i.e. the default way to build a UI? That's all based on reflection. Many (many!) of the most-popular libraries use it as well, though the efficiency-minded ones generally use compile-time codegen (annotation processors) where at all possible.
Yeah, I mean from an IDE/developer point of view reflection is fantastic - really can't argue with that. At dev time it's a life saver. But coming from C++ my gut reaction is that at compilation/distribution time this is mostly a crutch/code smell with a huge price. I've for instance noticed there is seemingly a lot of library rewriting and fragmentation. It's common for people to choose a smaller one-trick-pony library instead of a more mature large library - to avoid dragging in an enormous dependency. That ends up as a huge bummer for the ecosystem.
Thanks for the insight into Java internals and Android. It's a shame it's so tightly coupled at the roots but it's pure fantasy on my part to have a reflection-less JVM :) At the end of the day, there isn't really anything else like Clojure, so I just try to accept the warts and features.
Out of curiosity I took a cursory look at Qt, and they seem to handle QML without using RTTI - so I think it's all solvable, at least in most cases (I'm guessing by enumerating your possible types at compile time?)
I recommend checking out the jdeps command line tool, that comes bundled with the JDK. Though you probably have to read up on modules a bit first. It can create very lean “JREs” that can be bundled with the program and only load the required classes.
As I understood it, jdeps is only for shrinking the runtime and bundling it with the program. This part seems fine.. or at least they're doing the best they can :)
I ended up skipping this step and distributing a JAR - which is a bit of a no-no.. but I just ask the user to install the Java runtime. (It's better than making an executable for every platform and then spinning up VMs to test.)
However, jdeps or not, the rest of the executable still remains as bloated as before
You are right in that it doesn’t do dead code elimination per se.
But if my understanding is correct, Java classes are loaded as needed. So other than JAR size, you don't really need it as much as in the case of JS, for example.
That's an important distinction. I guess my annoyance is mostly cosmetic. The app is pretty small and does a simple data processing task. Had I written the app in C++ it would have been a couple of megabytes (and it would have taken me 10 times longer to write..). But with Clojure/Java/JVM/jdeps it comes out as some monstrous 100MB installer. It just feels a bit gross ...
End user apps are second class citizens on the JVM (except weirdly on Android.. but they're in a weird bubble of their own)
I suspect (although I haven't checked the code) that this is actually a "dead code elimination" algorithm which is easier to write than a tree shaking algorithm.
Tree-shaking is actually "live code inclusion", the opposite side of dead code elimination.
The same way it deals with common code paths: whatever can't be guaranteed to never be called is left in. E.g. you can remove functions in a module that are never imported anywhere else and never called rather easily. You can also remove branches that are unreachable, e.g. inside an `if (false)`. But for things like
function f(s) {
if (s === '') throw new Error()
...
}
there probably are very few tools that will statically be able to tell that f is never called with an empty string and thus the validation logic is not needed. It's JS after all, and tools that mangle it all fall on a spectrum between doing very little but very safely, and trying to do too much and breaking code.
1. Because of lower overall bandwidth (in many cases there is a SIGNIFICANT difference in code size before and after tree shaking)
2. Because of better startup performance - less code to process, fewer round trips to the server for new modules, less load on servers too (streaming one file vs. streaming hundreds of them separately)
3. Because of better performance (e.g. removing conditional branches that will never execute)
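As a sketch of point 3: bundlers can inline build-time constants (the `__DEV__` flag name below is made up), which turns a live branch into provably dead code that a minifier then deletes.

```javascript
// Imagine a bundler's define/replace option substituted this at build time.
const __DEV__ = false;

function log(message) {
  if (__DEV__) {
    // This call - and anything only reachable from here - becomes
    // dead code once the constant is inlined as `false`, so the
    // minifier can strip the whole branch from the bundle.
    console.debug("[dev]", message);
  }
  return message;
}
```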
I don't think you understood the question. HTTP/2 allows for one connection to the server to stream multiple files. There's no more network setup penalty for making a request. So all that remains is downloading the files. Now it doesn't matter if they're in one bundle or 20 files.
You also don't have to tree shake up front because the execution will do it at runtime. If you deploy some unused modules, the browser will never request them and the server will never send them.
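That runtime behavior can be sketched with native ES modules and a dynamic import (the module path here is hypothetical):

```javascript
// With native ES modules, loading is already lazy at the file level:
// the browser only requests modules that something actually imports.
// A dynamic import makes that boundary explicit.
async function openSettings() {
  // Fetched from the server only when this first runs, over the
  // already-established HTTP/2 connection; modules nothing imports
  // are simply never requested.
  const { renderSettings } = await import("./settings-panel.js");
  renderSettings();
}
```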
For the remaining benefits I can think of - more bundle bytes means more optimal compression; or shaking out functions within a module - I'm not so sure the benefits outweigh the complexity of maintaining and using bundlers. We could be splatting the src directory straight to the server and letting HTTP/2 do the rest.
I think you didn't understand my answer ;-) It still matters whether it is one file or 20 (for example, for file-server load). Requesting 20 files separately (even on an established connection) doesn't bring you a profit, as even HTTP/1.x browsers try to keep the connection alive.
What HTTP/2 brings here is the possibility of pushing files to the client in advance, so you can do a bit of "prebundling" on the server side. I've heard of a couple of different server-side "prebundling" techniques, BUT none of them will be as simple and performant as bundling it yourself :-)
(and I want to make sure you realize that there is no profit if modules are requested from the server at runtime)
And I guess you never tried things like terser (https://terser.org/docs/api-reference), so you don't know how much profit it brings, not only in bundle size but also in startup performance and even runtime performance. Try it, play with different hoisting and mangling options (especially try mangling properties) and check the memory consumption, startup times and performance of your code.
Not sure what you're talking about, or whether you know what you're talking about, here: "more bundle bytes means more optimal compression; or shaking out functions within a module"
But to give you a hint: compress the src directory of your project, then compress the bundled and minified source of your project. See for yourself which performs better ;-)
Your condescending responses are infuriating. Of course I have used terser, and before it uglifyJs, and am familiar with all the intricacies of mangling, compression, and other minifying techniques. I've also maintained many bundler configurations across different bundler solutions.
I'm not looking for a mentor to explain the basics of bundler technologies or asset compression. I'm asking if HTTP/2 reduces the advantages of these techniques to the inflection point where the complexity of using bundlers is no longer worth the squeeze.
"more bundle bytes means more optimal compression" - this alludes to a single bundled file taking better advantage of the compression dictionary than many small files, who can't share dictionaries when compressed.
"shaking out functions within a module" - this is another form of tree shaking as opposed to file-based tree shaking (import dependency analysis). Unbundling could compete with the file-based case, but not with a deeper analysis of unused functions _within_ files.
These were the remaining advantages I could think of; I'm sure there are more. It sounds like you think bundling is still the way to go, and I probably agree, but that doesn't make it any less interesting to discuss its remaining merits and whether we can iterate to close the gap. Bundlers, minifiers, and sourcemaps are all complicated build tools that require a hand on the wheel to maintain in the long term. We should not settle for them as de facto JS ecosystem techniques.
Sorry for that; in my defense, it was not my intention.
> Bundlers, minifiers, sourcemaps, are all complicated build tools that require a hand on the wheel to maintain in the long term.
It just shows how bad the design of JS and the web is. But there are tools that do nearly all of it with just one command - check out rollup, or my recent favorite, esbuild (if you haven't before).
> We should not settle for them as defacto js ecosystem techniques.
I think it's done, it's settled, just like JS itself, and whether we like it or not, we have to live with it. :-)
Still hoping that one day a real web 2.0 will come, and HTML, CSS, HTTP, JS and even WebAssembly will be taken as "lessons learned", thrown away, and replaced with something far more logical, structured and extendable.