
Interesting, I wasn't aware; thanks for that. I will say, Polars' implementation is much more centered on out-of-core processing, and bypasses some of DuckDB's limitations ("DuckDB cannot yet offload some complex intermediate aggregate states to disk"). Both incredible pieces of software.

To expand on this, Polars' `LazyFrame` implementation makes it simple to add new execution backends like GPU, streaming, and now distributed computing (though the distributed backend is currently locked to a single vendor). The DuckDB codebase just doesn't have this flexibility, though there are ways to get it to run on GPU using external software.
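To make the "pluggable backends" point concrete, here is a toy, pure-Python sketch of the general idea. It is an illustration under assumptions, not Polars' internals or API: `ToyLazyFrame`, `register_engine`, and the engine names are hypothetical. Operations only build a plan, and `collect()` hands that plan to whichever registered engine is named. (Recent Polars versions expose the analogous choice through the `engine` argument of `LazyFrame.collect`, e.g. `engine="gpu"`.)

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry of interchangeable execution engines.
ENGINES: dict[str, Callable[[list, tuple], list]] = {}

def register_engine(name: str):
    def decorator(fn):
        ENGINES[name] = fn
        return fn
    return decorator

@dataclass(frozen=True)
class ToyLazyFrame:
    """Records operations into a plan; nothing runs until collect()."""
    source: tuple     # rows as dicts
    plan: tuple = ()  # sequence of (op_name, argument) pairs

    def filter(self, predicate):
        return ToyLazyFrame(self.source, self.plan + (("filter", predicate),))

    def select(self, *columns):
        return ToyLazyFrame(self.source, self.plan + (("select", columns),))

    def collect(self, engine: str = "in_memory"):
        # The same plan can be handed to any registered engine.
        return ENGINES[engine](list(self.source), self.plan)

@register_engine("in_memory")
def run_in_memory(rows, plan):
    for op, arg in plan:
        if op == "filter":
            rows = [r for r in rows if arg(r)]
        elif op == "select":
            rows = [{c: r[c] for c in arg} for r in rows]
    return rows

@register_engine("streaming")
def run_streaming(rows, plan, batch_size=2):
    # Process fixed-size batches; valid here because filter/select are row-wise.
    # Aggregations would need mergeable partial state, which is the hard part.
    out = []
    for start in range(0, len(rows), batch_size):
        out.extend(run_in_memory(rows[start:start + batch_size], plan))
    return out

lf = (
    ToyLazyFrame(({"k": "a", "v": 1}, {"k": "b", "v": 5}, {"k": "c", "v": 9}))
    .filter(lambda r: r["v"] > 2)
    .select("k")
)
in_memory = lf.collect()
streamed = lf.collect(engine="streaming")
```

Swapping engines changes how the plan runs, not what the user writes, which is the kind of flexibility described above.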



Thanks for that insight as well! My needs don't tend to be so demanding, so I've gotten away without knowing these details, but I suspect in the not-so-distant future this could be useful to know.

Being able to use distributed backends to process frames sounds kind of incredible, but I can't imagine my little projects ever making use of it. Still, very cool stuff.


Have you seen Ibis[1]? It's a dataframe API that translates your calls into operations on various backends, including Polars and DuckDB. I've messed around with it a little for cases where data engineering transforms had to use PySpark, but I wanted to do exploratory analysis in an environment that didn't have PySpark.

[1] https://ibis-project.org/
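The core idea behind a library like Ibis (one API whose calls get translated for different engines) can be sketched in a few lines. This is a toy under stated assumptions: `Query`, `to_sql`, and `run_in_python` are hypothetical names, not Ibis' API, and the SQL string is illustrative rather than what Ibis or DuckDB would actually generate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    """Hypothetical backend-neutral query: a table name plus recorded operations."""
    table: str
    ops: tuple = ()

    def filter_gt(self, column: str, value: float) -> "Query":
        return Query(self.table, self.ops + (("gt", column, value),))

    def select(self, *columns: str) -> "Query":
        return Query(self.table, self.ops + (("select", columns),))

def _parts(q: Query):
    # Both backends apply every filter before the (last) projection,
    # mirroring SQL's WHERE-before-SELECT evaluation order.
    filters = [(op[1], op[2]) for op in q.ops if op[0] == "gt"]
    selects = [op[1] for op in q.ops if op[0] == "select"]
    return filters, (selects[-1] if selects else None)

def to_sql(q: Query) -> str:
    """Translate for a SQL-speaking backend (DuckDB-style); numeric filters only."""
    filters, columns = _parts(q)
    sql = f"SELECT {', '.join(columns) if columns else '*'} FROM {q.table}"
    if filters:
        sql += " WHERE " + " AND ".join(f"{c} > {v}" for c, v in filters)
    return sql

def run_in_python(q: Query, rows: list) -> list:
    """Evaluate the same query directly on a list of dict rows."""
    filters, columns = _parts(q)
    kept = [r for r in rows if all(r[c] > v for c, v in filters)]
    if columns:
        kept = [{c: r[c] for c in columns} for r in kept]
    return kept

q = Query("orders").filter_gt("amount", 4).select("customer", "amount")
rows = [
    {"id": 1, "customer": "a", "amount": 10},
    {"id": 2, "customer": "b", "amount": 3},
]
sql = to_sql(q)                     # "SELECT customer, amount FROM orders WHERE amount > 4"
python_result = run_in_python(q, rows)  # [{"customer": "a", "amount": 10}]
```

Keeping the query as plain data and pushing all backend knowledge into the translators means adding a backend is one more translator, with no change to the code that builds the query.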



