Just a quick note to the readers: we're still in the early days, in a pre-release mode. A lot of things work, but not everything is there yet and there are some bugs (we know some for sure!). A lot to come as we progress: first-class Python support, CLI, schema & package management, etc.
Our initial users are using it with our support, and our timeline is sometimes informed by their milestones.
That said, happy to see you try it out and please join our Discord (https://discord.omnigr.es/) if you want to chat.
So I tried Omnigres about a month back to build a simple API that makes a few DB calls, to mock a real-world scenario. And damn, the numbers were quite thought-provoking. Comparing those numbers with a similar API built using FastAPI and asyncpg, there was, if I remember correctly, at least a 4x to 5x throughput increase.
Even though the project is at quite a nascent stage, the idea is promising. It tackles the concept, or should I rephrase, the pain, of having multiple platforms to run a service and the expertise that this necessitates.
If the scaling part and dev convenience can be fully sorted, along with the security aspect that many have highlighted, this could very well have the potential to change the game.
> Comparing those numbers with a similar api built using fastapi and asyncpg, there was if i remember correctly at least 4x to 5x throughput increase.
I wonder to what extent this improvement comes from not using Python, as opposed to using this alternative solution? For example after Python to Go migrations it’s common to see this kind of speed up I believe.
We are working on first-class support for Python indeed (and other languages like JavaScript).
An important thing here is that we see Omnigres as a polyglot runtime with a database inside (Postgres) and we want people to use languages they prefer.
Many thought the RDBMS was dead (at least 10 years ago); well, here we are.
Dynamic languages were all the rage, and now pretty much all of those languages have added type safety (TypeScript, recent Python versions).
Looks like what goes around comes around.
Folks like Martin Casado noticed the trend [1] and explained what the challenge was [2].
This sort of stuff (and a lot of other popular things) highlights something that I've felt for years: we are still so far away from a good ops story as an industry.
Why would someone use this? Because they only want to deal with "one thing" in production. Dealing with multiple stacks is annoying. People show stuff like kubernetes as a way to handle multiple stacks. That stuff is also complicated.
There is still a huge opportunity for a methodology (accessible to non-operators) to get systems in production easily, where we wouldn't feel the need to force it all into these kinds of models.
Having said all of that, this is pretty fun looking!
However I doubt shoving everything into the DB will get us there - besides some hardcore PG enthusiasts like OP & co.
SQLite together with the application layer of your choice seems to make much more sense to me, but hey, we don't all have to like the same stuff (better that we don't).
Also, without reading too much in detail, I wonder how testing/monitoring etc. would be supported.
The language probably won't be perceived as "sexy" enough compared to, e.g., full-stack JavaScript. Most people don't really like SQL and aren't that good at it. Hence the continued popularity of ORMs that support the often-seen pattern of bulk-loading everything into the application server for processing instead of using a simple UPDATE statement.
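To make the contrast concrete, here is a minimal sketch (using SQLite so it runs standalone; the same shape applies to Postgres) of the bulk-load anti-pattern next to the single set-based statement. The table and function names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO accounts (balance) VALUES (?)",
                 [(100.0,), (250.0,), (40.0,)])

def raise_balances_in_app(conn, factor):
    """Anti-pattern: ship every row to the application, mutate, write each back."""
    rows = conn.execute("SELECT id, balance FROM accounts").fetchall()
    for row_id, balance in rows:
        conn.execute("UPDATE accounts SET balance = ? WHERE id = ?",
                     (balance * factor, row_id))

def raise_balances_in_db(conn, factor):
    """Set-based alternative: one statement, no row data leaves the database."""
    conn.execute("UPDATE accounts SET balance = balance * ?", (factor,))

raise_balances_in_db(conn, 1.1)
```

Both functions produce the same result, but the first one scales its round trips with the row count while the second is a single statement.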
We're actually first-classing a number of languages and their ecosystem support precisely so that people can use their languages of choice and not just SQL or PL/pgSQL.
> There is still a huge opportunity for a methodology (accessible to non-operators) to get systems in production easily, where we wouldn't feel the need to force it all into these kinds of models.
Nomad is a simpler way than K8S to run not just containers, but plain old binaries. Can’t beat K8S network effect. Technical merit or economical efficiency are secondary factors in tech trends.
This is an interesting project, but I'd be unlikely to use it, for a couple of reasons:
- In my experience the performance of the stateful DB server has been the biggest bottleneck when scaling - it's much easier to scale the stateless application servers which sit between end user requests and your DB in a traditional architecture. So usually I'm wanting to move as much work as possible away from the DB in order to squeeze the most out of it before needing to shard the DB or move to a different solution, rather than moving more responsibilities into the DB.
- It's frankly pretty scary to load a C extension into postgres which is opening ports and parsing requests etc - bugs in it could crash the server or open security holes, and if you were able to exploit a vulnerability you'd be able to grab any of the data in the DB and easily exfiltrate it. This would be less of an issue if using this for an internal service which isn't directly exposed to the internet, but it still could make it easier for an attacker to escalate their access. (This isn't a 'it should be rust' comment really, even if this was in rust it would still be pretty worrying).
- Even if think you only need simple CRUD actions, over time you tend to need more and more logic around those actions. Authentication, verification, triggering processes in other systems, maybe you make schema changes and need to adapt requests from old clients, etc. It's really nice to have a more heavyweight application server where you can implement that logic - I'm pretty skeptical that row level permissions, triggers etc will be able to cleanly handle all those as you add new requirements over time. This applies also to other tools for more directly exposing your DB ( e.g. PostgREST ). IMO starting off using a tool like this is really just setting you up to have to do a pretty painful rebuild later on.
Am I missing something here, maybe I have misunderstood the intended use case?
These are good points, but I'd like to nuance some of them. Disclaimer: I work on an SQL web application builder [1] that shares a lot of the philosophy behind Omnigres.
- about scaling: you have to get very far before saturating a single postgres server. A lot of applications certainly do get to that point, but most don't. And once you get there, scaling postgres is definitely more work than scaling a stateless service, but it also gives you a lot more in terms of performance, reliability and further scalability.
- about C being scary: as a Rust aficionado, I am not going to contradict you. But Postgres itself is already C, and @yrashk is not just any C developer. Notably, he contributes to Postgres itself.
- about managing complexity: PostgREST, Omnigres, Hasura, SQLPage and other tools that simplify building directly on top of the database never require exclusive access to the database. You can always put some of the complexity outside when you need to.
>- about scaling: you have to get very far before saturating a single postgres server. A lot of applications certainly do get to that point, but most don't. And once you get there, scaling postgres is definitely more work than scaling a stateless service, but it also gives you a lot more in terms of performance, reliability and further scalability.
Thank you for saying this. Whenever Postgres as a platform is brought up, the horizontal-scaling people often make the parent thread's argument. I have been developing since Apple computers had floppy disks, and there are rarely many situations where I needed to saturate a single Postgres server.
And even if one did get to the point where that happens, with the introduction of Hydra or other Postgres columnar stores, we can just drop that in. Most users will never get to the point where they need to saturate a single Postgres server. Also keep in mind that when processing large amounts of row data, writing the logic in middleware is just not as efficient or fast as doing it in Postgres, where it has native access to the data and data manipulation.
On scaling, yeah a single postgres server can handle a lot. For us we were well past the million user mark before running into serious issues. However, a lot of how we were able to keep postgres working for us as we grew was by shifting work from postgres to our stateless services like I alluded to before - e.g. making our SQL queries as simple to execute as possible even if it means more work for the client to piece the parts back together.
If everything had been running inside the database we wouldn't have had that option and we'd probably have hit scaling limits much earlier - I guess we could have split off the traffic to the highest traffic endpoints and have those handled by a separate service calling the PG db, but then you get into issues with keeping the authentication etc consistent.
Re security - yep, PG is already using C to parse untrusted inputs from the network, which is also scary, but it's (hopefully) well reviewed and mature code - and even so, I wouldn't want to expose PG's usual wire protocol port to the internet, so it's hard to imagine exposing HTTP from postgres to the wild west.
Ultimately it probably is just a question of the sort of project it's being used for - if it's for something that's not going to need to get to larger scales, handle a lot of complexity over time, or pass security reviews, and your main goal is simplicity, then maybe an approach like this is a good option. I've just found that things tend to start off looking small and simple and then turn out to be anything but, so I'd rather run `rails new` and point it at a standard PG server - which would be just as simple and productive when you are starting out, and can keep scaling as your customer base and team size grow up to the size of Shopify, Github, or Kami (shameless plug).
Avoiding scaling the stateful part is also a path to hammer-nail syndrome - you start using less and less of the database system because you keep pulling things out since that's the only place you've established the ability to add CPU capacity and with that come a host of new and old issues.
Nice to see Omnigres trending on hn; congratulations Yurii!
The project is interesting, and thought provoking, because it goes against the often recommended "good practice" of separating storage and compute. Doing the exact opposite has a lot to offer in terms of performance, simplicity, and speed of development.
I am currently also working on a database-first web application framework [1], with different goals and use cases, and I bet we'll see more of these in the future.
Oracle has had this for ages; I built websites by printing HTML from the database somewhere back in '97/'98. Oracle's framework has grown into an entire low-code development tool running in the database: Oracle APEX.
It works and can scale far enough that it allowed entire corporate websites to be run from an oracle database.
> the often recommended "good practice" of separating storage and compute.
This gets me thinking... has developer conventional wisdom ever recommended binding things together? Or does it only ever recommend more separation, more abstraction?
10 years ago when people still used the term "big data" there was some enterprise buzzword usage around "hyper convergence" that was supposed to make it easy to bring compute and storage together/move compute jobs close to the storage they were supposed to operate on. I guess the idea was something like if you had a map-reduce job, you have some scheduler that runs workers on nodes that are in the same rack/directly connected to the storage partition they'll need.
React is a recent example of the “right” thing being less separated than prior conventional wisdom - we’re now mixing JS, often styling, and “HTML” when the historical powers that be demanded separation.
Mechanical sympathy. And conventional low-latency systems do that as much as possible - binding all sorts of layers. When you're picking the first few bytes of the packet and semi-parsing it, you don't traditionally and conventionally have that go through some vtable and perform dynamic dispatch to find the right parser or whatever. It's all plugged into everything monolithically.
At some point during the dot-com era, Microsoft toyed with adding an HTTP server to its SQL Server product to enable similar development paradigms. My recollection is that this was a disaster on several fronts, not the least of which was that successfully breaching the HTTP server gave an attacker direct access to the database. This (and other considerations) make a good case for the kind of separation-of-concerns which would avoid entwining API service with DB access.
I read the README, but it focuses on the practical aspects of getting the solution to work, not its philosophical justification.
edit: When I say "disaster" for the MS SQL Server attempt, I don't believe anyone ever experienced a breach, etc. But, as I recall, there were no large customers (MS caters to the larger corporate crowd) with IT departments willing to risk exposing their DB servers to the "DMZ zone" of public Internet access (or only one layer past it, such as behind a proxy). So from that standpoint the disaster was providing a solution nobody (with money and corporate experience) wanted. I guess for hobbyists or Silicon Valley MVPs, the concept might have some legitimate appeal.
I work at a shop that viewed MSSQL as a platform just like the OP describes.
Of course no one ever pointed that out to me, but as we migrated more and more away, and the graybeards became upset about moving logic out of stored procedures into the application... and async tasks out of Agent jobs into an application job server... it dawned on me.
I would not recommend this path, but that is probably because I understand deployments and platforms. If they can fix the ergonomics and modernize the process, I could see it working out.
I suppose replacing your OS with postgres could have its advantages.
There was a similar tool developed by Oracle called mod_plsql, which served as an Apache web server module. As far as I remember, it allowed you to configure a mapping from a URL to a stored procedure. These stored procedures could receive HTTP request parameters and return HTML content.
I've always felt that this approach is the right way to build applications. Application servers seemed like an unnecessary layer to me, essentially serving as intermediaries that merely passed data from the database to the browser. In the past, they played a more critical role in generating HTML, but nowadays, application servers are primarily used for handling APIs. Consequently, they often lack meaningful tasks to justify their existence.
Having your code closely integrated with the data also has the benefit of improving performance.
Yes, I believe this is often used as part of Oracle's APEX (Application Express) tool, which has similar goals to Omnigres. It's used to put together internal business CRUD and reporting apps very quickly in some orgs I've worked for.
Cool project! About a decade ago, I had a job that involved entering data into an Oracle-based equivalent of this. I wish I could remember what it was called, but perhaps it is better for such a monster to remain nothing more than a myth.
The intention we have is to take what's right with the colocation approach and apply the learnings from what was wrong with those that came before (such as the above, or Illustra).
Essentially, we're aiming to create a modern polyglot runtime that has an embedded database with all its functionality (Postgres), with a contemporary, lightweight DX.
I saw omnigres mentioned on a shakti mailing list. Many of these concepts have been around with k (kdb) and some other APL-like languages for some time. But Postgres will give these concepts a much broader appeal.
I did something similar once, but integrated everything RESTful into nginx, which connected to Postgres directly, which ran triggers with JS. What I didn't like about it was being bound by Postgres's max connections, and then blocking a Postgres thread when running JavaScript in it. The parallelization story just wasn't competitive enough.
Oh it's Yuri Rashkovskii, one name I actually remember from the olden times.
I think you have a good project on your hands; happy to see you could establish it as a tech startup. Over the years my mind keeps wandering to this place where apps are much more tightly integrated with the database, where a whole class of engineering tasks (and the bugs caused by them) is eliminated completely. I'll be checking your progress.
I was interested in how you do authentication, but currently the 'Omni_web' link & readme is missing. Suggest you could use the pgjwt[0] approach for this for simple logins in the short term, but supporting OpenID Connect would be a larger engineering effort.
The current approach employed by Omnigres users is good old sessions (since latency to the database is non-existent) and omni_txn's transactional variables (https://docs.omnigres.org/omni_txn/variables/) to store session-related data.
This way we don't need to handle the difficult parts of JWT (forced expiration, etc.) and the mental model becomes rather simple.
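As a minimal illustration of that server-side session pattern (using SQLite so it runs standalone; the table and helper names are hypothetical and not Omnigres's omni_txn API): the session record lives in the same database as the application data, so lookup is local and forced expiration is just a DELETE, with no JWT revocation machinery.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (token TEXT PRIMARY KEY, user_id INTEGER)")

def create_session(conn, user_id):
    token = secrets.token_hex(16)  # opaque random session identifier
    conn.execute("INSERT INTO sessions (token, user_id) VALUES (?, ?)",
                 (token, user_id))
    return token

def authenticate(conn, token):
    row = conn.execute("SELECT user_id FROM sessions WHERE token = ?",
                       (token,)).fetchone()
    return row[0] if row else None

def revoke(conn, token):
    # Immediate, authoritative invalidation - the part JWTs make hard.
    conn.execute("DELETE FROM sessions WHERE token = ?", (token,))
```

With zero latency between the request handler and the session table, the usual argument for stateless tokens largely disappears.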
Also forgot to say, love the project and love the objectives! Allowing Postgres to do it all means so much from a server management perspective - imagine not having to manage any redundancy/performance/analysis outside of the health of the Postgres box - fantastic :)
Thank you for your kind words! The original dream was indeed a "one box" approach, even when they scale horizontally and to the edge. Perhaps a more accurate depiction of this is a "unified interface".
PostgREST is a traditional web service that opens TCP connections to the database. Omnigres allows you to run your own custom HTTP request handling code inside the database.
A couple of questions (with notes): yes, I'd really like to write code next to the DB in languages more suitable for the task, like Python. But at the moment only SQL is supported..? Could one connect with a Jupyter notebook somehow and have a REPL-like experience with an Omnigres instance?
Also, this [1] seems intriguing. How do containers connect to the DB? What would the performance differences be versus the "internal" approach? Is this feature more like Lambda, or for long-running processes? Or something else? In any case, very interesting.
You can use any language Postgres supports or will support to write your logic.
We are adding first-class Python support right now. It's already possible to extract stored functions from decorated functions and their type hints, and we're working on providing standard Python APIs like DBAPI, WSGI support, etc. We have a branch on which we ran Flask applications inside Postgres. As it matures, it will be merged and documented.
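To give a feel for the idea, here is a hypothetical sketch of how a decorator could derive a stored function's SQL signature from Python type hints. The decorator name, type mapping, and generated DDL are illustrative only; Omnigres's actual API may differ.

```python
import typing

# Illustrative mapping from Python types to Postgres types (an assumption).
PG_TYPES = {int: "bigint", str: "text", float: "double precision", bool: "boolean"}

def stored_function(fn):
    """Attach the CREATE FUNCTION DDL a runtime could execute; body elided."""
    hints = typing.get_type_hints(fn)
    returns = PG_TYPES[hints.pop("return")]
    args = ", ".join(f"{name} {PG_TYPES[t]}" for name, t in hints.items())
    fn.ddl = (f"CREATE FUNCTION {fn.__name__}({args}) "
              f"RETURNS {returns} LANGUAGE plpython3u AS $$ ... $$")
    return fn

@stored_function
def add_tax(amount: float, rate: float) -> float:
    return amount * (1 + rate)
```

The function stays callable as ordinary Python, while `add_tax.ddl` carries the derived signature the database side would need.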
As for the containers, they currently receive the database credentials via environment variables. The performance characteristics of such applications aren't as good at the moment. The intended use case for this is third-party apps and legacy pieces of one's own applications.
Please keep in mind that this extension hasn't received many updates in the past couple of months, but there are upcoming changes that will simplify it a lot and provide more functionality. We also have a future experimental goal of going all the way through a runtime like crun to remove moving pieces.
Thank you. I'll be sure to check this one out when the Python support lands.
Perhaps not same level as containers, but WASM runtime could be a powerful addition here. Or a container running said WASM code. I’m thinking more about untrusted client code for ad hoc data analysis and such.
One additional question: are Postgres foreign data wrappers going to be supported?
Love this. The problem I have with this approach, though, is SQL itself. Until we get simple things such as inserts and unions based on field name rather than field order, there are simply too many pitfalls for me.
I believe what they’re talking about is being able to pass something like a struct of key-value pairs, which correspond to column names in the inserted table. This would let us pass straight records/maps/whatever you want to call them to the db, and have the correct insert performed, instead of the genuinely horrible string-interpolation-hope-you-got-the-right-number-of-params-in-the-right-order-and-too-bad-if-you-didn’t dance that we currently have.
The interface we have with databases could be so much better.
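For illustration, a small helper in the spirit of that comment: build the INSERT from a mapping of column names to values, so field order in the code no longer matters (SQLite here for a standalone example; the helper name is hypothetical, and column names must still come from a trusted source since they are interpolated into the SQL text).

```python
import sqlite3

def insert_record(conn, table, record):
    """Insert a dict as a row, matching columns by name rather than position."""
    cols = ", ".join(record)                     # column list from the keys
    params = ", ".join(f":{c}" for c in record)  # named placeholders
    conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({params})", record)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
# Key order differs from the table's column order; it works anyway.
insert_record(conn, "users", {"email": "ada@example.com", "name": "Ada"})
```

Named placeholders already exist in most drivers; the complaint is that SQL itself only offers this for WHERE parameters, not for the column lists of INSERT and UNION.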
This architecture has lots of potential for prototyping and internal tools. I would think twice before exposing some app built with it to the world, though.
Currently, we have an early version of `omni_schema` [1] that allows traditional incremental migrations and in-place migration of certain objects.
However, we're not quite satisfied with this and are working on a more sophisticated system that would allow us to derive incremental changes where possible, lint the schema, and load application code with the right dependencies on types and other functions where necessary.
It does not have any UI yet; that's something we haven't thought much about. It's technically feasible, we just need a good story to drive a good experience there.
It's not currently spelled out in the readme, but our approach does lend itself to straightforward shipping of complete systems (data + code) to the edge, because it's effectively just a replication of the database. We're working on further primitives that facilitate such an operational model.
We believe that a practical edge [backend] requires the presence of data next to the code, which is precisely what Omnigres promotes.
Actually, some of the feedback that we received from fairly close to the source, was that Postgres as an application platform of a sort was the original vision for it.
But I can see how this work can be perceived through your lens.