PostgreSQL 9.2 released (postgresql.org)
271 points by lest on Sept 10, 2012 | hide | past | favorite | 77 comments


We've been pretty excited about this release for some time at Heroku, as it's loaded with great features. In addition to the JSON datatype, here's a longer list of features that are pretty noteworthy in the release:

- Allow libpq connection strings to have the format of a URI

- Add a JSON data type

- Allow the planner to generate custom plans for specific parameter values even when using prepared statements

- Add the SP-GiST (Space-Partitioned GiST) index access method

- Add support for range data types

- Cancel queries if clients get disconnected

- Add CONCURRENTLY option to DROP INDEX

- Add a tcn (triggered change notification) module to generate NOTIFY events on table changes

- Allow pg_stat_statements to aggregate similar queries via SQL text normalization. Users with applications that use non-parameterized SQL will now be able to monitor query performance without detailed log analysis.
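
A quick sketch of a couple of those in action; the table, index, and host names here are hypothetical:

```sql
-- The new json type validates its input on the way in.
CREATE TABLE events (id serial PRIMARY KEY, payload json);
INSERT INTO events (payload) VALUES ('{"type": "signup", "user": 42}');
-- INSERT INTO events (payload) VALUES ('{not json}');  -- rejected: invalid JSON

-- DROP INDEX ... CONCURRENTLY avoids taking a lock that blocks
-- reads and writes (it cannot run inside a transaction block).
CREATE INDEX events_id_idx ON events (id);
DROP INDEX CONCURRENTLY events_id_idx;
```

And the libpq URI format means a connection string can now look like `psql postgresql://user@db.example.com:5432/mydb`.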


Since postgres has basic json type support now, and PL/Javascript exists, it's only a matter of time until an extension appears that lets you deploy javascript applications directly to the database.

Who needs CouchDB or Node.js when you can just say CREATE EXTENSION 'couchnodegres.js'


Once the database can receive and return JSON, you can treat it as a webservice and not have the client involved in SQL. I think this is huge.


Just FYI, this is already half-true:

"With PostgreSQL 9.2, query results can be returned as JSON data types."

It also supports the PL/V8 stored procedure format, which is JavaScript: http://code.google.com/p/plv8js/wiki/PLV8

It's very, very close.
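
As a sketch of how close (this assumes the plv8 extension is installed; the `users` table and `get_user` function are hypothetical names): a stored procedure written in JavaScript that hands back a 9.2 json value.

```sql
CREATE EXTENSION plv8;

CREATE FUNCTION get_user(uid int) RETURNS json AS $$
  // plv8.execute runs SQL from inside the JavaScript function
  var rows = plv8.execute('SELECT id, name FROM users WHERE id = $1', [uid]);
  return JSON.stringify(rows[0]);
$$ LANGUAGE plv8;
```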


Can you elaborate a little bit on what an architecture like that would look like? I had great hopes for CouchDB once.


HTML/JS frontend calls simple node server that does nothing but call a stored procedure like

   spc_handle_request(req_headers, req_body, http_method, ...)
stored procedure does simple routing and (in pseudocode) does something like

  insert into table (a, b, c) values (to_json(req_body).a, to_json(req_body).b, to_json(req_body).c);
or

  select row_to_json(emp) from emp;
Time to implement simple crud app: 10 minutes.


Yet it will not cease comparisons of Postgres to MySQL.


I am actually pretty excited about the native JSON support, and overall I am a huge fan of Postgres, but this is the most press-release-y press release ever*. By that I mean that the quotes are way too "perfect", the kind you only see in press releases. Some PR or marketing guy wrote them, then showed them to the person to whom they'd be attributed to get their OK. Nothing inherently wrong with it, just struck me as funny.

* having written more than my share of press releases in my time


> Some PR or marketing guy wrote them then showed them to the person to whom they'd be attributed to get their ok

I happen to know the guy who, AFAIK, was in charge of preparing the press release. He's actually a major contributor to the codebase, a geek par excellence, and he's been nagging people on the development mailing list to get quotes from them.

The fact that apart from hacking C code he also knows how to write a catchy press release just makes him all the awesomer :)


Hey man, I was PR & marketing for 6 years before I became a programmer. Valuable skillset to have.


That was some good combination. I always admire programmers that can also produce good design. Now I will add the marketer + programmer combination to my "admiration" list!


Yep, the people that can both produce and sell are a rare and very valuable breed, if only because they can actually understand what the other marketers need (or will pay for!).


This. Good devs who are able to pimp their great work (such as anything in this Postgres release) are the devs that get the coolest job offers. So props to the great work by the Postgres team and their tasteful, fact-based self-promotion. :-)


I'm one of those quotes (talking about NewsBlur) and I just tried to give back to an amazing tool in my tech stack by being effusively honest. Honestly, I think they deserve more credit and was a bit surprised that they chose my quote over the dozens of undoubtedly heavier stacks that also use Postgres. Surprised in the good way.


I think Postgres doesn't do enough to toot its own horn. Look at its website. Come on. Spice things up a bit. I'm inclined to think hardcore users purposely downplay the product because they see it as their secret sauce.


It actually looks extremely professional. In my experience, when you're trying to convince a large company to use a platform, you want a clean, business-case-focused site without excessive flair. Postgres nails it on those points, I think.

It's reminiscent of IBM, which is a huge compliment in this context.


The last major web site update was in late 2004, according to my bisection of the Wayback Machine. It's a project, not a company, and there is no marketing or design budget. For example, Magnus (who does some Windows maintenance, is a principal author of the new pg_receivexlog, and knows a lot about the SSL support) also worries about the web-based mailing list archive program and the CSS of the manual, if memory serves.

Nobody's holding back anything. It would be wonderful if someone wanted to contribute their design skills to the web page or documentation -- talking to the mailing list pgsql-www is probably the way to do that. I am positive such a person would receive profuse appreciation, and could maybe parlay that into personal advantage if the satisfaction of the impact such improvements would have is not quite reward enough by itself.

The project runs its own web site infrastructure (many regimes have come and gone in Postgres' history; only semi-recently was pgFoundry decommissioned for the purpose of new projects), and it's a little arcane -- but if someone really wanted to take ownership and move things beyond maintenance into progress, perhaps changes could be made.


> I am actually pretty excited about the native JSON support

There really isn't much to it yet; it's basically a varchar field with JSON validation. It's not like you can query or index it.

I actually find the `row_to_json` and `array_to_json` functions more interesting for now (though strangely enough there's no `hstore_to_json`)
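
Both are one-liners to try:

```sql
SELECT row_to_json(t) FROM (SELECT 1 AS id, 'ada' AS name) t;
-- {"id":1,"name":"ada"}

SELECT array_to_json(ARRAY[1, 2, 3]);
-- [1,2,3]
```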


You can query and index json columns using plv8. http://lanyrd.com/coverage/?speakers=leinweber&topics=js... has more details.


As one of the main contribs noted in a previous HN article, the important part was to get JSON conversion into core, full stop - because extensions (like hstore) can't declare dependencies on other extensions. Having JSON in/out in core is an enabling feature for all the rest.


That limitation (not-lucid dependencies between contribs) is rather sucky, but jumping into the world of dependency management will require a deep breath for sure.

Hence, instead of waiting for that, JSON support went into un-adorned Postgres to solve a common use case in the intervening years.

The jargon for what this gives you is "a stable oid". Also more or less equivalent to a "system OID" at this time. These "Object IDentifiers" in Postgres are unsigned int32s that are (almost?) never reused (only accrued, or removed), and are all under the number 10000, and all assigned statically by hand. A-priori knowledge of these numbers can simplify writing extensions dramatically, but clearly this is not scalable for a future with dependency chains in extensions.
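
You can see the json type's stable oid in the catalog; it sits well under the 10000 line like the other hand-assigned system OIDs:

```sql
-- In 9.2 this shows oid 114 for typname 'json' (treat the exact
-- number as an implementation detail rather than documented API).
SELECT oid, typname FROM pg_type WHERE typname = 'json';
```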


I think you can index JSON using a function based index, which will probably cover a lot of use-cases.


In addition, I don't know what you'd index if it wasn't through a function. Indexing JSON data as JSON? Why?


Not indexing JSON data as JSON, indexing a record based (in part) on the content of its JSON field(s) selected via e.g. an xpath-like query.


Hence using a function to extract that and then indexing the function's output. That can be done in PostgreSQL right now.


Eh, if you want to index it you have the hstore. I do wonder why they don't deserialize to an hstore type, though.


I'm sure something like that will be added. One of the issues they're facing is that hstore is a string => string map, so a conversion from json to that will be lossy (i.e. 'that "true" in hstore, is that the string "true" or a converted boolean true from json?').

What you can do right now, though, is use PL/V8 to query the JSON fields. Then you can use functional indexes to still be able to speed up queries. Yes, you could do that before, but now there's a guarantee that a field of type json contains just that, meaning that your application logic will get simpler.
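
A sketch of that PL/V8 + functional index approach; the `docs` table, `body` column, and `json_string` helper are hypothetical names, and plv8 is assumed to be installed:

```sql
-- Extract one top-level key from a json value as text.
CREATE FUNCTION json_string(data json, key text) RETURNS text
IMMUTABLE LANGUAGE plv8 AS $$
  var obj = (typeof data === 'string') ? JSON.parse(data) : data;
  var v = obj[key];
  return (v === undefined || v === null) ? null : String(v);
$$;

-- A functional index over that extraction...
CREATE INDEX docs_author_idx ON docs (json_string(body, 'author'));

-- ...which queries of this shape can use:
SELECT * FROM docs WHERE json_string(body, 'author') = 'ada';
```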


> One of the issues they're facing is that hstore is a string => string map, so a conversion from json to that will be lossy

Or the insertion/conversion routine can assert that the JSON object is a string:string mapping only.


Why would that be helpful? I'd want arrays, hashes, integers, etc.


You won't get them in an hstore[0] column, which was the context of my reply. Of course in a JSON column you want full JSON support, but my and my parent's post were about converting back and forth between hstore and json.

Since hstore only handles string:string mapping, the choice is either to silently corrupt data by stringifying all values in the json->hstore encoding, or erroring out if the input data is anything other than string:string.

The latter would be what I'd prefer, and more in line with usual Postgres behavior.

[0] http://www.postgresql.org/docs/9.2/static/hstore.html


JSON isn't much like hstore, as JSON can be recursive. I don't think hstore can.

Still, there's no reason why you couldn't write a function to index JSON however you like.


JSON is much more powerful than hstore. Not only can it store different JSON data types (booleans, numbers) but you can also store documents multiple levels deep, which is not possible with the hstore type.

Having said that, hstore is perfect when storing simple key/values. It's been battle tested and used for years and has some powerful and native indexing possibilities.


hstore is much more limited, it's a flat mapping of strings to strings (so a single level of JSON object). And as I noted, there does not seem to be any convenience method for converting between hstore and JSON data (or the other way around)


A hstore field can only contain strings, so it's not a one-size-fits-all indexing solution in this case.


If the proposed HTTP REST API (http://wiki.postgresql.org/wiki/HTTP_API) makes it into the product then the native JSON support is going to be even more useful. (I'm excited to see that there's an experimental attempt at it, too: https://github.com/bjornharrtell/postgresql-http-server)


Very interesting, the rest api would make psql into a total feature-for-feature replacement of mongo for us.


You're kidding, right?

PostgreSQL can't horizontally scale easily or properly handle JSON right now compared to MongoDB. It also has a fixed schema, which makes database migrations a necessary evil again. Sorry, but the idea that PostgreSQL is a feature-by-feature replacement is pretty laughable.


> PostgreSQL can't horizontally scale easily

Sure it can. Skype did it, Hitachi¹ did it.

¹http://www.pgcon.org/2008/schedule/events/57.en.html

> properly handle JSON right now

Well it has V8 running on it.

> It also has a fixed schema

You don't need to use it like that if you don't want to.


Easily being the appropriate word here. In that article you linked, they had to manually set up sharding. With something like CouchDB I can press a button to add a new node to the cluster. Again: easily is what I am talking about.

And again, V8 is nice and all, but it is rather bolted on, as opposed to something that was built from scratch to support JSON. This shows in terms of feature support and, most importantly, ease of use. I can't just annotate HashMaps, Lists, etc. in my Java classes and have them serialized to PostgreSQL in JSON format.

I am just saying that PostgreSQL is great and all, but it is not a proper JSON document store style database any more than hacking SQL on top of MongoDB would make it an RDBMS.


I noticed that Mattdebord's post qualified it by saying feature-for-feature replacement of Mongodb for us. He didn't claim it was for everyone. Why all the hate?


That's just taligent being taligent.


Not sure "press the button" is the case for some. I can't speak for CouchDB, but I know many people find it not so easy with MongoDB.


You don't actually know how we use it, so you are thoroughly unqualified to criticize what I've said.


Maybe not easily, but it can scale far more flexibly than MongoDB. Take a look at Postgres-XC, which offers a two-tiered clustering solution for Postgres inspired by Teradata. A bit complex, but you don't give up your formal proof of data integrity in order to get it.


Here is another attempt, which exposes all your SPs as a JSON-RPC API. http://search.cpan.org/~joeljac/DBIx-Pg-CallFunction-0.012/s...


My two favorite features are not so high up in the PR docs, though.

The first is SECURITY BARRIER and LEAKPROOF, which give us the ability to rethink how to build multi-tenant applications. This is a game changer and will get even better in future versions, I am sure.

The second is NO INHERIT constraints, which I will certainly be making good use of. It is also a complete game changer when it comes to table inheritance and partitioning, and my main use will be things like CHECK (false) NO INHERIT to ensure that a table in fact never has rows of its own.

There is an amazing amount of good stuff going on around Postgres right now; Postgres-XC was recently released, and there is more. It is an amazing data modelling platform and ORDBMS.


Could you talk about them further and explain why they're your favorite new features?


The basic thing is this:

One area where Oracle is ahead is in multi-tenant applications. They have an approach where you can build filters on data based on specific criteria. You can do this in PostgreSQL too, but the way you had to do it before this release involved either add-ons, actual partitioning of the data, or tweaks to avoid leaky functions in the filters altogether.

Security barriers are important in catching up to Oracle in this regard and they allow us to build more integrated multi-tenant applications with greater access to the db by the tenants.

NO INHERIT constraints open up a fairly large area of PostgreSQL for use or misuse, because you can now partition a primary key between parent and child declaratively. My own use will be to prevent inserts into the parent table directly. Table inheritance is a really neat feature if you use it in non-traditional ways. It is not particularly useful for its advertised use (set/subset stuff outside of partitioning). However what it allows you to do is to compose your tables out of smaller re-usable pieces each of which can have complex derivations of data attached. For example, if you want full text search on comments on a lot of your tables, you can create a consistent, centralized interface for this using table inheritance.
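
The "parent never has rows of its own" trick looks like this (table names here are hypothetical):

```sql
CREATE TABLE measurement (
    id      bigint NOT NULL,
    logdate date   NOT NULL,
    CHECK (false) NO INHERIT  -- rejects inserts into the parent only
);

CREATE TABLE measurement_2012 (
    CHECK (logdate >= '2012-01-01' AND logdate < '2013-01-01')
) INHERITS (measurement);

INSERT INTO measurement_2012 VALUES (1, '2012-09-10');  -- fine
-- INSERT INTO measurement VALUES (2, '2012-09-10');    -- fails the CHECK
-- Yet SELECT * FROM measurement still sees the child's rows.
```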


Regarding the inheritance features, he's been writing an extensive series on the subject recently:

http://ledgersmbdev.blogspot.com/2012/08/intro-to-postgresql...

(though he should provide forward/backward links in the posts -- as it stands you need to find the other parts in the archive links on the right)


Managing the forward/backward links in a series this long while it is still coming out is a pain. They will be added as the series draws to a close next week though.


> The first is SECURITY BARRIER and LEAKPROOF, which give us the ability to rethink how to build multi-tenant applications

It doesn't really let us rethink it, does it? It just closes a hole in how you would typically have done it anyway. Is there any way to solve the problem of having to create a new connection as company_X_user for every single request?


I think a number of people might find such functionality useful -- every once in a while I run across people with tens or hundreds of thousands of schemas, because a program written to query against a particular schema is more convincing and easy to audit than row-based multi-tenancy.

The problem is that there are performance and tooling issues when you have that many database objects, so those people can end up sad, even though the model was one they basically were happy with.

So I think adding database features for lucid multi-tenant work while reducing the risk of cross-tenant leakage is a movement in a good direction.


> Is there any way to solve the problem of having to create a new connection as company_X_user for every single request?

Yes, you can accomplish that with Veil. I agree it would be nice if it were built in, but an add-on is OK too.


I'm always impressed by the PostgreSQL team.

Personally, I'm excited about the range types and I can see immediate usefulness for them. My own applications aside, anything that helps developers create schemas that are better able to handle temporal data is a good thing.


Range types combined with exclusion constraints solve the problem of not allowing overlapping bookings using a database constraint. The solution is simple, clean, and flexible unlike the workarounds.

Example of adding such a constraint:

  ALTER TABLE reservation ADD EXCLUDE USING gist (room WITH =, during WITH &&);
In this example a room cannot be double booked.

EDIT: This is a feature entirely unique to PostgreSQL.
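
For completeness, a sketch of the table behind that ALTER (the btree_gist extension is needed so that plain equality on room can take part in a GiST index):

```sql
CREATE EXTENSION btree_gist;

CREATE TABLE reservation (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);

INSERT INTO reservation VALUES (101, '[2012-09-10 14:00, 2012-09-10 15:00)');
-- An overlapping booking for the same room now fails with a
-- conflicting-key error; a different room is fine:
INSERT INTO reservation VALUES (102, '[2012-09-10 14:30, 2012-09-10 15:30)');
```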


I'll be speaking on this use case next week at Postgres Open ( http://postgresopen.org ) as part of a larger demo of temporal databases in postgresql.

Jonathan S. Katz will also be presenting on Range Types.


For example: if you have a coupon that has a start and end time (but the end time is optional), typically you'd have start_at and end_at datetime columns. The query for checking for an active coupon would be:

    select * from coupons 
    where start_at < now() and (end_at is null or end_at > now())
Now, you can have a single column that represents the range of time that the coupon is active.

    select * from coupons where now() <@ duration;
Plus, exclusion constraints. So you could prevent the database from storing a coupon that was active at the same time as another coupon.
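
The single-column version might be declared like this (hypothetical table; an unbounded upper end models "no expiry"):

```sql
CREATE TABLE coupons (
    code     text PRIMARY KEY,
    duration tstzrange NOT NULL
);

INSERT INTO coupons VALUES ('LAUNCH', tstzrange('2012-09-01', NULL));

SELECT * FROM coupons WHERE now() <@ duration;
```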


Also overlooked is the massive speed improvement for in-memory sorting... that's what I've been waiting for.


I was expecting the json support, but SP-GiST is a very welcome surprise. http://www.postgresql.org/docs/9.2/static/spgist-intro.html

User-extensible spatial index types. This makes Postgres perfect for online machine learning.



Now someone please make usable tools for it on OS X. Postgres badly needs front-end tools of the quality of Sequel Pro.

I am a huge fan of Postgres, it's never let me down. But data exploration, ad hoc querying and such is a pain in psql. These tools are badly needed.


I used Navicat until I finally decided to try M-x sql-postgres in conjunction with a SQL Mode 'scratch' buffer from which I can simply C-c C-c.

I'm not sure about Sequel Pro, but being able to [reverse] incremental search on \d and \dt is a blessing.

If for some reason you want to eyeball sample data for a table with a ton of columns that won't fit on one screen, then a GUI's resizable columns are nice. But usually I just type out the columns of interest; us programmers are usually good typists.


Does pgAdmin not work on OS X? I'm pretty sure it does.

Just found this: http://wiki.postgresql.org/wiki/Community_Guide_to_PostgreSQ...


The complaint usually is "it doesn't look like a native app." Mac users are famously picky about only using apps that look native ;-)


Navicat[1] works well for me. Plus side, I can use the same app for all databases.

[1] http://www.navicat.com


The absolute best feature of 9.2 is that you can now add \x auto to your psqlrc file and never have to suffer unreadable results again.


what does '\x auto' do?


It's a display feature in the "psql" client program.

Normal result:

     column1 | column2 | column3 
    ---------+---------+---------
           1 | a       |     9.9
           2 | b       |    19.9
    (2 rows)
Using \x:

    -[ RECORD 1 ]-
    column1 | 1
    column2 | a
    column3 | 9.9
    -[ RECORD 2 ]-
    column1 | 2
    column2 | b
    column3 | 19.9
The first form is tabular and works well for a few columns, but doesn't work well when there are many columns, because the lines start to wrap. So you use \x for wide tables to make the result readable (but, obviously, fewer rows are shown at a time).

Using "\x auto" automatically chooses which format to use based on your terminal width.
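
To make it the default, one line in psql's per-user startup file ~/.psqlrc is enough:

```
\x auto
```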


Nice!

I tried using it for a result that contains really wide columns. I'm seeing a screen full of hyphens separating the rows.

You'd think that the hyphens would stretch across just one line of the screen, instead of across the whole result set. See https://img.skitch.com/20120910-fn1abpp3w94yhg63hc8yemt4a4.p...


Oh, interesting.

That's a problem for very wide fields, which aren't going to be handled very well even using \x.

\x was meant to handle large numbers of fields, or slightly wider fields.

But you're right, maybe that could be cleaned up a little more.


Oh, so like MySQL's "\G" command.


\x in psql switches whether results are shown as a table or as a list. Before, you'd have to run a query, see that you can't read it, hit \x, and then run the query again. Now you can just have it figure out which one to show you based on your terminal size.


"PostgreSQL 9.2 will ship with native JSON support, covering indexes, replication and performance improvements, and many more features. We are eagerly awaiting this release and will make it available in Early Access as soon as it’s released by the PostgreSQL community," said Ines Sombra, Lead Data Engineer, Engine Yard.


One thing I love about PostgreSQL development is all the small nice fixes added in every version.

Of the small fixes in 9.1 my personal favorite is probably the cleanup of pg_stat_activity. There are also many other nice small fixes like improved tab completion for some commands and the ability to set environment variables in psql.


We are excited about Cascading Replication because it reduces network data transfer over WAN when we have multiple Read Replicas within and across datacenters.
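
For reference, a cascading standby in 9.2 is set up like any other standby; the only difference is that its recovery.conf points at an upstream standby instead of the master (the hostname here is hypothetical):

```
standby_mode = 'on'
primary_conninfo = 'host=standby1.dc2.example.com port=5432 user=replicator'
```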


MySQL only gets mentioned 2 (now 3) times in this thread? Oracle seems to be doing a job!


Why would anyone bring up mysql?



