PostgreSQL vs MySQL: an apples to oranges comparison (opens in new tab)

(ledgersmbdev.blogspot.co.uk)

148 pointsottbot13y ago67 comments

67 comments

Really well thought out article.

The single app vs. multiple apps distinction is a good frame for constructive discussion.

Not so much because Postgres is better or vice versa but because in these days of agile and YAGNI development* there's a real problem with some developers not thinking further than the next iteration.

I like that development shouldn't be weighed down by 'what if we'll need X someday maybe' but it should have at least a rough roadmap - the terminology i use next is irrelevant but I'm getting at the idea of having macro, micro and miso level goals defined - in the future we'll deliver architecture concern Y, so let's not box our implementation into an incompatible state.

crazygringo13y ago

Wow, this is one of the best articles I've read on HN in a long time. It's a fantastic insight, and explains so much.

I would kill for a whole series of articles like this, explaining the philosophical underpinnings behind various technological alternatives.

So often, HN comments seem to degenerate into PHP vs everything else, iOS vs Android, OSX vs Linux, MySQL vs PostgresSQL, based on nothing else but people's personal experiences and preferences.

Taking a step back and exploring the why and history behind products, and what problems they were intended to solve, feels so refreshing and educational.

Kudos to the author (Chris Travers)!

smoyer13y ago

I've never been able to completely dismiss MySQL since it obviously works for so many users, but I also haven't found it very useful in my work. I started with PostgreSQL in the '90s and every time I tried to use MySQL, I found myself asking either "why did it do that"? or "why would I want that"?

The article describes two databases that are very different conceptually, and I think they're mirrored by users that conceptually think about data in different ways. So if one or the other works well for you, go for it.

I should note that I have successfully deployed applications on both platforms ... once I started doing most of my database interactions through an ORM, the differences didn't really matter any more.

Vive la difference!

jeltz13y ago

One thing I would say matters a lot still even if you use an ORM is PostgreSQL's support for transactional DDL. No more worrying about migrations crashing in the middle. You should of course still test your migrations but one less thing to worry about is always nice.

einhverfr13y ago

Interestingly the two db's I can think of that don't support transactional DDL are both Oracle products ;-)

Oracle has something that kind of looks like transactional DDL if you stand on your head and squint, but it doesn't cover, say, changes to table schemas.... it's only side by side versioning of stored procedures.....

lmm13y ago

I think MySQL's original success came from features that are less obvious when developing (although that said it does have some friendlier development syntax (e.g. show create table) and lower startup latency in interactive use). It made running multiple instances on a single host easy. MyISAM tables give amazing performance in return for weakening many consistency guarantees, which is a worthwhile tradeoff for many use cases. Postgres is only just starting to gain the level of clustering support that MySQL has had for years.

jeltz13y ago

The performance benefits of MyISAM are exaggerated. Its advantage was that is was a decently fast and very simple engine which made it possible for MySQL to reach the market quickly.

MyISAM only gives good performance when you either have only reads or just one single writer. And even then it does not always win over InnoDB and PostgreSQL. And as soon as you get a mixed load MyISAM performance drops like a stone.

einhverfr13y ago

Actually I agree about the MyISAM tables. Those are great for lightweight content management, more or less that initial use case I mentioned.

The other thing is that when I switched to PostgreSQL for important work, I still had to keep MySQL around for database prototyping because it wasn't until 7.3 that I could drop columns from a table. PostgreSQL was pretty ugly and hard to work with in 1999 but it has gotten a lot better. I would further note that while PostgreSQL has gotten a lot faster, MySQL has become more feature-complete at the expense of speed.

VLM13y ago

Quick summary, assuming you know what MVC stands for:

Mysql - Model is in your code. PostgreSQL - Model is at least partially in your database.

There is a HUGE mistake in the article in the assumption that WRT the model design, that the database always knows best. Its possible to come up with weird situations where you just want the DB to store stuff and not nanny you. Consider a database of actual, real world, gravestone inscriptions. If someone's gravestone stone has "1890-02-30" inscribed on it, I know thats wrong but I don't care, I need to store it exactly as is for historical purposes, I don't want a DB crash or need to recompile postgres to accept it, I don't want to force the users to falsify gravesite records, I don't want to have to store as a CHAR or VARCHAR and have to write my own date handling routines in my app... The correct way to handle data modeling/integrity is to allow the app designer to decide how flexible he wants to be WRT reality, and let him decide exactly how to shoot himself in his foot.

On a bigger scale, if I made a database table and related CRUD app to store philosophical positions, if I wanted an AI to only accept "truth" then I'd put the AI in my app, I would not want the DB model and the app model to have to fight over Marx being right or wrong before the data could be stored. What if filesystems needed to verify "truth" before allowing a file to be saved? Weird.

Is it persistent storage or is it a turing complete theorem prover and why most both be in the same executable? Note I'm not claiming a "middleware" of a model is a bad idea, in fact its a great idea, it just doesn't belong in the persistant DB store anymore than it belongs in the filesystem layer.

bigiain13y ago

I think you missed the _reason_ why that matters.

If it's _just_ your app/code that's sending and retrieving data from tha database, you can pretty much do as you please.

If other code, especially other code written by other people needs to interact with that data, then explicit rules and agreements need to be made about exactly what "1890-02-30" means.

The argument in the article is that Postgres (and Oracle) have features that help in the multiple application interfacing with the same database, when compared to the MySQL and NoSQL end of the database spectrum.

It's not so much "the model" that's moved into the database, but the validation of the values stored by your model.

I think it's making a better argument than you imply. If you want to be able to store 30 Feb in your model, you'd better consider what might happen if you try and store that in a date column in your database. I'm pretty sure at least some versions of MySQL will happily let you insert that date, and "magically" return 02 (or 01) March when you query it. Is that the "expected behavior" of your Gravestone app?

einhverfr13y ago

Author here.

Mysql - Model is in your code. PostgreSQL - Model is at least partially in your database.

Also your code can be at least partly in your database which is what makes this possible.

There is a HUGE mistake in the article in the assumption that WRT the model design, that the database always knows best.

I didn't say that. However if you read the entire O/R modelling series you will see in PostgreSQL it is possible to fully define your model in an OO-like way in your database, and if you do that then that model can be re-used across applications written in different development environments. We now have proof of concept PHP classes for integrating with LedgerSMB because of the fact that our model is in our db. This makes it very easy to write classes which interop across different languages.

The fact is you can decide where you want the line to be. PostgreSQL allows you to build interfaces which give you much more intelligent data models at every line.

These tools have complexity costs though. Use where appropriate.

Mike Stonebraker's example was: Create a db query to tell you what images (in your database) are pictures of sunsets taken within 20 miles of Sacramento.

His argument for code being in the database is that the last thing you want to do is select several thousand images and hand them over to the middleware or client for processing. Instead you need some way of having the database answer this and only send you back the ones you want. He suggests:

    select id
    from slides P, landmarks L S
    where sunset (P.picture) and
    contains (P.caption, L.name) and
    L.location |20| S.location and
    S.name = 'Sacramento';

The point here is that you have two good examples of why this approach can be important here: spacial queries, and filtering out images by content using image recognition algorithms. In this way, you aren't burdening your least scalable tier with transferring MB and MB of information back to a middlware so it can perform the processing and return only a few records to the client.

VLM13y ago

"There is a HUGE mistake in the article in the assumption that WRT the model design, that the database always knows best.

I didn't say that. However if you read the entire O/R modelling series you will see in PostgreSQL it is possible to fully define your model in an OO-like way in your database"

Perhaps the area of disagreement in our interpretations is that I'm thinking "the database knows best" as in the DBA gets the last word on what can be stored vs the DEV whereas I think you're defining the data definition in the DB as a DEV task, or maybe all DEVs should be both DBA and DEV, which I don't think will work very often but when it does work it's great.

"and if you do that then that model can be re-used across applications written in different development environments."

Again, would be great if its possible. Probably one very important part of the workflow would be not to allow the DEVs to code in a MVC framework, essentially VC only, or just vestigial M like anything goes and rely solely on the DB for all data modeling. Otherwise each environment will have a different, probably incompatible, model.

"PostgreSQL allows you to build interfaces which give you much more intelligent data models at every line." "In this way, you aren't burdening your least scalable tier" There's no free lunch, only tradeoffs. In your case at least in one example, it works great and I'd glad for you, there is no better proof than working code / working system. However in general for most situations I don't think it would work very well at all.

2 more replies

zzzeek13y ago

I've read through all your articles regarding "object relational modelling" and am still having a problem with the notion of, "in order to do a complex relational query, we need code in the database". Stonebraker isn't entirely impartial here as he's trying to sell his own product in this area (VoltDB) which is highly dependent on the "database-side logic" approach.

There's an important tradeoff being discussed here, which is, "can we get directly the data we want from the query", versus, "do we need to load all the data into our app first and filter it there". This is of course the critical thing that a lot more people need to learn, and the work I do with SQLAlchemy is all about this. But in the SQLA approach, we use Python constructs on the app side which expand into SQL functions when rendered in a query. The effect is very similar to that which I see in most of the examples in your posts.

While I think advanced data models and rich SQL-side functionality are essential, the usage of stored procedures is IMHO not the only way to get there. In practice I often use a mix of both, depending on how verbose the function needs to be.

Keeping SQL functions as app-side constructs has the advantage of source code management. It's easier to support multiple kinds of backends (I run against PG and SQL Server a lot) since you aren't tied to a stored procedure language. There's no need to emit new stored procedure definitions to the database in order to support new features of the application. You don't have the issue of updating a stored procedure on the database side such that multiple application versions, targeted to different versions of the database function, still continue to function. I think there are ways to approach these problems in favor of SPs, but they require some thought on how the source code is maintained, managed, and deployed. For now I've just stuck with keeping most SQL functions on the app side.

The big namespacing problems I see are, what if two different kinds of "classes" want to have the same method name ? The definition of a PG function here creates a name that's global to the whole schema - this suggests we may want names that are qualified with a "class name". And what if you do in fact need two versions of the same function present to support different application versions ? In that case maybe we want to qualify the names of the functions with version ids as well. This actually sets up a great opportunity to use an application side system of rendering class/version qualified SQL names in response to plain names on the app side.

I guess my point is that the "app logic in stored procedures" approach is interesting, it has some management/deployment issues that also might be interesting to solve, but app-rendered SQL when using an effective enough app-side toolkit can solve the problem just as well in most cases.

1 more reply

jeffdavis13y ago

"Consider a database of actual, real world, gravestone inscriptions. If someone's gravestone stone has "1890-02-30" inscribed on it, I know thats wrong but I don't care, I need to store it exactly as is for historical purposes, I don't want a DB crash or need to recompile postgres to accept it, I don't want to force the users to falsify gravesite records, I don't want to have to store as a CHAR or VARCHAR and have to write my own date handling routines in my app..."

I think this is a crucial point of distinction between the two philosophies: structure defined in the query; and structure structure defined before data is loaded.

Structure defined in the query is the more obvious approach. You collect all of the data, and write queries over that data that handle all the cases. The queries often become quite complex and error-prone. Even if the query is slightly wrong, the result generally looks about right. Queries may take a long time to develop and get right, and may react badly to new data that is loaded (e.g. "I thought that was a number field, but now it has letters"). This approach is useful when you are trying to interpret the input data in several different ways -- in other words, when the query is helping you determine the nature of the data you have.

Defining structure before loading is generally more robust and less error-prone, but requires planning that may be frustrating to people just trying to get their hands on the data. The queries generally don't have branches or special cases, so usually if the query runs at all, it will give the right answer. If someone is trying to file an expense, and the receipt says Feb 30th, the accountants still don't want to see the expense as happening on Feb 30th. If they let it in, it could (potentially) break all of their other queries by creating inconsistencies (e.g. it happens after one month is closed and before the next is opened, and it causes the accounts to be out of balance somewhere). So, the person filing the expense has some extra work to do -- maybe they need to look at their bank statement to see what day it really happened, and add a note saying the receipt has the wrong date, in case anyone does an audit.

Broadly speaking, the first philosophy is easier for writers of data, and people writing the applications that help people input data (in part, because you never have to tell the user that the data is wrong, and they need to reexamine their records). The second philosophy is easier and more reliable for the people querying the data, but harder for the people trying to input data and the people trying to write applications to help input data (because they have to handle more error cases and try to provide context so the user can correct them).

In your example, it all depends on what you are trying to ultimately do with the data you collect. The easiest thing to do is to take the data in an even more raw form: just have people take pictures and automatically upload them. But it's awfully difficult to query pictures, so you have to demand a little more structure at load time if you want to query the data at all. I'm not sure what the right balance is for you -- maybe they have a 13th month or a 32nd day, so you should just ask for 3 integers. Or maybe people put question marks or ranges (e.g. born sometime between X and Y), and you want to represent those as well. But the more of that you do, the more burden you put on query writers, and the higher the chance that you get wrong results.

einhverfr13y ago

In one of my other posts in the object-relational series I noted that select * has very different implications in an object-relational vs a strictly relational model. In a strictly relational model you want your SQL query to define your data structures on output. In an object-relational model often times you want your data structures to be formed properly so the db can do other object-relational stuff with them later. So there select * becomes very useful as a way of ensuring that the data structures on output can be simply re-used later.

Of course if you are doing pure physical storage queries, select * is probably not what you want but if you have a logical model built up, you may want to do select * from it in order to ensure that your output matches some specific set of rules.

xtdx13y ago

How is MySQL going to store "1890-02-30" as a date? What internal format does it use to allow storing a date like that?

bigiain13y ago

A varchar named something like "DateAsEnscribed". With a related date field that you can search on with some well defined policy about what happens when gravestones have invalid dates enscribed them.

The problem is, you probably don't work out ou need that until you've got a million rows stored in a date column, and when you discover it, you then start asking ourself "I wonder how many of our dates have been auto-magically 'corrected' from accurate-but-invalid enscribed dates into valid-but-not-as-enscribed ones."

Someone13y ago

I guess it uses a mixed-radix number with radixes 10-10-10-10-12-31 or 10000-12-31 (or, maybe, 10000-13-32 to allow for zero months and days) if the config flag ALLOW_INVALID_DATES (http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html#...) is set.

I still fail to see why anybody would want that or even the default 'if you cannot figure it out, use 0000-00-00' mode, though. That flag makes a broken system more broken, and if someone wants more flexibility in storing dates, he could always use char(8) or so.

In the context of this article: if you use your database as a dumb store and put all logic in your application, why would you let MySQL decide for you that, e.g., 2000-12-34 becomes 0000-00-00 and not, for instance, 2001-01-03?

1 more reply

VLM13y ago

CHAR(10) worst case, or probably something a lot more like rowname.year INT, rowname.month INT, etc. Yes you could do your own homemade date type in that in postgres and your query would look like "SELECT " and then you'd write your own date DBMS routines, but it would be icky. Compare the execution time of "Select from blah order by somedate limit 10" on each design, especially if the DB and webserver are on separate boxes.

It comes down to the fundamental question of who defines bad data, the DEV in his model or the DBA in his table design. Worst case is both, with no coordination, second worst case is both with coordination (wasted effort)

3 more replies

joedoe5555513y ago

+1 for that

That's also my impression and a reason why MySQL is a great DB for ORM-driven Apps.

gmac13y ago

mh-13y ago

well, that's completely pointless.

jaakl13y ago

I have a story about performance of two of these. I maintain a Drupal website which under MySQL. Single server system, not very high load. But recently, last Monday it was in news and load increased some 10 fold. People started complaining that it does not respond, or gives "offline" message. So I managed to login to server and it had load average 130 or so, mostly under MySQL. Drupal gives no mercy to DB for sure. It just so happened that I had a recent copy of the site with PostgreSQL, as I had planned to switchover to use PostGIS, but not everything was working there yet. But site was down anyway, so I saw good option to test it against high load, I reconfigured site and restarted Apache to direct traffic to the new database. After minute or two the load average was in the range 5-6. Not normal for 2-core server really, but significantly lower than before, and the site was saved, and I was not able to perform the load test with real users. PostgreSQL was not tuned at all, just plain yum installation of latest 9.1.4, how much MySQL tuning was done I do not now, it was set up before my time. But it smashed my basic belief that MySQL should work better for large number of simple transactions. Stupid thing is that now I know less than before: I don't know any case where MySQL could preferred.

taligent13y ago

Did you try Percona or Twitter/Facebook's version of MySQL ?

dgregd13y ago

If you cannot compare PostgreSQL to MySQL then what databases can be compared?

dotborg13y ago

the point of this article is that Postgres is not just a database

einhverfr13y ago

You have to define "just a database" (I am the author btw).

PostgreSQL is a very different kind of database than MySQL is. MySQL is closer to a persistence layer for an application with an ability to plug into other reporting tools. PostgreSQL is a data modelling platform.

Both are "just databases" or not depending on how you want to define "database." I would say it is clear that PostgreSQL is not just an RDBMS but that's the point of other articles on the blog, not this one.

dotborg13y ago

postgres provides much more capabilities than just "dumb storage of the application's state"

1 more reply

thirsteh13y ago

Sure it is. MySQL is just a database with fewer features (and less rigid ACID compliance.) Like SQLite vs. MSSQL or PG vs. Vertica. Different tools for different jobs.

1 more reply

lobster_johnson13y ago

It's worth mentioning that while Postgres does promote the idea that the database is the gatekeeper to the data -- where you're supposed to manage data with stored procedures, triggers and so on -- it's not in any way mandatory, and not necessarily the way people prefer to use Postgres.

Personally, I lean towards treating the database as a passive data store controlled by the application, as opposed to using the database as an application platform. For example, validating and normalizing is done in the client application.

While I would also be happy about putting logic in the database, this currently requires a split of the data logic between the application and the database in a way that I don't like: Stored procedures, functions, triggers and so on are all persisted alongside data. (Internally they are treated very much like data by the database, in fact.)

This is unlike the form of the application, which manifest itself in its source code, which resides in a Git repo, and so on. The database code is always "live", whereas the app code runs at my behest. There is a reason why the article uses "create or update procedure", not "create procedure".

So if I move some of my code into stored procedures, triggers, rules, checks, etc., I have to push this code to the database using SQL scripts. This drives the code into a kind of dark, murky hole with respect to visibility and versioning. What code am I running? I'd have to look into the catalogs to see.

Also, can I run different versions of the code concurrently? Not in the same database, apparently?

And how do I seamlessly, elegantly upgrade the code? Sure, I can manufacture SQL scripts that do the "create or update procedure" dance, but what if I removed a procedure? Or what my database schema changes -- then my code has to move along in tandem with the schema.

As far as I know, current database migration tools just aren't very good. In other words, for me, personally, I don't think the toolchain is quite there. It sounds like a nightmare to manage. Having dealt with libraries such as PostGIS, which are built on functions and stored procedures, I know how gnarly this system can be.

Personally, I would like to see a database that supported non-persistent data logic. It could be plugging in actual source code (put foo.rb in a folder, database will run it and export its methods as database functions) or interfacing through an API (instead of "create trigger" to create a trigger, have the database call my REST API /user/validate for each row) or similar.

einhverfr13y ago

The question is where you put your API. The biggest tradeoff I see is whether you can run on many different RDBMS's or whether you are tied to one RDBBMS. But similarly the question becomes to what extent your db is tied to the application and to what extent it can be used safely by many apps.

So that's a big tradeoff. The rest can be solved the same way you solve the problems elsewhere, and there are some advantages to being able to put your unit tests in db transactions and roll them back.

There's a second big issue too which rarely gets noticed. Doing application-style development in the db rarely works well. To do this well, you really need to make your queries front and center, write good, clear queries, and so forth. I have seen what happens when app developers try to write stored procedures and I don't really recommend that.

eurleif13y ago

>Or what my database schema changes -- then my code has to move along in tandem with the schema.

Well, yes, but that's true if you don't use stored procedures, too. And don't stored procedures make it a bit easier, since you can update the schema and code all at once (atomically, even, in PostgreSQL)?

ibotty13y ago

great article. i am amazed. (many articles from ledgersmb folks (or only this person, i don't remember) are great.)

tesmar213y ago

I think it really depends on the type of apps you write. If you write small - medium rails apps, then the db is usually just dumb storage and if you are careful to use the ORM properly, then DBs are just plug and play. If your app is larger and you need more features, then you look more deeply into the features of each in which, most of the time, POSTGRES is the clear winner.

VLM13y ago

"If you write small - medium rails apps"

If you are writing using a MVC like rails, mysql is easier because the M is solely in the rails app rather than some of the M being in the rails and some of the M being in the DB configuration.

Theoretically there's no postgresql issue if you're comfortable splitting design and config stuff into two areas is acceptable IF all the devs are also DBAs, or if you are careful to never use any of the features of postgresql (turning it back into a persistent store), or if you never use any of the model features of rails to enforce data constraints, or if the DBA and the DEVs are on exactly the same page... adding a spinlock like that across functional areas, or maybe even across departments, is rarely a win.

fdr13y ago

For many small Rails web applications, details having sane (and rich) datetime support in the database is both nearly invisible (turning bugs into errors, and working about the same otherwise) and very useful. The same could be said about silent truncations of strings (one can get the same behavior in Postgres by using a explicit cast, or just a built-in function).

LedgerSMB -- an author of which wrote this article -- is on the advanced side of the spectrum in using specific Postgres-isms, and that's not for everyone.

There are grades in-between: using arrays for simple kinds of matching in Postgres is not rocket science and can make your life a lot easier (and faster):

  SELECT * FROM blawg WHERE tags && '{a,b,c}';

Yet another use case might be full-text search, where integrating and synchronizing an external information retrieval system (like Lucene, Solr) for simple search use cases are just more trouble than it is worth. With slightly more work than the above, one can use Postgres's Full Text Search feature.

So here I think you are supposing a false dichotomy: there is a continuum available on both databases, but the continuum extends a lot more into 'the database can know something about the data' territory in the Postgres implementation.

1 more reply

einhverfr13y ago

On top of that, if all you are doing is relatively simple persistence, the sql_mode and plugin table model can be actually very useful even though these get in the way in some other environments.

dscrd13y ago

Are there any books out there that describe how to efficiently use PostgreSQL's object-relation techniques?

einhverfr13y ago

I am actually thinking of writing one. My blog (yes I am the author of the article, http://ledgersmbdev.blogspot.com) is basically running a fairly large series on these features.

The big thing to realize is that there are two aspects here: development efficiency and complexity cost on one hand, and db performance on the other. A lot of the O/R capabilities really do have a significant complexity cost and it is easy to code solutions that don't work very well performance-wise or otherwise.

In a bit over a week, I will have general recommendations out on the blog entry. As a brief preview:

1: table methods are generally good

2: nested storage is generally bad

3: table inheritance is frequently used for the wrong things. It is a very powerful technique if you think in terms of composition, but not so great if you think in terms of inheritance.

4: avoid to the extent you can putting tuples in columns. Postgres gets really weird in these cases.

dotborg13y ago

postgres docs are full of performance tips

joedoe5555513y ago

I learned SQL with Postgres docs ;)

dkhenry13y ago

every time I read a PostgreSql vs MySQL post the only thing I can seem to take away from it is that DBA's really seem to hate the fact that MySQL has made them obsolete and really want you to think you should switch to a platform where they are necessary.

joedoe5555513y ago

Postgres is definitely a pain in the butt to properly setup in a network, in particular if you have never done it before. (And just talking about non-cluster use)

However, once you know the steps necessary to setup a postgres server, it's a piece of cake. In fact it's even fun because Postgres is a software that comes in pretty much the same packaging in every version. And I can even top that: with pgAdmin it has a powerful, consistent and stable tool to manage databases. (MySQL seems to be miles away from that)

I think the impression that Postgres is more difficult than MySQL roots in the fact that it's more difficult to install at home. (Why so many installation steps? MySQL is basically just installing the package, changing pw and done?)

I worked around 2 years with Postgres as a developer and I loved it. Very stable, very solid, many features and no surprises.

MySQL on the contrary seems like a toy db, at least when you want to do a lot of relationship stuff with foreign keys etc. It feels really awkward that outer white-spaces have no meaning, there is no boolean type and that the admin tools out there seem to be really immature.

Having to work since nearly two years with MySQL, I find it still painful. If you don't store Petabytes of data and don't use it for a highly frequented website, Postgres is probably your choice.

jeltz13y ago

> I think the impression that Postgres is more difficult than MySQL roots in the fact that it's more difficult to install at home. (Why so many installation steps? MySQL is basically just installing the package, changing pw and done?)

I think much of the blame here can be put on the people writing installation guides on the Internet. The guides often suggest setups more suited for networked databases than for a local database. When I install PostgreSQL for development machines it is just these two steps.

  1. sudo apt-get install postgresql-9.1
  2. sudo -u postgres createuser -s `id -un`

After that I can create whatever databases I like.

So it is the same number of steps to setup a PostgreSQL and a MySQL database. I admit it took me some time to figure out that this was the simplest way, and I have yet to see a installation guide which suggests this method.

einhverfr13y ago

First, I get frustrated whenever I have to use MySQL. However, if you think the MySQL tools that come with the software seem immature you should try Oracle's. There's a reason why TOAD for Oracle is the de facto admin tool and why it is not made by Oracle.

I worked around 2 years with Postgres as a developer and I loved it. Very stable, very solid, many features and no surprises.

Every database has surprises. Oracle has fun with transactions and DDL, null handling of string types, etc). PostgreSQL has surprises galore when dealing with collections and no these are not well documented.

For example the following two are handled very differently by PostgreSQL because nobody can agree on what correct behavior is for collection tables:

     CREATE TABLE foo (
            bar int not null,
            CHECK (bar > 0)
      );
      CREATE TABLE bars (foo foo);

      INSERT INTO bars (foo) values (row(-1)), (row(null));

The above is allowed, but:

      CREATE DOMAIN baz int not null check (value > 0);
      CREATE TABLE foo (bar baz);
      CREATE TABLE bars (foo foo);
      INSERT INTO bars values (row(null)); --not allowed
      INSERT INTO bars values (row(-1)); --not allowed

The thing is that in collections domains are not handled like column types. You'd think this was intentional but if you:

     ALTER TABLE foo ADD is_bar NOT NULL DEFAULT true;

Postgres will happily refuse to do so, saying it can't follow this if foo is used as a type on another table. There is obviously a bug here, but as discussed repeatedly on various email lists, nobody can agree on what needs to be done about it.

Though if you are working with collections I suppose you can assume you are going to find all sorts of surprises. As I say multiple table inheritance is far better as long as you can think in terms of composition instead of OO inheritance.

einhverfr13y ago

Hmmm.... if you think MySQL has made DBA's obsolete, then you haven't had much experience with it.

lmm13y ago

My first job was at a company that made the UK's fastest growing list something like five times in a row. Eight years after being founded by two guys in their college room it was sold for GBP100m.

Prior to that sale we never had a dedicated DBA. Our two and a half sysadmins kept MySQL up to date and handled the replication setup for teams that didn't want to do it themselves (which was perhaps 2/3). Table layout and query optimization were left entirely up to developers.

It worked far better than the setup at either of my subsequent jobs, which have been postgres and oracle with dedicated DBAs.

einhverfr13y ago

What do you think the job of a DBA is?

I have one customer who has a couple of sysadmins do all that for both MySQL and PostgreSQL servers. If that's all you need, you probably don't need a dedicated DBA unless you are running Oracle and that's just because with Oracle your DBA can always find something to do.

If MySQL can be said to make DBA's obsolete because it doesn't take that much maintenance, Informix beat them to that by a few decades.

1 more reply

cagenut13y ago

I've either inherited or hired a mysql dba at my last four jobs. In fact I'd say its a close call as a young/small internet company grows whether they should hire their first dba before they hire their first sysadmin, as you're probably going to get more immediate wins out of the dba.

epo13y ago

The article wasn't written by a DBA. Reading isn't one of your strong points is it? Neither, if you think MySQL makes DBAs obsolete, is thinking.

drblast13y ago

While you're correct, the personal jabs are unnecessary.

This kind of thing cheapens the discussion here. Please take it somewhere else.

j / k navigate · click thread line to collapse

67 comments

3amOpsGuy13y ago

Really well thought out article.

The single app vs. multiple apps distinction is a good frame for constructive discussion.

crazygringo13y ago

Wow, this is one of the best articles I've read on HN in a long time. It's a fantastic insight, and explains so much.

I would kill for a whole series of articles like this, explaining the philosophical underpinnings behind various technological alternatives.

So often, HN comments seem to degenerate into PHP vs everything else, iOS vs Android, OSX vs Linux, MySQL vs PostgresSQL, based on nothing else but people's personal experiences and preferences.

Taking a step back and exploring the why and history behind products, and what problems they were intended to solve, feels so refreshing and educational.

Kudos to the author (Chris Travers)!

smoyer13y ago

I should note that I have successfully deployed applications on both platforms ... once I started doing most of my database interactions through an ORM, the differences didn't really matter any more.

Vive la difference!

jeltz13y ago

einhverfr13y ago

Interestingly the two db's I can think of that don't support transactional DDL are both Oracle products ;-)

lmm13y ago

jeltz13y ago

The performance benefits of MyISAM are exaggerated. Its advantage was that is was a decently fast and very simple engine which made it possible for MySQL to reach the market quickly.

einhverfr13y ago

Actually I agree about the MyISAM tables. Those are great for lightweight content management, more or less that initial use case I mentioned.

VLM13y ago

Quick summary, assuming you know what MVC stands for:

Mysql - Model is in your code. PostgreSQL - Model is at least partially in your database.

bigiain13y ago

I think you missed the _reason_ why that matters.

If it's _just_ your app/code that's sending and retrieving data from tha database, you can pretty much do as you please.

If other code, especially other code written by other people needs to interact with that data, then explicit rules and agreements need to be made about exactly what "1890-02-30" means.

It's not so much "the model" that's moved into the database, but the validation of the values stored by your model.

einhverfr13y ago

Author here.

Mysql - Model is in your code. PostgreSQL - Model is at least partially in your database.

Also your code can be at least partly in your database which is what makes this possible.

There is a HUGE mistake in the article in the assumption that WRT the model design, that the database always knows best.

The fact is you can decide where you want the line to be. PostgreSQL allows you to build interfaces which give you much more intelligent data models at every line.

These tools have complexity costs though. Use where appropriate.

Mike Stonebraker's example was: Create a db query to tell you what images (in your database) are pictures of sunsets taken within 20 miles of Sacramento.

    select id
    from slides P, landmarks L S
    where sunset (P.picture) and
    contains (P.caption, L.name) and
    L.location |20| S.location and
    S.name = 'Sacramento';

VLM13y ago

"There is a HUGE mistake in the article in the assumption that WRT the model design, that the database always knows best.

I didn't say that. However if you read the entire O/R modelling series you will see in PostgreSQL it is possible to fully define your model in an OO-like way in your database"

"and if you do that then that model can be re-used across applications written in different development environments."

2 more replies

zzzeek13y ago

1 more reply

jeffdavis13y ago

I think this is a crucial point of distinction between the two philosophies: structure defined in the query; and structure structure defined before data is loaded.

einhverfr13y ago

xtdx13y ago

How is MySQL going to store "1890-02-30" as a date? What internal format does it use to allow storing a date like that?

bigiain13y ago

A varchar named something like "DateAsEnscribed". With a related date field that you can search on with some well defined policy about what happens when gravestones have invalid dates enscribed them.

Someone13y ago

1 more reply

VLM13y ago

3 more replies

joedoe5555513y ago

+1 for that

That's also my impression and a reason why MySQL is a great DB for ORM-driven Apps.

gmac13y ago

mh-13y ago

well, that's completely pointless.

jaakl13y ago

taligent13y ago

Did you try Percona or Twitter/Facebook's version of MySQL ?

dgregd13y ago

If you cannot compare PostgreSQL to MySQL then what databases can be compared?

dotborg13y ago

the point of this article is that Postgres is not just a database

einhverfr13y ago

You have to define "just a database" (I am the author btw).

dotborg13y ago

postgres provides much more capabilities than just "dumb storage of the application's state"

1 more reply

thirsteh13y ago

Sure it is. MySQL is just a database with fewer features (and less rigid ACID compliance.) Like SQLite vs. MSSQL or PG vs. Vertica. Different tools for different jobs.

1 more reply

lobster_johnson13y ago

Also, can I run different versions of the code concurrently? Not in the same database, apparently?

einhverfr13y ago

eurleif13y ago

>Or what my database schema changes -- then my code has to move along in tandem with the schema.

ibotty13y ago

great article. i am amazed. (many articles from ledgersmb folks (or only this person, i don't remember) are great.)

tesmar213y ago

VLM13y ago

"If you write small - medium rails apps"

If you are writing using a MVC like rails, mysql is easier because the M is solely in the rails app rather than some of the M being in the rails and some of the M being in the DB configuration.

fdr13y ago

LedgerSMB -- an author of which wrote this article -- is on the advanced side of the spectrum in using specific Postgres-isms, and that's not for everyone.

There are grades in-between: using arrays for simple kinds of matching in Postgres is not rocket science and can make your life a lot easier (and faster):

  SELECT * FROM blawg WHERE tags && '{a,b,c}';

1 more reply

einhverfr13y ago

On top of that, if all you are doing is relatively simple persistence, the sql_mode and plugin table model can be actually very useful even though these get in the way in some other environments.

dscrd13y ago

Are there any books out there that describe how to efficiently use PostgreSQL's object-relation techniques?

einhverfr13y ago

I am actually thinking of writing one. My blog (yes I am the author of the article, http://ledgersmbdev.blogspot.com) is basically running a fairly large series on these features.

In a bit over a week, I will have general recommendations out on the blog entry. As a brief preview:

1: table methods are generally good

2: nested storage is generally bad

3: table inheritance is frequently used for the wrong things. It is a very powerful technique if you think in terms of composition, but not so great if you think in terms of inheritance.

4: avoid to the extent you can putting tuples in columns. Postgres gets really weird in these cases.

dotborg13y ago

postgres docs are full of performance tips

joedoe5555513y ago

I learned SQL with Postgres docs ;)

dkhenry13y ago

joedoe5555513y ago

Postgres is definitely a pain in the butt to properly setup in a network, in particular if you have never done it before. (And just talking about non-cluster use)

I worked around 2 years with Postgres as a developer and I loved it. Very stable, very solid, many features and no surprises.

Having to work since nearly two years with MySQL, I find it still painful. If you don't store Petabytes of data and don't use it for a highly frequented website, Postgres is probably your choice.

jeltz13y ago

  1. sudo apt-get install postgresql-9.1
  2. sudo -u postgres createuser -s `id -un`

After that I can create whatever databases I like.

einhverfr13y ago

I worked around 2 years with Postgres as a developer and I loved it. Very stable, very solid, many features and no surprises.

For example the following two are handled very differently by PostgreSQL because nobody can agree on what correct behavior is for collection tables:

     CREATE TABLE foo (
            bar int not null,
            CHECK (bar > 0)
      );
      CREATE TABLE bars (foo foo);

      INSERT INTO bars (foo) values (row(-1)), (row(null));

The above is allowed, but:

      CREATE DOMAIN baz int not null check (value > 0);
      CREATE TABLE foo (bar baz);
      CREATE TABLE bars (foo foo);
      INSERT INTO bars values (row(null)); --not allowed
      INSERT INTO bars values (row(-1)); --not allowed

The thing is that in collections domains are not handled like column types. You'd think this was intentional but if you:

     ALTER TABLE foo ADD is_bar NOT NULL DEFAULT true;

einhverfr13y ago

Hmmm.... if you think MySQL has made DBA's obsolete, then you haven't had much experience with it.

lmm13y ago

My first job was at a company that made the UK's fastest growing list something like five times in a row. Eight years after being founded by two guys in their college room it was sold for GBP100m.

It worked far better than the setup at either of my subsequent jobs, which have been postgres and oracle with dedicated DBAs.

einhverfr13y ago

What do you think the job of a DBA is?

If MySQL can be said to make DBA's obsolete because it doesn't take that much maintenance, Informix beat them to that by a few decades.

1 more reply

cagenut13y ago

epo13y ago

The article wasn't written by a DBA. Reading isn't one of your strong points is it? Neither, if you think MySQL makes DBAs obsolete, is thinking.

drblast13y ago

While you're correct, the personal jabs are unnecessary.

This kind of thing cheapens the discussion here. Please take it somewhere else.

j / k navigate · click thread line to collapse