I should add that Postgres was the first database I ever used, and I learned pretty much everything I know about Postgres, SQL, and relational and set logic from the official docs. And that was with no background in software development and an undergraduate business degree, with Excel being my most technologically advanced tool. That is a documentation success story.
With the steady addition of features it has gotten much more complex, and there will come a time when the documentation alone won't be enough to learn how to use PostgreSQL to full advantage. With the release of 9.5, we might be there now.
Perhaps the logical extension of the documentation is some form of coursework to let users learn the DB systematically. I haven't looked into it; this might already be offered.
- http://sequel.jeremyevans.net/documentation.html
- http://sequel.jeremyevans.net/rdoc/files/doc/postgresql_rdoc...
Sure, if you already understand PostgreSQL. If you don't, then it isn't particularly user friendly. It's awkward to navigate, doesn't explain the basics (how to actually install it), and tends to put information aimed at advanced users on the same page as beginner material.
Compare your standard with mine (MongoDB):
Er, http://www.postgresql.org/download/ and/or http://www.postgresql.org/docs/current/static/installation.h... ?
> ...tends to combine information that is more for advanced users on the same page as beginners.
There's a tutorial for beginners: http://www.postgresql.org/docs/9.4/static/tutorial.html
Then there's reference material in the rest of the manual. The tutorial even suggests the order in which you should read the rest of the manual. :)
Dang/mods: Can we please please get something to avoid accidentally downvoting good stuff on mobile phones?
I also think that PLv8 should probably make it in the box in the next release or two. With the addition of JSON and Binary JSON data options, having a procedural interface that leverages JS in the box would be a huge win IMHO. Though I know some would be adamantly opposed to this idea.
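For anyone who hasn't tried it, PLV8 is currently installed as a third-party extension; a minimal sketch of what a function looks like (the function name here is invented for illustration):

```sql
-- assumes the plv8 extension is available on the server
CREATE EXTENSION IF NOT EXISTS plv8;

-- a trivial function whose body is plain JavaScript
CREATE FUNCTION plus_one(x int) RETURNS int AS $$
  return x + 1;
$$ LANGUAGE plv8;

SELECT plus_one(41);  -- 42
```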
Eeeh... a simple HA solution can be developed in about a week (I was able to do so on 9.3, and so far it has held its ground). Also, now with 9.5's pg_rewind you can easily switch back and forth between nodes (http://www.postgresql.org/docs/9.5/static/app-pgrewind.html), simplifying things a great deal. Can't imagine that's 5-6 figures.
I agree that you don't get a Plug&Play-Solution out of the box, but from anecdotal evidence they often don't quite work as advertised anyway (remember 1995? And I'm sure your friendly DBA has some stories to share as well).
If you're going to be hosting your db on something like AWS EC2 anyways, then just buy a db product like AWS RDS, and pay for the HA option. Ends up around the same price as if you'd set up everything yourself (assuming you were going to host on AWS anyways, and not going with a low cost option), and is very easy.
col1 | col2 | count
-----+------+------
a    | b    |    10
a    | null |     5
null | b    |     5
There's more to it than that obviously, but you can read about them here: http://www.postgresql.org/docs/devel/static/queries-table-ex... (7.2.4. GROUPING SETS, CUBE, and ROLLUP)

From the web server, connect to the db as one user, then SET ROLE to the database user. This gives you column security and easier auditing as well. See http://stackoverflow.com/questions/2998597/switch-role-after...
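For illustration, a GROUPING SETS query that produces output shaped like the table above could look like this (the table name is invented):

```sql
-- count by (col1, col2), by col1 alone, and by col2 alone, in one pass;
-- null marks the column(s) not part of a given grouping set
SELECT col1, col2, count(*)
FROM some_table
GROUP BY GROUPING SETS ((col1, col2), (col1), (col2));
```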
What particularly are you looking for in a "getting started" tutorial? Honestly, you should just plunge in, on some side project (or a mirror of whatever projects you've used MySQL on) and just compare.
This is a lot easier to say than do/live by, but I think you shouldn't invest in one tool choice when you haven't given the others a fair shake (once you have enough time to step back and think about your decision).
A few things you will want to look at that are different:
1. data types are much richer and more useful than in mysql
2. transactional DDL means migrations are atomic.
3. schemas are what mysql refers to as databases. Remember to set `search_path`.
4. roles and grants are somewhat more expressive and work differently than in mysql, but not that differently for the simpler use cases
5. database functions (aka stored procedures) are awesome, as are extension languages.
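Point 2 in practice, as a sketch (table and column names invented): a failed or aborted migration rolls back cleanly, DDL included.

```sql
BEGIN;
ALTER TABLE users ADD COLUMN age int;
CREATE INDEX users_age_idx ON users (age);
-- if anything in this transaction fails, nothing is applied:
ROLLBACK;  -- both the column and the index are gone again
```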
But they are doing absolutely nothing about my biggest beef with PostgreSQL. Which is that there is absolutely no way to lock in good query plans. It always reserves the right to switch plans on you, and sometimes gives much, much, much worse ones. No other database does this to me. Even MySQL's stupid optimizer can be reliably channeled into specific query plans with the right use of temporary tables and indexes.
This is a problem because improvements don't matter if the query plan is "good enough". But they will care if you screw up. PostgreSQL usually does well, but sometimes screws up spectacularly.
The example I have struggled with most often in the last few months is a logging table that I create summaries from. Normally I only query minutes to hours, but I set it up as a series of SQL statements, so I first put the range in a table and then filter with happened BETWEEN range_start AND range_end. PostgreSQL really, Really, REALLY wants to decide that the index on the timestamp is a bad idea and do a full table scan instead. Every time it does, summarization goes from under a second to taking hours.
Hopefully the new BRIN indexes will be understood by the optimizer in a way that makes it happier to use the index. But I'm not optimistic. And if I lean on it harder, I'm sure from past experience that I'll find something else that breaks.
But perhaps I have some practical advice for you to try.
I had a similar issue: I have a few tables with sensor data, 300-500 million rows, indexed among other things by event type. Some counting queries kept defaulting to full table scans. It turned out that this was because of limited statistics on the distribution of counts by event type.
The default_statistics_target config parameter sets how many entries Postgres keeps in the histogram of possible values per column, the default is 100 I think. Because my event types were not evenly distributed, the less frequent ones were missing from the statistics histogram altogether, and somehow this resulted in bad query plans.
As a fix, I upped default_statistics_target to 1000, and set it to 5000 for the biggest tables. Then, after a VACUUM ANALYZE, the query planner started making sensible choices.
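The settings described above, as a sketch (table and column names are invented; the numbers are examples, not recommendations):

```sql
-- raise the global default (also settable in postgresql.conf)
ALTER SYSTEM SET default_statistics_target = 1000;
SELECT pg_reload_conf();

-- finer-grained: bump a single skewed column on a big table
ALTER TABLE sensor_events ALTER COLUMN event_type SET STATISTICS 5000;

VACUUM ANALYZE sensor_events;  -- rebuild the statistics
```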
Another thing to try is reducing the random_page_cost config parameter from its default of 4.0. On SSDs, random page costs are much closer to 1 than to 4 (relative to long sequential reads).
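For example (4.0 is the shipped default; 1.1 is a commonly used SSD value, not an official recommendation; the query is invented):

```sql
-- experiment per-session before changing postgresql.conf
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM sensor_events WHERE event_type = 7;
-- check whether the plan now flips from seq scan to index scan
```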
My problem is not that PostgreSQL does not understand the distribution of my data. It does. The problem is that it comes up with a query plan without realizing that I'm only querying for a very small range of timestamps.
If this happens again, I'll have to try rewriting code to send it queries with hard-coded timestamps, cross my fingers, and pray. I find prayer quite essential with PostgreSQL sometimes because, as ineffective as it is, at times I've got nothing else.
That'd not only make production scenarios more reliable, but it'd also make it much easier to test tweaks to the cost model in practice.
The problem is that that's a rather significant project, and it's hard to get funding for that kind of work. The Postgres companies aren't that big, and it's not all that sexy a feature, marketing-wise.
I also don't think that query hints are a good way to do it. And I don't mind if the way to do it is somewhat cumbersome. This is very much a case where 20% of the work can give 99% of the benefit.
For example what about the following approach?
1. Add an option to EXPLAIN that will cause PostgreSQL's optimizer to spit out multiple plans it considered, with costs, and with a description of the plan that PostgreSQL can easily parse and fit to a query.
2. Add a PLAN command that can be applied to a prepared statement and will set its plan. It is an error to submit a plan that does not match the query.
And now in the rare case where I don't like a query's plan I can:
EXPLAIN PLANS=3 (query);
Pick my desired plan from the list (hopefully).

Then in my code I:
PREPARE foo AS (query);
PLAN foo (selected plan);
EXECUTE foo;
And now if I notice that a query performs worse than I think it should, I can make it do what I want it to.

The typical solution is to modify the autovacuum settings for that table to recalculate the statistics a lot more often, and maybe even with much higher resolution, depending on your case.
You can also convince it that indices are the way to go by changing more basic settings about costs of reading a random page on disk vs reading sequentially, making full table scans more expensive, but tuning those settings away from realistic costs might have negative side effects for you.
I was able to have great success running complex queries on 100M+ row tables that were insert-only using this kind of trick, but YMMV. If nothing else fails, really experienced people are more than willing to help in the performance mailing list. They sure helped me quite a few times.
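Per-table autovacuum overrides look like this ("logs" is an invented name; the thresholds are examples, not recommendations):

```sql
ALTER TABLE logs SET (
  autovacuum_analyze_scale_factor = 0.0,   -- disable the fraction-based trigger
  autovacuum_analyze_threshold    = 10000  -- re-analyze after every 10k row changes
);
```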
Adjusting internal costs is promising, but I'd like to avoid going there exactly because of the possible negative side effects that you mention.
SET enable_seqscan = OFF;
We use these options a lot on tables that otherwise produce odd query plans, to steer the planner toward the best option.
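One way to keep such overrides from leaking into unrelated queries is to scope them to a single transaction, e.g. (the query itself is invented):

```sql
BEGIN;
SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT count(*) FROM logs
WHERE happened BETWEEN '2015-06-01' AND '2015-06-02';
COMMIT;
```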
What scares me is that this is unreliable, and according to the documentation, the optimizer is free to choose to ignore everything that I say whenever it wants. The fact that it already HAS done that to me does not provide me comfort.
"RLS implements true per-row and per-column data access control"
The reason is that the optimizer reorders operations. So, a tricky person can write the query in a way that, for example, throws a divide-by-zero error if someone's account balance is within a certain range, even if they don't have permission to see the balance. Then they can run a few queries to determine the exact balance.
RLS builds on top of something called a "security barrier view" which prevents certain kinds of optimizations that could cause this problem.
It also offers a nicer interface that's easier to manage.
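The 9.5 syntax, as a sketch (table and column names invented): a policy restricts which rows each role can see, and the security-barrier machinery keeps the planner from pushing untrusted expressions below that check.

```sql
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- each database user sees only their own rows
CREATE POLICY own_rows ON accounts
  USING (owner = current_user);
```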
I know applications can be compromised, but now you can freely share your DB with other teams to analyze or play around with.
It sounds like it's basically a BTree that stops branching at a certain threshold, but I'm almost certainly wrong.
Obviously single row accesses in a fully cached workload are going to be faster if done via a btree rather than such range maps, even if there's perfect clustering. But the price for having such an index is much lower, allowing you to have many more indexes. Additionally it can even be more efficient to access via BRIN if you access more than one row, due to fewer pages needing to be touched.
Slides are here: http://hlinnaka.iki.fi/presentations/Index-internals-Vienna2...
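Creating one is a one-liner; e.g. on an append-only log table (names invented):

```sql
-- pages_per_range trades index size against scan precision
CREATE INDEX logs_ts_brin ON logs
  USING brin (ts) WITH (pages_per_range = 32);
```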
I'm really, really keen to use 9.5 jsonb with its insert/update changes.
My plan is to rely on these functions for now, and switch to the native implementations once 9.5 is production ready on RDS.
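The 9.5 additions this refers to include jsonb_set and the || concatenation operator; a sketch (table and column names invented):

```sql
-- replace one key inside a jsonb document
UPDATE docs
SET body = jsonb_set(body, '{title}', '"New title"')
WHERE id = 1;

-- merge two jsonb values (the right side wins on conflicting keys)
SELECT '{"a": 1}'::jsonb || '{"a": 2, "b": 3}'::jsonb;
```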
I want to look deeper into this but haven't had the time. From the little I read, it seems MS SQL's MERGE is more powerful.
The reason PostgreSQL went with this syntax is that the goal was to create a good UPSERT, and getting the concurrency considerations right with MERGE is hard (I am not sure of the current status, but when MERGE was new in MS SQL it was unusable for UPSERT). Even when you have done that, it would still be cumbersome to use for UPSERT.
EDIT: The huge difference is that PostgreSQL's UPSERT always requires a unique constraint (or PK) to work, while MERGE does not. PostgreSQL relies on the unique constraint to implement the UPSERT logic.
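For reference, the 9.5 UPSERT shape (table name invented; note the unique constraint the ON CONFLICT target requires):

```sql
CREATE TABLE counters (key text PRIMARY KEY, value bigint NOT NULL);

INSERT INTO counters (key, value) VALUES ('hits', 1)
ON CONFLICT (key) DO UPDATE
SET value = counters.value + EXCLUDED.value;
```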
I've used MERGE as an UPSERT using MATCHED/NOT MATCHED and SERIALIZABLE/HOLDLOCK since it was introduced in mssql 2008. It was one of the first features I upgraded my code to use, and it worked out of the box with no issues.
Here's a great post from Postgres team showing why they didn't just implement merge themselves:
http://www.postgresql.org/message-id/CAM3SWZRP0c3g6+aJ=YYDGY...
Edit: I meant apt.postgresql.org of course, not the official Ubuntu repo..
Has anyone come across a guide to using it for upgrades?
The very first example points to BRIN indexes resulting in a smaller index than btree but with a much longer search time... so I guess the 5% time figure was very use-case specific?
We do over 20,000 queries per second on one of our production mysql DB's and I'm not sure I'd trust anything else with that: http://i.imgur.com/sLZzXhS.png
Just curious if I'm missing out on some new awesomeness that PostgreSQL has or if it's just marketing.
In a word: correctness.
Yes, MySQL has an UPSERT implementation. Like so many things MySQL rushed out the door, it's also buggy and unpredictable. Did you know UPSERTing into a MySQL table with multiple unique indexes can result in duplicate records? Did you know MySQL's INSERT IGNORE will insert records that violate NOT NULL constraints? [1]
I've used both MySQL and PostgreSQL for over a decade, and working around the many MANY misbehaviors and surprises in MySQL requires continuous dev effort. PostgreSQL on the other hand is correct, unsurprising, and just as performant these days.
MySQL is what happens when you build a database out of pure WAT [2].
[1] https://wiki.postgresql.org/wiki/UPSERT#MySQL.27s_INSERT_......
You say "upserting into a mysql table". Which storage engine? MyISAM? InnoDB? I find MySQL to be both reliable and incredibly durable, i.e. it handles yanking the power cord quite well. The performance also scales up linearly for InnoDB, even for very high traffic and concurrency applications.
We use redis, memcached and other storage engines - by no means are we tied to mysql. But for what it does, it does it incredibly well.
I'm also completely open to using PostgreSQL and I was hoping someone could give me a compelling reason to switch to it or to use it.
People like myself use Postgres because it has a much richer feature set. See http://stackoverflow.com/a/5023936/270610 for some examples. Personally I find MySQL beyond frustrating due to its lack of… well almost all of those. Recursive CTEs in particular, but arrays and rich indexing are pretty core too.
Postgres's query optimizer is far more advanced too. MySQL doesn't even optimize across views, which discourages good coding practices.
The documentation is fantastic. Complete and well-written, covers the nuances of every command, expression, and type. MySQL's doesn't hold a candle to it.
Don't know what you mean about "unfriendly". Help is built into the command-line tool, and like I said, the documentation is fantastic. Maybe MySQL is a little more "hand-holdy", but I don't care for such things so I wouldn't know.
Back in the early 2000s, LAMP people on Slashdot were benchmarking MyISAM tables against Postgres 6.5/7.x's fully transactional engine. Unfortunately, the reputation for being slow stuck among developers.
Postgres particularly shines on multicore systems, thanks to some clever internal design choices. Having a sophisticated cost-based query planner also helps.
As for unfriendly: Care to amplify? In my work, I've found the opposite to be true.
For example, the very first thing you tend to encounter as a new developer is "how to create a user". For MySQL, it turns out that using GRANT to grant a permission creates a user, which is counterintuitive; GRANT also sets the password, and promotes the use of cleartext passwords. By comparison, Postgres has "createuser", as well as a full-featured set of ALTER USER commands. The difference between "mysql" and "mysqladmin" is also completely unclear.
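The Postgres side of that, for comparison (user and table names invented):

```sql
CREATE USER alice PASSWORD 'changeme';      -- or: createuser --pwprompt alice from the shell
GRANT SELECT ON mytable TO alice;           -- grants never implicitly create users
ALTER USER alice VALID UNTIL '2016-12-31';  -- part of the full ALTER USER toolkit
```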
The almost complete lack of warts and legacy cruft in Postgres significantly removes the possibility of confusion, uncertainty and information overload. MySQL's manuals are littered with "if X is enabled then this behavior is different, and in versions since 5.7.3.5 this behavior has been changed slightly, and 5.7.3.6 has a bug that silently swallows errors", etc. MySQL's historical date and NULL handling alone is worth a chapter of any book.
Postgres also has a level of strictness above MySQL, which is in itself instructive. You know when you're doing something wrong. Postgres never accepts bad input. It always requires a strictly correct configuration setup.
Plus: Just type \h in psql. It has a complete reference of the entire SQL syntax.
It wasn't just the /. crowd. Back in the 3.5 days the MySQL devs were doing that too, and writing long discourses on why transaction safety was a bunch of crap and a crutch for bad application developers.
- CTE's
- Arrays/JSON type
- partial indexes
- transactional DDL
- NOTIFY
- Materialized views
- Schemas
- PostGIS
- Row level security (which is new in PG)

- Arrays, particularly with GIN indexes. This makes things like tagging fantastic in Postgres. Instead of putting your tags in another table, you throw them in an array and you can do all kinds of things like set-intersection queries.
- JSON. Postgres can store data as JSON and index and query the JSON. This essentially gives you MongoDB type queries.
I'm sure there's more but these are my favourite Postgres features.
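The tagging pattern described above, sketched (table and column names invented):

```sql
CREATE TABLE posts (id serial PRIMARY KEY, tags text[]);
CREATE INDEX posts_tags_idx ON posts USING gin (tags);

-- posts tagged with BOTH 'postgres' AND 'release'
-- (array containment, answered via the GIN index)
SELECT id FROM posts WHERE tags @> ARRAY['postgres', 'release'];
```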
Postgres has always taken a more 'solid' approach. One instance that made my jaw drop when I realized it: in the past (has this been fixed?), DDL statements (ALTER TABLE, CREATE TABLE, etc.) were not transactional in Mysql. You could get 50% of the way through a series of them and find your database 100% fucked up.
That said, over the years Mysql has been improving too, for sure.
So honestly, at this point I do think some of it is impressions from the past which are no longer valid. But still, Mysql has done/does all sorts of things that defy the spec, convention, or just common sense (I don't know if this has been fixed, but at least for many years April 31st was treated as a valid date, and there was some profound weirdness, of which I can't quite recall the details, involving locale stuff).
Postgres generally takes the position that data should always be safe first and speedy sometime later. It also assumes the operator understands their tools. That second one means in comparison with Mysql, people think it is unfriendly. It isn't (if you want to see unfriendly, go work with Oracle), it just expects that its friends learn about it. Which is of course good advice when you're dealing with complicated software on which a lot of money tends to ride.
And as the PG devs have said for years, they don't compete with Mysql. They compete with Oracle. There's no reason to switch if you're happy with Mysql.