I believe there were also updates to index seek/scan behavior in that timeframe.
I'll update this thread when I find out why.
begin;
alter table foos add answer int not null default 42;
alter table foos drop column plumbus;
update foos set name = upper(name);
create table bars (t serial);
drop table dingbats;
rollback; -- or, of course, commit
What's the benefit? Atomic migrations. You can create, alter, and drop tables, update data, etc. in a single transaction, and it will either commit completely if all the changes succeed, or roll back everything. This is not possible in MySQL, or almost any other database [1], including Oracle — DDL statements aren't usually transactional. (In MySQL, I believe a DDL statement implicitly commits the current transaction without warning, but I could be wrong.)
Beyond that, I'd mention: PostGIS, arrays, functional indexes, and window functions. You may not use these things today, but once you discover them, you're bound to.
[1] https://wiki.postgresql.org/wiki/Transactional_DDL_in_Postgr...
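SQLite happens to share this property with Postgres, so the transactional-DDL pattern above can be sketched with nothing but Python's stdlib `sqlite3` module. This is only an analogue of the Postgres example (the `foos`/`bars` names are borrowed from it), not Postgres itself:

```python
import sqlite3

# Autocommit mode (isolation_level=None) so we control transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table foos (id integer primary key, name text)")

conn.execute("begin")
conn.execute("alter table foos add column answer integer not null default 42")
conn.execute("create table bars (t integer)")
conn.execute("rollback")  # or, of course, commit

# Both the new column and the new table are gone: the DDL rolled back atomically.
cols = [row[1] for row in conn.execute("pragma table_info(foos)")]
tables = [row[0] for row in
          conn.execute("select name from sqlite_master where type='table'")]
```

Run the same statements against MySQL and the `alter table` would have silently committed, leaving `bars` behind after the rollback.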
I don't know if it accomplishes anything truly new (other than ideas that aren't very useful in practice like being able to have multiple test runs going in parallel), but it's a pretty neat way to be able to do it and works well.
Might want to mention the downside of using MySQL as well. (Am also interested to know as a daily MySQL user.)
- JSON column (actually MySQL 5.6 supports it but I doubt if it's as good as Postgres)
- Window functions (available only in MySQL 8.x, while these have been available since Postgres 9.x)
- Materialized views: views that are stored physically like a table, and can be used to hold aggregated, pre-calculated data such as sums and counts
- Indexing on function expressions
- Better query plan explanation
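Of the features above, window functions are the easiest to show in a self-contained way. SQLite (3.25+, bundled with recent Pythons) also supports them, so here is a minimal sketch with a made-up `sales` table: rank rows within each group without collapsing them the way `GROUP BY` would:

```python
import sqlite3

# Requires SQLite >= 3.25 for window function support.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales (region text, amount integer);
    insert into sales values
        ('east', 100), ('east', 300), ('west', 200), ('west', 50);
""")
rows = conn.execute("""
    select region, amount,
           rank() over (partition by region order by amount desc) as rnk
    from sales
    order by region, rnk
""").fetchall()
# rows -> [('east', 300, 1), ('east', 100, 2), ('west', 200, 1), ('west', 50, 2)]
```

Doing the same without window functions means a self-join or a correlated subquery per row, which is both slower and harder to read.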
For indexing on function expressions in particular, the workaround we use is to add a generated column and index that.
MySQL 5.7 fully supports this. See https://dev.mysql.com/doc/refman/5.7/en/create-table-generat... and https://dev.mysql.com/doc/refman/5.7/en/create-table-seconda...
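For the expression-index side of this, here is a sketch using stdlib `sqlite3` (which also supports indexes on expressions); the `users`/`email` schema is invented for illustration. The point is that a query filtering on `lower(email)` can use the index instead of scanning:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (id integer primary key, email text)")
# Index on an expression, not a plain column.
conn.execute("create index users_email_lower on users (lower(email))")
conn.executemany("insert into users (email) values (?)",
                 [("Alice@Example.com",), ("Bob@Example.com",)])

# The WHERE clause matches the indexed expression exactly, so the planner
# can do an index search rather than a full table scan.
plan = conn.execute(
    "explain query plan select id from users where lower(email) = ?",
    ("alice@example.com",)).fetchall()
used_index = any("users_email_lower" in row[-1] for row in plan)
```

The generated-column workaround mentioned above gets you to the same place on MySQL 5.7: materialize `lower(email)` as a generated column and index that.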
> JSON column (actually MySQL 5.6 supports it but I doubt if it's as good as Postgres)
Actually MySQL 5.6 doesn't support this, but 5.7 does, quite well: https://dev.mysql.com/doc/refman/5.7/en/json.html
It's an excruciating process though.
- PLV8/PLPython/C functions/etc (with security!)
- TimescaleDB
- Better JSON query support
- Foreign Data Wrappers
- Better window function support
- A richer extension ecosystem (IMO)
Honestly, at this point I wouldn't use MySQL unless you only care about slightly better performance for very simple queries and simpler multi-master scaling/replication. Even saying that, if you don't need that simple multi-master scaling RIGHT NOW, improvements to the Postgres multi-master scaling story are not too far off on the roadmap, so I would still choose PG in that case.
Many of the largest tech companies rely on MySQL as their primary data store. They would not do so if it were unreliable at persisting data.
There are many valid reasons to choose Postgres over MySQL, or vice versa -- they have different strengths and weaknesses. But there are no major differences regarding data reliability today, nor have there been for many years now.
Where I work, we chose MySQL back in 2012 due to production quality async replication. I think (but am never sure) that that is now good in Postgres land.
PG has a lot of SQL features I'd love to use and can't. OTOH MySQL's query planner is predictably dumb, which means I can write queries and have a good idea about how well (or not) they'll execute.
EDIT: Apparently 11.1 is available in beta as of April 9th.
EDIT: I'll try again. Looks like it was added April 9th
Any complications or hiccups I need to worry about moving from 10 to 11?
Per Heroku Docs: By supporting at least 3 major versions, users are required to upgrade roughly once every three years. However, you can upgrade your database at any point to gain the benefits of the latest version.
How do you manage failover and replication? At my previous job this was done by a consultant. Is this doable on a self hosted setup?
Thank you in advance.
I know of BDR, but there hasn't been much news about it lately, especially with more recent versions of Pg.
We like Galera for our simple needs: we use keepalived to do health checks, and if they pass the node participates in the VRRP cluster. If one node goes down/bad, another takes over.
Queries done through pgbouncer just pause as if the query is really really slow when the db goes down, then when pglookout does the failover, the bash script switches pgbouncer's config and those pending queries are sent immediately.
For a complex web-app, would you suggest an ORM (looking at SQLAlchemy) or a custom module with hand written queries and custom methods for conversion to python objects?
My app has a lot of complex queries, joins, etc. and the data-model is most likely to change quite a bit as the app nears production. I feel using an ORM is an unnecessary layer of abstraction in the thinking process. I feel comfortable with direct SQL queries, and in some cases, want to directly get JSON results from PGSQL itself.
Would that be a good idea, and more importantly, scalable?
Note : My app will be solely developed by me, not expecting to have a team or even another developer work on it.
For advanced queries, you can write raw SQL.
The way I see it, an ORM has three useful features:
- A migration/seed mechanism (you will need it anyway)
- A schema definition for mapping tables to objects
- A query builder
If you feel that an ORM is too heavy, you can use just the query builder.
* For normal queries (select cols from table where id = …, etc.) we just used the plain Django ORM. Even for weird joins, the Django ORM makes it a lot easier than raw SQL.
When we needed raw speed, we wrote raw SQL and delegated to Django's SQL layer -- that way we leveraged everything the framework has along with raw SQL power.
It maps pretty much 1:1 to SQL and for me it beats the alternative (using text interpolation for composing queries).
If you're doing anything more complex than these basic sorts of queries and subqueries, or your developers are proficient in sql, using even a very good ORM like sqlalchemy is going to be a step down.
Since you're doing this all yourself, and SQL is probably the most ubiquitous programming language (in terms of the percentage of jobs requiring it, not total LOC), the learning opportunities there are more valuable, so I would go direct.
Since you're probably used to dealing with and migrating your tables manually, I would keep custom SQL for all your complex operations, and use SQLAlchemy for doing basic insert/update/select. Django also has an "unmanaged" mode where you can create a model and it will avoid trying to create a migration to create the table.
Of course, you have to manually update the model if you manually change your DDL.
Watch out for differences in how you serialize data from Django/SQLAlchemy models vs. raw dicts from psycopg2.
I like to organize my SQL by keeping each query in a separate .sql file and writing a little wrapper that fetches the files (+1 for caching it) and then executing it. I'm not a fan of lots of inline SQL mixed with Python.
Overall I think it's a great + powerful setup!
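A minimal sketch of that one-query-per-file setup, with `functools.lru_cache` doing the caching. Everything here is invented for illustration (the `active_users` query, the directory layout), and a temp dir stands in for a checked-in `sql/` directory just to keep the sketch self-contained:

```python
import functools
import tempfile
from pathlib import Path

# In a real project SQL_DIR would be a checked-in "sql/" directory next to the
# code; a temp dir is used here only so the sketch runs on its own.
SQL_DIR = Path(tempfile.mkdtemp())
(SQL_DIR / "active_users.sql").write_text(
    "select id, name from users where last_seen >= :since")

@functools.lru_cache(maxsize=None)
def load_query(name):
    """Fetch <SQL_DIR>/<name>.sql; repeat calls hit the cache, not the disk."""
    return (SQL_DIR / f"{name}.sql").read_text()

query = load_query("active_users")
# cur.execute(query, {"since": some_cutoff})  # then execute as usual
```

This keeps the SQL syntax-highlightable and diffable on its own, with none of it inlined in the Python.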
cur.execute(query, {'foo': bar})
Passing values directly into cur.execute is also the best way to prevent SQL injection, since the driver passes the parameters separately from the SQL string instead of interpolating them into it.
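A quick demonstration of why parameter passing is safe, using stdlib `sqlite3` (the same DB-API parameter style works with psycopg2 against Postgres); the `users` table and the payload are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table users (name text)")
cur.execute("insert into users values ('alice')")

hostile = "alice' or '1'='1"   # classic injection payload
# The driver treats the whole value as data, never as SQL, so the payload
# cannot change the shape of the query.
cur.execute("select count(*) from users where name = :name", {"name": hostile})
matched = cur.fetchone()[0]    # 0 -- no row is literally named that string
```

String-interpolating `hostile` into the query text instead would have matched every row.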
I'm developing a web application that uses SQLAlchemy. The ORM has been a huge boon for CRUD functionality. We also have some very complicated reporting features and use SQLAlchemy's query builder almost exclusively. I find that the query builder maps very cleanly to SQL, so I can still "think" in SQL while writing and reading it. And the query builder makes complex query composition easier to manage.
SQLAlchemy provides more than just the ORM... I actually wish the docs were structured differently to better emphasize that in search results, etc.
Hard to say, but don't forget about migration support, which is quite helpful.
With Aurora, the storage layer is swapped out entirely for a distributed storage engine, that I believe is based upon DynamoDB.
The wire protocol and server interface are much the same as regular Postgres, though there are some additional benefits as well as caveats, as you might expect.
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
This has really put us off using other AWS managed products and was a major factor in us deciding against using Amazon Elasticsearch Service.
I know of BDR, earlier versions of which are open source, but there hasn't been much movement with Pg 10 or 11 AFAICT.
We don't do anything complicated, but simply want two DBs (with perhaps a quorum system) that has a vIP that will fail-over in case one system goes down (scheduled or otherwise).
Galera provides this in a not-too-complicated fashion.
> This is the Git repo of the Docker "Official Image" for postgres (not to be confused with any official postgres image provided by postgres upstream)
Supporting five versions is no more than Microsoft does: currently SQL Server versions 2017, 2016sp2, 2016sp1, 2014sp3, 2014sp2, 2012sp4, 2008R2sp2 and 2008sp3. 2008sp3, 2008R2sp2, and 2016sp1 will hit their final EOL in a couple of months, taking SQL Server's supported list back down to 5 too.
I expect other significant DB maintainers have similar support life-time requirements for much the same reasons, though I'll leave researching who does[n't] as an exercise for the reader.
Note the recent versioning change: 9.4, 9.5, and 9.6 are the three oldest supported major versions, and the latest two are 10 and 11.
2) It's horrifically high risk because downgrading is usually not a thing
3) It usually requires downtime.
1. Test the upgrade: set up an additional secondary (9.3), break the replication link (promote it to a master). Test the upgrade on that. It was really fast, under 30 seconds to shut down the old DB, run the in-place upgrade, and start up the new DB.
2a. In production: set up an additional secondary (9.3). Make the primary read-only. Promote the new secondary to a master. Shut down, upgrade to 11.2, restart. Point applications at it.
2b. Backout plan: leave the applications pointing at the original database server, make it read-write.
There are other options, including with only seconds of downtime, but <1 minute with pg_upgrade was simple and very acceptable for us.
Because it mirrors and supports the reality of the business world.
Every organization, large or small, that manages its business makes a 'Grow/Invest', 'Maintain', or 'Disinvest' decision every year for each of its product/service lines.
It does not matter whether it's software or making kielbasa. Postgres is exceptional, and supports the first two.
There's legacy crap everywhere, in all languages, DBs, and versions, sometimes supported for 10+ years.
Security updates should push the upgrade path a little harder, but there are still cases where a database can be completely isolated from the network and that might not even matter.