I’ve used LISTEN/NOTIFY with triggers for a while with good success. I like the flexibility of my apps controlling what they want to listen for, instead of having that configuration live elsewhere.
The downside to LISTEN/NOTIFY is that if you aren’t listening, you aren’t getting changes. Seems like the same is true here if the webhook fails, since there isn’t a retry.
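For anyone who hasn’t used the pattern, a minimal trigger + NOTIFY setup looks roughly like this (table, channel, and function names are made up for illustration; use EXECUTE PROCEDURE instead of EXECUTE FUNCTION on Postgres < 11):

```sql
-- Emit a notification for every row change on a hypothetical "orders" table.
CREATE OR REPLACE FUNCTION notify_change() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('table_changes',
                    json_build_object('table', TG_TABLE_NAME,
                                      'op',    TG_OP,
                                      'id',    NEW.id)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_notify
AFTER INSERT OR UPDATE ON orders
FOR EACH ROW EXECUTE FUNCTION notify_change();

-- The application subscribes with:
LISTEN table_changes;
```

As noted above, any notification delivered while no session is listening is simply lost.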
If you want to check out another cool alternative, see Debezium [1]. It uses Postgres’s logical decoding feature to stream changes, and supports Protobuf and JSON output.
I did use Debezium in a Kubernetes cluster to stream Postgres changes into Kafka, and then through a custom application that sends them to clients via HTTP or WebSockets. But the entire system had multiple points of failure, so we decided to write our own thing that connects to the logical replication slot and sends changes over WebSockets. Working on open sourcing that too!
https://www.confluent.io/blog/bottled-water-real-time-integr...
I wrote a similar program that uses logical replication to stream changes to AWS’s Kinesis [1], using a mini library that parses the output of the default test_decoding plugin [2], in case anyone else is on RDS and has the same limitation.
[1] https://github.com/nickelser/pg_kinesis [2] https://github.com/nickelser/parselogical
I wrote a similar thing to notify on changes, but had to fire notifies containing only what changed (id, column names); a second step in the listening code then fetched the full details.
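This two-step pattern is also a way around the NOTIFY payload size limit (8000 bytes by default): send only the key, then re-read the row. A sketch, with hypothetical names:

```sql
-- Trigger side: payload carries only the table name and primary key.
CREATE OR REPLACE FUNCTION notify_minimal() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('row_changed',
                    json_build_object('table', TG_TABLE_NAME,
                                      'id',    NEW.id)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Listener side (second step): on each notification, fetch the full row, e.g.
-- SELECT * FROM <table> WHERE id = <id from payload>;
```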
(I work at Hasura)
What I’ve been thinking of lately is using triggers to write to a separate table with a timestamp, as a rolling log. The worker can LISTEN/NOTIFY on that, but it can also keep track of the last message it saw and catch up when it starts, so if it fails you don’t lose anything. All I really need is a type (table) and the relevant keys. I think I would prefer a way to configure which tables I want to listen on and which columns (keys) I care about.
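A rough sketch of that rolling-log idea (all table, column, and channel names here are invented, not from any existing tool):

```sql
-- Durable change log; the worker replays from its last seen id on startup.
CREATE TABLE change_log (
  id         bigserial   PRIMARY KEY,
  tbl        text        NOT NULL,
  row_key    jsonb       NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_change() RETURNS trigger AS $$
DECLARE
  log_id bigint;
BEGIN
  INSERT INTO change_log (tbl, row_key)
  VALUES (TG_TABLE_NAME, jsonb_build_object('id', NEW.id))
  RETURNING id INTO log_id;
  -- Live listeners get a nudge; offline workers catch up from the table.
  PERFORM pg_notify('change_log', log_id::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Worker startup / recovery:
-- SELECT * FROM change_log WHERE id > :last_seen_id ORDER BY id;
```

Old rows can be pruned periodically once every worker has acknowledged them.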
Really wish there were a standard, Postgres-provided way to notify an external system of changes, though. It’s certainly not an easy problem.
You just have to be careful with failing queries and set timeouts low.
https://github.com/subzerocloud/pg-amqp-bridge
You can then push to websockets from rabbit with STOMP
Thanks for posting this. Can already see a couple of weekends burned trying to drive celery from pg.
Do you know if it is used in production somewhere? How battle-tested is it?
Seeing that this isn't hooked into replication is a bit worrying, but it seems they are going to implement it as a replication client which should make it much more robust.
Postgres’s per-statement triggers don’t let you capture the rows that are modified, and skor doesn’t do any sort of batching currently.
They do these days. Since Postgres 10 you can specify:

REFERENCING { { OLD | NEW } TABLE [ AS ] transition_relation_name } [ ... ]

in a FOR EACH STATEMENT ... AFTER trigger.

Is an approach like this best practice, or is it more of an intermediate hack?
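The transition-table syntax mentioned above can be combined with NOTIFY to batch per-statement, e.g. (hypothetical names; note that a large batch can still blow past the ~8000-byte NOTIFY payload limit):

```sql
-- Statement-level trigger (Postgres 10+): one notification per statement,
-- with all affected rows visible via the NEW transition table.
CREATE OR REPLACE FUNCTION notify_batch() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('batch_changes',
                    (SELECT json_agg(json_build_object('id', n.id))::text
                     FROM new_rows n));
  RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_batch_notify
AFTER UPDATE ON orders
REFERENCING NEW TABLE AS new_rows
FOR EACH STATEMENT EXECUTE FUNCTION notify_batch();
```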
(I work at Hasura)
In order to reason with certainty about your data, information should come from the source closest to the truth, and that is your database. You can’t really trust the application; it may have been written incorrectly.
Postgres’s LISTEN/NOTIFY commands, which are not part of the SQL standard, exist for these cases.
In larger applications, if you need a stored procedure to handle a complex modification, your application will be blind to it.
Plus, at the application level you have no way to enforce that the particular place in your code that modifies the data and then emits the change is the ONLY place that can modify that data. This becomes more pronounced over time, too.
The problem is amplified if you find something that needs to be done to inbound data that is better handled in another language for some reason. Then you have to remember to duplicate the logic or hook your new code into the application, forcing a new layer in front of your database.
There are some things that the database is better left handling. :)