So if they mean directly exposing a service's data model from the database to other services, I'm very skeptical. If they mean providing that access by means of some abstraction, e.g. database views, it can be an option in some cases.
You'll still lose lots of flexibility you'd otherwise gain by putting some middleware in between, e.g. the ability to scale out compute independently from storage, the ability to merge data from multiple sources into one API response, the ability to implement arbitrarily complex business logic (e.g. in Java), etc.
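For instance, a view can expose only a deliberately stable subset of a table while the base tables stay private (a minimal sketch; the table, column, and role names are all hypothetical):

    -- expose a stable public shape; internal columns stay hidden
    CREATE VIEW orders_public AS
    SELECT id, customer_id, status, created_at
    FROM orders;

    -- consumers get the view, never the base table
    GRANT SELECT ON orders_public TO reporting_service;
    REVOKE ALL ON orders FROM reporting_service;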
It really depends.
If you have many different client services that need to access the database in a similar way, then you are right, it makes sense to add some type of abstraction. Then you can change the underlying model and only need to update the common logic once.
Example: if two services both need to create user accounts, it makes sense to encapsulate the logic for creating user accounts somewhere. (A common practice is to use stored procedures inside the database for that)
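A sketch of what that could look like as a PostgreSQL stored procedure (the names and validation are hypothetical, just to illustrate the shape):

    CREATE PROCEDURE create_user_account(p_email text, p_name text)
    LANGUAGE plpgsql AS $$
    BEGIN
        -- shared validation lives here once, not in each calling service
        IF p_email !~ '@' THEN
            RAISE EXCEPTION 'invalid email: %', p_email;
        END IF;
        INSERT INTO users (email, name, created_at)
        VALUES (p_email, p_name, now());
    END;
    $$;

    -- both services call the same entry point:
    CALL create_user_account('alice@example.com', 'Alice');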
But if the services access the database in very different ways, then your abstraction may end up just making things more complicated. Updating it becomes just as much work as updating all the services individually.
Example: an internal dashboard may need to access the data in unique ways. If you route the dashboard service through some middle layer, then that middle layer ends up with lots of APIs that are used only by the dashboard. So you gain nothing from the abstraction: any change to the dashboard requires updating the middle layer, and any change to the database requires updating the middle layer too. Making changes takes just as much effort as if there were no middle layer; you've just split the dashboard logic into two parts and made it harder to understand.
> No matter how you integrate different applications, be it via APIs, messaging, or a database, it's vital to separate your application's internal data model from models which it exposes.
Especially with respect to this product.
This is a good example of what separation enables you to do:
https://www.troyhunt.com/your-api-versioning-is-wrong-which-...
(It's presented from the perspective of how to version an API, but the examples of what the separation looks like are there.)
The internal data model might need to be reorganized to improve the efficiency of part of the internal application.
A client might not need a higher-fidelity model, and the internal reorganization of the data model might not be pertinent to the client's interface. So we normally have some data mapping in the application to help provide a stable interface for clients.
It's possible to provide these compatibility mappings within the database through views, but this is usually considered harder to control, test, and scale.
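For example, if an internal reorganization splits one table into two, a view can keep presenting the old shape so existing clients don't notice (a hypothetical sketch):

    -- the internal model split "users" into accounts + profiles,
    -- but old clients keep reading the original shape through a view
    CREATE VIEW users_v1 AS
    SELECT a.id, a.email, p.full_name
    FROM accounts a
    JOIN profiles p ON p.account_id = a.id;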
Maybe in a big application, the application-layer mapping and caching eventually get complicated enough to be something like a custom-made database. And so we might end up with an "integration database" but call it something different.
> A customer pattern we see solves this problem, and it’s the integration database pattern. This pattern went out of style in the 90s along with Oracle servers in the closet and dedicated DBAs, but with technological advances like scalable multi-region transactions and serverless computing resources, its utility has returned.
Considering they are flogging a product, this feels especially dishonest to me.
(This project isn't ready for prime time, but it was a sketch of an approach to mitigating the downsides of integrating through the database in order to unlock the upsides.)
The big downside of this approach is that the human factors are tricky nowadays. Developers expect to be allowed to poke at the tables directly, and don't necessarily take kindly to alternative ways of framing things, especially if they've been raised on a diet of code-first object-relational mapping. And there's not really a great way to enforce this boundary, because, unlike HTTP, an ODBC session will generally let you do pretty much anything you want to the remote service's implementation details.
1) changing the language you use in a server (say, from PHP to Python) is big, but changing the language you use in your database (from SQL to anything else) is even more intimidating. If you are integrating through the database, this limitation matters more.
2) you need to have DBAs who are not only highly competent, but also have good people skills, since they will often be saying "no", or at least "not that way", and if they don't know how to do that in a constructive manner then it becomes a net productivity drain. Fortunately, where I worked that used this pattern, the DBAs had exceptional people skills as well as technical skills. This is, I am led to believe, not always the case.
This is bringing back old (bad) memories of the times where I was debugging stored procedures that called triggers that called the same stored procedures.
A lot of that was due to poor design and bad choices. Some of that was due to developers trying to fit processes and patterns into a language, i.e. SQL, that lacked the expressiveness for it.
I'm not a fan of this approach to say the least.
I'd be interested in hearing others' experiences with this, though.
And I'm going to check out Fauna since it looks like a cool database and to see if anything has changed with this approach since I encountered it almost a decade ago.
One huge difference today is that you can actually unit-test user-defined functions and stored procedures using the same tooling you use for your application code. That lends itself naturally to integration testing, and so on.
I wouldn't necessarily recommend taking it to the property-based testing extreme like I did just to see whether it worked (it does).
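For example, with pgTAP on PostgreSQL (one option among several; the procedure and schema here are hypothetical), a test can exercise a stored procedure that writes and then roll everything back:

    -- pgTAP test: runs inside a transaction, so writes never persist
    BEGIN;
    SELECT plan(1);

    CALL transfer_funds(1, 2, 100.00);  -- hypothetical write-heavy procedure

    SELECT is(
        (SELECT balance FROM accounts WHERE id = 2),
        100.00::numeric,
        'credit applied to target account'
    );

    SELECT * FROM finish();
    ROLLBACK;  -- leaves the database exactly as it was

Because everything happens inside one transaction, even complex multi-table writes can be asserted on and then discarded.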
Can you give an example of how this is possible, or what tools/services it's possible with? I can understand how it would work for UDFs, but I'm not entirely sure how it would be possible for stored procedures, especially when they're often doing something more complex than simple read operations.
It was a fairly limiting approach, performance wasn't great, and diagnosing problems was a total PITA.
Afterwards I did an integration using an antiquated version of BizTalk, which was all based on COM and to this day remains my most loathed software ever. It was way too complex, the UI was crap, it was flaky as hell, and ridiculously slow. Later versions got better, but it was a very low bar.
Later I used a Java-based Apache project to do an integration - I forget the name. It was actually really good: easy to use while also allowing full control.
Later still, I did some integrations using custom C# code based around WWF (Windows Workflow Foundation). The UI in Visual Studio was horrendously slow and flaky, it was too complex to build your own pipeline segments, and it was difficult to diagnose issues in production. But it did work fairly well for simple stuff.
And then later still I used PowerCenter - I was the architect, so wasn't that hands on, but the PowerCenter guys liked it, and we ended up adopting it as our standard integrations tool, with hundreds of integrations across the company.
But how is the product itself different from serverless DBs such as Supabase and PlanetScale?
If you use Firebase as an ephemeral message bus it's a great pattern [1]. It has problems if you use it like a traditional database, because migrations are very tricky. DBs that support views (or GraphQL) can make migrations much easier.
[1] https://firebase.googleblog.com/2013/03/where-does-firebase-...
A lot of complicated practices that we’ve adopted to overcome the challenges of distributed systems just disappear when you have fast, strongly consistent transactional writes on a single data store. Either a thing happened or it didn’t. No need to wait around and see.
This matters even more as applications move from a single failover region to multi-region to edge computing with hundreds of compute locations. How do you get consistent state out to each PoP?
You don’t, you integrate through the database.
Personally I haven't noticed anything resembling a comeback, but I'm certainly using them more than ever... and I'm loving it.
It SUCKS.
Let me count the ways.
- You can't update a table schema without updating and deploying a bunch of applications in sync (even if that's part of a stored proc), which means that getting out of this situation is WAY harder than getting into it.
- You end up putting WAY too much logic into the DB, which makes it hard to figure out WHAT is ultimately supposed to happen.
- DBs have TERRIBLE development stories. That's ultimately because the code and data all live in the same place, and you can update code without any sort of revision control to help you understand or see a change that's been made to the DB schema (forcing a bunch of painful process around updating DB capabilities).
- DBs are resource bottlenecks that SUCK to scale out. Putting a bunch of apps into one DB complicates that process. Scaling a single app in a single DB is simply WAY easier.
- At least my DB (and I assume a bunch of other DBs) has really crappy performance diagnostic tools. Further, the more complicated the queries against it, the more likely you are to go from "Hey, stats are making things fast" to "OMG, why is this thing taking 10 seconds to run now?!". It's really bad when the only solution that seems to fix things is dumping stats.
I could MAYBE see something like this for a macro service dedicated to a domain, but I'd never build a complex system like this from scratch. Colocating apps in the same DB would have to be justified by some crazy performance reason that makes bypassing a microservice worthwhile. An exception, not the regular course of action (and I'd still hate it :) )
1. Schemaless/document/schema-on-need databases like Fauna don't mandate the application breakage on every schema change that SQL does
2. It's hard to reason about if it's not transparent, but it can be transparent now, see below
3. Fauna is a temporal database, which acts like version control on your stored procedures, so you can easily check and revert any change
4. Fauna is serverless and horizontally scalable without consistency/latency impact
5. This was definitely a problem when you were occupying precious CPU cores on a vertically scaled RDBMS with business logic, but compute in Fauna or in serverless lambdas scales horizontally indefinitely
Anyway, using REST for inter-service communication decreases performance and increases latency, and 99% of projects still do it.
But I'd rather use RPC for communication.