Database internals are becoming less important than developer experience (opens in new tab)

(planetscale.com)

68 pointsgagejustins4y ago71 comments

71 comments

Chiron19914y ago

Friendly reminder that this post is published on the PlanetScale blog, a company that sells a database SaaS. Beware of the bias.

I personally would argue with every single point this article makes, except scalability.

hodgesrm4y ago

> I personally would argue with every single point this article makes, except scalability.

Maybe you could be more explicit about what you don't like about their ideas? I personally do like a lot of their ideas, such as the following:

> In the future, I’d expect to see a tighter coupling between the frameworks we’re using for reactive frontends – React, Vue, etc. – and the database, via hooks or otherwise.

This builds on the behavior that made MongoDB so phenomenally popular, as the article points out. Data management is pervasive in modern applications and anything that makes it easier for devs to implement is goodness.

pm904y ago

Agreed. The recent splurge in VC money for dev tools startups is going to lead to a lot more articles like this one. I hope developers read it with an eye towards that bias.

gagejustinsOP4y ago

~hello everyone, author here~

I know posts with ThOuGhT LeaDeRshIp titles like this are usually annoying, but I thought it would be interesting to write down some of the lessons I've been gathering as I've spent more time covering and using specific databases. My background is in data science / analytics with a couple of years of more traditional full stack here and there.

Broadly we've seen this pattern with infrastructure in general – it's a lot easier to set up a server than it used to be, all things considered. Now obviously if you're a tiny startup, you're more comfortable outsourcing everything to Heroku, and if you're a hyperscale enterprise, you probably want more control on exactly what your database is doing.

The thesis here is that on the tail end (hyper scale), things are getting more under control and predictable, and developers there want the same "nice things" you get with platforms like Heroku. Elsewhere in the ecosystem, more and more parts of the stack are getting turned into "simple APIs" (Stripe for payments, Twilio for comms, etc.). And perhaps most interestingly, as serverless for compute seems kind of stuck (maybe?), it may be the case that serverless for databases – whatever that ends up meaning – is actually an easier paradigm for application developers to work with.

jasonwatkinspdx4y ago

I think the article's thesis is a false and misleading dichotomy.

It's absolutely true that a low friction developer experience is necessary for a database product to be successful. But this in no way implies that database internals are being commoditized or relegated to minor importance.

Snowflake is a particularly bad example as taking a clean sheet and novel approach to internals is the very fulcrum that creates the easy developer experience.

Admittedly it's been a while since I looked at vitess, but my recollection is that it's cross shard functionality is so limited as to make claiming internals no longer matter a bit dubious.

The reason there's only a handful of spanner style systems is exactly because the internals both matter and are quite daunting to get right.

felixhuttmann4y ago

I agree. It is also amazing how different the database systems are that are competing against each other today:

Partitioning: 1) DynamoDb: Partitioning is explicit and one of the most important parts of schema design 2) Spanner, Cockroach: Database automatically partitions the key ranges. 3) Postgres: You will probably never reach the scale where you need to partition your dataset!

Transactions: 1) Spanner, firestore - no stored procedures, client-side transactions are important 2) Dynamodb: No stored procedures, no client-side transactions, only transactions where all items involved are known by primary key in advance. 3) Fauna, Supabase: Stored procedures are the way to go! You do not need application code, access your database from the client! 4) Postgres: We have everything, use what fits your particular use-case!

If database internals did not matter, why are they all doing something different and are sometimes quite opinionated about it?

redwood4y ago

I think you bring up some good points, but it's factually incorrect to state that mongodb is not ACID compliant (it has been since its 4.0 release). The way to win is not by hyperbole and lies. It's to your point by focusing on developer experience.

TekMol4y ago

As a developer, I have to say that sqlite gives me the best experience.

Everything else pales in comparison.

Create a database?

    sqlite3 mydata.db

Where is the database?

    In the current directory

How is it structured on disk?

    It's a single file

How do I backup the DB?

    cp mydata.db /my/backups/mydata.db

Do I have to install a server?

No

Do I have to configure anything?

No

During setup and deployment I usually I dabble a while with the whole GRANT *.* ON localhost IDENTIFIED BY PASSWORD or something. How do I do that with sqlite?

    It just works

Do I have to close / protect any specific ports?

    No, it's just a file

Which field types should I use for ... ?

    None. It just works.

johncolanduoni4y ago

Unless you need to do something crazy, like run two nodes of your application. But only Google needs to do that, right?

dragontamer4y ago

You don't even need to reach two nodes before SQLite becomes grossly inadequate. Even on a single node: SQLite's paradigm of global locks leads to poor performance when multiple threads write to the same table.

You could be a single-node 4-core $5/month VPS instance and run into this issue. SQLite requires "exclusive" access to a table to handle writes (meaning when writing, no other thread can be reading the table). Especially if your transactions start to become complex.

In contrast, MySQL and PostgreSQL allow for simultaneous reads while writes are occurring.

2 more replies

bob10294y ago

We've been using sqlite as our only datastore for the last 5+ years.

There are a lot of fun little tricks you learn along the way.

The most important one is to ignore all the naysayers who claim it won't scale. I can easily saturate any IO subsystem by way of SQLite.

At first you may think it's impossible, but then you start to go outside the lines a bit. You start by turning on WAL. That helps a lot... But then you want another 100x? One thing you can do is heavily abuse the fact that any SQLite db is "just a file" by opening as many as you need to in parallel. 9/10 times you can break down a typical business app into grains roughly approximating the current state of each user.

Might make regulatory compliance easier if each user is stored in their own database too. Makes removing someone's entire presence from your system 1 file deletion activity.

Tenoke4y ago

How do you aggregate the data and query among all those databases? Doesn't it incur huge costs? Does sqlite support this natively or are you basically treating it as text files of data per user and don't need complex queries in the first place?

2 more replies

DelightOne4y ago

Is there a wrapper to treat a database like its an sqlite database (a file) to simplify to the same level as Sqlite? This may at least simplify some issues. Doesn't matter whether its backed by a local database running in a container or by a remote one.

munk-a4y ago

The biggest database unification effort tends to be focused around ODBC compatibility from what I've seen - and the operations laid out by ODBC are quite trivial and easy to comprehend I think.

1 more reply

waynesonfire4y ago

calibre uses sqlite and it doesn't support hosting the db on a network drive.

https://manual.calibre-ebook.com/faq.html#i-am-getting-error...

i guess that's a bad -dev- user experience?

TekMol4y ago

You can certainly host it on a network drive if the network filesystem has the right features and behaviour.

The same goes for a local filesystem. Sqlite has certain features it requires the filesystem to have. That is independent of how that filesystem stores the data physically.

1 more reply

mamcx4y ago

Firebird is also close. With the extra capabilities that can run as server (and that is only a setup concern: the code stay same).

Also, it support more stuff (like stored procedures!) that I miss a lot on sqlite.

My only complain with firebird is that is not easy to embed into iso/android or it will my main db.

tyingq4y ago

Fully agree, but that comes with tradeoffs around concurrent writes, strong typing (though I hear that's coming), some other functionality, etc.

void_mint4y ago

This feels like kind of a silly comparison. SQLite can't do most of what makes modern databases difficult/valuable. As an example, if you replace your above example with a CSV file, you'd get the same output, right?

the_gipsy4y ago

> with a CSV file, you'd get the same output, right?

Not when you have threads.

TekMol4y ago

I describe the differences between sqlite and mysql, postgres, oracle etc that are the reason why I like sqlite best.

If for you there are specific differences that make you chose some other db, let us know.

1 more reply

jdblair4y ago

The author skips the first decade of database systems in the 1960s. The oldest databases were not relational! They were hierarchical or navigational.

Hierarchical databases were much like a filesystem, but for records instead of files.

Navigational databases allowed data to be linked in a network. Look up CODASYL for detail.

The relational database design was first proposed in the 1970s.

cl0ckt0wer4y ago

I would think that having bad internals would create a bad dev experience.

wmf4y ago

Unfortunately MongoDB proved the opposite. You don't notice the data loss and schema problems until later.

cl0ckt0wer4y ago

That is a pretty bad experience in the long run.

2 more replies

DemocracyFTW4y ago

> You don't notice the data loss

Wait, is that a feature or a bug?

seibelj4y ago

> NoSQL databases are maturing, for sure – we’re starting to see support for transactions (on some timeframe of consistency) and generally more stability. After years of working with “flexible” databases though, it has become clearer that rigidity up front (defining a schema) can end up meaning flexibility later on.

So funny to me that NoSQL boosters have only recently understood that designing sane schemas and knowing what order your data is inserted is important for data integrity. It's like an entire generation of highly paid software devs never learned fundamental computer science principles.

dragontamer4y ago

To be fair, relational algebra is hard.

That being said: going back to 1970 to read the original "A Relational Model of Data for Large Shared Data Banks" by Codd (the paper which started the relational-database + normalization model) is incredibly useful.

But yeah, all of this debate about "how data should be modeled" was the same in 1970 as it is today.

-----

SQL doesn't quite fit 100% into the relational model, but its certainly inspired by Codd's relational model and designed to work with the principles from that paper.

And strangely enough, legions of authors and teachers and courses do a worse job at explaining relational databases than Codd's original 11 page paper.

munk-a4y ago

I had a specific class on relational algebra in uni and it is up there with algorithm design and analysis in the realm of classes that actually provided me the most long term value.

Relational algebra is a lot easier once you start viewing it as relational algebra - a declarative expression of intent that can be manipulated and re-expressed similar to other purely mathematical statements. Then, when performance tuning becomes the watchword, you take that flexible expression and slice and dice it according to how the DBMS you're working with requires to align it with performance. You always want to think of your queries as complex summoning spells that draw in different necessary resources in some particular patterns and then impose an expression form on that blob of data - then you'll skate through all things SQL.

1 more reply

blacktriangle4y ago

That's exactly what it is. "Self-taught coder" really isn't the right word for what many are, as it implies some form of intentional individual study. More like "self learned to duck tape shit together thanks to Stack Overflow" but we don't have a catchy term for that.

fabian2k4y ago

Understanding a limited amount of database internals has been very useful to me. There is one aspect of using databases that you simply cannot abstract away and that is performance. If you ask your database a question in a way it is not suited to perform or that isn't supported by indexes performance is not going to be good.

And these performance differences are not small once your database has a decent size. And if you tables are really large it's not a question of fast or slow but fast enough or so slow it's indistinguishable from the database being down.

Of course to some extent you can simply throw hardware or money at the problem. This certainly works for smaller inefficiencies, but sometimes knowing the database will give you orders of magnitude better performance. Hardware and money also don't scale indefinitely.

abanayev4y ago

At some point, DBaaS systems should be able to understand and make inferences about your use cases, to the point where indexes and other performance optimizations are automated whenever you register a new query or something. This would be the new era of database systems, and as the article points out is increasingly true about all “infrastructure” concerns.

pgwhalen4y ago

One of the problems with this is that there will always be trade offs. It’s hard to imagine a database understanding the appropriate compromise between read and write speeds for your specific application, for example.

orlovs4y ago

Call me old, title theme for me deeply resonates with foundations chapters of loosing important knowledge.

waynesonfire4y ago

nothing to do with age. this is published by "PlanetScale is a MySQL compatible, serverless database platform powered by Vitess." -- they have vested interest in promoting the complexities of DBs.

In my view the author has absolutely zero basis to make such a claim.

eatonphil4y ago

> MySQL, MongoDB, Firebase, Spanner; there has literally never been a better time to be a database user at any level of complexity or scale. But there’s still one common thread (ha!) among them – the focus is on infrastructure, not developer experience.

It was my impression that everyone picked (and still picks) MySQL, MongoDB, and Firebase _because_ they were the easiest to use. It seemed like developer experience was by far the most important thing to them (compared to sane behavior initially in the case of Mongo and MySQL, some of which has since evolved).

IncRnd4y ago

> It was my impression that everyone picked (and still picks) MySQL, MongoDB, and Firebase _because_ they were the easiest to use.

I've found that to be the case, except for enterprise development, which has different concerns than how quickly code gets written to use a database.

eatonphil4y ago

Are you saying that enterprise developers pick MySQL, MongoDB, and Firebase for other reasons?

1 more reply

jeffbee4y ago

I can't imagine any scenario under which a reasonable person of at least median intelligence would perceive MongoDB as "easier to use" than Spanner, unless their entire experience with Mongo was they put one trivial JSON doc into an M0 cluster and got it back out later. Every practical aspect of MongoDB is a complete shitshow, from sharding to backup to failover.

eatonphil4y ago

With the understanding that Mongo _is_ popular I think you're making my point. Though I have no clue if most people who choose Mongo evaluate it against Spanner.

015a4y ago

"Sharding" "Backups" and "Failovers" are NOT "practical" aspects of any database. They're theoretical. Most databases are not big enough to need sharding. Most backups go unused. Most failover happens automatically, totally managed by your hosting provider.

You know what is practical? Schema design. Query language. That's what made MongoDB super popular; no schemas to worry about. A query is just '{ firstName: "John" }'.

I cannot emphasize this enough: I cannot summon even a milliliter of desire to care about whether Mongo's way of doing these things is actually "better" or "worse". But it is what made it popular.

4 more replies

askdba4y ago

I work at PlanetScale and this post aligns with my thoughts on https://askdbablog.wordpress.com/2018/04/09/learning-the-fun... whether you agree or disagree on some parts.

fmakunbound4y ago

> Database internals will eventually just not matter

Of course you need to know the internals of your database. If you've ever come across a project where the team treated a key/value, or document database as a relational one (probably because the query syntax looks similar), then you will know just how important database internals are.

tyingq4y ago

That does make some sense, especially for databases that have been around a while. Lots of internals were written with hard drives in mind rather than SSDs, much lower amounts of memory, and so on. On the other hand, it's nice when your database works well in even a very limited or old environment.

_jezell_4y ago

Looks like they let the marketing folks write tech articles again.

klysm4y ago

Given that’s pretty much the only way you can differentiate in this market, it makes sense for planetscale to believe that.

1 more reply

withinboredom4y ago

> databases will win based on superior developer experience.

I guess rethinkdb really was ahead of it's time.

j / k navigate · click thread line to collapse

71 comments

Chiron19914y ago

Friendly reminder that this post is published on the PlanetScale blog, a company that sells a database SaaS. Beware of the bias.

I personally would argue with every single point this article makes, except scalability.

hodgesrm4y ago

> I personally would argue with every single point this article makes, except scalability.

Maybe you could be more explicit about what you don't like about their ideas? I personally do like a lot of their ideas, such as the following:

> In the future, I’d expect to see a tighter coupling between the frameworks we’re using for reactive frontends – React, Vue, etc. – and the database, via hooks or otherwise.

pm904y ago

Agreed. The recent splurge in VC money for dev tools startups is going to lead to a lot more articles like this one. I hope developers read it with an eye towards that bias.

gagejustinsOP4y ago

~hello everyone, author here~

jasonwatkinspdx4y ago

I think the article's thesis is a false and misleading dichotomy.

Snowflake is a particularly bad example as taking a clean sheet and novel approach to internals is the very fulcrum that creates the easy developer experience.

Admittedly it's been a while since I looked at vitess, but my recollection is that it's cross shard functionality is so limited as to make claiming internals no longer matter a bit dubious.

The reason there's only a handful of spanner style systems is exactly because the internals both matter and are quite daunting to get right.

felixhuttmann4y ago

I agree. It is also amazing how different the database systems are that are competing against each other today:

If database internals did not matter, why are they all doing something different and are sometimes quite opinionated about it?

redwood4y ago

TekMol4y ago

As a developer, I have to say that sqlite gives me the best experience.

Everything else pales in comparison.

Create a database?

    sqlite3 mydata.db

Where is the database?

    In the current directory

How is it structured on disk?

    It's a single file

How do I backup the DB?

    cp mydata.db /my/backups/mydata.db

Do I have to install a server?

No

Do I have to configure anything?

No

During setup and deployment I usually I dabble a while with the whole GRANT *.* ON localhost IDENTIFIED BY PASSWORD or something. How do I do that with sqlite?

    It just works

Do I have to close / protect any specific ports?

    No, it's just a file

Which field types should I use for ... ?

    None. It just works.

johncolanduoni4y ago

Unless you need to do something crazy, like run two nodes of your application. But only Google needs to do that, right?

dragontamer4y ago

In contrast, MySQL and PostgreSQL allow for simultaneous reads while writes are occurring.

2 more replies

bob10294y ago

We've been using sqlite as our only datastore for the last 5+ years.

There are a lot of fun little tricks you learn along the way.

The most important one is to ignore all the naysayers who claim it won't scale. I can easily saturate any IO subsystem by way of SQLite.

Might make regulatory compliance easier if each user is stored in their own database too. Makes removing someone's entire presence from your system 1 file deletion activity.

Tenoke4y ago

2 more replies

DelightOne4y ago

munk-a4y ago

The biggest database unification effort tends to be focused around ODBC compatibility from what I've seen - and the operations laid out by ODBC are quite trivial and easy to comprehend I think.

1 more reply

waynesonfire4y ago

calibre uses sqlite and it doesn't support hosting the db on a network drive.

https://manual.calibre-ebook.com/faq.html#i-am-getting-error...

i guess that's a bad -dev- user experience?

TekMol4y ago

You can certainly host it on a network drive if the network filesystem has the right features and behaviour.

The same goes for a local filesystem. Sqlite has certain features it requires the filesystem to have. That is independent of how that filesystem stores the data physically.

1 more reply

mamcx4y ago

Firebird is also close. With the extra capabilities that can run as server (and that is only a setup concern: the code stay same).

Also, it support more stuff (like stored procedures!) that I miss a lot on sqlite.

My only complain with firebird is that is not easy to embed into iso/android or it will my main db.

tyingq4y ago

Fully agree, but that comes with tradeoffs around concurrent writes, strong typing (though I hear that's coming), some other functionality, etc.

void_mint4y ago

the_gipsy4y ago

> with a CSV file, you'd get the same output, right?

Not when you have threads.

TekMol4y ago

I describe the differences between sqlite and mysql, postgres, oracle etc that are the reason why I like sqlite best.

If for you there are specific differences that make you chose some other db, let us know.

1 more reply

jdblair4y ago

The author skips the first decade of database systems in the 1960s. The oldest databases were not relational! They were hierarchical or navigational.

Hierarchical databases were much like a filesystem, but for records instead of files.

Navigational databases allowed data to be linked in a network. Look up CODASYL for detail.

The relational database design was first proposed in the 1970s.

cl0ckt0wer4y ago

I would think that having bad internals would create a bad dev experience.

wmf4y ago

Unfortunately MongoDB proved the opposite. You don't notice the data loss and schema problems until later.

cl0ckt0wer4y ago

That is a pretty bad experience in the long run.

2 more replies

DemocracyFTW4y ago

> You don't notice the data loss

Wait, is that a feature or a bug?

seibelj4y ago

dragontamer4y ago

To be fair, relational algebra is hard.

But yeah, all of this debate about "how data should be modeled" was the same in 1970 as it is today.

-----

SQL doesn't quite fit 100% into the relational model, but its certainly inspired by Codd's relational model and designed to work with the principles from that paper.

And strangely enough, legions of authors and teachers and courses do a worse job at explaining relational databases than Codd's original 11 page paper.

munk-a4y ago

I had a specific class on relational algebra in uni and it is up there with algorithm design and analysis in the realm of classes that actually provided me the most long term value.

1 more reply

blacktriangle4y ago

fabian2k4y ago

abanayev4y ago

pgwhalen4y ago

orlovs4y ago

Call me old, title theme for me deeply resonates with foundations chapters of loosing important knowledge.

waynesonfire4y ago

nothing to do with age. this is published by "PlanetScale is a MySQL compatible, serverless database platform powered by Vitess." -- they have vested interest in promoting the complexities of DBs.

In my view the author has absolutely zero basis to make such a claim.

eatonphil4y ago

IncRnd4y ago

> It was my impression that everyone picked (and still picks) MySQL, MongoDB, and Firebase _because_ they were the easiest to use.

I've found that to be the case, except for enterprise development, which has different concerns than how quickly code gets written to use a database.

eatonphil4y ago

Are you saying that enterprise developers pick MySQL, MongoDB, and Firebase for other reasons?

1 more reply

jeffbee4y ago

eatonphil4y ago

With the understanding that Mongo _is_ popular I think you're making my point. Though I have no clue if most people who choose Mongo evaluate it against Spanner.

015a4y ago

You know what is practical? Schema design. Query language. That's what made MongoDB super popular; no schemas to worry about. A query is just '{ firstName: "John" }'.

I cannot emphasize this enough: I cannot summon even a milliliter of desire to care about whether Mongo's way of doing these things is actually "better" or "worse". But it is what made it popular.

4 more replies

askdba4y ago

I work at PlanetScale and this post aligns with my thoughts on https://askdbablog.wordpress.com/2018/04/09/learning-the-fun... whether you agree or disagree on some parts.

fmakunbound4y ago

> Database internals will eventually just not matter

tyingq4y ago

_jezell_4y ago

Looks like they let the marketing folks write tech articles again.

klysm4y ago

Given that’s pretty much the only way you can differentiate in this market, it makes sense for planetscale to believe that.

1 more reply

withinboredom4y ago

> databases will win based on superior developer experience.

I guess rethinkdb really was ahead of it's time.

j / k navigate · click thread line to collapse