I've seen it so many times over the years with users and email configurations. I can't count the number of times I've dropped into someone's badly behaving mail configuration, found they had MySQL hosting the users, and been told it was for "performance" reasons. Somehow they didn't grasp that /etc/passwd fits entirely in memory, and that the map files Postfix uses for its various lookups are already a database (one specifically designed for the task at hand) that also fits entirely in memory. Putting a MySQL layer in there is disastrous if performance matters; it is orders of magnitude slower in every case I've seen...still plenty fast for most cases, but it's ridiculous that the idea gets cargo-culted around that storing your mail users in MySQL will make your mail server faster.
A little knowledge is a dangerous thing, is what I'm trying to say, and people who know MySQL is "fast" may not know enough to know that it's not the right tool for the job in a lot of cases...and is probably slower for many use cases. I'm pretty confident this is one of those cases. SQLite is wicked fast on small data sets, and being smaller means more of it will fit in memory; I can't think of any way MySQL could be a more performant choice for this workload.
Also, I don't even want to try to imagine shipping an installable desktop application for non-technical users that relies on MySQL!
It's really not that hard to imagine. There is an embedded library version of MySQL called libmysqld which is made for this very purpose. Of course as you point out it's quite overkill for something that would work perfectly well with Berkeley DBs.
Probably because it runs in the same address space as each server process and there is no message passing (TCP/unix socket) overhead.
That is a good opportunity to ask questions and learn about the person, their knowledge and their use case.
Yeah, sometimes the answer is that they don't know how stuff works underneath and are just following a blog post, or a trend, or something they heard someone talk about at a meetup.
Programmers love programming tools / frameworks / languages. There are so many new shiny things out there; new ones pop up every month. New language, new build system, new DB, new UI library, new message queue, etc. Interestingly, over the years, lots of those have brought great performance improvements. Maybe they took advantage of a new kernel API or some new algorithm and so on. That feeds the hype -- put new shiny things in and performance doubles.
So people have been trained to expect all the new things to just magically be better. But like I said, you have to probe and find out the reason they propose the change; only then can you see whether it is justified or not (one of the reasonable responses could be "I've actually measured with a production load and it does look better").
Say you have an MVC web app: could you have asynchronous calls to a thread performing the actual DB writes? From your app's perspective, concurrent writes would be placed into a queue that performs them serially.
Are there any functional issues to this approach? Thanks in advance for your thoughts or info.
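The pattern being described might look like this in Python with the stdlib sqlite3 module (a sketch; all names here are illustrative, not from any real app). One thread owns the connection and drains a queue, so writes are serialized regardless of how many handlers enqueue them:

```python
import os
import queue
import sqlite3
import tempfile
import threading

def writer_loop(db_path, write_queue):
    # Single thread owns the connection; all writes are applied serially.
    conn = sqlite3.connect(db_path)
    while True:
        item = write_queue.get()
        if item is None:  # sentinel: shut down
            break
        sql, params = item
        conn.execute(sql, params)
        conn.commit()
    conn.close()

db_path = os.path.join(tempfile.mkdtemp(), "app.db")
write_queue = queue.Queue()
worker = threading.Thread(target=writer_loop, args=(db_path, write_queue))
worker.start()

# Request handlers enqueue writes instead of touching the DB directly:
write_queue.put(("CREATE TABLE IF NOT EXISTS events (msg TEXT)", ()))
write_queue.put(("INSERT INTO events (msg) VALUES (?)", ("user signed up",)))

write_queue.put(None)  # drain the queue and stop the writer
worker.join()
```

The functional caveats are the obvious ones: a handler returns before its write has landed, so an immediate read may not see it, and write errors surface asynchronously rather than in the request that caused them.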
Anyway, I'm not saying "never use MySQL for mail users" (though I think the percentage of deployments where it makes sense is closer to zero than it is to one), I'm just trying to make the point that MySQL is, in some folks' minds, a magical solution to performance problems. Often it not only introduces needless complexity, it won't even improve performance. It's a classic example of "when all you have is a hammer, everything starts to look like a nail". MySQL is a very fine hammer. It just isn't the right tool for every job.
The article we're talking about is another case where a little knowledge is a dangerous thing. A desktop app serving one user with a tiny data set (as I understand it, we're talking about the metadata for a person's music collection) is exactly the right workload for SQLite. I'd be shocked if a naive port to MySQL were faster (though they acknowledge that there's room for query optimization), and not at all surprised if it were slower. And, I know it'll require more memory and disk space for the same working set.
Shouldn't user groups give a bit more accountability there?
The Amarok music player in the KDE Linux desktop environment (at least on version 3.5) kept the music collection metadata in a database. You could choose SQLite, MySQL, or Postgres during installation, if I remember correctly. It worked OK because of the package system.
I guess if you used separately installed databases there would be some conflicts, though.
He (they?) picked SQLite for all the correct reasons:
- best tool for the job for their situation;
- write-light and read-heavy;
- zero configuration;
- easy to embed;
- understanding that optimizing queries gives by far the best performance gain in the shortest amount of time.
As an aside, I'm currently using SQLite for Bacula and Postfix, and it's a joy to use; the only drawback I've found so far is the lack of REGEXP REPLACE in the SQL dialect the database supports (REGEXP must be loaded with .load /path/to/lib/libpcre.so; it is not part of the language). I used the Oracle RDBMS for my PowerDNS deployments, but in retrospect, given the way PowerDNS works, SQLite would have been an even better match. All in all, it is great to read that someone picked it for all the correct reasons, rather than following some fashion trend, as is often the case in the computer industry.
In my bitcoin implementation, as an experiment, I tried storing the blockchain in SQLite, Postgres, and LevelDB. I gathered up a bunch of data from the first ~200k blocks of the blockchain and benchmarked all three databases, querying for something like 30,000 utxos out of a set of a couple million. What took 300-400ms in LevelDB took 1.6 seconds in Postgres (in the REPL; in my actual node it would have taken longer due to deserialization of the utxos). What took 1.6 seconds in Postgres took over 30 seconds in SQLite.
Now, you can tell me I did the benchmarks wrong, and "oh, if you just did this it would be faster!", but 30+ seconds is slower to an absolutely insane level. Needless to say, I went the key-value store route, but I was still astounded at how slow sqlite was once it got a few million records in the database.
I actually like SQLite, but when you know you're going to be dealing with 70 GB of data and over 10 million records, preemptive optimization is key. If I were the author, I would consider switching to Postgres if more than 500k-1m records are expected. That said, if they're partial to SQLite, SQLightning (https://github.com/LMDB/sqlightning) looks pretty interesting (SQLite with an LMDB backend).
edit: To clarify, these weren't particularly scientific benchmarks. This was me timing a very specific query to get an idea of the level of data management I was up against. Don't take my word for it.
It was a benchmark I ran just to personally give me a general idea of what I was up against in terms of data management. This is primarily why I didn't post them in the first place. This was several months ago. I'll try to find the code that ran them and put them up on github if I can, but I doubt it will be that useful without the actual data, which isn't easy to upload.
So yes, to clarify, don't take my word for it. This is just my experience.
fsync()ing a couple hundred thousand individual INSERTs isn't fast.
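For illustration, in Python's sqlite3 the difference is one transaction (and thus roughly one fsync) for the whole batch versus one per row. A sketch, with a made-up table and made-up sizes loosely matching the benchmark described above:

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE utxos (id TEXT PRIMARY KEY, data BLOB)")

rows = [(str(i), b"\x00" * 60) for i in range(100_000)]

# Slow pattern: each INSERT is its own implicit transaction, so each
# one waits for its own fsync:
#   for row in rows:
#       conn.execute("INSERT INTO utxos VALUES (?, ?)", row)
#       conn.commit()

# Fast pattern: one transaction for the whole batch -- one fsync total.
with conn:
    conn.executemany("INSERT INTO utxos VALUES (?, ?)", rows)
```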
I'm sorry to say but you're almost certainly doing something wrong then.
If I remember right: Querying for leveldb was iteration from [20-byte-hash][lo-utxo-id] to [20-byte-hash][hi-utxo-id] and subsequent lookups on the keys containing the actual data. The sql table setup was `id` as a char primary key and `address` as a char with an index. The query was a simple `select data from utxos where address = ?`. The actual `data` blob was on average maybe 60 bytes in size.
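For concreteness, here's a minimal reconstruction of the SQL side as described (the table and column names are my guesses from the description, not the actual benchmark code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE utxos (id TEXT PRIMARY KEY, address TEXT, data BLOB)")
conn.execute("CREATE INDEX idx_utxos_address ON utxos (address)")

# ~60-byte data blobs, many utxos per address, as in the description above
conn.executemany(
    "INSERT INTO utxos VALUES (?, ?, ?)",
    [("tx%d" % i, "addr%d" % (i % 100), b"x" * 60) for i in range(10_000)],
)
conn.commit()

rows = conn.execute(
    "SELECT data FROM utxos WHERE address = ?", ("addr7",)
).fetchall()
```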
Maybe there was something I could have done better there. I'm not sure. This was just my experience. This is all beside the point anyway. My point was whatever database was better, it was worth testing each one, and I don't consider it to be premature optimization.
Can you share what the table schema is? What the queries are? And what indices built for the Postgres and Sqlite tables?
https://engineering.microsoft.com/2015/10/29/sqlite-in-windo...
Between that, and packages readily available on most Linux and BSD distros out there (and, in most cases, installed by default), it's well on its way to becoming a de facto standard system API for relational storage.
The nice thing about SQLite is how little it assumes about the world outside. That made it easy to run in WinRT application sandbox with minimal changes; and for quite a while, it was the only local DB readily available to WinRT code written in C# or C++ (JS got IndexedDB).
So I am speaking about embeddable DBs here.
Konstantin Knizhnik has implemented an impressive set of embedded DBs: http://garret.ru/databases.html
His POST++, for example, maps directly to C++ classes, so if you use C++ you don't need any ORM.
In my Sciter[1] engine I am using his DyBase library [3] as the built-in persistence for Sciter's script [2] (JavaScript++).
With DyBase, in script you get features similar to MongoDB (a NoSQL free-form DB) but without any need for an ORM or DAL: you can declare some root object as persistable and access those data trees as if they were ordinary JavaScript objects. The engine pumps objects from the DB into memory as they are needed:
    var storage = Storage.open(...);
    var dataRoot = storage.root; // all things inside are persistable
    dataRoot.topics = []; // flat persistable list
    dataRoot.topics.push({ foo:1, bar:2 }); // storing an object
    // create an indexed collection with string keys; keys can be unique or not
    dataRoot.titles = storage.createIndex(#string);
DyBase has Python bindings too.

[1] http://sciter.com - multiplatform HTML/CSS UI Engine for Desktop and Mobile Application
[2] TIScript - http://www.codeproject.com/Articles/33662/TIScript-Language-...
[3] DyBase - http://www.garret.ru/dybase.html
It took until the very last paragraph for the blog post to make that point.
I'm actually surprised WordPress hasn't ever moved over to it for ease of installation/deployment - WordPress and PHP themselves seem more likely to trip over in most deployments I've seen than SQLite would.
For the same reasons WordPress hasn't moved to Postgres or doesn't include feature X, Y, or Z: WordPress has to remain relatively stable. A change in the database means breaking the WordPress ecosystem, as many plugins add tables to the DB. There is very little abstraction in the WordPress API when it comes to its interaction with the DB; WordPress doesn't ship with an ORM.
What problems should I be expecting in a multi-access scenario? I've never had that question answered adequately.
Wouldn't a "full" RDBMS like Mysql/Postgres offer a ton of benefits over SQLite (like the features to handle edge cases as they arise) to the point where, even if SQLite would work, SQLite still wouldn't be the ideal choice?
Does wordpress completely abstract the database or do third-party plugins use their own interfaces to the database, in which case a migration to SQLite would break a lot of them?
Providing there aren't too many updates at once, such that locking becomes a problem, it works really really well.
[1] - http://dhcp.io/
It's not just "internal" tests, like unit tests or things that can be run from client C programs, but tests of all kinds of hard-to-simulate external situations like out-of-memory and power failure situations.
I also noted that the assert section essentially does Design by Contract. This is a subset of formal specification whose prime value is in preventing interface errors (the vast majority in big apps) and supporting formal verification. It's been done in the design/spec phase in high-assurance work since both Edsger Dijkstra and Margaret Hamilton independently discovered the technique's value, and at the language level since Eiffel, due to Bertrand Meyer. Good to see that, even if they're not using all the techniques, they're applying the 80/20 rule to get the most benefit out of what formal specs they do use. Contracts also let you easily enable run-time checks if you can accept the performance hit. Nice.
> we have almost no indices, no principled denormalization
Sounds like an easy win. People are probably suggesting a database switch because they're finding issues with the current speed, but they're not using their current database to its full potential yet.
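For example (a generic sketch, not beets' actual schema), SQLite's EXPLAIN QUERY PLAN makes it easy to check whether an index would turn a full table scan into an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, artist TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO items (artist, title) VALUES (?, ?)",
    [("artist%d" % (i % 50), "title%d" % i) for i in range(1000)],
)

query = "EXPLAIN QUERY PLAN SELECT * FROM items WHERE artist = ?"

before = conn.execute(query, ("artist7",)).fetchall()  # plan reports a SCAN

conn.execute("CREATE INDEX idx_items_artist ON items (artist)")
after = conn.execute(query, ("artist7",)).fetchall()   # plan now uses the index

print(before)
print(after)
```

The exact wording of the plan rows varies between SQLite versions, but the SCAN-vs-SEARCH distinction is stable and is usually all you need to confirm the win.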
> The main case when beets writes to its database is on import, and import performance is dominated by I/O to the network and to music files.
Small file I/O is universally slow, even on SSDs, and if you're hitting rate-limited MusicBrainz servers for each file, database performance is almost irrelevant.
IMO, instead of writing about how SQLite is the best choice for the project, it would be better to add support for multiple backends. Something being good for you and the majority of users doesn't mean it's good for everyone.
Sure, for larger databases or concurrent user access we use other databases, but nothing beats the 'zero footprint' install of SQLite. I even enjoyed the fact that earlier versions of SQLite forced me to think about and optimise my queries due to the lack of nested SELECTs.
SQLite still kind of reminds me of my early days in MySQL. I was recently trying to downgrade MySQL 5.7 to 5.6 after a slew of issues, which forced me to reminisce about how simple things used to be when MySQL was a lot like SQLite still is now...
I'm glad to see this post; one of the reasons that I like beet so much is that everything is self-contained.
Another nice advantage of it is if you are distributing something that requires a small dataset[0][1]. If I give you both the code and the data already imported into a sqlite database, then you can use the code right away, or you can dump the data to a different database very easily.
Desktop apps are like the sweet spot for sqlite. It's practically made for them.
Of course I can add locking or something in my code, but I'd prefer to handle at a lower level — for example, have SQLite take the latest write without corrupting. I'm hoping someone has solved this problem with SQLite elegantly.
Relevant: https://www.sqlite.org/howtocorrupt.html
You should enable WAL (PRAGMA journal_mode=wal); I believe you need to do this on every connection.
Now, whenever you access the data, do it within transactions (https://www.sqlite.org/lang_transaction.html). SQLite will generally acquire the proper locks when needed; it minimizes the amount of locked time by deferring locks until they are required, but you can use the IMMEDIATE or EXCLUSIVE modifiers. If you use the EXCLUSIVE modifier, only one program will be able to read/write at a time. You will generally want the DEFERRED (default) behavior, since it still provides safety while letting multiple applications access the database at the same time.
Also note that locking might not work if you keep the database on network file system.
As the developers behind the project, I'd have to think the authors are in the best position to make the determination about which tool is appropriate.
Still, we should recall that the utility gained from using the right tool is finite and variable. In some cases it can be like night and day, while in other cases a moderate or negligible improvement might be all you see. Under some circumstances, familiarity with a given tool may outweigh the advantages of a better-suited one, especially if a team lacks well-rounded expertise.
Reinforcing the other side of the argument is the fact that programmers are often subject to constraints placed on them from on high that restrict the choices they might otherwise make.
Given the specific situation, the author seems to build a pretty good case for SQLite at least by my own limited understanding of the facts.
What the browser vendors agreed on is that they would rather have IndexedDB. I've never used it, but they say it is a lower-level API than SQL, and using it you could build your own SQL abstraction layer above it.
So we have as standards the key-value-based localStorage and then, halfway between that and SQL, is IndexedDB.
More: https://hacks.mozilla.org/2010/06/beyond-html5-database-apis...
The author doesn't say a word about synchronization when writing to SQLite.
You don't even need to read the comments to know what people will say:
"SQLite is a great fit for this type of application. It's a replacement for fopen, people. fopen."
"What about ALTER TABLE?"
"It's just a toy database, it doesn't even support concurrent writers"
----- "WAL mode"
"Hey, golang, rqlite"
----- "Whoa I wrote something similar for a ..."
----- "Why would you use this? Just use postgres"
"SQLite is the best database ever"
"SQLite is the worst database ever"
If you want tangible info you can actually use, read SQLite's documentation. There's a wealth of information there.
Here are some posts, for the Python crowd:
http://charlesleifer.com/blog/five-reasons-you-should-use-sq...
http://charlesleifer.com/blog/using-the-sqlite-json1-and-fts...
http://charlesleifer.com/blog/my-list-of-python-and-sqlite-r...