Show HN: Simple-graph – a graph database in SQLite (opens in new tab)

(github.com)

237 pointsdpapathanasiou5y ago42 comments

42 comments

Interesting. SQLite is awesome.

I did something similar recently, a block store for a rust implementation of ipfs, which models a directed acyclic graph of content-addressed nodes.

https://github.com/actyx/ipfs-sqlite-block-store

I found that performance is pretty decent if you do almost everything inside SQLite using WITH RECURSIVE.

The documentation has some really great examples for WITH RECURSIVE. https://sqlite.org/lang_with.html

abetlen5y ago

The issue I found with WITH RECURSIVE queries is that they're incredibly inefficient for anything but trees. I've looked around and there doesn't seem to be any way to store a global list of visited nodes. This means that when performing a traversal of the graph the recursive query will follow all paths between two nodes.

hansvm5y ago

I'm pretty sure you could also maintain a temp table and use some kind of "insert...where...returning" construct to squeeze that into a recursive query.

At a moderate overhead you could also definitely return all seen nodes and a flag to identify them as such as part of your intermediate data at each recursive step.

The postgres query optimizer struggles with recursive queries even when well suited to the problem though. Are they actually efficient in sqlite even for trees?

2 more replies

rekwah5y ago

Tangentially related to graph dbs, but if you're looking for more hierarchical support, SQLite does has a transitive closure extension[0] that might be of some assistance. I leveraged this back in 2014 to write our framework-agnostic result storage on AWS Device Farm.

[0] - https://www.sqlite.org/cgi/src/artifact/636024302cde41b2bf0c...

[1] - https://charlesleifer.com/blog/querying-tree-structures-in-s...

abetlen5y ago

I've actually been working on an extension to perform breadth first search queries in SQLite on general graphs [0]. The extension is actually based off of the transitive closure extension. You can use it on any existing SQLite database as long as you can wrangle your edges into a single table (real or virtual) and the node ids are integers (I'm planning on removing this constraint in the future).

[0]: https://github.com/abetlen/sqlite3-bfsvtab-ext

kevas5y ago

Why not add that functionality directly to SQLite via stored procs*

https://www.amazon.com/Hierarchies-Smarties-Kaufmann-Managem...

*https://github.com/wolfch/sqlite-3.7.3.p1

pstuart5y ago

A more recent effort: https://cgsql.dev/

claytongulick5y ago

I just read through the docs of this, what an amazing project.

I'm considering doing a js template string implementation for node.. cql`...` type thing with an internal compilation cache.

kevas5y ago

Thanks

loxias5y ago

I really like this, OP. I'm member of the clan "why are you creating your own XDR, just use sqlite!!" and have oft jumped to that in technical discussions, so appreciate it.

However, what's lacking from something like this is a detailed bill of the cost. I'd love to see some, any benchmark on a DB with > 10^6 edges to see how it goes. That's the other hand of the equation "just use sqlite and be happy" -- the expectation that performance will actually be reasonable.

ptrik5y ago

Similar project: https://github.com/CodyKochmann/graphdb

bjornsing5y ago

How does this perform compared to a “native” graph database like Neo4J?

batmansmk5y ago

It really depends on what you want to do with it.

I would benchmark the tasks "traversal", "aggregation" and "shortest past" for a 10k to 10M node graph. Anything under 10k would be good enough with most techs and over 10M need to consider more tasks (writes, backup, the precise fields queried can become their particular problems at larger scale).

The Github link implements "traversal "in Python instead of pure SQLite. I suspect it will be around x10 slower than it could be with the same tech stack, because it queries once per node from Python to SQLite. Shortest path is not implemented and would be too slow to be useful in an interactive environment. "Aggregation" is also not implemented, but it would perform admirably, because SQL is good at that.

Traditional relational OLTP databases such as Postgres are already faster than dedicated graph databases for certain graph related tasks, according to this benchmark: https://www.arangodb.com/2018/02/nosql-performance-benchmark...

szarnyasg5y ago

> Traditional relational OLTP databases such as Postgres are already faster than dedicated graph databases for certain graph related tasks

It is indeed quite common that relational databases outperform graph databases on certain graph processing problems such as subgraph queries (a.k.a. graph pattern matching). There are two key reasons for this: (1) most graph pattern matching operations can be formulated using relational operations such as natural joins, antijoins, and outer joins; and (2) relational databases have been around longer and have well-optimized operators.

A lot of the value that graph databases provide lies in their query languages which (for most systems) allow formulating path queries using a nice syntax (unlike SQL's WITH RECURSIVE which many people find difficult to read and write). Their property graph data model supports a schema-optional approach, which makes them better suited for storing semi-structured data. They also "provide efficient programmatic access to the graph, allowing one to write arbitrary algorithms against them if needed" [1].

With all these said, graph databases could be much faster on subgraph queries than relational databases and there are recent research results on the topic (worst-case optimal joins, A+ indexes, etc.). But these are not available in any production system yet.

[1] http://wp.sigmod.org/?p=1497

chrisweekly5y ago

> "shortest past"

shortest path typo, right?

1 more reply

lsb5y ago

Neo4j has failed queries I have written, with "out of memory" errors. I have never, ever, ever gotten that from SQLite.

lolive5y ago

Performance issues are a very valid discussion. But to me, the availability of a graph-oriented query language on top of this graph variant of SQLite is, imho, the very first step to investigate. (RDF import/CSV import being next)

1 more reply

linkdd5y ago

Isn't it like asking "how does sqlite perform compared to databases like PostgreSQL" ?

SQLite is used a lot on edge (mobile apps, ...), sounds like this project provide a graph database for the very same use case (I probably won't run Neo4J on mobile).

afavour5y ago

IMO it’s a different question. SQLite and Postgres are both relational databases, it stands to reason that they’re at least doing things in similar ways. They’re two implementations of the same idea(ish). A graph database is something else altogether. Grafting that capability onto a relational database has the potential to perform horribly.

It’s a bad analogy, but SQLite to Postgres is like AMD vs Intel x86 CPUs, whereas a graph database is ARM. Can it be emulated? Yes. Is there a far greater potential for slowdown? Yes.

joelschw5y ago

I think the big difference here is that when comparing SQLite you are at least using the same query language.

In the graph space you have Gremlin, Cryper, GQL and many other proprietary query engines (which also looks to be the the case here).

Without that accessibility this feels a bit like pickling a NetworkX object.

szundi5y ago

Then maybe no such questions are necessary.

tasogare5y ago

It depends on how the graph is stored in the database. In this project the nodes ids are TEXT so it will likely not scale very well. I know because I use a similar implementation with GUID as string in Sqlite in a project since a couple of years and while it works fine for the graph I have (<1 million nodes, few edges per nodes) it won’t perform too well past that.

alexisread5y ago

To some extent I think it depends on what data you're storing in the graph ie. If it's temporal data using a ulid instead of a guid speeds things up significantly (30x for large data) as your ids are not as fragmented.

https://github.com/schinckel/ulid-postgres/blob/master/ulid....

mosselman5y ago

Thanks for the info. Do you happen to have some stats to share?

lolive5y ago

I wonder if there are ways, in SQLite, to build indices for s,p,o/s,p/p,o/ and maybe more subtle ones... That would be uber nice, given the fact that most graph databases have their own indexing strategies, and you cannot craft your own.

JimmyRuska5y ago

I saw this lecture some time back on the topic of implementation and tradeoffs https://www.youtube.com/watch?v=Dxwo9DYWV_c

westurner5y ago

rdflib-sqlalchemy is a SQLAlchemy rdflib graph store backend: https://github.com/RDFLib/rdflib-sqlalchemy

It also persists namespace mappings so that e.g. schema:Thing expands to http://schema.org/Thing

The table schema and indices are defined in rdflib_sqlalchemy/tables.py: https://github.com/RDFLib/rdflib-sqlalchemy/blob/develop/rdf...

You can execute SPARQL queries against SQL, but most native triplestores will have a better query plan and/or better performance.

Apache Rya, for example:

> indexes SPO, POS, and OSP.

mark_l_watson5y ago

Thanks for your comment. I use rdflib frequently but have never tried the SQLAlchemy back end. Now I will. That said, Jena or Fuseki, or the commercial RDF stores like GraphDB, Stardog, and Allegrograph are so much more efficient.

roland_nilsson5y ago

Isn't the whole point of graph databases that they can traverse graph edges efficiently by following pointers to nodes, which relational databases can't do? Then it seems a bit strange to implement a graph database on top of a relational database like SQLite?

tpoacher5y ago

Tongue in cheek answer, but: No. That is the whole point of "efficient" graph databases. The point of a "graph database" in the more general sense is simply to be a database that uses a graph paradigm.

This is a silly pedantic point to make, but it is not necessarily trivial. E.g. it may be the case that a particular use-case scneario does not require massive efficiency, and has a lot to gain from the simplicity of sqlite. In which case this kind of project is an amazing thing to exist.

And if there is a way to get a valid benchmark comparison against a more traditional "efficient" graph database, then informed decisions can be made.

As a personal anecdote, a friend and I based a graph-based project on neo4j and were very happy ... until it was time to deploy. We then realised the installations involved were highly complex, rarely supported on traditional webhosts, and costs involved for adopting 'formal' commercial solutions were highly prohibitive. Had we known about this project at the time we would have definitely used it instead (at least as a proof of concept; you can always switch to a more efficient database later if you really have to)

joshspankit5y ago

Just a quick side question: Why not deploy with Docker?

My latest API+multiple frontends application uses Neo4j as the only database and we deployed with Docker (compose) with great success. With the config in git we were able to do the traditional test-new-versions-on-a-branch-before-deploy and everything is solid.

1 more reply

brian_herman5y ago

Awesome! And you can write SQL queries on the data amazing SQLite is the best database.

j / k navigate · click thread line to collapse

42 comments

rklaehn5y ago

Interesting. SQLite is awesome.

I did something similar recently, a block store for a rust implementation of ipfs, which models a directed acyclic graph of content-addressed nodes.

https://github.com/actyx/ipfs-sqlite-block-store

I found that performance is pretty decent if you do almost everything inside SQLite using WITH RECURSIVE.

The documentation has some really great examples for WITH RECURSIVE. https://sqlite.org/lang_with.html

abetlen5y ago

hansvm5y ago

I'm pretty sure you could also maintain a temp table and use some kind of "insert...where...returning" construct to squeeze that into a recursive query.

At a moderate overhead you could also definitely return all seen nodes and a flag to identify them as such as part of your intermediate data at each recursive step.

The postgres query optimizer struggles with recursive queries even when well suited to the problem though. Are they actually efficient in sqlite even for trees?

2 more replies

rekwah5y ago

[0] - https://www.sqlite.org/cgi/src/artifact/636024302cde41b2bf0c...

[1] - https://charlesleifer.com/blog/querying-tree-structures-in-s...

abetlen5y ago

[0]: https://github.com/abetlen/sqlite3-bfsvtab-ext

kevas5y ago

Why not add that functionality directly to SQLite via stored procs*

https://www.amazon.com/Hierarchies-Smarties-Kaufmann-Managem...

*https://github.com/wolfch/sqlite-3.7.3.p1

pstuart5y ago

A more recent effort: https://cgsql.dev/

claytongulick5y ago

I just read through the docs of this, what an amazing project.

I'm considering doing a js template string implementation for node.. cql`...` type thing with an internal compilation cache.

kevas5y ago

Thanks

loxias5y ago

I really like this, OP. I'm member of the clan "why are you creating your own XDR, just use sqlite!!" and have oft jumped to that in technical discussions, so appreciate it.

ptrik5y ago

Similar project: https://github.com/CodyKochmann/graphdb

bjornsing5y ago

How does this perform compared to a “native” graph database like Neo4J?

batmansmk5y ago

It really depends on what you want to do with it.

szarnyasg5y ago

> Traditional relational OLTP databases such as Postgres are already faster than dedicated graph databases for certain graph related tasks

[1] http://wp.sigmod.org/?p=1497

chrisweekly5y ago

> "shortest past"

shortest path typo, right?

1 more reply

lsb5y ago

Neo4j has failed queries I have written, with "out of memory" errors. I have never, ever, ever gotten that from SQLite.

lolive5y ago

1 more reply

linkdd5y ago

Isn't it like asking "how does sqlite perform compared to databases like PostgreSQL" ?

SQLite is used a lot on edge (mobile apps, ...), sounds like this project provide a graph database for the very same use case (I probably won't run Neo4J on mobile).

afavour5y ago

It’s a bad analogy, but SQLite to Postgres is like AMD vs Intel x86 CPUs, whereas a graph database is ARM. Can it be emulated? Yes. Is there a far greater potential for slowdown? Yes.

joelschw5y ago

I think the big difference here is that when comparing SQLite you are at least using the same query language.

In the graph space you have Gremlin, Cryper, GQL and many other proprietary query engines (which also looks to be the the case here).

Without that accessibility this feels a bit like pickling a NetworkX object.

szundi5y ago

Then maybe no such questions are necessary.

tasogare5y ago

alexisread5y ago

https://github.com/schinckel/ulid-postgres/blob/master/ulid....

mosselman5y ago

Thanks for the info. Do you happen to have some stats to share?

lolive5y ago

JimmyRuska5y ago

I saw this lecture some time back on the topic of implementation and tradeoffs https://www.youtube.com/watch?v=Dxwo9DYWV_c

westurner5y ago

rdflib-sqlalchemy is a SQLAlchemy rdflib graph store backend: https://github.com/RDFLib/rdflib-sqlalchemy

It also persists namespace mappings so that e.g. schema:Thing expands to http://schema.org/Thing

The table schema and indices are defined in rdflib_sqlalchemy/tables.py: https://github.com/RDFLib/rdflib-sqlalchemy/blob/develop/rdf...

You can execute SPARQL queries against SQL, but most native triplestores will have a better query plan and/or better performance.

Apache Rya, for example:

> indexes SPO, POS, and OSP.

mark_l_watson5y ago

roland_nilsson5y ago

tpoacher5y ago

And if there is a way to get a valid benchmark comparison against a more traditional "efficient" graph database, then informed decisions can be made.

joshspankit5y ago

Just a quick side question: Why not deploy with Docker?

1 more reply

brian_herman5y ago

Awesome! And you can write SQL queries on the data amazing SQLite is the best database.

j / k navigate · click thread line to collapse