
lolive · 5y ago · 0 points · 0 comments

Performance is certainly a valid topic of discussion. But to me, the availability of a graph-oriented query language on top of this graph variant of SQLite is, imho, the very first thing to investigate (with RDF/CSV import being next).

szarnyasg · 5y ago

There has been a lot of progress on creating standardized query languages for graphs. The two most notable ones are [1]:

- SQL/PGQ, a property graph query extension to SQL, is planned to be released next year as part of SQL:2021.

- GQL, a standalone graph query language will follow later.

While it is a lot of work to design these languages, both graph database vendors (e.g. Neo4j, TigerGraph) and traditional RDBMS companies (e.g. Oracle [2], PostgreSQL/2ndQuadrant [3]) seem serious about them. And with a well-defined query language, it should be possible to build a SQL/PGQ engine in (or on top of) SQLite as well.
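To make "a SQL/PGQ engine on top of SQLite" a bit more concrete: one plausible strategy (an assumption on my part, not something the SQL/PGQ drafts prescribe) is to keep the graph in a node table and an edge table and compile each graph pattern into joins. The schema, table names and data below are hypothetical, and the pattern is translated by hand rather than parsed.

```python
import sqlite3

# Hypothetical property-graph layout on SQLite: one table for nodes, one for edges.
SCHEMA = """
CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT NOT NULL, name TEXT);
CREATE TABLE edges (src INTEGER NOT NULL REFERENCES nodes(id),
                    dst INTEGER NOT NULL REFERENCES nodes(id),
                    label TEXT NOT NULL);
"""

def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
        (1, "Person", "Alice"),
        (2, "Person", "Bob"),
        (3, "Person", "Carol"),
        (4, "City", "Budapest"),
    ])
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", [
        (1, 2, "KNOWS"),
        (2, 3, "KNOWS"),
        (3, 4, "LIVES_IN"),
    ])
    return conn

# Hand translation of the pattern (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person):
# each node variable becomes an alias of `nodes`, each edge variable an alias of `edges`.
FRIEND_OF_FRIEND_SQL = """
SELECT a.name, c.name
FROM nodes a
JOIN edges e1 ON e1.src = a.id AND e1.label = 'KNOWS'
JOIN nodes b  ON b.id = e1.dst AND b.label = 'Person'
JOIN edges e2 ON e2.src = b.id AND e2.label = 'KNOWS'
JOIN nodes c  ON c.id = e2.dst AND c.label = 'Person'
WHERE a.label = 'Person'
"""

def friends_of_friends(conn: sqlite3.Connection):
    return conn.execute(FRIEND_OF_FRIEND_SQL).fetchall()

print(friends_of_friends(build_demo_db()))  # [('Alice', 'Carol')]
```

A real engine would generate this SQL from the pattern and lean on SQLite's query planner for join ordering; the point here is only that fixed-length patterns map naturally onto relational joins.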

[1] https://www.linkedin.com/pulse/sql-now-gql-alastair-green/

[2] http://wiki.ldbcouncil.org/pages/viewpage.action?pageId=1062...

[3] https://www.linkedin.com/pulse/postgresql-oracle-graph-query...

beaconstudios · 5y ago

Have SPARQL and Gremlin not seen adoption as standard graph traversal languages? They're the two names that spring to mind when I think "graph querying".

lolive (OP) · 5y ago

I second that. I have not followed the news about the Gremlin-to-SPARQL (or SPARQL-to-Cypher) bridges. But afaiu, making your graph system Gremlin-compatible is a first step in the right direction. (And yes, doing that on top of a SQL backend does not sound that natural.)

szarnyasg · 5y ago

Both SPARQL and Gremlin have been adopted to some extent. SPARQL is a W3C standard and Gremlin is reasonably well-specified (it has good documentation and a reference implementation), so it's possible to implement a functionally correct SPARQL/Gremlin engine with a reasonable development effort.

Gremlin's main focus is defining traversal operations on property graphs. While it supports pattern matching [1], IMHO its syntax is not as clean as Cypher's. Gremlin queries are also difficult to optimize: while it is possible to define traversal rewrite rules, they are more involved than relational optimization rules. The fact that most open-source Gremlin implementations focus on distributed setups (e.g. a typical deployment of Titan/JanusGraph runs on top of Cassandra) also has implications for single-machine performance, which certainly did not help the adoption of Gremlin, but this is not necessarily a problem of the query language itself. Overall, Gremlin is great for workloads where complex single-source traversal operations do the bulk of the work, but it is less well suited to global pattern matching queries such as the ones in the LDBC Social Network Benchmark's BI workload [2].
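The difference between a single-source traversal and a global pattern match can be sketched in plain Python. This is not the Gremlin/TinkerPop API: the `Graph` class, its methods and the data are hypothetical, and the `traverse` method only loosely mimics a chain of Gremlin `out()` steps.

```python
class Graph:
    """Tiny edge-labelled graph held in adjacency lists (hypothetical, illustration only)."""

    def __init__(self, edges):
        self._out = {}
        for src, label, dst in edges:
            self._out.setdefault((src, label), []).append(dst)
        self.vertices = sorted({s for s, _, _ in edges} | {d for _, _, d in edges})

    def neighbours(self, vertex, label):
        return self._out.get((vertex, label), [])

    def traverse(self, start, *labels):
        """Single-source traversal, loosely like g.V(start).out(l1).out(l2)...dedup():
        each step maps the current frontier to its out-neighbours along one label."""
        frontier = [start]
        for label in labels:
            frontier = [dst for v in frontier for dst in self.neighbours(v, label)]
        return sorted(set(frontier))


def knows_triangles(graph):
    """Global pattern match: all directed cycles a->b->c->a over 'knows' edges.
    There is no natural start vertex, so every vertex is tried as `a`;
    requiring a < b and a < c reports each cycle once instead of once per rotation."""
    found = []
    for a in graph.vertices:
        for b in graph.neighbours(a, "knows"):
            for c in graph.neighbours(b, "knows"):
                if a < b and a < c and a in graph.neighbours(c, "knows"):
                    found.append((a, b, c))
    return found


EDGES = [(1, "knows", 2), (2, "knows", 3), (3, "knows", 1), (2, "knows", 4)]
g = Graph(EDGES)
print(g.traverse(1, "knows", "knows"))  # [3, 4]
print(knows_triangles(g))               # [(1, 2, 3)]
```

The traversal only touches the neighbourhood of vertex 1, whereas the triangle query has to be started from every vertex; a relational engine can instead plan the triangle query as a multi-way join over the whole edge set.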

SPARQL focuses on the graph problems of the "semantic web" domain, which include not only pattern matching but also semantic reasoning/inferencing. One can use it for pattern matching queries, but with the following caveats:

- Its data model is based on triples, so if one wants to return a node and its attributes (properties), one has to specify each of these attributes explicitly.

- On the execution side, returning these attributes might necessitate executing a number of self-join operations.

- Many SPARQL implementations also have performance limitations due to the extra complexity introduced by self-joins, lack of intra-query parallelism, etc.
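To illustrate the self-join caveat: if triples are stored in a single three-column table (a common but by no means universal layout; the table, prefixes and data below are made up), every attribute requested in a SPARQL query becomes another triple pattern and hence another self-join of that table.

```python
import sqlite3

def build_triple_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
        ("ex:alice", "rdf:type", "ex:Person"),
        ("ex:alice", "ex:name", "Alice"),
        ("ex:alice", "ex:age", "42"),
        ("ex:bob", "rdf:type", "ex:Person"),
        ("ex:bob", "ex:name", "Bob"),
        ("ex:bob", "ex:age", "35"),
    ])
    return conn

# Roughly what the SPARQL query
#   SELECT ?p ?name ?age WHERE { ?p a ex:Person ; ex:name ?name ; ex:age ?age }
# becomes over one triples table: one alias per triple pattern, joined on the subject.
PERSONS_SQL = """
SELECT t1.s, t2.o AS name, t3.o AS age
FROM triples t1
JOIN triples t2 ON t2.s = t1.s AND t2.p = 'ex:name'
JOIN triples t3 ON t3.s = t1.s AND t3.p = 'ex:age'
WHERE t1.p = 'rdf:type' AND t1.o = 'ex:Person'
ORDER BY t1.s
"""

def persons(conn: sqlite3.Connection):
    return conn.execute(PERSONS_SQL).fetchall()

print(persons(build_triple_store()))
```

Asking for a third attribute would add a fourth alias of `triples`, whereas a property graph store can return the whole property map of a node in one lookup.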

The "RDF* and SPARQL* approach" is an initiative to address the self-join problem by introducing nested triples into the data model. It is currently being worked on by a W3C community group [3]. SPARQL also has "property paths", which allow regular path queries, i.e. traversals where the labels along the path conform to some regular expression (the "property" in "property paths" has nothing to do with "property graphs").
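Regular path queries are commonly evaluated by exploring the product of the graph with an automaton for the path expression. The sketch below assumes a hand-built NFA for `knows+/worksAt` (roughly the SPARQL property path `ex:knows+/ex:worksAt`); the graph, labels and the `rpq` function are illustrative and not taken from any SPARQL engine.

```python
from collections import deque

# Edge-labelled graph as (source, label, target) triples; data is made up.
EDGES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("alice", "worksAt", "initech"),
]

# Hand-built NFA for knows+/worksAt:
# state 0 --knows--> 1, state 1 --knows--> 1, state 1 --worksAt--> 2; state 2 accepts.
NFA = {
    (0, "knows"): {1},
    (1, "knows"): {1},
    (1, "worksAt"): {2},
}
START, ACCEPT = 0, {2}

def rpq(edges, nfa, start_state, accept_states, source):
    """Return the set of nodes reachable from `source` along a path whose label
    sequence the NFA accepts, via BFS over (node, automaton state) pairs."""
    out = {}
    for s, label, t in edges:
        out.setdefault(s, []).append((label, t))
    seen = {(source, start_state)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accept_states:
            answers.add(node)
        for label, target in out.get(node, []):
            for next_state in nfa.get((state, label), ()):
                pair = (target, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers

print(rpq(EDGES, NFA, START, ACCEPT, "alice"))  # {'acme'}
```

`initech` is not returned because `knows+` requires at least one `knows` edge before `worksAt`. Tracking visited (node, state) pairs keeps the search finite even on cyclic graphs.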

SQL/PGQ and GQL target the property graph data model and support an ASCII-art-like syntax for pattern matching queries (inspired by Cypher). They also offer some graph traversal operations, including shortest path and regular path queries. Additionally, GQL supports returning graphs, so its queries can be composed.
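The exact syntax and semantics of shortest-path selectors in SQL/PGQ and GQL are defined by the standards themselves; as a rough intuition only, the unweighted case boils down to a breadth-first search restricted to edges with a given label. The function name and data below are hypothetical.

```python
from collections import deque

def shortest_path(edges, label, source, target):
    """Return one shortest path (list of node ids) from source to target that uses only
    edges with the given label, or None if target is unreachable. Unweighted BFS."""
    adj = {}
    for s, l, t in edges:
        if l == label:
            adj.setdefault(s, []).append(t)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

EDGES = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("alice", "KNOWS", "dave"),
    ("dave", "KNOWS", "erin"),
    ("erin", "KNOWS", "carol"),
    ("carol", "LIVES_IN", "budapest"),
]
print(shortest_path(EDGES, "KNOWS", "alice", "carol"))  # ['alice', 'bob', 'carol']
```

This returns a single shortest path; the standards also describe variants such as returning all shortest paths, which would require keeping every parent at the minimal depth rather than just the first one found.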

[1] https://en.wikipedia.org/wiki/Gremlin_(query_language)#Decla...

[2] https://ldbc.github.io/ldbc_snb_docs/workload-bi-reads.pdf

[3] https://blog.liu.se/olafhartig/2019/01/10/position-statement...
