Graph Databases Intrigue Me (opens in new tab)

(blog.jlank.com)

52 pointsjlank15y ago21 comments

21 comments

Graph databases intrigue me too.

The thing is, it took me a little while to realize that the "the semantic web" is a very specific model where providers are be expected to explicitly provide the semantic decoration/meta-data for all their content. http://en.wikipedia.org/wiki/Semantic_Web

I basically don't believe that this particular approach will ever work (ie, the flood-gates won't open and content providers won't suddenly label all their data). I mean, this approach has been the failed-model of hypertext since ... project Xanadu, mid-sixties (a well-tended, fully meaningful store of data).

Instead, Google and other search engines and tools will just get smarter.

We'll find more ways to incidentally get semantic information from the raw data that's out there. But no will have enough incentive to manually provide that much deep-meaning for their data themselves (and anything whose semantic meaning can be automatically processed can be put on the web for someone else to automatically process). The semantic web approach is always going to be behind the curve compared to just putting raw, unstructured data out-there.

The more uses we find for information, the more ability we'll get to extract meaning from it without the data starting out labeled.

Look at what Watson could do.

-- And I am working on a tool that extract implicit information from the process of people interacting with data. Extracting implicit, inferred and deduced relations has much more promise even if it can't rely on explicit semantic labels. This is more or less what Google does also (it's true that so-far, Google's stuff is considered "semantically meaningless" and I know Google bought Metaweb. We'll see what they get from it...)

Homunculiheaded15y ago

I've tried many times over the years to get excited about the semantic web, but invariably always arrive at the same conclusion as you. I also think that semweb really suffers from over engineering a solution in search of a problem. If you have a problem that it seems like the semantic technologies could be helpful for, you can probably solve it faster using logic programming and automated reasoning techniques from the 90s (ask the fast majority of semantic web enthusiasts how RDF triples can be represented as prolog clauses and you'll likely be treated as though you asked how a raven is like a writing desk). Worse yet though are tools like OWL, which is over-engineered to the point where out-of-the-box it describes logic in a way that is computationally intractable. If you where designing semantic technologies to solve a real problem you would never arrive at something like OWL.

I actually think something like semantic technologies could be more useful than the tools that drive watson and google for smaller data sets, where machine/statistical learning are less useful, but even then they are an over engineered solution.

colanderman15y ago

I'm having trouble understanding how exactly graph databases differ from relational databases, especially seeing as how graphs are isomorphic to relations.

All the examples on this page look equivalent to how I'd model them in SQL: http://wiki.neo4j.org/content/Domain_Modeling_Gallery

The best I gather is that graph databases are schemaless (big whoop, more room for error), and that their implementations tend to perform well with transitive closures. I'm not seeing anything that can't be solved with materialized views in an RDBMS.

I'm tempted to think that just represents NoSQL folks coming to realize that relations are actually a good thing. Soon they'll be talking about how wonderful it is to take the Cartesian product of two graphs.

Someone care to enlighten me?

yummyfajitas15y ago

Suppose you have a graph with edge weights.

Query: find all the nodes a distance < D from some reference node.

With SQL92 or earlier, this will require an unbounded number of queries. Proof: A SQL92 query can only traverse a finite number of edges, since it has only a finite number of joins (say K). Consider the graph 1->2->3->...->N with edge weights 1/N. This will require N/K queries to traverse. N is arbitrary.

It's probably doable in SQL99 and various vendor specific extensions (I think it's Turing Complete), but it is unlikely to be fast.

So, as is the normal situation, you can do whatever you want in SQL. But the performance is likely to suck.

(That said, I agree with you that many people are using NoSQL because it's the hip new thing, when they should just use SQL. "Schemaless flexibility" is BS - SQL is very flexible if you use an ORM + migration tool. The only flexibility you lose is the flexibility to screw up the data in application logic.)

colanderman15y ago

Trivial with SQL-standard recursive queries:

WITH RECURSIVE r AS (SELECT parent, child, distance FROM graph UNION ALL SELECT r.parent, graph.child, r.distance + graph.distance AS distance FROM graph, r WHERE graph.parent = r.child) SELECT parent, child FROM r WHERE distance < 3.14159;

Testing your graph with N=1000 on a teeny VPS. 477,897 rows returned in 6.1s.

If we move the WHERE clause inside the sub-SELECT (which unfortunately makes the query slightly more awkward to use), selecting lower distances reduces the compute time. For example, a bound of 0.5 returns 196,238 rows in 2.5s.

What sort of performance gains does a graph database give for this same query?

1 more reply

aristidb15y ago

If you have stored a tree with adjacence (parent-child) relations, how would you retrieve all nested children?

Or would you propose a different storage mechanism that keeps the property that reparenting entire subtrees is cheap?

(All this in your RDBMS of choice.)

colanderman15y ago

You mean all descendants of a node / transitive closure?

WITH RECURSIVE r AS (SELECT parent, child FROM tree UNION SELECT r.parent, tree.child FROM tree, r WHERE tree.parent = r.child) SELECT parent, child FROM r;

transforms this: (A,B), (B,C), (B,D)

into this: (A,B), (A,C), (A,D), (B,C), (B,D)

1 more reply

Tichy15y ago

One thing I wonder is how far do graph databases actually take you? I take the "thing" of a graph database to be to give me all links with a single access (that is all links of a node are in a single data set). On the other hand, in a classical SQL setup links would probably be modelled in a table with one row per link.

But still, even if I can get all links of a node with one access, it seems to me memory will be too scarce in most cases. Let's say I want to consider all nodes that are three steps away, and every node has 100 outgoing links. That's 100^3 nodes to consider already. Or in the second step, 100^2 nodes, so we would have to access the db 10000 times to get all nodes of the third step.

What I am getting at: it seems to me for really interesting graph computations it will usually still be necessary to create some specialized/compressed model to fit all the data into main memory.

Prasannav15y ago

I hope graph databases intrigues me sometime in the near future . I have been playing around with freebase api for sometime and i must admit that freebase does a fairly decent job in recognizing some of the entities that they promise to identify . Location is one such entity . They use a mix of NLP and a DB of locations to identify location entity . They ofcourse have a semantic relationship (city -> country -> continent for eg) between these entities that can offer u more insight . However , things like "context" of a free flowing text need to mature . They provide something called as the Social Tag for any text that u paste but sometimes it is too generic and sometimes it is far from the right context and many a times there is no social tags . So we had to kind of move away from relying on Freebase and figure our own ways . I agree to Joe's comments that extracting semantic info has to get more smarter , and relying on webmasters to provide this data is certainly not going to be scalable and achievable in the near future . google's acquisition of freebase did come as a surprise to me (considering the current capability of FB) , but their promise of providing a weekly dump meant there was a good news (not for long as we did not continue relying on FB anyways)

Looking at Watson could do , made me wonder why is the technology world so lagging behind w.r.t interpreting information .If NLP is a solved problem (looks like in Watson's case) are we only pending creating a linkage between the real world entities???. Freebase has linked 20 million but thats not enough . The approach is non profit , "good for the world" kinds . Can there be an incentive for people to provide links between entities . Can we bring up a profit model where people/organizations compete to provide more and better linkages . Or can we extract such links from peoples web activity (search , social networking etc [FB,Twitter]) .. i am getting too many ideas now :D:D ... and ofcourse can we request IBM to donate their NLP technology for the greater good :P :P

aristidb15y ago

Graph databases are probably also nice for a lot of "non-semantic" stuff.

For example, I recently wondered about the best representation for deep hierarchical data, and graph databases appear to be a good fit.

olihb15y ago

I'm curious about the performance and scalability of those database. Right now at work we have a MSSQL server with 32 CPU, SSDs and 128GB RAM and I was wondering if a graph database would be better.

Our dataset is very peculiar, we have about 20M nodes and 500M edges in one of our smallest database. We also need fulltext search and complex joins between tables.

I'm wondering if graph databases are mature enough or if they are not quite ready yet for production work. We're a small shop and software development is not our core competency.

thesz15y ago

http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-lar...

Most graph databases use BerkeleyDB under the hood. They are pretty safe to work with.

PaulHoule15y ago

Graph databases are fascinating, but as an engineer who cares about query speed, indexing, and how things interact with memory hierarchies, I'm glad to leave them to the computer "scientists", who can go ahead filling volume after volume of conference proceedings with minimal impact on the practice of computing.

colanderman15y ago

This is a pretty ignorant comment. Try reading those conference proceedings sometime and you'll realize who's responsible for your fast query speed.

j / k navigate · click thread line to collapse

21 comments

joe_the_user15y ago

Graph databases intrigue me too.

Instead, Google and other search engines and tools will just get smarter.

The more uses we find for information, the more ability we'll get to extract meaning from it without the data starting out labeled.

Look at what Watson could do.

Homunculiheaded15y ago

colanderman15y ago

I'm having trouble understanding how exactly graph databases differ from relational databases, especially seeing as how graphs are isomorphic to relations.

All the examples on this page look equivalent to how I'd model them in SQL: http://wiki.neo4j.org/content/Domain_Modeling_Gallery

Someone care to enlighten me?

yummyfajitas15y ago

Suppose you have a graph with edge weights.

Query: find all the nodes a distance < D from some reference node.

It's probably doable in SQL99 and various vendor specific extensions (I think it's Turing Complete), but it is unlikely to be fast.

So, as is the normal situation, you can do whatever you want in SQL. But the performance is likely to suck.

colanderman15y ago

Trivial with SQL-standard recursive queries:

Testing your graph with N=1000 on a teeny VPS. 477,897 rows returned in 6.1s.

What sort of performance gains does a graph database give for this same query?

1 more reply

aristidb15y ago

If you have stored a tree with adjacence (parent-child) relations, how would you retrieve all nested children?

Or would you propose a different storage mechanism that keeps the property that reparenting entire subtrees is cheap?

(All this in your RDBMS of choice.)

colanderman15y ago

You mean all descendants of a node / transitive closure?

WITH RECURSIVE r AS (SELECT parent, child FROM tree UNION SELECT r.parent, tree.child FROM tree, r WHERE tree.parent = r.child) SELECT parent, child FROM r;

transforms this: (A,B), (B,C), (B,D)

into this: (A,B), (A,C), (A,D), (B,C), (B,D)

1 more reply

Tichy15y ago

What I am getting at: it seems to me for really interesting graph computations it will usually still be necessary to create some specialized/compressed model to fit all the data into main memory.

Prasannav15y ago

aristidb15y ago

Graph databases are probably also nice for a lot of "non-semantic" stuff.

For example, I recently wondered about the best representation for deep hierarchical data, and graph databases appear to be a good fit.

olihb15y ago

I'm curious about the performance and scalability of those database. Right now at work we have a MSSQL server with 32 CPU, SSDs and 128GB RAM and I was wondering if a graph database would be better.

Our dataset is very peculiar, we have about 20M nodes and 500M edges in one of our smallest database. We also need fulltext search and complex joins between tables.

I'm wondering if graph databases are mature enough or if they are not quite ready yet for production work. We're a small shop and software development is not our core competency.

thesz15y ago

http://markorodriguez.com/2011/02/18/mysql-vs-neo4j-on-a-lar...

Most graph databases use BerkeleyDB under the hood. They are pretty safe to work with.

PaulHoule15y ago

colanderman15y ago

This is a pretty ignorant comment. Try reading those conference proceedings sometime and you'll realize who's responsible for your fast query speed.

j / k navigate · click thread line to collapse