Who ranks better? Memgraph vs. NetworkX PageRank | Better HN

Who ranks better? Memgraph vs. NetworkX PageRank | Better HN

19 comments

rkwz3y ago

NetworkX is widely understood to be slower compared to other graph libraries - you tradeoff performance for ease of use.

How does it compare to igraph, graph-tool etc - https://graph-tool.skewed.de/performance

Wait, graph-tool is finally on Python 3? I loved graph-tool. Even did a conference talk on it once. But for years it was Python 2.7 only, so much that I finally gave up on ever using it again.

For me, NetworkX is good for exploration, but shouldn't be used for anything serious in production. Once you sorta figure out what you want to do there are so many other things you should use before NetworkX that may take longer to get the data in, and may be more annoying to query, but are ultimately more suited to the task.

It's hard to even compare these different tools directly because they solve different problems. For example, if data integrity was paramount I'd just use Postgres and recursive querying / intelligent partitioning. If speed were important, Memgraph or sparse matrices in Numpy. If the data were large documents with tag like relationships, some sort of document store (used Riak last time around, but that was over ten years ago).

shoo3y ago

similarly, see also the benchmarks in https://www.timlrx.com/blog/benchmark-of-popular-graph-netwo...

those benchmarks measure the julia lightgraphs library computing pagerank 429x to 26x faster than networkx, on larger graphs than the one featured in this advertisement.

katelatteOP3y ago

Yeah, but this is not an official benchmark, it’s just a simple demo on sample dataset. It would take much more effort to create the whole benchmark to prove how much exactly Memgraph is faster and on what kind of workload.

katelatteOP3y ago

But again, thanks for sharing. This is also a valuable resource and reference for future comparisons.

katelatteOP3y ago

Yes, I agree. I read this comparison in performance, it's a really good resource, thanks for sharing. It all depends on what are your needs of course. But C++ implementation of graph-tool definitely wins the battle.

This.

I made a mistake in using networkx in my research extensively. Eventually, I discovered performance issues and learned that networkx is much slower than alternatives (as much as an order of magnitude slower!).

zwaps3y ago

It seems you are also counting the filling of nx digraph from memgraph. You use a list op to add nodes, then python iterate over a list of nodes and use the nx list operation to add all edges.

And then you dont even use the numpy or scipy implementation of networkx. If inmemory, benchmark networkx by loading a sp matrix and use the numpy pagerank and only time that?

Like, it’s not as if anyone uses networkx for performance, so dunking on it for that is probably not as good of a marketing post as it seems.

But then also check your implementation bc this surely is the slowest way to use networkx and then having only five times speed up seems little. Doesn’t igraph or julia beat properly implemented networkx by much more?

Look, if it weren’t an advertisement I would not say anything, but it seems you compare your new performance car with the networkx family van, which is also maybe filled with concrete.

katelatteOP3y ago

I considered different ways of comparison here, and decided to go with a simple comparison on sample dataset, just to get a feel of it. I did consider doing it all in Python, but then it’s not fair towards Memgraph. Also, it depends on the query we are performing. I could have run a much more complicated query which would give better results, but then again it wouldn’t be fair. If I removed the time counted for filling the digraph, then just the pure algorithm time would be calculated, and the main difference between NetworkX and Memgraph is that Memgraph offers persistance, while NetworkX always has to load the graph into memory. It can be further discussed what would be the best way to do a true benchmark and on what kind of dataset. I did not go into details of the graph type here, but there are for sure cases where Memgraph outperforms NetworkX on much higher scale and on certain graph types. I didn’t claim that we are 5 times faster in any case, just in this certain case. When I do a proper benchmark in the future, I will make sure to be as fair as possible to both sides, and of course to showcase better when to use Memgraph, and when NetworkX, since it all depends on your needs.

Also, thanks for reading it, it means a lot to hear such comment. I get to learn from it too :)

zwaps3y ago

Firstly, congratulations for working on a very interesting project and thanks for the insights.

Indeed, there are some points in your reply which I think would fit better to a networkx vs. memgraph comparison post.

That being said, your blogpost is titled "Who ranks better?" and it is mainly about the speed of running a PageRank.

Networkx is a no frills Python package that is much easier to use and experiment on. Outperforming networkx in speed is not really a feat, however any new network package should certainly do this. And further, do this with networkx on equal footing. For instance, igraph outperforms networkX in pageRank by 20 times, and graph-tool by over 50 times (without load times)!

And I am sure memgraph can do so as well, just that this blog post doesn't seem to conclusively demonstrate that fact.

It would make little sense to me to use networkx as a tool to load data from memgraph. And to be honest, using this triple (quadruple) Python list operation and not even use the numpy-based performance that networkx does offer (little as it may be) just doesn't seem right.

I understand that memgraph has other advantages, like persistence, however in that case networkX is simply not a good comparison. If that's the focus, why not query a local Neo4j? That's gonna be a pretty speedy PageRank as well and an interesting challenge.

All in all, I am sure Memgraph performs great, and I am looking forward to other comparisons in the future!

katelatteOP3y ago

Who ranks better was my word play, because of PageRank :') But yes, I totally agree with what you wrote and I will aim for more detailed comparisons in the next articles. And you are right, NetworkX is easy to use Python library, and I wanted to show that Memgraph is also easy to use and Python friendly, as well as fast. It also has a bunch of popular graph algorithms already implemented, so if anyone is working with graphs and NetworkX, the performance gain of Memgraph introduced in this article may be useful to them. Regarding Neo4j, wait for it ;)

nerdfaktor423y ago

This is a very interesting benchmark! I'd like to add a library to the comparison, which has recently been released.

It's called graph-mate (https://pypi.org/project/graph-mate/) and it's a wrapper around a Rust implementation for parallel graph algorithms (https://crates.io/crates/graph) that a friend and I work on.

We created a Jupyter notebook that loads the same Wikipedia articles and runs page rank with the same parameters: https://github.com/s1ck/graph/blob/d0a45116b1891daa39f493647... We converted the memgraph dataset to an edge list: https://gist.github.com/s1ck/97e23af14b2e117fa47c713addef751.... To reproduce, put the dataset next to the notebook.

On my machine (Ryzen 3950x, 64GB RAM) ..

  memgraph takes 1285 ms
  graph_mate takes 3.347 ms

I have not looked into the C++ implementation of page rank in memgraph. The Rust implementation is a pull-based Page Rank which is not uncommon in parallel graph libraries. See https://github.com/s1ck/graph/blob/036cda61e1bbf2d9ccb2b0e46... for details.

graph-mate and the Rust graph project are pretty young, we don't support that many algorithms, yet. But if people are interested in contributing, feel free to open a PR / issue :)

katelatteOP3y ago

Well done on your work! It's nice to see new tools being developed in graph world, especially in Rust.

I would just like to emphasize that there is a big difference between a graph database and graph algorithms library. This is a thing that has to be taken into account. Besides that, real life use cases usually include dynamic data. That's the reason why Memgraph holds a set of dynamic graph algorithms. For example, we implemented dynamic PageRank algorithm [1] which is the approximation of PageRank carrying the same information as the results of PageRank - the likelihood of random walk ending in a particular vertex. In use cases such as credit card fraud detection, dynamic graph algorithms are of a huge importance to make important decisions as fast as possible. Besides that, we have implemented a set of modules built on top of NVIDIA cuGraph [2] which provides a set of wrappers for most of the algorithms from the cuGraph repository. With GPU-powered graph analytics from Memgraph you can explore huge graphs databases and make decisions without long waits for the results. [3]

[1] https://memgraph.com/docs/mage/query-modules/cpp/pagerank-on... [2] https://memgraph.com/docs/mage/query-modules/cuda/cugraph [3] https://developer.nvidia.com/blog/running-large-scale-graph-...

katelatteOP3y ago

I continued learning about NetworkX, and when it comes to issues with scaling and the need for persistence when working on applications in production, Memgraph saves the day. You can see the previous discussion at https://news.ycombinator.com/item?id=33463472.

mikkom3y ago

Are you working for memgraph perhaps? It certainly seems so based on your posts.

https://news.ycombinator.com/submitted?id=katelatte

Interesting how you got so many upvotes for this content that basically seems a lot like advertisement for memgraphs algorithm.

katelatteOP3y ago

Yes, I work for Memgraph, I am a developer there and I wrote this, and all of the previously published articles. I was comparing NetworkX to Memgraph algorithms, since that was the point of the whole article. I am mostly using Python in my day-to-day job and I love what they did with NetworkX. This article was influenced by many people who use NetworkX and are a part of Memgraph community. I just wanted to see how much of a difference does the underlying C++ implementation of Memgraph makes. Since I work with Python tools and Memgraph every day, and talk with a bunch of people working on graph analytics, it makes sense to compare by myself and get the facts right.

rkwz3y ago

Thanks for the context.

Offtopic, does Memgraph have something similar to NetworkX's connected components [1]? Wondering what's the performance difference between both for different sizes of graphs.

[1] https://networkx.org/documentation/stable/reference/algorith...

j / k navigate · click thread line to collapse