How does it compare to igraph, graph-tool etc - https://graph-tool.skewed.de/performance
For me, NetworkX is good for exploration, but shouldn't be used for anything serious in production. Once you sorta figure out what you want to do there are so many other things you should use before NetworkX that may take longer to get the data in, and may be more annoying to query, but are ultimately more suited to the task.
It's hard to even compare these different tools directly because they solve different problems. For example, if data integrity was paramount I'd just use Postgres and recursive querying / intelligent partitioning. If speed were important, Memgraph or sparse matrices in Numpy. If the data were large documents with tag like relationships, some sort of document store (used Riak last time around, but that was over ten years ago).
those benchmarks measure the julia lightgraphs library computing pagerank 429x to 26x faster than networkx, on larger graphs than the one featured in this advertisement.
I made a mistake in using networkx in my research extensively. Eventually, I discovered performance issues and learned that networkx is much slower than alternatives (as much as an order of magnitude slower!).
And then you dont even use the numpy or scipy implementation of networkx. If inmemory, benchmark networkx by loading a sp matrix and use the numpy pagerank and only time that?
Like, it’s not as if anyone uses networkx for performance, so dunking on it for that is probably not as good of a marketing post as it seems.
But then also check your implementation bc this surely is the slowest way to use networkx and then having only five times speed up seems little. Doesn’t igraph or julia beat properly implemented networkx by much more?
Look, if it weren’t an advertisement I would not say anything, but it seems you compare your new performance car with the networkx family van, which is also maybe filled with concrete.
Also, thanks for reading it, it means a lot to hear such comment. I get to learn from it too :)
Indeed, there are some points in your reply which I think would fit better to a networkx vs. memgraph comparison post.
That being said, your blogpost is titled "Who ranks better?" and it is mainly about the speed of running a PageRank.
Networkx is a no frills Python package that is much easier to use and experiment on. Outperforming networkx in speed is not really a feat, however any new network package should certainly do this. And further, do this with networkx on equal footing. For instance, igraph outperforms networkX in pageRank by 20 times, and graph-tool by over 50 times (without load times)!
And I am sure memgraph can do so as well, just that this blog post doesn't seem to conclusively demonstrate that fact.
It would make little sense to me to use networkx as a tool to load data from memgraph. And to be honest, using this triple (quadruple) Python list operation and not even use the numpy-based performance that networkx does offer (little as it may be) just doesn't seem right.
I understand that memgraph has other advantages, like persistence, however in that case networkX is simply not a good comparison. If that's the focus, why not query a local Neo4j? That's gonna be a pretty speedy PageRank as well and an interesting challenge.
All in all, I am sure Memgraph performs great, and I am looking forward to other comparisons in the future!
It's called graph-mate (https://pypi.org/project/graph-mate/) and it's a wrapper around a Rust implementation for parallel graph algorithms (https://crates.io/crates/graph) that a friend and I work on.
We created a Jupyter notebook that loads the same Wikipedia articles and runs page rank with the same parameters: https://github.com/s1ck/graph/blob/d0a45116b1891daa39f493647... We converted the memgraph dataset to an edge list: https://gist.github.com/s1ck/97e23af14b2e117fa47c713addef751.... To reproduce, put the dataset next to the notebook.
On my machine (Ryzen 3950x, 64GB RAM) ..
memgraph takes 1285 ms
graph_mate takes 3.347 ms
I have not looked into the C++ implementation of page rank in memgraph. The Rust implementation is a pull-based Page Rank which is not uncommon in parallel graph libraries. See https://github.com/s1ck/graph/blob/036cda61e1bbf2d9ccb2b0e46... for details.graph-mate and the Rust graph project are pretty young, we don't support that many algorithms, yet. But if people are interested in contributing, feel free to open a PR / issue :)
I would just like to emphasize that there is a big difference between a graph database and graph algorithms library. This is a thing that has to be taken into account. Besides that, real life use cases usually include dynamic data. That's the reason why Memgraph holds a set of dynamic graph algorithms. For example, we implemented dynamic PageRank algorithm [1] which is the approximation of PageRank carrying the same information as the results of PageRank - the likelihood of random walk ending in a particular vertex. In use cases such as credit card fraud detection, dynamic graph algorithms are of a huge importance to make important decisions as fast as possible. Besides that, we have implemented a set of modules built on top of NVIDIA cuGraph [2] which provides a set of wrappers for most of the algorithms from the cuGraph repository. With GPU-powered graph analytics from Memgraph you can explore huge graphs databases and make decisions without long waits for the results. [3]
[1] https://memgraph.com/docs/mage/query-modules/cpp/pagerank-on... [2] https://memgraph.com/docs/mage/query-modules/cuda/cugraph [3] https://developer.nvidia.com/blog/running-large-scale-graph-...
https://news.ycombinator.com/submitted?id=katelatte
Interesting how you got so many upvotes for this content that basically seems a lot like advertisement for memgraphs algorithm.