I would also add that the primary selling point for Memgraph seems to be "fast enough that it can process data as it comes in via a stream, and present it to the user in a reasonable timeframe". Anyone facing this use case would want to use Memgraph regardless of how much faster it is than Neo4j.
* Memgraph's benchmarks only show SQL-style WHERE-clause filters, not graph queries
* (nor streaming ones)
* The existing Memgraph numbers are questionable, and if the competitor had tuned their setup, who knows what the results would look like
* The Memgraph team refuses to use community-defined graph benchmarks for these articles, so we won't know
* Memgraph uses weird patterns, like doing bulk loads as a query stream of atomic singleton creations instead of batching (CSV, Arrow, ...). So even if the benchmark were graph- or streaming-oriented, a proper one would show other tools going way faster, because the relevant task would instead go through CSV/Arrow/etc. bulk loaders or some other form of micro/macro batching
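To make the batching point concrete, here is a minimal sketch (plain Python, no real database; `run_query` is a hypothetical stand-in for a driver call) contrasting one-query-per-row inserts against batched, parameterized inserts:

```python
def singleton_load(rows, run_query):
    """One query (round trip + transaction) per row -- the pattern criticized above."""
    for row in rows:
        run_query("CREATE (:Node {id: $id})", {"id": row})

def batched_load(rows, run_query, batch_size=1000):
    """One query per batch: UNWIND a parameter list so the server creates rows in bulk."""
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        run_query("UNWIND $rows AS r CREATE (:Node {id: r})", {"rows": batch})

# Count queries issued instead of talking to a real database.
calls = []
rows = list(range(10_000))
singleton_load(rows, lambda q, p: calls.append(q))
print(len(calls))  # 10000 queries
calls.clear()
batched_load(rows, lambda q, p: calls.append(q))
print(len(calls))  # 10 queries
```

Dedicated CSV/Arrow bulk loaders go further still (skipping the query layer entirely), which is why benchmarking bulk load as a stream of singleton creations understates what the tools can do.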
It's not just this article but the others too. It's frustrating to watch the Memgraph leaders take their VC money and dump it into a big negative campaign lying about basically anyone in the community. They even spend money punching down at academics doing OSS. I haven't been this annoyed at a seemingly real tech company in a long time.
The workload and software used to benchmark are public on GitHub, which means they can be validated and tested. Memgraph as a company is committed to improving Memgraph and benchmarking further. That's why, in addition to other reasons, we raised funding. We have made no false statements and our findings are replicable. Everything, Memgraph source code + benchmark methodology, is public.
Benchmarks are always workload dependent and we always encourage people to test on their workload. The workload in the benchmark closely resembles the ones our customers have most often (mixed highly concurrent read/write with real-time analytics), and we perform well on it. Our default Snapshot Isolation consistency level further enables a vast class of applications to be built on top of our system which would simply break due to the weak consistency guarantees of legacy graph databases. That's precisely the reason why our customers choose us. You should always test on your workload because your mileage may vary and Memgraph might not be the right fit for you.
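To illustrate what snapshot isolation buys over weaker consistency levels, here is a toy sketch (plain Python, not Memgraph's actual implementation) of the classic lost-update anomaly: two concurrent transactions increment the same counter.

```python
def weak_isolation(start):
    # Both transactions read the same starting value, then both write read+1:
    # one increment is silently lost (the lost-update anomaly).
    a = start       # txn A reads
    b = start       # txn B reads
    value = a + 1   # A commits
    value = b + 1   # B overwrites A's committed write
    return value

def snapshot_isolation(start):
    # Each transaction works on its own snapshot; at commit time a
    # write-write conflict is detected and the losing transaction retries.
    value = start
    a = value       # txn A reads its snapshot
    b = value       # txn B reads its snapshot
    value = a + 1   # A commits first
    b = value       # B's commit conflicts (stale snapshot), so B retries on fresh state
    value = b + 1
    return value

print(weak_isolation(0))      # 1 -- one update lost
print(snapshot_isolation(0))  # 2 -- both updates applied
```

Under weak guarantees the application must detect and repair this itself; under snapshot isolation with first-committer-wins conflict detection, it cannot happen.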
The main reason Memgraph performs that much better is that Neo4j Community Edition 5.0 is limited, for everybody, in how it can use available resources. On the other hand, Memgraph Community (the closest equivalent offering; it's not 100% the same, but no two systems are) does not restrict the performance of our public offering, and that's something we want to highlight as just one of Memgraph's competitive advantages. So all of this is about comparing offerings rather than the underlying tech. Even if you take Neo4j Enterprise (which Max did, on completely different hardware, which is... "creative"), Memgraph has an advantage.
Isn’t Neo4j written in Java and Memgraph written in C++ (with lots of Python extensibility)? By that alone I would think Memgraph would be more performant most of the time, unless Memgraph is poorly written or poorly optimized relative to Neo4j, which is very possible.
I work on the “R&D” team for my company, so we spend a lot of time researching and building PoC apps. I did one with Memgraph a few months ago after concluding it ought to outperform Neo4j; however, I did not build the app with Neo4j to do a side-by-side performance comparison. Both support Cypher, so I wasn’t attached to one or the other, but I’ve always liked the idea of using in-memory stuff (like RAMDisk) to achieve extreme performance, and I figured at worst Memgraph would be “as fast” as Neo4j… that is 100% an assumption, though, and it presumes Memgraph is well-written. It sounds like it’s not, though.
I'm not a Neo4j expert, and am not paid to write this. That said, their GDS subengine from the last couple of years appears to be distributed in-memory, essentially a view, and their year-over-year improvements there have been substantial. There might be no difference at the checkbox level. Likewise, when we did billion-scale work here with a variety of common queries, we found that the existence of basic features like indexes quickly changed what was fast vs slow. Historically, C++ vs Java is often less than a 2X difference, so when we're talking parallel and distributed hardware with tricky query planners and data representations... I have many questions beyond the language. If they were targeting something like FPGAs, I might feel differently.
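The index point dwarfs any language difference, and it's easy to see why: a property index turns a full scan into a single probe. A minimal sketch (plain Python, with a dict standing in for a database's property index; the data is made up):

```python
# 100k fake nodes, each with an "id" and a "name" property.
nodes = [{"id": i, "name": f"user{i}"} for i in range(100_000)]

def scan(name):
    # No index: check every node, like MATCH ... WHERE without an index (O(n)).
    return [n for n in nodes if n["name"] == name]

# Build the "index" once, like CREATE INDEX on the name property.
index = {n["name"]: n for n in nodes}

def lookup(name):
    # Indexed: one hash probe (O(1)).
    return index.get(name)

print(scan("user99999")[0]["id"])   # 99999, after ~100k comparisons
print(lookup("user99999")["id"])    # 99999, after one probe
```

That five-orders-of-magnitude gap in work per query is why the presence or absence of an index reshuffles "fast vs slow" far more than C++ vs Java ever could.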