[1] https://ldbcouncil.org/docs/presentations/ldbc-snb-2022-11.p...
Again, definitely take a look at the LDBC benchmarks; they are, in general, the most relevant.
Why? Simple: pretty much any benchmark I've seen of anything, ever, was similar nonsense -- give people numbers to game and they'll do so, enthusiastically. Even supposedly gold-standard benchmarks like the TechEmpower framework benchmarks quickly devolve into "application server handling HTTP requests by responding with predefined strings", which is as fast as it is useless in most people's version of the real world.
The only way to get usable benchmark data is to run your own workloads in your own environment: everything else is pretty much noise.
I've also noticed (from OP's tweet https://twitter.com/maxdemarzi/status/1613075177704677376) that he used the Enterprise version of Neo4j, but it doesn't say which Memgraph version was used. I don't have experience with these two databases, but usually enterprise versions are somewhat better than community ones.
[EDIT]: I fixed a few typos.
I'd assume it's what the disclaimer at the top was about: that the code is in Python which he's not familiar enough with. The license bit on not having the right to integrate it if you're building a competing product might not be decisive but a good enough reason to not invest more time into running the code.
I would also add that the primary sell for Memgraph seems to be “fast enough that it can process data as it comes in via a stream, and present it to the user in a reasonable timeframe”. Anyone facing this use-case would want to use Memgraph regardless of how much faster it is than Neo4j.
It sets an upper bound on a server's performance given that page generation completes instantly. Sure it won't reflect real world performance, but in this case the benchmark should be read as "higher requests per second = lower resource footprint for the server".
Engineering is about being able to understand what a benchmark or measure truly means, and what usable information it contains.
100% agree that everyone should run their own benchmark, but that's not possible to do from the vendor's perspective; a vendor can't test every single usage. Public benchmarks are something to look into if you want cheap info on how different vendors might do for you. The huge "but" is that they're just a guide, not the only piece of info you should base your decision on.
If you want to be my hero, find a way to fix this problem: https://maxdemarzi.com/2023/01/09/death-star-queries-in-grap...
Let me (try to) be your hero, Marzi. (Insert favorite reference to a famous cheesy pop song, if you like.)
Couldn't you use GraphBLAS algorithms, like they do in RedisGraph (which supports Cypher, btw) to fix that problem with "death star" queries?
Those algorithms are based on linear algebra and matrix operations on sparse matrices (which are like compressed bitmaps on speed; see https://github.com/RoaringBitmap/RoaringBitmap ). The insight is that the adjacency structure of a property graph is actually a matrix, so you can use linear algebra on it. But it may require that the DB is built bottom-up with matrices in mind from the start (instead of linked lists, like Neo4j does). Maybe your double-array approach in RageDB could be made to fit.
I think you'll find this presentation on GraphBLAS positively mind-blowing, especially from this moment: https://youtu.be/xnez6tloNSQ?t=1531
Such math-based algorithms seem perfect to optimally answer unbounded (death) star queries like “How are you connected to your neighbors and what are they?”
That way, for such queries one doesn't have to traverse the graph database as a discovery process through what each node "knows about", but could view and operate on the database from a God-like perspective, similar to table operations in relational databases.
Further reading: https://graphblas.org/
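The matrix view above can be sketched in a few lines of plain Python. This is only a toy: the boolean sparse matrix is represented as a dict of sets, and the graph is invented for illustration; real GraphBLAS runs optimized sparse kernels over a semiring.

```python
# Toy boolean sparse matrix: row -> set of columns holding a 1 (out-edges).
# Hypothetical graph, invented for illustration.
adjacency = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"D", "E"},
    "D": {"F"},
    "E": set(),
    "F": set(),
}

def khop(adj, sources, k):
    """Frontier after exactly k hops: one boolean matrix-vector
    product per hop, which is the core GraphBLAS trick."""
    frontier = set(sources)
    for _ in range(k):
        frontier = {dst for src in frontier for dst in adj.get(src, ())}
    return frontier

print(khop(adjacency, {"A"}, 2))  # 2-hop neighborhood of A
```

The point of the real thing is that the "multiply" is a bulk sparse-matrix operation the engine can vectorize and parallelize, instead of a node-by-node pointer chase.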
Such benchmarks also ignore the engineering expertise an organisation has. Do you need to be an expert to fine-tune 6000 parameters, or can you tune the system to an acceptable standard by reading a few blogs?
Some people pointed out the actual query only cost $242. My counter-argument is that this appears to be based on buying reserved instances from AWS for 3 years. In real life this query would also run daily, or at least you would need several iterations to get the results you want.
The costs also include a super-low-budget laptop ($279). It is more than fine for running the query; however, you wouldn't use it as a development machine. This shows these results have been heavily massaged.
The 'out of the box' or naive and un-optimized performance of something is the baseline. And with something as huge and self-contained as a database you want the happy path to be fine in terms of performance.
The laptop is also very low budget. I am sure it is fine for running the final query; however, it's unlikely you could use it as a development machine.
If you look into the blog post: https://www.databricks.com/blog/2021/11/02/databricks-sets-o..., you will see that it cost $242
> They decided to provide the data not in a CSV file like a normal human being would, but instead in a giant cypher file performing individual transactions for each node and each relationship created. Not batches of transactions… but rather painful, individual, one at a time transactions one point 8 million times. So instead of the import taking 2 minutes, it takes hours.
Yeahhh I noticed this too when I looked at the repo when their blog was posted a couple weeks back. Running a transaction for each object will of course be very slow and real production code will (hopefully) not do this.
> Those are not “graphy” queries at all, why are they in a graph database benchmark? Ok, whatever.
I’m definitely interested in seeing more realistic scenarios of actual “graphy” queries with batched transactions comparing the two. Oh, and comparing against Neptune would be cool too since that supposedly uses openCypher now (which I hear is kinda close to neo4j cypher?).
This seems like a series of posts on benchmarking results from different vendors, so if he "disclosed" it once I don't think there is a need to do it again.
[1] https://maxdemarzi.com/2022/12/06/khop-baby-one-more-time/
What they say can be classified in three categories: Objectively right, objectively wrong, or subjective claims. Their conflict of interest only affects our evaluation of subjective claims.
Things that can be assessed as right are right even if they were said by Vladimir Putin; things that can be assessed as wrong are wrong even if they were said by Florence Nightingale. It is an ad hominem appeal to motive to suggest otherwise.
I can see he also worked for Neo4j. I suppose that with all the trouble neo is going through [1], he wanted to protect them.
- blog post that sparked the discussion - https://memgraph.com/blog/memgraph-vs-neo4j-performance-benc...
- earlier discussion about this Memgraph benchmark HackerNews - https://news.ycombinator.com/item?id=33813781
- the benchmark results - https://memgraph.com/benchgraph/
- benchmark repo and methodology - https://github.com/memgraph/memgraph/tree/master/tests/mgben...
If you're referring to this line, then it struck me as very odd.
> Instead of 112 queries per second, I get 531q/s. Instead of a p99 latency of 94.49ms, I get 28ms with a min, mean, p50, p75 and p95 of 14ms to 18ms. Alright, what about query 2? Same story.
Otherwise, the article holds up.
> It looks like Neo4j is faster than Memgraph in the Aggregate queries by about 3 times. Memgraph is faster than Neo4j for the queries they selected by about 2-3x except...
Unless that's meant to be a joke? Maybe they were dunking on the "bullshit" benchmark with a worse comparison.
The main problem is that when you’re comparing two products you’re bound to be comparing apples to oranges. Every product solves a slightly or majorly different challenge.
So when you run n tests on two different products some tests are bound to perform better on one product and some on the other. Misleading marketing comes into the picture if you only publish the ones that went your way or just partial results.
But that’s why if you believe in your own product and want benchmarks you hire a reputable third party to do them on their own accord.
I've read about them briefly but I have to admit my imagination fails me as to how it would look in the real world.
A possible query would be match (host:ESXihost)-[:running]->(vm:WindowsVM)-[:running]->(:Process {name:$processName}) return vm,host
I feel graph databases work very well to document the myriad dependencies in an enterprise IT stack and to integrate siloed data.
A more technical use case that I liked was a system that can analyse the configuration of resources across an entire network and find a "path" from a normal user account to full admin privileges.
Something like: "Helpdesk user A can reset the password of a service account that can write to a file share that contains a script that is run on logon by every user including the full admin, allowing user A to trigger an action in the context of an admin B, making them equivalent to an admin."
You map out "things" on the network like file shares, security groups, accounts, etc... with links between them, and then ask for the shortest path from A to B.
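The "shortest path from A to B" step is just breadth-first search over that graph of things and links. A minimal sketch in plain Python, with a made-up attack graph mirroring the helpdesk-to-admin example above (all node and edge names invented):

```python
from collections import deque

# Hypothetical attack graph: nodes are accounts/resources,
# directed edges mean "can act on" (names invented for illustration).
edges = {
    "helpdesk_A": ["svc_account"],    # can reset its password
    "svc_account": ["file_share"],    # can write to it
    "file_share": ["logon_script"],   # contains it
    "logon_script": ["admin_B"],      # runs in the admin's context
    "admin_B": [],
}

def shortest_path(graph, start, goal):
    """Plain BFS; a graph DB runs the same idea over persisted edges."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no escalation path found

print(shortest_path(edges, "helpdesk_A", "admin_B"))
```

Tools in this space (BloodHound for Active Directory is the well-known one) are essentially this, plus a large ingestion pipeline that discovers the edges.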
Which, funnily enough, uses a relational database.
The police have a ton of data lying around, and the consensus in the industry is that the 80/20 rule applies to criminals as well ie: 20% of the population takes up 80% of the police resources. You could probably also posit that 20% of that 20% are "peak criminals."
Anyway, they would like to track interactions of "things."
Say a car is involved in an incident. They normally track the make, model, plate, and color of the car - on paper. There's a lot of other info they track: who owns that car? Who's in the car? Where is the car? Where does the owner live? Where do the occupants live? What other incidents has that car been involved in? Given the addresses of the people involved, who else is known to be around them?
All this relationship information can give someone a better understanding of the relationship between criminal elements in an area. If a car is being used in lots of crimes, it's easier to find out using a DB than some cop going "I recognize that car." If lots of people are being picked up and all live in a 2 block area, it'll be easier to see that if it's in a DB than a cop recognizing that fact from multiple incident reports.
I actually tried doing this in SQL, and it's super slow because you have to iterate over your tables over and over. With the graph database this becomes, well, substantially easier if you model it correctly.
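For concreteness, the "iterate over your tables over and over" part in SQL typically ends up as a recursive self-join over a generic edge table. A toy sketch using Python's bundled SQLite, with an invented schema and data (the real system would have far more node types and hops):

```python
import sqlite3

# Generic edge table linking people, cars, and incidents.
# Table, column, and value names are invented for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE links (src TEXT, dst TEXT);
INSERT INTO links VALUES
  ('car_42', 'incident_1'), ('car_42', 'incident_2'),
  ('alice',  'car_42'),     ('bob',     'incident_2');
""")

# Everything reachable from car_42, any number of hops out:
rows = db.execute("""
WITH RECURSIVE reach(node) AS (
    SELECT 'car_42'
    UNION
    SELECT dst FROM links JOIN reach ON links.src = reach.node
)
SELECT node FROM reach WHERE node != 'car_42' ORDER BY node;
""").fetchall()
print([r[0] for r in rows])  # incidents tied to the car
```

The recursive CTE works, but every hop is another join pass over the edge table; a graph database stores the adjacency directly, so multi-hop traversals don't pay that repeated-scan cost.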
This product, BTW, is known as CopLink by IBM.
As an aside, fusion centers have this problem too but worse, because they're supposed to coordinate information between different police departments in a region...all of whom don't particularly give a shit.
Understanding the spider web that is AWS IAM permissions: https://eng.lyft.com/iam-whatever-you-say-iam-febce59d1e3b,
Calculating whether a vuln was introduced from a parent image or from the service itself in a microservice arch: https://eng.lyft.com/vulnerability-management-at-lyft-enforc...
At my last job we had a bunch of entity categories, and each of those had a huge number of individual entity types. When an entity was picked up through the data pipeline we'd query the graph db and convert that entity into whatever the "base" entity is for the category.
It also allowed us to easily query for strange connections or one off transformations that our customers frequently had without worrying about having a more rigorous and structured RDBMS schema for relatively uncommon queries.
Finally it made using algorithms like PageRank in our data science pipeline an absolute breeze.
I loved it, but we never used it as our primary database (postgres & athena in this case)
* information tends to be private and/or commercially sensitive, this severs the links that graph dbs are good at representing (and made the "node focused" SQL approach the ubiquitous model that it currently is)
* objects in typical schemas have many more attributes than relations. while you could model things RDF-style where everything is a relation, it is not the most intuitive for people
* when the previous constraint does not apply (e.g. data from a centralized social network), it is typically not too hard to emulate an adequate graph structure on a Pareto 20/80 basis using an RDBMS
so graph dbs end up being optimal only for a niche of situations and probably not the impact that the people / investors involved in their development would be happy with
on the other hand the genie is out of the bottle and the decades-long SQL monoculture seems to be coming to an end. but maybe what results is a relational database+ type thingy [0] rather than two disconnected paradigms
[0] https://postgresconf.org/conferences/2020/program/proposals/...
I saw very niche use cases 10 years ago but have seen more and more common use cases recently. Knowledge Graphs in particular have a lot of use. I’d expect to continue to see explosive growth in graph over the next 10 years.
As ubiquitous as SQL or document databases? Probably not. But very likely more and more use cases will be uncovered.
This is already called the "object-relational" model. It was invented by Postgres in the 1980s. The relational model / SQL absorbs the best parts of alternative systems and gets better over time. SQL:2023 is adding support for graph queries (SQL/PGQ).
Graph DBMSs are a passing fad.
Discussed here yesterday: https://news.ycombinator.com/item?id=34358912
I don't know of any SQLite-like graph database. I'm still a fan of neo4j.
Unfortunately, an article that overstates benefits without any caveats is not illegal, so it will carry on.
Many of us would like a disclaimer, e.g. "I work for the company", but also a much more bounded discussion: "this performance test works for this particular scenario", perhaps "Please note, your scenario might be very different", and especially "Please contact me if you think I have missed something out".
I have worked for a business where you felt compelled to amplify the good and not talk about the bad but the world keeps spinning...
It could be argued that it isn't really a benchmark unless you can accurately calculate the result based on a common, fundamental benchmark.
Lies, damned lies, and benchmarks, people; take them all with a huge grain of salt.
Either one is a bad sign if they're going to be a vendor. At that point how can you trust their SLAs and/or their presales team?
Very unlikely, I would say. Most likely "the clowns" never considered how the license terms would impact use of benchmarks.
“Never attribute to malice what can be sufficiently explained by incompetence,” as the saying goes.
I argued with them here about another bullshit benchmark https://news.ycombinator.com/item?id=33717766
they did reply tho
Now, let's combine this with one of the persistent tendencies of developers to take one specific benchmark as indicative of the overall performance, which is often preyed on by benchmarkers trying to sell things.
Is it really plausible that neo4j takes 120x longer than it needs to on all operations? A dedicated graph database that has been tuned and optimized for that task for quite a while now?
I'm not quite going to rate that a 0 probability, but it's definitely a very big claim. While the probability is not 0, it is comfortably below "someone's gaming the numbers" and "the benchmark is not as comparable as claimed". There's a faint chance the latter may match a production use case; for instance, certain comparisons of NoSQL DBs and SQL DBs are "not fair" in that they won't be doing remotely the same things for the queries and the performance landscape is very complicated, with one side winning handily for some tasks and the other side handily winning for others, but if your use case falls into one of those big wins you may not care about the "fairness". But it's still a pretty big chunk of probability mass that it's just plain not comparable; how many times have we seen a ludicrous benchmarking claim of relative superiority just for the losing side to pop up and say something to the effect of "Hey, did you consider adding the correct index to the data, oh look if you do that we win by a factor of 4."
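The "did you consider adding the correct index?" effect is easy to reproduce on a laptop. A toy sketch with Python's bundled SQLite, running the same point lookups with and without an index (sizes and any resulting numbers are illustrative only, not a claim about either vendor):

```python
import sqlite3
import time

def time_lookups(indexed):
    """Time a handful of point queries against a 200k-row table,
    optionally with an index on the lookup column."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (k INTEGER, v TEXT)")
    db.executemany("INSERT INTO t VALUES (?, ?)",
                   ((i, "x") for i in range(200_000)))
    if indexed:
        db.execute("CREATE INDEX idx_k ON t(k)")
    start = time.perf_counter()
    for k in range(0, 200_000, 20_000):
        db.execute("SELECT v FROM t WHERE k = ?", (k,)).fetchone()
    return time.perf_counter() - start

scan, idx = time_lookups(False), time_lookups(True)
print(f"full scans: {scan:.4f}s, indexed: {idx:.4f}s")
```

The gap here is routinely orders of magnitude, which is exactly why a benchmark that indexes one system properly and the other one not at all can "prove" almost any headline number.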
Tell me you're 1.2x or 1.5x faster or something, or that your clever compression means I can remove 1/3rd of my systems or something. Keep it in the range of plausible.
While I'm sure this won't affect the marketing of this company any, ludicrously large claims of 10x+ speed improvements actually turn me off, not attract me. You'd better have some sort of super compelling reason why you somehow managed to be that fast over your competitor, like, "we're the first to successfully leverage GPUs" or something like that. Otherwise I'm going to guess "Actually, you have an O(log n log log n) algorithm over their O(log n log n) algorithm and you cranked the data set up to the ludicrous sizes it takes to get an arbitrarily large X factor improvement over your competition" or something like that.
(Always gotta love people comparing two completely different O(...) algorithms against each other and declaring one is X times faster than the other. This is another major source of "10,000x faster!"... yeah, O(n log n) is "10,000 faster!" than O(n^2), sure. It's also 100,000 times faster, 10 times faster, and a billionkajillion times faster, all at the same time.)
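That last parenthetical is easy to demonstrate numerically: the "speedup" of O(n log n) over O(n^2) is whatever you want it to be, because the ratio of the two cost functions grows without bound as n grows.

```python
import math

# The "X times faster" ratio between O(n^2) and O(n log n) is not a
# constant; it grows with n, so any headline multiplier can be
# manufactured by picking the input size.
for n in (10**3, 10**6, 10**9):
    ratio = n**2 / (n * math.log2(n))
    print(f"n={n:>10}: n^2 is {ratio:,.0f}x the work of n log n")
```

Same two algorithms, three wildly different "speedups"; quoting any one of them as *the* factor is meaningless without stating n.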
---
Disclosure: I have worked on Graph Algorithms, Graph Databases, and Database Engines for years, and we are now preparing a commercial solution based on UKV [1]. I don't know anyone at MemGraph or Neo4J. Never used the first. As for the second, I am not a fan.
---
Aside from licensing, there are 3 primary complaints. I will address them individually, and I am open to a discussion.
A. Using Python for benchmarks instead of Gatling. I don't entirely agree with this. Python still has the fastest-growing programming community while already being one of the 2 most popular languages. Gatling, however, I had never heard of. Choosing between the two, I would pick Python. But neither works if you want to design a high-performance benchmark for a fast system. To escape automatic memory management and expensive runtimes, you can only implement those in C, C++, Rust, or another systems-programming language. We have seen too many times that the benchmark itself performs worse than the system it is trying to evaluate [2].
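As a rough illustration of that harness-overhead problem, here is the per-call cost of doing literally nothing from a pure-Python loop. The exact figure is machine-dependent; the point is that it is tens of nanoseconds or more, which distorts any measurement of operations in the low-microsecond range:

```python
import time

def noop():
    """Stand-in for 'issue a query': does no work at all."""
    pass

# Time a million no-op calls from Python; everything measured here
# is harness overhead, not useful work.
n = 1_000_000
start = time.perf_counter()
for _ in range(n):
    noop()
elapsed = time.perf_counter() - start
per_call_ns = elapsed / n * 1e9
print(f"~{per_call_ns:.0f} ns of pure-Python overhead per call")
```

A C or Rust harness gets this floor down by a couple of orders of magnitude, which is why fast systems need fast benchmark drivers.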
B. Using hardware from 2010 [3] and weird datasets [4]. This shocked me. When I looked at the charts [5] and the benchmarking section, it seemed highly professional and good-looking. I wouldn't expect less from a startup with $20M VC funding. But the devil is in the details. I would never have expected anyone benchmarking a new DBMS to use now-13-year-old CPUs and an unknown dataset. Assuming current developer salaries, hiring people to design a DBMS and then evaluating it on a $1000 machine is just financially irresponsible. We buy expensive servers; they cost as much as sports cars, or even apartments in poorer countries. They are hard to maintain, but they are essential to quality work. It is sad to see companies taking such shortcuts.

But to play devil's advocate, there is no single graph benchmark or dataset that everyone agrees on. So I imagine people experiment with multiple real datasets of different sizes, or generate them systematically using one of the random-generator algorithms. In UKV, we have used Twitter data to construct both document and graph collections. In the past, we have also used `ci-patent`, `bio-mouse-gene`, `human-Jung2015-M87102575`, and hundreds of other public datasets from the Network Repository and SNAP [6]. There are datasets of every shape and size, reaching around 1 billion edges, in case someone is searching for data. For us, the next step is the reconstruction of the web graph from the 300 TB CommonCrawl dataset [7]. There is no such graph benchmark in existence, but it is the biggest public dataset we could find.
C. Running queries a different number of times for various engines. This can be justified, and it is how modern benchmarks are done. You track not just the mean execution time but also the variability, so if at some point the results converge, you can stop before hitting the expected iteration count to save time.
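That early-stopping idea can be sketched in a few lines. This is a hypothetical harness, not any vendor's actual code: keep sampling latencies until the standard error of the mean is small relative to the mean, capped at a maximum iteration count.

```python
import statistics
import time

def run_until_stable(query_fn, max_iters=1000, min_iters=10, rel_err=0.01):
    """Sample query_fn latencies until the standard error of the mean
    drops below rel_err * mean, or until max_iters is reached."""
    samples = []
    for _ in range(max_iters):
        start = time.perf_counter()
        query_fn()
        samples.append(time.perf_counter() - start)
        if len(samples) >= min_iters:
            mean = statistics.mean(samples)
            sem = statistics.stdev(samples) / len(samples) ** 0.5
            if sem < rel_err * mean:  # results converged: stop early
                break
    return statistics.mean(samples), len(samples)

# Stand-in workload; a real harness would issue a database query here.
mean, used = run_until_stable(lambda: sum(range(1000)))
print(f"mean={mean * 1e6:.1f}us after {used} iterations")
```

Two engines with different variance will then legitimately end up with different iteration counts, which is why "they ran it a different number of times" isn't necessarily foul play.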
---
LDBC [8] seems like a good contender for a potential industry standard, but it is incomplete. Its "Business Intelligence workload" and "Interactive workload" categories exclude any real "Graph Analytics". Running an All-Pairs Shortest Paths algorithm on a large external-memory graph would have been a much more interesting integrated benchmark. Similarly, one could do large-scale community detection or personalized recommendations based on graphs and evaluate the overall cost/performance. That, however, poses another big challenge: almost all algorithm implementations for those problems are vertex-centric. They scale poorly with large sparse graphs, which demand edge-centric algorithms, so a new implementation has to be written from scratch. We will try to allocate more resources towards that in 2023 and invite anyone curious to join.
---
[1] https://github.com/unum-cloud/ukv [2] https://unum.cloud/post/2022-03-22-ucsb [3] https://github.com/memgraph/memgraph/tree/master/tests/mgben... [4] https://github.com/memgraph/memgraph/tree/master/tests/mgben... [5] https://memgraph.com/benchgraph/base [6] https://snap.stanford.edu/data [7] https://commoncrawl.org [8] https://ldbcouncil.org/benchmarks/snb
B. Yes, old hardware is 100% a problem; that's why the benchgraph site is being extended with more hardware options. Stay tuned for that, it's going to come soon!
C. Yep, legit, something to expand / improve
This focuses on untyped, unattributed graphs and includes algorithms which are often formulated following a vertex-centric programming model (BFS, PageRank, community detection, connected components, etc.). The SNB Business Intelligence already covers a few of these algorithms (BFS, weighted shortest paths) and in the future it may incorporate more.
We plan to run a Graphalytics competition in the spring (on data sets with up to tens of billions of edges) - let me know if anyone is interested in participating in this.