[1] https://ldbcouncil.org/docs/presentations/ldbc-snb-2022-11.p...
Again, definitely take a look at the LDBC benchmarks; they are, in general, the most relevant.
Why? Simple: pretty much any benchmark I've seen of anything, ever, was similar nonsense -- give people numbers to game and they'll do so, enthusiastically. Even supposedly gold-standard benchmarks like the TechEmpower framework benchmarks quickly devolve into "application server handling HTTP requests by responding with predefined strings", which is as fast as it is useless in most people's version of the real world.
The only way to get usable benchmark data is to run your own workloads in your own environment: everything else is pretty much noise.
I've also noticed (from OP's tweet https://twitter.com/maxdemarzi/status/1613075177704677376) that he used the Enterprise version of Neo4j, but it doesn't say which Memgraph version was used. I don't have experience with these two databases, but usually enterprise versions are somewhat better than community ones.
[EDIT]: I fixed a few typos.
I'd assume it's what the disclaimer at the top was about: that the code is in Python which he's not familiar enough with. The license bit on not having the right to integrate it if you're building a competing product might not be decisive but a good enough reason to not invest more time into running the code.
I would also add that the primary sell for Memgraph seems to be “fast enough that it can process data as it comes in via a stream, and present it to the user in a reasonable timeframe”. Anyone facing this use-case would want to use Memgraph regardless of how much faster it is than Neo4j.
It sets an upper bound on a server's performance given that page generation completes instantly. Sure it won't reflect real world performance, but in this case the benchmark should be read as "higher requests per second = lower resource footprint for the server".
Engineering is about being able to understand what a benchmark or measure truly means, and what usable information it contains.
100% agree that everyone should run their own benchmark, but that's not possible to do from the vendor's perspective; a vendor can't test every single usage. Public benchmarks are something to look into if you want cheap info on how different vendors might do for you. The huge "but" is that they're just a guide, not the only piece of info you should base your decision on.
If you want to be my hero, find a way to fix this problem: https://maxdemarzi.com/2023/01/09/death-star-queries-in-grap...
Let me (try to) be your hero, Marzi. (Insert favorite reference to a famous cheesy pop song, if you like.)
Couldn't you use GraphBLAS algorithms, like they do in RedisGraph (which supports Cypher, btw) to fix that problem with "death star" queries?
Those algorithms are based on linear algebra and matrix operations on sparse matrices (which are like compressed bitmaps on speed; see https://github.com/RoaringBitmap/RoaringBitmap ). The insight is that the adjacency structure of a property graph is actually a matrix, so you can use linear algebra on it. But it may require that the DB is built bottom-up with matrices in mind from the start (instead of linked lists, like Neo4j does). Maybe your double-array approach in RageDB could be made to fit.
I think you'll find this presentation on GraphBLAS positively mind-blowing, especially from this moment: https://youtu.be/xnez6tloNSQ?t=1531
Such math-based algorithms seem perfect to optimally answer unbounded (death) star queries like “How are you connected to your neighbors and what are they?”
That way, for such queries one doesn't have to traverse the graph database as a discovery process through what each node "knows about", but could view and operate on the database from a God-like perspective, similar to table operations in relational databases.
Further reading: https://graphblas.org/
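The matrix view above can be sketched in a few lines of plain Python. This is only a toy: the boolean sparse matrix is represented as a dict of sets, and the graph is invented for illustration; real GraphBLAS runs optimized sparse kernels over a semiring.

```python
# Toy boolean sparse matrix: row -> set of columns holding a 1 (out-edges).
# Hypothetical graph, invented for illustration.
adjacency = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"D", "E"},
    "D": {"F"},
    "E": set(),
    "F": set(),
}

def khop(adj, sources, k):
    """Frontier after exactly k hops: one boolean matrix-vector
    product per hop, which is the core GraphBLAS trick."""
    frontier = set(sources)
    for _ in range(k):
        frontier = {dst for src in frontier for dst in adj.get(src, ())}
    return frontier

print(khop(adjacency, {"A"}, 2))  # 2-hop neighborhood of A
```

The point of the real thing is that the "multiply" is a bulk sparse-matrix operation the engine can vectorize and parallelize, instead of a node-by-node pointer chase.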
Such benchmarks also ignore the engineering expertise an organisation has. Do you need to be an expert to fine-tune 6000 parameters, or can you tune the system to an acceptable standard by reading a few blogs?
Some people pointed out the actual query only cost $242. My counter-argument is that this appears to be based on buying reserved instances from AWS for 3 years. In real life this query would also run daily, or at least you would need several iterations to get the results you want.
The costs also include a super-low-budget laptop ($279). It is more than fine for running the query; however, you wouldn't use it as a development machine. This shows these results have been heavily massaged.
The 'out of the box' or naive and un-optimized performance of something is the baseline. And with something as huge and self-contained as a database you want the happy path to be fine in terms of performance.
The laptop is also very low budget. I am sure it is fine for running the final query; however, it's unlikely you could use it as a development machine.
If you look into the blog post: https://www.databricks.com/blog/2021/11/02/databricks-sets-o..., you will see that it cost $242
> They decided to provide the data not in a CSV file like a normal human being would, but instead in a giant cypher file performing individual transactions for each node and each relationship created. Not batches of transactions… but rather painful, individual, one at a time transactions one point 8 million times. So instead of the import taking 2 minutes, it takes hours.
Yeahhh I noticed this too when I looked at the repo when their blog was posted a couple weeks back. Running a transaction for each object will of course be very slow and real production code will (hopefully) not do this.
> Those are not “graphy” queries at all, why are they in a graph database benchmark? Ok, whatever.
I’m definitely interested in seeing more realistic scenarios of actual “graphy” queries with batched transactions comparing the two. Oh, and comparing against Neptune would be cool too since that supposedly uses openCypher now (which I hear is kinda close to neo4j cypher?).
This seems like a series of posts on benchmarking results from different vendors, so if he "disclosed" it once I don't think there is a need to do it again.
[1] https://maxdemarzi.com/2022/12/06/khop-baby-one-more-time/
What they say can be classified in three categories: Objectively right, objectively wrong, or subjective claims. Their conflict of interest only affects our evaluation of subjective claims.
Things that can be assessed as right are right even if they were said by Vladimir Putin; things that can be assessed as wrong are wrong even if they were said by Florence Nightingale. It is an ad hominem appeal to motive to suggest otherwise.
I can see he also worked for Neo4j. I suppose that with all the trouble neo is going through [1], he wanted to protect them.
- blog post that sparked the discussion - https://memgraph.com/blog/memgraph-vs-neo4j-performance-benc...
- earlier discussion about this Memgraph benchmark HackerNews - https://news.ycombinator.com/item?id=33813781
- the benchmark results - https://memgraph.com/benchgraph/
- benchmark repo and methodology - https://github.com/memgraph/memgraph/tree/master/tests/mgben...
If you're referring to this line, then it struck me as very odd.
> Instead of 112 queries per second, I get 531q/s. Instead of a p99 latency of 94.49ms, I get 28ms with a min, mean, p50, p75 and p95 of 14ms to 18ms. Alright, what about query 2? Same story.
Otherwise, the article holds up.
> It looks like Neo4j is faster than Memgraph in the Aggregate queries by about 3 times. Memgraph is faster than Neo4j for the queries they selected by about 2-3x except...
Unless that's meant to be a joke? Maybe they were dunking on the "bullshit" benchmark with a worse comparison.
The main problem is that when you’re comparing two products you’re bound to be comparing apples to oranges. Every product solves a slightly or majorly different challenge.
So when you run n tests on two different products some tests are bound to perform better on one product and some on the other. Misleading marketing comes into the picture if you only publish the ones that went your way or just partial results.
But that’s why if you believe in your own product and want benchmarks you hire a reputable third party to do them on their own accord.
I've read about them briefly but I have to admit my imagination fails me as to how it would look in the real world.
A possible query would be match (host:ESXihost)-[:running]->(vm:WindowsVM)-[:running]->(:Process {name:$processName}) return vm,host
I feel graph databases work very well to document the myriad dependencies in an enterprise IT stack and to integrate siloed data.
A more technical use case that I liked was a system that can analyse the configuration of resources across an entire network and find a "path" from a normal user account to full admin privileges.
Something like: "Helpdesk user A can reset the password of a service account that can write to a file share that contains a script that is run on logon by every user including the full admin, allowing user A to trigger an action in the context of an admin B, making them equivalent to an admin."
You map out "things" on the network like file shares, security groups, accounts, etc... with links between them, and then ask for the shortest path from A to B.
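The "shortest path from A to B" step is just breadth-first search over that graph of things and links. A minimal sketch in plain Python, with a made-up attack graph mirroring the helpdesk-to-admin example above (all node and edge names invented):

```python
from collections import deque

# Hypothetical attack graph: nodes are accounts/resources,
# directed edges mean "can act on" (names invented for illustration).
edges = {
    "helpdesk_A": ["svc_account"],    # can reset its password
    "svc_account": ["file_share"],    # can write to it
    "file_share": ["logon_script"],   # contains it
    "logon_script": ["admin_B"],      # runs in the admin's context
    "admin_B": [],
}

def shortest_path(graph, start, goal):
    """Plain BFS; a graph DB runs the same idea over persisted edges."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no escalation path found

print(shortest_path(edges, "helpdesk_A", "admin_B"))
```

Tools in this space (BloodHound for Active Directory is the well-known one) are essentially this, plus a large ingestion pipeline that discovers the edges.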
Which, funnily enough, uses a relational database.
The police have a ton of data lying around, and the consensus in the industry is that the 80/20 rule applies to criminals as well ie: 20% of the population takes up 80% of the police resources. You could probably also posit that 20% of that 20% are "peak criminals."
Anyway, they would like to track interactions of "things."
Say a car is involved in an incident. They normally track the make, model, plate, and color of the car - on paper. There's a lot of other info they track: who owns that car? Who's in the car? Where is the car? Where does the owner live? Where do the occupants live? What other incidents has that car been involved in? Given the addresses of the people involved, who else is known to be around them?
All this relationship information can give someone a better understanding of the relationship between criminal elements in an area. If a car is being used in lots of crimes, it's easier to find out using a DB than some cop going "I recognize that car." If lots of people are being picked up and all live in a 2 block area, it'll be easier to see that if it's in a DB than a cop recognizing that fact from multiple incident reports.
I actually tried doing this in SQL, and it's super slow because you have to iterate over your tables over and over. With the graph database this becomes, well, substantially easier if you model it correctly.
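For concreteness, the "iterate over your tables over and over" part in SQL typically ends up as a recursive self-join over a generic edge table. A toy sketch using Python's bundled SQLite, with an invented schema and data (the real system would have far more node types and hops):

```python
import sqlite3

# Generic edge table linking people, cars, and incidents.
# Table, column, and value names are invented for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE links (src TEXT, dst TEXT);
INSERT INTO links VALUES
  ('car_42', 'incident_1'), ('car_42', 'incident_2'),
  ('alice',  'car_42'),     ('bob',     'incident_2');
""")

# Everything reachable from car_42, any number of hops out:
rows = db.execute("""
WITH RECURSIVE reach(node) AS (
    SELECT 'car_42'
    UNION
    SELECT dst FROM links JOIN reach ON links.src = reach.node
)
SELECT node FROM reach WHERE node != 'car_42' ORDER BY node;
""").fetchall()
print([r[0] for r in rows])  # incidents tied to the car
```

The recursive CTE works, but every hop is another join pass over the edge table; a graph database stores the adjacency directly, so multi-hop traversals don't pay that repeated-scan cost.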
This product, BTW, is known as CopLink by IBM.
As an aside, fusion centers have this problem too but worse, because they're supposed to coordinate information between different police departments in a region...all of whom don't particularly give a shit.
Understanding the spider web that is AWS IAM permissions: https://eng.lyft.com/iam-whatever-you-say-iam-febce59d1e3b,
Calculating whether a vuln was introduced from a parent image or from the service itself in a microservice arch: https://eng.lyft.com/vulnerability-management-at-lyft-enforc...
At my last job we had a bunch of entity categories, and each of those had a huge number of individual entity types. When an entity was picked up through the data pipeline we'd query the graph db and convert that entity into whatever the "base" entity is for the category.
It also allowed us to easily query for strange connections or one off transformations that our customers frequently had without worrying about having a more rigorous and structured RDBMS schema for relatively uncommon queries.
Finally it made using algorithms like PageRank in our data science pipeline an absolute breeze.
I loved it, but we never used it as our primary database (postgres & athena in this case)
* information tends to be private and/or commercially sensitive, this severs the links that graph dbs are good at representing (and made the "node focused" SQL approach the ubiquitous model that it currently is)
* objects in typical schemas have many more attributes than relations. while you could model things RDF-style where everything is a relation, it is not the most intuitive for people
* when the previous constraint does not apply (e.g. data from a centralized social network), it is typically not too hard to emulate an adequate graph structure on a Pareto 20/80 basis using an RDBMS
so graph dbs end up being optimal only for a niche of situations and probably not the impact that the people / investors involved in their development would be happy with
on the other hand the genie is out of the bottle and the decades-long SQL monoculture seems to be coming to an end. but maybe what results is a relational database+ type thingy [0] rather than two disconnected paradigms
[0] https://postgresconf.org/conferences/2020/program/proposals/...
I saw very niche use cases 10 years ago but have seen more and more common use cases recently. Knowledge Graphs in particular have a lot of use. I’d expect to continue to see explosive growth in graph over the next 10 years.
As ubiquitous as SQL or document databases? Probably not. But very likely more and more use cases will be uncovered.
This is already called the "object-relational" model. It was invented by Postgres in the 1980s. The relational model / SQL absorbs the best parts of alternative systems and gets better over time. SQL:2023 is adding support for graph queries (SQL/PGQ).
Graph DBMSs are a passing fad.
Discussed here yesterday: https://news.ycombinator.com/item?id=34358912
I don't know of any SQLite-like graph database. I'm still a fan of neo4j.
Unfortunately, an article that overstates benefits without any caveats is not illegal, so it will carry on.
Many of us would like a disclaimer, e.g. "I work for the company", but also a much more bounded discussion: "this performance test works for this particular scenario", perhaps "Please note, your scenario might be very different", and especially "Please contact me if you think I have missed something out".
I have worked for a business where you felt compelled to amplify the good and not talk about the bad but the world keeps spinning...
It could be argued that it isn't really a benchmark unless you can accurately calculate the result based on a common, fundamental benchmark.
Lies, damned lies, and benchmarks, people; take them all with a huge grain of salt.
Either one is a bad sign if they're going to be a vendor. At that point how can you trust their SLAs and/or their presales team?
Very unlikely, I would say. Most likely "the clowns" never considered how the license terms would impact use of benchmarks.
“Never attribute to malice what can be sufficiently explained by incompetence,” as the saying goes.
I argued with them here about another bullshit benchmark https://news.ycombinator.com/item?id=33717766
they did reply tho
Now, let's combine this with one of the persistent tendencies of developers to take one specific benchmark as indicative of the overall performance, which is often preyed on by benchmarkers trying to sell things.
Is it really plausible that neo4j takes 120x longer than it needs to on all operations? A dedicated graph database that has been tuned and optimized for that task for quite a while now?
I'm not quite going to rate that a 0 probability, but it's definitely a very big claim. While the probability is not 0, it is comfortably below "someone's gaming the numbers" and "the benchmark is not as comparable as claimed". There's a faint chance the latter may match a production use case; for instance, certain comparisons of NoSQL DBs and SQL DBs are "not fair" in that they won't be doing remotely the same things for the queries and the performance landscape is very complicated, with one side winning handily for some tasks and the other side handily winning for others, but if your use case falls into one of those big wins you may not care about the "fairness". But it's still a pretty big chunk of probability mass that it's just plain not comparable; how many times have we seen a ludicrous benchmarking claim of relative superiority just for the losing side to pop up and say something to the effect of "Hey, did you consider adding the correct index to the data, oh look if you do that we win by a factor of 4."
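The "did you consider adding the correct index?" effect is easy to reproduce on a laptop. A toy sketch with Python's bundled SQLite, running the same point lookups with and without an index (sizes and any resulting numbers are illustrative only, not a claim about either vendor):

```python
import sqlite3
import time

def time_lookups(indexed):
    """Time a handful of point queries against a 200k-row table,
    optionally with an index on the lookup column."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (k INTEGER, v TEXT)")
    db.executemany("INSERT INTO t VALUES (?, ?)",
                   ((i, "x") for i in range(200_000)))
    if indexed:
        db.execute("CREATE INDEX idx_k ON t(k)")
    start = time.perf_counter()
    for k in range(0, 200_000, 20_000):
        db.execute("SELECT v FROM t WHERE k = ?", (k,)).fetchone()
    return time.perf_counter() - start

scan, idx = time_lookups(False), time_lookups(True)
print(f"full scans: {scan:.4f}s, indexed: {idx:.4f}s")
```

The gap here is routinely orders of magnitude, which is exactly why a benchmark that indexes one system properly and the other one not at all can "prove" almost any headline number.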
Tell me you're 1.2x or 1.5x faster or something, or that your clever compression means I can remove 1/3rd of my systems or something. Keep it in the range of plausible.
While I'm sure this won't affect the marketing of this company any, ludicrously large claims of 10x+ speed improvements actually turn me off, not attract me. You'd better have some sort of super compelling reason why you somehow managed to be that fast over your competitor, like, "we're the first to successfully leverage GPUs" or something like that. Otherwise I'm going to guess "Actually, you have an O(log n log log n) algorithm over their O(log n log n) algorithm and you cranked the data set up to the ludicrous sizes it takes to get an arbitrarily large X factor improvement over your competition" or something like that.
(Always gotta love people comparing two completely different O(...) algorithms against each other and declaring one is X times faster than the other. This is another major source of "10,000x faster!"... yeah, O(n log n) is "10,000 faster!" than O(n^2), sure. It's also 100,000 times faster, 10 times faster, and a billionkajillion times faster, all at the same time.)
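That last parenthetical is easy to demonstrate numerically: the "speedup" of O(n log n) over O(n^2) is whatever you want it to be, because the ratio of the two cost functions grows without bound as n grows.

```python
import math

# The "X times faster" ratio between O(n^2) and O(n log n) is not a
# constant; it grows with n, so any headline multiplier can be
# manufactured by picking the input size.
for n in (10**3, 10**6, 10**9):
    ratio = n**2 / (n * math.log2(n))
    print(f"n={n:>10}: n^2 is {ratio:,.0f}x the work of n log n")
```

Same two algorithms, three wildly different "speedups"; quoting any one of them as *the* factor is meaningless without stating n.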
---
Disclosure: I have worked on Graph Algorithms, Graph Databases, and Database Engines for years, and we are now preparing a commercial solution based on UKV [1]. I don't know anyone at MemGraph or Neo4J. Never used the first. As for the second, I am not a fan.
---
Aside from licensing, there are 3 primary complaints. I will address them individually, and I am open to a discussion.
A. Using Python for benchmarks instead of Gatling. I don't entirely agree with this. Python still has the fastest-growing programming community while already being one of the 2 most popular languages. Gatling, however, I had never heard of. Choosing between the two, I would pick Python. But neither works if you want to design a high-performance benchmark for a fast system. To escape automatic memory management and expensive runtimes, you can only implement those in C, C++, Rust, or another systems-programming language. We have seen too many times that the benchmark itself performs worse than the system it is trying to evaluate [2].
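As a rough illustration of that harness-overhead problem, here is the per-call cost of doing literally nothing from a pure-Python loop. The exact figure is machine-dependent; the point is that it is tens of nanoseconds or more, which distorts any measurement of operations in the low-microsecond range:

```python
import time

def noop():
    """Stand-in for 'issue a query': does no work at all."""
    pass

# Time a million no-op calls from Python; everything measured here
# is harness overhead, not useful work.
n = 1_000_000
start = time.perf_counter()
for _ in range(n):
    noop()
elapsed = time.perf_counter() - start
per_call_ns = elapsed / n * 1e9
print(f"~{per_call_ns:.0f} ns of pure-Python overhead per call")
```

A C or Rust harness gets this floor down by a couple of orders of magnitude, which is why fast systems need fast benchmark drivers.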
B. Using hardware from 2010 [3] and weird datasets [4]. This shocked me. When I looked at the charts [5] and the benchmarking section, it seemed highly professional and good-looking. I wouldn't expect less from a startup with $20M VC funding. But the devil is in the details. I would never have expected anyone benchmarking a new DBMS to use now-13-year-old CPUs and an unknown dataset. Assuming current developer salaries, hiring people to design a DBMS and then evaluating it on a $1000 machine is just financially irresponsible. We buy expensive servers; they cost as much as sports cars, or even apartments in poorer countries. They are hard to maintain, but they are essential to quality work. It is sad to see companies taking such shortcuts.

But to play devil's advocate, there is no single graph benchmark or dataset that everyone agrees on. So I imagine people experiment with multiple real datasets of different sizes, or generate them systematically using one of the random-generator algorithms. In UKV, we have used Twitter data to construct both document and graph collections. In the past, we have also used `ci-patent`, `bio-mouse-gene`, `human-Jung2015-M87102575`, and hundreds of other public datasets from the Network Repository and SNAP [6]. There are datasets of every shape and size, reaching around 1 billion edges, in case someone is searching for data. For us, the next step is the reconstruction of the web graph from the 300 TB CommonCrawl dataset [7]. There is no such graph benchmark in existence, but it is the biggest public dataset we could find.
C. Running queries a different number of times for various engines. This can be justified, and it is how modern benchmarks are done. You track not just the mean execution time but also the variability, so if at some point the results converge, you can stop before hitting the expected iteration count to save time.
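That early-stopping idea can be sketched in a few lines. This is a hypothetical harness, not any vendor's actual code: keep sampling latencies until the standard error of the mean is small relative to the mean, capped at a maximum iteration count.

```python
import statistics
import time

def run_until_stable(query_fn, max_iters=1000, min_iters=10, rel_err=0.01):
    """Sample query_fn latencies until the standard error of the mean
    drops below rel_err * mean, or until max_iters is reached."""
    samples = []
    for _ in range(max_iters):
        start = time.perf_counter()
        query_fn()
        samples.append(time.perf_counter() - start)
        if len(samples) >= min_iters:
            mean = statistics.mean(samples)
            sem = statistics.stdev(samples) / len(samples) ** 0.5
            if sem < rel_err * mean:  # results converged: stop early
                break
    return statistics.mean(samples), len(samples)

# Stand-in workload; a real harness would issue a database query here.
mean, used = run_until_stable(lambda: sum(range(1000)))
print(f"mean={mean * 1e6:.1f}us after {used} iterations")
```

Two engines with different variance will then legitimately end up with different iteration counts, which is why "they ran it a different number of times" isn't necessarily foul play.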
---
LDBC [8] seems like a good contender for a potential industry standard, but it is incomplete. Its "Business Intelligence workload" and "Interactive workload" categories exclude any real "Graph Analytics". Running an All-Pairs Shortest Paths algorithm on a large external-memory graph would have been a much more interesting integrated benchmark. Similarly, one could do large-scale community detection or personalized recommendations based on graphs and evaluate the overall cost/performance. That, however, poses another big challenge: almost all algorithm implementations for those problems are vertex-centric. They scale poorly with large sparse graphs, which demand edge-centric algorithms, so a new implementation has to be written from scratch. We will try to allocate more resources towards that in 2023 and invite anyone curious to join.
---
[1] https://github.com/unum-cloud/ukv [2] https://unum.cloud/post/2022-03-22-ucsb [3] https://github.com/memgraph/memgraph/tree/master/tests/mgben... [4] https://github.com/memgraph/memgraph/tree/master/tests/mgben... [5] https://memgraph.com/benchgraph/base [6] https://snap.stanford.edu/data [7] https://commoncrawl.org [8] https://ldbcouncil.org/benchmarks/snb
B. Yes, old hardware is 100% a problem; that's why the benchgraph site is being extended with more hardware options. Stay tuned for that, it's going to come soon!
C. Yep, legit, something to expand / improve
This focuses on untyped, unattributed graphs and includes algorithms which are often formulated following a vertex-centric programming model (BFS, PageRank, community detection, connected components, etc.). The SNB Business Intelligence already covers a few of these algorithms (BFS, weighted shortest paths) and in the future it may incorporate more.
We plan to run a Graphalytics competition in the spring (on data sets with up to tens of billions of edges) - let me know if anyone is interested in participating in this.