However, using a Hilbert Curve for sharding doesn't seem like the best approach. You can partition by anything you like, it doesn't have to be arbitrary points along your index. Using 1-dimension to shard 2D data isn't optimal.
For example, construct a heatmap of your 'load score' and shard based on that, in two dimensions. Then use an S2 curve to index inside that shard.
Yes, that's also what I thought. Searching for "same size k-means" yields a simple postprocessing step to even out the clusters produced by the usual k-means algorithm.
EDIT: k-means is adapted directly here: https://elki-project.github.io/tutorial/same-size_k_means
If you want to shard by proximity (items close in space are likely to be in the same shard, then the transform to 1d is the way to go, why wouldn't it be? What is your definition of "optimal"?
Sharding by proximity or by something else depends on the relative frequency of queries by location or something else. If you shard by location, then a query to one location goes (ususally) to one shard. That should scale better. Otherwise, each location-based query goes to each shard.
In the original design we did consider approach similar to the "heatmap approach" you mentioned, it would reduce the shard movement for people who commutes in the city.
Later we figured that simply sharding along on the Hilbert Curve already give what we need, and the shard move is unavoidable, we would explain more details around the shard move in future blog.
tl;dr - S2 is a good library.
There's also a joke in here somewhere about bikeshedding and yak shaving.
One solution would be to scale their recommendation radius with density; no need to look farther than 5 miles for true love in Manhattan, but in Wyoming you may have to drive a bit.
Redis has geo search for example, why not using it?
https://github.com/Qbix/Platform/tree/master/platform/plugin...
First of all, we normally do our sharding by the hash of the primary key. Typically it is the publisherId and hash of the name of a “stream”, which is our general dynamic data structure. What this does is essentially distribute the load evenly between hashes without having to worry about the “water” stuff mentioned above.
Ok now about location based search...
1. We use GeoHash for addressing the surface of the Earth. For lookups, we calculate actual corners of the lat-lng rectangle in Haversine distance and then translate it to geohash.
2. Technically we could have just used a database index at this point and be done with it. But we want to handle both PULL requests and PUSH notifications when something happens nearby.
3. So given a radius the subscriber is interested in, we set up 4 streams for SUBSCRIBERS that together (as rectangles) cover the circle surrounding the location and radius.
4. Then we also set up streams for PUBLISHERS which correspond to all the various drop-down values we have for the radius. This is compicated by the fact that we have one set for metric system and another for imperial. But we just make one stream for each of them. The more radii available the more streams you have to post to (like 10-20) but posting doesn’t happen that often and it’s better to precompute this stuff.
5. Relations between streams are one of the many features that streams do out of the box. So basically the user can just subscribe to a certain location and radius, interest, etc. and get either realtime updates or offline notifications. The item being posted is itself a stream and what’s being inserted is relations between it and the category stream the person is interested in.
Thus you can also see right away who is interested in what. In the future however we plan to encrypt the data end to end so the servers won’t know this info. Then the location search will be harder. (qbix.com/blog)
Elasticsearch has geo-indexing as well(based on geohash internally), and by default it does id hashing similar to what you said(murmurhash3), we actually leverages that for location based searches.
The challenge addressed in the blog is not in how to search/address(as said Elasticsearch handles it already), it is about how to distribute the load so calculation only happens on limited nodes, and reduce the index size so it can be more performant.
In the scheme above, by the way, it DOES localize searches on one shard. Essentially all relations to a stream are on the same shard as the stream. And each center+radius has one associated stream and therefore the search takes place on one shard.
Another option would be to map coords into very small 2D tiles and than look up "tile -> shard number" with a simple, large, array. A variable number of tiles would land into the same shard. Fast and lockless reads and writes. I wonder if it could be faster.
TLDR; Awful
I find it far more rewarding to try to hack the awful world around me instead of complaining how awful the world is/is becoming.
If you have the time and preference to meet people face to face, you can still do that. People still do that and Tinder is a nice supplement.
But many of us look at dating as a numbers game. And Tinder is a hell of an upgrade to sending messages to women on other dating platforms where you don’t even know if they like your skin color. That’s a waste of time.
You don’t need to be perfect to get Tinder dates. But being ugly in this world with or without Tinder already stacks cards against you. Tinder is not so different from cold-approaching women at the bar. Except in Tinder you know she is more likely to give you a shot beforehand. But you have to trade away the ability to charm her in person. Why does it bother you so much that many people like tht trade-off?
Your post reeks of someone who’s mad that all those attractive people seem to be fucking everyone but you. Dating is hard. I’m willing to try many paradigms/sources at once to date at my desired level. If I was meeting new women every week through my social circle (the ideal imo) then I wouldn’t use Tinder or go out just to find single women, but until then...
If you’re not happy with your dating life, then maybe consider increasing your hustle yourself and/or find the approaches that work for you. Maybe Tinder just isn’t for you, but seems a bit outward to turn it into technosocial criticism.
Doesn't sound like a perfect system, but actually does a wonderful job at debunking very appealing-sounding articles.
The fun of Hacker News is hearing pros talk shop about stuff too distant from what you're working on.