How to generate random geo-coordinates within a given radius (opens in new tab)

(jordinl.com)

94 pointsjordinl7y ago52 comments

52 comments

tzs7y ago

> On a previous project we needed to show markers on a map to indicate there was a property, but for privacy reasons we wanted to show the marker on the map close to the property but not at the exact location of the property. So we needed to generate random points within a fixed distance of the original point location

This raises a couple of interesting questions.

1. How do you stop someone from finding the actual property location by querying that map a bunch of times and averaging the positions returned?

2. Are there any negative consequences for the people whose property a displaced marker ends up on? For example, if someone is using the map lookup to find the home of a rival gang officer in order to do a drive by attack, and I happen to live down the street, I really don't want that marker showing up on my house.

Rather than a displaced marker, maybe divide the map into rectangular or hexagonal regions, and just highlight the region containing the property.

johnjreiser7y ago

> 1. How do you stop someone from finding the actual property location by querying that map a bunch of times and averaging the positions returned?

Back when YikYak was a thing, I was scraping the data for university campus through their API and found that their location data was randomized upon access. This meant that popular posts (they survived for longer) would end up producing pretty accurate location data.

In one case, I confirmed this by walking into the computer lab in the building where the Geography courses were located and asked "who was asking for lab help on YikYak?" and one student sheepishly raised his hand. I helped him and then also gave him a lesson on real world applications of the central limit theorem.

agwa7y ago

> 2. Are there any negative consequences for the people whose property a displaced marker ends up on?

That's definitely plausible: https://splinternews.com/how-an-internet-mapping-glitch-turn...

sjwright7y ago

To answer (1), if you seed the randomiser with a known static value—e.g. the co-ordinates, or a database record ID—then repeat queries will return the same mangled result.

Or an even simpler, more trivial solution would be to store mangled values next to the true values. Then all of your APIs and services can pick the appropriate value for the circumstance.

To make it more obvious that the data is imprecise, I would also ensure that the mangled geo co-ordinates returned with a minimum number of decimal places. Three decimal places gives you a precision of 50–100 meters, depending on latitude. Four gives you 5-10 meters, which is the most you'd ever need for an approximate house location.

The concern that comes to my mind is the size of the randomisation and whether they factor in the size of surrounding properties. The level of imprecision required in a dense city will have little to no effect on farmland properties. I would dynamically adjust the random amplitude so that the circle of confusion covered a minimum number of properties. At minimum you'd find a database that listed median property sizes for a given postal code/zip code.

jordinlOP7y ago

We used a slightly modified version of the code in the article. Instead of using random numbers we used pseudo-random numbers so the positions didn't change across requests.

agwa7y ago

Is the pseudo-random number generator cryptographically secure?

1 more reply

femto1137y ago

On 2. the consequences can be real, as in the article linked below where people keep accusing a couple of stealing their phones. I believe in that case their house was located at the centroid of a zipcode and whatever tool was used to map it simply put a pin at the centroid. This suggests that just lowering accuracy is not sufficient to protect innocent parties if that lowered accuracy isn't reflected in the visualization, and an approach like highlighting a region is probably a better way to go than giving a precise but inaccurate location.

https://www.washingtonpost.com/news/the-switch/wp/2016/01/26...

ademarre7y ago

https://gizmodo.com/how-cartographers-for-the-u-s-military-i...

https://splinternews.com/how-an-internet-mapping-glitch-turn...

In those cases there were negative consequences because low-precision markers were not displaced.

curiousgal7y ago

Amusingly, this is the first time that I've seen the Law of Large Numbers being treated as a negative thing.

usrusr7y ago

Any random distribution will be leaky when it is centered around the real location and multiple samples can be taken. Quantization to a desired lack of precision is the answer, nobody will be able to tell a point that happens to be rounded to itself from one that happens to be rounded from close to a neighboring cell.

If for some reason the data needs to appear random, you have to store a fixed random offset/alternative per point so that there cannot be multiple observations.

If for some reason the data needs to appear randomized between repeated lookups, you need to store a fixed offset/alternative per point and then apply an additional random offset per lookup. Repeated lookups will only leak the (badly) hidden permanently randomized stored alternative.

unterbahn7y ago

Leaflet.BlurredLocation https://github.com/publiclab/leaflet-blurred-location

femto1137y ago

If the rationale is privacy this is probably not the algorithm you want to use, especially if it's possible to force the generation of multiple random points (e.g. by refreshing the map). If you're using a circle with a fixed radius just 3 points would be enough to exactly locate the center. A variable radius could take more, but since the distribution is uniform each additional point would significantly improve the precision.

WrtCdEvrydy7y ago

If you used a fingerprinting library as the seed for random, it would be repeatable per user.

jordinlOP7y ago

As I mentioned in a different comment, we used a slightly modified version of the code in the article. Instead of using random numbers we used pseudo-random numbers so the positions didn't change across requests.

ape47y ago

I imagine its for something like AirBnb where the site doesn't want the users to the location the property so they only way to book it is via the site.

jeromegv7y ago

What's funny with Airbnb is that if you are looking at the map in search, it is approximate. However, if you save the property to your favorites and then go to your favorite list and look at the map, you get the exact location.

2 more replies

mtmail7y ago

I regularly use this tool (nodejs) for a list of random coordinates inside a country. Or at least inside any country to avoid the oceans. https://github.com/tsamaya/random-points-generator

MRD857y ago

Why can't you simply generate two random numbers with one between 0 and 2pi, and the second between the min and max of distance? Now you've got polar coordinates. I'm not sure if I'm missing something or if this is overengineered.

If people are interested I've included a link on how Tinder keeps your location private. It's really interesting.

https://robertheaton.com/2018/07/09/how-tinder-keeps-your-lo...

Misdicorl7y ago

This will give you a random distribution, but not a uniform one. Concrete description why: you're just as likely to end up 3 units from the center as 6 units from the center. However, there is twice as much space that is 6 units from the center than 3 units from the center.

MRD857y ago

I thought about this for a little bit, I might actually code up a demo. If you take a random number between the square of min and the square of max, then you make your polar r equal to square root of your random number.

1 more reply

MRD857y ago

Thanks, makes perfect sense and should have been obvious.

alanbernstein7y ago

That might work if you then apply a translation to the center point in pixel-space. But what if you need to send latitude and longitude to your front-end, or other display system? How do you translate your 2d polar coordinates to a change in spherical polar coordinates (lat+lon)? For small distances near the equator, it may not matter. Otherwise, it might.

The article could have helped answer your question by providing some images to illustrate the flaws with other solutions that the author tried.

maxander7y ago

As others have noted, this doesn’t result in a uniform distribution. But what you can do, instead, is generate a uniform distribution over a square centered on the point of interest, and simply re-do points that land outside of the circle. For most software engineers, this is probably simpler to understand and harder to screw up than implementing the Haversine formula. :)

MaxBarraclough7y ago

Which has the unfortunate property that it isn't assured to terminate. Although I believe that discard-and-reattempt strategy is used in real-life RNG libraries if you ask for a random integer in an awkward interval.

apcragg7y ago

That doesn't create a uniform distribution over a disk. Heres a quick image (https://imgur.com/a/xHEpVB9) I generated using that method.

chad_thunder7y ago

Buffer your point to create a polygon then use ST_PointOnSurface (or equivalent) in any competent spatially-enabled DBMS to create a point guaranteed to intersect the bounding polygon. Trivial in SQLite, PostGIS, etc.

platz7y ago

how is that generated point random in any way?

https://gis.stackexchange.com/questions/76498/how-is-st-poin...

Cyph0n7y ago

GIS buffer polygons were magical when I first stumbled upon them.

chad_thunder7y ago

Something wrong with them nowadays?

unterbahn7y ago

Leaflet.BlurredLocation https://github.com/publiclab/leaflet-blurred-location - A Leaflet-based HTML interface for selecting a "blurred" or low-resolution location, to preserve privacy.

H3 is similar and cool, but this builds a similar nested grid concept around latitude/longitude, which can map to existing coordinate systems and database storage.

Also note that population density affects how effective a given precision "blur" is for an area; we're also hoping to develop a web service or library to roughly return population density for any given region of the globe.

unterbahn7y ago

Thanks folks for the great links and articles related to the question - very helpful!

jimsmart7y ago

Converting lat/longs to Uber's H3 [1][2] system might prove useful here. Because H3 indexes are hierarchical, you can easily truncate them to obtain less precision. And then convert them back into lat/longs after truncation.

[1] https://github.com/uber/h3

[2] https://eng.uber.com/h3/

GhostVII7y ago

In practice, would it be easier to just randomly generate points within the bounding box of the radius, and then throw out any points that fall outside of the circle? Wouldn't work too well for the cases where you want points from some radius away, to some radius away, but would probably work well when you just want a random point inside of a given radius since only 1/4 of the points will have to be thrown out.

NamTaf7y ago

As far as I can tell, all the complexity comes about from the Haversine formula, which maps a circle to a spherical surface. If you were to treat that as a flat plane, I don't see why you couldn't just generate a radial angle and radial distance, then convert to an offset from your centre. That's trivial to perform.

rand(0,2*pi) for angle, rand(d1,d2) for distance, and then convert from polar coordinates to Cartesian.

Maybe I'm missing something fundamental but it seems more complex than need be, especially for small areas where the distortion of showing a sphere on a Mercator projection is minimal.

edit: Never mind, another comment answered why: This generates a random but not uniform (in area) distribution, because there's a larger circumference (and thus more swept area for a given radial angle) the larger the radius moves out.

MaxBarraclough7y ago

A couple of us (marcinzm got there first) stumbled onto what might be a much simpler solution, where we just counterbalance that bias. Perhaps we're missing something, but if so... well, then I missed it :-P

https://news.ycombinator.com/item?id=19193622

1 more reply

dang7y ago

This is a fine submission but can you please not put "Show HN" on blog posts? This is in the rules: https://news.ycombinator.com/showhn.html.

jordinlOP7y ago

Ah, sorry I missed that...

iknowordidthat7y ago

Over engineered much?

At those distances, you can treat the surface as a plane and calculate points on a circle. Sure, it won’t be accurate for large distances but that is not a requirement. Nor is accuracy for that matter since the whole point is to obfuscate the original point.

X6S1x6Okd1st7y ago

Seems like another option is to generate points inside a bounding square and throw them away if they are outside of the circle.

ummonk7y ago

Yeah, and it doesn't even need to be within a certain radius - just make it within a certain latitude / longitude.

marcinzm7y ago

I was thinking the same thing. Just generate two random numbers in polar coordinates and then map back to a cartesian offset. Only trickiness is taking the square root of one of the numbers to generate the radius rather than the random number itself.

MaxBarraclough7y ago

This is the solution I arrived at, too: explicitly balance out the bias, which seems to be as simple as just taking the square root. Surprised the whole thread isn't full of that idea.

Long rambling version:

The bias in the naive 'solution' is demonstrated in that if we consider a circle of radius 0.5, it has an area of 1/4 that of a circle with radius 1, not half. Which means that as we span our random parameter between 0 and 1, we 'spend too much time' in that small 'inner circle'.

(It's easiest to assume a circle of radius 1, centred about the origin. Nothing interesting arises from additional generality.)

So we don't want the displacement from the origin to rise linearly with the uniform-random parameter. Instead we want the area of our 'inner circle' to rise linearly. Which means that the displacement from the origin should be the square root of the random parameter. (The random parameter's interval will have to be adjusted appropriately of course.)

Our 'theta' has no bias trouble. Transforming from number (angle) into vector/coordinate form, is just routine trig. Nothing interesting to worry about there.

All that's left is to be careful that our theta span the interval [0, 2 * pi) (carefully excluding 2pi), whereas our area should span [0, pi * r^2] (inclusive, assuming that our 'circle' is meant to be an 'open ball' rather than a closed one).

Either we're missing something, or this Haversine business is a needlessly complex answer here.

I considered that another approach might be to define a spiral that 'winds through' all points in the circle, but I'm pretty sure that's not possible. If it were possible, it would give an injective space-filling curve, which is known not to exist. (If it were not injective, it would presumably be biased and so be a non-answer to our question.) Put another way, if your answer involves accepting just one random number, and not two, then your answer is necessarily wrong. (I think I'm getting all this right.) [0]

Another thought: the naive solution is biased despite being injective. Interesting. That seems to be the root of the failure of intuition that leads us to the naive (biased) 'solution'.

[0] https://en.wikipedia.org/w/index.php?title=Space-filling_cur...

1 more reply

chaoticmass7y ago

I wrote an app where we needed to get a list of properties within a radius of a given property, using lat/long coordinates. Given the search radius would be at most 5 miles, and didn't need to be 100% precise, I just treated the surface as a plane. A trivial solution to a trivial problem-- but it is neat to know the details of the formula for doing it precisely should I ever need it.

MertsA7y ago

>calculate points on a circle

This is a bad idea. If you don't randomly distribute points within the entire radius then you've just narrowed the search space for the real location from the entire area in the radius to just a circle around the point you give back.

j / k navigate · click thread line to collapse

52 comments

tzs7y ago

This raises a couple of interesting questions.

1. How do you stop someone from finding the actual property location by querying that map a bunch of times and averaging the positions returned?

Rather than a displaced marker, maybe divide the map into rectangular or hexagonal regions, and just highlight the region containing the property.

johnjreiser7y ago

> 1. How do you stop someone from finding the actual property location by querying that map a bunch of times and averaging the positions returned?

agwa7y ago

> 2. Are there any negative consequences for the people whose property a displaced marker ends up on?

That's definitely plausible: https://splinternews.com/how-an-internet-mapping-glitch-turn...

sjwright7y ago

To answer (1), if you seed the randomiser with a known static value—e.g. the co-ordinates, or a database record ID—then repeat queries will return the same mangled result.

Or an even simpler, more trivial solution would be to store mangled values next to the true values. Then all of your APIs and services can pick the appropriate value for the circumstance.

jordinlOP7y ago

We used a slightly modified version of the code in the article. Instead of using random numbers we used pseudo-random numbers so the positions didn't change across requests.

agwa7y ago

Is the pseudo-random number generator cryptographically secure?

1 more reply

femto1137y ago

https://www.washingtonpost.com/news/the-switch/wp/2016/01/26...

ademarre7y ago

https://gizmodo.com/how-cartographers-for-the-u-s-military-i...

https://splinternews.com/how-an-internet-mapping-glitch-turn...

In those cases there were negative consequences because low-precision markers were not displaced.

curiousgal7y ago

Amusingly, this is the first time that I've seen the Law of Large Numbers being treated as a negative thing.

usrusr7y ago

If for some reason the data needs to appear random, you have to store a fixed random offset/alternative per point so that there cannot be multiple observations.

unterbahn7y ago

Leaflet.BlurredLocation https://github.com/publiclab/leaflet-blurred-location

femto1137y ago

WrtCdEvrydy7y ago

If you used a fingerprinting library as the seed for random, it would be repeatable per user.

jordinlOP7y ago

ape47y ago

I imagine its for something like AirBnb where the site doesn't want the users to the location the property so they only way to book it is via the site.

jeromegv7y ago

2 more replies

mtmail7y ago

I regularly use this tool (nodejs) for a list of random coordinates inside a country. Or at least inside any country to avoid the oceans. https://github.com/tsamaya/random-points-generator

MRD857y ago

If people are interested I've included a link on how Tinder keeps your location private. It's really interesting.

https://robertheaton.com/2018/07/09/how-tinder-keeps-your-lo...

Misdicorl7y ago

MRD857y ago

1 more reply

MRD857y ago

Thanks, makes perfect sense and should have been obvious.

alanbernstein7y ago

The article could have helped answer your question by providing some images to illustrate the flaws with other solutions that the author tried.

maxander7y ago

MaxBarraclough7y ago

apcragg7y ago

That doesn't create a uniform distribution over a disk. Heres a quick image (https://imgur.com/a/xHEpVB9) I generated using that method.

chad_thunder7y ago

platz7y ago

how is that generated point random in any way?

https://gis.stackexchange.com/questions/76498/how-is-st-poin...

Cyph0n7y ago

GIS buffer polygons were magical when I first stumbled upon them.

chad_thunder7y ago

Something wrong with them nowadays?

unterbahn7y ago

Leaflet.BlurredLocation https://github.com/publiclab/leaflet-blurred-location - A Leaflet-based HTML interface for selecting a "blurred" or low-resolution location, to preserve privacy.

H3 is similar and cool, but this builds a similar nested grid concept around latitude/longitude, which can map to existing coordinate systems and database storage.

unterbahn7y ago

Thanks folks for the great links and articles related to the question - very helpful!

jimsmart7y ago

[1] https://github.com/uber/h3

[2] https://eng.uber.com/h3/

GhostVII7y ago

NamTaf7y ago

rand(0,2*pi) for angle, rand(d1,d2) for distance, and then convert from polar coordinates to Cartesian.

Maybe I'm missing something fundamental but it seems more complex than need be, especially for small areas where the distortion of showing a sphere on a Mercator projection is minimal.

MaxBarraclough7y ago

https://news.ycombinator.com/item?id=19193622

1 more reply

dang7y ago

This is a fine submission but can you please not put "Show HN" on blog posts? This is in the rules: https://news.ycombinator.com/showhn.html.

jordinlOP7y ago

Ah, sorry I missed that...

iknowordidthat7y ago

Over engineered much?

X6S1x6Okd1st7y ago

Seems like another option is to generate points inside a bounding square and throw them away if they are outside of the circle.

ummonk7y ago

Yeah, and it doesn't even need to be within a certain radius - just make it within a certain latitude / longitude.

marcinzm7y ago

MaxBarraclough7y ago

This is the solution I arrived at, too: explicitly balance out the bias, which seems to be as simple as just taking the square root. Surprised the whole thread isn't full of that idea.

Long rambling version:

(It's easiest to assume a circle of radius 1, centred about the origin. Nothing interesting arises from additional generality.)

Our 'theta' has no bias trouble. Transforming from number (angle) into vector/coordinate form, is just routine trig. Nothing interesting to worry about there.

Either we're missing something, or this Haversine business is a needlessly complex answer here.

Another thought: the naive solution is biased despite being injective. Interesting. That seems to be the root of the failure of intuition that leads us to the naive (biased) 'solution'.

[0] https://en.wikipedia.org/w/index.php?title=Space-filling_cur...

1 more reply

chaoticmass7y ago

MertsA7y ago

>calculate points on a circle

j / k navigate · click thread line to collapse