Nearby peer discovery without GPS using environmental fingerprints (opens in new tab)

(svendewaerhert.com)

78 pointswaerhert6mo ago23 comments

23 comments

Shameless plug: More than a decade ago, I wrote a paper [1] on how the random perturbations in the wireless channel between an ambient RF transmitter (FM radio, TV) to the two devices, allow nearby devices to authenticate locality because the perturbations are correlated only if they are nearby (where nearby is relative to the wavelength being monitored)

[1] https://dspace.mit.edu/handle/1721.1/121443

vessenes6mo ago

There are a couple of difficult cryptographic-type problems with this plan. (Which I like, by the way). I don’t think it’s privacy preserving, and I also don’t think it works well for finding shared locations as currently specified.

Also, the privacy and the finding are in direct opposition to each-other, which isn’t always a comfortable system dynamic.

On the one hand, a simple hash of ESSIDs near you, if you take all of them, is highly unlikely to ever match - radio strength varies and you’ll see stuff on the edges of your device pop in and out if you look at radio traces. So you need to limit the list.

However, if you limit the list, to say 3 ESSIDs, you’re well into rainbow table attack territory - if there are 100mm WiFi access points in the world, then you need 100mm^3 hashes - doable - and if you rough geoloc them first so that you’re not hashing out stuff more than a mile away from other APs, you’re down to “very manageable”.

At the same time, the question of “which 3” means that it’s going to be hard to ever get the same list, or at least you’re not in the one 9s territory of loc matching.

To do this without some sort of either trusted server or some sort of group key sharing (and therefore a totally different threat model) you’ll need to get some sort of location-aware hashing together, and I think also you’ll want to be able to get some sort of data from the local APs that’s not easily accessible elsewhere. Not sure what that is off the top of my head, but I bet there’s something in the WPS spec that you could hang off of.

So if you had the ability to be like “my hash puts me somewhere within this square (area) and only those of us here know that the secret salt for this minute is XXX” then I think you’d get back to the original goals of the project.

I bet that’s doable! Looking forward to v2

waerhertOP6mo ago

Thanks for the feedback! Rainbow tables probably won't work here since the epoch (rounded to 5 minutes) salts everything, so precomputed tables would expire constantly. You would use use BSSIDs + SSID in real implementations. The MinHash part creates 128 hash values from the entire observed set, not a subset of 3, then LSH divides these into bands where similar sets collide probabilistically based on Jaccard similarity. I've tested with two phones where one saw 3-4 networks and the other saw ~10. They still found each other, and you can try it in the interactive demo yourself. You're right it's not entirely privacy preserving, really depends on your threat model as I discuss in the security section. Your idea about combining area hints with time-based salts is interesting though, feels like it could bridge geospatial indexing with this approach! The whole geospatial indexing thing was something I became aware of late in the project and didn't go into further. It's probably a much simpler approach if geolocation is important. In the current approach, geolocation doesn't matter at all, only 'the environment', whatever that may be.

vessenes6mo ago

Thanks for the detailed response! I completely missed the band division / similarity plan - I’ll read more thoroughly next time.

I thought about geo largely because it radically changes the order of magnitude of work necessary; it lets you segment ‘possible’ subsets of APs down to sets of say 100, not millions, and changes the combinatorics. A side effect is knowing a rough spatial location.

Off the top of my head, I don’t think that epochs alone make a big difference. If I want to see if you’ve been somewhere, or tell you I’m somewhere, why not take the 3-4 networks you mentioned, and forward hash them for the next million epochs?

Or, more ambitiously, why not take 3-4 networks each from the geo indexed clusters available at https://wigle.net/ and do the forward and backward epochs, letting me track where you’ve been and pretend to be near you any time in the future?

Wigle reports 1.7bn networks; a rough look at a suburban street near me shows most places have 10 in a reasonable range boundary; so call it 200mm “locations” with 128 segmented hashes, 250 billion hashes per epoch — I think we’re in the “seconds per epoch” range for a reasonable compute heavy server to cover the entire space.

Upshot - I think the salting needs to be something local / not predictable or stored remotely.

Hopefully these comments hit you right - I like the idea a lot - and I don’t fully understand the system - but as I understand it, the system does not offer privacy — I could replay any phone’s hashes against a system that cost a few dollars to reconstruct your location and time, if my understanding is correct.

waerhertOP6mo ago

I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks. I was under the assumption services like Wigle only provided the reverse lookup (wifi -> coords). Indeed an attacker could potentially reverse the LSH tags if it hashed the wifi environment within very small geofences. It's bit of a needle & haystack problem but not an impossible one with enough resources. I wouldn't say it's a perfect system and I don't mind it falling apart under scrutiny, I just found it an interesting idea so I really appreciate you thinking along here.

Edit: Maybe some preshared group hash (kinda beats the point), or combining multiple modalities (eg bluetooth, shared interests) or some kind of proof of work token could help mitigate some of these issues. I guess anything to reduce the time to attack helps in this case? Or anything that really pins down environment + time, like what smath described in his comment. In essence, the core idea of minhash + lsh works and it doesn't limit you to just wifi networks. The key is being able to grab a fingerprint that is unique enough and different enough each epoch. Wifi networks are just easy enough to grab vs something more low level like an APs beacon timing interval jitter or something.

1 more reply

nullc6mo ago

> Also, the privacy and the finding are in direct opposition to each-other, which isn’t always a comfortable system dynamic. ... or some sort of group key sharing

Perhaps you might consider a pinsketch in the manner proposed for cryptographic biometric security. https://arxiv.org/abs/cs/0602007

I contributed to a fast implementation of the underlying algorithim: https://github.com/bitcoin-core/minisketch

With it two peers could compare their BSSID environments and learn ~nothing about each other unless they were nearly matching.

I can see how one could use it for location based key agreement for mutual authentication-- not as obvious to me how to apply it to privacy preserving location.

The latter would probably just best be accomplished by downloading the whole database, or (less optimally) using PIR to probe for the locations of single BSSIDs.

vessenes6mo ago

Oooh, this is very interesting and timely. Thank you, reading it now.

Out of curiosity, what's the motivation for Bitcoin-Core? Is it comparing mempool txs?

nullc6mo ago

Transaction relay.

Ideally nodes have lots of connections so that attackers can't so easy block transaction and block propagation and censor information. But lots of connections means lots of network bandwidth wasted relaying redundant information nodes already know about.

From day one bitcoin relayed transactions by offering just their hashes and only requesting what wasn't known. But even sending hashes ends up being a lot of data in total, and the bandwidth scales with peers*transactions. Using set reconciliation changes that to more like peers+transactions.

When using setrecon for authentication its important that the scheme is as close to information theoretically optimal as possible-- which makes approaches like pinsketch important.

For the transaction case faster but less communications efficient methods might be better except that latency is also a consideration for Bitcoin and minimizing latency means running reconciliation often. This stresses the inefficiencies of alternative tools as well as covers up for the quadratic decode cost of pinsketch.

I'm not aware of anyone using our minisketch library for authentication-ish uses but I'd be interested in seeing it.

figmert6mo ago

I've noticed many metro systems in cities do somehow receive GPS and cellular networks which I'm always impressed by. London on the other hand, has only recently started receiving signal for cellular networks, and rollout is slow.

GPS does not work at all. I've always thought using the WiFi access points that have been installed on the underground could be a great addition to something like Citymapper to figure out where you are located even when there is no GPS.

Sayrus6mo ago

Both Apple (WPS?) and Android (Location Accuracy) support improving location through WiFi access points and cellular network discovery. That's usually why you are able to get a lock onto your position even while underground.

ezfe6mo ago

Wi-Fi and Cellular is why your phone knows where it is, that already works.

vessenes6mo ago

I believe Google Maps on iOS can tell you where you are on the Underground. But I presume it’s not using GPS, as you say.

ahoka6mo ago

Phones already use WIFI for location.

thenthenthen6mo ago

Yep, you could actually fool phones into thinking they were somewhere else by just cloning a couple of ssids from another place (around 2014 iirc). I dont think this works anymore since phones now use multiple gnss bands etc. That said, my phone thinks its at my home station for the whole 20km’s ride underground, while 5g connectivity is excellent and i have wifi and gps turned on (edit, using Apple Maps).

arcbyte6mo ago

I never got very far with this idea beyond coming up with it, but whe thinking about unfakable calculable characteristics that could be used for locality verificacion, i could only come up with latency as the key characteristic. Based on some geographical threshold distance the speed of light gives a reasonable estimate of how much latency there ought to be when communicating with someone. So if you could issue a challenge and time the response, you should know someone is local to you or not.

tomgag6mo ago

Watch out, possibly similar to this patent: https://patentimages.storage.googleapis.com/e4/9b/4e/883a9df...

(disclaimer: I am co-inventor at a previous employer, I don't get royalties for it, just reporting)

waerhertOP6mo ago

Thanks for the heads up! Good to know what's out there. Interesting that I independently arrived at something possibly similar.

charcircuit6mo ago

>Good to know what's out there.

It opens you up to legal risk for knowingly infringing patents. If possible you never should look at a patent.

nullc6mo ago

That is somewhat outdated advise post seagate: https://www.stblaw.com/docs/default-source/cold-fusion-exist...

nanomonkey6mo ago

This could be useful for dynamic clique routing used in mesh networking in large crowds. Although I would think one would need to do the peer discover prior to any Internet shutdown events. It gets around the problem of GPS spoofing, and physical barriers that traditional GPS based groupings have.

meindnoch6mo ago

TL;DR: it's just using the WiFi networks' names to decide whether you're close or not.

I thought "environmental fingerprints" were referring to something more elaborate, like a fingerprint of the local audio environment, or using the accelerometer to measure the local spacetime curvature.

waerhertOP6mo ago

It's not limited to WiFi networks and indeed the fingerprints you suggest would be a lot better because they would change significantly over time. The whole WiFi networks idea kinda stuck because it's an obvious an easy thing to grab from the environment using a mobile phone. An audio fingerprint would definitely be doable on mobile.

j / k navigate · click thread line to collapse

23 comments

smath6mo ago

[1] https://dspace.mit.edu/handle/1721.1/121443

vessenes6mo ago

Also, the privacy and the finding are in direct opposition to each-other, which isn’t always a comfortable system dynamic.

At the same time, the question of “which 3” means that it’s going to be hard to ever get the same list, or at least you’re not in the one 9s territory of loc matching.

I bet that’s doable! Looking forward to v2

waerhertOP6mo ago

vessenes6mo ago

Thanks for the detailed response! I completely missed the band division / similarity plan - I’ll read more thoroughly next time.

Upshot - I think the salting needs to be something local / not predictable or stored remotely.

waerhertOP6mo ago

1 more reply

nullc6mo ago

> Also, the privacy and the finding are in direct opposition to each-other, which isn’t always a comfortable system dynamic. ... or some sort of group key sharing

Perhaps you might consider a pinsketch in the manner proposed for cryptographic biometric security. https://arxiv.org/abs/cs/0602007

I contributed to a fast implementation of the underlying algorithim: https://github.com/bitcoin-core/minisketch

With it two peers could compare their BSSID environments and learn ~nothing about each other unless they were nearly matching.

I can see how one could use it for location based key agreement for mutual authentication-- not as obvious to me how to apply it to privacy preserving location.

The latter would probably just best be accomplished by downloading the whole database, or (less optimally) using PIR to probe for the locations of single BSSIDs.

vessenes6mo ago

Oooh, this is very interesting and timely. Thank you, reading it now.

Out of curiosity, what's the motivation for Bitcoin-Core? Is it comparing mempool txs?

nullc6mo ago

Transaction relay.

When using setrecon for authentication its important that the scheme is as close to information theoretically optimal as possible-- which makes approaches like pinsketch important.

I'm not aware of anyone using our minisketch library for authentication-ish uses but I'd be interested in seeing it.

figmert6mo ago

Sayrus6mo ago

ezfe6mo ago

Wi-Fi and Cellular is why your phone knows where it is, that already works.

vessenes6mo ago

I believe Google Maps on iOS can tell you where you are on the Underground. But I presume it’s not using GPS, as you say.

ahoka6mo ago

Phones already use WIFI for location.

thenthenthen6mo ago

arcbyte6mo ago

tomgag6mo ago

Watch out, possibly similar to this patent: https://patentimages.storage.googleapis.com/e4/9b/4e/883a9df...

(disclaimer: I am co-inventor at a previous employer, I don't get royalties for it, just reporting)

waerhertOP6mo ago

Thanks for the heads up! Good to know what's out there. Interesting that I independently arrived at something possibly similar.

charcircuit6mo ago

>Good to know what's out there.

It opens you up to legal risk for knowingly infringing patents. If possible you never should look at a patent.

nullc6mo ago

That is somewhat outdated advise post seagate: https://www.stblaw.com/docs/default-source/cold-fusion-exist...

nanomonkey6mo ago

meindnoch6mo ago

TL;DR: it's just using the WiFi networks' names to decide whether you're close or not.

waerhertOP6mo ago

j / k navigate · click thread line to collapse