Kademlia: A Peer-To-peer Information System Based on the XOR Metric (2002) [pdf] (opens in new tab)

(pdos.csail.mit.edu)

191 pointsliamzebedee7y ago40 comments

40 comments

I'm familiar to this paper because I was using eMule. eMule has a direct citation to this paper in its "help" window, otherwise, the decentralization of eD2k would not be possible. The protocol is old, but still impressive.

The year 2000 is special, within the same year, Gnutella, eDonkey and Freenet were all released its initial version, as if the P2P network "emerged" from the Internet. By 2004, eMule, I2P and BitTorrent (and BTW, Tor) were all here. At the very beginning, the first generation of P2P, Gnutella was suffering from its scalability problem - its simple unstructured, flooding protocol needs a number of O(n) queries, and becomes unstable once the users exceeded millions.

Then, Kademlia almost saved P2P. BitTorrent, I2P, eD2k and many other P2P protocols are all Kad-based. For example, in eD2k, Kad allowed it to have P2P resource discovery, and also an efficient decentralized keyboard search engine (although there is spam, but mostly works, and in BitTorrent we still don't have it), all with only a O(log(n)) query, The P2P mechanism of I2P is also exclusively based on Kad.

The authors of this paper are the few people who have made great contributions to the Internet.

nostrademons7y ago

I love how people are rediscovering the previous generation of P2P software from 15-20 years ago. Reading through these papers was a huge part of my college procrastination; maybe now that cryptocurrencies give a way to provide economic incentives to those who contribute computing power, we might see a renaissance of some of these older ideas.

jamesblonde7y ago

Interestingly, the original Kademlia paper did not make it into any of the big p2p conferences at the time. It was published at some 2nd tier p2p workshop. It was only when the system gained actual adoption, that people referenced the paper.

In contrast, Chord, which was published 1 year earlier in 2001 is one of the most cited papers in computer science. However, Chord was not practical. It assumed the availability of bi-directional connections between any two hosts. However, NATs had just started to appear, which broke Chord. Kademlia, which does not require peers to maintain a well-defined routing structure (the ring in Chord) can muddle through with NATs - because the routing table is a huge messy tree and if any branches lead you to the destination, it will kind of work.

hvidgaard7y ago

Being the engine behind trackerless BitTorrent gave it quite some exposure too. I do not know if it was before or after it became known in academia.

jamesblonde7y ago

My point was that it became the engine behind trackerless BT because it worked in the presence of NATs - even if other architectures, like Chord, had superior properties (if you assume an unrealistic model of the open internet).

The other point of note is that Kademlia was not designed to work in the presence of NATs. That was a feature whose value was only appreciated when ISPs ran out of open IP addresses and NAT'd up whole networks. It shows the value of exploratory, basic research.

pjkundert7y ago

It is a key part of the implementation of Holochain (https://holo.host).

It turns out that global consensus is not required to maintain accounting balance in a global-scale ledger. DHTs and their validation of the data they host is a big part of maintaining the invariant of the system.

awgneo7y ago

I have been excited about this project for quite some time. They recently released their Rust client, although without networking. I cannot wait to play with it further as it holds the most promise amongst all "Web3" decentalized platforms.

gst7y ago

Kademlia is still in widespread use today. For example Bittorrent uses it to resolve magnet links.

geoah7y ago

IPFS is using kademlia, and iirc zeronet are still working on it. I don't think there currently are many alternatives actually being used. Haven't seen many projects using chord or pastry these days.

ghayes7y ago

Ethereum also uses Kademlia for its DevP2P mechanism. We recently implemented this in Exthereum/Mana.

sliken7y ago

Quite handy for any program to find a copy of itself running anywhere on the internet. It's the best solution I've found for peer discovery that doesn't depend on a central host.

There's a few implementations for most platforms. At least Java, Javascript, C++, C, and Go anyways.

voltagex_7y ago

Doesn't Kad require an initial bootstrap node? How do you find it?

2 more replies

squarefoot7y ago

I still miss the first Napster and later the OpenNap network with its OSS clients such as Lopster. Way flawed and not scalable at all compared to BitTorrent, but it allowed 1 to 1 connections between people, so that one day I could browse a random guy shared hard drive because he had some interesting music, see a band named "Spock's Beard" and being a trekkie myself, albeit a moderate one, I simply had to hear what they sounded like, and after an eternity due to then slow connections, thanks to other bands discovered the same way, I realized that progressive rock was (and still is) even more alive today than in the 70s, although obviously not mainstream. So a huge thankyou to the old flawed and slow Napster as well.

eeZah7Ux7y ago

Kademlia is also an interesting option for a small and resilient key-value store between a handful of hosts.

miguelrochefort7y ago

DHT: Distributed Hash Table

eeZah7Ux7y ago

...and?

1 more reply

jMyles7y ago

Kademlia is awesome and is, of course, the basis of libp2p-kad-dht and the DevP2P project that many decentralized projects are using at the moment.

At NuCypher, we found that, all the way to the core of Kademlia, there were some "inter-planetary" assumptions that, while both mindblowing and fit for many use-cases, didn't fit our "intra-planetary" distributed notion of network topology.

If Kademlia interests you, you might also be interested in this overview of what we did and why:

https://blog.nucypher.com/why-were-rolling-our-own-intra-pla...

mohitmun7y ago

Huge fan of Nucypher. Recently At one hackathon, I used your proxy re-encryption to build decentralised KYC system. Amazing work you guys are doing!

jondubois7y ago

A lot of cryptocurrency projects are using Kademlia but it's prone to vulnerabilities such as eclipse attacks. Because peer connections are made according to deterministic rules (e.g. often based on unique Node IDs), the topology of the network can be analyzed and then the information can be exploited to disrupt the network.

There are solutions which involve making it hard to generate IDs (I've seen it done with node IDs generated using hash-based proof of work) but nowadays with all the cheap ASIC miners, you could compute these hashes much faster than any regular computer could; this means that node IDs need to be generated using top-of-the-range ASICs in order to make the network truly secure; this adds a lot of complexity when it comes to deploying a node. I don't believe that any of the existing Kademlia-based cryptocurrencies currently require sufficiently complex node IDs.

tombert7y ago

I wrote a video-sharing thing during a hackathon based on Kademlia. It's one of my favorite algorithms and I found the paper to be easier to read than most papers based on algorithms. Here's a video of a talk I gave on it:

https://www.youtube.com/watch?v=byIFT8OzfX0

mohitmun7y ago

Recently I figured Kademlia is how bittorrent achieves trackerless torrents. I used bittorrent-dht[0][1] to build simple decentralised ride hailing app(was able implement POC with end to end booking). I wonder what other modern problems can be solved by this technology

[0]: https://github.com/webtorrent/bittorrent-dht

[1]: http://www.bittorrent.org/beps/bep_0005.html

woadwarrior017y ago

Over a decade ago, I was a part of a small team at a startup that built a p2p messenger called cSpace[1], I fondly remember reading the paper and implementing a Kademlia DHT for it. IIRC, the bittorrent DHT was the only open source Python implementation of it that we could find, it was using Twisted for networking and quite complicated, we decided to keep things simple and rewrite it ourselves. The resulting implementation[2] is quite simple and elegant, IMO.

[1]: https://github.com/jmcvetta/cspace (The only copy of it, I could find online) [2]: https://github.com/jmcvetta/cspace/tree/master/cspace/dht

loeg7y ago

(2007), if not earlier, FWIW. I used this protocol in a college final project ("make a botnet") years ago :-).

quickthrower27y ago

2002 according to https://en.wikipedia.org/wiki/Kademlia - and this is a nice article for the layman (who still understands basic CS) to understand it.

I'm a bit dubious about where the name "Kademlia" came from :-)

loeg7y ago

> Where does the word “kademlia” come from? What I know is—it is a Turkish word for a “lucky man” and, more importantly, is the name of a mountain peak in Bulgaria.

http://www.maymounkov.org/kademlia

cf7y ago

Interestingly, Kademlia ended up being a major component in lots of botnets for a time like https://en.m.wikipedia.org/wiki/Storm_botnet

miguelrochefort7y ago

Has any progress in the P2P a been made since Kademlia?

I've been searching for a while, for a DHT (actually a distributed graph database) that self optimizes based on usage. Basically, I want data that's frequently accessed together to get close to eachother and to nodes that frequently access it.

xj97y ago

in addition to a DHT, you need another layer that can map key names to their hashes. ideally this is also something you can store in DHT (using ipns / mutable keys). once you have a (distributed) key-value store, you can build pretty much any kind of database[1]. this is known to work well for hadoop-style workloads[2] and i'd expect you could get similar performance for a p2p graph db.

[1]: https://apple.github.io/foundationdb/layer-concept.html

[2]: https://news.ycombinator.com/item?id=15543596

new123457y ago

I don't understand inspite of having working p2p protocol why we don't yet have an good p2p alternative to dropbox that just works without any setup hassles. Closest working thing I know of is BitSync but that too requires an mediator server.

yholio7y ago

Because Dropbox is a file storage service, not a file distribution system. The typical use case is that an uploaded large file will be downloaded a single time by someone else, at different times when computers are online.

This requires an intermediary system with very high availability and performance, which in P2P can only be implemented by duplicating data on a dozen or so systems.

Since files must be private and there is no commonality like in BitTorrent, it follows that running a P2P Dropbox would consume 5-10 times the storage and bandwidth of it's centralized counterpart for the same useful service.

amitport7y ago

IMO. The p2p business model is problematic in this case.

1 People need to incentives to join a p2p network and certainly to contribute storage+bandwidth

1* Reliably and privacy issues makes this a bigger problem (though p2p may also be the solution, it's not easily achievable)

2 This can't easily happen without investment. Marketing + setup without "hassles" is not cheap

2* Given investment, what are the assets of the investors?

e12e7y ago

Ipfs doesn't take much setup, but it's global sharing - so you'll have to bring your own access control (eg: encrypt the files, and use à key distribution/sharing scheme of some sort).

jonathanstrange7y ago

I'm out of the loop for a few years now, but isn't NAT hole punching still one of the biggest problems? AFAIK, there is still no reliable solution that works without an accessible 3rd party server.

dmourati7y ago

Attended a "Papers we Love" presentation on this paper last year.

j / k navigate · click thread line to collapse

40 comments

bcaa7f3a8bbc7y ago

The authors of this paper are the few people who have made great contributions to the Internet.

nostrademons7y ago

jamesblonde7y ago

hvidgaard7y ago

Being the engine behind trackerless BitTorrent gave it quite some exposure too. I do not know if it was before or after it became known in academia.

jamesblonde7y ago

pjkundert7y ago

It is a key part of the implementation of Holochain (https://holo.host).

awgneo7y ago

gst7y ago

Kademlia is still in widespread use today. For example Bittorrent uses it to resolve magnet links.

geoah7y ago

IPFS is using kademlia, and iirc zeronet are still working on it. I don't think there currently are many alternatives actually being used. Haven't seen many projects using chord or pastry these days.

ghayes7y ago

Ethereum also uses Kademlia for its DevP2P mechanism. We recently implemented this in Exthereum/Mana.

sliken7y ago

Quite handy for any program to find a copy of itself running anywhere on the internet. It's the best solution I've found for peer discovery that doesn't depend on a central host.

There's a few implementations for most platforms. At least Java, Javascript, C++, C, and Go anyways.

voltagex_7y ago

Doesn't Kad require an initial bootstrap node? How do you find it?

2 more replies

squarefoot7y ago

eeZah7Ux7y ago

Kademlia is also an interesting option for a small and resilient key-value store between a handful of hosts.

miguelrochefort7y ago

DHT: Distributed Hash Table

eeZah7Ux7y ago

...and?

1 more reply

jMyles7y ago

Kademlia is awesome and is, of course, the basis of libp2p-kad-dht and the DevP2P project that many decentralized projects are using at the moment.

If Kademlia interests you, you might also be interested in this overview of what we did and why:

https://blog.nucypher.com/why-were-rolling-our-own-intra-pla...

mohitmun7y ago

Huge fan of Nucypher. Recently At one hackathon, I used your proxy re-encryption to build decentralised KYC system. Amazing work you guys are doing!

jondubois7y ago

tombert7y ago

https://www.youtube.com/watch?v=byIFT8OzfX0

mohitmun7y ago

[0]: https://github.com/webtorrent/bittorrent-dht

[1]: http://www.bittorrent.org/beps/bep_0005.html

woadwarrior017y ago

[1]: https://github.com/jmcvetta/cspace (The only copy of it, I could find online) [2]: https://github.com/jmcvetta/cspace/tree/master/cspace/dht

loeg7y ago

(2007), if not earlier, FWIW. I used this protocol in a college final project ("make a botnet") years ago :-).

quickthrower27y ago

2002 according to https://en.wikipedia.org/wiki/Kademlia - and this is a nice article for the layman (who still understands basic CS) to understand it.

I'm a bit dubious about where the name "Kademlia" came from :-)

loeg7y ago

> Where does the word “kademlia” come from? What I know is—it is a Turkish word for a “lucky man” and, more importantly, is the name of a mountain peak in Bulgaria.

http://www.maymounkov.org/kademlia

cf7y ago

Interestingly, Kademlia ended up being a major component in lots of botnets for a time like https://en.m.wikipedia.org/wiki/Storm_botnet

miguelrochefort7y ago

Has any progress in the P2P a been made since Kademlia?

xj97y ago

[1]: https://apple.github.io/foundationdb/layer-concept.html

[2]: https://news.ycombinator.com/item?id=15543596

new123457y ago

yholio7y ago

This requires an intermediary system with very high availability and performance, which in P2P can only be implemented by duplicating data on a dozen or so systems.

amitport7y ago

IMO. The p2p business model is problematic in this case.

1 People need to incentives to join a p2p network and certainly to contribute storage+bandwidth

1* Reliably and privacy issues makes this a bigger problem (though p2p may also be the solution, it's not easily achievable)

2 This can't easily happen without investment. Marketing + setup without "hassles" is not cheap

2* Given investment, what are the assets of the investors?

e12e7y ago

Ipfs doesn't take much setup, but it's global sharing - so you'll have to bring your own access control (eg: encrypt the files, and use à key distribution/sharing scheme of some sort).

jonathanstrange7y ago

I'm out of the loop for a few years now, but isn't NAT hole punching still one of the biggest problems? AFAIK, there is still no reliable solution that works without an accessible 3rd party server.

dmourati7y ago

Attended a "Papers we Love" presentation on this paper last year.

j / k navigate · click thread line to collapse