Beyond the hashing algorithm, some important additions that were previously proposals without widespread use (e.g. Merkle trees for hashing pieces) are becoming required. The focus has mostly been on optimizing latency for the P2P protocol and making sane improvements to the file spec. I feel like trackers were largely overlooked in this update, but I'm biased because I work on a popular tracker.
Ideally, BitTorrent would be broken down into separate specifications that could be used together or in separate systems: one for the file format and piece representation for sharing files, one for the P2P protocol, and one for discovery (trackers, DHTs). I want to believe that there would be far more interesting P2P projects if you could just lift robust primitives from BitTorrent.
> I feel like trackers were largely overlooked in this update, but I'm biased because I work on a popular tracker.
Yes, we did not pay much attention to trackers. BEP52 basically seized the opportunity to make some incompatible changes we always wanted to make anyway (quite a few had accumulated over the years), and there were no such open issues with the HTTP tracker protocol.
This is because HTTP carries so much overhead that most trackers don't really run it anymore. I think promoting UDP to the spec would've been a step in the right direction. Modern trackers have a bunch of tricks like BEP34[0] to avoid getting pounded, and it would be great if every client conformed to them.
I hope I'm not coming off as aggressive. I really appreciate this work and I'm really glad to see a spec revision. It's just as you said: there have been many years and many good improvements that I'd like to see made while there's still a chance to break things.
Yeah, if I remember correctly, the BitTorrent DHT ultimately just maps 20-byte hashes to peer lists (IP + port pairs). It's obviously designed to be convenient for BitTorrent swarm discovery, but nothing about it limits it to BitTorrent usage. Indeed, I'm surprised it's not more widely exploited for P2P bootstrapping.
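A minimal sketch of that abstraction, the way I understand it: a 20-byte key mapped to a set of (IP, port) pairs, with nothing tying the key to an actual torrent. The names here (`ToyDHT`, `announce`, `get_peers`) are illustrative only, not from any BEP.

```python
import hashlib

class ToyDHT:
    """Toy model of the DHT's core abstraction (not the real protocol)."""

    def __init__(self):
        self.store = {}  # 20-byte key -> set of (ip, port) pairs

    def announce(self, info_hash: bytes, ip: str, port: int):
        assert len(info_hash) == 20, "keys are always 20 bytes"
        self.store.setdefault(info_hash, set()).add((ip, port))

    def get_peers(self, info_hash: bytes):
        return sorted(self.store.get(info_hash, set()))

# Nothing requires the key to be a torrent infohash -- any 20-byte
# value works, e.g. the SHA-1 of an arbitrary service name, which is
# what makes it usable for generic p2p bootstrapping.
dht = ToyDHT()
key = hashlib.sha1(b"my-p2p-service").digest()
dht.announce(key, "203.0.113.5", 6881)
peers = dht.get_peers(key)
```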
Did you guys talk with the IPFS team? Do both of you have a desire to start bringing both families of protocols and technologies closer together?
I feel in this age we must make de-fragmentation of efforts our topmost priority.
In particular, I note that there's nothing in there regarding which infohash should be used in the tracker updates. Should traffic with v1/v2 clients be reported separately, or should it be consolidated under the v2 infohash?
What are the security implications of doing this? It seems it wouldn't increase the strength beyond the original 160 bits, no? Was there anything preventing redesigning the protocol to use full 32-byte SHA256 hashes throughout?
80 bits of collision resistance is usually the number accepted for legacy cryptosystems or for lightweight crypto. It's not great, but it's not "too bad".
By truncating 96 bits from the output you also prevent length extension attacks (which SHA-256 is vulnerable to, see [1]). Or rather, it provides 96 bits of security against them, which should be enough.
This is better than using SHA-1, because SHA-1 has "efficient" chosen-prefix collision attacks while SHA-2 currently does not.
Now if it were me I would have chosen a hash function like KangarooTwelve which is faster, provides parallelization for large inputs, allows you to customize the output length and has received a substantial amount of cryptanalysis.
[1]: https://cryptologie.net/article/417/how-did-length-extension...
The hash only gets truncated where it's used as a unique identifier. When you start with a v2 magnet link or torrent file, you get the full 32-byte hash, which means your integrity checking is unaffected.
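A sketch of that split: the full 32-byte SHA-256 digest is what gets verified, and the 20-byte truncation is only used where a SHA-1-sized identifier is expected (e.g. DHT keys, tracker announces). The `info_dict` bytes here are a placeholder, not real bencoding.

```python
import hashlib

# Placeholder standing in for a bencoded v2 info dictionary.
info_dict = b"...bencoded info dictionary..."

full_hash = hashlib.sha256(info_dict).digest()  # 32 bytes: integrity checks
truncated = full_hash[:20]                      # 20 bytes: SHA-1-sized identifier

# Truncation loses nothing for verification, because a client that
# starts from the full hash can always re-derive the truncated form.
```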
Moving BitTorrent away from its present image could be achieved by making P2P useful beyond Blu-ray rips.
But I used it a lot when running bigger downloads like install discs for Linux distros, OpenOffice, etc., and it made a difference when there was some major release and half of the plain old HTTP mirrors were painfully slow or down entirely. Admittedly, that situation got a lot better compared to 10 years ago, but I'm still delighted by how natural it felt to use, since it seamlessly integrated with the browser's download manager. And you didn't have this "uh, I need to start an external program for this" kind of reluctant thought when you saw a website offered a download via torrent. Today I just wonder if BT would have evolved differently if all browsers had included a client.
Developers will code it into their download pages, and decentralized systems like a P2P Wikipedia will become possible, always accessible by anyone with a browser.
here you go https://en.wikipedia.org/wiki/Twister_%28software%29
I wonder why not SHA512? It's actually faster to compute than SHA256 on 64-bit architectures.
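A rough way to check that claim on your own machine (SHA-512 uses 64-bit words internally, SHA-256 uses 32-bit words, so SHA-512 often wins on bulk data -- though CPUs with hardware SHA-256 instructions can reverse the result, so treat this as a sketch, not a definitive benchmark):

```python
import hashlib
import timeit

data = b"\x00" * (1 << 20)  # 1 MiB of input

# Time 50 full hashes of the buffer with each algorithm.
t256 = timeit.timeit(lambda: hashlib.sha256(data).digest(), number=50)
t512 = timeit.timeit(lambda: hashlib.sha512(data).digest(), number=50)

print(f"SHA-256: {t256:.3f}s  SHA-512: {t512:.3f}s")
```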
It is an arms race that is not won by updating a slowly evolving core protocol.
2) SHA1 is replaced with SHA2-256 (2x longer hashes and not broken).
3) Files are represented by a tree structure instead of a list of dictionaries with paths-- this reduces duplication in deeply-nested hierarchies.
4) Backwards compatible-- you can make a .torrent file with both old and new pieces, and a swarm can speak either. This requires padding files from BEP47, which most clients probably don't support.
Per-file metadata increases pretty significantly, from ~19B (just length) to ~68B (length + hash).
The .torrent file only stores the Merkle tree's root hash for each file, and the torrent client will query its peers to get the rest of the Merkle tree (verifiable against the root hash). The leaves of the Merkle tree are the hashes of each 16 KiB block.
Interesting consequences of this:
Piece size isn't baked into the file anymore (and I've seen torrents with 16 MB pieces); the client can dynamically choose its verification piece size by requesting only so many layers of the Merkle tree, or it could skip requesting the tree and verify the whole file at once.
Merkle tree roots will be globally unique. You can scan torrent files for duplicated files and download common files from multiple swarms.
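A minimal sketch of the per-file root computation described above: hash each 16 KiB block, then combine hashes pairwise until one root remains. The real BEP52 padding rules differ in detail (the spec pads odd layers with zero-hashes; duplicating the last node here is just a simplification).

```python
import hashlib

BLOCK = 16 * 1024  # 16 KiB leaf blocks, as in the spec

def merkle_root(data: bytes) -> bytes:
    """Compute a toy Merkle root over 16 KiB blocks (simplified padding)."""
    # Leaf layer: SHA-256 of each block.
    layer = [hashlib.sha256(data[i:i + BLOCK]).digest()
             for i in range(0, max(len(data), 1), BLOCK)]
    # Reduce pairwise; duplicate the last hash when the layer is odd
    # (BEP52 instead pads with zero-hashes -- illustrative shortcut).
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [hashlib.sha256(layer[i] + layer[i + 1]).digest()
                 for i in range(0, len(layer), 2)]
    return layer[0]

root = merkle_root(b"x" * (3 * BLOCK + 100))  # 4 leaves after padding
```

Because the root depends only on the file's bytes (and the fixed 16 KiB block size), two torrents containing the same file arrive at the same root, which is what enables the cross-swarm deduplication mentioned above.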
Piece size is still baked into the file (as piece length), and is used for presence bitsets, which are a crucial part of the swarm algorithm. Clients download the rarest pieces first to boost efficiency, and this information is handled as bitsets shared between clients indicating "I have chunk {1, 2, 3, ... 50, 52, ... }".
Merkle tree roots will only be unique for each piece length. Piece length should still correlate with total size, to prevent huge bitsets-- a 16 KB piece length on a 64 GB torrent would mean a 4-million-item / 500 KB bitset (!), so it could take 500 KB of RAM per connected peer to maintain state-- or maybe compressed bitsets make this problem irrelevant in practice?
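The back-of-envelope math above can be checked with a tiny helper (`bitfield_bytes` is just an illustrative name, not anything from the spec):

```python
def bitfield_bytes(total_size: int, piece_length: int) -> int:
    """Bytes of presence-bitfield state per peer: one bit per piece."""
    pieces = -(-total_size // piece_length)  # ceiling division
    return -(-pieces // 8)                   # pack 8 pieces per byte

# 64 GiB torrent with 16 KiB pieces: ~4 million pieces, ~512 KiB per peer.
small_pieces = bitfield_bytes(64 * 2**30, 16 * 2**10)
# The same torrent with 16 MiB pieces needs only 512 bytes per peer.
large_pieces = bitfield_bytes(64 * 2**30, 16 * 2**20)
print(small_pieces, large_pieces)
```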
This is one of the biggest things I feel is missing from the current protocol and I'm very glad it's in v2 draft. Now when a group of related torrents are repacked into a single torrent all the swarms are complementary instead of competitive. You don't have to choose between seeding the big pack instead of the individual files, just do what you want and the whole swarm still benefits.
To clarify, this works by the client deterministically reconstructing the tree once they have the whole file, then checking the root's hash, correct?
If torrents A and B both contain the exact same file, but torrent A only has the first half available and torrent B has the second half available, could I combine both torrents to download that file? This could help fix old dead torrents, or at least make the file searchable elsewhere by its SHA-256, for example.
But can't you already download one file? I suppose if a chunk spans two files, you may get a few extra KB of another file you don't want, but it's not noticeable from a user perspective.
I never really thought about the details of how it works, or the really really impressive feats that were accomplished to get it to work. I knew it was a really good technology, but reading this and the comments here puts it on a whole other level.
Why isn't this technology talked about more? Why are blockchains the big "thing" right now with people trying to use them everywhere to see where they fit best, but torrent networks are kind of just... ignored?
The decentralized nature of it seems to open so many possibilities at first glance, is there a reason they aren't being taken advantage of? Is there some kind of "great filter" kind of thing that is preventing widespread usage of something like a torrent network?
Similarly, I heard that Skype used to do something similar. I'm not sure exactly how it worked, and apparently it was a pain to maintain, so I think it has been scrapped by now. I think some software updaters do still use BitTorrent, though.
If I were to guess, the really big reason for the lack of interest from big corporations is that collecting as much data as possible for use in machine learning is very much in vogue, while at the same time bandwidth seems to be very much a non-issue. Thus there is not much to gain and possibly something to lose from employing BitTorrent.
Streaming wants us to download A, B, C, D just in time.
BitTorrent (simplified) wants me to download piece P, you to download piece G, then I get G from you and you get P from me.
There are BitTorrent streaming apps, but they kind of mess with the nature of BT.
OTOH, for things like RPM/Deb/Windows Update etc., it would make great sense.
The BitTorrent DHT is great for storing and exchanging metadata, but a DHT is not something most people associate with BitTorrent (Bitcoin also uses a DHT (for client discovery), as do countless other services).
Blockchain technology on the other hand offers verifiable distributed timestamping (with ok-ish resolution). That has much wider applicability than just payment tracking (which is essentially all bitcoin does), which is why there's plenty of people exploring what's possible.
In this case, I was trying to use it to ask if there is some kind of "unsolved problem", inherent limitation or issue/problem with torrent networks that prevents their widespread usage.
Combining BEPs 46 and 50 enables rapid updates of torrents, but they are fairly new and there are no implementations designed with low latency in mind. Most BitTorrent implementations focus on large amounts of data and throughput, so this use case is not well served in practice, even though the protocol could support it now.
On the other hand, an uncensorable imageboard would profit from the verifiable timestamping of a blockchain, with just the images distributed via a BitTorrent-like mechanism. That also gives you a decent anti-spam mechanism (you can post in exchange for mining blocks, similar to the original idea of hashcash).
Discussion of other changes: https://github.com/bittorrent/bittorrent.org/pull/59
Users hated it for general use, even when downloading big files. 1) They didn't like having to install/run some special software to download a file. 2) They didn't like uploading to others and it slowing down their connections.
Consumer networks are asymmetric, having far more download capacity than upload capacity. This makes sense since 1) most users download and want to use the available bandwidth for faster downloads, and 2) it prevents commercial applications on consumer circuits. This is far from ideal for applications like BitTorrent.
I'm not saying there isn't an application for this technology, I'm saying all the good applications don't want to ask the users to pay for distribution to other users. Thus it's relegated to mostly piracy, open source, etc.
BitTorrent Inc. has been trying to commercialize this for a decade now; I just don't see it happening. If there was anyone who could commercialize it, it was Travis Kalanick, and while he exited for $20m, he was very lucky (and happy) to get out of that market.
It already is though.
Merkle trees allow torrents to start faster from magnet links, since only the tree roots need to be front-loaded while the rest of the tree can be fetched incrementally.
Is it considered the spiritual successor to the original uTorrent?
Now it's full of ads and performs poorly.
[1] Like all the different Linux distro install images over and over again.