Anyone then could build a search index and build a good search experience.
To combat spam, instances should reveal up/downvotes to indicate quality; I suspect your fake math video would not get much love from the community.
Please take extra care to correctly parse what I actually wrote in response to the gp. Yes, the speed of light is still a limitation given the gp's constraint of "search/discovery in a _distributed_ way", which means the search algorithm avoids central servers and loops through a bunch of remote p2p nodes to parse their exposed JSON manifest files.
If instead, the search algorithm loops through data in a cached index server, that's no longer "search in a distributed way" that the gp was originally wondering about. That's the particular point I was responding to.
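To make the contrast concrete, here is a toy sketch of the two search shapes: looping over each peer's exposed JSON manifest on every query, versus crawling everything once into a local index. The instance URLs and the manifest schema are invented for illustration, not any real protocol.

```python
# Invented manifests standing in for each instance's /videos.json file.
MANIFESTS = {
    "https://videos.example.org": {"videos": [
        {"title": "Intro to Calculus", "magnet": "magnet:?xt=urn:btih:aaa"},
        {"title": "Cat Compilation",   "magnet": "magnet:?xt=urn:btih:bbb"},
    ]},
    "https://peertube.example.net": {"videos": [
        {"title": "Calculus II Lectures", "magnet": "magnet:?xt=urn:btih:ccc"},
    ]},
}

def search_distributed(query, fetch):
    """One round trip per peer, per query: latency grows with the
    number of instances, and any flaky peer stalls or drops results."""
    hits = []
    for url in MANIFESTS:
        manifest = fetch(url)  # a real client would HTTP-GET this
        hits.extend(v for v in manifest["videos"]
                    if query.lower() in v["title"].lower())
    return hits

def build_index(fetch):
    """Cached-index alternative: crawl every peer once, then answer
    queries locally with no further round trips."""
    index = {}
    for url in MANIFESTS:
        for v in fetch(url)["videos"]:
            for word in v["title"].lower().split():
                index.setdefault(word, []).append(v)
    return index

fetch = MANIFESTS.__getitem__  # stand-in for urlopen + json.load
print(len(search_distributed("calculus", fetch)))  # 2
print(len(build_index(fetch)["calculus"]))         # 2, zero round trips
```

Both return the same hits; the difference is where the waiting happens, which is exactly the distributed-vs-cached distinction at issue.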
>Anyone then could build a search index and build a good search experience.
Now, as to the issue with that "cache index server" that pre-parses the JSON files...
The cache server that also contains the actual video data will naturally attract the most users, because when they hit the "play" button on their smartphone, the video starts immediately instead of stalling or stuttering while it streams from somebody's flaky home video server.
So, the index server with the "good experience" as perceived by users will be the one that also includes the actual videos -- basically acts as a CDN -- and this emergent behavior of user preferences defeats the decentralized ideals of p2p video.
We see that p2p distribution of things like illegal software already works and is proven. However, p2p distribution of mainstream videos faces massive technical hurdles that conflict with how typical users like to discover content and play it with immediate gratification.
So DNS isn't distributed because my computer caches queries?
I think this is arguing semantics rather than practicalities. Centralization isn't binary -- it's a continuum, and we care about it because of the benefits it provides, not because we think it's an end in and of itself. What we care about is the ability to aggregate search results from multiple places, to bypass search if we have a specific video URL that's being shared, and to build our own search engines without running into copyright problems.
If all of those goals can be accomplished with a caching server, then does anyone actually care if it's technically decentralized?
> So, the index server with the "good experience" as perceived by users will be the one that also includes the actual videos -- basically acts as a CDN -- and this emergent behavior of user preferences defeats the decentralized ideals of p2p video.
My reading of this argument is that I might as well just host my blog on Medium, because Google search is just another point of centralization. And after all, for speed reasons users will prefer a search engine that hosts both the blog and the search results -- so eventually Google search is definitely going to lose to Medium anyway.
But of course Medium isn't going to unseat Google, because in the real world speed improvements are relative, and at a certain point users stop caring, or at least other concerns like range of accessible content and network effects begin to matter a lot more.
It's both, I would argue. Distributed-systems professor here. My lab has been working on an "academically pure" distributed YouTube for 14 years and 7 months now. That means no central servers, no web portals, and no discovery website. Pure peer-to-peer and, hopefully, lawyer-proof. Distributing everything usually means developer productivity drops by roughly 95%. Plus, half of our master's-level students are not capable of contributing significantly. Decentralised == hard. This is something the "Distributed Apps" generation is re-discovering, now that the Napster-age devs have had kids /s
> All there needs to be done is to expose a static, daily generated JSON file that contains all videos on the instance.
Or simply make it real-time gossip. Disclaimer: promoting our work here. We implemented a semantically clustered overlay back in 2014 for decentralised video search that could make it just as fast as Google's servers [1]. This year we finished implementing a real-time channel feed of magnet links (protocol plus deployment to our users). With 51k concurrent users, we can simply re-seed a new BitTorrent hash containing 1 million hashes, and everybody updates. Complete research portfolio, including our decentralised trust function, at [2].
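The "re-seed a new hash, everybody updates" idea can be sketched minimally: a channel feed is a content-addressed list of magnet links, and a peer that hears an unfamiliar feed hash downloads the whole list once. This is a toy illustration with invented names, not Tribler's actual protocol.

```python
import hashlib

def feed_hash(magnets):
    """Content address for a channel feed; stands in for a torrent infohash."""
    return hashlib.sha1("\n".join(sorted(magnets)).encode()).hexdigest()

class Peer:
    def __init__(self):
        self.known_hash = None
        self.magnets = []

    def hear(self, announced_hash, fetch):
        """Gossip step: on hearing a new feed hash, fetch the full feed
        once (a BitTorrent swarm download in the real system)."""
        if announced_hash != self.known_hash:
            self.magnets = fetch(announced_hash)
            self.known_hash = announced_hash

# Publisher re-seeds a new version of the feed; every peer that hears
# the announcement converges on the same list after one fetch.
feeds = {}
v2 = ["magnet:?xt=urn:btih:aaa", "magnet:?xt=urn:btih:bbb"]
feeds[feed_hash(v2)] = v2
peers = [Peer() for _ in range(3)]
for p in peers:
    p.hear(feed_hash(v2), feeds.get)
```

Because the feed is addressed by its hash, peers can verify what they downloaded without trusting whoever announced it, which is what makes gossip workable here.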
> does anyone actually care if it's technically decentralized?
That is an interesting question. Our goal is real Internet freedom, and in our case decentralisation is logically a hard requirement. Our users often don't care. Caching servers quickly introduce brittleness and legal issues into your architecture.
[1]https://www.usenix.org/system/files/conference/foci14/foci14... [2]https://github.com/Tribler/tribler/wiki#current-items-under-...
Again, I'm not talking about a technical engineering component. I'm talking about users' aggregate behaviors. Please see my other reply about how we seem to be talking at different abstraction levels.
>Centralization isn't binary -- it's a continuum, and we care about it because of the benefits it provides, not because we think it's an end in and of itself.
Right, but that's not what I'm arguing. I'm talking about centralization as an emergent phenomenon that bypasses the ideals of decentralized protocols in ways the protocols' designers didn't intend.
>If all of those goals can be accomplished with a caching server, then does anyone actually care if it's technically decentralized?
I guess I don't understand the premise then because if that were true, why would the adjective "distributed" even be mentioned in the question "search/discovery in a _distributed_ way?" To me, something about distributed/decentralized as a characteristic in the technical implementation is very important to the person asking the question.
EDIT: here's another example of that type of "search without central indexing server" question: https://news.ycombinator.com/item?id=20282397
That's all an end-user cares about.
Indexing videos once a day (or once an hour or whatever) would be very feasible. Indeed, different servers could create their own indexes, and some might be better at sorting for relevance than others.
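The "different servers could rank differently" point can be shown in a few lines: two hypothetical index servers crawl the same daily manifests but sort hits by different criteria. The field names (`seeds`, `published`) are invented for the sketch.

```python
# Two search hits from the same daily crawl (invented data).
videos = [
    {"title": "Calculus lecture 1", "seeds": 3,  "published": "2019-06-01"},
    {"title": "Calculus lecture 2", "seeds": 40, "published": "2019-01-15"},
]

def rank_by_freshness(hits):
    """One index server sorts newest-first (ISO dates sort lexically)."""
    return sorted(hits, key=lambda v: v["published"], reverse=True)

def rank_by_availability(hits):
    """Another favours well-seeded videos, which quietly steers users
    toward whichever server hosts the most data."""
    return sorted(hits, key=lambda v: v["seeds"], reverse=True)

print(rank_by_freshness(videos)[0]["title"])     # Calculus lecture 1
print(rank_by_availability(videos)[0]["title"])  # Calculus lecture 2
```

Same crawl, different front page: relevance ranking is where index servers would actually compete.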
I imagined gp (mikece) as an HN techie (not an oblivious end-user) and thought he was wondering how to use programming technology to avoid central servers ... and therefore, interpreting "search/discovery in a distributed way" very literally was the appropriate level of abstraction for mikece. Avoiding central servers (if possible) is an interesting goal to discuss because they have a tendency to attract a disproportionate share of users, which defeats the goals of decentralization.
>Indexing videos once a day (or once an hour or whatever) would be very feasible.
And here, you're interpreting what's feasible only at the level of the technical stack, instead of considering several chess moves ahead to the emergent group behaviors that make a metadata-only index an unfriendly solution for end users.
>, and some might be better at sorting for relevance than others.
And that's the server that would end up becoming a de facto "centralized" server -- the very thing people were trying to avoid. This is especially true if that superior server also hosts the video data.
Consider that the HTTP protocol itself is already decentralized. If that's true, why do people perceive YouTube and Facebook as centralized when they're only nodes on an HTTP network? Because decentralized protocols don't stop emergent group behavior trending toward centralization.