I find the whole thing where you configure your web server to serve the same thing from
http://example.com/application
and
http://example.com/application/with/path?and=parameters
to be absolutely nerve-wracking. It's not hard to do, but it's batshit crazy and breaks the whole idea of how web crawlers are supposed to work. On the other hand, we had trouble with people (who we know want to crawl
us specifically) scraping a site where you visit
http://example.com/item/448828
and it loads an SPA which in turn fetches well-structured JSON documents like
http://api.example.com/item/448827
http://api.example.com/item/448828
http://api.example.com/item/448829
with no caching, so each visit downloads megabytes of HTML, JavaScript, images, and who knows what else. And if they want to deal with the content in a structured way and put it in a database, it's already in the exact format they want. But I guess it's easier to stand up a Rube Goldberg machine and write parsers, when they could look at our site in the developer tools, figure out how it works in five minutes, load those JSON documents into a document database, and be querying right out of the gate.
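To make the "querying right out of the gate" point concrete, here's a minimal sketch. The field names and sample documents are hypothetical (the real documents at api.example.com would obviously differ), and sqlite3 with its JSON1 functions stands in for whatever document database a crawler might actually use:

```python
import json
import sqlite3

# In practice these documents would come from HTTP GETs against URLs like
#   http://api.example.com/item/448828
# Sample documents with made-up fields are inlined here for illustration.
docs = [
    {"id": 448827, "title": "Widget A", "price": 19.99},
    {"id": 448828, "title": "Widget B", "price": 24.50},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(d["id"], json.dumps(d)) for d in docs],
)

# No HTML parser, no headless browser: query the stored JSON directly.
rows = conn.execute(
    "SELECT json_extract(doc, '$.title') FROM items "
    "WHERE json_extract(doc, '$.price') > 20"
).fetchall()
print(rows)  # [('Widget B',)]
```

That's the whole pipeline: fetch the JSON the SPA already fetches, store it verbatim, query it.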