I find the whole thing where you configure your web server to serve the same thing from
http://example.com/application
and
http://example.com/application/with/path?and=parameters
to be absolutely nerve-wracking. It's not hard to do, but it's batshit crazy and breaks the whole idea of how web crawlers are supposed to work. On the other hand, we had trouble with people (who we know want to crawl
us specifically) scraping a site where you visit
http://example.com/item/448828
and it loads an SPA which in turn fetches well-structured JSON documents like
http://api.example.com/item/448827
http://api.example.com/item/448828
http://api.example.com/item/448829
with no caching, so each visit downloads megabytes of HTML, JavaScript, images, and who knows what else. And if they want to deal with the content in a structured way and put it in a database, it's already in the exact format they want. But I guess it's easier to stand up a Rube Goldberg machine and write parsers, when they could look at our site in the developer tools, figure out how it works in five minutes, load those JSON documents into a document database, and be querying right out of the gate.
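To make the "querying right out of the gate" point concrete, here's a minimal sketch. The field names and sample documents are hypothetical (the real documents at api.example.com would obviously differ), and sqlite3 with its JSON1 functions stands in for whatever document database a crawler might actually use:

```python
import json
import sqlite3

# In practice these documents would come from HTTP GETs against URLs like
#   http://api.example.com/item/448828
# Sample documents with made-up fields are inlined here for illustration.
docs = [
    {"id": 448827, "title": "Widget A", "price": 19.99},
    {"id": 448828, "title": "Widget B", "price": 24.50},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, doc TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(d["id"], json.dumps(d)) for d in docs],
)

# No HTML parser, no headless browser: query the stored JSON directly.
rows = conn.execute(
    "SELECT json_extract(doc, '$.title') FROM items "
    "WHERE json_extract(doc, '$.price') > 20"
).fetchall()
print(rows)  # [('Widget B',)]
```

That's the whole pipeline: fetch the JSON the SPA already fetches, store it verbatim, query it.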