The stats can be misleading: nginx is very good at being a reverse proxy or software load balancer, and it tends to be deployed in those roles, passing requests through to existing web servers.
Because the stats look at headers, the Server header that reaches the internet is the one set by the last hop, such as an nginx cache, not by the backend server behind it.
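A minimal sketch of why a header-based tally over-counts nginx, assuming the survey only ever sees the final response headers. The function name `server_family` and the sample header dicts are hypothetical:

```python
from __future__ import annotations

from collections import Counter


def server_family(headers: dict[str, str]) -> str | None:
    """Return the product token of the Server header, e.g. 'nginx/1.18.0' -> 'nginx'."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    server = lowered.get("server", "").strip()
    if not server:
        return None
    # The first token before '/' or whitespace is the product name.
    return server.split()[0].split("/")[0].lower()


# Hypothetical responses as seen from the internet. Imagine the second
# site runs Apache behind an nginx reverse proxy: only the proxy's
# Server header reaches the client, so it is counted as nginx.
observed = [
    {"Server": "nginx/1.18.0"},
    {"server": "nginx", "X-Powered-By": "PHP/8.2.0"},
    {"Server": "Apache/2.4.57 (Unix)"},
]

tally = Counter(server_family(h) for h in observed)
print(tally)  # Counter({'nginx': 2, 'apache': 1})
```

Nothing in the proxied response says Apache was involved, which is exactly the blind spot.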
How can they detect the programming language in use other than by looking at .php, .aspx, .jsp, etc.? You won't see those extensions on most professionally authored sites that use a router and RESTful URLs.
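Roughly what extension-based detection amounts to, as a sketch; the `EXTENSION_HINTS` mapping and `language_from_url` helper are hypothetical, not anything the survey is known to use:

```python
from __future__ import annotations

import posixpath
from urllib.parse import urlparse

# Assumed mapping from file extension to server-side technology.
EXTENSION_HINTS = {
    ".php": "PHP",
    ".aspx": "ASP.NET",
    ".jsp": "Java (JSP)",
}


def language_from_url(url: str) -> str | None:
    """Guess the server-side language from the URL path's file extension, if any."""
    path = urlparse(url).path  # drops the query string and fragment
    _, ext = posixpath.splitext(path)
    return EXTENSION_HINTS.get(ext.lower())


print(language_from_url("https://example.com/index.php?id=3"))  # PHP
print(language_from_url("https://example.com/Default.aspx"))    # ASP.NET
# A router-style URL exposes no extension, so nothing can be inferred.
print(language_from_url("https://example.com/users/42/posts"))  # None
```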
That's not what RESTful means. What your URLs look like has absolutely nothing to do with REST; the whole point is that REST treats URLs as opaque references to other, similarly hypertextual resources.
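To illustrate "opaque references", here is a toy sketch in which the client only follows links the server hands back and never parses or builds a URL itself. The in-memory `RESOURCES` table stands in for HTTP responses, and all names and link relations in it are hypothetical:

```python
from __future__ import annotations

# Hypothetical resources keyed by URL. The URLs are deliberately ugly:
# the client never inspects them, it only dereferences them.
RESOURCES = {
    "/": {"links": {"orders": "/o/7f3a"}},
    "/o/7f3a": {"items": ["a", "b"], "links": {"next": "/o/page2-x9"}},
    "/o/page2-x9": {"items": ["c"], "links": {}},
}


def fetch(url: str) -> dict:
    """Stand-in for an HTTP GET that returns a parsed representation."""
    return RESOURCES[url]


def collect_orders(entry: str = "/") -> list[str]:
    """Follow the 'orders' link, then 'next' links, treating every URL as opaque."""
    url = fetch(entry)["links"].get("orders")
    items: list[str] = []
    while url is not None:
        page = fetch(url)
        items.extend(page["items"])
        url = page["links"].get("next")
    return items


print(collect_orders())  # ['a', 'b', 'c']
```

The server could rename every URL to `/orders?page=2` style or to random strings and this client would keep working, which is the point being made.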
I believe via HTTP headers. By default PHP adds an X-Powered-By header (controlled by the expose_php setting), and under Apache the PHP version can also show up in the Server: line. I remember having to go in and disable that at one point...
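A rough sketch of that kind of header sniffing, assuming the site hasn't turned the headers off. `language_from_headers` is a hypothetical helper, and the checks cover only the PHP and ASP.NET cases:

```python
from __future__ import annotations


def language_from_headers(headers: dict[str, str]) -> str | None:
    """Guess the backend language from response headers, if the server leaks one."""
    lowered = {name.lower(): value for name, value in headers.items()}
    powered_by = lowered.get("x-powered-by", "").lower()
    server = lowered.get("server", "").lower()
    # With expose_php on, PHP sends "X-Powered-By: PHP/<version>";
    # Apache may also append "PHP/<version>" to Server, depending on ServerTokens.
    if "php/" in powered_by or "php/" in server:
        return "PHP"
    # IIS commonly sends "X-Powered-By: ASP.NET" and sometimes X-AspNet-Version.
    if "asp.net" in powered_by or "x-aspnet-version" in lowered:
        return "ASP.NET"
    # Headers disabled or stripped (e.g. by a reverse proxy): nothing to go on.
    return None


print(language_from_headers({"Server": "nginx", "X-Powered-By": "PHP/8.2.0"}))  # PHP
print(language_from_headers({"Server": "Microsoft-IIS/10.0", "X-Powered-By": "ASP.NET"}))  # ASP.NET
print(language_from_headers({"Server": "nginx"}))  # None
```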
I'd like to see stats on sites that use MooTools, Prototype, YUI, Dojo, etc. Those should be simple statistics to compile, I'd think (based on filenames, or a simple regex over the first 200 characters of each script).
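Something along these lines, as a sketch: check each script's filename, then fall back to a regex over its first 200 characters. The patterns in `LIBRARY_PATTERNS` are hypothetical guesses at typical filenames and banner comments, not a vetted fingerprint list:

```python
from __future__ import annotations

import re

# Hypothetical (filename pattern, banner pattern) pairs per library.
LIBRARY_PATTERNS = {
    "MooTools": (re.compile(r"mootools", re.I), re.compile(r"MooTools")),
    "Prototype": (re.compile(r"prototype(\.min)?\.js", re.I),
                  re.compile(r"Prototype JavaScript framework")),
    "YUI": (re.compile(r"yui", re.I), re.compile(r"\bYUI\b")),
    "Dojo": (re.compile(r"dojo(\.xd)?\.js", re.I), re.compile(r"\bdojo\b", re.I)),
}


def detect_libraries(script_url: str, script_body: str) -> set[str]:
    """Match each library against the script's filename or its first 200 characters."""
    filename = script_url.rsplit("/", 1)[-1]  # last path segment
    head = script_body[:200]
    found = set()
    for name, (file_re, banner_re) in LIBRARY_PATTERNS.items():
        if file_re.search(filename) or banner_re.search(head):
            found.add(name)
    return found


print(detect_libraries("/js/prototype.js", "/*  Prototype JavaScript framework, version 1.7"))
# {'Prototype'}
print(detect_libraries("https://cdn.example.com/mootools-core-1.4.5.js", ""))
# {'MooTools'}
```

A bundled or minified-and-renamed file would slip past both checks, so this would undercount rather than overcount.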
Google said that the list excludes "adult sites, ad networks, domains that don't have publicly visible content or don't load properly, and certain Google sites".