Severe over-engineering, unstable interfaces, massive boilerplate, and huge development overhead are preventing the long tail "at the protocol level".
For example, compare the Wayland "Hello World" [1] with the X11 "Hello World" [2]. If you want to add the ability to take screenshots, it gets exponentially worse. (Also, the Wayland version is not even capable of rendering strings.)
1.: https://github.com/emersion/hello-wayland/blob/master/main.c
No, all of those can be solved with client libraries. X11 had Xlib and other client libraries. Those X11 libraries don't make sense in Wayland, but the client libraries that stayed relevant (FreeType, cairo, etc.) can still be used.
> If you want to add the ability to take screenshots, it gets exponentially worse.
No, in both modern X11 and Wayland, you should use the same API for screenshots: the XDG screenshot portal.
> Also, the Wayland version is not even capable of rendering strings.
Sure it is, but you have to do client-side rendering. Client-side rendering has also been the norm in X11 for decades, ever since Xft was released as another one of those client libraries. That X11 hello world is short only because it uses obsolete APIs.
There is no easy way to open a window and render a string. Period! Either you write it all yourself (as OP stated), or you use GTK/Qt or other heavyweight "client libraries" (cairo and FreeType do not create Wayland windows and are therefore not applicable here).
Look at the code again! If you really think the code at [1] is in any way a great solution compared to [2], we are going to disagree.
> No, in both modern X11 or Wayland, you should use the same API for screenshots: the XDG screenshot portal.
ZERO screenshot apps on X11 use the XDG screenshot portal; they all use XGetImage(), mainly because the assumption that a dbus-daemon is always running everywhere is mostly false. Also, the XDG screenshot portal is simply not a good solution. It is cumbersome to use, has tons of edge cases, and pulls in a D-Bus dependency for something that could be solved much more simply with built-in OS functionality, without the need for extra daemons and weird binary protocols.
> Client-side rendering is also the norm in X11, since decades ago when Xft was released
Beside the point, but you are still wrong. Xft does server-side rendering via XRender. The glyph cache is rasterized only once on the client, but that's a technicality; spline tessellation was supposed to go into the server, but Keith Packard had more important things to do at the time.