TL;DR: always think of the upper left pixel's position as (0.5, 0.5).
That is the position of the pixel's center: the pixel spans (0,0) to (1,1), so its center is at (0.5, 0.5).
The image area goes from the upper left corner of the upper left pixel to the lower right corner of the lower right pixel.
The centers are shifted by half of a pixel versus a grid starting at 0 and ending at N-1.
When drawing a line you need to hit the center of the pixel you want to fill.
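To make the convention concrete, here is a minimal Python sketch. The helper names (`pixel_center`, `pixel_containing`) are hypothetical, chosen only for illustration, and it assumes unit-sized pixels with the image spanning [0, N] along each axis.

```python
import math

def pixel_center(index: int) -> float:
    """Continuous coordinate of the center of pixel `index` (unit-sized pixels)."""
    return index + 0.5

def pixel_containing(coord: float) -> int:
    """Index of the pixel whose half-open span [i, i + 1) contains `coord`."""
    return math.floor(coord)

n = 4  # assumed row width, for illustration only
centers = [pixel_center(i) for i in range(n)]
# The centers sit symmetrically inside the image extent [0, n].
print(centers)  # [0.5, 1.5, 2.5, 3.5]
```

Any coordinate in [0, 1) maps back to pixel 0, so aiming at the center (0.5) is the safest way to hit the pixel you intend to fill.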
I have spotted similar bugs too. They become evident very quickly when working at low resolutions, and likewise with a small number of gray levels.
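A tiny 1D case shows how quickly such a bug surfaces at low resolution. This is only a sketch, not any particular library's resize: `sample_linear`, `downsample_aligned`, and `downsample_naive` are hypothetical helpers, and the 4-pixel ramp is made-up data.

```python
import math

def sample_linear(row, center_coord):
    """Linearly interpolate `row` at a continuous coordinate (pixel i is centered at i + 0.5)."""
    pos = center_coord - 0.5                   # convert to index space
    pos = min(max(pos, 0.0), len(row) - 1.0)   # clamp at the borders
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(row) - 1)
    t = pos - lo
    return (1.0 - t) * row[lo] + t * row[hi]

def downsample_aligned(row, out_size):
    """Map each output pixel center into the source using the half-pixel convention."""
    scale = len(row) / out_size
    return [sample_linear(row, (j + 0.5) * scale) for j in range(out_size)]

def downsample_naive(row, out_size):
    """Buggy variant: treats the integer index as the pixel's position."""
    scale = len(row) / out_size
    return [row[int(j * scale)] for j in range(out_size)]

row = [0.0, 10.0, 20.0, 30.0]
print(downsample_naive(row, 2))    # [0.0, 20.0]
print(downsample_aligned(row, 2))  # [5.0, 25.0]
```

The naive result is shifted toward the origin: its mean is 10 while the source mean is 15, which the aligned version preserves. At 4 pixels the error is obvious, whereas at high resolution the same half-pixel shift is easy to miss.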
> We choose a H×W rectangular grid of points, from which we will draw samples.
An additional thing to keep in mind is how a camera capturing the image actually operates. It is not sampling in the true theoretical sense of picking points from a continuous signal. The pixels are of finite size, which makes grid (2) closer to reality. There are additional complications for color images, where the red, green and blue channels integrate over different regions within the pixel area (see [A] for example). This makes the real grid differ even from (2), and differently for each color channel. However, the math suggested by the author should still hold.
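One way to see why the math should survive: for an idealized box-shaped pixel, integrating a locally linear signal over the pixel's area gives the same value as point-sampling at the pixel's center. The sketch below checks this numerically. The box aperture, the ramp `signal`, and the helper `box_average` are assumptions for illustration, not a model of any real sensor.

```python
def signal(x: float) -> float:
    """A made-up linear ramp standing in for the continuous scene."""
    return 2.0 * x + 1.0

def box_average(f, index, substeps=1000):
    """Approximate the mean of f over the pixel span [index, index + 1] (midpoint rule)."""
    total = 0.0
    for k in range(substeps):
        total += f(index + (k + 0.5) / substeps)
    return total / substeps

for i in range(3):
    integrated = box_average(signal, i)   # finite-size pixel
    point = signal(i + 0.5)               # ideal sample at the pixel center
    print(i, abs(integrated - point) < 1e-9)
```

For signals that are not locally linear the two values differ, but the integrated sample is still most naturally attributed to the center of the region it was integrated over, which is the (i + 0.5) position again.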
> It seems the mess is unique in the deep learning world.
The title, "Where Are Pixels? -- a Deep Learning Perspective", looks unjustified. What's presented is not a deep learning perspective. It applies generically.