You can't add the noise after quantization. It has to be applied as part of the quantization step, because once the values are rounded you have lost the information needed to intelligently feather the edges.
Think of it this way. You start out with a series of numbers between 0 and 100. Your job is to represent this series as well as possible within a range of just 0 to 10. Without dither, you would just round each original number to its closest multiple of 10 and divide by 10; all your 31s become 3 and all your 34s become 3. With dither, nearly all of your 31s become 3 (roughly 9 in 10), and most of your 34s become 3, but roughly 4 in 10 of them become 4.
Without dither: 31, 31, 34, 34 becomes 3 3 3 3.
With dither: 31, 31, 34, 34 might become 3 3 3 4 on a typical run.
You absolutely cannot calculate 3 3 3 4 based on 3 3 3 3. You need the original full set of information in order to calculate 3 3 3 4.
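Here is a minimal sketch of that in Python, assuming the simplest kind of dither (uniform random noise of up to one output step, added before truncating). The function names are made up for illustration; real dithering tools typically use better-shaped noise than this.

```python
import random

def quantize_plain(values, step=10):
    # Round each non-negative integer to the nearest multiple of step, then divide.
    return [(v + step // 2) // step for v in values]

def quantize_dithered(values, step=10, rng=random):
    # Add integer noise in [0, step) to the ORIGINAL value, then truncate.
    # 34 goes up to 4 when the noise is 6..9 (4 cases in 10);
    # 31 goes up to 4 only when the noise is 9 (1 case in 10).
    return [(v + rng.randrange(step)) // step for v in values]

original = [31, 31, 34, 34]

plain = quantize_plain(original)        # [3, 3, 3, 3]
dithered = quantize_dithered(original)  # varies per run, e.g. 3 3 3 4

# Trying to dither AFTER quantization: every input is now exactly 30,
# so 30 + noise never reaches 40 and the result is all 3s again.
already_rounded = [q * 10 for q in plain]
too_late = quantize_dithered(already_rounded)  # [3, 3, 3, 3]
```

The last two lines are the point: once 31 and 34 have both collapsed to 3, the noise has nothing left to work with, so no amount of it will bring the 4s back.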
Now add on the fact that it's not just random noise that makes this work. Neighboring pixels influence whether to round up or round down. You want that influence to come from original high-depth data, not already-rounded data.
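To illustrate the neighbor influence, here is a rough one-dimensional sketch of error diffusion, where each value's rounding error is pushed onto the next value. This is a simplification I'm assuming for the example: image dithering such as Floyd-Steinberg spreads the error over several neighbors in two dimensions, and the function name here is hypothetical.

```python
def quantize_error_diffusion(values, step=10):
    out = []
    carry = 0  # rounding error left over from the previous value, in original units
    for v in values:
        target = v + carry                # original value plus inherited error
        q = (target + step // 2) // step  # round to the nearest output level
        carry = target - q * step         # what this rounding got wrong
        out.append(q)
    return out

print(quantize_error_diffusion([31, 31, 34, 34]))  # [3, 3, 4, 3]
print(quantize_error_diffusion([30, 30, 30, 30]))  # [3, 3, 3, 3]
```

With the original data the small errors (1, then 2, then 6 over) accumulate until one value tips up to 4, so the output sums to 13, matching the original total of 130. Feed it the already-rounded 30s and there is no error to carry, so you get flat 3s.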