These days processors prefer to draw shapes by having a coarse grained pass that conservatively selects tiles of interest then brute-force evaluates each pixel in each tile independently of all others.
Instead of minimizing total work, the goal is to maximize pipelined parallelism while skipping over unnecessary work in large blocks.
1 instruction per cycle? What luxury bygone era did you grow up in?
Wikipedia tells me the algorithm is from 1962 on an IBM 1401 (https://en.wikipedia.org/wiki/Bresenham's_line_algorithm#His...
That definitely didn’t have many single cycle instructions. Skimming https://ibm-1401.info/A24-6447-0_1401_1460_Instruction_and_T..., I couldn’t find any.
Certainly, in the era of 8-bit CPUs like Z80 and 6502 programmers would have been lyric about “1 instruction per cycle”
Actually, did any CPU ever “execute 1 instruction per cycle without pipelining or prediction” (or, slightly looser “had fixed time instructions without pipelining or prediction”)?
RISC introduced fixed time instructions, but also pipelining.
but there are a lot of cpus out there—maybe the majority now—that are pipelined microarchitectures that get one instruction per cycle without much or any prediction. avrs, most of the cortex-m* (all?), most modern implementations of old slow isas like the z80 and 8051, etc. big processors like your laptop cpu and cellphone cpu are of course superscalar, but they are a tiny minority of all processors. even inside the cellphone case, they're outnumbered by in-order scalar microcontrollers
without prediction, of course, you have a pipeline bubble every time you have a branch, so you never quite hit 1 ipc with scalar execution. but it's usually pretty close, and even with prediction, sometimes you miss. and usually if you have branch prediction, you also have a cache, because ain't nobody got time to share one triflin memory bus between instructions and data
so pipelining gets you from, say, 3 clocks per instruction down to 1.2 or so. then prediction gets you from 1.2 down to, say, 1.02
So far as 1 instruction per cycle - Wikipedia says the 80286 (the top dog PC processor of the time) could execute 0.21 "typical" instructions per clock. Optimized code was up to 0.5 instructions per clock. And that agrees with my memories.
Today, I would try and use parallelism if I could. With lots of conditions though - is the process being applied to the image trivially parallelizable, will most/all of it fit in cache, etc. Trying to parallelize Bresenham's algorithms though would be futile - when drawing circles you can reflect it into the different quadrants (big savings) but the algorithm itself is going to be pretty serial because it has to keep up with the error coefficient as it draws.
We are talking about instructions which took 8-12 cycles to complete.
even with a register file, it isn't really inherent that you need to decode inputs, do alu operations, and write outputs in separate clock cycles; you can do all of that in combinational logic except writing the outputs, and you can even decode which register to write the output to. it just means your max clock rate is in the toilet
for that kind of thing a harvard architecture is pretty useful; it allows you to read an instruction in instruction memory at the same time you're reading or writing data in data memory, instead of in two separate cycles
This last is not quite true. The exact distance from the circle, call it G(x,y), is the corresponding difference of square roots, i.e.,
def G(x, y, r):
return math.sqrt(x * x + y * y) - math.sqrt(r * r)
and G(x,y) isn't just the square root of F(x,y), and indeed doesn't behave monotonically with respect to F(x,y).It's an interest property of Bresenham's algorithm, that I've never seen even stated let alone proved in the literature, that this doesn't matter, and the algorithm is indeed exact in the sense that it always chooses the next point based on which is truly closest to the circle... despite using an error function that is only an approximation.
For t in 0 to 2 pi in whatever small steps you want, draw_pixel(x=a+rcos t, y=b+rsin t) where a and b are the x and y coordinates you want for the centre of the circle and r is the radius.
The derivation of this form is pretty simple if you know basic trig. https://publish.obsidian.md/uncarved/3+Resources/Public/Unit...