I was thinking more of a behavioral model of the whole ALU, so that the FPGA tools can map it onto a collection of the smaller ALUs built into each slice.
What clock speed does your latest design synthesize at?
There was already a CADR design for FPGAs [1] that does synthesize (and boot). I don't know why amszmidt needed to start again from scratch, or whether his design is a modification of the earlier one.
A similar comment applies to lm-3. Maybe it is built on a fork of the previous repo; it is hard to tell.