Currently working on an accurate model of the MIT CADR in VHDL, and merging the various System source trees into one that should work for Lambda, and CADR.
Maybe try replacing the ALU with one written directly in Verilog, I suspect this will run a lot faster than building it up from 74181+74182 components.
The current state is _very_ fast in simulation to the point where it is uninteresting (there are other things to figure out) to write something as a behavioral model of the '181/'182.
~100 microcode instructions takes about 0.1 seconds to run.