Keeping a "stack" of crosswords, each one with one word added, iterating through possible words, and when nothing matches dropping "stack frame" and retrying with another word on previous level, would allow to generate much denser crosswords.
It made it possible to produce incredible dense crosswords, but it wasn’t very efficient, as it would retry certain impossible local solutions again and again and again. I wonder if it would be more or less efficient to do one whole word at a time, as you suggest.
I was very confused because I had implemented a bad word filter which I knew worked (it was the most interesting part of building a word search). And obviously, I didn't hard code that string in my program!
I asked to see the offending puzzle. Upon inspection, I found that the word search contained the word "ORIGINAL". When reversed for the word search it is "LANIGIRO" (which contains the substring "NIGIRO"). The start of another word - "LAW" - was placed next to "LANIGIRO" (by random chance) creating the string "LLANIGIRO". The two preceding randomly chosen letters were "K" and "I", forming the string "KILLANIGIRO".
I explained it to the superintendent, and all was well again.
And now that you mention it, the bad word filter does sound like fun to build. I've got a bunch of ideas for how to do it best floating around in my head.