--- edit ---
After reading the paper more thoroughly, I find their implementation of differentiable logic clever. They use continuous relaxations of all 16 two-input logic operators, evaluate them in parallel, and take a softmax over learnable weights to form a soft mixture that learns to select the most useful operator. At inference time, everything is binarized: each gate keeps only its highest-weighted operator.
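A minimal sketch of how I understand the mechanism (the operator table uses the standard probabilistic relaxations, e.g. AND → a·b, OR → a+b−ab; the helper names are my own, not from the paper):

```python
import numpy as np

# All 16 two-input Boolean functions, written as probabilistic
# relaxations so they are continuous (and differentiable) on [0, 1].
OPS = [
    lambda a, b: np.zeros_like(a),         # FALSE
    lambda a, b: a * b,                    # AND
    lambda a, b: a - a * b,                # A AND NOT B
    lambda a, b: a,                        # A
    lambda a, b: b - a * b,                # NOT A AND B
    lambda a, b: b,                        # B
    lambda a, b: a + b - 2 * a * b,        # XOR
    lambda a, b: a + b - a * b,            # OR
    lambda a, b: 1 - (a + b - a * b),      # NOR
    lambda a, b: 1 - (a + b - 2 * a * b),  # XNOR
    lambda a, b: 1 - b,                    # NOT B
    lambda a, b: 1 - b + a * b,            # A OR NOT B
    lambda a, b: 1 - a,                    # NOT A
    lambda a, b: 1 - a + a * b,            # NOT A OR B
    lambda a, b: 1 - a * b,                # NAND
    lambda a, b: np.ones_like(a),          # TRUE
]

def softmax(w):
    e = np.exp(w - w.max())  # shift for numerical stability
    return e / e.sum()

def soft_gate(a, b, w):
    """Training-time gate: softmax-weighted mixture of all 16 relaxed ops.

    w is a length-16 vector of learnable logits; gradients flow into it.
    """
    p = softmax(w)
    return sum(p[i] * op(a, b) for i, op in enumerate(OPS))

def hard_gate(a, b, w):
    """Inference-time gate: keep only the argmax op, on binarized inputs."""
    op = OPS[int(np.argmax(w))]
    return op(np.round(a), np.round(b))
```

With logits strongly favoring the AND slot, `soft_gate(0.9, 0.8, w)` sits near 0.72 (≈ 0.9 · 0.8), while `hard_gate` snaps to exact Boolean AND.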
If I ever get it to a point where I feel proud of the result, I'll write a blog post about it and submit it to HN.
Would this be amenable to "morphing between presets", or even manually combining a selection from one network into another network? Lots of things to try out here!
Logic gates implemented with non-ideal transistors have non-zero rise times: their output voltages vary continuously with their inputs, so their transfer functions are smooth and differentiable.