So we can do a decent job with hand-designed filters... Why aren't they used for the problem the parent describes? Are they not good enough to deal with small text boundaries?
A lot of hand-built filters (I see a lot of these in the audio space) have many hand-tuned parameters, which work well in some circumstances and less well in others. One of the big advantages of NN systems is the ability to adapt to context more dynamically. An NN filter can generally emulate the hand-designed system, and also pick weightings appropriate to each example.
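
To make that concrete, here is a toy sketch (not a neural network, and not anything from the parent's system). `fixed_smooth` is a hand-designed filter with one hand-tuned width. `adaptive_smooth` blends raw and smoothed values per sample using a gate driven by local contrast. The gate formula and the `edge_scale` parameter are my own assumptions for illustration, standing in for the kind of context-dependent weighting a trained network would learn rather than have written by hand.

```python
import numpy as np


def fixed_smooth(signal, width=5):
    """Hand-designed filter: moving average with a single hand-tuned width."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")


def adaptive_smooth(signal, width=5, edge_scale=0.05):
    """Toy stand-in for a learned filter: per-sample blend of raw and smoothed.

    The gate is hand-written here (an assumption for illustration); in an NN
    the equivalent weighting would be learned from data.
    """
    smoothed = fixed_smooth(signal, width)
    # Local contrast is large near edges and small in flat regions.
    contrast = np.abs(np.gradient(smoothed))
    # Gate near 1 in flat regions (use the smoothed value),
    # near 0 at edges (keep the raw value so the boundary stays sharp).
    gate = 1.0 / (1.0 + (contrast / edge_scale) ** 2)
    return gate * smoothed + (1.0 - gate) * signal
```

On a clean step signal the fixed filter smears the boundary, while the gated version mostly leaves it alone and still averages in the flat regions. The fixed filter is the special case where the gate is 1 everywhere, which is the sense in which the more flexible system can emulate the hand-designed one.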