What you are describing is a class of filters known as edge-preserving filters. You can look at bilateral filters and guided filters for examples that have been around for a long time at this point.
So we can do a decent job with hand-designed filters... Why aren't they used for the problem the parent describes? Are they not good enough to deal with small text boundaries?
A lot of hand-built filters (I see a lot of these in the audio space) have many hand-tuned parameters, which work well in some circumstances and less well in others. One of the big advantages of NN systems is the ability to adapt to context more dynamically. The NN filters can generally emulate the hand-designed system and pick out weightings appropriate to the example.
This is effectively noise reduction, which bilateral and guided filters are actually used for. They derive their kernel weights from local pixel values and statistics. You can also look up other edge-preserving filters like BM3D and non-local means.
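To make the "weights from local pixels" point concrete, here is a rough sketch of a bilateral filter on a 1D signal in plain Python. The function name `bilateral_1d`, the parameter names, the default values and the sample data are all made up for illustration; real implementations work on 2D images and are far more optimized.

```python
import math

def bilateral_1d(signal, radius=2, sigma_space=1.5, sigma_range=0.2):
    """Illustrative 1D bilateral filter over a list of floats.

    All names and defaults here are assumptions chosen for the example.
    """
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        norm = 0.0
        # Visit neighbors within `radius`, clipped at the signal borders.
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: closer samples count more.
            w_space = math.exp(-((i - j) ** 2) / (2 * sigma_space ** 2))
            # Range weight: samples with similar values count more, so
            # samples on the far side of an edge contribute almost nothing.
            diff = signal[i] - signal[j]
            w_range = math.exp(-(diff ** 2) / (2 * sigma_range ** 2))
            w = w_space * w_range
            total += w * signal[j]
            norm += w
        # norm is always > 0 because the center sample has weight 1.
        out.append(total / norm)
    return out

# A noisy step edge: a plateau near 0 followed by a plateau near 1.
noisy = [0.05, -0.04, 0.03, -0.02, 0.04, 1.03, 0.96, 1.05, 0.98, 1.02]
smoothed = bilateral_1d(noisy)
```

The kernel is recomputed at every sample from the local values, which is why the noise on each plateau gets averaged out while the step between them stays sharp. The two sigmas are exactly the kind of tuned parameters mentioned upthread.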
I don't know what you mean by hand-made filters, and I don't know why that's a conclusion you jumped to.