
While in practice this is true, a lot of work is being done to figure out how to make shallow neural networks train as fast as deep ones. We know from the universal approximation theorem (UAT) that, in principle, a shallow network with a parameter count similar to a deep one's should be capable of learning the same decision boundary...

https://en.m.wikipedia.org/wiki/Universal_approximation_theorem
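
To make that concrete, here is a minimal one-dimensional sketch of the UAT idea in numpy (the function and variable names are my own illustration, not from the link): a single hidden layer of ReLU units can represent any continuous piecewise-linear interpolant, so widening the layer drives the error on a smooth target like sin toward zero.

    import numpy as np

    def shallow_relu_net(target, knots):
        """One-hidden-layer ReLU net that interpolates `target` at `knots`."""
        y = target(knots)
        slopes = np.diff(y) / np.diff(knots)           # slope on each segment
        # Hidden unit i, ReLU(x - knots[i]), contributes a slope *change*
        # at its knot, so the output weights are the slope differences.
        coeffs = np.concatenate(([slopes[0]], np.diff(slopes)))
        def net(x):
            hidden = np.maximum(0.0, x[:, None] - knots[:-1][None, :])  # ReLU layer
            return y[0] + hidden @ coeffs              # linear output layer
        return net

    knots = np.linspace(0.0, 2 * np.pi, 50)            # 49 hidden units
    net = shallow_relu_net(np.sin, knots)
    x = np.linspace(0.0, 2 * np.pi, 1000)
    print(np.max(np.abs(net(x) - np.sin(x))))          # max error shrinks as knots grow

Doubling the number of knots roughly quarters the max error here, which is the UAT story in its simplest form: width alone eventually suffices.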



It depends on the function that you are trying to approximate.

I can give you a function that a shallow net approximates better, and functions that deep nets approximate exponentially better (in terms of the number of neurons n) even with just one more layer. In the limit n -> \infty, both reach arbitrarily small error (though often with very different parameter counts).
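
One family of results that fits this claim is the sawtooth construction from Telgarsky (2016): composing the tent map k times is exactly a depth-k ReLU network with two hidden units per layer, yet it has 2^k linear pieces, and a one-hidden-layer net provably needs exponentially many units to match it. A rough numpy sketch of the counting argument:

    import numpy as np

    def tent(x):
        # tent(x) = 2x on [0, 1/2] and 2 - 2x on [1/2, 1]; as one
        # two-unit ReLU layer: tent(x) = 2*ReLU(x) - 4*ReLU(x - 1/2).
        return 2 * np.maximum(0.0, x) - 4 * np.maximum(0.0, x - 0.5)

    def sawtooth(x, depth):
        # Each composition doubles the oscillation count: O(depth)
        # neurons in total, but 2^depth linear pieces.
        for _ in range(depth):
            x = tent(x)
        return x

    x = np.linspace(0.0, 1.0, 2**12 + 1)   # dyadic grid, exact in binary fp
    for k in (1, 4, 8):
        slopes = np.sign(np.diff(sawtooth(x, k)))
        pieces = np.count_nonzero(np.diff(slopes)) + 1
        print(k, pieces)                   # 2, 16, 256: grows like 2^k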


Is there a particular theorem that you're invoking here?



