
This doesn't quite match Information Theory.

First of all, if you were to use the image as a communication channel, how much you could theoretically communicate is exactly the entropy of the messages (by definition), and optimal communication means maximum entropy. Information theory already assumes shared prior knowledge in the form of codes; the codes essentially encode all this knowledge. The analogue in images would be digits, shapes, etc. -- what makes them decodable is statistical redundancy: not every possible shape occurs (at least not with equal probability), and there is dependence between the different pixels of a shape and even between shapes elsewhere in the image. That dependence is (generally) determined by the statistics of the distribution over all possible images -- it essentially encodes all prior knowledge.
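As a rough illustration of that redundancy (a minimal sketch where a synthetic smooth gradient stands in for a natural image patch -- the data, sizes and noise levels are all assumptions, not anyone's real pipeline): the empirical entropy of the raw pixel values is much higher than the entropy of the differences between neighbouring pixels, and that gap is exactly the dependence a decoder can exploit.

    import numpy as np

    def empirical_entropy(symbols):
        # Shannon entropy in bits/symbol of the empirical distribution.
        _, counts = np.unique(symbols, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    rng = np.random.default_rng(0)
    x, y = np.meshgrid(np.arange(256), np.arange(256))
    # Smooth gradient plus mild noise, standing in for a natural image patch.
    patch = ((x + y) // 2 + rng.integers(-2, 3, (256, 256))).clip(0, 255)

    raw = patch.ravel()
    diffs = np.diff(patch, axis=1).ravel()   # exploit dependence on the left neighbour

    print(f"raw pixels : {empirical_entropy(raw):.2f} bits/pixel")
    print(f"differences: {empirical_entropy(diffs):.2f} bits/pixel")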

This redundancy allows losses in one part of the image to be reconstructed from data elsewhere. It's the same principle used in error correcting codes, except that codes are designed, while shapes are mostly natural (aside from things like alphabets, which are designed and indeed follow some principles of codes). But because they're not designed, there's no guarantee of a unique/reliable decoding (i.e. you can get a distribution).
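For the error-correcting-code side of the analogy, here's a toy (3,1) repetition code -- not something any real system would use, just to make the "designed redundancy gives reliable decoding" point concrete: each bit is sent three times, and majority vote recovers it as long as at most one copy gets flipped.

    import numpy as np

    rng = np.random.default_rng(1)

    message = rng.integers(0, 2, 16)             # 16 information bits
    codeword = np.repeat(message, 3)             # designed redundancy: each bit sent 3 times

    flips = rng.random(codeword.size) < 0.05     # channel flips ~5% of transmitted bits
    received = codeword ^ flips.astype(int)

    decoded = (received.reshape(-1, 3).sum(axis=1) >= 2).astype(int)   # majority vote per bit

    print("bit errors after decoding:", int((decoded != message).sum()))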

I think that's a significant issue, because if your estimate doesn't match reality it could have important consequences for how the image is used: perhaps a piece of text in the image goes from 'X is good' to 'X sucks'.

In this case a few things could be done:

1) Have some kind of watermark indicating the image was enhanced by a neural network, and possibly contains false information;

2) Have some kind of indication of the reliability of the image: it should encode the multimodality/confidence of the decoding distribution -- how many different solutions the reconstruction has. If it is more or less unique, it would show as high confidence; otherwise it would show a low-confidence indicator (a sketch of this idea follows the list);

3) Instead of trying to convey uncertainty, the system could simply give up in cases where there is too much uncertainty, i.e. leave the image dark. This could be done locally or globally, although doing it locally introduces a lighting consistency problem.
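A sketch of option 2, under the assumption that the enhancer can be sampled (e.g. a generative model with stochastic latents): per-pixel disagreement across samples can serve as the confidence indicator. The `enhance` function below is a hypothetical stand-in -- it just upsamples and adds noise so the example runs.

    import numpy as np

    rng = np.random.default_rng(2)

    def enhance(low_res, rng):
        # Hypothetical stochastic enhancer: returns one plausible reconstruction.
        up = np.kron(low_res, np.ones((4, 4)))    # naive 4x nearest-neighbour upsampling
        return up + rng.normal(0, 5, up.shape)    # stand-in for sample-to-sample variability

    low_res = rng.integers(0, 256, (16, 16)).astype(float)
    samples = np.stack([enhance(low_res, rng) for _ in range(32)])

    mean_image = samples.mean(axis=0)                 # the reconstruction actually shown
    confidence = 1.0 / (1.0 + samples.std(axis=0))    # high where samples agree, low where they diverge

    print("least confident pixel:", float(confidence.min()))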

---

There's another important observation w.r.t. Information Theory/Statistics: it essentially assumes unbounded computational power (since estimating the distribution could require analyzing arbitrarily large datasets). Of course this isn't true in reality. For example, the entropy of an encryption of a redundant text is exactly the entropy of the plaintext string plus the entropy of the key (given an encryption ensemble or encryption prior) -- the process of finding the key (e.g. through brute force) is not a statistical concern. However, with any reasonable computational power, a (properly) encrypted stream is indistinguishable from a random string, hence it would appear to have maximum entropy. So there are computational limits beyond the statistical limits. In the case of encryption the function is again designed (to be intractable), while in natural images the correlations are of a simpler and hopefully more tractable nature (although I'm sure that's not always the case).
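A quick way to see the computational point: to an entropy estimator that only looks at byte statistics, properly encrypted data is indistinguishable from uniform random bytes. In the sketch below, random bytes stand in for a ciphertext (an assumption, not an actual cipher) and a repetitive string stands in for redundant text.

    import math
    import secrets
    from collections import Counter

    def bytes_entropy(data):
        # Empirical Shannon entropy in bits/byte.
        counts = Counter(data)
        n = len(data)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    plaintext = ("X is good. " * 1000).encode()
    ciphertext_proxy = secrets.token_bytes(len(plaintext))   # stand-in for a properly encrypted stream

    print(f"redundant plaintext: {bytes_entropy(plaintext):.2f} bits/byte")
    print(f"ciphertext proxy   : {bytes_entropy(ciphertext_proxy):.2f} bits/byte (close to the maximum of 8)")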


