No, that shows only that the dataset is composed of common problem patterns. The paper explicitly investigates whether memorization has unduly inflated performance. From page 10 [1]:
> A central question in interpreting Minerva’s solutions is whether performance reflects genuine analytic capability or instead rote memorization. This is especially relevant as there has been much prior work indicating that language models often memorize some fraction of their training data ... In order to evaluate the degree to which our models solve problems by recalling information memorized from training data, we conduct three analyses on the MATH dataset ... Overall, we find little evidence that the model’s performance can be attributed to memorization.
In Appendix J.2 they say that accuracy degraded after modification; Figure 11 shows accuracy degrading in 15 out of 20 examples after large modification.
[1] https://arxiv.org/pdf/2206.14858.pdf