It is quite advanced: "The essential prerequisites for reading this book are a rigorous course in probability theory (on Masters or Ph.D. level), an excellent command of undergraduate linear algebra, and general familiarity with basic notions about metric, normed and Hilbert spaces and linear operators. Knowledge of measure theory is not essential but would be helpful."
This material is challenging enough without measure theory. :) Moreover, measure theory doesn't really help you much here: most of the material on concentration inequalities, which are fundamental in this setting, makes no significant use of it.
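For instance, Hoeffding's inequality for the average of n independent random ±1 signs, P(|X_1 + ... + X_n| / n ≥ t) ≤ 2·exp(−n t²/2), can be stated, proved and checked with nothing beyond elementary probability. A quick simulation sketch in Python/NumPy (the function name and the values of n, t and the trial count are arbitrary choices for illustration):

```python
import numpy as np

def hoeffding_check(n=100, t=0.3, trials=200_000, seed=0):
    """Compare the empirical tail of a mean of n random +/-1 signs with Hoeffding's bound."""
    rng = np.random.default_rng(seed)
    # Sum of n Rademacher signs = 2 * (number of +1's) - n, and the count of +1's is Binomial(n, 1/2).
    heads = rng.binomial(n, 0.5, size=trials)
    means = (2 * heads - n) / n
    empirical = float(np.mean(np.abs(means) >= t))
    # Hoeffding: P(|mean| >= t) <= 2 exp(-n t^2 / 2) for independent signs in {-1, +1}.
    bound = float(2 * np.exp(-n * t**2 / 2))
    return empirical, bound

emp, bnd = hoeffding_check()
print(f"empirical tail = {emp:.4f}, Hoeffding bound = {bnd:.4f}")
```

The bound is not tight, but it decays exponentially in n, which is the kind of statement the book builds everything on.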
Sorry! I LOVE measure theory and find it much better and especially EASIER than traditional Riemann integration!!!
In simplest terms: partition the Y-axis (the range of the function being integrated) instead of the X-axis (the domain). That makes good sense right away, because we KNOW the range is just the real numbers, and partitioning the range frees up the domain to be quite general: any set equipped with a sigma algebra and a measure, e.g., the sample space where we do the sampling in probability theory.
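A toy way to see the "partition the range" idea numerically: chop the range of f into levels y_0 < y_1 < ..., multiply each level y_k by the measure of the set where y_k ≤ f(x) < y_{k+1}, and add up. To stay self-contained, this sketch cheats by estimating those measures by counting points on a fine grid of [a, b]; the grouping by value is the point, not the measure estimate. The function name and parameters are made up for illustration.

```python
import numpy as np

def lebesgue_lower_sum(f, a, b, n_levels=1000, n_grid=1_000_000):
    """Lower Lebesgue-style sum of f over [a, b], built by partitioning the *range* of f."""
    # Midpoints of a fine grid; each point stands for a cell of Lebesgue measure (b - a) / n_grid.
    cell = (b - a) / n_grid
    xs = a + (np.arange(n_grid) + 0.5) * cell
    ys = f(xs)
    lo, hi = ys.min(), ys.max()
    if hi == lo:  # constant function: a single level covers the whole domain
        return float(lo * (b - a))
    h = (hi - lo) / n_levels
    # Index k of the band y_k <= f(x) < y_{k+1}; the maximum value goes into the top band.
    band = np.minimum(((ys - lo) // h).astype(int), n_levels - 1)
    # Estimated measure of each level set {x : y_k <= f(x) < y_{k+1}}.
    measure = np.bincount(band, minlength=n_levels) * cell
    levels = lo + h * np.arange(n_levels)  # y_0, ..., y_{n_levels - 1}
    return float(np.dot(levels, measure))

print(lebesgue_lower_sum(lambda x: x**2, 0.0, 1.0))  # close to 1/3
```

Note that the domain only enters through the measure of the level sets, which is exactly what lets it be an abstract measure space instead of an interval.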
One step further, you get to forget about the domain being a compact set and the function being continuous or uniformly continuous; you just set all that aside. You also get a straightforward theory for a domain such as the whole real line without any extra work, unlike the Riemann approach, where you have to take a limit as the limits of integration go to infinity. And, again compared with Riemann, the limit that defines the measure-theoretic integral is simpler and makes more sense.
As an extra goodie, you get a nice, unusually general proof of Leibniz's rule, that is, differentiation under the integral sign!
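For reference, here is one common textbook form of that result in measure-theoretic language (the usual dominated-convergence version; not necessarily the exact statement in any particular book):

```latex
\textbf{Assume: } (X,\mathcal{F},\mu) \text{ is a measure space, } I \subseteq \mathbb{R} \text{ an open interval, } f : X \times I \to \mathbb{R},
\quad f(\cdot,t) \in L^1(\mu) \ \ \forall t \in I,
\quad t \mapsto f(x,t) \text{ is differentiable } \forall x \in X,
\quad \Bigl|\frac{\partial f}{\partial t}(x,t)\Bigr| \le g(x) \ \ \forall (x,t), \ \text{ with } g \in L^1(\mu).

\textbf{Then: } F(t) = \int_X f(x,t)\,d\mu(x) \text{ is differentiable on } I \text{ and }
F'(t) = \int_X \frac{\partial f}{\partial t}(x,t)\,d\mu(x).
```

Proof idea: by the mean value theorem every difference quotient (f(x,t+h) − f(x,t))/h is bounded by g(x), so the dominated convergence theorem lets the limit h → 0 pass inside the integral. No compactness or continuity in x is needed.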
W. Rudin's Real and Complex Analysis has a really nice, succinct, clean (precisely reasoned) treatment early in that book.
Maybe what I have explained here can provide enough of an overview and some intuition to make the Rudin material quite easy to read.
Lebesgue measure is a lot of annoying terminology, but the end result is basically integrals that work the way you intuitively "want" them to. So long as you never have to evaluate one using the actual definition, you're gucci: just pretend you're a physicist and think about it in terms of things like point masses.
The High-Dimensional Probability textbook is one of my all-time favorites. Its elegant mix of probability, geometry, and linear algebra can generate some really non-intuitive insights, and the intuitions it develops are also pretty useful for reasoning about modeling in a lot of applications.
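One classic example of such a non-intuitive fact: two independent random directions in high dimension are almost orthogonal, with |cos θ| typically of order 1/√d. A small NumPy sketch (the function name, sample size and dimensions are arbitrary choices):

```python
import numpy as np

def mean_abs_cosine(d, pairs=2000, seed=0):
    """Average |cos(angle)| between independent Gaussian random vectors in R^d."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((pairs, d))
    v = rng.standard_normal((pairs, d))
    # Cosine of the angle between each pair (u_i, v_i).
    cos = np.sum(u * v, axis=1) / (np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1))
    return float(np.mean(np.abs(cos)))

for d in (2, 10, 100, 1000):
    print(f"d={d:5d}  mean |cos| = {mean_abs_cosine(d):.4f}  1/sqrt(d) = {1 / np.sqrt(d):.4f}")
```

In the plane a random pair is far from orthogonal on average, while in d = 1000 the cosine is already tiny; so exponentially many "almost orthogonal" directions fit in high dimension.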
Skimming through, this looks like a foundation with a bunch of applications attached. Still, it seems to me that the applications (e.g., the stochastic block model, concentration inequalities, and so on) can generally be explained without the fairly advanced foundation.
So the subtitle, "An Introduction with Applications in Data Science", is accurate, and it should not be confused with "a bunch of advanced maths that you need to know in order to understand these applications".
On the relationship between concentration of measure and this: the main concentration theorems generalise the CLT beyond the additive case, to any Lipschitz function of iid variables, and the first "i" (independence) can even be relaxed.
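A minimal sketch of the Lipschitz case, assuming standard Gaussian coordinates (the cleanest setting for this kind of theorem): the Euclidean norm is a 1-Lipschitz function of d iid N(0, 1) variables, so Gaussian concentration says it deviates from its mean by more than t with probability at most 2·exp(−t²/2), whatever d is. Numerically, the mean grows like √d while the spread stays around 1/√2 (function name and sizes are arbitrary):

```python
import numpy as np

def norm_stats(d, samples=2000, seed=0):
    """Mean and standard deviation of ||X||_2 for X with d iid N(0, 1) coordinates."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((samples, d))
    norms = np.linalg.norm(x, axis=1)  # a 1-Lipschitz, non-additive function of the coordinates
    return float(norms.mean()), float(norms.std())

for d in (10, 100, 1000):
    mean, sd = norm_stats(d)
    print(f"d={d:5d}  mean={mean:7.3f}  sqrt(d)={np.sqrt(d):7.3f}  sd={sd:.3f}")
```

So the relative fluctuation sd/mean shrinks like 1/√d, even though the norm is not a sum of the coordinates.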
For a deeper understanding, look up the work of, e.g., Gromov and his students; much of it is available online.
The author has compiled these notes into a textbook; you can download the draft of the book for free from this webpage, and it is also available on Amazon in hardcover.