dbaupp · on Aug 25, 2024

Yes, it couldn't not block, it's obvious... but I've encountered non-trivial amounts of "magical thinking" around both async/await ("go-fast juice") and mmap ("go-fast juice") separately, so the intersection surely has a bunch of magical thinking too, where people haven't taken the time to properly think through what's going on.

Hence, my investigation to try to make the "obvious" conclusion obvious to more people.

(Author here)

dbaupp · on Aug 25, 2024

(Author here)

Thank you, you've expressed one of my goals with doing this sort of investigation/getting this data far better than I have. :)

It's something that feels obvious once the dots are connected, but I was pretty sure many people wouldn't connect these dots automatically.

dbaupp · on Aug 25, 2024

Yeah, of course a synchronous call that might block the thread is blocking, I agree... but, if I didn't have the context of "we're in a comment thread about a blog post about mmap", I'm pretty sure I wouldn't flag `x[i]` on `x: &[u8]` (or any other access) as a synchronous call that is worth worrying about.

Hence the discussion of subtlety in https://huonw.github.io/blog/2024/08/async-hazard-mmap/#at-a...

It's obvious when pointed out, but I don't think it's obvious without the context of "there's an mmap nearby".
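To illustrate the subtlety, here is a minimal sketch in Python rather than Rust (so `buf[i]` stands in for `x[i]`, and the helper name `checksum` is made up for this example). The function cannot tell whether it was handed ordinary memory or a file mapping, and only in the second case can the index operation end up waiting on disk:

```python
import mmap
import tempfile


def checksum(buf) -> int:
    """Sum every byte; nothing here says whether `buf` is heap memory or a file mapping."""
    total = 0
    for i in range(len(buf)):
        total += buf[i]  # for an mmap, this access may page-fault and wait on disk
    return total


payload = bytes(range(256)) * 16

with tempfile.TemporaryFile() as fh:
    fh.write(payload)
    fh.flush()
    # Map the whole file read-only; indexing it looks like indexing `bytes`.
    with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        from_mmap = checksum(mapped)

from_bytes = checksum(payload)
print(from_mmap == from_bytes)
```

The call sites are identical, which is why the hazard is hard to spot in review without knowing where the buffer came from.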

(Author here, this behaviour wasn't surprising to me, but felt subtle enough to be worth investigating, and get some concrete data.)

dbaupp · on Aug 25, 2024

(Author here.)

Good point. Do you think the difference will be observable? Will it be observable on an SSD (vs. HDD)?

Retr0id · on Aug 25, 2024

I think so, yes - only one way to find out, though!

dbaupp · on Aug 25, 2024

(Author here)

> The singled thread async "traditional IO" example is NOT single threaded

The threads backing the single-threaded IO are an implementation detail of fulfilling the `.read().await` calls. The key is that there's a single coordinator thread that's issuing all the work, with the user-space runtime multiplexing tasks on that thread. I thought the fact that the "start a request and come back when it is finished" behaviour happens to be implemented via user-space threads rather than kernel-level epoll (or similar) is unlikely to affect behaviour.

I considered scaling up the number of files and using a multi-threaded runtime, but I felt that'd make everything more complicated without fundamentally changing behaviour.

However, maybe my theory is incorrect, in which case someone else can do their own experiments to provide more concrete information.

(This is referenced in a footnote: https://huonw.github.io/blog/2024/08/async-hazard-mmap/#fn:t... )
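A rough analogy of that structure, sketched with Python's `asyncio` rather than the Rust runtime used in the post (so this is not the post's code, and names like `blocking_read` and `read_file` are invented for the illustration): one event-loop thread issues every read and resumes every task, while the blocking reads themselves run on helper threads.

```python
import asyncio
import os
import tempfile
import threading


def blocking_read(path: str) -> tuple[bytes, int]:
    """Ordinary blocking read; also reports which thread performed it."""
    with open(path, "rb") as fh:
        return fh.read(), threading.get_ident()


async def read_file(path: str) -> tuple[bytes, int, int]:
    """Hand the read to a worker thread, then resume on the event-loop thread."""
    data, worker = await asyncio.to_thread(blocking_read, path)
    return data, worker, threading.get_ident()


async def main(paths: list[str]) -> list[tuple[bytes, int, int]]:
    # All reads are started and awaited from the single coordinator thread.
    return await asyncio.gather(*(read_file(p) for p in paths))


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(4):
        path = os.path.join(tmp, f"file{i}.bin")
        with open(path, "wb") as fh:
            fh.write(bytes([i]) * 1024)
        paths.append(path)
    results = asyncio.run(main(paths))

coordinator_threads = {resumed_on for _, _, resumed_on in results}
worker_threads = {worker for _, worker, _ in results}
print(len(coordinator_threads), coordinator_threads.isdisjoint(worker_threads))
```

From the point of view of the tasks, the worker threads are invisible: each task only sees "start a request and come back when it is finished".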

> The issue is comparing 8 OS threads no async to 1 thread async is fundamentally not very useful as long as you didn't pin all threads to the same physical core

The point is not to benchmark async vs. non-async, but to provide a general reference point for "mmap working well" for comparison. Since you seem to agree with the "minor" issue tag, I don't think the parallelism vs. concurrency distinction matters much here... but again, I'm definitely happy to see some concrete data that suggests otherwise!

dathinab · on Aug 25, 2024

Yes, none of the issues affect the outcome of the blog post about memory-mapped files.

> The threads backing the single-threaded IO are an implementation detail of fulfilling the `.read().await` calls.

IMHO it's not just an implementation detail; it's a very relevant design aspect for anything related to blocking and benchmarks. Though yes, it doesn't matter too much for this blog post.

dbaupp · on June 30, 2024

Little o is still an asymptotic statement: it doesn’t have to apply for small n. A definition of f(n) = o(g(n)) is something like

   lim (n -> infinity) f(n)/g(n) = 0

Or, in other words, for sufficiently large n, g grows faster than f.

For instance, this function is o(n), because 1e1000/n goes to 0 as n grows.

   f(n) = 10**n if n < 1000 else 1e1000

(Pseudo-Python for a piecewise function that grows exponentially to 10**1000 at n = 1000 and then remains constant after that.)
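For anyone who wants to poke at it, here is a runnable sketch of the same function. It uses exact integers and `fractions.Fraction`, because the float literal `1e1000` overflows; the helper name `ratio` is just for this example.

```python
from fractions import Fraction

CAP = 10**1000  # exact integer; the float literal 1e1000 would overflow to inf


def f(n: int) -> int:
    """Grows as 10**n below n = 1000, then stays constant at 10**1000."""
    return 10**n if n < 1000 else CAP


def ratio(n: int) -> Fraction:
    """f(n) / n as an exact fraction."""
    return Fraction(f(n), n)


# Below the cutoff the ratio grows enormously...
print(ratio(999) > ratio(10) > ratio(1))
# ...but past it, f(n)/n = 10**1000 / n shrinks towards 0 as n grows.
for k in (1000, 1001, 2000):
    print(f"n = 10**{k}:", ratio(10**k) == Fraction(10) ** (1000 - k))
```

The ratio only drops below 1 once n exceeds 10**1000, which is the sense in which the asymptotic statement says nothing about small n.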

dbaupp · on June 22, 2024

Interesting idea.

It looks like powers of larger primes are currently rendered with the exponent ring overlapping the "main" rings, e.g. 7 and 49 look identical, while 343 has an overlapping 2 and 3 knot.

aravindet · on June 22, 2024

True! Thanks for the bug report, I'll try to make the exponent rings larger.

dbaupp · on April 3, 2024

Yes, the data is the bytes 00, 01, …, FF repeating, and that pattern is highly visible with power-of-2 encodings, but not visible with other bases (for similar reasons that 0.1 as a (binary) float doesn’t behave as people expect).
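A small sketch of the effect, assuming the data is simply 00..FF repeated (the repetition count and the helper `has_period` are choices made for this example, not taken from the post). With hex and base64, a fixed number of bytes maps to a fixed number of characters, so the repetition in the bytes carries straight through to the text; converting the whole thing to one decimal integer has no such alignment.

```python
import base64

data = bytes(range(256)) * 6  # 00, 01, ..., FF, repeated six times


def has_period(text: str, p: int) -> bool:
    """True if text[i] == text[i + p] for every valid i."""
    return text[p:] == text[:-p]


hex_text = data.hex()                              # base 16: 1 byte -> 2 chars
b64_text = base64.b64encode(data).decode("ascii")  # base 64: 3 bytes -> 4 chars
dec_text = str(int.from_bytes(data, "big"))        # base 10: no fixed byte/digit alignment

print("hex repeats every 512 chars:", has_period(hex_text, 512))
print("base64 repeats every 1024 chars:", has_period(b64_text, 1024))
# Search for any exact period in the decimal digits (the result is not assumed here).
print("decimal has an exact period:",
      any(has_period(dec_text, p) for p in range(1, len(dec_text) // 2 + 1)))
```

The base64 period is 1024 characters because the text can only repeat after a whole number of 3-byte groups, i.e. after 768 bytes.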

dbaupp · on April 2, 2024

The base 10 is referring to conversion of bytes into a long decimal (base 10) integer, not that it's being stored in chunks of 10 bits.

But yes, you're right, it would be reasonable to think of this as encoding the bytes in base 1000, where each "digit" just happens to be shown to humans as 3 digits.
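As a sketch of why the two views coincide (the function names are made up here, and this is not necessarily how the post's encoder is written): treat the bytes as one big integer, then write that integer either in base 10 or in base 1000 with each "digit" zero-padded to three decimal digits.

```python
def to_decimal(data: bytes) -> str:
    """Interpret the bytes as one big-endian integer and write it in base 10."""
    return str(int.from_bytes(data, "big"))


def base1000_digits(data: bytes) -> list[int]:
    """The same integer written in base 1000, most significant digit first."""
    n = int.from_bytes(data, "big")
    digits = []
    while True:
        n, d = divmod(n, 1000)
        digits.append(d)
        if n == 0:
            break
    return digits[::-1]


data = b"hello"
decimal = to_decimal(data)
digits = base1000_digits(data)

# Padding each base-1000 digit to 3 decimal digits reproduces the decimal
# string, apart from leading zeros on the first group.
padded = "".join(f"{d:03d}" for d in digits)
print(decimal, digits, padded.lstrip("0") == decimal)
```

Note that this simple version drops leading zero bytes, since they do not change the integer; a real encoder would need to record the length separately.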

dbaupp · on April 2, 2024

Good catch! I should've tested. I've added a paragraph to https://huonw.github.io/blog/2024/03/qr-base10-base64/#extre... about this.