What I never understood is why the modem plays this through its speakers instead of some other tone that lets us know that it is connecting. Any explanation?
That became a common practice before typical modems were able to identify call progress elements like dial tone, ring, busy, etc. Long after that changed, interoperability problems with the ad-hoc in-band signaling conventions meant call progress detection was often disabled by default. Couple this with the fact that, up through 9.6k or so, negotiation problems were audibly diagnosable, and the feature itself made sense up to a point. Inertia and industry dynamics probably account for why it persisted into the consumerization of the Internet; by that point the consumer market was dominated by low-margin white-box manufacturing.
Keep in mind that the very high rate of change in signaling standards (9.6 -> 14.4 -> 19.2 -> 28.8 -> 33.6 -> 56 in roughly ten years) seemed to leave little energy for non-core engineering changes.
From what I've understood, it's so you can tell whether the line is, in fact, a modem. It's easy to diagnose the issue when you hear "Mach Pizza, can I take your order?" coming through the speaker.
Agreed. I've already witnessed Windows making a modem call the fire brigade emergency number (probably because it was confused about the country I was in). Whoops.
The sound was useful. One of my modems in the 1990s would falter about 1 out of 8 times. When it faltered, it would never connect and never hang up. Without knowing or learning any of the technical information in the OP, I learned to distinguish the sound of a faltering attempt to connect from the sound of a normal one. If I were not able to hear the "tones"/sounds, the only way I would have been able to detect a faltering attempt is when the attempt lasted longer than successful attempts do. So, being able to hear the sound saved me time: I was able to detect a faltering attempt faster. (About 10 seconds faster. My response was to hang up and redial.)
In other words, having the modem duplicate the line's "tones"/sounds over a speaker was a nice hack on the natural human ability to distinguish between different sets of complex patterns.
A similar hack: there is a blind programmer named Karl Dahlke who pipes the character stream being sent to his Linux console through his PC speaker. Even though it probably sounds like radio static or a cacophony to the untutored ear, he has been able to learn to distinguish certain patterns quickly without having to wait for his text-to-speech software to read him any of what is on the console.
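To make the idea concrete, here is a rough sketch of turning a byte stream into sound. I don't know how his setup actually maps characters to audio, so everything here is an assumption for illustration: the linear byte-to-pitch mapping, the 20 ms tone length, and the function names `byte_to_frequency`, `sonify`, and `write_wav` are all made up. It writes a WAV file rather than driving a PC speaker.

```python
import math
import struct
import wave

SAMPLE_RATE = 8000      # samples per second (assumed; plenty for tones below 4 kHz)
SAMPLES_PER_TONE = 160  # 20 ms per byte at 8 kHz (arbitrary choice)


def byte_to_frequency(b: int) -> float:
    """Map a byte value 0..255 linearly onto 200..2750 Hz (arbitrary mapping)."""
    return 200.0 + 10.0 * b


def sonify(data: bytes) -> bytes:
    """Return 16-bit little-endian mono PCM with one short sine tone per input byte."""
    frames = bytearray()
    for b in data:
        freq = byte_to_frequency(b)
        for n in range(SAMPLES_PER_TONE):
            value = int(12000 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
            frames += struct.pack("<h", value)
    return bytes(frames)


def write_wav(path: str, data: bytes) -> None:
    """Write the sonified bytes to a mono 16-bit WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(sonify(data))
```

Calling `write_wav("console.wav", b"ls -l\n")` should give a short burst of beeps. Repeated text produces a repeated melody, which is presumably the kind of pattern the ear can learn to pick out.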
ADDED. The "blinking lights" on the front panels of early computers and mini-computers are another example. I understand that computer operators learned to extract a lot of relevant information from the row of lights on the front of the computer that formed a binary representation of the contents of the program counter.
Parenthetically, these "out of band" debugging cues still work. Try holding an AM radio next to your PC's motherboard. :)
This was occasionally useful for debugging before CPU clock frequencies started to look less like AM radio and more like microwave ovens...
My pet theory is that it's to convince you that your computer is doing something, instead of sitting there. One of my happiest moments growing up was finding out how to mute the damn thing - it annoyed everyone in my family :)
Isn't it simply cheaper and easier to play back what's happening on the phone line (which might also help in debugging, if you're seriously competent) than to somehow generate a "prettier" sound?
That feels like the obvious explanation to me ... Not sure if that means it must be wrong. :)
As modems got faster, the tones being played were pretty identifiable by a human. I could tell if my 56k connection was negotiating down to 28.8 or some other slower speed and preemptively disconnect and try again in order to get a better connection.
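The same "redial if it negotiated low" habit can be sketched in software, since Hayes-style modems usually print a `CONNECT <speed>` result code once the handshake finishes. This is only a sketch under assumptions: the helper names and the 33600 threshold are made up, and many modems report the computer-to-modem port speed rather than the line speed unless configured otherwise, so the number may not mean what you hope.

```python
import re
from typing import Optional

# Matches result codes such as "CONNECT 28800" or "CONNECT 28800/ARQ".
CONNECT_RE = re.compile(r"^CONNECT\s+(\d+)", re.MULTILINE)


def negotiated_speed(response: str) -> Optional[int]:
    """Return the speed from a CONNECT result code, or None if none is reported."""
    match = CONNECT_RE.search(response)
    return int(match.group(1)) if match else None


def should_redial(response: str, minimum_bps: int = 33600) -> bool:
    """Suggest a redial only when a speed was reported and it is below the threshold."""
    speed = negotiated_speed(response)
    return speed is not None and speed < minimum_bps
```

For example, `should_redial("\r\nCONNECT 28800\r\n")` is true with the default threshold, while a response with no speed in it (such as `NO CARRIER`) is left for the caller to handle.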
I seem to recall being able to turn the speaker noise off on at least one of my modems growing up. But I'd end up leaving it on, because you could follow the sounds like an auditory progress bar.
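For anyone wondering how the muting worked: on Hayes-compatible modems the speaker was normally controlled by the `M` and `L` AT commands (`M0` off, `M1` on until carrier is detected, `M2` always on, `M3` on after dialing until carrier; `L0` to `L3` for volume). Below is a small sketch of building such an init string. The helper name and mode labels are made up for illustration, and individual modems did not always honor every setting.

```python
# Speaker modes from the Hayes AT command set; specific modems may differ.
SPEAKER_MODES = {
    "off": "M0",
    "until_carrier": "M1",
    "always": "M2",
    "after_dial_until_carrier": "M3",
}


def speaker_init_string(mode: str, volume: int = 1) -> str:
    """Build an AT command string setting speaker mode and volume (hypothetical helper)."""
    if mode not in SPEAKER_MODES:
        raise ValueError(f"unknown speaker mode: {mode!r}")
    if not 0 <= volume <= 3:
        raise ValueError("volume must be between 0 and 3")
    return f"AT{SPEAKER_MODES[mode]}L{volume}"
```

So `speaker_init_string("off", 0)` gives `ATM0L0` for a silent modem, and `speaker_init_string("until_carrier")` gives `ATM1L1`, the "auditory progress bar" behavior that stops once the connection is up.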