This confuses me because the NTSC vertical rate is not 60 Hz; it's 3579545 Hz / (525/2 * 455/2) = 59.94 Hz. In other words, it's odd that they would have chosen to be compatible with black and white instead of color: instead of 44.1 kHz, the rate would have been 44.056 kHz.
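A quick sketch of the arithmetic above. The 245-usable-lines-times-3-samples-per-line layout is my assumption (it's the commonly cited derivation for PCM adaptors), not something stated in this thread:

```python
# NTSC colour field rate, derived from the 3.579545 MHz colour subcarrier
subcarrier = 3_579_545                      # Hz
field_rate = subcarrier / (525 / 2 * 455 / 2)
print(round(field_rate, 4))                 # ~59.94 Hz

# 44.1 kHz assumes an exact 60 Hz (black-and-white) field rate:
# 60 fields/s * 245 usable lines * 3 samples per line
print(60 * 245 * 3)                         # 44100

# The same layout at the actual colour field rate gives ~44.056 kHz
print(round(field_rate * 245 * 3))          # 44056
```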
Edit:
Well, it turns out that 44.056 kHz was used for the "EIAJ digital-audio-on-videotape standard".
Sony was originally proposing 44.056 kHz (NTSC, popular in Japan) with 16 bits, while Philips was pushing for 44.1 kHz (PAL, popular in Europe) with 14 bits. The two reconciled their differences at the 4th Red Book meeting in 1980[1]. Sony was further ahead in developing the CD players, but Philips supposedly was in the lead when it came to making the CDs[2]. As a compromise, they may have gone with the 44.1 kHz Philips was proposing and the 16 bits Sony was proposing because it would be easier to remember. Posts [1] and [2] are in direct conflict with each other on this point. There was further tension over what size of disc to use[2].
The CD was one meeting away from launching another format war in the spirit of VHS vs Betamax or Blu-ray vs HD-DVD
Of course 44.056 kHz products did make it into the field for professional audio engineers. Anecdotally this made for some trouble: http://www.realhd-audio.com/?p=2197
❝Of course, lots of CDs were released with the original 44.056 kHz rate simple reclocked at 44.1 kHz. This resulted in a very slight speed increase AND a pitch shift of less than a quartertone.❞
While technically true (less than a quartertone), it's much, much, much less than a quartertone. It's 1.7 cents (1.7 percent of a semitone, or about 1/30th of a quartertone). If you accidentally mix up 48 kHz and 44.1 kHz, that's a much more noticeable ~1.5 semitones. I doubt that this slight detuning is so blatantly obvious that it "freaks out" even a very well trained and very hot-tempered classical violinist.
If you want to check the math: there are 12 semitones in an octave. One octave doubles the frequency, so the frequency ratio between two adjacent semitones (e.g. from any key on your piano to the adjacent white or black key) is the twelfth root of two: ~1.0595. When tuning your guitar, your tuner might display the deviation from the true tone in cents; a cent is 1% of the interval between two semitones, i.e. (python) math.pow(2, 1/1200) -> 1.0005777895065548.
The frequency ratio between 44.1 and 44.056 kHz is 1.0009987.
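The two pitch-shift figures claimed above are easy to verify; a minimal sketch using only numbers from these comments:

```python
import math

def cents(ratio):
    # 1200 cents per octave, 100 cents per semitone
    return 1200 * math.log2(ratio)

print(cents(44100 / 44056))   # ~1.73 cents: the 44.056 kHz reclocking error
print(cents(48000 / 44100))   # ~146.7 cents: mixing up 48 and 44.1 kHz,
                              # i.e. roughly 1.5 semitones
```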
Quite a lot of early CD players were actually 14-bit. Many early CDs were probably mastered with 14 bits in mind too, as they have a noticeably low average mix level.
The original American TV standard and TV recorders were 60 Hz. When color was introduced, the frequency was shifted to 59.94 Hz to avoid interference between the color signal and the sound signal.
(Color was encoded as a high-frequency sine wave on top of the black-and-white signal, which is mostly invisible on a black-and-white set and so allows for backwards compatibility. The phase of the fuzz indicates hue, and the amplitude indicates color intensity. This is why, in the old system, if someone wore a shirt with vertical stripes on TV, viewers would see a rainbow of color over the shirt.)
I believe all NTSC equipment is required to support Black & White System M signals, which are exactly 60 Hz[1]. It probably made their equipment much simpler to forget about colour encoding entirely. (And it made the 44,100 Hz rate fit too.)
Also, the fact that 44,100 can be factored as 2^2 * 3^2 * 5^2 * 7^2 makes it very efficient to do Fast Fourier Transforms on moving windows, which we do a lot in speech recognition and DSP in general.
What you say actually makes no sense whatsoever. What is forcing you to consider 1-second windows?
Also, 48 kHz is 2^7 * 3 * 5^3, so it actually avoids the larger prime factor of 7. Not that anyone cares: just use an actual power of two as your window size to begin with!
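Both factorizations quoted in this exchange check out; a minimal sketch, assuming nothing beyond the numbers themselves:

```python
def factorize(n):
    """Prime factorization of n as {prime: exponent}, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

print(factorize(44100))   # {2: 2, 3: 2, 5: 2, 7: 2}
print(factorize(48000))   # {2: 7, 3: 1, 5: 3}
```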
What you say actually makes no sense whatsoever. Why are you talking about 1-second windows? If 44100 decomposes easily into small prime factors, it means that you can split 44100 samples into windows that are themselves products of small primes.
The length of a second is arbitrary and irrelevant to audio processing. The fact that there are 44100 samples in an arbitrary length of time is meaningless. You would only care about the prime factorization of 44100 if the second was some special amount of time, which it is not.
Why are you splitting 44100 samples into windows? That's the number of samples in one second. Where did you get one second from? You could have started from any duration.
I see what you mean, but since the 1970s speech researchers have been splitting audio into 10 ms or 25 ms intervals; that's not my decision. They could have started from any duration, but they haven't. And you could start to express time with the Aztec calendar if you wanted. But you won't.
It's not (entirely) arbitrary. Speech is variable, our vocal tract shape changes all the time, but not arbitrarily fast. From a source-filter theory perspective it does make sense to consider speech production over short time frames as linear time-invariant systems.
What this has to do with 44.1 kHz is beyond me, however...
Actually 44.1 kHz is unhelpful for that... a lot of infrastructure exists that expects things to work on 10 and 20 ms intervals (and multiples thereof), e.g. the normal Windows audio APIs work in 10 ms chunks.
10 ms of 44.1 kHz is 441 samples, which is an odd number and thus not a great size for easily implemented FFT factorizations.
Of course, if you can pick your own analysis interval you can pick some nice power of two or another very smooth number. But that's true regardless of the sampling rate.
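To make the 10 ms point concrete, a small sketch comparing frame sizes at a few common sample rates (the power-of-two check is the standard bit trick, my addition):

```python
# 10 ms frame sizes at common sample rates, and whether each is a power
# of two (the FFT-friendly case). 441 is odd, but still "smooth": 3^2 * 7^2.
for rate in (16000, 44100, 48000):
    frame = rate // 100                  # samples in 10 ms
    is_pow2 = frame & (frame - 1) == 0
    print(rate, frame, is_pow2)
# none of 160, 441, 480 is a power of two
```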
I'm only guessing, but it looks like 3 stereo samples per line plus some extra data (ECC?) on the right. It's quite interesting to see how the pattern changes when the music fades out (around 3:50, 9:20, and 13:25).
I wonder what kind of sound you could extract given only the low-quality YouTube video as a source.
It seems strange to me that they would be re-purposing video tape for digital audio, given that digital tape technology had been used in computers since UNIVAC in the 1950s [1].
This came along for video applications. Up until then the idea of tapes for audio or data was to put as many linear tracks on them as possible. With video some new thinking was needed, hence the helical scan. As it turned out helical scan was the future for data and audio storage too.
Quad recording [1] preceded helical. In Quad, the mag stripes are across the tape almost perpendicular to the tape's movement. The head spun at an insane speed -- 14,400 RPM in open air. Helical was originally invented as a lower-cost recorder. Once helical gained the ability to do slow-mo and still-frame, Quad died.
I used to use VHS set to the 6-hour EP recording speed along with dbx noise reduction (compression/expansion, really). I used dbx because the bandwidth and sound quality were so bad at the slower speeds -- until the HiFi audio technology was introduced later on.
I'd get the play time of a reel-to-reel in a device that I already owned, and so no additional space was needed on my shelf.
Video tape was vastly less expensive than the tape drives used for nine-track data storage and the like. You could actually buy a decent Beta or VHS machine for 1) less than $2k for a while and 2) less than a kilobuck later (before DVD drove prices down).
Tape is tape, and VHS was a popular platform. I do remember people wanting "higher grade" Super VHS for some digital audio recording platforms, but for the most part you could even use regular VHS in a pinch.
Wikipedia states that the CD's capacity target was to hold Beethoven's Ninth Symphony on one disc. There was a tale that this, and the desire to fit a CD into a standard Japanese car-stereo form factor, determined the sampling rate. Submitted for your amusement.
Related: Digital Cinema uses a 48 kHz sample rate, which at 24 fps gives exactly 2000 samples per frame, per channel. That makes it very easy to sync the audio to the film.
At 25 fps and 48 kHz you rather neatly have 1920 samples per frame, which is coincidentally the width of an HD picture. At least I think it's kind of a coincidence... I believe the number 1920 was derived from the 720 horizontal pixels of Rec. 601, which doubled gives a 1440-pixel 4:3 picture and hence a 1920-pixel 16:9 picture.
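The samples-per-frame arithmetic is worth spelling out, since it also shows where NTSC-derived rates spoil the neatness (the 30000/1001 figure for 29.97 fps is standard NTSC timing, not stated in these comments):

```python
# Samples per frame at 48 kHz for the frame rates mentioned above
print(48000 / 24)             # 2000.0 (Digital Cinema)
print(48000 / 25)             # 1920.0 (PAL-rate video)
print(48000 / 30)             # 1600.0
# NTSC's 29.97 fps (30000/1001 fps) does NOT divide evenly:
print(48000 * 1001 / 30000)   # 1601.6 samples per frame
```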
It's also convenient that the highest frequency a human ear can detect is about 20 kHz, so the sampling rate needs to be at least 40 kHz (see "Nyquist rate").
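A sketch of what goes wrong below the Nyquist rate: a tone above fs/2 folds back into the audible band. The alias formula here is the usual folding identity, my assumption rather than anything from the thread:

```python
# A tone above the Nyquist frequency (fs/2) aliases to a lower frequency.
# Alias formula assumed: |f - fs * round(f / fs)|.
fs = 44100
f = 25000                        # above Nyquist = 22050 Hz
alias = abs(f - fs * round(f / fs))
print(alias)                     # 19100: the 25 kHz tone shows up at 19.1 kHz
```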
I can't speak for anyone else, but I was an AV nut in the old days. I remember pulling apart an early Sony CD player in '89 or '90, screwing each part down on a piece of wood so I could reach everything, and looking at every single trace with my "new" old Philips oscilloscope, trying to understand the magic.
These stories are like finally getting to see the cards from a game long over. Pure gold to me. Upvoted.
This is exactly what I want to read on Hacker News. Some people come here for the posts about startups/founder culture (I don't, but I fully accept that it's part of the site), some because of the highly technical posts. It's probably beneficial in some way that Hacker News is a mixture of both.