This is a little awkward. The Touhou album in question is already in the Touhou Lossless Music Collection (at least the last release: http://www.tlmc.eu/2018/01/tlmc-v19.html since it's from 2005, it's probably been in most of them), and has track 3 ("The End of Theocratic Era" by "弘世"). I just checked the TTA and track 3 sounds fine.
A decent number of the albums in the TLMC were ripped and uploaded to Japanese P2P programs. A large number of these rips were made "losslessly" with EAC, but with the "normalize" option turned on in the settings, which adjusts the volume of the rip if it's above or below a certain threshold, basically ruining all the work EAC put into getting a bit-exact copy. This is likely why the rip in the TLMC doesn't match OP's rip.
Edit: If anyone is more curious about this, a large amount of the rips in the TLMC were ripped by one uploader (or a group of people using a singular alias) on the Japanese P2P programs Share[0] and Perfect Dark[1].
He was so prolific in the scene that he has his own Baidu wiki page[2] (admittedly, it's not as strict as Wikipedia about notability, but I still find it impressive).
All of his rips, as far as I know, were affected by this "normalize" setting, so even though he ripped hundreds (thousands?) of albums, a large majority of them probably wouldn't be considered "archive quality".
Normalizing alone with no compression/limiting involved shouldn't ruin fidelity at all. Actually in some (admittedly extreme) situations it could enhance it. A track that is very low from start to end could leave one or more most significant bits unused, therefore reducing bit depth, so especially during mastering I would welcome normalization if it brought the highest peak to 0 dB or slightly less.
Of course I mean just normalizing without any compression/limiting, ie the dynamics curve stays the same: I'm well aware of the loudness wars plague.
I believe you're correct that raising the volume is not necessarily a destructive process, since it would just be increasing the amplitude of the samples already there.
Unfortunately, the majority of rips fall above EAC's normalize threshold, which means the volume is decreased, which ends up truncating some of the bits (assuming the album was using the full range of 16-bit audio).
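To make the truncation point concrete, here is a minimal Python sketch. It assumes a plain 0.5x gain with rounding as a stand-in for the normalize step; EAC's actual algorithm and threshold are not shown in this thread, so `apply_gain` is purely illustrative.

```python
def apply_gain(samples, gain):
    """Scale 16-bit PCM samples, rounding and clipping back to the int16 range."""
    return [max(-32768, min(32767, round(s * gain))) for s in samples]

# Every value a 16-bit sample can take.
original = list(range(-32768, 32768))

quieter = apply_gain(original, 0.5)   # hypothetical normalize step: about -6 dB
restored = apply_gain(quieter, 2.0)   # try to undo it afterwards

changed = sum(1 for a, b in zip(original, restored) if a != b)
print(f"{changed} of {len(original)} sample values did not survive the round trip")
```

Roughly half of the values (the odd ones) cannot be recovered, because halving the amplitude throws away the least significant bit. That is inaudible, but it is no longer a bit-exact copy.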
Edit: I noticed you used the term "fidelity", so you might be talking about being able to perceive the damage to the audio, which is not necessarily what I'm talking about.
The point of EAC (Exact Audio Copy) is to make archival-quality copies of audio CDs, which means as close to bit-exact as possible. Modifying the audio in any way would be counterproductive to that purpose, even if it doesn't necessarily affect the actual sound quality.
The CUE data in TLMC is a real bummer. I don't know if it's my music player, Clementine, but a good portion of the tracks I listened to in TLMC didn't advance correctly and instead just kept playing with the metadata of the first song in the album that I listened to. At this point I've mostly just given up on it... It took me 3 months to download it too, haha (curse you, Comcast 1TB monthly data limit!)
Yes: you can use Audacity (https://www.audacityteam.org/) to "diff" two waveforms by subtracting one from the other.
For comparing .wav files you can also treat them as plain binary data, and use standard diff tools. For example I used `cmp -l` to count differences between rips, and `vbindiff` to view them.
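For anyone without `cmp` handy, a rough Python equivalent of counting differing bytes might look like the sketch below. The function names are made up for illustration, and unlike `cmp -l` this version also counts any extra trailing bytes in the longer file as differences.

```python
def count_diff_bytes(a: bytes, b: bytes) -> int:
    """Count positions where two byte strings differ, plus any length mismatch."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def count_diff_files(path_a, path_b):
    """Read two files fully into memory and count differing bytes.

    Fine for single CD tracks (tens of MB); not meant for huge files.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return count_diff_bytes(fa.read(), fb.read())


print(count_diff_bytes(b"RIFFdata", b"RIFFdaXa"))
```

Note that this compares the whole file, WAV header included, so two rips with identical audio but different header fields would still show a few differences.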
Thanks for mentioning vbindiff. I examine/compare binary files a lot (reverse engineering for security) and I've used a bunch of tools, but somehow I've never come across vbindiff.
First thought: Invert the phase of one of the tracks 180 degrees, then sum them together. Anywhere there is a bump in the waveform would show a difference between the two sources.
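On decoded PCM samples the invert-and-sum trick is just a subtraction. A minimal sketch, assuming both rips are already decoded into equal-length lists of integer samples (the helper names are hypothetical):

```python
def null_test(a, b):
    """Invert b and sum it with a; identical inputs cancel to all zeros."""
    inverted = [-s for s in b]
    return [x + y for x, y in zip(a, inverted)]


def first_difference(residual):
    """Index of the first non-zero residual sample, or None if the rips match."""
    for i, value in enumerate(residual):
        if value != 0:
            return i
    return None


rip_a = [0, 120, -340, 560, 75]
rip_b = [0, 120, -340, 561, 75]   # one sample off by one
residual = null_test(rip_a, rip_b)
print(residual, first_difference(residual))
```

This only works if the two rips are sample-aligned; a drive offset of even one sample makes the whole residual non-zero.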
Funny, you can use this technique sometimes to pull clean vocals (for remixing) from certain kinds of music (they have to be panned center in an otherwise stereo mix):
a) reverse (invert) one of the channels and then sum them; the sum will contain only the stereo data
b) reverse and sum (a) with the original track to get only the mono content
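A rough sketch of the idea in Python, using the usual mid/side decomposition rather than the exact steps above. The tiny `vocal` and `guitar` signals are invented for illustration: the vocal is panned center and the guitar sits only in the left channel.

```python
def side_and_mid(left, right):
    """Split a stereo signal into side (L - R) and mid ((L + R) / 2)."""
    side = [l - r for l, r in zip(left, right)]
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    return side, mid


vocal = [10, -20, 30]    # identical in both channels (panned center)
guitar = [4, 4, -6]      # present in the left channel only

left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)

side, mid = side_and_mid(left, right)
print(side)  # the centered vocal cancels completely
print(mid)   # vocal plus the guitar at half level
```

The side signal drops the vocal entirely, but the mid signal still carries the guitar at reduced level, which is part of why this only gives clean results on certain kinds of mixes.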
Finding the signal difference is easy: the nature of PCM means we can just subtract one recording from another.
However, human hearing doesn't just import the signal; the initial processing is done during acquisition itself. Psychoacoustic lossy audio encodings like MP3 rely on this: an MP3 sounds to you very much like the original, but a naive audiodiff would find huge differences.
As a fellow ripper of physical CDs in 2018, the point is that a correct rip of CDDA from the CD is the one and only way to be sure you won't ever have to rip that music again.
Any other file, any other delivery/download mechanism, any other format ... you'll be fooling around with that song again sometime. But not if you have the correct, error free rip of the WAV/PCM from the CD.
So it's really not about the sound vs. some other format ...
> you'll be fooling around with that song again sometime
I've been ripping in AAC 256kbps for 15 years now and have yet to see any reason to revisit any of those CDs. Even if a more efficient encoding mechanism gains mainstream support, I'll just use that for new rips only since storage is cheap enough.
It's not that cheap though. My laptop's 1 TB SSD can comfortably store 60 GB of AAC files and I forget they even exist, but if those were uncompressed, the 300 GB they take up would be a major problem.
I'm aware - my collection is nearing 1 TB of lossless music at this point.
However, I store it on a local NAS as well as an external SSD (in case I need it in a portable way, which rarely happens) - I opted for an expensive SSD because it's very light, small and portable, but a traditional external drive would handle music just fine if price is a major concern.
In the end everyone has different requirements, but since I consider ripping a huge amount of music a major task I wouldn't want to repeat, I'd make sure I do it right from the beginning - and opting for a lossy format doesn't fit into that idea.
also being lossless you are futureproofed on new formats you might need to convert to.
I maintain both: a tree of lossless files and an exact copy already encoded as 320kbit mp3 that is automatically maintained (just because i can quickly cram those into a usb thumbdrive and they'll play anywhere -- i can't say the same about flac support).
How do you _automatically_ maintain a shadow-library? I always argued against the parallel-collection-approach as you'll likely run into inconsistencies while trying to maintain it manually.
I'm glad iTunes offers to transcode my lossless collection to lossy AAC on the fly before syncing to the iPhone (yes, some people still do that).
Of course my concern is storage in mobile devices. Yours seems to be flac support? However is that really an issue these days? I use ALAC (due to using iOS devices), but even then I never run into compatibility issues. Any worthwhile software/hardware supports either format - with Apple being the big outlier requiring ALAC.
> also being lossless you are futureproofed on new formats you might need to convert to.
Part of my original point was that I've been using AAC exclusively for 15 years now, and with all the content encoded in AAC now I don't see support for it disappearing in the next 15 either.
As another fellow ripper of physical CDs - is there a good way to check for quality of my rips besides looking at the bitrate and listening to all of them (talking mp3s here, of course)?
I'm not interested in perfect lossless quality, but I'm quite sure I should re-rip the CDs I did 10-15 years ago, but for VBR mp3s it's kinda hard to decide what is 'good enough', as compared to say 128kb/s rips - so I'm looking for a scriptable way to give me a rough "keep/maybe/definitely throw away" metric...
Rephrasing the question: What are people typically using these days when not going for FLAC? Some of my mp3s are definitely from the "HDD space is expensive" age 15+ years ago...
MP3, with LAME (-q0). A couple of the really old rips used average bit rate instead. Will probably re-rip those at some point, but the rest sounds fine to my ears.
The article says "there were about 20,000 bytes different between each rip", which if it happened in one stretch corresponds to ~113ms of audio. Depending on the exact nature of the error and how it changes the samples, it may be very obvious (e.g. a 113ms burst of silence or static) or inaudible (one different bit in a few hundred thousand, which probably won't even get past the lowpass filtering on the DAC).
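The ~113ms figure follows from the CD audio format (44.1 kHz, 2 channels, 16 bits per sample). A quick sketch of the arithmetic, with `bytes_to_ms` being a name invented here:

```python
SAMPLE_RATE = 44_100      # CD audio samples per second, per channel
CHANNELS = 2              # stereo
BYTES_PER_SAMPLE = 2      # 16-bit PCM


def bytes_to_ms(n_bytes):
    """Duration in milliseconds of n_bytes of contiguous CD audio."""
    bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE  # 176,400
    return 1000 * n_bytes / bytes_per_second


print(round(bytes_to_ms(20_000)))
```

This assumes all 20,000 bytes are contiguous; if they are scattered across the track, the duration of any single glitch would be much shorter.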
That 20,000 bytes might not be a real difference, as I understand OP. It sounds like it's the XOR or Hamming distance, byte or bitwise - in which case you could get very large differences in the number from very small errors like dropping 1 bit (thereby putting all following bits out of alignment), yet the perceptual difference would be inaudible (and might not exist at all, depending on how the encoding works), e.g. 0101010 vs 101010. You'd want to calculate some sort of edit distance which takes into account the possible bit flips and burst errors to see how big the real delta is. And of course, as pointed out in the other comments, for all of OP's hard work, there's no way to know which source is right or how many errors remain in his final version.
This matters for long-term music archival when you want to sample it and play with the audio, which could exaggerate the defects as you edit it. We should preserve pristine source material.
Are you aware of CUETools, CUERipper and the ctdb for ripping and repairing? Just wondering if it would have helped your case. It sure helped me recover some damaged old CDs.
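A small sketch of why a positional count can overstate the damage. It uses a dropped sample rather than a dropped bit, random data in place of real audio, and Python's `difflib` as a stand-in for a proper edit distance; all of those are simplifying assumptions.

```python
import difflib
import random


def positional_diffs(a, b):
    """Hamming-style count: compare position by position, plus length mismatch."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def edit_ops(a, b):
    """Non-matching edit operations needed to turn a into b."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


rng = random.Random(0)
original = [rng.randrange(-32768, 32768) for _ in range(1000)]
damaged = original[:100] + original[101:]   # a single sample dropped

print(positional_diffs(original, damaged))  # large: everything after index 100 is shifted
print(edit_ops(original, damaged))          # a single 'delete' operation
```

The positional count flags nearly everything after the dropped sample, while the alignment-aware comparison reports one deletion.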
Available at http://cue.tools/wiki/Main_Page
Not the author but a very happy customer
If the TLMC is lossless and gapless, then you could paste together the last few seconds of track 2, all of track 3, and the first few seconds of track 4, then search for an offset that yields a minimum diff.
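The offset search could be sketched as a brute-force scan like the one below. It assumes both sources are decoded to lists of integer samples; `best_offset` and the synthetic `reference` data are made up for illustration.

```python
def best_offset(reference, candidate, max_shift):
    """Slide candidate along reference and return (shift, cost) with the lowest
    sum of absolute differences, where candidate[i] is compared to
    reference[i + shift]. Returns None if candidate never fits."""
    best = None
    for shift in range(max_shift + 1):
        window = reference[shift:shift + len(candidate)]
        if len(window) < len(candidate):
            break  # candidate would run past the end of reference
        cost = sum(abs(x - y) for x, y in zip(window, candidate))
        if best is None or cost < best[1]:
            best = (shift, cost)
    return best


# Synthetic stand-in for "end of track 2 + track 3 + start of track 4".
reference = [(i * i * 37) % 101 - 50 for i in range(60)]
candidate = reference[7:27]   # the other rip, starting 7 samples in

print(best_offset(reference, candidate, 20))
```

A cost of zero at some shift would mean the two rips agree exactly once aligned. Given the normalize issue mentioned upthread, a TLMC copy would likely need its gain matched first, so expect a minimum rather than an exact zero.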
He has no idea if he has an exact copy of the actual track. He has a copy of the data that is consistently ripped by his drives. There is nothing to assure that these are the same. Sounding fine -- for some values of fine -- is the only way to tell if it's correct -- for some values of correct.
Sounding fine says nothing; rather, it's not sounding fine that tells you something, namely that ripping failed. Having some of the least significant bits of a sample flip, for example, is inaudible and will sound fine.
Sure, op has no absolute guarantee, but the fact that his crc32 matches that of the AccurateRip DB is, apart from his statistical approach, another strong indicator that he actually got it right. The alternative is that either the other user who submitted the crc32 to the database coincidentally ended up with exactly the same read errors, or that it's a hash collision (not entirely unlikely given the size of the checksum).
He didn't get a match against the AccurateRip database:
> That single AccurateRip entry for this album matched my CRCs for all tracks except track #3 – they had 0x84B9DD1A, vs my result of 0xA595BC09. I suspect that original ripper didn’t realize their disk was bad.
I've enjoyed the Touhou music fandom for a good many years now. I don't play the games, I just like the music - it's fascinating how big and diverse it is while still being unified around a canon of Zun themes/melodies.
It reminds me of traditional Japanese court poetry, where half the esthetic work is done by knowledge of allusions and borrowings, and the poet is trying to express a familiar theme in a slightly better way rather than seeking novelty like most forms of literature.
A very large lossless (.ogg) collection of doujin [0] Touhou [1] music. Googling the name should have taken you to the website [2] which might provide a bit more information. It's split into three torrents: One for just the music (1.65TB in size), one for Album Scans (76.5GB in size), and another for extras like lyrics, bundled item scans, and other misc. extras (24.2GB in size).
Or in simple terms: A large collection of cover songs for the soundtracks of a popular Japanese bullet-hell game series.
Thanks for the correction. I converted everything to .flac to maintain consistency with my personal library, so I must have forgotten which other lossless format I had converted from.
If anyone is terribly curious what it sounds like, I've put up an OGG copy here: https://www.dropbox.com/s/u88u1xpdmdxbqal/03%20%E5%BC%98%E4%...
Oh well. I'm sure it was a great learning experience anyway. :)