This is a little awkward. The Touhou album in question is already in the Touhou ...

Hakkin · on July 31, 2018

A decent number of the albums in the TLMC were ripped and uploaded to Japanese P2P programs. A large number of these rips were made "losslessly" with EAC, but with the "normalize" option turned on in the settings, which adjusts the volume of the rip if it's above or below a certain threshold, basically undoing all the work EAC put into getting a bit-exact copy. This is likely why the rip in the TLMC doesn't match OP's rip.

Edit: If anyone is more curious about this, a large amount of the rips in the TLMC were ripped by one uploader (or a group of people using a singular alias) on the Japanese P2P programs Share[0] and Perfect Dark[1].

He was so prolific in the scene that he has his own Baidu wiki page[2] (granted, it's not as strict about notability as Wikipedia, but I still find it impressive).

All of his rips, as far as I know, were affected by this "normalize" setting, so even though he ripped hundreds (thousands?) of albums, a large majority of them probably wouldn't be considered "archive quality".

[0] https://en.wikipedia.org/wiki/Share_(P2P)

[1] https://en.wikipedia.org/wiki/Perfect_Dark_(P2P)

[2] https://baike.baidu.com/item/%E5%8F%B0%E9%AD%94%E7%8E%8B

mxfh · on July 31, 2018

It also depends on whether ReplayGain information is merely stored as extra, non-destructive metadata or applied to the waveform: https://wiki.hydrogenaud.io/index.php?title=ReplayGain

squarefoot · on July 31, 2018

Normalizing alone, with no compression/limiting involved, shouldn't ruin fidelity at all. In some (admittedly extreme) situations it could actually improve it. A track that is very quiet from start to end could leave one or more of the most significant bits unused, effectively reducing bit depth, so especially during mastering I would welcome normalization if it brought the highest peak to 0 dB or slightly below. Of course I mean just normalizing without any compression/limiting, i.e. the dynamics curve stays the same: I'm well aware of the loudness wars plague.

Hakkin · on July 31, 2018

I believe you're correct that raising the volume is not necessarily a destructive process, since it would just be increasing the amplitude of the samples already there.

Unfortunately, the majority of rips exceed EAC's normalize threshold, which means the volume is decreased, which ends up truncating some of the bits (assuming the album was using the full range of 16-bit audio).

Edit: I noticed you used the term "fidelity", so you might be talking about being able to perceive the damage to the audio, which is not necessarily what I'm talking about.

The point of EAC (Exact Audio Copy) is to make archival quality copies of audio CDs, this means as close to bit-exact as possible. Modifying the audio in any way would be counterproductive to that purpose, even if it doesn't necessarily affect the actual sound quality.
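To illustrate why a volume decrease can't be undone, here is a minimal Python sketch that models normalization as a plain gain followed by rounding back to 16-bit integers. EAC's actual normalize algorithm isn't described in this thread, so the gain model, the function names and the -3 dB figure are assumptions for illustration only.

```python
def apply_gain(samples, gain_db):
    """Scale 16-bit samples by gain_db, rounding and clipping back to the int16 range."""
    factor = 10 ** (gain_db / 20)
    return [max(-32768, min(32767, round(s * factor))) for s in samples]


def round_trip_mismatches(samples, gain_db):
    """Apply a gain, then the opposite gain, and count samples that didn't survive."""
    restored = apply_gain(apply_gain(samples, gain_db), -gain_db)
    return sum(1 for a, b in zip(samples, restored) if a != b)


every_sample = list(range(-32768, 32768))  # every possible 16-bit value
print(round_trip_mismatches(every_sample, 0.0))   # 0: unity gain is lossless
print(round_trip_mismatches(every_sample, -3.0))  # nonzero: detail lost for good
```

A -3 dB gain squeezes 65,536 possible input values into roughly 46,000 output values, so some distinct samples must collapse together and no inverse gain can restore all of them.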

jmillikin · on July 31, 2018

Most of the TLMC rips are range rips with unclear CUE data, which makes it difficult or impossible to verify they were correctly ripped.

For this album in particular, the audio CRCs of the other tracks don't match up between TLMC and a fresh rip:

TLMC:

  01.wav: 9A5E3226
  02.wav: 577A675B
  03.wav: 3548C299
  04.wav: E5DEC006
  05.wav: D33AC4AE
  06.wav: 7427AC6F
  07.wav: 83A58517
  08.wav: 0DCA8419
  09.wav: 703BAEAC

My rip:

  01.wav: 565F2B5A
  02.wav: E916B3A2
  03.wav: A595BC09
  04.wav: D989F0D6
  05.wav: C6A2DD2B
  06.wav: C8403284
  07.wav: D01D6BC4
  08.wav: 8786C1AA
  09.wav: 1E3641DD
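For anyone wanting to reproduce this kind of comparison, a minimal Python sketch that prints a CRC32 over just the PCM frames of each .wav file (skipping the RIFF header, so header or tag differences don't count). It's an assumption that the CRCs above were computed this way; EAC and AccurateRip each use their own checksum variants, so these numbers may not match theirs. TTA/FLAC files would need to be decoded to WAV first.

```python
import sys
import wave
import zlib


def pcm_crc32(source):
    """CRC32 of the PCM frames in a WAV file; `source` is a path or binary file object."""
    crc = 0
    with wave.open(source, "rb") as w:
        while True:
            chunk = w.readframes(65536)  # read in blocks to keep memory use flat
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running CRC across blocks
    return crc


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: {pcm_crc32(path):08X}")
```

Usage would be something like `python pcm_crc.py 01.wav 02.wav` (the script name is hypothetical).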

rococode · on July 31, 2018

The CUE data in TLMC is a real bummer. I don't know if it's my music player, Clementine, but a good portion of the tracks I listened to in TLMC didn't advance correctly and instead just kept playing with the metadata of the first song in the album I was listening to. At this point I've mostly just given up on it... It took me 3 months to download it too, haha (curse you, Comcast 1TB monthly data limit!)
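For context on how CUE data drives track changes in a range rip: each INDEX 01 time in the sheet is mm:ss:ff, where ff counts CD frames of 1/75 s, and at 44.1 kHz one frame is 588 samples. A player that mishandles these entries would see one long track, which could produce a symptom like the one above (a guess, not a diagnosis of Clementine). A minimal Python sketch, assuming a single-FILE sheet; the function names are hypothetical.

```python
import re

SAMPLES_PER_CD_FRAME = 588  # 44,100 samples/s / 75 CD frames/s


def msf_to_samples(msf):
    """Convert a CUE 'mm:ss:ff' time (ff = frames of 1/75 s) to a sample offset."""
    minutes, seconds, frames = (int(part) for part in msf.split(":"))
    return ((minutes * 60 + seconds) * 75 + frames) * SAMPLES_PER_CD_FRAME


def track_starts(cue_text):
    """Map each track number to the sample offset of its INDEX 01 entry.

    INDEX 00 (pregap) lines are ignored here.
    """
    starts = {}
    current_track = None
    for raw in cue_text.splitlines():
        line = raw.strip()
        track = re.match(r"TRACK\s+(\d+)\s+AUDIO", line)
        if track:
            current_track = int(track.group(1))
            continue
        index = re.match(r"INDEX\s+01\s+(\d+:\d{2}:\d{2})", line)
        if index and current_track is not None:
            starts[current_track] = msf_to_samples(index.group(1))
    return starts


cue = """FILE "album.wav" WAVE
  TRACK 01 AUDIO
    INDEX 01 00:00:00
  TRACK 02 AUDIO
    INDEX 00 04:10:30
    INDEX 01 04:12:00
"""
print(track_starts(cue))  # {1: 0, 2: 11113200}
```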

scrollaway · on July 31, 2018

Is there such a thing as an audiodiff to tell how different they are from yours?

jmillikin · on July 31, 2018

Yes: you can use Audacity (https://www.audacityteam.org/) to "diff" two waveforms by subtracting one from the other.

For comparing .wav files you can also treat them as plain binary data, and use standard diff tools. For example I used `cmp -l` to count differences between rips, and `vbindiff` to view them.
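A rough Python equivalent of counting differences with `cmp -l a.wav b.wav | wc -l`, for places where cmp isn't handy. Keep in mind that comparing whole .wav files also compares the headers, and a rip that is offset by even one sample can show a large share of bytes as different. The function name is hypothetical.

```python
import sys


def count_differing_bytes(path_a, path_b, chunk_size=1 << 20):
    """Count byte positions where two files differ.

    Similar to `cmp -l a b | wc -l`, except that bytes beyond the end of the
    shorter file are also counted (cmp stops there and reports EOF instead).
    """
    diffs = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            diffs += sum(x != y for x, y in zip(a, b))  # same offset in both files
            diffs += abs(len(a) - len(b))               # tail of the longer file
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    print(count_differing_bytes(sys.argv[1], sys.argv[2]))
```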

thr0w__4w4y · on July 31, 2018

Thanks for mentioning vbindiff. I examine/compare binary files a lot (reverse engineering for security) and I've used a bunch of tools, but somehow I've never come across vbindiff.

dylan604 · on July 31, 2018

First thought: invert the phase of one of the tracks by 180 degrees, then sum them together. Anywhere there is a bump in the waveform would show a difference between the two sources.
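Inverting one signal and summing is the same as subtracting it. A minimal Python sketch of that "null test" for 16-bit PCM WAV files, assuming both files are already sample-aligned (an offset between rips would make the residual large everywhere). The function names are hypothetical.

```python
import array
import sys
import wave


def read_samples(source):
    """Read a 16-bit PCM WAV (path or file object) into a flat array of interleaved samples."""
    with wave.open(source, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def null_test(a, b):
    """Invert b's polarity and sum it with a (i.e. a - b).

    Returns (number of nonzero residual samples, peak absolute residual).
    Only the overlapping length is compared.
    """
    residual = [x - y for x, y in zip(a, b)]
    nonzero = sum(1 for r in residual if r != 0)
    peak = max((abs(r) for r in residual), default=0)
    return nonzero, peak


if __name__ == "__main__" and len(sys.argv) == 3:
    print(null_test(read_samples(sys.argv[1]), read_samples(sys.argv[2])))
```

A result of `(0, 0)` means the two files carry identical audio over their common length.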

tomc1985 · on July 31, 2018

Funny, you can use this technique sometimes to pull clean vocals (for remixing) from certain kinds of music (they have to be panned center in an otherwise stereo mix):

a) invert (phase-reverse) one of the channels and then sum them, and your sum will contain only the stereo content

b) invert (a) and sum it with the original track to get only the mono content
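The steps above are a variant of mid/side processing. A minimal Python sketch of the standard mid/side split (the textbook decomposition, not a literal transcription of steps a and b): side = L - R cancels anything panned dead centre, while mid = (L + R) / 2 keeps what both channels share. Real mixes rarely cancel perfectly, so "clean" vocals are the best case. The function and variable names are hypothetical.

```python
def mid_side(left, right):
    """Split stereo channels into mid (L + R) / 2 and side L - R."""
    mid = [(lch + rch) / 2 for lch, rch in zip(left, right)]  # what both channels share
    side = [lch - rch for lch, rch in zip(left, right)]       # centre-panned content cancels
    return mid, side


vocal = [5, -3, 2]                      # identical in both channels (panned centre)
guitar_left, guitar_right = [1, 0, 0], [0, 0, 1]
left = [v + g for v, g in zip(vocal, guitar_left)]
right = [v + g for v, g in zip(vocal, guitar_right)]
mid, side = mid_side(left, right)
print(side)  # [1, 0, -1]: the vocal is gone
print(mid)   # [5.5, -3.0, 2.5]: vocal plus the averaged guitar
```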

tialaramex · on July 31, 2018

Finding the signal difference is easy, the nature of PCM means we can just subtract one recording from another.

However, human hearing doesn't just take in the raw signal; some of the initial processing happens during acquisition itself. Psychoacoustic lossy audio encodings like MP3 rely on this: an MP3 sounds very much like the original to you, but a naive audiodiff would find huge differences.

r721 · on July 31, 2018

Audio DiffMaker: http://libinst.com/Audio%20DiffMaker.htm

gwern · on July 31, 2018

But do they sound any different?

jmillikin · on July 31, 2018

You're asking the wrong person: I can't reliably tell the difference between FLAC and YouTube.

When you've got a .wav file open in a hex editor, it's no longer really about the music itself.

plg · on July 31, 2018

I bet you could, given good audio equipment (good DAC/amp/speakers or DAC/amp/headphones)

rsync · on July 31, 2018

"But do they sound any different?"

As a fellow ripper of physical CDs in 2018: the point is that a correct rip of the CDDA from the CD is the one and only way to be sure you won't ever have to rip that music again.

Any other file, any other delivery/download mechanism, any other format ... you'll be fooling around with that song again sometime. But not if you have the correct, error free rip of the WAV/PCM from the CD.

So it's really not about the sound vs. some other format ...

kalleboo · on July 31, 2018

> you'll be fooling around with that song again sometime

I've been ripping in AAC 256kbps for 15 years now and have yet to see any reason to revisit any of those CDs. Even if a more efficient encoding mechanism gains mainstream support, I'll just use that for new rips only since storage is cheap enough.

thirdsun · on July 31, 2018

> since storage is cheap enough.

That's exactly the point. Using anything but a lossless codec for ripping music is an unwise decision.

kalleboo · on July 31, 2018

It's not that cheap, though. My laptop's 1 TB SSD can comfortably store 60 GB of AAC files and I forget they even exist, but uncompressed, those files would take up 300 GB, which would be a major problem.
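A quick Python check of those figures: CD audio is 44,100 samples/s × 2 channels × 16 bits = 1411.2 kbit/s, so uncompressed PCM is about 5.5 times larger than a nominal 256 kbit/s stream. The nominal bitrate is an assumption (real AAC files vary), and a lossless codec like FLAC would come in below raw PCM by an amount that depends on the music. The function name is hypothetical.

```python
# Red Book CD audio: 44,100 samples/s, 2 channels, 16 bits per sample.
CD_KBPS = 44_100 * 2 * 16 / 1000  # 1411.2 kbit/s


def uncompressed_gb(lossy_gb, lossy_kbps):
    """Estimate the PCM size of a library from its size at a given lossy bitrate."""
    return lossy_gb * CD_KBPS / lossy_kbps


print(round(CD_KBPS / 256, 2))          # ~5.51x larger than 256 kbit/s
print(round(uncompressed_gb(60, 256)))  # roughly 331 (GB)
```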

thirdsun · on Aug 1, 2018

I'm aware - my collection is nearing 1 TB of lossless music at this point.

However, I store it on a local NAS as well as an external SSD (in case I need it in a portable way, which rarely happens). I opted for an expensive SSD because it's very light, small and portable, but a traditional external drive would handle music just fine if price is a major concern.

In the end, everyone has different requirements, but since I consider ripping a huge amount of music a major task I wouldn't want to repeat, I'd make sure to do it right from the beginning, and opting for a lossy format doesn't fit that idea.

fb03 · on Aug 1, 2018

Also, by staying lossless you are future-proofed for any new formats you might need to convert to.

I maintain both: a tree of lossless files and an exact copy already encoded as 320 kbit MP3 that is automatically maintained (just because I can quickly cram those onto a USB thumb drive and they'll play anywhere; I can't say the same about FLAC support).

o/

thirdsun · on Aug 1, 2018

How do you _automatically_ maintain a shadow-library? I always argued against the parallel-collection-approach as you'll likely run into inconsistencies while trying to maintain it manually.

I'm glad iTunes offers to transcode my lossless collection to lossy AAC on the fly before syncing to the iPhone (yes, some people still do that).

Of course, my concern is storage on mobile devices. Yours seems to be FLAC support? But is that really an issue these days? I use ALAC (due to using iOS devices), but even then I never run into compatibility issues. Any worthwhile software/hardware supports either format - with Apple being the big outlier requiring ALAC.
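One hypothetical way to keep a lossy shadow library in sync automatically: mirror the directory tree, re-encode whenever the lossy copy is missing or older than its lossless source, and flag lossy files whose source has disappeared. A minimal Python sketch that only plans the work; the function names and the .flac/.mp3 extensions are assumptions, and the actual encode step (with whatever encoder you use) is left out.

```python
from pathlib import Path


def _mirror(path, from_root, to_root, new_suffix):
    """Map a file under from_root to the same relative path under to_root."""
    return (to_root / path.relative_to(from_root)).with_suffix(new_suffix)


def stale_targets(src_root, dst_root, src_ext=".flac", dst_ext=".mp3"):
    """(source, target) pairs whose lossy copy is missing or older than the source."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    pairs = []
    for src in sorted(src_root.rglob("*" + src_ext)):
        dst = _mirror(src, src_root, dst_root, dst_ext)
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            pairs.append((src, dst))
    return pairs


def orphaned_targets(src_root, dst_root, src_ext=".flac", dst_ext=".mp3"):
    """Lossy files whose lossless source was deleted or renamed."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    return [dst for dst in sorted(dst_root.rglob("*" + dst_ext))
            if not _mirror(dst, dst_root, src_root, src_ext).exists()]
```

Run periodically, encoding each stale pair and deleting the orphans is what would keep the two trees from drifting apart.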

kalleboo · on Aug 1, 2018

> also being lossless you are futureproofed on new formats you might need to convert to.

Part of my original point was that I've been using AAC exclusively for 15 years now, and with all the content encoded in AAC now I don't see support for it disappearing in the next 15 either.

thirdsun · on Aug 2, 2018

Probably not, but now AAC support is forever a fixed, inevitable requirement for you.

wink · on July 31, 2018

As another fellow ripper of physical CDs: is there a good way to check the quality of my rips besides looking at the bitrate and listening to all of them (talking mp3s here, of course)? I'm not interested in perfect lossless quality, and I'm quite sure I should re-rip the CDs I did 10-15 years ago, but with VBR mp3s it's hard to decide what is 'good enough' compared to, say, 128 kb/s rips. So I'm looking for a scriptable way to get a rough "keep/maybe/definitely throw away" metric...

Rephrasing the question: What are people typically using these days when not going for FLAC? Some of my mp3s are definitely from the "HDD space is expensive" age 15+ years ago...
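Bitrate alone is only a rough proxy (it can't detect an MP3 that was transcoded from another lossy file, for instance), but a scriptable first pass could look like this minimal Python sketch. The function names and thresholds are arbitrary, hypothetical examples, and the duration would have to come from a tag or stream reader of your choice.

```python
def average_kbps(file_size_bytes, duration_seconds):
    """Average bitrate in kbit/s; tags and embedded cover art inflate this slightly."""
    return file_size_bytes * 8 / duration_seconds / 1000


def triage(kbps, keep_at=190, maybe_at=128):
    """Bucket a file by average bitrate. The thresholds are arbitrary examples."""
    if kbps >= keep_at:
        return "keep"
    if kbps >= maybe_at:
        return "maybe"
    return "definitely throw away"


print(triage(average_kbps(4_000_000, 200)))  # 160 kbit/s -> "maybe"
```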

ahje · on July 31, 2018

MP3, with LAME (-q0). A couple of the really old rips used average bit rate instead. Will probably re-rip those at some point, but the rest sounds fine to my ears.

namibj · on July 31, 2018

Usually the formats are FLAC, 320, V0, V3.

_9vzr · on July 31, 2018

I've never seen anyone use V3 for VBR rips, usually V2.

userbinator · on July 31, 2018

The article says "there were about 20,000 bytes different betwen each rip", which if it happened in one stretch corresponds to ~113ms of audio. Depending on the exact nature of the error and how it changes the samples, it may be very obvious (e.g. a 113ms burst of silence or static) or inaudible (one different bit in a few hundred thousand --- probably won't even get past the lowpass filtering on the DAC.)
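The arithmetic behind the ~113 ms figure, as a small Python sketch (the function name is hypothetical): CD audio is 44,100 samples/s × 2 channels × 2 bytes = 176,400 bytes per second.

```python
CD_BYTES_PER_SECOND = 44_100 * 2 * 2  # samples/s x channels x bytes/sample = 176,400


def bytes_to_ms(n_bytes):
    """Playback time covered by n_bytes of contiguous CD audio, in milliseconds."""
    return n_bytes / CD_BYTES_PER_SECOND * 1000


print(f"{bytes_to_ms(20_000):.1f} ms")  # 113.4 ms
```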

gwern · on July 31, 2018

That 20,000 bytes might not be a real difference, as I understand OP. It sounds like it's the XOR or Hamming distance, byte- or bitwise, in which case you could get very large differences in the number from very small errors like dropping 1 bit (thereby putting all following bits out of alignment), yet the perceptual difference would be inaudible (and might not exist at all, depending on how the encoding works), e.g. 0101010 vs 101010. You'd want to calculate some sort of edit distance which takes into account the possible bit flips and burst errors to see how big the real delta is. And of course, as pointed out in the other comments, for all of OP's hard work, there's no way to know which source is right or how many errors remain in his final version.
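To make the alignment point concrete, a small Python sketch comparing a position-by-position (Hamming-style) count with Levenshtein edit distance on the example strings above. Levenshtein is one standard edit distance, not necessarily what OP or gwern had in mind, and its O(n·m) cost makes it impractical on a whole CD's worth of bytes.

```python
def hamming(a, b):
    """Positions that differ when compared index by index (plus any length difference)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (or match)
        prev = cur
    return prev[-1]


print(hamming("0101010", "101010"))      # 7: one dropped bit misaligns everything
print(levenshtein("0101010", "101010"))  # 1: a single deletion
```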

ddevault · on July 31, 2018

This matters for long-term music archival when you want to sample it and play with the audio, which could exaggerate the defects as you edit it. We should preserve pristine source material.

zuzzurro · on July 31, 2018

Are you aware of CUETools, CUERipper and the CTDB for ripping and repairing? Just wondering if they would have helped in your case. They certainly helped me recover some damaged old CDs. Available at http://cue.tools/wiki/Main_Page (not the author, but a very happy customer).

aidenn0 · on July 31, 2018

If the TLMC rip is lossless and gapless, then you could paste together the last few seconds of track 2, all of track 3, and the first few seconds of track 4, then search for an offset that yields a minimal diff.
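A brute-force Python sketch of that offset search over lists of samples: it tries every shift in a window and keeps the one with the lowest mismatch rate over the overlapping region. The function name and window size are illustrative assumptions; for real tracks you would first decode and concatenate the samples as described above.

```python
def best_offset(reference, candidate, max_shift):
    """Find the shift s in [-max_shift, max_shift] minimising mismatches between
    reference[i] and candidate[i + s] over their overlap.

    Returns (shift, mismatches at that shift).
    """
    best = None  # (mismatch rate, shift, mismatches)
    for s in range(-max_shift, max_shift + 1):
        lo = max(0, -s)
        hi = min(len(reference), len(candidate) - s)
        if hi <= lo:
            continue  # no overlap at this shift
        mismatches = sum(reference[i] != candidate[i + s] for i in range(lo, hi))
        rate = mismatches / (hi - lo)
        if best is None or rate < best[0]:
            best = (rate, s, mismatches)
    if best is None:
        raise ValueError("no shift produces any overlap")
    return best[1], best[2]


ref = [(i * i * 31 + 7) % 1000 for i in range(200)]  # synthetic, non-periodic samples
print(best_offset(ref, [0, 0, 0] + ref, 10))  # (3, 0)
```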

iforgotpassword · on July 31, 2018

> I just checked the TTA and track 3 sounds fine.

That doesn't necessarily mean it was ripped without errors. I think op was trying to make sure he has an exact copy, not just one that sounds alright.

wang_li · on July 31, 2018

He has no idea if he has an exact copy of the actual track. He has a copy of the data that is consistently ripped by his drives. There is nothing to assure that these are the same. Sounding fine -- for some values of fine -- is the only way to tell if it's correct -- for some values of correct.

iforgotpassword · on July 31, 2018

Sounding fine says nothing, it's rather that not sounding fine tells you something, namely that ripping failed. Having some of the least significant bits of a sample flip for example is inaudible and will sound fine.

Sure, op has no absolute guarantee, but the fact that his crc32 matches that of the AccurateRip DB is, apart from his statistical approach, another strong indicator that he actually got it right. The alternative is that either the other user who submitted the crc32 to the database coincidentally ended up with exactly the same read errors, or that it's a hash collision (not entirely unlikely given the size of the checksum).

It's reasonable to assume op got a perfect rip.

wang_li · on July 31, 2018

He didn't get a match against the AccurateRip database:

> That single AccurateRip entry for this album matched my CRCs for all tracks except track #3 – they had 0x84B9DD1A, vs my result of 0xA595BC09. I suspect that original ripper didn’t realize their disk was bad.

iforgotpassword · on July 31, 2018

Huh you're right; I stand corrected then.

(Btw did you see the addition at the beginning of the article? Funky stuff going on...)

zkms · on July 31, 2018

> This is the 19th version of Touhou Lossless Music Collection torrent, current total file size ~1.75 TiB.

I didn't know this existed, but I'm grateful to you for sharing this, and now I need to buy a new hard drive.

gwern · on July 31, 2018

You're welcome! I've seeded it for years because it's an amazing project. One day I hope to listen through the whole thing.

hitekker · on July 31, 2018

Thanks for offering the OGG copy, gwern.

Were you a fan of Touhou back in its heyday, or is it just one of the many things that rouse your curiosity?

gwern · on July 31, 2018

I've enjoyed the Touhou music fandom for a good many years now. I don't play the games, I just like the music - it's fascinating how big and diverse it is while still being unified around a canon of Zun themes/melodies.

It reminds me of traditional Japanese court poetry, where half the esthetic work is done by knowledge of allusions and borrowings, and the poet is trying to express a familiar theme in a slightly better way rather than seeking novelty like most forms of literature.

sigzero · on July 31, 2018

What is "Touhou Lossless Music Collection"? Besides "music" a google search didn't get me much info.

Nadya · on July 31, 2018

A very large lossless (.ogg) collection of doujin [0] Touhou [1] music. Googling the name should have taken you to the website [2] which might provide a bit more information. It's split into three torrents: One for just the music (1.65TB in size), one for Album Scans (76.5GB in size), and another for extras like lyrics, bundled item scans, and other misc. extras (24.2GB in size).

Or in simple terms: A large collection of cover songs for the soundtracks of a popular Japanese bullet-hell game series.

[0] https://en.wikipedia.org/wiki/D%C5%8Djin

[1] https://en.wikipedia.org/wiki/Touhou_Project

[2] http://www.tlmc.eu/

gwern · on July 31, 2018

> (.ogg)

TTA usually. There is an Ogg Vorbis torrent about a tenth the size, but I'm not sure it's being kept in sync with the original TLMC.

ThatPlayer · on July 31, 2018

The tlmc v19 post has a link to the Ogg Opus (rather than Vorbis) which is updated.

Nadya · on July 31, 2018

Thanks for the correction. I converted everything to .flac to maintain consistency with my personal library, so I must have forgotten which other lossless format I had converted from.

sigzero · on July 31, 2018

It did take me to the site but I guess if you have no idea about the genre it can draw a blank. Thanks for the reply!

azinman2 · on July 31, 2018

What is this? I can’t tell from the url, other than it being a lot of music?

anilakar · on July 31, 2018

A torrent with a huge collection of Japanese self-published game music remixes focused on one specific shmup game series.

It used to be the largest torrent in circulation and also the one that broke some client programs due to its huge size.

amelius · on July 31, 2018

But what is the real checksum? Did the author get it right?