2024 lecture videos are on YouTube: https://youtube.com/playlist?list=PLoROMvodv...

rllearner · 2025-11-26T21:14:25 1764191665

One of my favorite parts of the 2024 series on YouTube was when Prof B explained her excitement just before introducing UCB algorithms (Lecture 11): "So now we're going to see one of my favorite ideas in the course, which is optimism under uncertainty... I think it's a lovely principle because it shows why it's provably optimal to be optimistic about things. Which is kind of beautiful."

Those moments are the best part of classroom education: a super knowledgeable person spends a few weeks helping you get to the point where you can finally understand something cool, and you can sense their excitement to tell you about it. I still remember learning Gauss-Bonnet, Stokes' theorem, and the Central Limit Theorem. I think optimism under uncertainty falls in that group.
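
For anyone who hasn't seen it, here is a rough sketch of what "optimism under uncertainty" looks like in code. This is textbook UCB1 on a Bernoulli bandit, which may differ in details from the version presented in the lecture; the function name `ucb1` and its parameters are my own choices for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit; return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm was pulled
    totals = [0.0] * n_arms  # summed reward observed for each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Optimistic index: empirical mean plus a confidence bonus
            # that shrinks as an arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        # Simulated reward: 1 with probability arm_means[arm], else 0.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts
```

Calling something like `ucb1([0.1, 0.9], 2000)` should show most pulls going to the better arm: rarely tried arms keep a large bonus, so they get revisited just often enough to rule them out.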

storus · 2025-11-26T16:35:47 1764174947

Those don't cover DPO/GRPO, which arguably made some parts of RL obsolete.

nafizh · 2025-11-26T20:27:06 1764188826

Check out Stanford's CS336; it covers DPO/GRPO and the relevant parts needed to train LLMs.

storus · 2025-11-27T01:23:27 1764206607

It's also covered by CS329H.

upbeat_general · 2025-11-26T21:06:57 1764191217

I can assure you that lacking knowledge of DPO (and especially GRPO, which is just stripped-down PPO) is not a dealbreaker.
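
To make the "stripped-down PPO" point concrete, here is a minimal sketch of the piece that changes, as I understand it: instead of a learned value function, GRPO scores each sampled completion relative to the other completions for the same prompt. The helper name `group_relative_advantages` and the `eps` parameter are made up for illustration, and implementations differ on details such as which standard deviation they use.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for a single prompt.

    The result stands in for the critic-based advantage estimate used in PPO.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption here
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]
```

These per-completion advantages then go into a PPO-style clipped objective, so if you already know PPO there is not much new to learn.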