


One of my favorite parts of the 2024 series on YouTube was when Prof B explained her excitement just before introducing UCB algorithms (Lecture 11): "So now we're going to see one of my favorite ideas in the course, which is optimism under uncertainty... I think it's a lovely principle because it shows why it's provably optimal to be optimistic about things. Which is kind of beautiful."

Those moments are the best part of classroom education: a super knowledgeable person spends a few weeks helping you get to the point where you can finally understand something cool, and you can sense their excitement to tell you about it. I still remember learning Gauss-Bonnet, Stokes' Theorem, and the Central Limit Theorem. I think optimism under uncertainty falls in that group.
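
For anyone who hasn't seen it, here is a rough sketch of what "optimism under uncertainty" looks like in code. This is the textbook UCB1 rule for a multi-armed bandit, not necessarily the exact variant from the lecture. The names `ucb1`, `pull`, and the bonus constant `c` are my own placeholders. Each arm's score is its average reward plus a bonus that shrinks as the arm gets tried more, so rarely tried arms get the benefit of the doubt.

```python
import math
import random


def ucb1(pull, n_arms, horizon, c=2.0):
    """Textbook UCB1 sketch. `pull(arm)` is a hypothetical callback
    that returns a reward in [0, 1] for the chosen arm."""
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # running average reward per arm
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Play every arm once so each count is nonzero.
            arm = t - 1
        else:
            # Optimism: pick the arm with the highest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental update of the running mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


if __name__ == "__main__":
    # Made-up Bernoulli arms, purely for illustration.
    probs = [0.2, 0.5, 0.8]
    rng = random.Random(0)
    counts, means = ucb1(lambda a: float(rng.random() < probs[a]), len(probs), 2000)
    print(counts, means)
```

The bonus term is the whole trick: an arm only stops being played once it has been tried enough times that even an optimistic estimate of it loses to the current best.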


Those don't have DPO/GRPO, which arguably made some parts of RL obsolete.


Check out Stanford's CS336; it covers DPO/GRPO and the relevant parts needed to train LLMs.


It's also covered by CS329H.


I can assure you that lacking knowledge of DPO (and especially GRPO, which is just stripped-down PPO) is not a dealbreaker.
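
To make the "stripped-down PPO" point concrete: as I understand it, GRPO keeps PPO's clipped policy update but drops the learned value network, and instead scores each sampled response against the other responses to the same prompt. A minimal sketch of that group-relative advantage is below. The function name, the `eps` term, and the choice of population standard deviation are my assumptions, not something taken from a specific implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize the rewards of one group of responses sampled for the
    same prompt. The group mean plays the role of the baseline that a
    learned critic provides in PPO. `eps` (an assumption) avoids
    division by zero when every reward in the group is identical."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Made-up rewards for four sampled responses to one prompt.
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

If you already know PPO, this is most of what is new: the advantages come from group statistics rather than from a critic.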



