Asked it to write PyTorch code that trains an LLM, and it produced 23 reasoning steps in 62 seconds.
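For reference, the kind of script I was asking for is roughly this — a minimal sketch with made-up hyperparameters, not what the model actually output:

    import torch
    import torch.nn as nn

    # Tiny transformer LM trained on random tokens. Every
    # hyperparameter here is illustrative, not from the model.
    vocab, d_model, seq_len, bs = 256, 128, 64, 32
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    emb = nn.Embedding(vocab, d_model).to(device)
    enc = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True).to(device)
    head = nn.Linear(d_model, vocab).to(device)
    params = list(emb.parameters()) + list(enc.parameters()) + list(head.parameters())
    opt = torch.optim.AdamW(params, lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()
    mask = nn.Transformer.generate_square_subsequent_mask(seq_len - 1).to(device)

    for step in range(100):
        x = torch.randint(0, vocab, (bs, seq_len), device=device)
        h = enc(emb(x[:, :-1]), src_mask=mask)   # causal mask: next-token prediction
        logits = head(h)                         # (bs, seq_len - 1, vocab)
        loss = loss_fn(logits.reshape(-1, vocab), x[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()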

With gpt-4o, the code failed immediately with random errors like mismatched tensor shapes.
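My guess at the kind of shape bug it kept hitting (illustrative, not its actual code): CrossEntropyLoss wants the class dimension second, but LM code naturally produces (batch, seq, vocab) logits.

    import torch
    import torch.nn as nn

    loss_fn = nn.CrossEntropyLoss()
    logits = torch.randn(8, 16, 100)          # (batch, seq, vocab)
    targets = torch.randint(0, 100, (8, 16))  # (batch, seq)

    # loss_fn(logits, targets) raises a shape error as-is;
    # flattening both (or transposing the class dim) fixes it:
    loss = loss_fn(logits.reshape(-1, 100), targets.reshape(-1))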

The code produced by o1 seemed to work at first, but after some training time it crashed with mismatched batch sizes. Also, o1 enabled CUDA on its own, while with gpt-4o I had to spell it out explicitly (it always defaulted to CPU). However, showing o1 the error output resulted in broken code again.
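My guess at the batch-size failure (a made-up repro, not the generated code): the loop hard-coded the batch dimension somewhere, so the smaller final batch from the DataLoader blew up mid-epoch. drop_last=True sidesteps it; the device line is the one gpt-4o kept omitting.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # 1000 samples with batch_size=32 would leave a final batch
    # of 8; drop_last=True keeps every batch exactly 32 rows.
    data = TensorDataset(torch.randint(0, 256, (1000, 64)))
    loader = DataLoader(data, batch_size=32, shuffle=True, drop_last=True)

    for (x,) in loader:
        x = x.to(device)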

I noticed that back-and-forth iteration when it makes mistakes is a worse experience now, because every turn adds a 30-60 second delay. It took 5 back-and-forths before it produced something that doesn't crash (just like with gpt-4o). I also suspect that too many tokens inside the CoT context can make it accidentally forget earlier details.

So there's some improvement, but we're still not there...


