
(I work at π.)

Happy to answer any questions on the model, hardware, etc.



I saw your foundation model is trained on data from several different robots. Is the plan to eventually train a foundation model that can control any robot zero-shot? That is, the effect of actuations on video/sensor input is collected and understood in-context, and the actuations are corrected to yield the intended behavior, all in-context. Is this feasible?

More specifically, has your model already exhibited this type of capability, in principle?
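
To illustrate what I mean by "understood in-context", here is a toy sketch (entirely my own, nothing to do with π's actual architecture): a controller that keeps a history of (action, observed effect) pairs, estimates the unknown per-axis actuation gains on the fly, and corrects its next action toward the intended effect.

    import numpy as np

    # Toy illustration only (my own, not π's method): the "robot" is an unknown
    # diagonal linear system; the controller estimates the per-axis actuation
    # gains purely from the (action, effect) pairs it has seen so far and
    # corrects its next action toward the intended effect.

    rng = np.random.default_rng(0)
    true_gain = rng.uniform(0.5, 2.0, size=3)   # unknown, robot-specific gains

    def robot_step(action):
        """Stand-in for the real robot: effect = gain * action + sensor noise."""
        return true_gain * action + rng.normal(0.0, 0.01, size=3)

    target_effect = np.array([0.3, -0.1, 0.2])
    history_a, history_e = [], []
    action = np.zeros(3)

    for _ in range(20):
        effect = robot_step(action)
        history_a.append(action)
        history_e.append(effect)
        A, E = np.array(history_a), np.array(history_e)
        # Per-axis least-squares gain estimate from the context collected so far.
        num, den = (A * E).sum(axis=0), (A * A).sum(axis=0)
        gain_est = np.where(den > 1e-9, num / np.maximum(den, 1e-9), 1.0)
        # Correct the next action so the predicted effect matches the target.
        action = target_effect / gain_est

    print("estimated gains:", np.round(gain_est, 3))
    print("true gains:     ", np.round(true_gain, 3))

Obviously a real robot is not a diagonal linear system, but that is the flavor of "collect the effect of actuations and correct in-context" I'm asking about.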


Nearly 2 years ago I bet a roboticist $10 that we’d have “sci-fi” robots in 2 years.

Now, we didn’t set good criteria for the bet (it was late at night). However, my personal criteria for “sci-fi” are twofold: 1. robots that are able to make peanut butter sandwiches without explicit training, and 2. robots able to walk on sand (e.g., on Tatooine).

Based on your current understanding, who won the bet? Also, what kind of physical benchmarks do you associate with “sci-fi robots”?


Coincidentally just saw robots walking on sand today: https://www.youtube.com/watch?v=KRVR0E7AN0A


You did not win the bet :)


Is there a web page where we can see bloopers? I want to see the problems you had to solve.

Also, could you please consider adding googly eyes [1] to the robot(s) in future videos?

[1] https://en.wikipedia.org/wiki/Googly_eyes


Hi! Very cool results. Are you able to share some numbers about the slope of the scaling curve you found, i.e. how performance responds to a growing number of demonstrations?

Academically I'd also be very interested in how much of a data-efficiency improvement you achieved with the pretrained model plus task-specific post-training versus from-scratch task-specific training. For example, if post-training requires, say, 50 additional demos, and training a smaller model from scratch requires, say, 250 demos (or whatever) to match performance, that would be an interesting quantification of the efficiency benefit of using the big foundation model.
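
For concreteness, this is the back-of-the-envelope comparison I have in mind, using my made-up 50/250 numbers (the real figures would have to come from your experiments):

    # Hypothetical numbers from the comment above, not measured results.
    demos_post_training = 50    # demos to hit a target success rate fine-tuning the foundation model
    demos_from_scratch = 250    # demos to hit the same success rate training a smaller model from scratch

    efficiency_multiplier = demos_from_scratch / demos_post_training
    demos_saved = demos_from_scratch - demos_post_training

    print(f"data-efficiency multiplier: {efficiency_multiplier:.0f}x")   # 5x
    print(f"demonstrations saved per task: {demos_saved}")               # 200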


First of all - incredible work. Do you guys plan to integrate frameworks like ROS to help manage this robot?
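
For context, here is a minimal sketch of what such an integration could look like (my own assumption, not anything π has announced): the learned policy wrapped as an rclpy node that subscribes to joint states and republishes commands on a ros2_control-style topic. The topic names and the trivial "policy" are placeholders.

    import rclpy
    from rclpy.node import Node
    from sensor_msgs.msg import JointState
    from std_msgs.msg import Float64MultiArray


    class PolicyBridge(Node):
        """Bridges a learned policy to ROS 2 topics (topic names are guesses)."""

        def __init__(self):
            super().__init__('policy_bridge')
            # Subscribe to robot state; '/joint_states' is the conventional topic.
            self.create_subscription(JointState, '/joint_states', self.on_state, 10)
            # Publish commands on a ros2_control-style forward-command topic (assumed name).
            self.cmd_pub = self.create_publisher(
                Float64MultiArray, '/arm_controller/commands', 10)

        def on_state(self, msg: JointState):
            # Placeholder "policy": hold the current joint positions.
            # A real integration would run the learned model here instead.
            cmd = Float64MultiArray()
            cmd.data = list(msg.position)
            self.cmd_pub.publish(cmd)


    def main():
        rclpy.init()
        rclpy.spin(PolicyBridge())
        rclpy.shutdown()


    if __name__ == '__main__':
        main()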


How does the post-training step work? In the case of t-shirt folding, does a supervisor perform the folding first, many times? Or is the learning interactive, where a supervisor corrects the robot if it does something wrong?
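
For anyone unsure what the two regimes in that question would look like, here is a generic imitation-learning sketch (everything here, including Dataset, finetune, supervisor_correct, and env_step, is a placeholder of my own; this is not π's actual pipeline): regime 1 fine-tunes on demonstrations collected up front, regime 2 is DAgger-style interactive correction.

    from dataclasses import dataclass, field
    from typing import Any, Callable, List, Tuple


    @dataclass
    class Dataset:
        """(observation, action) pairs used for fine-tuning."""
        pairs: List[Tuple[Any, Any]] = field(default_factory=list)

        def add(self, obs, act):
            self.pairs.append((obs, act))


    # Regime 1: the supervisor folds the shirt many times first (teleoperated
    # demonstrations), then the pretrained policy is fine-tuned on that fixed set.
    def offline_post_training(finetune: Callable[[Dataset], None],
                              demos: List[Tuple[Any, Any]]) -> None:
        data = Dataset()
        for obs, act in demos:
            data.add(obs, act)
        finetune(data)


    # Regime 2: interactive correction (DAgger-style): the robot acts with its
    # current policy, the supervisor overrides actions it gets wrong, and the
    # corrected labels are what get added to the fine-tuning set.
    def interactive_post_training(policy: Callable[[Any], Any],
                                  supervisor_correct: Callable[[Any, Any], Any],
                                  env_step: Callable[[Any], Any],
                                  initial_obs: Any,
                                  finetune: Callable[[Dataset], None],
                                  steps: int) -> None:
        data = Dataset()
        obs = initial_obs
        for _ in range(steps):
            act = policy(obs)
            act = supervisor_correct(obs, act)  # may return act unchanged
            data.add(obs, act)
            obs = env_step(act)
        finetune(data)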


Speaking as a committed AI skeptic: this demo is very impressive. Bravo.



