Hello, amazing robotics people 😍 😍 😍 We have FINALLY delivered on your most-requested feature! Ark just got a major upgrade:
We’ve now integrated Vision-Language-Action Models (VLAs) into Ark 🎉 VLAs = models that connect vision + language → robot actions (see image)
What does this mean?
🗣️ Give robots natural-language instructions → they act (see the sketch below)
👀 Combine perception + language for real-world control
🦾 Powered by pi0 pretrained models for fast prototyping
⚡ Easy data collection and fine-tuning within Ark, in just a couple of lines of code
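For the curious, here's roughly what the control loop looks like. This is a minimal sketch with placeholder names (load_pretrained_policy, Camera, Robot, predict, apply_action are assumptions for illustration, not Ark's exact API); check the Ark docs for the real interface:

```python
# Hypothetical sketch: module, class, and method names below are illustrative
# placeholders, not Ark's confirmed API. The pattern is what matters:
# camera image + language instruction go into a pretrained pi0-style VLA
# policy, and robot actions come out.
from vla_stack import load_pretrained_policy, Camera, Robot  # assumed names

policy = load_pretrained_policy("pi0_base")   # pretrained pi0-style VLA
camera = Camera("wrist_cam")                  # any RGB observation source
robot = Robot("my_arm")                       # your manipulator

instruction = "pick up the red cube and place it in the bowl"

while not robot.task_done():
    image = camera.read()                          # H x W x 3 RGB frame
    obs = {"image": image, "prompt": instruction}  # vision + language in
    action = policy.predict(obs)                   # action chunk out
    robot.apply_action(action)                     # close the control loop
```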
Next, we plan to go into the world of designing worlds 😉 Who knows, maybe those video models are actually zero-shot learners and reasoners?