VLAs - Pixels to Actions
On the cover: Pixels to actions

In our last post, we taught a Transformer to see and speak. We bolted a vision encoder onto an LLM, projected pixel patches into the language space, and tricked the text model into halluci...