VLMs - Pixels to Tokens

On the cover: Pixels to tokens. In our last post, we spoke about how the Vision Transformer (ViT) took the candy away from CNNs. We learned that if you slice an image into patches and flatten them, you can treat an image just like a sentence. And ...

ViT - Pixels to Tokens

On the cover: Decorative. For nearly a decade, if you wanted a computer to identify a cat in a picture, you had one reliable tool: the Convolutional Neural Network (CNN). CNNs were the undisputed kings of Computer Vision. They were dependable, the...

Cross Entropy Loss

On the cover: A weather forecasting stone meme. Imagine you live, work, and vacation in three different cities. In City A (home), it is sunny every single day. No clouds. No rain. Ever. You wake up, glance outside, and already know the answer. Bor...

MoE - Is Attention All You Really Need?

On the cover: A bunch of different robots who are experts at different tasks. It’s been 8 years since the landmark paper “Attention is all you need” was published. The paper introduced the Transformer, an architecture built entirely on the attention mechanism, which has revolutionized the field of ...

RoPE - Is Attention All You Really Need?

On the cover: A RoPE, essentially. It’s been 8 years since the landmark paper “Attention is all you need” was published. The paper introduced the Transformer, an architecture built entirely on the attention mechanism, which has revolutionized the field of natural language processing. The self-atte...