Cross Entropy Loss

On the cover: Cross Entropy Loss. Cross Entropy Loss is a loss function used to measure the difference between the predicted probability distribution and the true probability distribution. It is a measure of the uncertainty of the model’s ...
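The definition above can be sketched in a few lines of Python. This is a minimal illustration of the formula H(p, q) = -Σᵢ pᵢ log qᵢ, not code from the post itself; the function name and the epsilon guard are my own choices.

```python
import math

def cross_entropy(p_true, p_pred, eps=1e-12):
    """Cross entropy H(p, q) = -sum_i p_i * log(q_i).

    eps guards against log(0) when a predicted probability is exactly zero.
    """
    return -sum(p * math.log(q + eps) for p, q in zip(p_true, p_pred))

# One-hot true label (class 1) vs. a confident, correct prediction -> small loss
print(cross_entropy([0.0, 1.0, 0.0], [0.05, 0.90, 0.05]))
# Same label vs. a confident, wrong prediction -> large loss
print(cross_entropy([0.0, 1.0, 0.0], [0.90, 0.05, 0.05]))
```

The loss only penalizes the probability assigned to the true class, so a confident wrong answer is punished far more than an unsure one.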

MoE - Is Attention All You Really Need?

On the cover: A bunch of different robots, each an expert at a different task. It’s been 8 years since the landmark paper “Attention is all you need” was published. The paper introduced the attention mechanism, which has revolutionized the field of ...

RoPE - Is Attention All You Really Need?

On the cover: A RoPE, essentially. It’s been 8 years since the landmark paper “Attention is all you need” was published. The paper introduced the attention mechanism, which has revolutionized the field of natural language processing. The self-atte...

KV Caching & MLA - Is Attention All You Really Need?

On the cover: MLA Architecture. Credits: Welch Labs. It’s been 8 years since the landmark paper “Attention is all you need” was published. The paper introduced the attention mechanism, which has revolutionized the field of natural language process...

Markov Chain - The connection between Nuclear Bombs, Google Search, and Perplexity

On the cover: A Nuclear Bomb Explosion. Recently I watched this video by Veritasium talking about Markov Chains. I really liked the examples he used to explain the usage of Markov Chains, ranging from Nuclear Bombs to Google Search and Perplexity ...
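For readers new to the term, a Markov chain is a process where the next state depends only on the current state. Here is a toy two-state weather chain as a minimal sketch; the states and transition probabilities are illustrative choices of mine, not examples from the post or the video.

```python
import random

# Toy transition probabilities: from each state, where can we go and how likely?
TRANSITIONS = {
    "Sunny": [("Sunny", 0.9), ("Rainy", 0.1)],
    "Rainy": [("Sunny", 0.5), ("Rainy", 0.5)],
}

def step(state, rng):
    """Sample the next state given only the current one (the Markov property)."""
    states, weights = zip(*TRANSITIONS[state])
    return rng.choices(states, weights=weights)[0]

def simulate(start, n_steps, seed=0):
    """Run the chain and count how often each state is visited."""
    rng = random.Random(seed)
    state = start
    counts = {"Sunny": 0, "Rainy": 0}
    for _ in range(n_steps):
        state = step(state, rng)
        counts[state] += 1
    return counts

counts = simulate("Sunny", 100_000)
print(counts["Sunny"] / 100_000)  # long-run fraction of sunny days
```

With these numbers, the long-run fraction of sunny days settles near 5/6 regardless of the starting state, which is the same stationary-distribution idea behind PageRank's ranking of web pages.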