Blog

Cross Entropy Loss

On the cover: Cross Entropy Loss. Cross Entropy Loss is a loss function that measures the difference between the predicted probability distribution and the true probability distribution. It is a measure of the uncertainty of the model’s ...

posted in Cross Entropy Loss, Information Theory, KL Divergence on October 13, 2025 by Narendiran Chembu

MoE - Is Attention All You Really Need?

On the cover: A group of robots, each an expert at a different task. It’s been 8 years since the landmark paper “Attention Is All You Need” was published. The paper introduced the attention mechanism, which has revolutionized the field of ...

posted in DeepSeek, MoE, Mixture of Experts, LLMs on September 21, 2025 by Narendiran Chembu

RoPE - Is Attention All You Really Need?

On the cover: A RoPE, essentially. It’s been 8 years since the landmark paper “Attention Is All You Need” was published. The paper introduced the attention mechanism, which has revolutionized the field of natural language processing. The self-atte...

posted in DeepSeek, RoPE, Rotary Positional Encoding, LLMs on September 9, 2025 by Narendiran Chembu

KV Caching & MLA - Is Attention All You Really Need?

On the cover: MLA architecture (credit: Welch Labs). It’s been 8 years since the landmark paper “Attention Is All You Need” was published. The paper introduced the attention mechanism, which has revolutionized the field of natural language process...

posted in DeepSeek, Attention, MLA, KV Caching, MoE on August 22, 2025 by Narendiran Chembu

Markov Chain - The connection between Nuclear Bombs, Google Search, and Perplexity

On the cover: A nuclear bomb explosion. Recently I watched this video by Veritasium about Markov chains. I really liked the examples he used to explain how Markov chains are applied, ranging from nuclear bombs to Google Search and Perplexity ...

posted in markov chain, pagerank, perplexity on August 2, 2025 by Narendiran Chembu