shubham's notes

LLMs

LLMs Jul 13, 2025 - 7 min

Multi-Head vs Multi-Query vs Grouped Query Attention

The title may be long, but the explanation isn't. Since the original Transformer, researchers have continuously tweaked its architecture, aiming to cut inference costs. Today, we’ll compare Multi-Head vs Multi-Query vs Grouped Query Attention to understand...

LLMs Mar 8, 2025 - 6 min

What is KV-Cache?

If you have been reading articles on LLMs, you have probably come across the term KV-Cache and the many tricks developers use to speed up LLMs. That is today's topic: we'll talk about KV-Cache to...

LLMs Mar 2, 2025 - 11 min

Fine-Tune Embeddings for RAG to Improve Retrieval

Does your RAG application return bizarre, irrelevant content? The issue might not be your LLM but your retrieval. Fine-tuning embeddings for RAG can significantly improve retrieval quality. When your RAG application doesn't work, you don't need to throw a...