The Illustrated Guide to Unified Sequence Parallelism

How to fit a 70B model with 1M context on GPUs

By Darshan Fofadiya

You want to run Llama-70B with a 1 million token context window. Sounds straightforward — just load the model and go, right?

Not quite. Before we can talk about solutions, we need to understand the problem. In this part, we'll do the actual math to see exactly why this is hard:

  1. First, we'll understand the GPU hardware — what an A100 can and cannot do
  2. Then, we'll calculate the model memory — how much space 70B parameters actually need
  3. Next, we'll compute the activation memory — the Q, K, V tensors for 1M tokens
  4. Finally, we'll see the attention explosion — why O(n²) is a killer

By the end, you'll understand exactly why we need parallelism — and which specific bottlenecks each parallelism strategy addresses.


1.1 The Hardware: NVIDIA A100-80GB

The A100 is NVIDIA's data center GPU for AI workloads. Let's understand each spec and why it matters:

1.1.1 HBM2e Memory: 80 GB

This is the GPU's main memory — High Bandwidth Memory (HBM). Everything lives here: the model weights, the activations, and the KV cache.

80 GB is our hard constraint. If our data doesn't fit, we can't run.

1.1.2 Memory Bandwidth: 2.0 TB/s

This is how fast we can move data between HBM and the compute units (tensor cores). Let's put this in perspective:

Memory bandwidth: 2.0 TB/s = 2,000 GB/s

To read 80 GB (entire memory):
  Time = 80 GB ÷ 2,000 GB/s = 0.04 seconds = 40 ms

To read 1 GB:
  Time = 1 GB ÷ 2,000 GB/s = 0.5 ms

This sounds fast, but for large models, we're constantly streaming weights from memory to compute. Memory bandwidth often becomes the bottleneck — not compute.
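These transfer times are easy to reproduce. Here's a minimal Python sketch (the helper `read_time_ms` is our own, and it assumes the peak 2.0 TB/s figure is actually achieved — real kernels rarely hit peak):

```python
# Back-of-envelope HBM read times on an A100 (peak 2.0 TB/s).
# Illustrative sketch only; sustained bandwidth is lower in practice.

HBM_BANDWIDTH_GBS = 2_000  # GB/s

def read_time_ms(gigabytes: float, bandwidth_gbs: float = HBM_BANDWIDTH_GBS) -> float:
    """Time to stream `gigabytes` from memory, in milliseconds."""
    return gigabytes / bandwidth_gbs * 1_000

print(read_time_ms(80))  # full 80 GB of HBM: 40.0 ms
print(read_time_ms(1))   # 1 GB: 0.5 ms
```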

1.1.3 Tensor Cores: 312 TFLOPS (BF16)

TFLOPS = Trillion Floating Point Operations Per Second. The A100 can do 312 trillion BF16 operations per second.

312 TFLOPS = 312 × 10¹² operations/second

For a matrix multiply of [M, K] × [K, N]:
  Operations = 2 × M × K × N (multiply-add)

Example: [8192, 8192] × [8192, 8192]
  Operations = 2 × 8192 × 8192 × 8192 ≈ 1.1 × 10¹² = 1.1 TFLOP
  Time (compute-bound) = 1.1 TFLOP ÷ 312 TFLOPS ≈ 3.5 ms

We also need to read the matrices from memory:

Data to read: 2 matrices × 8192 × 8192 × 2 bytes = 268 MB
Time (memory-bound) = 268 MB ÷ 2,000 GB/s = 134 μs

Compute time: 3.5 ms
Memory time: 134 μs  ← compute dominates by ~26×

A big square matmul like this is compute-bound: each weight we load from HBM is reused 8,192 times. Batch-1 autoregressive decoding is different. With a single token in flight, each matmul is really a matrix-vector product:

Example: [1, 8192] × [8192, 8192]
  Operations = 2 × 8192 × 8192 ≈ 134 MFLOP
  Time (compute-bound) = 134 MFLOP ÷ 312 TFLOPS ≈ 0.43 μs

  Weights to read: 8192 × 8192 × 2 bytes = 134 MB
  Time (memory-bound) = 134 MB ÷ 2,000 GB/s = 67 μs  ← ~150× slower!

Now every weight is loaded from HBM, used once, and discarded. We're memory-bound, not compute-bound.
Key insight: For transformer inference at small batch sizes, we're almost always memory-bandwidth bound. The tensor cores sit idle waiting for data. This is why techniques like FlashAttention (which reduces memory traffic) are so important.
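The decode-step arithmetic can be checked with a few lines of Python. This sketch (our own helper, using the peak spec numbers of 312 TFLOPS and 2.0 TB/s) counts only the weight reads for a single [1, 8192] × [8192, 8192] matrix-vector product:

```python
# Compare compute time vs. memory time for one [1, 8192] x [8192, 8192]
# matrix-vector product -- the shape that dominates batch-1 decoding.
# Peak A100 numbers; illustrative sketch only.

PEAK_TFLOPS = 312      # BF16 tensor-core throughput
PEAK_BW_GBS = 2_000    # HBM bandwidth, GB/s

def gemv_times_us(k: int, n: int) -> tuple[float, float]:
    flops = 2 * k * n                  # one multiply-add per weight
    weight_bytes = k * n * 2           # BF16 weights streamed from HBM
    compute_us = flops / (PEAK_TFLOPS * 1e12) * 1e6
    memory_us = weight_bytes / (PEAK_BW_GBS * 1e9) * 1e6
    return compute_us, memory_us

c, m = gemv_times_us(8192, 8192)
print(f"compute {c:.2f} us, memory {m:.2f} us, ratio {m / c:.0f}x")
```

The memory side comes out two orders of magnitude slower, which is exactly why decoding is bandwidth-limited.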

1.1.4 GPU Interconnects: NVLink vs PCIe

When we distribute work across multiple GPUs, they need to communicate. There are two main interconnects:

PCIe Gen4 x16:
  - Bandwidth: ~32 GB/s (bidirectional)
  - Latency: ~1-2 μs
  - Connects: GPU ↔ CPU, GPU ↔ GPU (different nodes)
  - Available on: All GPUs

NVLink (A100):
  - Bandwidth: 600 GB/s (bidirectional, 12 links × 50 GB/s)
  - Latency: ~0.5 μs  
  - Connects: GPU ↔ GPU (same node only)
  - Available on: Data center GPUs (A100, H100)

Let's see what this means for transferring 1 GB of data:

Transfer 1 GB over PCIe:
  Time = 1 GB ÷ 32 GB/s = 31.25 ms

Transfer 1 GB over NVLink:
  Time = 1 GB ÷ 600 GB/s = 1.67 ms

NVLink is ~19× faster than PCIe!

This is why multi-GPU training and inference strongly prefer NVLink-connected GPUs within a single node. Cross-node communication (which must use PCIe/InfiniBand) is much slower.

1.1.5 The Memory Hierarchy

Understanding where data lives and how fast we can access it:

MEMORY HIERARCHY (A100):
────────────────────────
Level           Size        Bandwidth       Latency
─────────────────────────────────────────────────────
Registers       ~20 MB      ~20 TB/s        ~1 cycle
L2 Cache        40 MB       ~5 TB/s         ~30 cycles
HBM (main)      80 GB       2 TB/s          ~300 cycles
NVLink          N/A         600 GB/s        ~500 cycles
PCIe            N/A         32 GB/s         ~1000 cycles
CPU RAM         ~1 TB       ~100 GB/s       ~10000 cycles

The key takeaway: HBM is our working memory, and 80 GB is all we have. If data doesn't fit in HBM, we have to go to much slower storage.


1.2 The Model: Llama-3-70B Architecture

Now let's understand what we're trying to fit into that 80 GB. Llama-3-70B is a decoder-only transformer with these parameters:

Parameter    Value     What it means
─────────────────────────────────────
d_model      8192      Each token is represented as a vector of 8192 numbers
n_heads      64        Query attention heads
n_kv_heads   8         Key/Value heads (GQA — 8× fewer than query heads)
d_head       128       Each head works with 128-dimensional vectors
n_layers     80        80 transformer blocks stacked sequentially
d_ff         28672     FFN hidden dimension (~3.5× d_model)
vocab_size   128,256   Number of unique tokens the model knows

1.2.1 Grouped Query Attention (GQA)

Llama-3 uses GQA, not standard multi-head attention. Instead of 64 separate K and V projections (one per head), it uses only 8 — each shared by 8 query heads. This reduces parameters and KV cache size:

Standard MHA: 64 query heads, 64 key heads, 64 value heads
Llama-3 GQA:  64 query heads, 8 key heads, 8 value heads

Each group of 8 query heads shares 1 key head and 1 value head.
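Here's a toy numpy sketch of that sharing (the shapes `seq=3, d_head=4` are scaled-down toy values): before the batched matmul, the 8 KV heads are expanded so each one serves a group of 8 query heads.

```python
import numpy as np

# Toy illustration of GQA head sharing: 8 KV heads expanded so each
# serves a group of 8 query heads. seq and d_head are toy-sized here.

n_heads, n_kv_heads, d_head, seq = 64, 8, 4, 3
group = n_heads // n_kv_heads                  # 8 query heads per KV head

k = np.random.randn(seq, n_kv_heads, d_head)   # [seq, 8, d_head]
k_expanded = np.repeat(k, group, axis=1)       # [seq, 64, d_head]

assert k_expanded.shape == (seq, n_heads, d_head)
# Query heads 0..7 all read the same underlying KV head:
assert np.array_equal(k_expanded[:, 0], k_expanded[:, 7])
```

Only the expanded view is 64-wide; the stored K and V tensors (and the KV cache) keep just 8 heads.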

1.2.2 Anatomy of One Transformer Layer

Each of the 80 layers contains these weight matrices:

ATTENTION BLOCK (with GQA):
───────────────────────────
Wq (Query projection):  [8192, 8192]     = 67M params  (64 heads × 128)
Wk (Key projection):    [8192, 1024]     = 8.4M params (8 heads × 128)
Wv (Value projection):  [8192, 1024]     = 8.4M params (8 heads × 128)
Wo (Output projection): [8192, 8192]     = 67M params

Attention total: 67 + 8.4 + 8.4 + 67 = 150.8M params per layer

FFN BLOCK (SwiGLU — 3 matrices, not 2):
───────────────────────────────────────
W_gate: [8192, 28672] = 235M params  (gate projection)
W_up:   [8192, 28672] = 235M params  (up projection)
W_down: [28672, 8192] = 235M params  (down projection)

FFN total: 235 × 3 = 705M params per layer

LAYER NORMS: ~16K params (negligible)

TOTAL PER LAYER: 150.8M + 705M ≈ 856M params

Now let's add it all up:

TRANSFORMER LAYERS:
───────────────────
856M params × 80 layers = 68.5B params

EMBEDDINGS:
───────────
Token embeddings: [128256, 8192] = 1.05B params
Output projection: [8192, 128256] = 1.05B params
(These are sometimes tied, but Llama-3 keeps them separate)

TOTAL: 68.5B + 1.05B + 1.05B ≈ 70.6B parameters ✓
Why GQA matters: By using 8 KV heads instead of 64, Llama-3 reduces KV cache memory by 8×. For long-context inference, this is huge — the KV cache is often the biggest memory bottleneck.
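You can verify the total by recomputing it from the architecture table. A quick sketch (layer norms are ignored as negligible, per the breakdown above):

```python
# Recompute the Llama-3-70B parameter count from the architecture table.
# Sketch: layer-norm parameters (~16K/layer) are omitted as negligible.

d_model, d_ff, n_layers, vocab = 8192, 28672, 80, 128256
n_heads, n_kv_heads, d_head = 64, 8, 128

attn = (d_model * n_heads * d_head        # Wq
        + d_model * n_kv_heads * d_head   # Wk (GQA: 8 heads)
        + d_model * n_kv_heads * d_head   # Wv (GQA: 8 heads)
        + n_heads * d_head * d_model)     # Wo
ffn = 3 * d_model * d_ff                  # gate, up, down (SwiGLU)
per_layer = attn + ffn
embeddings = 2 * vocab * d_model          # input + output (untied)

total = per_layer * n_layers + embeddings
print(f"{total / 1e9:.1f}B parameters")   # prints "70.6B parameters"
```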

1.2.3 Weight Memory Calculation

Now let's calculate the actual memory needed. We'll use BF16 (bfloat16), which is standard for inference:

BF16 = 16 bits = 2 bytes per parameter

Total parameters: 70 × 10⁹
Memory = 70 × 10⁹ params × 2 bytes/param
       = 140 × 10⁹ bytes
       = 140 GB
Problem #1 — Weights Don't Fit: Model weights alone require 140 GB, but the A100 only has 80 GB. We can't even load the model on a single GPU, let alone run inference.

1.2.4 What About Quantization?

You might think: "Just use INT8 or INT4 quantization!" Let's see:

INT8 (8-bit): 70B × 1 byte = 70 GB  ← Fits! But quality degrades
INT4 (4-bit): 70B × 0.5 bytes = 35 GB  ← Fits easily! More quality loss

But wait — we still need memory for:
  - Activations (Q, K, V tensors)
  - KV cache
  - Intermediate buffers

Even with INT4 weights, we'll run out of memory for long sequences.

Quantization helps, but it doesn't solve the fundamental problem for long-context inference. We'll still need parallelism.


1.3 Inference Scenario: 1 Million Token Context

Now let's see what happens when we actually run inference. We have a 1 million token input — think of a massive document, an entire codebase, or a long conversation history. Let's trace through the memory requirements step by step.

1.3.1 The Input Tensor

Our input starts as a sequence of token IDs, which get embedded into vectors:

Input tokens: [1, 1,000,000]  (batch_size=1, seq_len=1M)

After embedding lookup:
X: [batch_size, seq_len, d_model]
X: [1, 1,000,000, 8192]

Memory for X:
  Elements: 1 × 1,000,000 × 8192 = 8.192 × 10⁹
  Bytes (BF16): 8.192 × 10⁹ × 2 = 16.38 GB

Just the embedded input takes 16 GB. But this is just the beginning.

1.3.2 Q, K, V Projections

In each transformer layer, we project the input into Query, Key, and Value tensors. Remember, with GQA, K and V use only 8 heads (not 64):

Q = X @ Wq    →  [1, 1M, 8192] @ [8192, 8192] = [1, 1M, 8192]   (64 query heads × 128)
K = X @ Wk    →  [1, 1M, 8192] @ [8192, 1024] = [1, 1M, 1024]   (8 KV heads × 128, GQA)
V = X @ Wv    →  [1, 1M, 8192] @ [8192, 1024] = [1, 1M, 1024]   (8 KV heads × 128, GQA)

Q is large (64 heads), but K and V are 8× smaller thanks to GQA. Let's calculate the memory:

Q: 1 × 1,000,000 × 8192 × 2 bytes = 16.38 GB   (64 heads)
K: 1 × 1,000,000 × 1024 × 2 bytes = 2.05 GB    (8 KV heads)
V: 1 × 1,000,000 × 1024 × 2 bytes = 2.05 GB    (8 KV heads)
─────────────────────────────────────────────
Total Q+K+V: 20.48 GB (for ONE layer)
Problem #2 — Activations Are Large: Q + K + V for a single layer consumes 20.5 GB — that's 26% of the A100's memory. And this is on top of the 140 GB weights that already don't fit.
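A quick script reproduces these per-layer activation numbers:

```python
# Per-layer Q/K/V activation memory for a 1M-token sequence in BF16,
# reproducing the figures above. Sketch only.

seq_len, bytes_per = 1_000_000, 2
n_heads, n_kv_heads, d_head = 64, 8, 128

q_gb = seq_len * n_heads * d_head * bytes_per / 1e9       # 64 query heads
kv_gb = seq_len * n_kv_heads * d_head * bytes_per / 1e9   # K or V, 8 heads each

print(f"Q: {q_gb:.2f} GB, K: {kv_gb:.2f} GB, V: {kv_gb:.2f} GB")
print(f"total: {q_gb + 2 * kv_gb:.1f} GB")  # prints "total: 20.5 GB"
```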

1.3.3 Why Don't We Need 20.5 GB × 80 Layers?

You might wonder: if each layer needs 20.5 GB for Q, K, V, don't we need 20.5 × 80 = 1.6 TB total?

No — and here's why. During inference, we process one layer at a time:

Layer 1: Compute Q, K, V → Compute attention → Get output → FREE Q, K, V
Layer 2: Compute Q, K, V → Compute attention → Get output → FREE Q, K, V
...
Layer 80: Compute Q, K, V → Compute attention → Get output → FREE Q, K, V

We reuse the same memory buffer for each layer's activations. So 20.5 GB is the peak activation memory, not cumulative. But 20.5 GB is still a significant chunk of our 80 GB budget — especially when combined with weights and KV cache.

1.3.4 The KV Cache: Memory That Persists

During autoregressive generation (generating tokens one at a time), we don't want to recompute K and V for all previous tokens. Instead, we cache them:

KV Cache structure per layer (with GQA — 8 KV heads, not 64):
  K cache: [batch, seq_len, n_kv_heads, d_head] = [1, 1M, 8, 128]
  V cache: [batch, seq_len, n_kv_heads, d_head] = [1, 1M, 8, 128]

Memory per layer:
  K: 1 × 1,000,000 × 8 × 128 × 2 bytes = 2.05 GB
  V: 1 × 1,000,000 × 8 × 128 × 2 bytes = 2.05 GB
  Total: 4.1 GB per layer

All 80 layers:
  4.1 GB × 80 = 328 GB
GQA saves 8× on KV cache! Without GQA (64 KV heads), this would be 2.62 TB. With GQA (8 KV heads), it's "only" 328 GB — still 4× the A100's capacity, but much more manageable.
Problem #3 — KV Cache Still Doesn't Fit: Even with GQA's 8× reduction, the KV cache for 1M tokens across 80 layers is 328 GB — 4× the A100's capacity.

This is why long-context inference is so challenging. The KV cache grows linearly with sequence length, and we need to keep all of it in memory.
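Wrapping the formula in a function makes the linear growth easy to explore (`kv_cache_gb` is our own helper, defaulting to the Llama-3-70B GQA configuration):

```python
# KV-cache size as a function of sequence length, per the formula above.
# BF16, GQA with 8 KV heads across 80 layers; sketch only.

def kv_cache_gb(seq_len: int, n_layers: int = 80, n_kv_heads: int = 8,
                d_head: int = 128, bytes_per: int = 2) -> float:
    per_layer = 2 * seq_len * n_kv_heads * d_head * bytes_per  # K and V
    return per_layer * n_layers / 1e9

print(kv_cache_gb(1_000_000))  # ~328 GB for 1M tokens
print(kv_cache_gb(128_000))    # ~42 GB even for a 128K context
```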


1.4 The Attention Matrix: Quadratic Explosion

We've saved the worst for last. The attention mechanism computes:

Attention(Q, K, V) = softmax(Q @ Kᵀ / √d_head) @ V

The critical operation is Q @ Kᵀ — this produces the attention scores matrix, where every token attends to every other token.

1.4.1 Shape Analysis

Let's trace through the shapes carefully. First, we reshape Q and K for multi-head attention:

Original shapes:
  Q: [batch, seq_len, d_model] = [1, 1M, 8192]
  K: [batch, seq_len, d_model] = [1, 1M, 8192]

Reshape for multi-head (split d_model into n_heads × d_head):
  Q: [batch, seq_len, n_heads, d_head] = [1, 1M, 64, 128]
  K: [batch, seq_len, n_kv_heads, d_head] = [1, 1M, 8, 128]  (GQA)

For attention, each KV head is broadcast to 8 query heads:
  K_expanded: [1, 1M, 64, 128]  (each KV head repeated 8×)

Transpose for batched matmul:
  Q: [batch, n_heads, seq_len, d_head] = [1, 64, 1M, 128]
  K: [batch, n_heads, seq_len, d_head] = [1, 64, 1M, 128]  (after broadcast)
  Kᵀ: [batch, n_heads, d_head, seq_len] = [1, 64, 128, 1M]

Now the attention scores computation:

Attention Scores = Q @ Kᵀ

  [1, 64, 1M, 128] @ [1, 64, 128, 1M]
  = [1, 64, 1M, 1M]

This is a [seq_len × seq_len] matrix for each head!

1.4.2 Memory Calculation

Let's compute the memory for this attention scores matrix:

Shape: [1, 64, 1,000,000, 1,000,000]

Elements: 1 × 64 × 1,000,000 × 1,000,000
        = 64 × 10¹² 
        = 64 trillion elements

Memory (BF16): 64 × 10¹² × 2 bytes 
             = 128 × 10¹² bytes
             = 128 TB
Problem #4 — Quadratic Memory: The attention score matrix requires 128 TB — that's 1,600× the A100's memory capacity. This is the O(n²) problem in action.

1.4.3 Why O(n²) is a Killer

Let's see how attention memory scales with sequence length:

Sequence Length    Attention Memory (64 heads, BF16)
───────────────    ─────────────────────────────────
1,000              128 MB
10,000             12.8 GB
100,000            1.28 TB
1,000,000          128 TB
10,000,000         12.8 PB (petabytes!)

Every 10× increase in sequence length causes a 100× increase in attention memory. This is why long-context models are so challenging.
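The table above follows directly from n_heads × seq_len² × 2 bytes. A short generator sketch:

```python
# Attention-matrix memory vs. sequence length (64 heads, BF16),
# reproducing the scaling table above.

def attn_matrix_gb(seq_len: int, n_heads: int = 64, bytes_per: int = 2) -> float:
    return n_heads * seq_len ** 2 * bytes_per / 1e9

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} tokens -> {attn_matrix_gb(n):,.3f} GB")
```

The quadratic term `seq_len ** 2` is the whole story: every 10× in length costs 100× in memory.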

1.4.4 FlashAttention: Tiled Computation with Online Softmax

FlashAttention is a clever algorithm that computes attention without ever materializing the full attention matrix. The key insight: we don't need to store all attention scores — we can compute them in tiles and immediately use them.

Let's compare standard attention vs FlashAttention:

Standard Attention (naive):
1. Compute full S = Q @ Kᵀ           [1, 64, 1M, 1M] → 128 TB  ← Store this!
2. Apply softmax to S                [1, 64, 1M, 1M] → 128 TB  ← Store this!
3. Compute Output = S @ V            [1, 64, 1M, 128] → 16 GB

Peak memory: 128 TB (the attention matrix)

1.4.5 The Tiling Trick

FlashAttention processes Q in blocks (tiles), and for each Q block, iterates through all K,V blocks:

FlashAttention (tiled):
For each Q_block (e.g., rows 0-1023 of Q):
  Initialize: output_block = 0, running_max = -∞, running_sum = 0
  
  For each K_block, V_block:
    1. Compute tile: S_tile = Q_block @ K_blockᵀ    [1024, 1024] → 2 MB (BF16)
    2. Compute local softmax statistics
    3. Update running softmax (online algorithm)
    4. Accumulate: output_block += softmax(S_tile) @ V_block
    5. DISCARD S_tile immediately!
    
  Store output_block (final result for these Q rows)

Peak memory: O(tile_size²) ≈ 2 MB per tile (per head, BF16)

1.4.6 The Online Softmax Problem

But wait — there's a problem. Softmax requires knowing the maximum value across the entire row:

softmax(x_i) = exp(x_i - max(x)) / Σ exp(x_j - max(x))

For row i of the attention matrix:
  - We need max across ALL 1M columns
  - We need sum of exp() across ALL 1M columns
  
But we're only looking at 1024 columns at a time!

FlashAttention solves this with the "online softmax" trick — maintaining running statistics that get corrected as we see more tiles:

Online Softmax Algorithm:
─────────────────────────
After processing K_block 1:
  m₁ = max of tile 1
  l₁ = sum of exp(scores₁ - m₁)
  o₁ = exp(scores₁ - m₁) @ V_block_1       ← unnormalized for now

After processing K_block 2:
  m₂ = max(m₁, max of tile 2)              ← Update global max
  
  Correction factor for old results:
  α = exp(m₁ - m₂)                         ← Scale down old values
  
  l₂ = α × l₁ + sum of exp(scores₂ - m₂)   ← Update running sum
  o₂ = α × o₁ + exp(scores₂ - m₂) @ V_block_2  ← Correct and accumulate

After ALL K_blocks:
  Final output = o_final / l_final         ← Normalize once at the end
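The whole algorithm fits in a short numpy sketch. This is a single-head toy version (no causal mask, no 1/√d scaling, arbitrary small shapes), checked against naive full-matrix attention:

```python
import numpy as np

# Minimal online-softmax attention over K/V tiles (single head, no
# masking, no 1/sqrt(d) scaling), verified against the naive version.

rng = np.random.default_rng(0)
seq, d, tile = 64, 16, 8
Q, K, V = rng.standard_normal((3, seq, d))

# Naive reference: materialize the full [seq, seq] score matrix.
S = Q @ K.T
P = np.exp(S - S.max(axis=-1, keepdims=True))
ref = (P / P.sum(axis=-1, keepdims=True)) @ V

# Tiled version: one pass over K/V blocks with running statistics.
o = np.zeros((seq, d))
m = np.full((seq, 1), -np.inf)   # running row max
l = np.zeros((seq, 1))           # running sum of exp
for j in range(0, seq, tile):
    S_tile = Q @ K[j:j + tile].T             # [seq, tile], then discarded
    m_new = np.maximum(m, S_tile.max(axis=-1, keepdims=True))
    alpha = np.exp(m - m_new)                # rescale old statistics
    p = np.exp(S_tile - m_new)               # unnormalized tile weights
    l = alpha * l + p.sum(axis=-1, keepdims=True)
    o = alpha * o + p @ V[j:j + tile]
    m = m_new

out = o / l                                  # normalize once at the end
assert np.allclose(out, ref)
```

Note the accumulator `o` stays unnormalized until the final division by `l`; the α factor retroactively corrects earlier tiles whenever a new block raises the running max.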

1.4.7 Memory Savings

Let's quantify the memory reduction:

Standard Attention:
  Attention matrix: [1, 64, 1M, 1M] × 2 bytes = 128 TB
  
FlashAttention (tile_size = 1024):
  One tile: [1, 64, 1024, 1024] × 2 bytes = 128 MB
  Running statistics per Q_block: ~few KB
  
Memory reduction: 128 TB → 128 MB = 1,000,000× less!

But we still need:
  Q: 16.38 GB      (64 query heads)
  K: 2.05 GB       (8 KV heads, GQA)
  V: 2.05 GB       (8 KV heads, GQA)
  Output: 16.38 GB
  ─────────────────
  Total: ~37 GB (fits in 80 GB for one layer!)
FlashAttention's Impact: By never materializing the full attention matrix, FlashAttention reduces memory from O(n²) to O(n). This makes attention computation feasible even for very long sequences — the 128 TB problem disappears!

However, FlashAttention doesn't solve everything. We still need to store Q, K, V tensors (20.5 GB per layer), and for inference, the KV cache still grows linearly with sequence length. For 1M tokens, that's still 328 GB across all layers (thanks to GQA reducing it 8×).

1.4.8 Why Attention Scores Don't Accumulate Across Layers

Unlike the KV cache, attention scores are computed and immediately discarded:

Layer 1: Compute attention scores → Use them → DISCARD
Layer 2: Compute attention scores → Use them → DISCARD
...
Layer 80: Compute attention scores → Use them → DISCARD

So 128 TB (or with FlashAttention, much less) is the peak memory for one layer's attention computation, not cumulative across all 80 layers. This is a crucial distinction from the KV cache, which must persist.


1.5 Memory Budget Summary

Let's put it all together. Here's what we're trying to fit into an 80 GB GPU:

Component                          Size      vs A100 (80 GB)      Persists?
───────────────────────────────────────────────────────────────────────────
Model Weights (BF16)               140 GB    1.75× capacity ❌    Yes — always in memory
Q + K + V (1 layer)                20.5 GB   26% of capacity ⚠️   No — recomputed per layer
Attention Scores (1 layer, naive)  128 TB    1,600× capacity ❌   No — discarded after use
KV Cache (all 80 layers, GQA)      328 GB    4× capacity ❌       Yes — grows with sequence

Even with FlashAttention eliminating the 128 TB attention matrix, we still have:

Minimum memory needed:
  Weights:     140 GB
  Activations: 20.5 GB (peak, one layer at a time)
  KV Cache:    328 GB (with GQA)
  ─────────────────────
  Total:       ~489 GB

Available on one A100: 80 GB

We need at least 7 GPUs just for memory capacity!
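The lower bound on GPU count is just a ceiling division (this ignores communication buffers, fragmentation, and framework overhead, so the real requirement is higher):

```python
import math

# Minimum GPU count from raw memory capacity alone -- a lower bound,
# since buffers and fragmentation are ignored.

weights_gb, activations_gb, kv_cache_gb, gpu_gb = 140, 20.5, 328, 80

total_gb = weights_gb + activations_gb + kv_cache_gb
print(f"total: {total_gb} GB -> at least {math.ceil(total_gb / gpu_gb)} GPUs")
```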
The Bottom Line: A single GPU cannot handle this workload. We need to distribute computation across multiple GPUs — but not just any distribution. We need strategies that specifically target each bottleneck.

1.6 Three Bottlenecks → Three Solutions

We've identified three distinct memory bottlenecks. Each requires a different parallelism strategy:

Bottleneck: Weights (140 GB, 1.75× GPU capacity)
Solution:   Weight Sharding (FSDP)
How:        Split weight matrices across GPUs. Each GPU stores 1/N of the weights and gathers them when needed.

Bottleneck: Activations (20.5 GB/layer, 26% of GPU capacity)
Solution:   Sequence Sharding (Ulysses)
How:        Split the sequence across GPUs. Each GPU processes 1/N of the tokens, using all-to-all communication for attention.

Bottleneck: Attention O(n²) (1,600× GPU capacity)
Solution:   Ring Attention
How:        Compute attention in chunks, passing KV blocks around a ring of GPUs. Never materialize the full attention matrix.

1.6.1 Why We Need All Three

You might wonder: can't we just use one strategy? Let's see:

FSDP alone (8 GPUs):
  Weights: 140 GB ÷ 8 = 17.5 GB per GPU ✓
  Activations: Still 20.5 GB per GPU ❌
  KV Cache: Still 328 GB total ❌

Sequence Parallelism alone (8 GPUs):
  Weights: Still 140 GB per GPU ❌
  Activations: 20.5 GB ÷ 8 = 2.6 GB per GPU ✓
  KV Cache: 328 GB ÷ 8 = 41 GB per GPU ❌

Ring Attention alone (8 GPUs):
  Weights: Still 140 GB per GPU ❌
  Activations: Distributed ✓
  KV Cache: Distributed ✓
  But: Requires weights to fit on each GPU ❌

No single strategy solves all three problems. We need to combine them.

1.6.2 Unified Sequence Parallelism (USP)

USP combines all three strategies:

USP with 8 GPUs:
  FSDP:              Weights 140 GB ÷ 8 = 17.5 GB per GPU ✓
  Ulysses:           Activations 20.5 GB ÷ 8 = 2.6 GB per GPU ✓
  Ring Attention:    KV Cache distributed across ring ✓
  
Per GPU: 17.5 + 2.6 + 41 (KV cache ÷ 8) ≈ 61 GB — fits in 80 GB!
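Putting numbers on that budget (a sketch assuming each of the three components shards evenly across 8 GPUs):

```python
# Per-GPU memory under USP with 8 GPUs, combining the three strategies.
# Sketch using the totals computed earlier in this part.

n_gpus = 8
weights = 140 / n_gpus        # FSDP shards the weights
activations = 20.5 / n_gpus   # Ulysses shards the sequence
kv_cache = 328 / n_gpus       # Ring Attention distributes the KV cache

per_gpu = weights + activations + kv_cache
print(f"{per_gpu:.1f} GB per GPU (budget: 80 GB)")
```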

In the following parts, we'll dive deep into each strategy.


Key Numbers to Remember

A100 HBM Memory                            80 GB
A100 Memory Bandwidth                      2.0 TB/s
A100 NVLink Bandwidth                      600 GB/s
Llama-70B Weights (BF16)                   140 GB
Q+K+V for 1M tokens (1 layer)              20.5 GB
Attention Scores for 1M tokens             128 TB
KV Cache for 1M tokens (80 layers, GQA)    328 GB

What's Next

Part 2: Weight Sharding (FSDP)        How to distribute 140 GB of weights across GPUs using AllGather and ReduceScatter
Part 3: Sequence Sharding (Ulysses)   Splitting the 1M token sequence with all-to-all communication
Part 4: Ring Attention                Distributed attention without materializing O(n²)
Part 5: Putting It Together           The complete USP picture

The math doesn't lie. We need parallelism — and now we know exactly why.