Jinn's Hub
2026
  • Attention Mechanisms — Full, Sparse, Linear, NSA & GLA
    Breaking down Full, Sparse, and Linear Attention, all the way to DeepSeek NSA and Gated Linear Attention
2025
  • Benchmark: Qwen3-Coder-30B-A3B + EAGLE3 Speculative Decoding
    EAGLE3 speculative decoding benchmarks on Qwen3-Coder — 1.87x speedup for code generation
  • NeMo-RL vs slime: RL Training Framework Comparison
    Deep comparison of two RL training frameworks: algorithms, engineering quality, MoE support, and ROCm compatibility
  • TritonForge: Server-based Multi-turn RL for Triton Kernel Generation
    End-to-end server-based RL training and evaluation system for Triton kernel generation across NVIDIA and AMD, built on slime + Megatron
  • SFT & RL Training Guide
    A complete guide covering SFT vs RL fundamentals, loss computation, and dataset construction, through to RLHF in practice
  • KV Cache & Model Weights
    Understanding the KV Cache vs Model Weights — the first step toward LLM inference optimization
  • GPU Memory Calculation for LLMs
    Detailed GPU memory estimation for LLM training and inference, covering DP/TP/PP/EP parallelism strategies
  • Transformer Deep Dive (Math + Code)
    Deconstructing the Transformer's Self-Attention, LayerNorm, and MLP from math, code, and architecture perspectives
© 2026 • Jinn's Hub 🔬