
YichenZW/llm-arch-table


LLM Architecture Comparison Table

A living reference table comparing the internal architectural choices of large language models, from the original Transformer (2017) through the latest frontier models. Ordered newest → oldest.

What's tracked

Each model row covers:

| Column | What it captures |
| --- | --- |
| Norm | LayerNorm vs. RMSNorm |
| Parallel Layer | Attention + FFN in parallel vs. serial |
| Pre-norm | Norm before sub-layer (Pre) / after (Post) / both |
| Pos. Embedding | Sine / Absolute / Relative / RoPE / ALiBi / NoPE / iRoPE |
| Activation | ReLU / GeLU / SwiGLU / GeGLU / SqReLU |
| Attn. Type | MHA / MQA / GQA / MLA / SWA |
| Context Len. | Native and extended token limits |
| MoE | Dense or Sparse (total / active params, expert count) |
| Bias Terms | Bias in attention projections and/or norms |
| Tied Emb. | Input embedding tied to output projection |
| QK-Norm | RMSNorm on Q and K inside attention |
| Sliding Window | Local sliding-window attention in some layers |
| Stability Tricks | Z-loss, MTP, logit capping, etc. |
| Ref | Primary paper / technical report link |
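To make the Norm and QK-Norm columns concrete, here is a minimal NumPy sketch of the LayerNorm vs. RMSNorm distinction (learnable scale and bias parameters are omitted for brevity; this is an illustration, not the exact formulation any particular model uses):

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # LayerNorm: center by the mean, then scale by the standard deviation.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def rms_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # RMSNorm: rescale by the root-mean-square only; no mean subtraction,
    # no bias term, which is cheaper and works well in practice.
    rms = np.sqrt((x ** 2).mean(axis=-1, keepdims=True) + eps)
    return x / rms

x = np.random.randn(4, 8)
y_ln = layer_norm(x)   # each row has (approximately) zero mean, unit variance
y_rms = rms_norm(x)    # each row has (approximately) unit RMS; mean is untouched
```

QK-Norm applies the same `rms_norm` to the query and key projections inside attention before computing scores, which some recent models use to stabilize training.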

Files

table.md          – the main comparison table
MAINTENANCE.md    – how to add or update models

Sources

  • Original figure: Harm de Vries (2017–2024 models)
  • Extended coverage: Sebastian Raschka, The Big LLM Architecture Comparison (updated Mar 6, 2026)
  • Individual technical reports and HuggingFace model configs for each model

About

Living comparison table of LLM architectural choices (norm, attention, MoE, positional encoding, and more) from the Original Transformer (2017) to frontier models (2026). Based on Harm de Vries's figure, Sebastian Raschka's Big LLM Architecture Comparison, and Tatsunori Hashimoto's Stanford CS 336 lecture.
