A living reference table comparing the internal architectural choices of large language models, from the original Transformer (2017) through the latest frontier models. Ordered newest to oldest.
Each model row covers:
| Column | What it captures |
|---|---|
| Norm | LayerNorm vs RMSNorm |
| Parallel Layer | Attention + FFN in parallel vs. serial |
| Pre-norm | Norm before sub-layer (Pre) / after (Post) / both |
| Pos. Embedding | Sine / Absolute / Relative / RoPE / ALiBi / NoPE / iRoPE |
| Activation | ReLU / GeLU / SwiGLU / GeGLU / SqReLU |
| Attn. Type | MHA / MQA / GQA / MLA / SWA |
| Context Len. | Native and extended token limits |
| MoE | Dense or Sparse (total / active params, expert count) |
| Bias Terms | Bias in attention projections and/or norms |
| Tied Emb. | Input embedding tied to output projection |
| QK-Norm | RMSNorm on Q and K inside attention |
| Sliding Window | Local sliding-window attention in some layers |
| Stability Tricks | Z-loss, multi-token prediction (MTP), logit soft-capping, etc. |
| Ref | Primary paper / technical report link |
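As a quick illustration of the "Norm" column, the sketch below contrasts the two options on a single activation vector. This is a minimal, hypothetical implementation (learned scale and bias parameters omitted), not taken from any model's actual code: LayerNorm centers by the mean and divides by the standard deviation, while RMSNorm skips mean subtraction and divides by the root mean square only.

```python
import math

def layer_norm(x, eps=1e-5):
    # LayerNorm: subtract the mean, then divide by the standard deviation.
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    # RMSNorm: no mean subtraction; divide by the root mean square alone.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

x = [1.0, 2.0, 3.0, 4.0]
print(layer_norm(x))  # zero-mean, unit-variance output
print(rms_norm(x))    # unit-RMS output, mean left untouched
```

RMSNorm drops one reduction (the mean) per normalization, which is part of why most post-2020 models in the table use it in place of LayerNorm.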
- `table.md` – the main comparison table
- `MAINTENANCE.md` – how to add or update models
- Original figure: Harm de Vries (2017–2024 models)
- Extended coverage: Sebastian Raschka, The Big LLM Architecture Comparison (updated Mar 6, 2026)
- Individual technical reports and HuggingFace model configs for each model