ESP32-S3 LLM Architecture Comparison

Same 8 MB PSRAM budget, same training data, different width/depth tradeoffs

[Architecture diagram: three model stacks compared side by side.]

dim=128 (HW1HelpAgent_slim): 22 layers, 4 heads
  Token embedding 4096 × 128 → 22 × [attention (4 heads) + FFN 128→720→128] → LM head 128 → vocab

dim=192 (HW1HelpAgent192): 12 layers, 6 heads
  Token embedding 4096 × 192 → 12 × [attention (6 heads) + FFN 192→768→192] → LM head 192 → vocab

dim=256 (HW1HelpAgent256): 8 layers, 8 heads
  Token embedding 4096 × 256 → 8 × [attention (8 heads) + FFN 256→768→256] → LM head 256 → vocab
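
Expressed as data, the three stacks differ only in a handful of hyperparameters. A minimal sketch (the struct and field names are illustrative, not from any particular inference runtime):

```c
#include <stdio.h>

/* Hyperparameters of the three presets shown above. */
typedef struct {
    const char *name;
    int dim;       /* embedding width           */
    int n_layers;  /* transformer layers        */
    int n_heads;   /* attention heads per layer */
    int ffn_inner; /* FFN hidden width          */
    int vocab;     /* vocabulary size           */
} llm_config_t;

int main(void) {
    const llm_config_t presets[] = {
        { "HW1HelpAgent_slim", 128, 22, 4, 720, 4096 },
        { "HW1HelpAgent192",   192, 12, 6, 768, 4096 },
        { "HW1HelpAgent256",   256,  8, 8, 768, 4096 },
    };
    for (int i = 0; i < 3; i++) {
        const llm_config_t *c = &presets[i];
        /* head_dim = dim / n_heads = 32 for every preset. */
        printf("%s: embed [%d x %d] -> %d x [attn: %d heads x %dd,"
               " FFN %d->%d->%d] -> LM head [%d -> vocab]\n",
               c->name, c->vocab, c->dim, c->n_layers,
               c->n_heads, c->dim / c->n_heads,
               c->dim, c->ffn_inner, c->dim, c->dim);
    }
    return 0;
}
```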

What "dim" means: each token is a vector of N numbers

dim=128: 128 dimensions to encode meaning (the baseline)
dim=192: 192 dimensions, 2.25× more room ((192/128)² = 2.25)
dim=256: 256 dimensions, 4× more room to separate concepts ((256/128)² = 4)

How dim affects topic separation

More dimensions = more room to keep similar concepts apart

[Embedding-space sketch: the same eight topics (WiFi, MQTT, ESP-NOW, BLE, BME, OLED, IMU, GPS) plotted at each width.]

dim=128 (cramped): topics overlap → wrong answers
dim=192 (breathing room): better separation
dim=256 (well separated): clear boundaries → right answers
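
Why width helps here: embeddings of unrelated topics interfere less as the dimension grows, because random directions in higher-dimensional space are closer to orthogonal. A quick illustrative check (synthetic random vectors, not any real model's embeddings):

```c
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Mean |cosine similarity| between random vector pairs of a given
 * width. It shrinks roughly like 1/sqrt(dim): wider embeddings give
 * unrelated concepts more room to stay near-orthogonal. */
static double mean_abs_cos(int dim, int pairs) {
    double total = 0.0;
    for (int p = 0; p < pairs; p++) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < dim; i++) {
            double a = (double)rand() / RAND_MAX - 0.5;
            double b = (double)rand() / RAND_MAX - 0.5;
            dot += a * b;
            na  += a * a;
            nb  += b * b;
        }
        total += fabs(dot) / sqrt(na * nb);
    }
    return total / pairs;
}

int main(void) {
    srand(42);
    const int dims[] = { 128, 192, 256 };
    for (int i = 0; i < 3; i++)
        printf("dim=%d: mean |cos| = %.3f (1/sqrt(dim) = %.3f)\n",
               dims[i], mean_abs_cos(dims[i], 2000),
               1.0 / sqrt((double)dims[i]));
    return 0;
}
```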

Inside one transformer layer

dim=128 layer

Attention QKV: 128×384
Attention Out: 128×128
Heads: 4 × 32d
FFN Up: 128→720
FFN Down: 720→128
Weights/layer: ~250 KB
KV cache/layer: 64 KB
× 22 layers: ~5500 KB weights + 1408 KB KV

dim=192 layer

Attention QKV: 192×576
Attention Out: 192×192
Heads: 6 × 32d
FFN Up: 192→768
FFN Down: 768→192
Weights/layer: ~442 KB
KV cache/layer: 96 KB
× 12 layers: ~5308 KB weights + 1152 KB KV

dim=256 layer

Attention QKV: 256×768
Attention Out: 256×256
Heads: 8 × 32d
FFN Up: 256→768
FFN Down: 768→256
Weights/layer: ~655 KB
KV cache/layer: 128 KB
× 8 layers: ~5243 KB weights + 1024 KB KV
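
The per-layer numbers above follow directly from the matrix shapes, as this small sketch reproduces. One hedge: the KV-cache figures assume a 256-token context with 8-bit K and V entries; the source does not state the context length, but that assumption reproduces the 64/96/128 KB values. Output matches the tables up to rounding (weights in decimal KB, KV in KiB).

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char *name;
    int dim, n_layers, ffn_inner;
} llm_config_t;

/* INT8 weight bytes in one layer: QKV projection (dim x 3*dim),
 * output projection (dim x dim), FFN up and down (dim x ffn each). */
static uint32_t layer_weight_bytes(const llm_config_t *c) {
    return (uint32_t)(c->dim * 3 * c->dim           /* attention QKV */
                      + c->dim * c->dim             /* attention out */
                      + 2 * c->dim * c->ffn_inner); /* FFN up + down */
}

/* KV cache bytes in one layer, ASSUMING 256 cached tokens with
 * 8-bit K and V entries (the context length is an assumption here). */
static uint32_t layer_kv_bytes(const llm_config_t *c) {
    return 2u * 256u * (uint32_t)c->dim;
}

int main(void) {
    const llm_config_t presets[] = {
        { "HW1HelpAgent_slim", 128, 22, 720 },
        { "HW1HelpAgent192",   192, 12, 768 },
        { "HW1HelpAgent256",   256,  8, 768 },
    };
    for (int i = 0; i < 3; i++) {
        const llm_config_t *c = &presets[i];
        printf("%-18s %2d layers: weights %4u KB, KV cache %4u KB\n",
               c->name, c->n_layers,
               layer_weight_bytes(c) * (uint32_t)c->n_layers / 1000u,
               layer_kv_bytes(c) * (uint32_t)c->n_layers / 1024u);
    }
    return 0;
}
```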

Full comparison

                  dim=128            dim=192          dim=256
Preset            HW1HelpAgent_slim  HW1HelpAgent192  HW1HelpAgent256
Embedding dim     128                192              256
Layers            22                 12               8
Attention heads   4                  6                8
Head dim          32                 32               32
FFN inner         720                768              768
Parameters        ~5.8M              ~6.1M            ~6.3M
INT8 weights      ~6092 KB           ~6142 KB         ~6336 KB
KV cache          1408 KB            1152 KB          1024 KB
Total PSRAM       ~7822 KB           ~7654 KB         ~7640 KB
Repr. capacity    1×                 2.25×            4×
Processing depth  22 passes          12 passes        8 passes
Best for          Pattern matching   Balanced         Topic separation
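
All three presets land under the 8 MB PSRAM budget, with the remainder left for activations and scratch buffers. On the ESP32-S3, both the weights and the KV cache have to live in external PSRAM, since internal SRAM (512 KB) cannot hold either one. A minimal ESP-IDF sketch of reserving the dim=256 budgets, using the standard heap_caps allocation API:

```c
#include <stdint.h>
#include <stdio.h>
#include "esp_heap_caps.h"

/* Budgets for HW1HelpAgent256, taken from the table above. */
#define WEIGHT_BYTES (6336u * 1000u)  /* ~6336 KB INT8 weights */
#define KV_BYTES     (1024u * 1024u)  /* 1024 KB KV cache      */

void app_main(void) {
    /* MALLOC_CAP_SPIRAM forces the allocation into external PSRAM. */
    uint8_t *weights = heap_caps_malloc(WEIGHT_BYTES, MALLOC_CAP_SPIRAM);
    uint8_t *kv      = heap_caps_malloc(KV_BYTES,     MALLOC_CAP_SPIRAM);

    if (weights == NULL || kv == NULL) {
        printf("PSRAM allocation failed\n");
        return;
    }
    printf("PSRAM still free: %u bytes\n",
           (unsigned)heap_caps_get_free_size(MALLOC_CAP_SPIRAM));
}
```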