Layer 10 is trained on layer 9’s output distribution. Layer 60 is trained on layer 59’s. If you rearrange them — feeding layer 60’s output into layer 10 — you’ve created a distribution the model literally never saw during training.
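A toy sketch of why the order matters, using a hypothetical stack of random residual blocks (not the actual model, just an illustration of the distribution-mismatch argument):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a stack of residual blocks, each with its own weights.
# During training, block i only ever sees activations produced by
# blocks 0..i-1 -- that is the distribution the text is talking about.
layers = [rng.normal(scale=0.1, size=(8, 8)) for _ in range(4)]

def forward(x, order):
    # Apply the blocks in the given order; the residual connection
    # keeps activation scales comparable across permutations.
    for i in order:
        x = x + np.tanh(x @ layers[i])
    return x

x = rng.normal(size=8)
in_order = forward(x, [0, 1, 2, 3])
shuffled = forward(x, [3, 0, 1, 2])  # a "late" block now feeds an "early" one

# The permuted stack computes a genuinely different function: block 0
# receives inputs with statistics it never encountered in the original order.
print(np.allclose(in_order, shuffled))
```

For generic random weights the two orderings disagree, which is the point: a rearranged stack is evaluating each block off-distribution, and any coherence the output retains has to come from the blocks being robust to inputs they were never trained on.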
The good ones were subtly but noticeably sharper. More coherent reasoning, better at holding long context, more natural conversational flow. The kind of difference where you can’t quite articulate what changed, but the model feels more present. Or maybe that’s just my imagination; vibe checks are hard to define.