quaful posted in "Yifan Zhang reveals DeepSeek V4 parameters"
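Yifan Zhang: undergrad at Peking University's Yuanpei College, master's in Tsinghua's Yao Class, now an AI Lab Fellow at Princeton University. His research focuses on LLM reasoning and reinforcement learning.
This guy isn't a DeepSeek employee, but he's probably pretty close to DeepSeek, and his previous few predictions all turned out right.
Here's what he disclosed this time: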
V4 1.6T, V4-Lite 285B
Attention: DSA2 (NSA + DSA), head-dim 512 Sparse MQA + SWA
MoE: Fused MoE Mega-Kernel with 6 active in 384 experts
Residual: Hyper-Connections
Optimizer: Muon
Pretrain context length: 32K
RL: GRPO with corrected KL
Final Context Length: 1M
Mo...
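
To get a feel for what "6 active in 384 experts" would mean in practice, here is a rough sketch of generic top-k expert routing for a single token. Only the numbers 384 and 6 come from the list above. The softmax gating, the random logits, and the names route_token, NUM_EXPERTS and TOP_K are all assumptions for illustration; the post says nothing about how V4 actually scores or normalizes experts.

```python
import math
import random

NUM_EXPERTS = 384  # from the post
TOP_K = 6          # from the post


def route_token(router_logits, top_k=TOP_K):
    """Pick the top_k experts for one token and normalize their gate weights.

    Generic top-k gating with a softmax over the selected logits. This is an
    assumed scheme for illustration, not the router described in the post.
    """
    # Indices of the top_k largest router logits.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:top_k]
    # Softmax over the selected logits only (subtract the max for stability).
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Hypothetical router output for one token: random logits, fixed seed.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(logits)
active_fraction = TOP_K / NUM_EXPERTS  # 6 / 384 = 1/64

print(f"experts used for this token: {len(selected)} of {NUM_EXPERTS}")
print(f"fraction of experts active: {active_fraction:.4%}")
```

Note that 6/384 is the share of experts touched per token, not the share of total parameters that are active. The list gives no active-parameter count, so that can't be worked out from these numbers alone.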