quaful posted in "Yifan Zhang reveals DeepSeek V4 parameters"

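Yifan Zhang: undergrad at Peking University (Yuanpei College), master's at Tsinghua University (Yao Class), now an AI Lab Fellow at Princeton University, working on reasoning and reinforcement learning for large language models.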
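This guy isn't a DeepSeek employee, but he's probably pretty close to them, and several of his earlier predictions turned out to be right.
What he disclosed this time: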
V4 1.6T, V4-Lite 285B 
Attention: DSA2 (NSA + DSA), 
head-dim 512 Sparse MQA + SWA 
MoE: Fused MoE Mega-Kernel with 6 active in 384 experts 
Residual: Hyper-Connections 
Optimizer: Muon 
Pretrain context length: 32K 
RL: GRPO with corrected KL 
Final Context Length: 1M 
Mo...
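The MoE line says 6 experts are active out of 384, so each token is sent to 1/64 (about 1.6%) of the routed experts. The minimal Python sketch below shows what generic top-6-of-384 routing looks like. It assumes a plain softmax gate whose top-k weights are renormalized to sum to 1; `route_token` and `moe_forward` are hypothetical names. The post says nothing about V4's actual gating function, shared experts, or load balancing, and the sketch does not model the fused "Mega-Kernel" GPU implementation at all.

```python
import math
import random

NUM_EXPERTS = 384  # from the post: 384 experts
TOP_K = 6          # from the post: 6 active per token

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(logits, k=TOP_K):
    """Hypothetical gate: pick the k highest-probability experts and
    renormalize their probabilities so the selected weights sum to 1."""
    if len(logits) < k:
        raise ValueError("need at least k expert logits")
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    kept = sum(probs[i] for i in top)
    return top, [probs[i] / kept for i in top]

def moe_forward(x, logits, experts, k=TOP_K):
    # Only the k selected experts run; their outputs are mixed by gate weight.
    ids, weights = route_token(logits, k)
    return sum(w * experts[i](x) for i, w in zip(ids, weights))

if __name__ == "__main__":
    random.seed(0)
    logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
    ids, weights = route_token(logits)
    print(len(ids), round(sum(weights), 6))  # 6 1.0
    # Toy experts: expert i just scales its input by (i + 1).
    experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]
    print(moe_forward(1.0, logits, experts))
    print(f"active fraction: {TOP_K}/{NUM_EXPERTS}")
```

The point of the sparsity is that per-token compute scales with the 6 selected experts rather than all 384, which is how a model with a very large total parameter count can keep its per-token cost much lower.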
 
 