Reference links:
https://github.com/deepseek-ai/DeepSeek-V3/issues/849
https://www.reddit.com/r/LocalLLaMA/comments/1mzsg6v/deepseek_v31_getting_token_extreme_%E6%9E%81_%E6%A5%B5_out_of/
https://www.xiaohongshu.com/discovery/item/68ac166a000000001d012571?source=webshare&xhsshare=pc_web&xsec_token=CBxtz16cD7hBeyge2T9Q3r5OWhHWYeqpxRhRb2uIoNKxk=&xsec_source=pc_share
https://www.zhihu.com/question/1942934856603505597
https://www.zhihu.com/people/qiao-shi-zhan-66/answers
This article comes from the WeChat official account "AI前哨".
Parameter: top_k=1. Qwen3 Coder 480B A35B Instruct shows the same problem only after being heavily quantized.