DeepSeek and their quant/algotrading parent company have years of experience with raw C/C++ CUDA programming and low-level CUDA optimization. That is one of the main reasons they were able to train models and serve inference so efficiently and cheaply. That hard-earned experience is not something they have shared publicly.