2025-02-06

BOSTON: After the release of DeepSeek-R1 on Jan 20 triggered a massive drop in chipmaker Nvidia’s share price and sharp declines in various other tech companies’ valuations, some declared this a “Sputnik moment” in the Sino-American race for supremacy in artificial intelligence (AI). While America’s AI industry arguably needed shaking up, the episode raises some difficult questions.

The US tech industry’s investments in AI have been massive, with Goldman Sachs estimating that “mega tech firms, corporations, and utilities are set to spend around US$1 trillion on capital expenditures in the coming years to support AI”.

Yet for a long time, many observers, including me, have questioned the direction of AI investment and development in the United States.

With all the leading companies following essentially the same playbook (though Meta has differentiated itself slightly with a partly open-source model), the industry seems to have put all its eggs in one basket.

Without exception, US tech companies are obsessed with scale. Citing yet-to-be-proven “scaling laws,” they assume that feeding ever more data and computing power into their models is the key to unlocking ever-greater capabilities. Some even assert that “scale is all you need”.

Before Jan 20, US companies were unwilling to consider alternatives to foundation models pretrained on massive data sets to predict the next word in a sequence. Given their priorities, they focused almost exclusively on diffusion models and chatbots aimed at performing human (or human-like) tasks.

And though DeepSeek’s approach is broadly the same, it appears to have relied more heavily on reinforcement learning, mixture-of-experts methods (which route each input to a few of many smaller, specialized subnetworks, so that only part of the model is active at any time), distillation, and refined chain-of-thought reasoning. This strategy reportedly allowed it to produce a competitive model at a fraction of the cost.