Consultório MenosFios: Learn all about DeepSeek AI

The Chinese startup DeepSeek has launched an AI model capable of rivaling the technology of OpenAI and Google, but on a much smaller budget. The technology calls into question the massive investments made in training artificial intelligence and the pricing of current business models.

DeepSeek stands out for quality on par with what OpenAI, Google and Meta have presented to the world. At the same time, it manages to reduce training costs and increase efficiency, which could redefine the rules of the AI game.

Everything indicates that the Western world was caught off guard by the Chinese startup's capabilities, even sending the big technology giants tumbling on Wall Street. Nvidia fared worst: after meteoric growth that had seen it surpass Apple in value, it fell dramatically, losing nearly 600 billion dollars in a single session, 17% of its value, the biggest single-day loss ever recorded by a US company, according to CNBC.

This appears to be China's response, coming just days after the United States imposed restrictions on exports of American-made artificial intelligence chips. Russia and China are on a strict blacklist, but the US government decided to further restrict access for the rest of the world, except for a group of countries it considers strategic allies, including Portugal and many European Union states.

On the other hand, the US government intends to invest 500 billion dollars in the technology. The question now is whether it will really take that much money for the United States to achieve AI supremacy.

Who is DeepSeek, after all? How much does it cost to train its models? And how did it shake up the tech world? In today's Consultório MenosFios, get to know the Chinese startup that promises to remain the talk of the town.

Who is DeepSeek?

DeepSeek is a private Chinese company, founded as recently as July 2023 by Liang Wenfeng, an electronic engineering graduate of Zhejiang University. According to the MIT Technology Review, the startup was incubated at High-Flyer, a hedge fund he founded in 2015.

DeepSeek's goal, like that of Sam Altman's OpenAI, is to build an AGI (Artificial General Intelligence) model, a form of AI capable of matching and even surpassing human intelligence across a wide variety of tasks.

The team is made up of young graduates from top Chinese universities, fostering a culture of innovation. Notably, the company prioritizes technical skill over traditional work experience, which ensures a group of highly capable individuals with a fresh perspective on the development of artificial intelligence.

How did DeepSeek circumvent US sanctions?

Remarkably, even though the announcement of the DeepSeek R1 model sank Nvidia on Wall Street, the startup relied on Nvidia's A100 chips for the processing power to train the model. Liang Wenfeng is said to have secured a stockpile of the processors before the United States banned Nvidia from exporting the chips to China in September 2022.

DeepSeek was initially estimated to have amassed 10,000 A100 chips, but the number appears to be much higher, around 50,000, according to analyst Dylan Patel, founder of AI consulting firm SemiAnalysis.

How have DeepSeek's AI models evolved?

One of the highlights of DeepSeek R1, the model on everyone's lips, is its improved learning capacity and more efficient memory usage. But before reaching this point, the company released other models. The first was DeepSeek Coder, in November 2023, an open-source model designed for programming tasks. It was followed by the DeepSeek LLM, a 67-billion-parameter model created to compete with other large language models.

In May 2024 came DeepSeek-V2, which was widely praised for its strong performance and low cost. The model caused a stir among competitors in China, where its disruptive pricing pushed technology giants such as ByteDance, Tencent, Baidu and Alibaba to lower the prices of their own offerings to stay competitive.

The evolution of DeepSeek's models is palpable: DeepSeek-Coder-V2 has 236 billion parameters. As Forbes explains, the model was designed to tackle complex programming challenges.

The company's latest models are DeepSeek-V3 and DeepSeek-R1. V3 has 671 billion parameters and is considered very efficient compared to the competition while offering excellent performance.

DeepSeek-R1, launched this week, matches the performance of OpenAI's o1. These are models from "another league", the kind that aim for AGI: they are slower to respond, but deliver more effective answers.

The company's catalog also includes the DeepSeek-R1 Distill line, lighter but highly capable open-source versions. Among them are models with 32 and 70 billion parameters, which the company says are on par with OpenAI's o1-mini.

Unlike traditional methods that rely on supervised fine-tuning, DeepSeek uses so-called reinforcement learning: models learn by trial and error, improving automatically through algorithmic rewards. The model learns by interacting with its environment and receiving feedback on its actions, somewhat like the way humans learn from experience.
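To make the idea concrete, here is a minimal, purely illustrative Python sketch of reward-driven trial and error. It is not DeepSeek's training code: the actions, rewards and learning rate are all invented, and the "model" is just a table of scores that improves using nothing but reward feedback.

    import random

    # Toy reinforcement learning by trial and error (hypothetical example):
    # a "model" that is only a table of action scores improves itself
    # using nothing but the rewards it receives.
    actions = ["A", "B", "C"]
    scores = {a: 0.0 for a in actions}        # the model's current preferences
    rewards = {"A": 0.1, "B": 1.0, "C": 0.3}  # environment feedback, hidden from the model

    learning_rate = 0.1
    epsilon = 0.2  # how often to explore a random action instead of the best one

    for step in range(1000):
        # Trial: occasionally explore, otherwise exploit the best-known action
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(scores, key=scores.get)

        # Feedback: the environment rewards the chosen action
        reward = rewards[action]

        # Update: nudge the action's score toward the observed reward
        scores[action] += learning_rate * (reward - scores[action])

    print(scores)  # "B" ends up ranked highest, learned purely from rewards

After enough trials the highest-reward action dominates, with no labeled examples involved, which is exactly the contrast with supervised fine-tuning drawn above.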

As Forbes points out, this reinforcement learning format allows the model to develop stronger reasoning skills and adapt to new situations more efficiently. The technique is related to the new approach of training models with inference-time computing, which may be the answer to the problem of the Internet running out of useful training data.

Inference-time computing is a technique that slices a request into smaller tasks, turning each one into a new prompt for the model to solve. Each step requires a new request, which is known as the inference phase.
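As a rough sketch of that slicing, the toy function below breaks one request into sub-steps and issues a new prompt per step. The call_model function is a hypothetical placeholder for any LLM API; the request and step list are invented for the example.

    # Schematic sketch of inference-time computation: each sub-task becomes
    # a fresh prompt, and each prompt is one "inference phase" request.
    def call_model(prompt: str) -> str:
        """Hypothetical stand-in for a real model call (e.g. an HTTP API request)."""
        return f"<model answer to: {prompt!r}>"

    def solve_in_steps(request: str, steps: list[str]) -> str:
        context = request
        for step in steps:
            prompt = f"{context}\n\nNext step: {step}"
            context = call_model(prompt)  # one extra inference request per step
        return context

    print(solve_in_steps(
        "Prove that the sum of two even numbers is even.",
        ["restate the problem formally", "write the proof", "check the proof"],
    ))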

In the case of DeepSeek-R1, it is explained that the model activates only a small fraction of its parameters for any given task, a mixture-of-experts-style design that complements inference-time computation. This selective activation significantly reduces computational costs, improving efficiency.
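The sketch below captures the gist of selective activation with a toy mixture-of-experts-style layer; the sizes are made up (8 experts, only 2 active per input) and this is not DeepSeek's architecture. A router scores every expert, but only the top-scoring ones actually run, so most parameters stay idle for any given input.

    import numpy as np

    # Toy selective-activation layer (illustrative sizes only):
    # only the top-k experts chosen by the router are ever computed.
    rng = np.random.default_rng(0)
    num_experts, top_k, dim = 8, 2, 16
    experts = [rng.standard_normal((dim, dim)) for _ in range(num_experts)]
    router = rng.standard_normal((dim, num_experts))

    def forward(x: np.ndarray) -> np.ndarray:
        scores = x @ router                    # score all experts (cheap)
        chosen = np.argsort(scores)[-top_k:]   # keep only the top-k experts
        weights = np.exp(scores[chosen])
        weights /= weights.sum()               # softmax over the chosen experts
        # Only 2 of the 8 expert matrices are actually multiplied:
        return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

    x = rng.standard_normal(dim)
    print(forward(x).shape)  # (16,) - full-size output at a fraction of the compute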
