Moscow, Nov 21: Sber is open-sourcing the weights of two new flagship MoE models in the GigaChat line – Ultra-Preview and Lightning. They were built from the ground up for Russian-language tasks. The release also includes a new generation of the open GigaAM-v3 models for speech recognition with punctuation and normalization.
Furthermore, all models from the new Kandinsky 5.0 line for image and video generation are now available. Video Pro, Video Lite, and Image Lite are Sber's own advanced neural networks that natively understand prompts in Russian, know Russian culture, and can generate Cyrillic text on images and videos. The release also includes the K-VAE 1.0 models for visual data compression, which are essential for training visual content generation models and are the best in the world among open-source analogues. The code and weights for all these models are now available to all users under the MIT License, which permits commercial use.
Andrey Belevtsev, senior vice president, Head of Technology Development at Sber:
Creating a true world-class AI requires two things: colossal resources and, even more importantly, world-class R&D teams. Sber has both. But our fundamental position is not to build a ‘closed’ technology. Our strategy is to become an open foundation for the entire country. This is precisely why we are open-sourcing our model weights. This is a key point. When we provide these models, any company in Russia—from a bank to a startup—can deploy them within their own secure perimeter and fine-tune them on their sensitive data without exposing it to anyone. This is what genuine technological sovereignty looks like: when the entire nation has access to AI, and it becomes the basis for business transformation and a stimulus for economic growth. I would also like to note that the Ultra model will soon be available to corporate clients as well, optimized for total cost of ownership for on-premises deployment within a company’s perimeter.
GigaChat Ultra and GigaChat Lightning
The GigaChat family welcomes two new additions: GigaChat Ultra Preview and GigaChat Lightning. GigaChat Ultra Preview is the most powerful and largest model in the GigaChat line, and the first Russian model of this scale; its training is still ongoing. Even at the current stage, it outperforms both DeepSeek-V3.1 and the previous flagship, GigaChat 2 MAX, in overall quality metrics for the Russian language, leading the MERA benchmark.
GigaChat Ultra Preview is being released under an open-source license. This allows the model to be fine-tuned locally, for example in secure corporate environments where full control over private data, compliance with information-security requirements, and maximum quality are critical. Despite its large size, the model remains fast, outperforming GigaChat 2 MAX in inference speed.
GigaChat Lightning, in contrast, is the most compact and fastest MoE model in the line, optimized for local deployment on a laptop and for supporting rapid product iterations.
In terms of quality, it competes with the global open-source leaders in its category: it outperforms Qwen3-4B in Russian-language tasks and matches its capabilities in dialogue, document analysis, and practical business problems.
Alongside the model weights, Sber is also publishing its accelerated inference technology. Lightning not only outpaces competitors in its class; it runs almost as fast as Qwen3-1.7B, despite being six times larger.
Both models tightly integrate external tool use, with two key capabilities standing out: code and memory.
• Code: a tool for executing code and analyzing and visualizing its results. It can run code snippets, plot graphs, perform calculations, and test hypotheses in real time.
• Memory: a system for personalized communication that remembers important details: goals, preferences, and discussion history. The models can give users personalized advice and refine stored information as the dialogue progresses. Outdated or sensitive information is deleted, and the user can manually edit the models' memory.
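The memory behavior described above, remembering user details, expiring outdated entries, and allowing manual deletion, can be sketched as a small key-value store. This is a minimal illustrative sketch, not the actual GigaChat implementation; the class and method names are invented for the example.

```python
import time

class DialogueMemory:
    """Toy sketch of a personalization memory: stores facts about the user,
    expires entries older than a TTL, and lets the user delete facts manually.
    Hypothetical example; not the real GigaChat memory system."""

    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._facts = {}  # key -> (value, timestamp of last update)

    def remember(self, key, value):
        # the model records or updates a detail during the dialogue
        self._facts[key] = (value, time.time())

    def recall(self, key):
        entry = self._facts.get(key)
        if entry is None:
            return None
        value, ts = entry
        if time.time() - ts > self.ttl:  # outdated information is dropped
            del self._facts[key]
            return None
        return value

    def forget(self, key):
        # manual adjustment: the user removes a detail from memory
        self._facts.pop(key, None)

memory = DialogueMemory(ttl_seconds=3600)
memory.remember("goal", "prepare a quarterly report")
memory.remember("preference", "answers in Russian")
memory.forget("preference")         # user deletes a sensitive detail
print(memory.recall("goal"))        # -> prepare a quarterly report
print(memory.recall("preference"))  # -> None
```

The TTL stands in for whatever retention policy a production system would use; the point is that stale and user-deleted facts never reach the model again.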
GigaAM-v3
GigaAM-v3 is an open-source suite of five models for automatic Russian speech recognition (ASR), available for industrial application and commercial use. GigaAM-v3 is designed for voice assistants, contact centers, call analytics, voice message aggregators, and multimodal agents.
In the new version of the GigaAM acoustic models, the pre-training scale has been increased from 50,000 to 700,000 hours of audio. New domains were added to the training data: call centers, music queries, atypical speech, and spontaneous speech, which significantly improved recognition in these scenarios.
The unique, foundational GigaAM-v3 model can serve as the basis for any speech technology. At Sber, it is already used for speech recognition, speech synthesis, and enables GigaChat to process video and audio.
Kandinsky 5.0
The Kandinsky 5.0 line includes the Image Lite model, capable of text-to-image generation and editing, as well as two video generation models: the fast Video Lite and the powerful Video Pro. These can create videos from text descriptions and “bring images to life.”
The universal Kandinsky 5.0 Image Lite model operates at HD resolution, has a strong understanding of the Russian cultural code, natively comprehends prompts in both Russian and English, and generates text in both Latin and Cyrillic scripts. The Kandinsky 5.0 Video Pro model generates up to 10 seconds of HD video at 24 fps. It is the best open-source video generation model available, outperforming Wan 2.2 A14B and achieving visual quality on par with Veo 3, one of the world's most powerful proprietary models. To lower the barrier to entry for applied projects, the Kandinsky 5.0 Video Lite model is optimized to run on consumer graphics cards with 12 GB of VRAM or more.
The training of Kandinsky 5.0 was conducted on nearly a billion images and 300 million videos. To adapt it to the Russian cultural context, the developers used an additional million media materials. Working with such data volumes required advanced approaches, some of which were created specifically for this project. The final stages of training used an exceptionally high-quality dataset curated by a large team of designers and artists, who selected materials with impeccable composition, style, and visual quality.
The Kandinsky models unlock opportunities for a wide spectrum of products, from services for personal creativity to professional tools for industry. Based on these open-sourced neural networks, developers and companies can build solutions that let users easily generate personalized video greetings, animate photographs, or create original visual stories. For professional directors, designers, marketers, and animators, products built on Kandinsky 5.0 will become powerful tools for producing promotional materials, content, and visual projects for commercial scenarios. All of this will contribute to the development of an open ecosystem around Russian generative technologies.
K-VAE 1.0
Generative models like Kandinsky 5.0 synthesize media content in a "latent" space, which is unreadable to the human eye. This makes training and running such models more efficient, faster, and less memory-intensive. Sber is releasing its own autoencoders, K-VAE 1.0, trained from scratch for images (2D) and video (3D), which convert media into these latent representations and back.
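The idea of a latent space can be illustrated with a toy encode/decode pair: map a pixel grid to a much smaller grid of values and approximately reconstruct it. A real autoencoder such as K-VAE 1.0 learns this mapping with neural networks and is far more faithful; here, purely for illustration, encode is fixed 2x2 average pooling and decode is nearest-neighbor upsampling.

```python
# Toy sketch of what a visual autoencoder does; not the K-VAE architecture.
# The generative model works on the small latent grid instead of raw pixels,
# which is why training and inference become cheaper.

def encode(image, factor=2):
    """Downsample an HxW grid by averaging factor x factor blocks."""
    h, w = len(image), len(image[0])
    latent = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [image[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        latent.append(row)
    return latent

def decode(latent, factor=2):
    """Upsample back by repeating each latent value factor x factor times."""
    image = []
    for row in latent:
        expanded = [v for v in row for _ in range(factor)]
        image.extend([expanded[:] for _ in range(factor)])
    return image

image = [[0, 0, 8, 8],
         [0, 0, 8, 8],
         [2, 2, 4, 4],
         [2, 2, 4, 4]]
latent = encode(image)   # 2x2 grid: 4x fewer values for the model to handle
restored = decode(latent)
print(latent)            # -> [[0.0, 8.0], [2.0, 4.0]]
```

On this block-constant image the reconstruction happens to be exact; on real images the round trip is lossy, and a learned VAE is trained to keep that loss imperceptible.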
The K-VAE 1.0 models are the best in the world among their open-source counterparts. Making them publicly available will elevate generative AI technology to a new level of quality.
