Tencent releases versatile open source Hunyuan AI models

Tencent has expanded its family of open source Hunyuan AI models, which are versatile enough for a wide range of uses. The new family of models is designed to deliver strong performance across computing environments, from small edge devices to demanding high-concurrency production systems.

The release includes a comprehensive set of pre-trained and instruction-tuned models, available on the developer platform. The models come in several sizes, at parameter scales of 0.5B, 1.8B, 4B, and 7B, giving developers and businesses substantial flexibility.

Tencent notes that these models were trained with strategies similar to those of its more powerful Hunyuan-A13B model, allowing them to inherit its performance characteristics. Users can therefore choose the model that best suits their needs, whether a compact variant for resource-constrained edge computing or a larger model for high-throughput production workloads, without sacrificing capability.

One of the most notable features of the Hunyuan series is native support for an ultra-long 256K context window. This lets the models maintain stable performance on long-text tasks such as complex document analysis, extended conversations, and detailed long-form content generation. The models also support what Tencent calls “hybrid reasoning”: both fast and slow thinking modes, which users can choose between according to their specific requirements.

The company has also focused on agent capabilities. The models are optimised for agent-based tasks and post strong results on established benchmarks such as BFCL-v3, τ-Bench, and C3-Bench, suggesting a high degree of proficiency in complex, multi-step problem solving. On C3-Bench, for example, the Hunyuan-7B-Instruct model achieves a score of 68.5, while the Hunyuan-4B-Instruct model achieves 64.3.

On the performance side, the series focuses on efficient inference. The Hunyuan models use Grouped Query Attention (GQA), a technique known to improve processing speed and reduce computational overhead. This efficiency is further enhanced by advanced quantization support, a key part of the Hunyuan architecture designed to lower deployment barriers.
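The idea behind GQA is that many query heads share a smaller set of key/value heads, shrinking the KV cache and memory traffic. A minimal NumPy sketch of the mechanism (toy shapes, not Hunyuan's actual implementation):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """GQA: query heads share a smaller set of KV heads.
    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads          # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                      # map query head -> shared KV head
        scores = q[h] @ k[kv].T / np.sqrt(d)
        scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
        w = np.exp(scores)
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ v[kv]
    return out

# Toy example: 8 query heads attend through only 2 shared KV heads,
# so the KV cache is 4x smaller than in standard multi-head attention.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 64))
k = rng.standard_normal((2, 16, 64))
v = rng.standard_normal((2, 16, 64))
print(grouped_query_attention(q, k, v).shape)  # (8, 16, 64)
```

With 2 KV heads instead of 8, the cached K and V tensors are a quarter of the size, which is where the speed and memory savings come from during long-context decoding.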

Tencent has developed its own compression toolset, AngelSlim, to provide a more user-friendly and effective model compression solution. Using this tool, the company offers two main types of quantization for the Hunyuan series.

The first is FP8 static quantization, which uses an 8-bit floating-point format. This method uses a small amount of calibration data to predetermine the quantization scales without full retraining, converting model weights and activation values into FP8 format to boost inference efficiency.
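The key point of *static* quantization is that the scale is fixed from calibration data ahead of time, so inference needs no per-request statistics and no retraining. NumPy has no FP8 dtype, so this sketch uses symmetric int8 to illustrate the same calibrate-then-quantize idea:

```python
import numpy as np

def calibrate_scale(calib_batches, qmax=127.0):
    """Static quantization: fix the scale once, from a small calibration
    set, instead of recomputing it per request (and with no retraining)."""
    amax = max(float(np.abs(b).max()) for b in calib_batches)
    return amax / qmax

def quantize(x, scale):
    # Round onto the 8-bit grid and clip to the representable range.
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
calib = [rng.standard_normal(1024) for _ in range(8)]  # small calibration set
scale = calibrate_scale(calib)

# Toy weights, kept inside the calibrated range so nothing is clipped.
w = np.clip(rng.standard_normal(1024), -3.0, 3.0).astype(np.float32)
w_hat = dequantize(quantize(w, scale), scale)
print(np.abs(w_hat - w).max() <= scale / 2)  # True: error within half a step
```

Real FP8 kernels keep the compact values throughout the matmul rather than dequantizing; the round-trip here only shows how little information the precomputed scale loses.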

The second method is INT4 quantization, which achieves W4A16 quantization (4-bit weights, 16-bit activations) via the GPTQ and AWQ algorithms:

  • The GPTQ approach processes model weights layer by layer, using calibration data to minimise the error introduced by the quantized weights. This avoids the need for retraining while improving inference speed.
  • The AWQ algorithm statistically analyses the amplitude of activation values from a small set of calibration data, then computes a scaling factor for each weight channel, expanding the numerical range of important weights so that more information is preserved during compression.
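The AWQ step above can be sketched in a few lines. This is a simplified, AWQ-flavoured illustration (the `alpha` exponent and per-row symmetric 4-bit grid are assumptions for the toy, not AngelSlim's actual parameters): channels whose activations are large get their weights scaled up before 4-bit rounding, and the inverse scale is folded back afterwards.

```python
import numpy as np

def awq_style_quantize(weights, calib_acts, alpha=0.5):
    """AWQ-flavoured W4A16 sketch.
    weights: (out_features, in_features); calib_acts: (samples, in_features)."""
    amp = np.abs(calib_acts).mean(axis=0)        # per-channel activation amplitude
    s = np.maximum(amp, 1e-5) ** alpha           # per-channel scaling factor
    w_scaled = weights * s                       # salient channels get more range
    # Symmetric 4-bit quantization per output row: integer levels in [-7, 7].
    step = np.abs(w_scaled).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w_scaled / step), -7, 7)
    return q * step / s                          # dequantize, undo channel scaling

rng = np.random.default_rng(2)
w = rng.standard_normal((64, 128))
# Calibration activations whose channels vary widely in magnitude.
acts = rng.standard_normal((32, 128)) * np.linspace(0.1, 5.0, 128)
w_hat = awq_style_quantize(w, acts)
print(w_hat.shape)  # (64, 128)
```

Because activations stay in 16-bit (the "A16" in W4A16), only the weights pay the 4-bit precision cost, and the channel scaling spends that limited precision where the calibration data says it matters most.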

Developers can use the AngelSlim tool themselves or directly download the pre-quantized models.

Performance benchmarks confirm the capabilities of the Tencent Hunyuan models across a variety of tasks. The pre-trained Hunyuan-7B model, for example, achieves a score of 79.82 on the MMLU benchmark, 88.25 on GSM8K, and 74.85 on the MATH benchmark, showing solid reasoning and mathematical skills.

The instruction-tuned variants show impressive results in specialised fields. In mathematics, the Hunyuan-7B-Instruct model scores 81.1 on the AIME 2024 benchmark, while the 4B version scores 78.3. In science, the 7B model reaches 76.5 on OlympiadBench, and in coding it scores 42 on LiveCodeBench.

The quantization benchmarks show minimal performance degradation. On the DROP benchmark, the Hunyuan-7B-Instruct model scores 85.9 in the baseline BF16 format, 86.0 with FP8, and 85.7 with INT4 (GPTQ).

For deployment, Tencent recommends serving the Hunyuan models with established frameworks such as TensorRT-LLM, vLLM, and SGLang, which can expose OpenAI-compatible API endpoints for smooth integration into existing development workflows. This combination of performance, efficiency, and deployment flexibility positions the Hunyuan series as a strong contender in open source AI.
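"OpenAI-compatible" means a served Hunyuan model answers the same `POST /v1/chat/completions` request shape as the OpenAI API, so existing client code needs only a new base URL. A small sketch of such a request body (the model name and port below are illustrative assumptions, not confirmed values):

```python
import json

def build_chat_request(model, messages, max_tokens=256, temperature=0.7):
    """Request body in the OpenAI chat-completions format that vLLM,
    SGLang, and TensorRT-LLM servers accept at /v1/chat/completions."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request(
    "hunyuan-7b-instruct",  # illustrative model name
    [{"role": "user", "content": "Summarise grouped query attention."}],
)
body = json.dumps(payload)
print(sorted(payload))  # ['max_tokens', 'messages', 'model', 'temperature']

# Sending it to a locally served model (host/port assumed):
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" -d @request.json
```

Because the wire format matches, tooling built for the OpenAI API (SDKs, agent frameworks, eval harnesses) can point at the self-hosted endpoint without code changes.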

