New NPU technology boosts AI performance

2025-07-09
A breakthrough in low-power, high-efficiency neural processing for generative AI clouds.
Credit: Tesfu Assefa

Generative artificial intelligence (AI) models like GPT-4 and Gemini 2.5 need large amounts of memory and fast processing to work well. These models create text, images, and other content by learning patterns from data. To run them, companies like Microsoft and Google buy many NVIDIA GPUs. However, GPUs consume a lot of energy and require large memory systems, making them expensive to operate. Researchers have now developed an NPU, or Neural Processing Unit: a specialized chip designed to process AI tasks quickly and efficiently. This new NPU improves AI inference performance by over 60% while using about 44% less power than the latest GPUs.

A step forward in AI infrastructure

The NPU was developed by researchers from KAIST and HyperAccel Inc. The goal was to make AI systems faster and less costly by improving how they process data. The researchers lightened the inference workload and tackled memory bottlenecks. By designing the chip and the software together, they created a system better suited to large AI deployments. Instead of requiring many GPUs, their NPU can do the same job with fewer chips, thanks in part to a technique called KV cache quantization. This method shrinks the temporary data store, called the KV cache, that is used during AI inference. Smaller data sizes mean less memory is needed, which cuts costs.
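To illustrate why quantizing the KV cache saves memory, here is a minimal sketch in NumPy using symmetric per-tensor int8 quantization. This is a hypothetical scheme for illustration only; the researchers' actual quantization method is not described in detail in the article.

```python
import numpy as np

def quantize_kv(kv: np.ndarray):
    """Symmetric per-tensor int8 quantization of a KV-cache tensor.

    Illustrative sketch: maps fp32 values onto [-127, 127] using a
    single scale factor. Real NPU schemes are typically more refined
    (e.g. per-channel or per-block scales).
    """
    scale = float(np.abs(kv).max()) / 127.0 or 1.0
    q = np.clip(np.round(kv / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate fp32 values from the int8 cache."""
    return q.astype(np.float32) * scale

# Toy cache: 2 heads x 4 cached tokens x 8 head dims, stored in fp32.
kv = np.random.randn(2, 4, 8).astype(np.float32)
q, scale = quantize_kv(kv)

# int8 storage is 4x smaller than fp32 for the same cache shape.
print(kv.nbytes, q.nbytes)
```

Storing the cache as int8 rather than fp32 cuts its footprint by 4x, which is the kind of reduction that lets one NPU serve a workload that previously needed several GPUs' worth of memory.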

The NPU integrates easily with existing memory systems and uses page-level memory management to make better use of limited memory space. This approach organizes memory the way a CPU's virtual-memory system does, ensuring smooth data access. The researchers also added a new data-encoding scheme, making the system even more efficient. Compared to GPU-based systems, the NPU-based setup is cheaper to run and uses less power, which could lower the cost of AI services. The technology shows promise not only for cloud-based AI but also for emerging, more autonomous systems such as agentic AI. This research marks a step toward smarter, more efficient AI infrastructure.
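The page-level idea can be sketched with a toy allocator: logical token positions in a sequence map to fixed-size physical blocks through a per-sequence block table, much like an OS page table maps virtual pages to physical frames. This is a hypothetical illustration, not the researchers' implementation; the class name, `BLOCK_SIZE`, and pool size are all assumptions.

```python
BLOCK_SIZE = 16  # tokens per physical block (assumed value)

class PagedKVCache:
    """Toy page-table-style allocator for KV-cache slots."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))  # pool of free physical blocks
        self.tables = {}                     # seq_id -> list of block ids

    def append_token(self, seq_id: str, pos: int) -> int:
        """Return the physical slot for logical position `pos`.

        A new physical block is claimed only when the sequence
        crosses a block boundary, so memory grows in page-sized
        steps instead of being reserved up front.
        """
        table = self.tables.setdefault(seq_id, [])
        if pos // BLOCK_SIZE >= len(table):      # crossed a boundary
            table.append(self.free.pop())        # claim a free block
        block = table[pos // BLOCK_SIZE]
        return block * BLOCK_SIZE + pos % BLOCK_SIZE

cache = PagedKVCache(num_blocks=8)
slots = [cache.append_token("req-0", p) for p in range(20)]
# 20 tokens occupy only ceil(20/16) = 2 physical blocks.
```

Because blocks are claimed on demand, short sequences never hold memory they do not use, which is exactly how page-level management squeezes more concurrent requests into the same memory budget.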

#ComputingPlatforms

#LargeLanguageModels(LLMs)



