Arm has unveiled its Lumex Compute Subsystem (CSS) platform, which aims to bring more advanced artificial intelligence capabilities directly to consumer devices such as smartphones and personal computers. The platform features a new suite of Arm C1 CPUs with SME2 (Scalable Matrix Extension 2) and Mali G1 GPUs. Arm promises significant boosts in on-device AI performance, lower latency, and improved efficiency for tasks ranging from voice assistants to generative AI applications.
At the core of the Lumex platform is an SME2-enabled Armv9.3 CPU cluster, with designs that include the C1-Ultra for peak performance, the C1-Premium for area efficiency in sub-flagship devices, and the C1-Pro for sustained efficiency. Alongside these CPUs, Arm is introducing the Mali G1-Ultra GPU, designed for enhanced graphics and AI performance, including advances in ray tracing. Arm claims its SME2-enabled CPUs can deliver up to 5x faster AI performance, 4.7x lower latency for speech-based workloads, and 2.8x faster audio generation.
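To put those ratios in concrete terms, the short Python sketch below converts each claimed factor into the time remaining and the share of time saved. The 1,000 ms baseline and the helper names are hypothetical, chosen only for illustration; Arm's announcement quotes ratios, not absolute timings, and the figures are "up to" claims rather than guaranteed results.

```python
# Hypothetical illustration: Arm quotes ratios only, so the baseline below
# is an assumed figure, not a measured one.

def time_after(baseline_ms: float, factor: float) -> float:
    """Time remaining once a workload runs `factor` times faster."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return baseline_ms / factor


def percent_saved(factor: float) -> float:
    """Share of the original time removed by a `factor`-times speedup."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1 - 1 / factor) * 100


# Claimed best-case factors from the announcement.
claims = {
    "AI performance": 5.0,
    "speech latency": 4.7,
    "audio generation": 2.8,
}

BASELINE_MS = 1000.0  # assumed baseline, for illustration only

for name, factor in claims.items():
    remaining = time_after(BASELINE_MS, factor)
    saved = percent_saved(factor)
    print(f"{name}: {factor}x -> {remaining:.0f} ms ({saved:.0f}% less time)")
```

Under that assumed baseline, 4.7x lower latency would take a one-second speech task down to roughly 213 ms. The gain is large, but each additional multiple saves less absolute time than the one before it.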
This strategic push by Arm emphasizes moving AI processing from the cloud to the "edge," closer to the data source. Arm expects integration to be easier for developers through its KleidiAI libraries, which are now incorporated into major mobile operating systems and AI frameworks such as PyTorch ExecuTorch, Google LiteRT, Alibaba MNN, and Microsoft ONNX Runtime.
Core Components and Performance Claims
The Lumex platform introduces several key components designed for the AI era:
Next-generation SME2-enabled Armv9.3 CPU cluster:
C1-Ultra: Aimed at flagship devices, offering enhanced single-thread performance for large-model inference, computational photography, and generative AI.
C1-Premium: Provides similar performance to C1-Ultra but with greater area efficiency, targeting the sub-flagship market for applications like voice assistants.
C1-Pro: Focuses on sustained efficiency, ideal for video playback and streaming inference.
C1-Nano: Built for maximum efficiency and small form factors, suitable for wearables.
Mali G1-Ultra GPU: A premium GPU based on Arm's 5th Gen GPU architecture, designed for advanced graphics, gaming, and accelerated AI inference. It scales from 10 to 24 cores, and Arm claims 20% faster AI inference and a 20% improvement across graphics benchmarks compared with the previous generation.
C1-DSU (DynamIQ Shared Unit): Designed for flexibility and power awareness, optimized for 3nm process nodes.
Arm says the Lumex platform delivers what it calls "unprecedented six years of double-digit IPC performance gains" for flagship devices, where IPC refers to instructions per clock.
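For a rough sense of scale, the Python sketch below compounds a fixed annual IPC gain over six generations. It rests on two assumptions that Arm's statement does not spell out: that "double-digit" means at least 10% each year, and that each gain is measured against the previous generation so that the gains multiply. The function name is hypothetical.

```python
# Assumptions for illustration only: every year's gain is at least 10%,
# and each gain is relative to the previous generation (so gains multiply).

def compounded_gain(annual_gain: float, years: int) -> float:
    """Cumulative multiplier after `years` of a fixed relative gain per year."""
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1 + annual_gain) ** years


# Lower bound implied by six consecutive 10% gains.
floor_multiplier = compounded_gain(0.10, 6)
print(f"Six years at 10% per year: {floor_multiplier:.2f}x IPC")
```

On those assumptions, the floor works out to about 1.77x the IPC of the starting design. That counts IPC alone and ignores any change in clock speed.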
Shifting AI Landscape
The announcement arrives amid a broader industry trend toward edge AI. Companies are developing specialized hardware such as ASICs, NPUs, and enhanced GPUs to handle AI tasks efficiently on edge devices. Products such as NVIDIA's Jetson AGX Orin, Hailo's Hailo-8, and SiMa.ai's MLSoC are already prominent in the edge AI hardware market, each offering a different balance of performance, power efficiency, and cost.
The distinction between different AI accelerators (NPUs, GPUs, and TPUs) is becoming increasingly important for edge applications. While GPUs are often lauded for their flexibility, specialized processors like NPUs are frequently highlighted for their efficiency and cost-effectiveness in specific AI inference tasks. The Lumex platform takes a different route within this evolving hardware landscape: with SME2, it accelerates AI workloads on the CPU itself.