Task-specific computing is driving a hardware renaissance
In September 2013, Apple launched its iPhone 5S with a new kind of brain called the A7. The chip featured an innovative architecture integrating different types of processor on a single die, a small semiconductor block housing a functional circuit. Where there had only been Central Processing Units (CPUs), there were now also Graphics Processing Units (GPUs), enabling real-time graphics for HDR photos and immersive games like Infinity Blade III. That move was a subtle yet consequential shift into a new approach called “heterogeneous computing.”
The latest approach is only the third in over five decades of microprocessor architecture. The Intel 4004 set the first standard in 1971 with a single-core CPU handling all computing requirements, functioning as a jack-of-all-trades. That worked for over three decades – until Dennard Scaling plateaued. Dennard Scaling relates to Moore’s law. Moore predicted that the number of transistors (tiny electronic switches) that fit onto a microchip would double every two years because transistors get smaller. Dennard added that as transistors shrink, their power use falls in proportion to their area, so they can run faster without overheating. That means single-core CPU performance improves just because of Moore’s law. It worked – but only until the mid-2000s.
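The arithmetic behind those two rules can be sketched in a few lines of Python. This is a rough illustration, not a model of any real chip: the starting figure of roughly 2,300 transistors for the Intel 4004 is a commonly cited number rather than one given above, and the function names and the idealised scaling factor `k` are assumptions made for the example.

```python
def moore_transistors(year, base_year=1971, base_count=2300, doubling_years=2.0):
    """Projected transistor count if density doubles every `doubling_years`.

    base_count=2300 is the commonly cited figure for the Intel 4004 (assumption).
    """
    return base_count * 2 ** ((year - base_year) / doubling_years)


def dennard_scale(k):
    """Idealised Dennard scaling when linear dimensions shrink by a factor k.

    Returns each quantity relative to the unscaled transistor.
    """
    area = 1 / k**2            # both dimensions shrink by 1/k
    voltage = 1 / k            # supply voltage scales down with size
    current = 1 / k            # so does current
    power = voltage * current  # power per transistor falls as 1/k^2
    return {
        "area": area,
        "power": power,
        "power_density": power / area,  # stays at roughly 1: no extra heat per unit area
        "frequency": k,                 # shorter distances allow faster switching
    }


if __name__ == "__main__":
    for year in (1971, 1985, 2005):
        print(year, round(moore_transistors(year)))
    print(dennard_scale(1.4))
```

In this idealised picture, power density stays constant however far transistors shrink. Once that stopped holding in practice, clock speeds could no longer rise without the chip running hotter.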
Trying to maintain the performance trajectory, Intel broke the single-core paradigm in 2005 with the dual-core Pentium D. The chip has two CPU cores on a single die, enabling parallel processing that can, at best, double computing power without relying on transistor density. That works for speed but not complexity – the real obstacle to better functionality. Addressing that problem requires not just more cores, but additional specialised processors for specific tasks, like the A7’s heterogeneous architecture with GPUs.
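How much a second core helps depends on how much of the workload can actually run in parallel. A minimal sketch using Amdahl's law, a standard rule of thumb that the article itself does not name, gives a feel for the limits. The function name and the example fractions are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup under Amdahl's law.

    parallel_fraction: share of the work that can be split across cores (0 to 1).
    cores: number of identical cores available.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative workloads: fully parallel, mostly parallel, half parallel.
    for fraction in (1.0, 0.9, 0.5):
        print(fraction, round(amdahl_speedup(fraction, cores=2), 2))
```

Two cores reach a full doubling only when every part of the task can be split. Any serial portion caps the gain, which is one reason adding identical cores is not enough on its own.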
SPECIALIST PROCESSORS
The Apple A15 powering the latest iPhones has at least two specialised processors. Two high-performance CPU cores, supported by four efficiency cores, coordinate the iPhone’s operation, made faster by two L3 memory caches storing frequently accessed information. The GPU and Neural Processing Unit (NPU) are task-specific. Up to five GPU cores work in parallel with the CPU to deliver optimal graphics for multimedia and games. The NPU’s 16 cores accelerate machine learning and AI, including facial recognition and voice activation.
Others use specialised processors, too. Google’s Pixel 8 phone has a Tensor Processing Unit (TPU) for machine learning and AI, even letting users select facial expressions in photos.
Such processors aren’t just for phones. Saavan Patel, while completing his PhD at the University of California, Berkeley, worked at Meta Labs researching Vision Processing Units (VPUs). Those can “read faces and hand gestures” and are small enough to fit in regular glasses. Patel is now CTO of InfinityQ, a company tackling problems once considered intractable with an Ising machine: a quantum-inspired processor specialised in NP-hard optimisation problems.
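To give a sense of the kind of problem an Ising machine targets, here is a toy sketch in Python. It is not InfinityQ's method: it simply tries every combination of spins on a hypothetical four-node ring, which only works because the example is tiny. The function names, the coupling values, and the graph are all assumptions made for illustration.

```python
from itertools import product


def ising_energy(spins, couplings):
    """Energy of a spin configuration: E = -sum(J_ab * s_a * s_b)."""
    return -sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())


def ground_state(n, couplings):
    """Brute-force search over all 2**n spin assignments (toy sizes only)."""
    best = min(product((-1, 1), repeat=n),
               key=lambda s: ising_energy(s, couplings))
    return best, ising_energy(best, couplings)


# Hypothetical four-node ring. A coupling of -1 rewards neighbours that
# disagree, so the lowest-energy state splits the ring into two groups.
couplings = {(0, 1): -1, (1, 2): -1, (2, 3): -1, (3, 0): -1}

if __name__ == "__main__":
    spins, energy = ground_state(4, couplings)
    print(spins, energy)
```

The search space doubles with every extra spin, so enumeration quickly becomes hopeless. Specialised hardware aims to settle into low-energy states without checking each one.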
He explains that “Moore’s law made us lazy” during the 1980s and 90s. Developers could rely on the adage ‘if it doesn’t work now, just wait until next year’. Money flowed to shrinking transistors to keep pace with Moore’s prediction, not to researching alternative approaches. Yet now, heterogeneous computing is catching up and is being used in a growing variety of applications.
But manufacturing specialist dies comes with challenges. Every unique configuration needs a dedicated production line, driving up costs and time to market. Chiplets could change that. Chiplets are modular: each has a dedicated function, like GPU or memory, and can connect to become a system. That means Apple and Google can share common chiplets and add proprietary ones.
MODULAR PRODUCTION
Today chiplets are used in high-performance computing, data centres, and AI, but high costs keep them out of consumer electronics. That presents an opportunity for China. The Chinese government has been propping up the domestic chiplet industry because restrictions limit its ability to import or export advanced microchip technology. Single-function chiplets, however, evade regulation: they are components, not complete advanced microchips.
If China can capitalise on global shortages in GPU and AI chip production capabilities with its chiplets, it could significantly influence both fields. While intended to keep the domestic market on par with the rest of the world, China’s investment may yet yield dividends beyond its borders.