
Google Cloud has unveiled its newest-generation Tensor Processing Unit (TPU) artificial intelligence (AI) chip, dubbed Ironwood, which the company says is designed specifically for the compute demands of AI inferencing.
Now in its seventh generation, the TPU series has long been offered to Google Cloud customers for an array of AI applications, and Ironwood is a significant leap beyond its predecessors. The company notes that while its previous TPU chip offered 350x the total FP8 peak flops of TPU v2 (the first TPU chip available for public use), that figure leaps to 3600x for Ironwood.
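To put those relative figures in perspective, here is a minimal back-of-the-envelope sketch in Python; the 350x and 3600x values are taken from the article, and the implied generation-over-generation ratio is a derived illustration, not an official Google spec:

```python
# Relative FP8 peak-flops figures versus TPU v2, as reported above.
PREVIOUS_GEN_VS_V2 = 350    # previous TPU generation
IRONWOOD_VS_V2 = 3_600      # Ironwood

# Implied generation-over-generation jump: roughly 10x.
print(f"~{IRONWOOD_VS_V2 / PREVIOUS_GEN_VS_V2:.1f}x over the previous generation")
```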
Due to be available later this year, Ironwood will be offered in two configurations – a 256-chip cluster and a 9,216-chip cluster – which customers will select based on the scale and demands of their AI workloads.
Most impressive is the new chip's scaling: at 9,216 chips per pod, for a total of 42.5 exaflops, Google claims Ironwood delivers more than 24x the compute power of El Capitan, currently the world’s largest supercomputer, which itself offers 1.7 exaflops. (This claim has been disputed by some observers, in part because the two figures are quoted at different numerical precisions, yet even if Ironwood delivers anything close to this much compute, it is a remarkably fast AI accelerator chip.)
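For readers who want to sanity-check the scaling math, a short Python sketch using only the figures quoted above; the per-chip number is derived from those figures, not an official spec-sheet value:

```python
# Back-of-the-envelope check of the pod-scale figures cited above.
IRONWOOD_POD_CHIPS = 9_216
IRONWOOD_POD_FLOPS = 42.5e18   # 42.5 exaflops per full pod, as claimed
EL_CAPITAN_FLOPS = 1.7e18      # El Capitan's headline 1.7 exaflops

per_chip_flops = IRONWOOD_POD_FLOPS / IRONWOOD_POD_CHIPS
ratio_vs_el_capitan = IRONWOOD_POD_FLOPS / EL_CAPITAN_FLOPS

print(f"~{per_chip_flops / 1e15:.1f} petaflops per Ironwood chip")   # ~4.6 petaflops
print(f"~{ratio_vs_el_capitan:.0f}x El Capitan's headline figure")   # ~25x
```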
The Ironwood semi includes an enhanced SparseCore, which is an accelerator designed to crunch the massive embeddings used in ranking and recommendation engines.
Even as it offers this robust compute power, the chip is energy-efficient. In a sign of how quickly semiconductor design is evolving, Google claims that Ironwood delivers twice the performance per watt of the Trillium chip, which the company debuted just last year.
While the earlier generations of TPUs were built to support both AI training and inference, Ironwood is the first in the series designed specifically for the speed and efficiency that inferencing requires.
The need to support the inference process is growing as users deploy more and more generative AI models; every time someone queries a model like ChatGPT, Claude or Google Gemini, the process of retrieving and shaping the answer from the underlying large language model (LLM) uses inferencing. While inferencing is less compute-intensive than AI model training, it’s the final, all-important step in producing an AI model’s output. Current-day inference has also begun to incorporate more advanced “reasoning” and intelligent decision-making, increasing the need for specially designed chips like Ironwood.
The other factor supporting the need for better chips to support inferencing is that generative AI is now powering a larger number of business and consumer applications. “Ironwood is built to support this next phase of generative AI and its tremendous computational and communication requirements,” said Amin Vahdat, Google’s Vice President and General Manager of ML, Systems, and Cloud AI, in an online media event ahead of Google Cloud Next ’25. “This is what we call the ‘age of inference’ where AI agents will proactively retrieve and generate data to collaboratively deliver insights and answers, not just data.”
While much of the generative AI sector runs on NVIDIA chips, Google trains and deploys its generative AI Gemini platform with chips it designs itself. That such a leading generative AI platform is powered by Google-designed chips could play an incremental role in challenging the supremacy of NVIDIA in the chip sector, or at the very least, bolster the profile of the new Ironwood chip.
Though Google designed the Ironwood, it obviously did not fabricate the chip itself. The company did not specify where the chip was built, but Ironwood was most likely fabricated by TSMC, whose facilities have turned out many of the world’s fastest chips.