Inside OpenAI's record-breaking 9-month chip that marks the dawn of self-optimizing hardware.
Semiconductor design traditionally takes years of meticulous planning. Yet, OpenAI and Broadcom shattered this timeline, taking their new Jalapeño chip from concept to tape-out in only nine months. By using AI to design the very silicon meant to run it, they bypassed traditional human bottlenecks.
This is the dawn of self-optimizing hardware. OpenAI leveraged its own prior-generation LLMs to automate and accelerate complex chip layout and optimization tasks. This feedback loop allows algorithms to design their own physical architectures, accelerating the evolution of compute.
The shift to custom silicon was driven by stark economic realities. In FY2025, OpenAI generated 13.07 billion dollars in revenue but lost a staggering 20.92 billion dollars. A massive portion of these losses went directly to renting expensive, general-purpose GPUs for model inference.
General-purpose GPUs are built for both heavy training and everyday inference, making them highly versatile but incredibly expensive. In a standard H100 cluster, hardware depreciation accounts for 71 percent of the total cost. Stripping out training logic allows custom ASICs like Jalapeño to focus purely on efficient execution.
Jalapeño is a physical marvel. Manufactured on TSMC's advanced 3nm process, the computational die measures roughly 840 square millimeters. This massive size approaches the physical limit of extreme ultraviolet lithography, packing unprecedented inference power onto a single slice of silicon.
Raw compute power is useless if data cannot reach it fast enough. To solve the notorious memory bottleneck, Jalapeño flanks its central logic tile with high-bandwidth memory stacks. This design secures ultra-high-speed bandwidth directly on the package, keeping the processor constantly fed.
AI chips do not work in isolation; they must talk to each other to process massive models. Jalapeño integrates Broadcom's Tomahawk networking silicon. This ensures ultra-low-latency communication across server racks, allowing seamless model parallelism.
In the lab, early engineering samples are already active. OpenAI is testing Jalapeño with its GPT-5.3-Codex-Spark model—an ultra-fast, text-only coding assistant. The chip successfully runs this model at a blistering speed of 1,000 to 1,200 tokens per second.
What does this mean for the industry? Broadcom CEO Hock Tan estimates that Jalapeño can deliver a 50 percent reduction in inference costs per token compared to current state-of-the-art GPUs. For developers, this means faster, significantly cheaper API access.
Production is scaling fast. With partner Celestica handling system integration, OpenAI plans to deploy these processors at gigawatt scale starting in late 2026. Alongside Microsoft, they are preparing infrastructure to power the next generation of frontier AI.
Jalapeño is just the first step in a multi-generation roadmap spanning 3nm and 2nm nodes. As AI models grow smarter, they will design even more efficient chips to run their successors. We have officially entered the era of self-evolving silicon.
Discover more curated stories