Tenstorrent’s Galaxy Blackhole AI servers escape the event horizon
Tenstorrent has launched its Galaxy Blackhole AI compute platform, a RISC-V-based system that packs 32 Blackhole accelerators into a 6U chassis and delivers 23 petaFLOPS of dense FP8 performance for $110,000. Each system carries 1 TB of GDDR6 memory and a scalable Ethernet mesh network that allows clustering of up to 32 nodes for larger AI workloads. Tenstorrent's performance claims include processing 100,000-token prompts in under four seconds on a four-node cluster and generating 720p video faster than real time. According to the company, the software stack has improved significantly since earlier hardware evaluations, with broader model support and better-optimized performance.
- Each Galaxy Blackhole system integrates 32 Blackhole accelerators and 1 TB of GDDR6 memory, and delivers 23 petaFLOPS of FP8 performance in a 6U form factor priced at $110,000.
- The accelerators are connected via an Ethernet mesh with 100 Tbps of aggregate bandwidth, allowing the system to scale across multiple nodes for large language models and high-throughput AI tasks.
- A four-node Galaxy Supercluster can process a 100,000-token prompt in under four seconds and generate 720p video faster than real time.
- Tenstorrent claims 90% of Hugging Face models run on its platform, supported by a Python-based interface for kernel optimization.
- The hardware is available through providers like Cirrascale, Equinix, and ai&, with further details expected at the TT-Deploy event on May 1.
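The per-system figures above can be broken down into rough per-accelerator numbers. The sketch below is only back-of-the-envelope arithmetic on the quoted figures: it assumes resources are split evenly across the 32 accelerators and that "1 TB" means 1,024 GB. The constant and function names are illustrative, not part of any Tenstorrent tooling.

```python
# Figures quoted in the summary (per Tenstorrent's announcement).
ACCELERATORS_PER_SYSTEM = 32
SYSTEM_FP8_PFLOPS = 23.0      # dense FP8, whole system
SYSTEM_MEMORY_GB = 1024       # assumption: "1 TB" read as 1,024 GB
SYSTEM_PRICE_USD = 110_000
MAX_NODES = 32


def per_chip(total, chips=ACCELERATORS_PER_SYSTEM):
    """Divide a system-level figure evenly across accelerators (an assumption)."""
    return total / chips


chip_fp8_tflops = per_chip(SYSTEM_FP8_PFLOPS * 1000)   # petaFLOPS -> teraFLOPS
chip_memory_gb = per_chip(SYSTEM_MEMORY_GB)
usd_per_pflops = SYSTEM_PRICE_USD / SYSTEM_FP8_PFLOPS
max_chips = MAX_NODES * ACCELERATORS_PER_SYSTEM

print(f"FP8 per accelerator:    {chip_fp8_tflops:.2f} TFLOPS")
print(f"Memory per accelerator: {chip_memory_gb:.0f} GB")
print(f"Price per FP8 petaFLOPS: ${usd_per_pflops:,.0f}")
print(f"Chips in a 32-node cluster: {max_chips}")
```

These are derived values, not vendor specifications; real per-chip throughput depends on the workload and on how well a model has been optimized for the hardware.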
Full article excerpt
RISC-V-based systems pack 32 Blackhole accelerators in a 6U, $110K chassis
Tobias Mann, Tue 28 Apr 2026 // 13:00 UTC
Tenstorrent on Tuesday announced the general availability of its Galaxy Blackhole AI compute platform. Each of the startup's 6U systems is packed with 32 of the Blackhole accelerators we looked at last fall. The chips are interconnected in a dense Ethernet mesh by 100 Tbps of aggregate bandwidth. Combined, Tenstorrent says each Galaxy system features 1 TB of GDDR6, 16 TB/s of memory bandwidth, and 23 petaFLOPS of dense FP8 performance, all in a system that'll set you back only $110,000.
To put that in perspective, Nvidia's eight-way DGX boxes, while faster and higher capacity, will set you back somewhere between three and five times that.
However, Tenstorrent's mesh network isn't limited to a single node. Much like Google's TPU or Amazon's Trainium2 clusters, it can be extended to support larger models, higher throughput, or more interactive user experiences by adding more systems and adjusting the ratio of tensor and pipeline parallelism. Tenstorrent's base Galaxy Supercluster will set you back $440,000 and features four Blackhole systems, but the architecture can support up to 32 nodes with more than a thousand chips.
Jasmina Vasiljevic, senior fellow at Tenstorrent, tells us the software stack has improved considerably since we first went hands-on with the hardware. At the time, model support was quite limited and what did run hadn't been optimized for the hardware yet. This mismatch resulted in generally poor performance…
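The cluster figures in the excerpt (four systems for $440,000, up to 32 nodes) and the idea of trading tensor parallelism against pipeline parallelism can be illustrated with a short sketch. It assumes 32 chips per node and the $110,000 per-system price; multiplying that price out beyond the quoted four-node Supercluster is a naive extrapolation, not a published price. The layout enumeration simply lists factor pairs of the chip count and ignores data parallelism and mesh-topology constraints. The function names are hypothetical and are not Tenstorrent APIs.

```python
CHIPS_PER_NODE = 32
NODE_PRICE_USD = 110_000  # per-system price quoted in the article


def cluster_summary(nodes):
    """Chip count and naive list price for a cluster of `nodes` Galaxy systems."""
    return {
        "nodes": nodes,
        "chips": nodes * CHIPS_PER_NODE,
        # Assumption: price scales linearly with node count.
        "naive_price_usd": nodes * NODE_PRICE_USD,
    }


def parallel_layouts(total_chips):
    """All (tensor_parallel, pipeline_parallel) pairs whose product is total_chips."""
    return [
        (tp, total_chips // tp)
        for tp in range(1, total_chips + 1)
        if total_chips % tp == 0
    ]


supercluster = cluster_summary(4)
largest = cluster_summary(32)
layouts = parallel_layouts(supercluster["chips"])

print(supercluster)
print(largest)
for tp, pp in layouts:
    print(f"tensor parallel = {tp:3d}, pipeline parallel = {pp:3d}")
```

For the four-node case, a layout such as tensor parallel 32 with pipeline parallel 4 would correspond to sharding each layer across one chassis and splitting the model's layers across the four systems. Which ratio works best depends on the model and the latency or throughput target.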
This excerpt is published under fair use for community discussion. Read the full article at The Register.