Tracel AI Raised $3M in Funding

Flame digital art generated by Stable Diffusion.

Last week, we unveiled Burn-LM: a training and inference engine built for large models like LLMs. Just days before, we released a multiplatform matrix multiplication engine that brings state-of-the-art performance to all hardware.

And today, we're excited to announce that Tracel AI, the company behind the Burn deep learning framework, has raised $3M USD, led by Golden Ventures and OSS Capital, to advance its mission of democratizing and accelerating AI development.

The new funding will help optimize Burn and CubeCL, solidifying a multiplatform high-performance computing (HPC) ecosystem that benefits everyone. It will also support the development of a new cloud platform that lets Burn users efficiently train and deploy the best open source models, alongside their own creations.

As Nick Chen, Partner at Golden Ventures, puts it,

“At a time when compute is the limiting factor for AI innovation, Tracel's relentless pursuit of performance and flexibility sets them apart from the competition. Burn has already delivered results that frankly exceed what anyone thought possible - from exceeding industry benchmarks to ubiquitously enabling high performance AI to run across every major hardware platform. Their trajectory is proof that a small, determined team can shape the future. Tracel's mission to frictionlessly transform compute into intelligence will redefine how the world builds, deploys, and benefits from AI.”

The Rise of a Compute Economy

The AI revolution began in 2012 with AlexNet[1], when researchers trained a deep convolutional neural network on two GPUs and outperformed every other approach of the time. This breakthrough demonstrated the power of combining deep neural networks, large datasets, and GPU compute.

Since then, scaling compute has been directly linked to model performance. The scaling laws[2] observed with GPT-3 and later refined in DeepMind's Chinchilla paper[3] confirmed this correlation, showing that larger models trained on more data achieve superior results. These discoveries paved the way for large-scale production models like ChatGPT, Grok, and Claude.

Compute has thus become the most critical resource in AI development. Nvidia, a pioneer of general-purpose GPU (GPGPU) computing, has capitalized on this demand, reaching a market capitalization approaching $5 trillion as of mid-2025.

While Nvidia's contributions to AI are undeniable, reliance on a single hardware vendor naturally limits competition and innovation. Other AI chips can be optimized to deliver performance rivaling Nvidia's GPUs, but they often lack the software support needed to compete effectively. These alternatives are also less adaptable, struggling to keep up with new research. As a result, most advanced AI breakthroughs continue to rely on Nvidia hardware, leaving competitors constantly playing catch-up.

When one company controls most of the specialized chips that AI requires, it can charge a premium. And it does, with 70% profit margins[4] on hardware that costs $25,000 per unit[5]. Unless you're part of a well-funded, prestigious research lab or big tech, you typically don't have access to enough compute. It's not just researchers who suffer from this. The entire industry feels the impact, with most AI startups losing money to large LLM providers like OpenAI and Anthropic. Even those providers struggle to turn a profit, since most of their revenue is consumed by the underlying infrastructure.

The Impact of a Proper Abstraction Layer

Looking back at the history of computing, we can see that the invention of high-level programming languages like C and C++ was critical to building the software infrastructure still in use today. Operating systems and browsers were crucial in democratizing computing by abstracting away hardware complexities, making it easier to develop and distribute new products and services. These advancements enabled developers to iterate more quickly, sparking an explosion of economic growth.

To democratize AI, we must learn from the past and create effective abstractions between software and hardware. This will reduce the cost of AI research and development, empowering more people to contribute new methods quickly and realizing the promise of AI sooner. Companies should be able to train and deploy their own models, whether in the cloud or on the edge (e.g., robotics, embedded IoT, mobile), to solve large-scale problems beyond narrow chatbot and agentic use cases.

Software Should Stay Soft, Hardware Should Stay Hard

The key to advancing AI is optimizing how new research leverages compute to enhance model intelligence. Researchers contribute new methods by accelerating training convergence, increasing data sampling efficiency, or improving hardware utilization. They don't start from scratch but instead build on the existing software infrastructure designed for Nvidia GPUs. This approach is necessary because it is difficult to prove the efficiency of a new method without a well-optimized implementation.

The history of software development shows that initial prototypes are often deployed directly to production. If a prototype only works on Nvidia hardware, you're forced to deploy it on that same hardware in production. How do you avoid staying tied to one type of chip forever? The solution is a flexible, efficient, and portable deep learning framework that lets researchers test ideas at speed and scale without depending on a specific chip.

Divide and Conquer: The Core of Deep Learning Optimization

Is it too good to be true? No! A flexible, optimal, and portable solution is achievable because deep learning optimization mostly relies on a divide-and-conquer approach. Large neural network computations are split into tensor operations, which are tiled to exploit specific hardware instructions and memory layouts. Workloads are distributed strategically to minimize data movement and hide I/O latency. All of these strategies can be derived by generic algorithms, parameterized by hardware specifications, and automated by compilers and runtimes, freeing users from hardware-specific constraints. This is the foundation of our work at Tracel.
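
To make the idea concrete, here is a minimal Rust sketch of deriving a matrix multiplication tiling from a hardware description instead of hard-coding it for one chip. The `HardwareSpec` fields and the sizing heuristic are hypothetical simplifications for illustration, not the actual Burn or CubeCL APIs.

```rust
/// Hardware description a compiler or runtime could query at launch time.
struct HardwareSpec {
    shared_memory_bytes: usize,   // fast on-chip memory available per workgroup
    plane_size: usize,            // warp/subgroup width (e.g. 32 on many GPUs)
    max_threads_per_group: usize, // workgroup thread limit
}

/// Tile sizes chosen for a matmul C = A (m x k) * B (k x n).
#[derive(Debug)]
struct MatmulTiling {
    tile_m: usize,
    tile_n: usize,
    tile_k: usize,
}

/// Pick the largest square tile whose staged A and B sub-blocks (in f32)
/// fit in shared memory, keeping dimensions multiples of the plane size
/// and the workgroup within its thread limit (assuming each thread
/// produces `plane_size` output elements).
fn derive_tiling(hw: &HardwareSpec) -> MatmulTiling {
    let elem_bytes = 4; // f32
    let mut tile = hw.plane_size;
    loop {
        let next = tile * 2;
        let staged_bytes = 2 * next * next * elem_bytes; // A tile + B tile
        let threads = next * next / hw.plane_size;
        if staged_bytes > hw.shared_memory_bytes || threads > hw.max_threads_per_group {
            break;
        }
        tile = next;
    }
    MatmulTiling { tile_m: tile, tile_n: tile, tile_k: tile }
}

fn main() {
    // Hypothetical numbers resembling a typical discrete GPU.
    let gpu = HardwareSpec {
        shared_memory_bytes: 48 * 1024,
        plane_size: 32,
        max_threads_per_group: 1024,
    };
    // Prints the tile shape a kernel could be specialized with for this device.
    println!("{:?}", derive_tiling(&gpu));
}
```

In practice, the same kind of derivation can also account for tensor-core shapes, register pressure, and cache line sizes, which is what allows a compiler or runtime to specialize one generic kernel for many different chips.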

AI Deserves Better Software

AI is no longer confined to university labs. It's making its way into robots, consuming ever more energy in data centers, and it will keep evolving. Given its potential to reshape society, we can't afford to overlook the software infrastructure that powers it.

Joseph Jacks, General Partner at OSS Capital, was the first to believe in our vision, writing:

“As AI continues to take over the world, the toolchain for designing and scaling neural networks has failed to keep pace with the rate of model improvement. Burn and CubeCL have already demonstrated orders of magnitude improvements in compute efficiency at compile and inference times. With such a small team, the rate of progress demonstrated by Tracel is spectacular and we are privileged to support their mission to massively upgrade the SOTA in AI development.”

We're Hiring

We're hiring exceptional developers in compilation, machine learning and web to help us transform compute into intelligence. See our open positions and apply here.

References

[1] ImageNet Classification with Deep Convolutional Neural Networks
[2] Scaling Laws for Neural Language Models
[3] Training Compute-Optimal Large Language Models
[4] NVIDIA Announces Financial Results for Fourth Quarter and Fiscal 2025
[5] NVIDIA H100 Price Guide 2025: Detailed Costs, Comparisons & Expert Insights
[6] Can AI Scaling Continue Through 2030
