OMEGA Labs was founded with a bold vision: to create a decentralized AI research collective pushing the frontiers of Any-to-Any models through open-source innovation. By building on Bittensor’s blockchain network, OMEGA has rallied miners and validators worldwide to collaborate on multimodal AI systems that can handle text, audio, image, and video in a unified model. Our approach is not just about training the best models today, but about creating an engine that continuously incentivizes the best models, absorbing every new open-source breakthrough and putting new advancements back into the open — open AI, if you will. Simple, really.
Every miner on SN21 contributes to a self-sustaining research lab where great ideas and compute power are rewarded in a permissionless, crypto-powered ecosystem. Our ultimate goal is to accelerate progress toward Artificial General Intelligence (AGI) by leveraging the diversity of data and talent only a decentralized network can offer.
Today, we are thrilled to announce a major breakthrough on Subnet 21, our Any-to-Any model subnet. The latest model – aizhibin/omega_titanic_0x03, submitted on January 31st – has achieved an unprecedented leap in performance on our internal benchmark (codenamed CHANT). This model’s CHANT score jumped from the previous range of ~2.5–4% all the way up to 25%, a roughly 6–10x improvement that marks a new record for OMEGA’s multimodal models. In practical terms, this means a huge gain in the model’s ability to reason over speech inputs. While strong open-source text reasoning models are plentiful, strong speech reasoning models are rare and largely confined to frontier labs (such as GPT-4o Realtime and Gemini 2.0 Flash Live). For the first time, we’re seeing a voice-to-voice model on our subnet demonstrate a level of comprehension that we previously thought was months away. The significance of this jump can’t be overstated – it shows that the collective efforts of our community and the open-ended experimentation we encourage are paying off meaningfully.
Note the sharp increase in CHANT score on January 31st, just 4 days after the launch of Generalized Architecture Eval on January 27th.
The impact of this new model isn’t limited to our own metrics. On external benchmarks in the “Speech In, Text Out” category (where the model listens to spoken audio and generates written responses), we saw an impressive performance jump as well – from roughly 27% average accuracy for the previous ronan401/magic_baby model to about 32% for aizhibin/omega_titanic_0x03. In other words, our open-source model shows demonstrable gains in complex speech understanding and reasoning quality.
This result is remarkable for a community-driven model competing against industry and Big Tech systems. It validates our approach of combining multimodal training and decentralized collaboration. Keep in mind, OMEGA’s ethos has always been to outperform expectations – our previous multimodal release even beat state-of-the-art models like Alibaba’s Qwen-VL (7B) on challenging image understanding benchmarks. Now, with omega_titanic_0x03, the same collaborative innovation is achieving state-of-the-art results in the open-source audio and speech domain as well. Each external evaluation provides a valuable reality check, signaling that our CHANT incentive mechanism is driving strong voice model improvements. This accomplishment reinforces something we have long known to be true: open, decentralized AI can compete at the highest level.
See the benchmark scores below:
Comparison of aizhibin/omega_titanic_0x03 against frontier closed-source voice-to-voice models on standard academic benchmarks testing model intelligence