Granary: Expanding Language Coverage in AI Voice Translation
Globally, there are over 7,000 languages, yet mainstream AI voice translation technologies support only a small fraction of them. To improve recognition of underrepresented languages, NVIDIA has launched Granary, a multilingual audio data repository covering 25 European languages, including several low-resource ones. Alongside it, two new AI models, “Canary-1b-v2” and “Parakeet-tdt-0.6b-v3,” give development teams more accurate and efficient options for speech recognition and translation.
Granary Covers Low-Resource Language Translation
The Granary speech database is the result of a collaboration between NVIDIA, Carnegie Mellon University, and the Bruno Kessler Foundation. To address the data scarcity that hampers AI development for low-resource languages, the research team used NVIDIA NeMo’s speech data processing tools to convert vast amounts of unlabeled public audio into structured, high-quality training samples, allowing AI models to learn effectively without extensive manual labeling.
Granary comprises approximately 650,000 hours of speech recognition data and over 350,000 hours of speech translation data across 25 European languages, including relatively underrepresented ones such as Estonian, Croatian, and Maltese, as well as Russian and Ukrainian. This lets developers train ASR (automatic speech recognition) and AST (automatic speech translation) models for most official EU languages more quickly and efficiently, further enhancing the diversity and inclusivity of language AI.
Research Findings on Granary’s Efficiency
According to the accompanying research, Granary requires only about half as much training data as other popular datasets to reach comparable recognition and translation accuracy, making it particularly well suited to work on underrepresented languages. The dataset has been published as open source on GitHub, and the team will present the related findings at the Interspeech speech technology conference in the Netherlands, August 17 to 21.
Canary-1b-v2: High-Precision Multilingual Speech Translation
To demonstrate Granary’s application potential, NVIDIA has introduced two speech models. Canary-1b-v2, with a one-billion-parameter architecture, targets high-accuracy transcription and translation: it ranks near the top of Hugging Face’s multilingual speech recognition leaderboard, supports transcription in 25 languages plus translation between English and the other supported languages, and matches the output quality of models three times its size while running inference up to ten times faster.
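As a rough sketch of how such a model might be driven, here is a helper that builds per-file task manifests in the style earlier Canary releases used (setting `target_lang` equal to `source_lang` requests transcription; a different `target_lang` requests translation). The NeMo call is shown but not executed by default; treat the exact class and field names as assumptions and consult the model card:

```python
import json


def build_canary_manifest(audio_paths, source_lang="en", target_lang="en"):
    """Build manifest entries in the style Canary models expect.

    The field names (audio_filepath, source_lang, target_lang, pnc)
    follow conventions from earlier Canary releases and are an
    assumption here; check the canary-1b-v2 model card before use.
    """
    return [
        {
            "audio_filepath": path,
            "source_lang": source_lang,
            "target_lang": target_lang,
            "pnc": "yes",  # request punctuation and capitalization
        }
        for path in audio_paths
    ]


def transcribe_with_canary(manifest_path):
    """Hypothetical invocation via NeMo; needs nemo_toolkit and a GPU."""
    import nemo.collections.asr as nemo_asr  # heavy import, done lazily
    model = nemo_asr.models.ASRModel.from_pretrained("nvidia/canary-1b-v2")
    return model.transcribe(manifest_path)


if __name__ == "__main__":
    # Estonian speech, English output text: a speech-translation task.
    entries = build_canary_manifest(["talk.wav"], source_lang="et", target_lang="en")
    with open("manifest.json", "w") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
```

The manifest-then-transcribe flow mirrors how NeMo ASR models typically consume batched jobs; only the manifest-building half runs without the toolkit installed.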
Parakeet-tdt-0.6b-v3: High-Throughput Real-Time Speech Model
The Parakeet-tdt-0.6b-v3 model emphasizes speed and throughput: its streamlined 600-million-parameter architecture can process up to 24 minutes of audio in a single inference pass, and it automatically detects the input language, so no language prompt is needed. It also ranks among the leaders on Hugging Face, making it well suited to applications requiring low latency and real-time responses.
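Because a single pass tops out around 24 minutes (per the figure above), longer recordings must be split before inference. A minimal stdlib sketch of one way to do that; the overlap value and function name are illustrative, not part of the model’s API:

```python
def chunk_spans(total_seconds, max_chunk_seconds=24 * 60, overlap_seconds=5.0):
    """Split a recording into (start, end) spans no longer than the
    model's single-pass limit, with a small overlap so words cut at a
    boundary appear in both neighbouring chunks and can be merged later."""
    if total_seconds <= max_chunk_seconds:
        return [(0.0, float(total_seconds))]
    spans = []
    start = 0.0
    step = max_chunk_seconds - overlap_seconds
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        start += step
    return spans
```

Each span can then be cut from the source audio and sent through the model independently, with the overlapping words deduplicated when stitching transcripts back together.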
AI Evolution in Speech Translation and Subtitling
Both Canary-1b-v2 and Parakeet-tdt-0.6b-v3 provide automatic punctuation, capitalization, and word-level timestamps, making them applicable to subtitle generation, multilingual customer service, speech translation, and virtual assistant scenarios. Developers can fine-tune or retrain the models for their own applications, extending them to other languages and domains.
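As one concrete use of those word-level timestamps, a list of (word, start, end) tuples can be grouped into subtitle cues. A stdlib sketch in SRT format; the tuple layout and grouping heuristic are illustrative assumptions, not the models’ actual output schema:

```python
def to_srt(words, max_words_per_cue=7):
    """Group (word, start_sec, end_sec) tuples into numbered SRT cues."""

    def fmt(t):
        # SRT timestamps look like HH:MM:SS,mmm
        ms_total = int(round(t * 1000))
        h, rem = divmod(ms_total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    cues = []
    for i in range(0, len(words), max_words_per_cue):
        group = words[i:i + max_words_per_cue]
        text = " ".join(word for word, _, _ in group)
        start, end = group[0][1], group[-1][2]
        cues.append(f"{len(cues) + 1}\n{fmt(start)} --> {fmt(end)}\n{text}")
    return "\n\n".join(cues)
```

A real subtitling pipeline would also cap cue duration and line length, but the timestamp-to-cue mapping is the part the models’ word-level output enables directly.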
NVIDIA NeMo Platform Accelerating Speech Translation Development
This speech translation work is built on NVIDIA’s modular AI development platform, NeMo, which manages the full AI model lifecycle. The NeMo Curator tool selects suitable samples from source data, ensuring the quality and consistency of training data, while the NeMo speech data processor converts audio into the formats the models require, handling tasks such as speech alignment and data cleaning.
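To give a flavor of what such a curation step does, here is an illustrative filter over ASR-style manifest entries. This is not NeMo Curator’s actual API; the field names follow common ASR-manifest conventions, and the thresholds are arbitrary examples:

```python
def filter_samples(entries, min_duration=1.0, max_duration=40.0, allowed_chars=None):
    """Keep only manifest entries with a sane duration and a non-empty
    transcript drawn from the expected character set (include whitespace
    in allowed_chars if you pass one). Illustrative only."""
    kept = []
    for entry in entries:
        if not (min_duration <= entry.get("duration", 0.0) <= max_duration):
            continue  # clip too short or too long to be useful training data
        text = entry.get("text", "").strip()
        if not text:
            continue  # no transcript to learn from
        if allowed_chars is not None and not set(text.lower()) <= allowed_chars:
            continue  # transcript contains out-of-alphabet characters
        kept.append(entry)
    return kept
```

Chaining many small, auditable filters like this over unlabeled public audio is the general idea behind turning raw crawled speech into the structured training samples described above.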
Promoting Accessibility and Linguistic Diversity in AI
By open-sourcing Granary, the two speech models, and the underlying data processing and model-building methods, NVIDIA aims to accelerate global speech AI development, particularly in building more inclusive technological infrastructure for regions where translation resources are scarce. The simultaneous release of Granary, Canary, and Parakeet not only broadens the linguistic reach of speech AI but also lays a solid foundation for global, multilingual AI dialogue and translation systems.
Data Repository and Model Availability
The dataset and models are now available for download: visit GitHub for Granary and Hugging Face for the models to explore how these resources can advance the future of speech technology.