NVIDIA Launches Granary Open-Source Speech Database and AI Multilingual Training Models to Accelerate Translation and Speech Dialogue Development

Aug. 16, 2025

Granary: Expanding Language Coverage in AI Voice Translation

There are more than 7,000 languages worldwide, yet mainstream AI speech translation technologies support only a small fraction of them. To improve recognition of underrepresented languages, NVIDIA has released Granary, a multilingual speech dataset covering 25 European languages, including several with limited available data. Alongside it, NVIDIA introduced two new AI models, “Canary-1b-v2” and “Parakeet-tdt-0.6b-v3,” to give development teams more accurate and efficient options for speech recognition and translation.

Granary Covers Rare Language Translation

The Granary speech dataset is the result of a collaboration between NVIDIA, Carnegie Mellon University, and the Bruno Kessler Foundation. To address the scarcity of labeled data for low-resource languages, the research team used NVIDIA NeMo’s speech data processing tools to convert large amounts of unlabeled public audio into structured, high-quality training samples, so that AI models can learn effectively without extensive manual labeling.

Granary contains approximately 650,000 hours of speech recognition data and more than 350,000 hours of speech translation data across 25 European languages. These include relatively underrepresented languages such as Estonian, Croatian, and Maltese, as well as Russian and Ukrainian. Developers can therefore train ASR (Automatic Speech Recognition) and AST (Automatic Speech Translation) models for most official EU languages more quickly and efficiently, improving the diversity and inclusivity of language AI.

Research Findings on Granary’s Efficiency

According to the research team, Granary needs only about half as much training data as other popular datasets to reach similar recognition and translation accuracy, which makes it particularly suitable for work on underrepresented languages. The Granary dataset has been released as open source on GitHub, and the related research findings will be presented at the Interspeech speech technology conference in the Netherlands from August 17 to 21.

Canary-1b-v2: High-Precision Multilingual Speech Translation

To demonstrate Granary’s potential, NVIDIA has introduced two speech models. The first, Canary-1b-v2, has one billion parameters and is designed for high-accuracy speech transcription and translation. It ranks highly on Hugging Face’s multilingual speech recognition leaderboard, supports transcription in 25 languages along with English translation, and achieves quality comparable to models three times its size while running inference up to ten times faster.

Parakeet-tdt-0.6b-v3: High-Throughput Real-Time Speech Model

Parakeet-tdt-0.6b-v3 emphasizes speed and throughput. Its streamlined 600-million-parameter architecture can process up to 24 minutes of audio in a single inference pass, and it automatically detects the input language, so no additional prompt is needed for transcription. It also ranks among the leading models on Hugging Face, making it well suited to applications that require low latency and real-time responses.
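The 24-minute figure implies that longer recordings have to be split before transcription. The following is a minimal sketch of how such a split could be planned. The function name `plan_segments` and the optional overlap are illustrative assumptions, not part of NVIDIA’s tooling, and the sketch only computes time windows; it does not load or cut audio.

```python
from typing import List, Tuple

# 24-minute per-inference limit quoted in the article, in seconds.
MAX_SEGMENT_SECONDS = 24 * 60


def plan_segments(total_seconds: float,
                  max_seconds: float = MAX_SEGMENT_SECONDS,
                  overlap_seconds: float = 0.0) -> List[Tuple[float, float]]:
    """Return (start, end) windows, each at most max_seconds long,
    that together cover the range [0, total_seconds].

    overlap_seconds is a hypothetical knob: consecutive windows share
    that much audio so words cut at a boundary appear whole in one window.
    """
    if total_seconds < 0:
        raise ValueError("total_seconds must be non-negative")
    if not 0 <= overlap_seconds < max_seconds:
        raise ValueError("overlap_seconds must be in [0, max_seconds)")

    total = float(total_seconds)
    step = max_seconds - overlap_seconds
    segments: List[Tuple[float, float]] = []
    start = 0.0
    while start < total:
        end = min(start + max_seconds, total)
        segments.append((start, end))
        if end >= total:
            break  # the last window reached the end of the recording
        start += step
    return segments


if __name__ == "__main__":
    # A one-hour recording needs three windows under the 24-minute limit.
    for window in plan_segments(3600):
        print(window)
```

Each window could then be cut out with an audio library of choice and transcribed separately, with the window start added to any timestamps the model returns.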

AI Evolution in Speech Translation and Subtitling

Both Canary-1b-v2 and Parakeet-tdt-0.6b-v3 provide automatic punctuation and capitalization along with word-level timestamps, which makes them suitable for subtitle generation, multilingual customer service, speech translation, and virtual assistants. Developers can fine-tune or retrain the models to extend them to other languages and domains.
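To illustrate why word-level timestamps matter for subtitling, here is a small sketch that groups timed words into SubRip (SRT) cues. The input shape (a list of dicts with `word`, `start`, and `end` in seconds) is an assumption made for this example and may differ from what the models actually return. The grouping limits `max_words` and `max_gap` are likewise illustrative choices.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: List[Dict], max_words: int = 7,
                 max_gap: float = 0.8) -> str:
    """Group timed words into numbered SRT cues.

    A new cue starts when the current one already holds max_words words,
    or when the pause before the next word exceeds max_gap seconds.
    """
    cues: List[List[Dict]] = []
    current: List[Dict] = []
    for w in words:
        if current and (len(current) >= max_words
                        or w["start"] - current[-1]["end"] > max_gap):
            cues.append(current)
            current = []
        current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        text = " ".join(w["word"] for w in cue)
        start = format_srt_time(cue[0]["start"])
        end = format_srt_time(cue[-1]["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}")
    return "\n\n".join(blocks) + ("\n" if blocks else "")


if __name__ == "__main__":
    sample = [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "world.", "start": 0.5, "end": 0.9},
        {"word": "Next", "start": 2.5, "end": 2.8},
    ]
    print(words_to_srt(sample))
```

Because the models already emit punctuation and capitalization, the cue text can be used as is; only the grouping into readable lines is left to the application.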

NVIDIA NeMo Platform Accelerating Speech Translation Development

This work builds on NeMo, NVIDIA’s modular AI development platform for managing the lifecycle of AI models. The NeMo Curator tool helps select suitable samples from source data to ensure the quality and consistency of training data, while the NeMo Speech Data Processor handles speech alignment and data cleaning and converts speech data into the formats the models require.
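The sample-selection step described above can be pictured with a simple filter over a JSON-lines manifest. This is only a sketch under stated assumptions: the field names `audio_filepath`, `duration`, and `text` follow the manifest convention commonly used with NeMo, but the thresholds and the characters-per-second heuristic are hypothetical and are not taken from NeMo Curator or the Granary pipeline.

```python
import json
from typing import Dict, Iterable, Iterator


def keep_sample(entry: Dict,
                min_duration: float = 1.0,
                max_duration: float = 40.0,
                max_chars_per_second: float = 25.0) -> bool:
    """Decide whether one manifest entry looks like a usable training sample.

    All thresholds are illustrative assumptions, not values from the source.
    """
    text = str(entry.get("text", "")).strip()
    duration = float(entry.get("duration", 0.0))
    if not text:
        return False  # nothing to learn from an empty transcript
    if not (min_duration <= duration <= max_duration):
        return False  # too short or too long for a training batch
    # An implausibly dense transcript suggests a bad pseudo-label.
    return len(text) / duration <= max_chars_per_second


def filter_manifest(lines: Iterable[str], **thresholds) -> Iterator[Dict]:
    """Yield the parsed entries of a JSON-lines manifest that pass keep_sample."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if keep_sample(entry, **thresholds):
            yield entry


if __name__ == "__main__":
    manifest = [
        json.dumps({"audio_filepath": "a.wav", "duration": 3.0, "text": "tere hommikust"}),
        json.dumps({"audio_filepath": "b.wav", "duration": 0.4, "text": "ok"}),
        json.dumps({"audio_filepath": "c.wav", "duration": 2.0, "text": ""}),
    ]
    for entry in filter_manifest(manifest):
        print(entry["audio_filepath"])
```

A real curation pipeline would add steps such as language identification and alignment checks; the point here is only that cleaning happens per manifest entry before training.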

Promoting Accessibility and Linguistic Diversity in AI

By open-sourcing Granary and the speech models, together with the underlying data processing and model-building methods, NVIDIA aims to accelerate speech AI development worldwide, particularly by enabling more inclusive technology in regions where translation resources are scarce. The simultaneous release of Granary, Canary, and Parakeet broadens the range of languages that speech AI can cover and provides a foundation for building global, multilingual AI dialogue and translation systems.

Data Repository and Model Availability

The dataset and models are now available for download on GitHub and Hugging Face.

Copyright © 2025 Decentronist. All Rights Reserved.