English

Acoustic Models

| Model | SR (kHz) | Tokenizer | Dataset |
| --- | --- | --- | --- |
| FastSpeech2 EN v2 | 22.05 | Character-based | Azure |
| FastSpeech2 EN v3 | 44.10 | g2p_en (ARPA) | Azure |
| FastSpeech2 MFA EN v2 | 22.05 | g2p_en (ARPA) | Azure |
| FastSpeech2 MFA EN v3 | 44.10 | gruut (IPA) | Azure |
| FastSpeech2 MFA EN v4 | 44.10 | gruut (IPA) | Azure (Mastered) |
| FastSpeech2 MFA EN ESD Angry | 44.10 | gruut (IPA) | Emotional Speech Dataset - Angry |
| LightSpeech MFA EN | 44.10 | gruut (IPA) | Azure (Mastered) |
| LightSpeech MFA EN v2 | 44.10 | gruut (IPA) | Azure (Mastered) |
| LightSpeech MFA EN v3 | 44.10 | gruut (IPA) | Azure (Mastered) |
| LightSpeech MFA EN ESD | 44.10 | gruut (IPA) | Emotional Speech Dataset - 0013 |

Vocoder Models

| Model | SR (kHz) | Dataset |
| --- | --- | --- |
| MB-MelGAN EN | 22.05 | Azure |
| MB-MelGAN HiFi EN | 22.05 | Azure |
| MB-MelGAN HiFi PostNets EN | 22.05 | Azure |
| MB-MelGAN HiFi PostNets EN v2 | 22.05 | Azure |
| MB-MelGAN HiFi PostNets EN v3 | 44.10 | Azure |
| MB-MelGAN HiFi PostNets EN v5 | 44.10 | Azure |
| MB-MelGAN HiFi PostNets EN v6 | 44.10 | Azure (Mastered) |
| MB-MelGAN HiFi PostNets EN v7 | 44.10 | Azure (Mastered) |
| MB-MelGAN HiFi PostNets EN v8 | 44.10 | Azure (Mastered) |
| MB-MelGAN HiFi PostNets EN ESD Angry | 44.10 | Emotional Speech Dataset - Angry |
| MB-MelGAN HiFi PostNets EN ESD | 44.10 | Emotional Speech Dataset - 0013 |