Submission Guidelines

1. TTS Track

  • The goal of the TTS track is to develop high-quality TTS systems using in-the-wild speech data. The training data comes from the TITW dataset, and the evaluation sets will be TITW-KSKT and TITW-KSUT (following the instructions here; we will add more detail soon).
  • We will evaluate the generated samples with a range of metrics to assess performance comprehensively, including MCD, UTMOS, DNSMOS, WER, SPK-sim, and spoofing likelihood scores.
  • There is no framework or model restriction in the TTS-Acoustic+Vocoder challenge. We will also release the baseline training scripts and the pre-trained baseline model.
  • Submission package details: the synthesized speech for the TITW-KSKT and TITW-KSUT test sets (sampled at 16 kHz or higher, in zipped format).
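
As a rough illustration of the packaging step, the sketch below checks that every synthesized file meets the 16 kHz minimum and then zips the files. It assumes the outputs are PCM WAV files in a single flat directory; the function names, the flat archive layout, and the archive name are hypothetical, since the guidelines do not specify file naming or directory structure.

```python
import wave
import zipfile
from pathlib import Path

MIN_SAMPLE_RATE = 16_000  # the guidelines ask for at least 16 kHz


def find_low_rate_wavs(wav_dir, min_rate=MIN_SAMPLE_RATE):
    """Return (filename, rate) pairs for WAV files sampled below min_rate."""
    too_low = []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate < min_rate:
            too_low.append((path.name, rate))
    return too_low


def zip_wavs(wav_dir, zip_path):
    """Zip every WAV in wav_dir and return how many files were added.

    The flat layout inside the archive is an assumption, not an
    official requirement of the challenge.
    """
    paths = sorted(Path(wav_dir).glob("*.wav"))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=path.name)
    return len(paths)
```

A typical use would be to call `find_low_rate_wavs("synth_out")` first, resample or regenerate anything it reports, and only then call `zip_wavs("synth_out", "tts_submission.zip")`.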

2. SASV Track

  • The goal of the SASV track is to build robust SASV systems, using source data from real-world conditions together with spoofing attacks generated by TTS systems trained on the same real-world data. This dataset, SpoofCeleb, will serve as the training set and as part of the test set for the SASV track. We will also use the synthesized speech from the TTS track as another test set.
  • We will evaluate the SASV performance using the architecture-agnostic detection cost function (a-DCF), a newly proposed metric for this task that can be applied to both end-to-end and traditional fusion-based SASV frameworks.
  • There is no framework or model restriction in the SASV track. We will also release the baseline training scripts and the pre-trained baseline model.
  • Submission package details: a text file in which each line contains the SASV model's detection score for one evaluation utterance (zipped format).
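
A minimal sketch of producing such a score file follows. It assumes one floating-point score per line, written in the same order as the evaluation utterance list; whether each line should also carry an utterance ID is not stated in the guidelines, so that detail, along with the function and file names, is hypothetical.

```python
import zipfile
from pathlib import Path


def write_score_file(scores, txt_path):
    """Write one detection score per line, in the order given.

    Assumption: the order of `scores` matches the evaluation list.
    """
    lines = [f"{float(score):.6f}" for score in scores]
    Path(txt_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def zip_score_file(txt_path, zip_path):
    """Place the score file at the top level of a zip archive."""
    txt_path = Path(txt_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(txt_path, arcname=txt_path.name)
    return zip_path
```

Comparing the return value of `write_score_file` with the number of evaluation utterances is a cheap way to catch a truncated or misaligned score list before zipping.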